

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1987 by The American Society of Biological Chemists, Inc.

Vol. 262, No. 1, Issue of January 5, pp. 455-459, 1987. Printed in U.S.A.

Determination of the Nucleotide Sequence for the Exonuclease I Structural Gene (sbcB) of Escherichia coli K12*

(Received for publication, August 22, 1986)

Gregory J. Phillips and Sidney R. Kushner‡

From the Department of Genetics, University of Georgia, Athens, Georgia 30602

The complete nucleotide sequence of the structural gene for Escherichia coli exonuclease I has been determined. The coding region corresponds to a 465-amino acid protein with molecular weight of 53,174. The partial amino acid sequence of purified exonuclease I agrees with that predicted by the DNA sequence. Two putative weak promoters have been localized by S1 nuclease analysis. The sbcB coding sequence contains many non-optimal codons, characteristic of many poorly expressed E. coli genes.

Exonuclease I, the product of the Escherichia coli sbcB gene, specifically catalyzes the degradation of single-stranded DNA. It acts processively in a 3' to 5' direction, releasing mononucleotides (1). Two classes of exonuclease I-deficient mutants have been phenotypically characterized. sbcB mutations are able to indirectly suppress the deficiencies in both genetic recombination and DNA repair associated with recB and recC (exonuclease V) deficiency (2). The xonA mutation, however, is only able to suppress the sensitivity to DNA damaging agents, with the cells remaining recombinationally deficient (3). While the biochemical nature of the suppression is still poorly understood, it has been proposed that the loss of exonuclease I activity in recB and recC deficient strains allows an intermediate which is normally degraded to be utilized by the RecF recombinational pathway (2).

In order to facilitate the purification and characterization of the exonuclease I protein, the sbcB gene was originally cloned on a 17-kb¹ HindIII fragment (4) and subsequently obtained on a smaller 7.6-kb EcoRI to BamHI fragment (5). Using this plasmid a new purification for exonuclease I was developed and the enzyme was physically characterized (6). Exonuclease I has a molecular weight of 55,000 and is active as a monomer (6). It has also been estimated that there are only 40-60 copies of exonuclease I protein per cell.

In this report the complete nucleotide sequence of the exonuclease I structural gene is presented along with the transcription start site as determined by S1 nuclease analysis. Of particular interest are the high abundance of rare codons in the coding sequence along with a relatively poor promoter. The validity of the open reading frame was confirmed by amino acid sequencing of the first 12 amino acids. The coding sequence contains only a limited number of hydrophobic regions.

* This work was supported in part by United States Army Research Fellowship D-AAG29-83-G-0111 (to G. J. P.) and National Institutes of Health Grant GM27997 (to S. R. K.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J02641.

‡ To whom correspondence should be addressed.

¹ The abbreviations used are: kb, kilobase pair; bp, base pair; Pipes, 1,4-piperazinediethanesulfonic acid.

EXPERIMENTAL PROCEDURES

Bacterial Strains and Plasmids-SK4258 is C600 transformed with the runaway replication plasmid pDPK13 (5). pDPK20 is a pBR322 derivative plasmid containing the wild type sbcB gene (5). The M13 vectors mp18 and mp19 (7) were used for construction of all templates, and JM103 (8) was used as the host strain.

Materials-All enzymes were purchased from Boehringer Mannheim or New England Biolabs and used as specified by the manufacturers. Radiochemicals ([α-³⁵S]dATP and [γ-³²P]ATP) were obtained from New England Nuclear. Agarose was obtained from FMC Corp.; acrylamide, bisacrylamide, ultra-pure urea, and ammonium persulfate from Bio-Rad; TEMED from Eastman-Kodak; antibiotics from Sigma; deoxynucleotides and dideoxynucleotides, 17-mer synthetic primer, and probe-primer from P-L Biochemicals.

DNA Sequencing-The Sanger dideoxy chain termination method (9) was used to determine the DNA sequence, using the modifications recommended by Biggin et al. (10) for the use of [α-³⁵S]dATP. Templates were generated by a combination of forced subcloning into the M13 vectors mp18 and mp19 and the exonuclease III deletion procedure of Henikoff (11).

S1 Nuclease Mapping-A modification of the Berk and Sharp (12) procedure for S1 nuclease protection was used to determine the site of transcription initiation. RNA was extracted from SK4258 by the phenol method, employing modifications described by Markham et al. (13). End labeling of the 563-bp ClaI-EcoRI DNA fragment by T4 polynucleotide kinase and isolation of the radiolabeled fragment was as described by Maniatis et al. (14). 200 µg of RNA were hybridized with 80,000 cpm of labeled DNA fragment for 16 h at 52 °C, in a solution of 80% formamide, 100 mM Pipes, pH 6.8, 400 mM NaCl, and 10 mM Na₂EDTA. The hybridization mixture was diluted into a 200-µl volume of S1 digestion buffer (100 mM sodium acetate, 300 mM NaCl, and 1 mM ZnSO₄), and the RNA-DNA hybrids were digested with 500 units/ml of S1 nuclease (BM) for 30 min at 37 °C. Control experiments were done by incubating the identical 5'-end-labeled DNA fragment with 200 µg of tRNA under equivalent conditions prior to S1 digestion. No nonspecific protection was detected. Following S1 digestion, the RNA-DNA hybrids were phenol-extracted, precipitated with 2 volumes of absolute ethanol, and suspended in 20 µl of loading buffer (15) before electrophoresis on an 8% polyacrylamide/urea gel. Maxam and Gilbert (15) sequencing reactions were performed on the same 563-bp DNA fragment and run parallel to the S1 protection experiment. A correction factor of 1 to 1.5 bases (16) was used in determining the base at which transcription initiated.

Nucleotide Sequence Analysis-The computer program DNASEQ (17), developed in the Department of Genetics, University of Georgia, was used for editing, translation, and homology searches. The Intelligenetics system was used for the hydropathy plot and for additional homology searches.

Amino Acid Analysis-Exonuclease I protein was purified as previously described (6). 100 picomoles of the protein was analyzed on an Applied Biosystems Amino Acid Sequencer.

Other Methods-Strand-specific hybridization probes were made from M13 mp18 and mp19 derivative templates containing the 1.2-kb EcoRI-PvuII fragment (Fig. 1). Primer extension was carried out as outlined by Hu and Messing (18). RNA dot blots were done as previously described (19).

RESULTS

Nucleotide Sequence of sbcB-We have previously reported the localization of the sbcB coding region (5), as shown in Fig. 1. The direction of transcription of sbcB was determined by hybridizing two strand-specific probes to total cellular RNA (data not shown). The probes were generated by subcloning the 1.2-kb EcoRI-SmaI segment into the M13 vectors mp18 and mp19 (7). The probes were radiolabeled by the primer

[Restriction map, scale in bp; sites marked include ClaI, EcoRI, MluI, AvaI/SmaI, and PvuII.]

FIG. 1. Restriction map of the sbcB gene region and sequencing strategy. The protein coding region of the sbcB gene is shown by the hatched line. The direction of transcription is shown by the upper arrow. Each lower arrow represents the direction and length of readable DNA sequences determined by the Sanger dideoxy method (9).

TABLE I
Amino acid composition of sbcB deduced from the nucleotide sequence
Total number of codons: 465; total molecular weight: 53,174.

Amino acid    Total number
Ala           50
Leu           46
Asp           37
Glu           31
Arg           30
Pro           28
Val           27
Asn           26
Phe           24
Thr           22
Ile           21
Gln           19
Gly           19
Lys           19
Tyr           17
His           13
Ser           13
Met            9
Trp            9
Cys            5

[Sequence figure: nucleotide sequence numbered from -400 to 1500 with the deduced amino acid sequence beneath the coding region; the -35 and -10 regions, the +1 positions, the initiating Met, the STOP codon, and the EcoRI, MluI, PvuII, and SmaI sites are marked.]

FIG. 2. Nucleotide sequence of sbcB coding region and flanking sequences. The deduced amino acid sequence corresponds to a protein of Mr 53,174. The first 12 amino acids of the coding region which are underlined correspond to the amino-terminal amino acids determined by analysis of purified exonuclease I protein. Two sites of transcriptional initiation, as identified by S1 nuclease protection analysis, are indicated by a dot at the +1 position. Two putative -10 promoter regions for these transcription start sites are boxed and are indicated by “A” and “B.” A -35 region corresponding to these two -10 regions is also indicated. A putative ribosome binding sequence (20) is underlined starting at the +9 position. A region of dyad symmetry, capable of forming a stem-loop structure, is underlined in the 3'-flanking sequences. This structure is reminiscent of a rho-independent termination site (21). Restriction enzyme recognition sites from Fig. 1 are also indicated.


[Hydropathy plot; horizontal axis: amino acid number, 30 to 450.]

FIG. 3. Hydropathy plot (22) for the sbcB coding region. The hydropathy plot was determined by the method of Kyte and Doolittle (22).

[Autoradiogram; lanes: G, G+A, T+C, C, S1.]

FIG. 4. Determination of the in vitro transcription start sites by S1 nuclease protection analysis. The 563-bp EcoRI-ClaI fragment, 5'-end-labeled at the EcoRI site, was hybridized to 200 µg of total cellular RNA from SK4258, an exonuclease I overproducing strain (6). The hybridization conditions, S1 nuclease treatment, and electrophoresis were performed as described under “Experimental Procedures.” The S1-protected fragments were run in parallel with Maxam and Gilbert sequencing reactions (15) of the same ClaI-EcoRI fragment. The arrow indicates two potential transcription initiation sites.

extension procedure described under “Experimental Procedures.”

The DNA sequence of 1968 nucleotides was determined from the M13 clones shown in Fig. 1. Sets of nested deletions were produced by exonuclease III digestion of the templates employing the method of Henikoff (11). Using this procedure, greater than 95% of the gene was sequenced in both directions. Fig. 2 shows a potential ribosome binding site (20) located 9 nucleotides preceding an ATG codon which begins an open reading frame of 465 codons. In addition, a potential promoter sequence was also identified by visual inspection. No other putative regulatory sequences, e.g. regions of dyad symmetry, were found upstream of the promoter region, consistent with the notion that sbcB is constitutively expressed. An additional feature of the sequence is a potential stem-loop structure which is found between positions 1428 and 1449 following the open reading frame. This structure is potentially a rho-independent termination signal (21).
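A region of dyad symmetry of the kind described above can be searched for mechanically. The following is a minimal sketch, assuming an exact inverted repeat with a short loop; the function names, the stem and loop lengths, and the toy sequence are illustrative assumptions and are not taken from the sbcB sequence or from the programs used in this work.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_dyad_symmetry(seq, stem=6, min_loop=3, max_loop=8):
    """List (start, stem_sequence, loop_length) for exact inverted repeats.

    A hit is a run of `stem` bases that is followed, after a loop of
    min_loop..max_loop bases, by its own reverse complement.
    """
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:
                hits.append((i, left, loop))
    return hits


# Hypothetical toy sequence (not from sbcB): a GC-rich stem, a 4-base loop,
# and a trailing run of T residues, as in a rho-independent terminator.
toy = "AAGCCGCCTTTTGGCGGCTTTTTT"
hits = find_dyad_symmetry(toy)
print(hits)
```

Real terminators tolerate mismatches and G-U pairing in the stem, so an exact-match search of this kind would only be a first pass.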

Amino Acid Sequence of sbcB-Fig. 2 also shows the predicted amino acid sequence for a translational open reading frame of 1398 nucleotides. This open reading frame, which begins with an ATG initiation codon at position 25 and terminates with a TAA at position 1421, corresponds to a 465-amino acid protein with a molecular weight of 53,174. The predicted molecular weight is in close agreement with the experimentally determined value of 55,000 for the native protein (6) and 53,700 for the denatured polypeptide (5). The predicted amino acid composition of exonuclease I is shown in Table I.
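The figures quoted in this paragraph can be checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
codons = 465              # residues in the open reading frame (from the text)
predicted_mw = 53_174     # predicted molecular weight (from the text)
native_mw = 55_000        # measured, native protein (6)
denatured_mw = 53_700     # measured, denatured polypeptide (5)

coding_nt = codons * 3    # nucleotides that encode amino acids
orf_nt = coding_nt + 3    # plus the TAA termination codon
mean_residue_mass = predicted_mw / codons

# Percent by which each measurement exceeds the predicted value
diff_native = 100 * (native_mw - predicted_mw) / predicted_mw
diff_denatured = 100 * (denatured_mw - predicted_mw) / predicted_mw

print(orf_nt, round(mean_residue_mass, 1), round(diff_native, 1), round(diff_denatured, 1))
# prints: 1398 114.4 3.4 1.0
```

The 1398-nucleotide open reading frame therefore counts the stop codon, and both measured molecular weights lie within about 3.5% of the predicted value.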

To determine if the predicted amino acid sequence corresponded to that of the purified exonuclease I protein, the first 12 amino acid residues were determined as described under “Experimental Procedures.” These residues, as underlined in Fig. 2, corresponded to the predicted amino acid sequence.

It is interesting to note that there are two ATG codons which initiate the open reading frame. Amino acid sequence analysis of the protein revealed that methionine is the amino-terminal amino acid. Presumably the first methionine is removed from the mature protein.

A hydropathy plot (22) of the exonuclease I sequence is shown in Fig. 3 and reveals several hydrophilic regions, consistent with its role as a nucleic acid specific enzyme.
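For readers unfamiliar with the method, a hydropathy profile of this kind is a moving average of per-residue hydropathy values. The sketch below uses the published Kyte and Doolittle scale; the window length of 9 is an assumption (the window used for Fig. 3 is not stated), and the example peptide is the first 20 residues as transcribed here from the start of the open reading frame in Fig. 2, so transcription errors are possible.

```python
# Kyte-Doolittle hydropathy values, one-letter amino acid code
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of every full window along the protein."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Amino-terminal residues read from Fig. 2 (assumed transcription)
n_terminus = "MMNDGKQQSTFLFHDYETFG"
profile = hydropathy_profile(n_terminus, window=9)
for position, value in enumerate(profile, start=1):
    print(position, round(value, 2))
```

Positive values mark hydrophobic stretches and negative values hydrophilic ones; a longer window smooths the profile at the cost of resolution.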

Mapping the Transcriptional Start Site and Promoter Characterization-In order to determine the start of transcription of the sbcB gene, S1 nuclease protection experiments were done. Total cellular RNA was prepared from SK4258, a strain of E. coli which contains the cloned sbcB gene on a runaway replication plasmid (2). Previous measurements had shown that exonuclease I activity was amplified up to 400-fold in this strain. Total RNA was hybridized to a 563-bp ClaI-EcoRI fragment, 5' terminally labeled with [γ-³²P]ATP at the EcoRI site (Fig. 1). Fig. 4 shows the results of a typical S1 protection experiment run parallel to a Maxam and Gilbert sequencing ladder. After consideration of the 1-1.5-bp correction factor necessary when comparing a DNA fragment with a Maxam and Gilbert sequencing reaction (16), two potential sites of transcriptional initiation were identified. The same two protected DNA fragments were also found in experiments using increased concentrations of S1 nuclease. Two putative -10 promoter regions upstream from these transcriptional start


TABLE II
Comparison of codon usage of sbcB with efficiently expressed E. coli genes

                       sbcB                   E. coli genes(a)
Codon         Number of   % synonymous     % synonymous     Infrequently
              codons      codons used      codons used      used codons(b)
TTT (Phe)     15          63               44
TTC            9          38               56
TTA (Leu)      9          20                6               *
TTG            7          15                8               *
CTT            5          11                9               *
CTC            2           4                7               *
CTA            2           4                2               *
CTG           21          46               69
ATT (Ile)     16          76               37
ATC            3          14               62
ATA            2          10                1               *
ATG (Met)      9           -                -
GTT (Val)      6          22               38
GTC            4          15               13
GTA            5          19               23
GTG           12          44               27
TCT (Ser)      1           8               27
TCC            1           8               26
TCA            1           8                8               *
TCG            3          23               11               *
CCT (Pro)      5          18                9               *
CCC            5          18                6               *
CCA            4          14               20
CCG           14          50               65
ACT (Thr)      3          14               24
ACC           12          54               51
ACA            2           9                6               *
ACG            5          23               20               *
GCT (Ala)      4           8               28
GCC           18          36               19
GCA           15          30               23
GCG           13          26               30
TAT (Tyr)     10          59               41
TAC            7          41               59
TAA (Stop)     1           -               88
TAG            0           -                4
TGA            0           -                8
CAT (His)      6          46               39
CAC            7          54               61
CAA (Gln)      6          32               27               *
CAG           13          68               73
AAT (Asn)     15          58               24               *
AAC           11          42               76
AAA (Lys)     13          68               77
AAG            6          32               23               *
GAT (Asp)     27          73               51
GAC           10          27               49
GAA (Glu)     22          71               73
GAG            9          29               27               *
TGT (Cys)      3          60               42
TGC            2          40               58
TGG (Trp)      9           -                -
CGT (Arg)     11          38               58
CGC           15          50               35
CGA            1           3                2               *
CGG            3          10                3               *
AGT (Ser)      1           8                6               *
AGC            6          47               22
AGA (Arg)      0           0                1               *
AGG            0           0                0.25            *
GGT (Gly)      5          26               48
GGC            6          31               41
GGA            7          37                5               *
GGG            1           5                7               *

(a) Codon usage observed in a compilation of 25 efficiently expressed, nonregulatory E. coli genes (25).
(b) Asterisk indicates 23 codons which have been shown to be infrequently used in 25 efficiently expressed, nonregulatory genes (25); 18 of these codons (TTA, TTG, CTT, CTA, ATA, TCG, CCT, CCC, ACA, ACG, CAA, AAT, AAG, GAG, CGA, CGG, AGT, GGA) are used more frequently in the sbcB sequence than in the 25 nonregulatory genes.
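The "% synonymous codons used" column of Table II is each codon's count divided by the total count for its amino acid. The sketch below reproduces that calculation for three amino acids using the sbcB counts from the table; the function name is illustrative, and rounding half up is an assumption chosen because it reproduces the tabulated Phe values.

```python
import math


def percent_synonymous(counts):
    """Map each codon to its share (whole percent) among its synonyms.

    `counts` holds the codon counts for ONE amino acid. Values are
    rounded half up, which is an assumption about how the table was made.
    """
    total = sum(counts.values())
    return {
        codon: math.floor(100 * n / total + 0.5)
        for codon, n in counts.items()
    }


# sbcB codon counts taken from Table II
phe = percent_synonymous({"TTT": 15, "TTC": 9})
ile = percent_synonymous({"ATT": 16, "ATC": 3, "ATA": 2})
gln = percent_synonymous({"CAA": 6, "CAG": 13})

print(phe)  # {'TTT': 63, 'TTC': 38}
print(ile)  # {'ATT': 76, 'ATC': 14, 'ATA': 10}
print(gln)  # {'CAA': 32, 'CAG': 68}
```

Because each value is rounded independently, the percentages for one amino acid can sum to 99 or 101, as they do for Phe here.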

sites can be found. These sites, designated “A” and “B”, are shown in Fig. 2. Both -10 regions appear to share a common -35 region (Fig. 2); however, the spacing between -10 region “A” and -35 is 17 bases, a distance that is highly conserved among prokaryotic promoters (23).

It has been estimated from exonuclease I purification data that there are 40 to 60 molecules/cell of this protein in logarithmically growing cultures. One explanation for the poor expression of this gene is provided by an examination of the proposed promoter region. Using the method of Mulligan et al. (24) for comparing promoter strengths, a homology score of 44.4 for -10 region “A” and 34.9 for -10 region “B” was obtained. Such homology scores are considered indicative of a weak promoter. These values compare with a score of 38.2 for lacI and 51.5 for lexA, two genes known to encode regulatory proteins present in low cellular abundance.
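The idea of scoring a promoter by its resemblance to consensus elements can be illustrated with a much simpler stand-in. The sketch below is not the weighted homology score of Mulligan et al. (24); it only counts matches to the consensus -35 (TTGACA) and -10 (TATAAT) hexamers at spacings of 16 to 18 bases. The function names, the threshold, and the toy sequence are assumptions made for illustration.

```python
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def matches(hexamer, consensus):
    """Count positions at which a hexamer agrees with a consensus."""
    return sum(a == b for a, b in zip(hexamer, consensus))


def scan_promoters(seq, spacers=(16, 17, 18), min_total=8):
    """Return (start_of_-35, spacer, matches_-35, matches_-10) candidates.

    A candidate is reported when the two hexamers together match the
    consensus at `min_total` or more of their 12 positions.
    """
    hits = []
    for i in range(len(seq) - 6 + 1):
        m35 = matches(seq[i:i + 6], CONSENSUS_35)
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                continue
            m10 = matches(seq[j:j + 6], CONSENSUS_10)
            if m35 + m10 >= min_total:
                hits.append((i, spacer, m35, m10))
    return hits


# Hypothetical toy sequence (not from sbcB): perfect hexamers, 17-base spacer
toy = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGG"
hits = scan_promoters(toy)
print(hits)
```

A weighted method such as that of reference 24 additionally accounts for how strongly each position is conserved and for the spacer length, which is why its scores are not simple match counts.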

Codon Usage-A correlation has been observed between codon usage biases in E. coli and low cellular abundance, particularly for regulatory proteins (25). Table II shows the codon usage for the sbcB structural gene. Konigsberg and Godson (25) have identified 23 synonymous codons which are infrequently utilized in efficiently expressed, nonregulatory

TABLE III
Incidence of 23 infrequently used codons in three reading frames of sbcB, regulatory, and nonregulatory genes
The incidence of infrequently used codons was determined using the program DNASEQ (17).

                         Rare codons in
Reading frame    sbcB    Regulatory genes    Nonregulatory genes
1 (coding)       22.2    24.1                12.1
2                36.3    27.3                36.5
3                34.6    31.2                30.8

E. coli genes. The percent synonymous codon usage for these genes is also shown in Table II. For 18 of the 23 codons designated as infrequently used, the sbcB gene has a higher than expected frequency of occurrence. In addition, it is known that these 23 rare codons occur at a higher frequency in the two out-of-frame sites than in the reading frame for highly expressed proteins, while for poorly expressed genes their occurrence is nearly equal across the three reading frames (25). Table III shows the distribution of the 23 rare codons in


sbcB is very similar to that of the poorly expressed regulatory proteins.
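The reading-frame comparison can be sketched as a count of rare codons among the triplets read in each of the three frames. The code below is an illustration and not the DNASEQ program: the rare-codon set holds only the 18 codons named in the footnote to Table II (the full set of 23 is defined in reference 25), the result is expressed as a percentage by assumption, and the example is the first 36 nucleotides of the open reading frame as transcribed here from Fig. 2.

```python
# The 18 rare codons named in the Table II footnote (a subset of the 23)
RARE_CODONS = frozenset(
    "TTA TTG CTT CTA ATA TCG CCT CCC ACA ACG "
    "CAA AAT AAG GAG CGA CGG AGT GGA".split()
)


def rare_codon_percent(coding_seq, frame, rare=RARE_CODONS):
    """Percent of triplets in the given frame that are rare codons.

    frame 0 is the coding frame; frames 1 and 2 are shifted by one
    and two nucleotides. Incomplete trailing triplets are ignored.
    """
    codons = [
        coding_seq[i:i + 3]
        for i in range(frame, len(coding_seq) - 2, 3)
    ]
    if not codons:
        raise ValueError("sequence too short for this frame")
    return 100 * sum(codon in rare for codon in codons) / len(codons)


# First 12 codons of the open reading frame (assumed transcription of Fig. 2)
orf_start = "ATGATGAATGACGGTAAGCAACAATCTACCTTTTTG"
by_frame = [rare_codon_percent(orf_start, frame) for frame in range(3)]
print([round(value, 1) for value in by_frame])
```

A 36-nucleotide fragment is far too short to be representative; applying the function to a full coding sequence with the complete 23-codon set would be needed before comparing values with Table III.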

Homology Comparisons-A comparison of the amino acid sequence of exonuclease I with that of the lambda exonuclease and the T7 gene 6 exonuclease revealed no significant homology. In addition, no significant homology was seen with E. coli DNA polymerase I, which possesses both 3' to 5' and 5' to 3' exonucleolytic activities (26).

DISCUSSION

We have reported the complete nucleotide sequence of the E. coli sbcB gene, encoding the enzyme exonuclease I. Characterization of the 5'-flanking sequences indicates that sbcB is constitutively expressed from two possible inefficient promoters. In addition, codon usage biases similar to those determined for low abundance regulatory proteins have been observed. Since the ribosome binding site appears more than adequate, it is not clear at this time if the low abundance of the exonuclease I protein is the result of either poor transcription or translation or some combination of both factors. However, placing the sbcB structural gene under the control of a T7 promoter only resulted in a %fold increase in activity² over that previously reported (6).

Acknowledgment-We wish to thank Suzette Lay for assistance in preparation of the manuscript.

² G. J. Phillips and S. R. Kushner, unpublished results.

REFERENCES

1. Lehman, I. R., and Nussbaum, A. L. (1964) J. Biol. Chem. 239, 2628-2636
2. Kushner, S. R., Nagaishi, H., Templin, A., and Clark, A. J. (1971) Proc. Natl. Acad. Sci. U. S. A. 68, 824-827
3. Kushner, S. R., Nagaishi, H., and Clark, A. J. (1972) Proc. Natl. Acad. Sci. U. S. A. 69, 1366-1370
4. Vapnek, D., Alton, N. K., Bassett, C. L., and Kushner, S. R. (1976) Proc. Natl. Acad. Sci. U. S. A. 73, 3492-3496
5. Prasher, D., Kasunic, D. A., and Kushner, S. R. (1983) J. Bacteriol. 153, 903-908
6. Prasher, D. C., Conarro, L., and Kushner, S. R. (1983) J. Biol. Chem. 258, 6340-6343
7. Norrander, J., Kempe, T., and Messing, J. (1983) Gene (Amst.) 26, 101-106
8. Messing, J., Crea, R., and Seeburg, P. H. (1981) Nucleic Acids Res. 9, 309-321
9. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
10. Biggin, M. D., Gibson, T. J., and Hong, G. F. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 3963-3965
11. Henikoff, S. (1984) Gene (Amst.) 28, 351-359
12. Berk, A. J., and Sharp, P. A. (1977) Cell 12, 721-732
13. Markham, B. E., Harper, J. E., Mount, D. W., Sancar, G. B., Sancar, A., Rupp, W. D., Kenyon, C. J., and Walker, G. C. (1984) J. Mol. Biol. 178, 237-248
14. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
15. Maxam, A. M., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560
16. Sollner-Webb, B., and Reeder, R. H. (1979) Cell 18, 485-499
17. Arnold, J., Eckenrode, V. K., Lemke, K., Phillips, G. J., and Schaeffer, S. W. (1986) Nucleic Acids Res. 14, 239-254
18. Hu, N., and Messing, J. (1982) Gene (Amst.) 17, 271-277
19. Thomas, P. S. (1983) Methods Enzymol. 101, 255-266
20. Shine, J., and Dalgarno, L. (1974) Proc. Natl. Acad. Sci. U. S. A. 71, 1342-1346
21. Rosenberg, M., and Court, D. (1979) Annu. Rev. Genet. 13, 319-353
22. Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132
23. Hawley, D. K., and McClure, W. R. (1983) Nucleic Acids Res. 11, 2237-2255
24. Mulligan, M. E., Hawley, D. K., Entriken, R., and McClure, W. R. (1984) Nucleic Acids Res. 12, 789-800
25. Konigsberg, W., and Godson, G. N. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 687-691
26. Kornberg, A. (1980) DNA Replication, W. H. Freeman and Co., San Francisco