Cell, Vol. 37, 949-957, July 1984, Copyright © 1984 by MIT 0092-8674/84/070949-08 $02.00/0

Nucleotide and Deduced Polypeptide Sequences of the Photosynthetic Reaction-Center, B870 Antenna, and Flanking Polypeptides from R. capsulata

Douglas C. Youvan,** Edward J. Bylina,*+ Marie Alberti,* Helmut Begusch,* and John E. Hearst*+
*Division of Chemical Biodynamics, Lawrence Berkeley Laboratory
+and Department of Chemistry, University of California
Berkeley, California 94720

Summary

The complete nucleotide sequence (8867 bp) and the deduced polypeptide sequence are given for 11 proteins from the photosynthetic gene cluster of R. capsulata (46 kb), including the photosynthetic reaction-center L, M, and H subunits and the B870α and B870β polypeptides (light-harvesting I). These polypeptides bind bacteriochlorophyll, bacteriopheophytin, carotenoids, and quinones that are involved in the primary light reactions of photosynthesis. Hydropathy plots indicate that the L and M subunits are transmembrane proteins that may cross the membrane five times, while the H subunit has only one hydrophobic section near the amino terminus, which may be transmembrane. The L and M subunits are homologous over their entire length and have a high degree of homology with the QB protein from photosystem II of higher plants. An additional six genes were identified that may have some unknown role in bioenergetics since only mutations that affect the differentiation of the photosynthetic apparatus are known to map to this gene cluster.

Introduction

Purple nonsulfur bacteria such as R. capsulata have a very diverse metabolism that includes the ability to grow photosynthetically under anoxygenic conditions. At least eight polypeptides (three reaction-center and five light-harvesting polypeptides) are induced, along with the biosynthetic enzymes for carotenoid and bacteriochlorophyll biosynthesis, during the differentiation of the respiratory membrane into the invaginated photosynthetic membrane. These polypeptides are necessary for the proper binding of the carotenoid and bacteriochlorophyll pigments, quinones, and other cofactors that absorb light, mediate primary charge separation, and generate reductant. An adjacent oxidoreductase complex utilizes reduced quinones to generate a transmembrane proton gradient. Many of the bacterial photosynthetic processes are analogous to or identical with processes that occur in the thylakoid membranes of higher plant chloroplasts. Because it is easier to study and manipulate photosynthetic bacteria, we have chosen R. capsulata as a model organism for the study of the photosynthetic membrane.

*Current address: Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724.

Several genetic techniques have been employed in the past ten years in the characterization of photosynthetic mutants and in the localization of the genes involved in the differentiation of the photosynthetic apparatus. Initially, the gene-transfer agent (GTA) was isolated from R. capsulata (Solioz et al., 1975; Yen et al., 1979) and was used to show that mutations in carotenoid and bacteriochlorophyll biosynthesis were linked (Scolnik et al., 1980). Cotransduction of these markers was used to generate a genetic map, and mutations were correlated with intermediates in bacteriochlorophyll and carotenoid biosynthesis. Later these genes were mobilized for conjugation using a broad host range conjugative plasmid. R-prime plasmids were obtained carrying the photosynthetic gene cluster (Marrs, 1981). The 46 kb photosynthetic gene insert on the R-prime plasmid pRPS404 was shown to complement all known mutations in carotenoid and bacteriochlorophyll biosynthesis. Subsequently, the R-prime plasmid was shown to complement enhanced fluorescence mutants that lack reaction-center and B870 structural polypeptides. Marker-rescue experiments were used to localize further these structural polypeptides to two restriction fragments carried on the R-prime subclones: pRPSEB2 and pRPSE2 (Youvan et al., 1983; Taylor et al., 1983). In a preliminary report on the progress of nucleotide sequencing (Youvan et al., 1984) we utilized fragmentary polypeptide data to position the structural genes for B870α and B870β and the reaction-center L and M subunits within the nucleotide sequence of the Bam HI C-Eco RI B restriction fragment, and to position the reaction-center H subunit on the Bam HI F fragment (pRPSEB2 and pRPSE2 subclones, respectively). We proposed that the B870α, B870β, and reaction-center L and M subunits are part of one operon and that the reaction-center H subunit is on a separate operon some 35 kb away, separated by the pigment biosynthetic genes. Hybridization of anoxygenically induced mRNA suggests that there is an oxygen-regulated promoter upstream from the B870 genes (Clark et al., 1984).

Results

Gene Organization

Two restriction fragments from the photosynthetic insert of the R-prime plasmid, pRPS404, complement all known enhanced fluorescence mutants of R. capsulata. Marker rescue of the enhanced fluorescence mutants was used to map the reaction-center and B870 structural genes. Here we report the complete nucleotide sequence of the Bam HI C-Eco RI B fragment and the Bam HI F fragment, which are known to carry the rxcA and rxcB loci, respectively. In a previous communication of these data we have reported the positions of the nucleotide sequences coding for the B870β and B870α polypeptides from the light-harvesting I antenna and the positions of the reaction-center L, M, and H subunits. A complete map of the Bam HI C-Eco RI B and Bam HI F fragments is shown in Figure 1, and the complete nucleotide sequence is given in Figures 2 and 3.

Figure 1. Photosynthetic Gene Cluster and Organization of Sequenced Regions

Eco RI and Bam HI restriction sites are shown within the 46 kb photosynthetic gene cluster from R. capsulata carried by the R-prime plasmid pRPS404. The Bam HI C-Eco RI B fragment is shown below on an expanded scale. The reversed Bam HI F fragment is shown above. Solid arrows indicate the coding regions for known structural polypeptides: reaction-center L, M, and H subunits and B870 α and β subunits. Open arrows indicate putative protein-coding sequences.

Notable features of the gene organization include the contiguous arrangement of the B870β, B870α, L subunit, and M subunit structural genes and the separation of this cluster from the H subunit structural gene. We believe that the β-α-L-M gene cluster is polycistronic and transcribed from an oxygen-repressed promoter near the Eco RI Q-Eco RI B junction. This interpretation is consistent with the previous observation (Clark et al., 1984) of anoxygenically induced mRNA, which hybridizes strongly to the first two Sal I fragments in the Bam HI C-Eco RI B fragment. The H subunit structural gene is in a different operon since it is separated from the β-α-L-M cluster by 35 kb and is transcribed in the opposite direction.

Searches for E. coli-like consensus RNA promoters have failed over the entire 9 kb of sequenced DNA. However, two strong hairpins (Figure 4) have been detected by computer analysis, which lie in potentially important positions for regulation of the photosynthetic gene cluster. The α-L hairpin (-46.8 kcal), located in the region between the B870 structural genes and the L and M structural genes, may be involved in regulating the stoichiometry between the B870 complex and the reaction center. The two gene products preceding the hairpin, B870β and B870α, are present in the membrane in a 1:1 ratio. The two gene products following the hairpin, the L and M subunits, are also present in the membrane in a 1:1 ratio (Nieth et al., 1975). The B870 complex and the RC are present in a 10-20:1 ratio under varying growth conditions (Drews et al., 1984). The RNA hairpin may serve as a termination signal (Farnham and Platt, 1981) and limit the amount of transcriptional read-through into the L and M structural genes. This hairpin is the correct size to serve as an attenuator or terminator for E. coli RNA polymerase according to the bridged RNA-DNA intermediate model of Gamper and Hearst (1982). The size of the unwound region of DNA in the R. capsulata RNA polymerase complex has yet to be measured. A second hairpin is located in the region between the H structural gene and the start of the F3981 putative protein (-34.0 kcal). The transcriptional significance of this structure is difficult to ascertain without knowing the identity of the F3981 protein. Further experiments, including S1 mapping and in vitro transcription-translation, are necessary to achieve a better understanding of the function of these secondary structures.

Figure 2. Nucleotide Sequence and Deduced Polypeptide Sequences for the Bam HI C-Eco RI B Restriction Fragment from pRPS404

The nucleotide sequence begins at the Eco RI site at the border of the Eco RI Q fragment and proceeds through the structural genes for B870β, B870α, reaction-center L subunit, reaction-center M subunit, and through genes for putative polypeptides C2397 and C2814. The sequence ends at the Bam HI site bordering the Bam HI B fragment. Shine-Dalgarno sequences and all restriction sites given in Figure 1 are underlined.

Figure 3. The nucleotide sequence begins at the Bam HI site at the junction with the Bam HI D fragment and proceeds through the genes for putative polypeptides F108, F460, F1025, and F1696 and through the structural gene for the H subunit of the photosynthetic reaction center, and finally the beginning of the gene for putative polypeptide F3981. The sequence ends at the Bam HI site on the border of the Bam HI K fragment. Shine-Dalgarno sequences and all restriction sites given in Figure 1 are underlined.

Putative Peptides

Putative peptides in the Bam HI C-Eco RI B fragment and in the Bam HI F fragment are also shown in the gene organization (Figure 1). These peptides were identified by the criterion that they possess a Shine-Dalgarno ribosome binding site, (A/C)GGAG(A/G)N₃₋₁₀ATG, which is complementary to the 3' terminus of R. capsulata 16S rRNA, prior to the open reading frame (Youvan et al., 1984). This sequence, with a variable number (three to ten) of unspecified residues preceding the ATG start codon, is a sufficiently stringent search criterion to obtain only nine sites in the 9 kb of sequenced DNA. Five of these nine sites occur at the beginning of the structural genes for B870β and B870α, and the L, M, and H subunits of the reaction center. The other four sites precede open reading frames, which, we believe, strongly indicates the existence of other protein-coding regions in the sequenced DNA. The putative proteins have been named by their start site in the DNA, beginning with the first nucleotide corresponding to the ATG start codon. This number is preceded by a C or an F, depending on whether the protein is encoded by the Bam HI C-Eco RI B fragment or the Bam HI F fragment, respectively. Putative proteins C2397, C2814, F108, and F3981 are obtained, in addition to the five structural protein encoding sequences, for the (A/C)GGAG(A/G)N₃₋₁₀ATG search. These nine protein-coding sequences cover 85% of the Bam HI C-Eco RI B fragment but only about 25% of the Bam HI F fragment.

Figure 4. Hairpins

Two stable hairpins that may be involved in the regulation of the photosynthetic gene cluster are displayed: (a) the α-L hairpin, which is located between the structural genes for the B870α polypeptide and the L subunit of the reaction center; (b) the H hairpin, which is immediately after the stop codon of the reaction-center H subunit. The free-energy values used were tabulated by Salser (1977) and modified by Cech et al. (1983).

Our analysis has failed to detect any protein-coding sequence in the Bam HI F fragment in the region extending from 338 to 3145. However, the F1696 protein constitutes the largest open reading frame on the Bam HI F fragment and possesses the core GGAG Shine-Dalgarno sequence preceding the ATG start codon. An analysis of 74 sequences recognized by E. coli ribosomes indicates that RNA-RNA interactions frequently involve the GGAG sequence (Steitz, 1980).

Two more large open reading frames that follow sequences with some complementarity to the 3' terminus of 16S rRNA preceding an ATG start codon cover this region and encode two more putative proteins: F460 and F1025. That this arrangement is correct is supported by the unlikely overlap of the F1025 termination codon with the F1696 start codon, the near overlap of the F460 termination codon with the F1025 start codon, and by analysis of codon usage (see the next section).

Other polypeptides are contained within and are possible alternatives to some of the putative proteins. For example, F904 is internal to F460, F1166 is internal to F1025, and C3462 and C4197 are internal to C2814. In terms of discriminating between alternatives, we favor the longer reading frames since the criterion of length would have given the correct answer in cases where there are additional Shine-Dalgarno sequences that precede internal methionines within the known reaction-center gene sequences (e.g., C785 within the L subunit). These polypeptide coding sequences may be the consequence of random sequences resembling the Shine-Dalgarno sequence, which happen to precede internal methionines. Alternatively, the Shine-Dalgarno sequences may be the evolutionary remnants of ancestral gene fusions.

Polypeptide   Start   Stop   Residues   Mol. wt.   Mean hydropathy
B870β           148    297         49       5466            +0.476
B870α           311    487         58       6594            +0.937
L subunit       620   1468        282      31565            +0.525
M subunit      1461   2384        307      34440            +0.372
C2397          2397   2633         78       8568            +0.223
C2814          2814   4739        641      67939            -0.017
F108            108    338         76       8090            -0.594
F460            460   1023        187      20666            -0.136
F1025          1025   1699        224      25223            -0.210
F1696          1696   3129        477      50369            +0.767
H subunit      3145   3909        254      28534            -0.387
F3981          3981      -        114          -                 -

Figure 5. Summary of the Deduced Polypeptides

Sequence positions for the start and stop codons (first and last nucleotides in codon, respectively), total lengths in amino acids (including N-terminal methionine), calculated molecular weights, and mean hydropathies are given for each deduced polypeptide sequence.
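The residue counts in Figure 5 can be cross-checked against the start and stop positions, since the caption defines the start as the first nucleotide of the start codon and the stop as the last nucleotide of the stop codon. The following sketch is an editorial consistency check, not part of the original analysis; the names `FIGURE5_ROWS` and `residues_from_span` are hypothetical, and only a subset of rows is included.

```python
# (start, stop, residues) as listed in Figure 5 for a subset of rows.
FIGURE5_ROWS = {
    "B870 beta": (148, 297, 49),
    "B870 alpha": (311, 487, 58),
    "C2397": (2397, 2633, 78),
    "C2814": (2814, 4739, 641),
    "F108": (108, 338, 76),
    "F460": (460, 1023, 187),
    "F1025": (1025, 1699, 224),
    "F1696": (1696, 3129, 477),
    "H subunit": (3145, 3909, 254),
}


def residues_from_span(start, stop):
    """Number of amino acids encoded between start and stop (inclusive).

    The span covers the start codon through the stop codon, so the stop
    codon is subtracted from the codon count.
    """
    span = stop - start + 1
    if span % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return span // 3 - 1


# Rows whose tabulated length disagrees with the positions (expected: none).
mismatches = {
    name: row
    for name, row in FIGURE5_ROWS.items()
    if residues_from_span(row[0], row[1]) != row[2]
}
```

The same arithmetic explains the codon totals quoted in the codon-usage analysis, which count one stop codon per gene in addition to the residues.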

Only two polypeptides were found encoded in the reverse direction (complementary strand) on the two fragments: C2829 is 29 residues long; F1697 is 19 residues long. The codon usage of these two polypeptides is quite different from that of the other proteins: some of the forbidden codons are used (see Codon Usage). Based on their unusual codon usage, small size, and reverse orientation, these peptides are not considered further.

Analysis of Deduced Peptide Sequences

Figure 5 summarizes the salient features of the polypeptides found in the 9 kb of sequenced DNA from pRPS404. These include the structural polypeptides from the photosynthetic reaction center and the B870 antenna and six putative peptides of unknown function. The L and M subunits of the reaction center and the B870β and B870α polypeptides are very hydrophobic, as indicated by positive hydropathy values (see the next section). The calculated molecular weights of the polypeptides are anomalous when compared to the apparent molecular weights determined by SDS-PAGE. B870β and B870α have been referred to as the 8 kd and 12 kd LHI polypeptides (Drews, 1978). The molecular weights calculated from their deduced polypeptide sequences are actually 5,466 and 6,594 daltons, respectively. This anomaly was previously observed by protein sequencing (Tadros et al., 1984). The deduced molecular weights of the reaction-center L and M subunits are 31,565 and 34,440 daltons. These are much higher than their apparent molecular weights from SDS-PAGE, which are 21,000 and 24,000 daltons, respectively. The H subunit of the reaction center, which is not hydrophobic, has a deduced molecular weight of 28,534 daltons. This is very close to the apparent molecular weight from SDS-PAGE, which is 28,000 daltons. It is probable that anomalous binding of SDS by the L and M subunits makes them appear to be smaller than the H subunit by gel electrophoresis when, in fact, the contrary is true.

To date, all mutations that have been mapped on the R-prime plasmid, pRPS404, are involved in the light reactions of photosynthesis. This suggests an associated role in photosynthesis for the putative peptides C2397, C2814, F108, F460, F1025, and F1696. C2397 may be part of the β-α-L-M operon since it begins only 13 nucleotides after the M subunit termination codon and because anoxygenically induced mRNA hybridizes to the Sal I restriction fragment beginning at sequence position 2328 (Clark et al., 1984). The average hydropathy of C2397 indicates that it may interact with membranes. C2814 has a deduced Mr of 67,939 and a mean hydropathy index that indicates it is probably not membrane-bound. F108, F460, and F1025 are all hydrophilic, whereas F1696 (deduced Mr 50,369) is extremely hydrophobic (average hydropathy of +0.767). F1696 is more hydrophobic than the most hydrophobic reaction-center subunit. This strongly implies that F1696 is a large intrinsic membrane protein. Since it is contiguous to the H subunit, it is possible that the two proteins are in the same operon.

Figure 6. Hydropathy Plots for the Reaction-Center Subunits

Calculated hydropathy values (Kyte and Doolittle, 1982) for the reaction-center L, M, and H subunits using a window of 19 amino acid residues. Y values extend from -2.25 to +2.25 for each box. The solid line is drawn through Y = 0, and the dotted line is drawn through the mean hydropathy value.

Figure 7. Total Codon Usage

Codon usage compiled from a total of 2644 codons including the B870 polypeptides, reaction-center subunits, and all putative polypeptides. Values given correspond to the total number of occurrences of each triplet.

The most obvious structural feature that can be seen in the deduced L and M peptide sequences is the grouping of hydrophobic residues forming extremely hydrophobic segments. This feature is best displayed by hydropathy plots where the individual hydropathy values for each amino acid are averaged over a variable window (Kyte and Doolittle, 1982). This window is set to a span of 19 residues for putative membrane-spanning proteins since this gives the best results for known membrane-spanning proteins such as bacteriorhodopsin. Physically, 19 residues in α-helical conformation would be the correct length to span the membrane once. Figure 6 shows hydropathy plots for the L, M, and H subunits of the reaction center. The Y axis measures average hydropathy, with a possible range of -4.5 to +4.5. The negative values are assigned to hydrophilic residues (-4.5 for arginine), which increase with hydrophobicity to +4.5 for isoleucine. Oscillations in the hydropathy plot are indicative of possible multiple membrane crossings. Hydropathy plots for the L and M subunits of the reaction center suggest that they span the membrane five times. The L and M hydropathy plots are remarkably similar to each other, which can be seen by shifting the L subunit plot approximately 25 residues to the right. A similarity in the hydropathy plots of the L subunit and the QB protein has also been noted. These similarities are a result of extensive homology between the proteins (see Discussion). The reaction-center H subunit has a lower average hydropathy with only one segment very near the amino terminus (maximum hydropathy = 1.6) with hydropathy comparable to the L and M transmembrane segments. Based on the plot, we would predict a membrane-bound amino terminus for the H subunit that anchors the polypeptide to the membrane.
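The windowed averaging behind a hydropathy plot can be sketched in a few lines. The example below uses the published Kyte and Doolittle (1982) scale and a 19-residue window as described in the text; the function names and the toy sequence are assumptions made for illustration and do not reproduce the program or the sequences used for Figure 6.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average hydropathy over a whole sequence (hypothetical helper)."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    if len(seq) < window:
        return []
    return [
        mean_hydropathy(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]


# Invented toy sequence: a 19-residue isoleucine stretch flanked by arginines,
# standing in for one hydrophobic segment between hydrophilic regions.
toy = "R" * 10 + "I" * 19 + "R" * 10
profile = hydropathy_profile(toy)
peak_index = profile.index(max(profile))
```

In a real profile, peaks that stay well above zero for roughly one window length are the candidates for membrane-spanning segments; the threshold used to call a crossing is a judgment that this sketch does not attempt to encode.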

Codon Usage

The frequencies of each triplet in the genetic code for 11 proteins on the Bam HI C-Eco RI B and Bam HI F fragments have been tabulated (Figure 7). First, we considered only the five known structural sequences: B870β, B870α, L, M, and H subunits, which total 955 codons. Second, another 1689 codons were considered (a grand total of 2644 codons) by including the putative proteins: C2397, C2814, F108, F460, F1025, and F1696. We found that the inclusion of the putative protein codons in the codon-usage calculation did not significantly change the values previously calculated using only the known structural proteins. Of 64 codons, 50 changed by 10% or less, 11 changed by 11%-15%, and three changed by greater than 15%. Comparing the two sets of percentages (955 codons vs. 2644 codons, respectively), histidine CAT increased from 19% to 48%. Correspondingly, CAC decreased from 81% to 52%, and arginine CGT decreased from 33% to 15%. We believe that the larger sampling (2644 codons) gives a statistically more significant basis to the codon-usage calculation. Certain codons are forbidden in our sampling: leucine TTA and CTA, valine GTA, cysteine TGT, arginine AGA and AGG, and stop TAG. Twelve other codons are used at a frequency less than or equal to 3%. The fact that the codon-usage percentages are largely unaffected by the inclusion of the putative protein data and that certain codons remain forbidden or rare is further evidence for the correct deduction of the putative proteins.

When possible, the bias in the genetic code favors C or G in the third position of the codon. Considering the 955 known codons from the structural genes, G or C is used in the third position of the codon 84% of the time. Considering all 2644 codons, the frequency of G or C usage in the third position is unchanged (84%). Sixteen out of 19 forbidden or rare codons end in A or T. This strongly suggests that the GC richness (65%) of the DNA sequence is the cause of the codon bias.
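The codon tabulation and the third-position G or C frequency discussed here amount to simple counting over in-frame triplets. The sketch below shows one way to do it; the helper names and the two toy reading frames are invented for illustration, and the real input would be the coding sequences from Figures 2 and 3. Stop codons are counted along with sense codons, which matches the totals of 955 and 2644 quoted in the text.

```python
from collections import Counter


def codon_counts(orf):
    """Count in-frame triplets of one reading frame (start through stop)."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("reading frame is not a whole number of codons")
    return Counter(orf[i:i + 3] for i in range(0, len(orf), 3))


def gc3_fraction(counts):
    """Fraction of codons whose third base is G or C."""
    total = sum(counts.values())
    gc3 = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc3 / total


# Invented toy reading frames, not taken from the paper.
toy_orfs = ["ATGGCCCTGAAATAA", "ATGCACCGCTGA"]
total = Counter()
for orf in toy_orfs:
    total.update(codon_counts(orf))

toy_gc3 = gc3_fraction(total)
```

Pooling the counters from the structural genes first and then adding the putative genes reproduces the two-stage comparison described in the text.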

Discussion

Homology

A comparison of the R. capsulata M subunit sequence and the R. sphaeroides M subunit sequence (Williams et al., 1983) is shown in Figure 8. The entire polypeptide sequence is highly conserved (76.5% homology). A region of 157 contiguous residues covering 51% of the M subunit shows a homology of 92.4% (R. capsulata residues 139-296 and R. sphaeroides residues 141-298). The possibility of a frame-shift error in the determination of the deduced peptide sequence in either organism's M subunit is very unlikely in view of this homology. This conservation of amino acid sequence indicates that little divergence has taken place between the M subunit in the two organisms. Overall interspecies homology between L subunits or H subunits cannot be evaluated since sequence data are currently available for only the amino termini of the R. sphaeroides proteins. We have previously shown (Youvan et al., 1984) that the amino-terminal sequences are conserved between the two organisms.
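The homology percentages quoted in this section are fractions of perfectly matching residues. As a hedged illustration, the sketch below counts matches between two aligned sequences and divides by the ungapped length of the first one; the choice of denominator is an assumption, although dividing the 235 matches reported for Figure 8 by the 307 residues of the R. capsulata M subunit does reproduce the stated 76.5%. The function names and toy alignment are hypothetical.

```python
def percent_from_matches(matches, length):
    """Percentage of matching residues relative to a reference length."""
    return 100.0 * matches / length


def percent_identity(ref, other, gap="-"):
    """Percent identity of two equal-length aligned sequences.

    Matches are positions where both sequences carry the same residue;
    the denominator is the ungapped length of `ref` (an assumption).
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(ref, other) if a == b and a != gap)
    ref_length = sum(1 for a in ref if a != gap)
    return percent_from_matches(matches, ref_length)


# Invented toy alignment: 4 matches over 5 reference residues.
toy_identity = percent_identity("MAEY-Q", "MTEYNQ")

# Figures quoted in the text: 235 matches, M subunit of 307 residues.
m_subunit_identity = round(percent_from_matches(235, 307), 1)
```

Other conventions, such as dividing by the alignment length or by the shorter sequence, give slightly different percentages for the same alignment.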

The protein sequences of the R. capsulata M subunit, L subunit, and the 32 kd thylakoid membrane protein (QB protein) from spinach (Zurawski et al., 1982) were searched for regions of homology. The results of a computer analysis by Doolittle (1981) are shown in Figure 9. The data indicate that the L and M subunits and the QB protein may share a common precursor. Using the R. capsulata L subunit for reference, the overall homology in terms of the number of perfect matches of amino acid residues is 30.5% with the M subunit, 25.2% with the QB protein, and 18.9% with the H subunit. Since the homologies extend over the length of the peptides, we believe that this is strong evidence that all peptides have been correctly deduced in the proper reading frame. An especially strong region of homology is found in the M sequence (residues 196-221) and the QB protein, which has also been noted in the case of R. sphaeroides (Williams et al., 1983). One model proposes that this sequence of amino acids is involved in quinone binding and function (Hearst and Sauer, 1984).

Figure 8. M Subunit Homology

Deduced polypeptide sequences are compared and aligned for maximum homology between the reaction-center M subunits from R. capsulata and R. sphaeroides (Williams et al., 1983). Stars above the sequence indicate a match. R. capsulata M subunit : R. sphaeroides M subunit, 235 matches (76.5%).

Several possibilities exist for the identity of the six puta- tive polypeptides. Because of their proximity to reaction- subunit structural genes, C2397 and F1696 are especially interesting. C2397 is moderately hydrophilic and similar in size to a cytochrome, while F1696 is extremely hydropho- bic and very large (50,000 daltons). F1696 is most likely a large intrinsic membrane protein. The close proximity of C2397 and F1696 to the reaction-center structural genes may suggest that the polypeptides are functionally related and in close proximity within the membrane. One possibility is that F1696 and C2397 proteins are part of the Reiske cytochrome bc oxidoreductase complex, which is involved in the Q cycle. Although highly speculative, this is a working model to be considered in mutagenesis experiments or in vitro transcription-translation experiments aimed at identi- fication. If the putative peptides have a role in both respi- ration and photosynthesis, then mutagenesis may be le- thal. Alternatively, these proteins may be involved in regu- lation of the reaction-center and B870 antenna structural polypeptides. Experimentally, an anoxygenicaily induced, pigment-binding protein has been observed in chromato- phore membranes from R. capsulata at an apparent mo- lecular weight of 45,000 daltons (Schumacher and Drews, 1978). The 50,000 dalton F1696 protein may be expected to electrophorese anomalously fast at this apparent molec- ular weight due to excessive binding of SDS, as observed


[Figure 9: three-way alignment of the deduced amino acid sequences; the aligned sequences are not legible in this copy. L subunit (282), QB protein (353), M subunit (307). Matches: L:M 89 (31.6%); L:QB 69 (24.5%); M:QB 66 (21.5%).]

Figure 9. L, M, QB Homology

Deduced polypeptide sequences are compared and aligned for maximum homology among the reaction-center L and M subunits and the QB protein from photosystem II of spinach. Homology between two proteins is indicated by boldface type; homology among all three proteins is indicated by boldface type plus a darkened box directly above the sequence.

for the other hydrophobic membrane proteins: reaction-center L and M subunits.

Experimental Procedures

The nucleotide sequence of the Bam HI F and Bam HI C-Eco RI B fragments from the photosynthetic gene cluster on the R-prime plasmid pRPS404 was determined by the shotgun dideoxy method (Sanger et al., 1977) using a 15 nucleotide synthetic primer (P-L Biochemicals). Taq I, Sau 3A, and Hpa II fragments from pRPSEB2 and pRPSE2 were cloned into M13mp8 and M13mp9 (Messing et al., 1981). Double digests using combinations of Eco RI, Pst I, Bam HI, Bgl II, and Sal I were used to obtain larger fragments, which were cloned into M13mp8 and M13mp9. The shotgun dideoxy method was also supplemented by the reverse-primer sequencing method of Hong (1981) using a 17 nucleotide primer (P-L Biochemicals). The sequence data were organized and analyzed using the programs of Pustell and Kafatos (1982).

A consensus sequence was obtained by comparing the results from an average of six individual sequence determinations of each region of the two fragments presented here. A list of the clones used was supplied to the reviewers and is available from the authors upon request. Both strands of the Bam HI F fragment were sequenced on 99.5% of the fragment. Both strands of the Bam HI C-Eco RI B fragment were sequenced on 94.4% of the fragment.
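The idea of deriving a consensus from several overlapping sequence determinations can be sketched as a column-wise majority vote. This is an illustration only: the sequence data in this work were organized with the programs of Pustell and Kafatos (1982), and nothing below reproduces those programs. The sketch assumes that the reads have already been placed in a common coordinate frame, with `-` marking positions a read does not cover. The choice of `N` for uncovered or tied columns and the function names are assumptions made for the example.

```python
from collections import Counter


def consensus(reads, gap="-"):
    """Majority-vote consensus over reads padded to a common frame.

    Positions a read does not cover are marked with `gap`. Columns with
    no coverage, or with a tie for the most frequent base, are reported
    as 'N' (an assumed convention).
    """
    if not reads:
        return ""
    width = len(reads[0])
    if any(len(r) != width for r in reads):
        raise ValueError("all reads must be padded to the same length")
    out = []
    for column in zip(*reads):
        counts = Counter(base for base in column if base != gap)
        ranked = counts.most_common(2)
        if not ranked:
            out.append("N")  # no read covers this position
        elif len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            out.append("N")  # ambiguous: two bases equally supported
        else:
            out.append(ranked[0][0])
    return "".join(out)


def mean_coverage(reads, gap="-"):
    """Average number of reads covering each position."""
    covered = sum(1 for r in reads for base in r if base != gap)
    return covered / len(reads[0])


# Hypothetical reads; the third read disagrees with the others at one position.
reads = [
    "ATGGCA----",
    "ATGGCATT--",
    "--GGTATTCG",
    "----CATTCG",
]
result = consensus(reads)
depth = mean_coverage(reads)
```

With an average of about six determinations per region, as reported above, a single discordant read is outvoted at almost every position. Columns that come out as `N` would be the ones to resolve by further sequencing.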

Acknowledgments

We thank Dr. Barry Marrs for his initial help in the conceptualization and planning of experiments aimed at the isolation of reaction-center genes and for his gift of the R-prime plasmid and subclones. We thank Mr. John Hubbard for assistance in computer analysis of the sequence data, and Dr. Russell Doolittle for his assistance in polypeptide homology analysis. We thank Drs. Zuber, Tadros, and Drews for communication of B870 polypeptide sequence data prior to publication. D. C. Y. would like to thank Chris Keller at Cold Spring Harbor Laboratory for assistance in computer programming, and Exxon Research and Engineering Liaison for support during manuscript preparation. This work was supported by the Office of Energy Research, Office of Basic Energy Sciences, Biological Energy Research Division of the U.S. Department of Energy, under Contract DE-AC03-76SF0098, and by the National Institutes of Health, under Grant GM30786.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received February 17, 1984; revised April 13, 1984

References

Cech, T. R., Tanner, N. K., Tinoco, I., Weir, B. R., Zuker, M., and Perlman, P. (1983). Secondary structure of the Tetrahymena ribosomal RNA intervening sequence: structural homology with fungal mitochondrial intervening sequences. Proc. Nat. Acad. Sci. USA 80, 3903-3907.

Clark, W. G., Davidson, E., and Marrs, B. L. (1984). Variation of levels of mRNA coding for antenna and reaction center polypeptides in Rps. capsulata in response to changes in O2 concentration. J. Bacteriol. 157, in press.

Doolittle, R. F. (1981). Similar amino acid sequences: chance or common ancestry? Science 214, 149-159.

Drews, G. (1978). The bacterial photosynthetic apparatus. Curr. Topics Bioenerget. 8, 161-207.

Drews, G., Peters, J., and Dierstein, R. (1984). Molecular organization and biosynthesis of pigment-protein complexes of Rhodopseudomonas capsulata. Ann. Microbiol. 134B, 151-158.

Farnham, P. J., and Platt, T. (1981). Rho-independent termination: dyad symmetry in DNA causes RNA polymerase to pause during transcription in vitro. Nucl. Acids Res. 9, 563-577.

Gamper, H. B., and Hearst, J. E. (1982). A topological model for transcription based on unwinding angle analysis of E. coli RNA polymerase binary, initiation, and ternary complexes. Cell 29, 81-90.

Hearst, J., and Sauer, K. (1984). Protein sequence homologies between portions of the L and M subunits of reaction centers of Rhodopseudomonas capsulata and the QB-protein of chloroplast thylakoid membranes: a proposed relation to quinone-binding sites. Zeitschrift für Naturforschung, in press.

Hong, G. R. (1981). A method for sequencing single-stranded cloned DNA in both directions. Bioscience Reports 1, 243-252.

Kyte, J., and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.


Marrs, B. L. (1981). Mobilization of the genes for photosynthesis from Rhodopseudomonas capsulata by a promiscuous plasmid. J. Bacteriol. 146, 1003-1012.

Messing, J., Crea, R., and Seeburg, P. H. (1981). A system for shotgun DNA sequencing. Nucl. Acids Res. 9, 309-321.

Neith, D. F., Drews, G., and Feick, R. (1975). Photochemical reaction centers from Rhodopseudomonas capsulata. Arch. Microbiol. 105, 43-45.

Pustell, J., and Kafatos, F. C. (1982). A high speed, high capacity homology matrix: zooming through SV40 and polyoma. Nucl. Acids Res. 10, 4765-4782.

Ryan, T., and Chamberlain, M. J. (1983). Transcription analyses with heteroduplex trp attenuator templates indicate that the transcript stem and loop structure serves as the termination signal. J. Biol. Chem. 258, 4690-4693.

Salser, W. (1977). Globin mRNA sequences: analysis of base pairing and evolutionary implications. Cold Spring Harbor Symp. Quant. Biol. 42, 985-1002.

Sanger, F., Nicklen, S., and Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Nat. Acad. Sci. USA 74, 5463-5467.

Schumacher, A., and Drews, G. (1978). The formation of the bacteriochlorophyll-protein complexes of the photosynthetic apparatus of Rhodopseudomonas capsulata during early stages of development. Biochim. Biophys. Acta 501, 183-194.

Scolnik, P. A., Walker, M. A., and Marrs, B. L. (1980). Biosynthesis of carotenoids derived from neurosporene in Rhodopseudomonas capsulata. J. Biol. Chem. 255, 2427-2432.

Solioz, M., Yen, H.-C., and Marrs, B. L. (1975). Release and uptake of gene transfer agent by Rhodopseudomonas capsulata. J. Bacteriol. 123, 651-657.

Steitz, J. (1980). RNA-RNA interactions during polypeptide chain initiation. In Ribosomes: Structure, Function and Genetics, G. Chambliss et al., eds. (Baltimore: University Park Press) pp. 479-495.

Tadros, M. H., Suter, F., Seydewitz, H. H., Witt, I., Zuber, H., and Drews, G. (1984). Isolation and complete amino-acid sequence of the small polypeptide from light-harvesting pigment-protein complex I (B870) of Rhodopseudomonas capsulata. Eur. J. Biochem. 138, 209-212.

Taylor, D. P., Cohen, S. N., Clark, W. G., and Marrs, B. L. (1983). Alignment of genetic and restriction maps of the photosynthesis region of the Rhodopseudomonas capsulata chromosome by a conjugation-mediated marker rescue technique. J. Bacteriol. 154, 580-590.

Williams, J. C., Steiner, L. A., Ogden, R. C., Simon, M. L., and Feher, G. (1983). Primary structure of the M subunit of the reaction center from Rhodopseudomonas sphaeroides. Proc. Nat. Acad. Sci. USA 80, 6505-6509.

Yen, H. C., Hu, N. T., and Marrs, B. L. (1979). Characterization of the gene transfer agent made by an overproducer mutant of Rhodopseudomonas capsulata. J. Mol. Biol. 131, 157-168.

Youvan, D. C., Hearst, J. E., and Marrs, B. L. (1983). Isolation and characterization of enhanced fluorescence mutants of Rhodopseudomonas capsulata. J. Bacteriol. 154, 748-755.

Youvan, D. C., Alberti, M., Begusch, H., Bylina, E. J., and Hearst, J. E. (1984). Reaction center and light-harvesting I genes from Rhodopseudomonas capsulata. Proc. Nat. Acad. Sci. USA 81, 189-192.

Zurawski, G., Bohnert, H. J., Whitfield, P. R., and Bottomley, W. (1982). Nucleotide sequence of the gene for the Mr 32,000 thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38,950. Proc. Nat. Acad. Sci. USA 79, 7699-7703.