

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1986 by The American Society of Biological Chemists, Inc.

Vol. 261, No. 19, Issue of July 5, pp. 8965-8976, 1986 Printed in U.S.A.

Structure and Complete Nucleotide Sequence of the Chicken α-Smooth Muscle (Aortic) Actin Gene: AN ACTIN GENE WHICH PRODUCES MULTIPLE MESSENGER RNAs*

(Received for publication, February 14, 1986)

Steven L. Carroll‡§, Derk J. Bergsma‡¶, and Robert J. Schwartz‡‖ From the ‡Department of Cell Biology and the ‖Program in Neuroscience, Baylor College of Medicine, Houston, Texas 77030

The α-smooth muscle (aortic) actin gene is a distinct member of the actin multigene family which is expressed in vascular smooth muscle cells. We have determined the complete nucleotide sequence of 11 kilobase pairs of genomic DNA encoding the chicken α-smooth muscle actin gene. This single-copy gene specifies a protein identical in sequence to the major α-actin from bovine aorta. The protein-coding sequences are interrupted by seven introns which are at codons specifying amino acid residues 41/42, 84/85, 121/122, 150, 204, 267, and 327/328. An eighth intron was found in the mRNA 5' untranslated region. The 5' flanking sequences contain elements which are conserved in other chicken muscle actin genes. Additional sequences at the 5' end of the gene may be conserved in at least one human actin gene. We have identified at least four messenger RNAs ranging in size from approximately 1370 to 2700 nucleotides (excluding poly(A) tails) which are transcribed from the α-smooth muscle actin gene. These RNAs differ in the length of their 3' untranslated regions, probably as a result of the utilization of alternative polyadenylation signals. This is the first report of an actin gene with multiple mRNA transcripts.

Actins are highly conserved proteins which are found ubiquitously in eukaryotic cells. Amino acid sequencing data have demonstrated the presence of several distinct actin isotypes in vertebrates which can generally be classified as either “cytoplasmic” or “muscle” actins. Cytoplasmic actins are found in nonmuscle cells, where they are utilized to form the cellular microfilaments which function in cell motility and mitosis (1). The number of cytoplasmic isoforms ranges from at least two in mammals (β and γ (1)) to three or even more in birds¹ and amphibians (2, 3). Muscle actins are essential components of the contractile apparatus of muscle cells and are subdivided into either striated or smooth isoforms, based on the muscle cell type in which they predominate. The striated muscle isoforms may be coexpressed in a tissue under at least some circumstances (4, 5), with α-skeletal muscle actin representing the predominant form in adult skeletal muscle and α-cardiac muscle actin prevailing in adult cardiac tissue (6, 7). The smooth muscle actins appear to be similarly coexpressed (8). In the genital and gastrointestinal tracts, γ-smooth muscle actin predominates, while in vascular tissue, such as the aorta, α-smooth muscle actin is the primary isotype (7, 9, 10).

* This work was supported by United States Public Health Service Grant NS-15050 from the National Institutes of Health and by a grant from the Muscular Dystrophy Association. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§ Supported by National Research Award HD07165-07, the A. W. Mellon Foundation, and the Baylor Medical Scientist Training Program.

¶ Supported by Grant G-381, which was named in honor of Hazel Hickey and awarded by the American Heart Association, Texas Affiliate.

¹ S. L. Carroll, D. J. Bergsma, and R. J. Schwartz, unpublished data.

The isolation of actin genes has revealed that in most species these proteins are encoded by multigene families whose members are differentially regulated temporally (11-17) and spatially (8, 15, 18) during the ontogeny of the organism. The high degree of sequence conservation between actin proteins from a plethora of organisms argues strongly that this multigene family arose by duplication and subsequent divergence from a common ancestral gene. In the course of these events, certain regulatory and structural features of the loci presumably diversified to produce the specialized genes presently extant. Analyses of the structure of these genes are thus essential both for understanding the processes influencing the evolution of this family and for identifying the regulatory elements controlling the expression of individual loci. Toward these ends, several representatives of the vertebrate striated muscle (19-24) and cytoplasmic (3, 25-27) actin gene subfamilies have been structurally characterized. There have been, however, no earlier reports concerning the structure of smooth muscle actin genes, with the exception of an incomplete description of the human α-smooth muscle actin gene (28). Thus, this report represents the first complete description of a smooth muscle actin gene.

Our laboratory recently isolated 6 of a potential 8-10 actin genes present in the chicken genome, one of which was the α-smooth muscle actin gene (29). We have determined the complete nucleotide sequence of this gene, including its flanking regions, and examined the RNA transcripts produced by this gene. Interestingly, this single-copy gene is a template for multiple mRNA species which differ from one another in the lengths of their 3' untranslated regions, probably as a result of the utilization of alternative polyadenylation sites. To the best of our knowledge, this is the first report of an actin gene which produces multiple transcripts. These data, considered in combination with the exon/intron organization and structure of the 5' flanking region of this gene, have important implications both for the regulation of this locus and for the evolutionary history of the actin multigene family.
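The multiple transcripts are attributed to alternative polyadenylation sites. As a rough illustration of how candidate sites can be located in a 3' untranslated sequence, the Python sketch below scans for the canonical AAUAAA hexamer (written AATAAA in DNA). The toy sequence and the function name `find_polya_signals` are hypothetical, and the sketch only finds the hexamer; it ignores the other sequence requirements for cleavage and polyadenylation.

```python
def find_polya_signals(seq: str, motif: str = "AATAAA") -> list[int]:
    """Return 1-based start positions of every match of motif, overlaps included."""
    seq = seq.upper()
    hits: list[int] = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)              # convert 0-based index to 1-based coordinate
        start = seq.find(motif, start + 1)  # advance one base so overlapping hits are kept
    return hits


# Hypothetical 3' untranslated sequence containing two candidate signals.
toy_utr = "GCTTAATAAAGCCTGACCTTGGAATAAATTTGC"
print(find_polya_signals(toy_utr))  # [5, 23]
```

Each hit marks a candidate signal; transcripts ending downstream of different hits would differ only in 3' untranslated length, which is the pattern described for this gene.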

EXPERIMENTAL PROCEDURES

Materials-Restriction endonucleases were purchased from New England Biolabs, Boehringer Mannheim, or Amersham Corp.; Escherichia coli DNA polymerase I large (Klenow) fragment was obtained from Bethesda Research Laboratories or Boehringer Mannheim; avian myeloblastosis virus reverse transcriptase was bought from Life Sciences Associates; other DNA modifying enzymes were purchased from New England Biolabs or Boehringer Mannheim. All enzymes were used according to the manufacturer's recommendations. Human nuclear DNA prepared from peripheral blood leukocytes was a gift from Dr. Anthony G. DiLella (Baylor College of Medicine). [³²P]Radionucleotides were obtained from Amersham Corp. or ICN Radiochemicals. All other chemicals used were of the highest purity available.

Preparation of Plasmid and Viral DNA-Plasmids were grown in E. coli K12 strain RR1. Plasmid DNA was prepared using an alkaline lysis procedure (30) and subsequent purification on cesium chloride-ethidium bromide gradients (31). M13 clones were propagated in E. coli K12 strain JM101 and used to prepare single-stranded template DNA as described (32).

All cloning experiments were carried out under P1 physical and EK1 biological containment precautions as outlined in the National Institutes of Health guidelines for recombinant DNA research (7/29/80).

Strategy for Sequencing the Chicken α-Smooth Muscle Actin Gene-We have previously described the isolation of the α-smooth muscle actin gene from a λCharon 4A chicken genomic library (29). Two EcoRI fragments (7.7 and 3.3 kbp²) encompassing the actin gene coding and flanking sequences were purified from genomic clone λAC3 by preparative agarose gel electrophoresis and cloned into the EcoRI site of pBR322 to produce, respectively, pAC377 and pAC333. Plasmids pAC377 and pAC333 were mapped by single and double restriction enzyme digestions. Appropriate restriction fragments were isolated from low-melting-point agarose gels, cloned into either M13mp18 or mp19 (33), and sequenced by the dideoxy chain termination method (34). Regions which proved refractory to dideoxy sequencing were sequenced by the technique of Maxam and Gilbert (35). DNA sequence data were processed with the aid of an IBM-compatible Sperry microcomputer and the software of Queen and Korn (36).
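The Fig. 1 legend reports that 57.6% of the total sequence was determined from both strands. That number is a coverage fraction: positions read on both strands divided by total length. A minimal Python sketch of the bookkeeping, assuming each sequencing run is recorded as an inclusive interval of gene coordinates on one strand; the interval lists and function names are hypothetical and are not the Queen and Korn software the authors used.

```python
def covered_positions(reads: list[tuple[int, int]]) -> set[int]:
    """Positions (1-based, inclusive) covered by at least one read interval."""
    covered: set[int] = set()
    for start, end in reads:
        # A read may be recorded as (start, end) or (end, start) depending on strand.
        lo, hi = min(start, end), max(start, end)
        covered.update(range(lo, hi + 1))
    return covered


def fraction_both_strands(length: int,
                          forward_reads: list[tuple[int, int]],
                          reverse_reads: list[tuple[int, int]]) -> float:
    """Fraction of a sequence of the given length that was read on both strands."""
    both = covered_positions(forward_reads) & covered_positions(reverse_reads)
    return len(both) / length


# Hypothetical read intervals on a 1,000-bp fragment.
forward = [(1, 300), (250, 600)]
reverse = [(500, 200), (900, 700)]
print(f"{fraction_both_strands(1000, forward, reverse):.1%}")  # 30.1%
```

A set of positions is adequate at the scale of an 11-kbp gene; for much longer sequences an interval-merging approach would use less memory.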

Preparation of Eukaryotic Nucleic Acids-Nuclei were prepared from frozen adult chicken liver as described by Lawson et al. (37) and used to prepare nuclear DNA as described by Chang et al. (29). Total cytoplasmic RNA was isolated from frozen chicken tissues by the method of Schwartz and Rothblum (38). Polyadenylated RNA was isolated from total cytoplasmic RNA by two passages over oligo(dT)-cellulose (39).

Southern Blot Hybridization Analysis of Nuclear DNA-Purified nuclear DNA was digested with either EcoRI or HindIII restriction endonucleases and fractionated by electrophoresis in 0.8% agarose gels. DNA was transferred unidirectionally onto nitrocellulose filters (BA85; Schleicher & Schuell) as described by Southern (40) using the buffer system of Smith and Summers (41). Blots were baked for 2.5 h at 80 °C. Prehybridization of the Southern blots was for 4 h at 68 °C in 6× SSC (1× SSC = 0.15 M NaCl, 0.015 M sodium citrate), 10× Denhardt's buffer (without bovine serum albumin (BSA)), 50 mM sodium phosphate (pH 7.0), 0.5% SDS. Blots were then hybridized to single-stranded ³²P-labeled DNA probes prepared from M13 templates (3) for 16-20 h at 68 °C in hybridization buffer containing 3× SSC, 5× Denhardt's buffer (without BSA), 25 mM sodium phosphate (pH 7.0), 0.25% SDS. Blots were washed four times at 68 °C in 500 ml of either 2× SSC, 0.5% SDS or 1× SSC, 0.5% SDS (for the wash conditions employed for a particular blot, see figure legends), rinsed briefly at room temperature in 1× SSC, air dried, and then exposed to Kodak XAR-5 x-ray film at -70 °C with a Cronex Lightning Plus intensifying screen (DuPont Co.).
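The buffer recipes follow directly from the definition given in the text (1× SSC = 0.15 M NaCl, 0.015 M sodium citrate). A Python sketch of the arithmetic, using exact fractions to avoid rounding: it converts the SSC multiples used here into molarities and confirms that each component of the hybridization buffer is at half its prehybridization concentration (6× to 3× SSC, 10× to 5× Denhardt's, 50 to 25 mM phosphate, 0.5% to 0.25% SDS). The function and variable names are illustrative only.

```python
from fractions import Fraction

# Composition of 1x SSC as defined in the text (molar).
NACL_1X = Fraction(15, 100)      # 0.15 M NaCl
CITRATE_1X = Fraction(15, 1000)  # 0.015 M sodium citrate


def ssc(multiple: int) -> tuple[Fraction, Fraction]:
    """Return (NaCl, sodium citrate) molarities for an n-fold SSC solution."""
    return NACL_1X * multiple, CITRATE_1X * multiple


# SSC and Denhardt's as fold concentrations, phosphate in mM, SDS in percent.
prehyb = {"SSC": 6, "Denhardt": 10, "phosphate_mM": 50, "SDS_pct": Fraction(1, 2)}
hyb = {"SSC": 3, "Denhardt": 5, "phosphate_mM": 25, "SDS_pct": Fraction(1, 4)}

# Ratio of each hybridization component to its prehybridization value.
ratios = {k: Fraction(hyb[k]) / prehyb[k] for k in prehyb}
print(ratios)

for n in (6, 3, 2, 1):
    nacl, citrate = ssc(n)
    print(f"{n}x SSC: {float(nacl):.3f} M NaCl, {float(citrate):.3f} M citrate")
```

In other words, the hybridization buffer is the prehybridization buffer diluted 1:1, and the 2× and 1× SSC washes correspond to 0.30 and 0.15 M NaCl.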

RNA Blotting and Hybridization-RNA was denatured with glyoxal, fractionated by electrophoresis on 1% agarose gels, and blotted onto Pall nylon transfer membrane (Biodyne A; ICN Radiochemicals) essentially as described by Thomas (42, 43). Blots were baked for 2.5 h at 80 °C. RNA blots were prehybridized for 4 h at 68 °C in 6× SSC, 10× Denhardt's buffer (without BSA), 50 mM sodium phosphate (pH 7.0), 0.5% SDS. Blots were then hybridized to ³²P-labeled single-stranded probes for 16-20 h at 68 °C in hybridization buffer containing 3× SSC, 5× Denhardt's buffer (without BSA), 25 mM sodium phosphate (pH 7.0), 0.25% SDS. Blots were washed four times at 68 °C in 500 ml of 2× SSC, 0.5% SDS, then twice at 68 °C in 500 ml of 1× SSC, 0.5% SDS, air dried, and subsequently exposed to Kodak XAR-5 x-ray film as described above. The autoradiograms were scanned with a Quik-Scan densitometer (Helena Laboratories), and the relative areas under the hybridization band peaks were determined with an Electronic Graphics Calculator (Numonics Corp.).

² The abbreviations used are: kbp, kilobase pairs; bp, base pairs; b, bases; SDS, sodium dodecyl sulfate; BSA, bovine serum albumin.
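In the RNA blot analysis, relative band intensities were obtained by measuring the area under each densitometer peak. A minimal Python sketch of an equivalent calculation, assuming an evenly sampled densitometer trace, a flat background subtracted before integration, and band boundaries chosen by hand. The trace values, windows, and baseline are hypothetical, and the trapezoidal rule stands in for the mechanical area measurement the authors used.

```python
def trapezoid_area(ys: list[float], dx: float = 1.0) -> float:
    """Trapezoidal-rule integral of evenly spaced samples."""
    return sum((ys[i] + ys[i + 1]) * dx / 2 for i in range(len(ys) - 1))


def relative_band_areas(trace: list[float], windows: list[tuple[int, int]],
                        baseline: float = 0.0) -> list[float]:
    """Fraction of the total band signal in each inclusive (start, end) sample window."""
    areas = []
    for start, end in windows:
        # Subtract the background and clip negatives before integrating each peak.
        segment = [max(y - baseline, 0.0) for y in trace[start:end + 1]]
        areas.append(trapezoid_area(segment))
    total = sum(areas)
    return [area / total for area in areas]


# Hypothetical trace: two bands on a flat background of 1.0.
trace = [1, 1, 3, 5, 3, 1, 1, 2, 3, 2, 1, 1]
windows = [(1, 5), (6, 10)]
print([round(f, 3) for f in relative_band_areas(trace, windows, baseline=1.0)])  # [0.667, 0.333]
```

Expressing each area as a fraction of the summed areas makes the comparison independent of exposure time, provided the film response stays in its linear range.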

Primer Extension Analysis-An 865-bp fragment containing the first coding exon (exon II) of the chicken α-smooth muscle actin gene was isolated by preparative agarose gel electrophoresis from an AvaI digestion of pAC377. This fragment was truncated with BstNI, rendered blunt ended with DNA polymerase I (Klenow fragment), and cloned into the SmaI site of M13mp19. M13 clone 7Av3Bs-31 was determined by sequence analysis to be properly oriented for the synthesis of a primer complementary to the α-smooth muscle actin mRNA. Single-stranded 7Av3Bs-31 DNA was used to produce a ³²P-end-labeled primer as described by Bergsma et al. (3), except that DdeI was utilized as the truncating enzyme. This 111-b single-stranded fragment contained 36 b of M13 sequence at its 5' end and 75 b complementary to nucleotides 2389-2463 (encoding amino acids 16-38) of the α-smooth muscle actin gene. The primer was coprecipitated with 200 µg of 10-day chicken embryo poly(A)+ RNA and subsequently hybridized and extended with avian myeloblastosis virus reverse transcriptase as described by Bergsma et al. (3). The extension products were isolated on a denaturing gel (3) and sequenced by the technique of Maxam and Gilbert (35).
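The primer bookkeeping in this paragraph can be checked with inclusive-range arithmetic: nucleotides 2389-2463 span 75 b, and adding the 36-b M13 segment gives the stated 111 b. The Python sketch below also shows the reverse-complement operation that relates an mRNA-sense sequence to a primer that anneals to it. The example sequence and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def span_length(first: int, last: int) -> int:
    """Number of nucleotides in an inclusive coordinate range."""
    return last - first + 1


m13_part = 36                           # M13-derived bases at the primer's 5' end
gene_part = span_length(2389, 2463)     # bases complementary to the actin gene
print(gene_part, m13_part + gene_part)  # 75 111

# Hypothetical mRNA-sense fragment (DNA alphabet) and the strand that would anneal to it.
sense = "ATGAAACCCGGG"
print(reverse_complement(sense))  # CCCGGGTTTCAT
```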

RESULTS

Nucleotide Sequence of the Chicken α-Smooth Muscle Actin Gene-We have previously described the isolation of six different actin genes from a λCharon 4A chicken genomic library (29). In this earlier study, it was established that two actin genomic clones, λAC3 and λAC17, contained an α-smooth muscle actin gene. The 7.7- and 3.3-kbp EcoRI restriction fragments were gel purified from λAC3 and cloned into the EcoRI site of pBR322 to produce, respectively, subclones pAC377 and pAC333. Plasmids pAC377 and pAC333 were sequenced in their entirety as schematically illustrated in Fig. 1. The complete nucleotide sequence of 11,007 bp, which includes the encoded amino acid sequence and adjacent flanking sequences, is shown in Fig. 2.³

Analysis of the Actin-coding Region-The DNA sequences encoding the actin protein were identified by comparing the sequences of pAC377 and pAC333 to the published sequence of the chicken α-skeletal muscle actin gene (19). The exons within pAC377 and pAC333 which were identified in this manner were found to specify a polypeptide composed of 377 amino acid residues. This encoded protein begins with a Met-Cys dipeptide, a feature which has been noted in several invertebrate actin genes (44-46) as well as in all of the vertebrate muscle actin genes analyzed to date (20-24, 28, 47). This dipeptide is apparently removed post-translationally from these actin proteins (21, 47). Comparison of the remainder of the amino acid sequence (denoted as amino acids 1-374 in Fig. 2) deduced from the chicken α-smooth muscle actin gene to that of the major α-actin species from bovine aorta (7) demonstrates that the chicken and bovine proteins are absolutely identical, thereby confirming our initial identification of this gene as an α-smooth muscle actin gene (29).

³ This sequence has been submitted to GenBank, Los Alamos National Laboratory.

[Figure 1 graphic: physical map of genomic clone λAC3 (A) and restriction maps with sequencing strategies for pAC377 (B) and pAC333 (C), scale in bp; mapped sites include AccI, EcoRI, HindIII, KpnI, PstI, PvuII, and StuI. Legend: coding sequence; transcribed untranslated sequence; pBR322 sequence; λCharon 4A sequence.]

FIG. 1. Physical map and sequencing strategy for the chicken α-smooth muscle actin gene. Vector sequences, protein-coding sequences, and sequences encoding the transcribed untranslated regions of mature mRNAs are as denoted. The extreme 3' end of the 3' untranslated region is denoted as a broken box because the exact location of the largest mRNA's 3' end has not been determined. Thin lines represent introns and flanking sequences. Positions of the CCAAT and TATA homologies, the ATG initiation codon, and the TAA termination codon are indicated. A, physical map of genomic clone λAC3. Plasmids pAC377 and pAC333 were constructed by subcloning the indicated fragments into the EcoRI site of pBR322. B and C, mapping and sequencing of pAC377 and pAC333, respectively. In B and C, arrows indicate the direction and distance of sequencing. Whenever possible, sequences were obtained using the indicated mapped restriction sites. Sequences whose origins do not coincide with indicated major restriction sites were obtained by isolating a larger restriction fragment, digesting it with BstNI, DdeI, or Sau96I, and shotgun cloning the resulting fragments into M13mp19. 57.6% of the total sequence was determined from both strands.

Comparison of the amino acid sequence deduced from the chicken α-smooth muscle actin gene to the partial amino acid sequence derived from the human α-smooth muscle actin gene (28) indicates that the chicken and human proteins are also identical in equivalent regions, with the exception of amino acid residue 309, which is valine in the human protein and alanine in both the chicken and bovine polypeptides. It should be noted, however, that this difference may not be found in the normal human protein, since the point mutation causing this substitution may have been introduced by a chemical mutagen (28). The chicken and human genes are somewhat more divergent at the nucleotide level, with the coding regions of the genes being only 85.5% homologous, although again it is possible that some of these differences may be the result of chemical mutagenesis within the human gene.
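The 85.5% homology figure compares the chicken and human coding regions nucleotide by nucleotide. The text does not say how the alignment was scored, so the Python sketch below assumes the simplest case: two gap-free, in-frame coding sequences of equal length, with identity taken as matching positions over aligned positions. It also tallies mismatches by codon position, one way to see whether differences between sequences encoding identical proteins fall at synonymous (mostly third) positions. The example fragments and function names are hypothetical.

```python
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions in two gap-free, equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def mismatches_by_codon_position(a: str, b: str) -> Counter:
    """Count mismatches at codon positions 1, 2 and 3 (both sequences start in frame)."""
    return Counter(i % 3 + 1
                   for i, (x, y) in enumerate(zip(a.upper(), b.upper()))
                   if x != y)


# Hypothetical in-frame fragments: both encode Glu-Glu-Asp-Ser with different codons.
chick = "GAAGAGGACAGC"
other = "GAGGAAGATAGC"
print(percent_identity(chick, other))              # 75.0
print(mismatches_by_codon_position(chick, other))  # Counter({3: 3})
```

Here the two fragments are only 75% identical as DNA yet encode the same four residues, because every difference is at a third codon position.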

The protein-coding sequences of the gene are separated into eight exons by seven intervening sequences of highly variable size. The lengths and locations of the introns and exons are summarized in Table I. A comparison of the intron/exon organization of the human and chicken α-smooth muscle actin genes indicates that comparable structural regions are interrupted identically by intervening sequences. Unfortunately, only a small amount of the intron sequence of the human α-smooth muscle actin gene is available, and, therefore, we are unable to determine whether there are any significant homologies between the introns of the human and chicken genes. We also noted the presence of a simple repetitive sequence ((CA)₂CG(CA)₂₁C) in intron eight (nucleotides 7742-7790). Alternating copolymers of this form are known to be middle repetitive elements which are evolutionarily conserved in a wide variety of eukaryotic genomes, including that of the chicken (48).
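The repeat notation can be checked against its coordinates: (CA)₂CG(CA)₂₁C contains 2 × 2 + 2 + 21 × 2 + 1 = 49 nucleotides, matching the inclusive span 7742-7790. The Python sketch below, with hypothetical helper names, expands the run-length notation and also confirms that the expanded sequence alternates strictly between pyrimidines and purines, consistent with its description as an alternating copolymer.

```python
PURINES = set("AG")


def expand(units: list[tuple[str, int]]) -> str:
    """Expand run-length repeat notation, e.g. [("CA", 2), ("CG", 1)] -> "CACACG"."""
    return "".join(unit * count for unit, count in units)


def alternates_purine_pyrimidine(seq: str) -> bool:
    """True if every adjacent pair joins one purine and one pyrimidine."""
    return all((a in PURINES) != (b in PURINES) for a, b in zip(seq, seq[1:]))


repeat = expand([("CA", 2), ("CG", 1), ("CA", 21), ("C", 1)])
print(len(repeat), 7790 - 7742 + 1)          # 49 49
print(alternates_purine_pyrimidine(repeat))  # True
```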

The 5' Untranslated and Flanking Sequences-For the purpose of identifying the genomic sequences encoding the mRNA 5' untranslated region, an end-labeled primer was annealed to α-smooth muscle actin mRNA, extended to the mRNA cap site, and then sequenced. The single-stranded DNA primer used for extension was complementary to a portion of the first coding exon (exon II) encompassing nucleotides 2395-2463 (amino acids 16-38; see Fig. 3, B and C, for a more complete description).

Sequence analysis of the extension product revealed that the genomic sequences encoding the 5' untranslated region of the α-smooth muscle actin gene were interrupted by a single intervening sequence 2255 bp in length (Fig. 3A). The sequences at the borders of this intron agree with the GT-AG consensus sequence noted at the borders of the introns of several protein-coding genes (49). The extension product sequence is homologous to nucleotides 1-45 and 2301-2343 of Fig. 2. The location of the first exon has been further confirmed by demonstrating that a small PstI restriction fragment containing nucleotides 1-45 is specifically labeled by the fully extended cDNA, but not by the unextended primer (data not shown). The precise nucleotide corresponding to the mRNA cap site could not be accurately determined due to the presence of a few bands of undegraded cDNA at the top of the sequencing gel (50). Based on our estimation of band spacing, we chose the A nucleotide shown in Fig. 2 as the


[Figure 2 graphic: nucleotide sequence of the chicken α-smooth muscle actin gene and its flanking regions, numbered from the presumed mRNA cap site (1) with 5' flanking positions negative; the deduced amino acid sequence, beginning with the initiating Met-Cys dipeptide, is shown beneath the coding exons.]

C A I C A C A A C I T C A T I C I T C T ~ C A ~ I ~ I C ~ A I C I C I ~ A ~ I A A C ~ C A A C A I A ~ A A A C T C C A C A C I C A C T C I C I ~ I C ~ C A T T T

CTATCIIACIACATITAIIICCAAAACICICICIATCACITIICCATCICACAACICICCICTCCCCCAACICITCTCAC

TIICATCI~ICTGICICT~I~ICICICICCAC~CCIACIACAACAICAICTIIAATCICICICIATIAAIACAITCIACA (4940)

ICTAGAI~~~ATACCGAC~CCICA~AAACCAC~AA~~T~ICAIIIAAA~~IAIIIICCIICCIICIIAIIACCAACCAIC 0 0 2 0 )

ACTAAAIACAAAAAICTT~IAII~ICCCII~CCIIICCI~IAA~CAIACAAC~AAACAAAAAICCCIACTCCIAACCIAA (1100)

(SI(l0) 1 1 s I 1 0 I 1 5

110 1 1 s I 1 0

0 7 8 0 )

( 4 8 6 0 )

I I C I C T I A A T I C C A I I I C A C I AI1 ATC I11 GAG A C T T I C M I CIC CCA CCC *IC TAT CIA CCT llt Wet Phe Clu Thr Q h e Amn V a l Pro AI. M e t T Y r V a l AI.

I." I"

AI1 CAA CCT CIT CIC TCC CTG TAT C C I I C 1 CCC C C I ACT ACA C I C I A C C C C T C T C C A G C I C C C 11. Cln AI. V a l Leu Scr L e u T y r AI. Ser CIy AI# T h t I h r C

ACICCICIICAAAACACITCICICICACCACCICACCCICAAICAICICIIAICIICTCAAAC I CC AIT G T C C I T IS0

._I .-_

Iy 11. V.1 L." 11s LbO 165 110

CAC I C T CCG CAI C C 1 C I C ACC CAC AAC CIC CCC &TI TAT CAA C C C I A C C C I TIC CCA CAT Amp 5.r CI, Asp C l y V a l Thr HI. A** V a l P r o 11. I y r Clu C l y T y r AI. L e u ?ro HI.

CCC ATC AIC CCT C I C CAC C 7 C C C I CCC CCT C I C CTG ACA CAC TAC C I C ATC AAC ATC C I G AI. I l C Wet Ar8 Leu Asp Leu AI. G I y A r g Asp LEU Thr A.p I y r L.u N e t Ly. 11. L e u

ACC C A I C C I CCC TAC TCC T I T CIC ACC ACT C / CIAACCCCAICCICCATACCICCACCACCCCACIT

17) 180 18) 190

19s 200

CITTAATCICCAATCTTICIACICCCCIIACAACCAAIACCACICCIICICCACIICIIIICAIIAACAACAAACAAAIC Thr Glu Ara C I y I y r Ser Ph. V a l I h r I h r A

AACACCAAAAIACITTCTCTCCACCAIICCCACACACCAICTCICIAAAAICCACACCAICTICCACAAICAAAAICICC

CTCACICCCAITCIAAAAACICA~ICTICICAACAC 1 C C CAG C C I CAA AI1 C I C C C T GAC A I C LAC

0 6 4 0 )

( 1 1 2 0 ) 2 0 s 2 1 0

(11100) La Glu Arn Clu 11. V a l A r l A m p 11. L Y * 2 1 5 220

GAG AAA C I C IC1 TAT CTC GCC C I C CAC I11 CAA AAC GAG ATC CCC ACT C C I CCC I C 1 TCC Clu LIS Leu Cy. I y r Val AI* I.ru A s p Ph. Clu Asn Clu X l t A I . T h r AI* AI. Ser 5.1

I C C T C I C I C C I A A A A ACC TAT CAC C I I C C I CAI CCC CAC C I C ATC ACC AT1 CCA AAT CAA 5 - 1 Slr L e u Clu Ly. Ser I y r Clu Leu 110 A.p C l y C l n V a l LIe I h r 11. C I y AIn Clu

CCC T I C C G C I C C CCA CAC ACT CTC TIC CAC CCA I C 1 I T C AI1 C 1 C I A C C I I C A A I C A C I A T I C 260 2 6 1

Ar, Ph. Acg C y , Pro Clu Thr Leu Ph. Gln Pro S*r Ph. 11. G

2 2 s 210

214. 2 4 0 1 0 1 s o

2ss

TCCTIICCTAACCICCTCCTAAAIACCACCTCICIAIIICCITCICAIICTAICCCACICCICCCICATIIACAICIAIA

CCCAAGIAICICCCAC~CCCIAIIAIAAAAIAAIC~IICACAACCCA~CACTAIIAAIACACICCCACCAACIIIAICTC

AAACACIAAACCAAAA~CCITICCCAAAAIAAACICC~ACI~IITAIIICI~CCIIICACCCCAICACAT~CICIAIAII

AT~~AT+A~~IACIATCAAAATICIC~CICCACAAACACIIIICTAAAICCTAICACCCIIATACICTCAAICICCIICC

AGCCACGCCIIAAAAAACAACCCAACAAAAA~CAC~ACIITIAI~CI~CI~AAACAACICCCCCAAAICICAAICCACAC

ICCATCACATITCCACA~CACAICAAI~CICAICCACCCAACCIICCACCAACCICACCIAAAACCIACIIICATCCAAI

ATCACCICAIAAATACAICAACIA~CI~CAIACIIA~IIAAACCTAITCC~ACAIAIIIICACIAIII~IIICICIAACC

TCCCICACIITAAAAACCCACAAAAAAAAIA~ACC~AAAAAAAAAAACIAACAACAACAAAACAAAICAITCCACCACCA

TATACCAA~TCTICCCACAIAICACCIICICICCAAAACCCIACCCCAAACICCIACCIACICCAIACCCCICCAITITC

( 6 0 5 0 )

( 6 1 1 0 )

( 6 1 1 0 )

( 6 2 9 0 1

( 6 1 1 0 )

( 6 b 5 0 )

(6SIO)

( 6 b 1 0 )

( 6 6 9 0 )

FIG. 2. Nucleotide sequence of the chicken α-smooth muscle actin gene. Numbers in parentheses below the sequence indicate nucleotide positions relative to the mRNA cap site, which is designated (1). Encoded amino acids are indicated below their respective codons and are numbered above the sequence, beginning with the third amino acid residue so as to agree with the numbering of the mature protein (7). Genomic sequences encoding mRNA untranslated regions are underscored; the underscored 3' untranslated region sequence extends to the 3' end of the region in which the 3' terminus of the largest detected mRNA is located. The CCAAT box, TATA box, and potential polyadenylation signals are underscored twice.

mRNA cap site. It has been observed that most, but not all, eukaryotic gene transcripts are initiated with an A residue flanked by pyrimidines (49). The mRNA cap site of this gene is an A residue flanked by a pyrimidine and a purine and, therefore, is not wholly inconsistent with these observations. Inspection of the sequences immediately upstream from the mRNA cap site revealed the sequence TATATAA at positions -27 to -21. This heptamer is similar both in sequence and in location to the "TATA" homology, which is frequently found 25-30 bp upstream from eukaryotic mRNA transcription start sites and is believed to be necessary for accurate initiation of transcription by RNA polymerase II (49, 51). In addition to the TATA homology, a second consensus sequence, best described⁴ as [?]G[?]CAAT, has been identified as a moderately conserved element found approximately 70-80 bp upstream from the transcription start site of several protein-coding genes (49, 51). Examination of the equivalent region of the α-smooth muscle actin gene showed that, while there were no elements similar to the "CCAAT" consensus in its usual orientation, the sequence CCAAA (-60 to -64) was present in an orientation inverted relative to that in which CCAAT boxes are normally found. A comparison of the 5' flanking regions of the chicken α-skeletal⁵ (19) and α-cardiac (22, 23) actin genes with that of the α-smooth muscle actin gene further revealed that the inverted CCAAT of the α-smooth muscle actin gene forms part of a 16-bp element that is conserved among these three genes (Fig. 4A), although the smooth muscle actin gene motif is inverted relative to those of the striated muscle actin genes. Upstream from the 16-bp element, between nucleotide positions -150 and -135, is a repeat of this element having 50% homology in a reverse orientation (Fig. 4B). Similar upstream repeats are found, although in an inverse orientation, in the 5' flanking regions of the α-skeletal and α-cardiac muscle actin genes (Fig. 4, C and D). Apart from these 16-bp repeats and the TATA motif, no significant homologies were found between the 5' flanking regions of the chicken α-smooth muscle actin gene and those of the other chicken actin genes examined to date.

[Fig. 2, continued (sequence through the TAA termination codon): not reproduced.]

⁴ M.-J. Tsai, personal communication.
⁵ D. J. Bergsma, S. L. Carroll, and R. J. Schwartz, manuscript in preparation.
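The promoter positions quoted above (for example, TATATAA at -27 to -21) use the numbering of Fig. 2, in which the mRNA cap site is designated (1). A minimal sketch of that coordinate system follows, assuming the common convention that upstream positions count down from -1 with no position 0. The function names and the toy sequence are invented for illustration and contain none of the chicken sequence.

```python
# Sketch of cap-relative numbering: the cap site is +1, the nucleotide just
# upstream is -1, and (by the convention assumed here) there is no position 0.
# Function names and the toy sequence are hypothetical.


def to_cap_relative(index: int, cap_index: int) -> int:
    """Convert a 0-based string index to a cap-relative position."""
    offset = index - cap_index
    return offset + 1 if offset >= 0 else offset


def to_index(position: int, cap_index: int) -> int:
    """Convert a cap-relative position back to a 0-based string index."""
    if position == 0:
        raise ValueError("cap-relative numbering has no position 0")
    return cap_index + (position - 1 if position > 0 else position)


def find_motif(seq: str, motif: str, cap_index: int) -> list[tuple[int, int]]:
    """Cap-relative (first, last) positions of every occurrence of `motif`."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        end = start + len(motif) - 1
        hits.append((to_cap_relative(start, cap_index), to_cap_relative(end, cap_index)))
        start = seq.find(motif, start + 1)
    return hits


# Toy sequence: a TATATAA heptamer, 20 spacer nucleotides, then an A cap site.
upstream = "C" * 5 + "TATATAA" + "G" * 20
toy = upstream + "A" + "CCC"
cap = len(upstream)  # 0-based index of the cap-site A

print(find_motif(toy, "TATATAA", cap))  # [(-27, -21)]
```

Keeping the "no position 0" rule in one pair of conversion functions avoids off-by-one errors when upstream and downstream coordinates are mixed.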

Southern Blot Hybridization Analysis of Total Cellular DNA. To define the structure of the α-smooth muscle actin gene completely, it was necessary to construct probes that would hybridize specifically to both the gene and the mRNA transcripts derived from it. Such probes cannot be derived from the protein-coding sequences of an actin gene, because these sequences are highly conserved among the genes encoding the different actin isoforms. Several studies indicate, however, that the 5' and 3' untranslated regions of mRNAs encoding different actin isoforms are not highly conserved (22, 23, 27, 52-54). We therefore constructed α-smooth muscle actin gene-specific probes by subcloning portions of the genomic sequences encoding the 5' and 3' untranslated regions of the α-smooth muscle actin mRNA.

To verify the specificity of our 5' and 3' untranslated region probes and to determine the number of loci homologous to the λAC3 actin gene, we performed a Southern blot hybridization analysis of chicken nuclear DNA. Chicken liver genomic DNA was digested with either EcoRI or HindIII restriction endonuclease, fractionated by electrophoresis on agarose gels, and then unidirectionally blotted onto nitrocellulose. Blots were hybridized with either a general actin probe (Fig. 5D), the 3' untranslated region probe (Fig. 5E), or the 5' untranslated region probe (Fig. 5F). Neither of the restriction enzymes used to digest the genomic DNA had sites within the regions of the genomic DNA complementary to the 5' and 3' untranslated region probes. The 5' (Fig. 5C) and 3' (Fig. 5B) untranslated region probes each hybridized to only a single band per lane (8.6- and 3.3-kbp EcoRI fragments, respectively). The EcoRI and HindIII fragments recognized by both untranslated region probes are consistent with the sizes predicted by the sequence analysis of this gene. The general actin probe detects EcoRI bands migrating at the same positions as those detected by the untranslated region probes (Fig. 5A), which is consistent with the conclusion that both our 5' and 3' untranslated region probes hybridize specifically to the α-smooth muscle actin gene. Since two distinct untranslated region probes each detect only a single band in genomic DNA digested with either of two different enzymes, we also conclude that this gene is a unique member of the chicken actin multigene family.
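The single-copy argument above relies on the probe regions containing no EcoRI or HindIII sites, so that each probe can detect only one genomic fragment per digest. A check of that kind can be sketched as a string search for the enzymes' standard recognition sequences (EcoRI GAATTC, HindIII AAGCTT). Both sites are palindromic, so the reverse-complement search is redundant here but keeps the function correct for non-palindromic sites. The probe string is a made-up placeholder, not a sequence from the paper.

```python
# Sketch: list recognition sites for a few restriction enzymes in a probe region.
# Recognition sequences are the standard ones; the probe sequence is hypothetical.

RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, sites: dict[str, str]) -> dict[str, list[int]]:
    """Return 0-based start positions of each enzyme's site on either strand."""
    seq = seq.upper()
    found = {}
    for enzyme, site in sites.items():
        patterns = {site, reverse_complement(site)}  # one pattern if palindromic
        found[enzyme] = [
            i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] in patterns
        ]
    return found


probe = "ATGCAAGCTTGGCATCCTAGGATCC"  # hypothetical sequence
print(find_sites(probe, RECOGNITION_SITES))  # {'EcoRI': [], 'HindIII': [4]}
```

A probe region supports the single-fragment argument only when every list comes back empty for the enzymes used in the digest.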

The 3' Untranslated Region. Northern blot analyses were used to identify the genomic sequences encoding the 3' untranslated region of the α-smooth muscle actin mRNA. For these experiments, we first had to identify an RNA population rich in the desired mRNA. Since α-smooth muscle actin is found in vascular smooth muscle cells (9), aorta RNA would appear to be the obvious choice. Unfortunately, aorta tissue is difficult to obtain in the quantities needed to prepare poly(A)+ RNA for both these experiments and the primer extension analyses. We therefore investigated whether poly(A)+ RNA prepared from whole 10-day-old chicken embryos could be used instead, since chicken embryos are available in large quantities and the major vessels of chicken embryos at this stage of development are known to be both well differentiated and abundant (55). Aorta and 10-day chicken embryo poly(A)+ RNAs were denatured with glyoxal, fractionated by electrophoresis on agarose gels, and blotted onto Pall nylon filters. The blot was then hybridized with the α-smooth muscle actin 3' untranslated region probe described above. Four distinct size classes of mRNA were detected, with estimated sizes of 1370, 1900, 2000, and 2700 b (excluding poly(A) tails; Fig. 6B). Three of these species (1370, 1900, and 2700 b) were present in approximately equal abundance, while the fourth (the 2000-b RNA) was less abundant than each of the others. Comparing the bands detected with the 3' untranslated region probe with the actin messengers observed when an identical blot is hybridized to the general actin probe (Fig. 6A) shows that the species hybridizing to the 3' untranslated region probe comigrate with mRNAs detected by the coding probe. This observation, considered in light of our previous demonstration of the specificity of the 3' untranslated region probe, argues strongly that the mRNAs detected with this probe are transcribed from the α-smooth muscle actin gene. A comparison of the lanes in Fig. 6B also reveals that the embryonic tissue contains only slightly less α-smooth muscle actin mRNA than the aorta, demonstrating the value of the 10-day embryo as a source of this RNA.

TABLE I
Exon/intron organization of the chicken α-smooth muscle actin gene

Region      Nucleotides       Size (nucleotides)   Encoded region of mRNA
Exon I      1-45              45                   45 b of 5' untranslated sequence
Intron 1    46-2300           2255
Exon II     2301-2472         172                  43 b of 5' untranslated sequence to AA41(a)
Intron 2    2473-4077         1605
Exon III    4078-4206         129                  AA42-AA84
Intron 3    4207-4650         444
Exon IV     4651-4761         111                  AA85-AA121
Intron 4    4762-5259         498
Exon V      5260-5344         85                   AA122-AA150
Intron 5    5345-5426         82
Exon VI     5427-5588         162                  AA150-AA204
Intron 6    5589-5819         231
Exon VII    5820-6011         192                  AA204-AA267
Intron 7    6012-6980         969
Exon VIII   6981-7162         182                  AA267-AA327
Intron 8    7163-7991         829
Exon IX     Begins at 7992    Variable             AA328-AA374 and 3' untranslated sequences

(a) AA, amino acid.
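Table I is easy to check arithmetically: each size equals end minus start plus one, and exons I-VIII together contribute 1078 nt to every mature mRNA, since only the 3' end of exon IX varies. The sketch below encodes the exon coordinates from the table. The `mrna_length` helper is a hypothetical illustration that assumes a single cleavage site inside exon IX and ignores the poly(A) tail; it is not a calculation reported by the authors.

```python
# Exon coordinates from Table I (cap-relative, inclusive). Exon IX begins at
# 7992 and its 3' end varies, so it is kept separate.
EXONS = {
    "I": (1, 45), "II": (2301, 2472), "III": (4078, 4206), "IV": (4651, 4761),
    "V": (5260, 5344), "VI": (5427, 5588), "VII": (5820, 6011), "VIII": (6981, 7162),
}
EXON_IX_START = 7992


def size(start: int, end: int) -> int:
    """Length of an inclusive coordinate range."""
    return end - start + 1


def exons_i_to_viii_length() -> int:
    """Combined length of the constant exons I-VIII in the spliced mRNA."""
    return sum(size(start, end) for start, end in EXONS.values())


def mrna_length(cleavage_position: int) -> int:
    """Hypothetical: spliced mRNA length (no poly(A) tail) if exon IX ends here."""
    if cleavage_position < EXON_IX_START:
        raise ValueError("cleavage position must lie within exon IX")
    return exons_i_to_viii_length() + size(EXON_IX_START, cleavage_position)


print({name: size(*span) for name, span in EXONS.items()})
print(exons_i_to_viii_length())  # 1078
```

Because the intron lengths drop out of the spliced length, the variable part of every transcript is simply the stretch of exon IX from position 7992 to the cleavage site.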

Although the initial RNA blot demonstrated that the α-smooth muscle actin gene produces multiple mRNAs, it gave no clue as to how this variability arises. Since multiple messengers could conceivably be produced from alternative promoters located farther 3' in the same gene, we first determined whether transcription of each species is initiated within the same region by probing an RNA blot of 10-day chicken embryo poly(A)+ RNA with the 5' untranslated region probe. The same four α-smooth muscle actin mRNAs were detected with this probe (Fig. 6C). Since the primer extension analysis demonstrated only a single extension product terminating within the sequences corresponding to the first exon of the gene, this RNA blot analysis is consistent with the hypothesis that all of the detected α-smooth muscle actin mRNAs share at least some common 5' end sequences and are products of a single transcription unit.

Alternatively, the differences in mRNA size could be due to differences in the lengths of their 3' untranslated regions. We therefore mapped the potential polyadenylation site or sites of the mRNAs by hybridizing a set of identical Northern blots of 10-day chicken embryo poly(A)+ RNA to a series of single-stranded 32P-labeled probes derived from genomic sequences located progressively farther 3' from the TAA termination codon (Fig. 6E). Interestingly, four distinct regions were identified in which mRNA colinearity with genomic sequences was lost (Fig. 6D). These map between the two DraI sites (nucleotides 8134-8274), between the SpeI and DdeI sites (nucleotides 8498-8767), between the DdeI and AccI sites (nucleotides 8768-9048), and between the PvuII site and the last (3'-most) BglII site (nucleotides 9330-9507) denoted in Fig. 6E. Potential polyadenylation signals (AATAAA (56) or the variant AATTAA (57)) were found in each of these regions (Figs. 2 and 6E). If these polyadenylation sites are used in vivo, mRNAs of approximately 1370, 1850, 1970, and 2550 b (excluding poly(A) tails) should be produced. These predicted sizes are consistent with the estimated sizes of the four mRNAs. These experiments therefore demonstrate that the four observed α-smooth muscle actin mRNAs differ primarily in the lengths of their 3' untranslated regions, and further suggest that these variations result from the use of alternative polyadenylation sites.
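The mapping logic in this paragraph can be sketched as follows, applied to each mRNA band separately. Assuming a probe detects an mRNA only when the mRNA extends 3' of the probe's 5' boundary, the polyadenylation site must lie after the boundary of the last probe that still detects the band and before the boundary of the first probe that no longer does. The probe names, coordinates, and hybridization calls below are hypothetical placeholders, not the Fig. 6 data.

```python
from typing import Optional

Probe = tuple[str, int, bool]  # (name, 5' boundary, detects the mRNA?)


def bracket_three_prime_end(probes: list[Probe]) -> tuple[Optional[int], Optional[int]]:
    """Bound an mRNA's 3' end from hybridization results for a probe series.

    Assumes a probe detects the mRNA only if the mRNA extends 3' of the probe's
    5' boundary. Returns (lower, upper); None means no bound on that side.
    """
    detecting = [boundary for _, boundary, hit in probes if hit]
    missing = [boundary for _, boundary, hit in probes if not hit]
    lower = max(detecting) if detecting else None
    upper = min(missing) if missing else None
    if lower is not None and upper is not None and lower >= upper:
        raise ValueError("pattern is inconsistent with a single 3' end")
    return lower, upper


# Hypothetical probes and results for one mRNA (not the Fig. 6 data).
results = [
    ("probe_a", 8000, True),
    ("probe_b", 8300, True),
    ("probe_c", 8600, False),
    ("probe_d", 8900, False),
]
print(bracket_three_prime_end(results))  # (8300, 8600)
```

The bracket can only be as fine as the spacing of the probe boundaries, which is why each 3' end in the text is localized to an interval between two restriction sites rather than to a single nucleotide.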

Identification of Evolutionarily Conserved Sequences. Although a high degree of sequence homology is not found between the untranslated regions of vertebrate actin mRNAs encoding different isoforms, several investigators have noted that 5' and 3' untranslated regions are frequently evolutionarily conserved among mRNAs encoding the same actin isoform (22-24, 27, 52-54, 58). Additionally, comparison of the nucleotide sequences of several vertebrate actin genes has revealed extensive homologies between the 5' flanking sequences of actin genes that direct the synthesis of the same protein (24, 26, 27, 59). Since these regions have not previously been analyzed for any smooth muscle actin gene, it was not possible to compare nucleotide sequences directly to determine whether these elements are similarly conserved in the α-smooth muscle actin gene. As an alternative approach, Southern blots of human and chicken nuclear DNA were hybridized to a series of probes encompassing the 5' flanking sequences and the 5' and 3' untranslated regions of the α-smooth muscle actin gene. Screening with probes derived from the 5' end of the gene indicated that only probe 7P6-6 hybridized appreciably to the lanes containing human DNA (Fig. 7B, blot 2). This probe recognized a single band in both the human and chicken lanes (5.6 and 8.6 kbp, respectively). Unfortunately, the size of the genomic EcoRI fragment containing the 5' end of the human α-smooth muscle actin gene is unknown, so it cannot be stated definitively that this probe is hybridizing to the human gene, although it is encouraging that the fragment detected with this probe comigrates with a fragment detected with a general actin probe (Fig. 7A). These experiments suggest that extensive homology at the 5' end of the gene is limited to a region containing the first 45 bp of the 5' leader, a small portion of the first intron (103 bp), and 233 bp of 5' flanking sequence. It is also notable that we could detect no homologous sequences in the human DNA with any of our 3' untranslated region probes (Fig. 7C). Although relatively small regions of homology might well be missed with this analysis, the 3' untranslated region of the chicken α-skeletal muscle actin gene hybridizes appreciably to the human α-skeletal muscle actin gene under even more stringent conditions,⁵ which suggests that the 3' untranslated region of the α-smooth muscle actin gene does not contain the extensive regions of evolutionarily conserved sequence characteristic of the 3' untranslated region of the α-skeletal muscle actin gene.

[Fig. 3, panels A-C (sequencing gel autoradiogram and primer extension schematics): not reproduced.]

FIG. 3. Primer extension analysis of the chicken α-smooth muscle actin gene. A, sequence of the 5' primer extension product of the chicken α-smooth muscle actin mRNA. The autoradiogram of the sequencing gel is shown to the left, with the nucleotide sequence indicated to the right. Sequence was determined by the method of Maxam and Gilbert (35). The unextended primer is shown in the far left lane of the sequencing gel. The splice junction, start of extension, and terminal sequence of the primer are indicated. Nucleotides inferred from the genomic sequence are in parentheses. B, derivation of M13 clone 7Av3Bs-31, which was used as the template for the synthesis of the end-labeled single-stranded primer as described under "Experimental Procedures." The positions of the ATG initiation codon, the codon for the 43rd amino acid (residue 41 in Fig. 2), and the M13 sequencing primer site are indicated. C, the end-labeled single-stranded primer was annealed to α-smooth muscle actin mRNA and extended with avian myeloblastosis virus reverse transcriptase to produce a cDNA copy of the 5' end of the message.

[Fig. 4, panels A-D (sequence alignments and a schematic of the repeats): not reproduced.]

FIG. 4. Inverted repetitive elements associated with the CCAAT homologies of the chicken muscle actin genes. A, comparison of the sequences surrounding the CCAAT boxes of the chicken α-skeletal,⁵ α-cardiac (22, 23), and α-smooth muscle actin genes. Nucleotide positions relative to the mRNA cap site are indicated in parentheses. Asterisks indicate nucleotides that are homologous between the α-smooth muscle actin motif and those of the other two sequences. Underlined nucleotides are identical in the α-skeletal and α-cardiac muscle actin gene elements. B, comparison of the sequence surrounding the CCAAT homology of the α-smooth muscle actin gene with a moderately conserved inverted repeat of the same element located farther 5' in this gene. Nucleotide positions and homologies are as indicated in A. C, comparison of the inverted upstream repeats of the chicken α-skeletal, α-cardiac, and α-smooth muscle actin genes. Symbols are as in A. D, schematic diagram of the positions and orientations of the CCAAT-associated inverted repeats of the chicken muscle actin genes. The genomic sequences encoding the mRNA 5' untranslated region are represented by open boxes. The positions of the CCAAT and TATAA homologies are indicated. The arrows above representative upstream sequences indicate the 5' to 3' orientation of the repeats.
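Fig. 4B aligns the 16-bp CCAAT-associated element, written from -58 to -73, with the upstream repeat, written from -150 to -135; because the two copies are inverted relative to each other, they are written in opposite directions along the gene. Counting matched positions between the two strings as they appear in the extracted figure gives 8 of 16, the 50% homology stated in the text. The sketch below performs that count and, assuming the -58 to -73 string is the opposite strand read 5' to 3', recovers the corresponding stretch of the upper strand by reverse complementation. The sequences are read from a partly garbled copy of the figure and should be treated as illustrative.

```python
# Sketch: position-by-position identity between two aligned, equal-length
# elements, plus a reverse-complement helper for strand conversion.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# As read from the extracted Fig. 4B (illustrative; two bases are uncertain).
ccaat_element = "CTCCCAAACAAGGAGC"    # written from -58 to -73
upstream_repeat = "GACCCAGATTAGAGGT"  # written from -150 to -135

print(percent_identity(ccaat_element, upstream_repeat))  # 50.0
print(reverse_complement(ccaat_element))  # GCTCCTTGTTTGGGAG
```

Under the strand assumption above, the upper-strand reading contains TTTGG, the reverse complement of the CCAAA motif described in the text, which is what makes this copy of the CCAAT homology "inverted."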


[Fig. 5, panels A-F (autoradiograms with size standards of 23.1, 9.4, 6.6, 4.4, 2.3, 2.0, and 0.6 kb, and probe schematics): not reproduced.]

FIG. 5. Southern blot of chicken nuclear DNA. Chicken nuclear DNA (15 μg/lane) was digested to completion with either HindIII (H3) or EcoRI (RI), electrophoresed in 0.8% agarose gels, and transferred onto nitrocellulose. Filters were hybridized to single-stranded 32P-labeled probes and then washed in 1 × SSC, 0.5% SDS as described under "Experimental Procedures." A, autoradiogram of a filter hybridized to the general actin probe described in D. B, autoradiogram of a filter hybridized to the α-smooth muscle actin 3' untranslated region probe described in E. C, autoradiogram of a filter hybridized to the α-smooth muscle actin 5' untranslated region probe described in F. In A, B, and C, numbers next to arrows indicate the positions of standards. D, schematic of the chicken nonmuscle type 5 actin gene and the region encompassed by the general actin probe, NMX1. E, schematic of the last exon of the chicken α-smooth muscle actin gene and the region encompassed by the 3' untranslated region probe 3V3Dr-9. F, schematic of the first exon of the chicken α-smooth muscle actin gene and the region encompassed by the 5' untranslated region probe 7P6-6. Coding and mRNA-transcribed untranslated sequences are as denoted. Thin lines represent intron and flanking sequences. Solid bars represent areas from which the probes were derived. Putative regulatory signals (CAT, TATAA, AATAAA), initiation (ATG) codons, termination codons, and restriction sites used in the construction of the probes are indicated.
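Band sizes such as the 8.6- and 3.3-kbp EcoRI fragments are read against the size standards marked beside the autoradiograms (23.1 kb down to 0.6 kb, values that match the familiar λ DNA HindIII fragment set). The text does not say how sizes were interpolated; a common approach, assumed here, fits log10(size) as a straight line in migration distance through the standards. The distances below are invented, so the printed estimate only demonstrates the calculation.

```python
import math


def fit_semilog(distances: list[float], sizes_kb: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    if len(distances) != len(sizes_kb) or len(distances) < 2:
        raise ValueError("need at least two paired standards")
    ys = [math.log10(s) for s in sizes_kb]
    mean_x = sum(distances) / len(distances)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size_kb(distance: float, slope: float, intercept: float) -> float:
    """Interpolate a fragment size from its migration distance."""
    return 10 ** (slope * distance + intercept)


# Standard sizes as marked in Fig. 5; the migration distances (mm) are invented.
standards_kb = [23.1, 9.4, 6.6, 4.4, 2.3, 2.0]
distances_mm = [12.0, 20.0, 24.0, 29.0, 38.0, 40.0]

slope, intercept = fit_semilog(distances_mm, standards_kb)
print(round(estimate_size_kb(21.0, slope, intercept), 1))  # hypothetical band
```

The log-linear relation holds only approximately and only within the range of the standards, so sizes near the largest markers are the least reliable.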

DISCUSSION

Our laboratory has previously reported that the chicken actin multigene family is represented by 8-10 coding loci (29). Both for the present study and for our future investigations of the regulation of the α-smooth muscle actin gene, it was necessary to determine how many of these actin-related sequences are functional α-smooth muscle actin genes. Several lines of evidence argue strongly that the λAC3 actin gene is the sole functional α-smooth muscle actin gene. First, we have demonstrated by multiple genomic Southern blot analyses that the λAC3 actin gene is uniquely represented in the chicken genome, i.e. probably only once per haploid genome. Second, we can identify no features of this gene suggesting that it is nonfunctional (e.g. mutations within the protein-coding sequences), and we have found that sequences found uniquely in this gene are transcribed into poly(A)+ RNA. Third, in an earlier report, our laboratory described the isolation of genomic clones for six different actin genes (29), and we have recently isolated cDNA clones for two further distinct actin isoforms.¹ These clones therefore allow us to account for most, if not all, of the chicken actin multigene family. Based on this evidence, we conclude that the λAC3 actin gene is the only functional α-smooth muscle actin gene present in the chicken genome.

At least four α-smooth muscle actin messenger RNAs are expressed in both 10-day chicken embryos and the aorta. These transcripts are apparently products of the same gene and share common, probably identical, 5' untranslated region and coding sequences, while differing from one another in the lengths of their 3' untranslated regions. Our data suggest that this heterogeneity results from the use of alternative polyadenylation signals. Analogous situations have been described for several other genes. Examples include the genes for mouse dihydrofolate reductase (60), α-amylase (61), and β2-microglobulin (62), and chicken pro-α2(I) collagen (63) and vimentin (64, 65), all of which have multiple polyadenylation signals in their 3' untranslated regions that are apparently used with various efficiencies to produce multiple RNAs encoding the same protein. We cannot exclude the possibility, however, that our "polyadenylation" sites are actually differentially used splice sites that serve as acceptors for a small exon or exons located outside the limits of our sequence.
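The candidate signals behind this interpretation were found by looking for the AATAAA hexamer and its AATTAA variant in the 3' untranslated region. A minimal motif scan of that kind is sketched below; it reports every occurrence of either hexamer, including overlapping ones, in a numbering that starts at a caller-supplied position. The fragment is hypothetical, and a hexamer match marks only a candidate: whether a site is used has to be shown experimentally, as the Northern mapping described earlier does.

```python
import re

# Canonical hexamer and the variant named in the text.
POLYA_SIGNALS = ("AATAAA", "AATTAA")


def find_polya_signals(seq: str, first_position: int = 1) -> list[tuple[int, str]]:
    """Return (position, hexamer) for every candidate signal, overlaps included.

    `first_position` is the coordinate of seq[0]; positions are assumed to run
    contiguously from there (i.e. the scanned string lies downstream of the cap).
    """
    seq = seq.upper()
    hits = []
    for signal in POLYA_SIGNALS:
        # A zero-width lookahead lets finditer report overlapping occurrences.
        for match in re.finditer(f"(?={signal})", seq):
            hits.append((first_position + match.start(), signal))
    return sorted(hits)


fragment = "GCAATAAAGCTTAATTAACGAATAAAT"  # hypothetical 3' UTR fragment
print(find_polya_signals(fragment))
# [(3, 'AATAAA'), (13, 'AATTAA'), (21, 'AATAAA')]
```

Using a lookahead rather than a plain pattern matters for A-rich stretches such as AATAAATAAA, where two signals overlap and a non-overlapping search would miss the second.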

At least two mechanisms can be envisioned by which the α-smooth muscle actin gene could use alternative polyadenylation signals. One possibility is that the polyadenylation sites of the gene correspond to termination sites for RNA polymerase. Alternatively, transcription could proceed across all potential polyadenylation signals, and the primary transcript could then be cleaved and polyadenylated at any of several sites. We have no data favoring either model for the α-smooth muscle actin gene, although the second possibility is more likely, since transcription of several genes has been shown to proceed well past single (66-69) or even multiple functional polyadenylation signals (70-72) before endonucleolytic processing and subsequent polyadenylation.

We have been unable to identify any regions within the 3' untranslated sequences of the α-smooth muscle actin mRNAs that are evolutionarily conserved between chicken and human. Relatively small conserved sequences could well escape detection by this method, however, and we therefore suggest only that the 3' untranslated region of the α-smooth muscle actin gene does not show the extensive evolutionary conservation observed in the 3' untranslated regions of the α-skeletal (24, 53, 54, 59), α-cardiac (22, 23), and β-cytoplasmic (54, 58) actin genes. The biological significance of 3' untranslated region conservation in these genes is unclear, so we can make no assessment of the significance of its absence in the α-smooth muscle actin gene. It is tempting to speculate, however, that, unlike in these other genes, the 3' untranslated region of the α-smooth muscle actin gene is no longer required for a regulatory function and is consequently free to use multiple polyadenylation sites, with the accompanying variations this produces.

[Fig. 6 image: Northern blot autoradiograms (panels A-D; blots 1-8 in D) and a map (panel E) showing the TAA termination codon, potential AATAAA-type polyadenylation signals, DraI, PvuII, SpeI, DdeI, and AccI sites, and the probes listed below; scale bar, 200 bp.]

FIG. 6. Northern blot analysis and 3' end mapping of α-smooth muscle actin mRNAs. Poly(A)+ RNA from aorta or 10-day chicken embryos (1 μg/lane) was denatured with glyoxal, fractionated by electrophoresis on 1% agarose gels, and transferred onto nylon membranes. Blots were hybridized to single-stranded ³²P-labeled probes as described under "Experimental Procedures." A, autoradiogram of a filter hybridized to the general actin probe NMX4 (reverse complement of NMX1; see Fig. 5). B, autoradiogram of a filter hybridized to the α-smooth muscle actin 3' untranslated region probe 3V4Dr-8 (reverse complement of 3V3Dr-9; see Fig. 5). C, autoradiogram of a blot of 10-day chicken embryo poly(A)+ RNA hybridized to the α-smooth muscle actin 5' untranslated region probe 7P6-6 (see Fig. 5). D, autoradiograms of a set of identical blots of 10-day chicken embryo poly(A)+ RNA hybridized to the probes described in E. Probes used for each blot are as follows: 1, 3V4Dr-8; 2, 3V4-3, DraI truncated; 3, 3V3Ac-8; 4, 3V3Ac-8, SpeI truncated; 5, 3V3Ac-8, DdeI truncated; 6, 3Ac3-1; 7, 3Ac3-1, PvuII truncated; 8, 3Ac3-1, BglII truncated. Following this experiment, blot 8 was hybridized to the general actin probe NMX4 and found to produce a pattern identical to that of the embryo blot in A. In B, C, and D, arrows indicate the positions of α-smooth muscle actin mRNAs. E, schematic diagram of the 3' half of plasmid pAC333 and the regions encompassed by the probes used in D. The 5' end of the sequence is to the left. Solid areas, coding sequence; hatched area, pBR322; dashed line, intron 8; thin line, 3' untranslated and flanking sequences; solid bars, areas encompassed by the probes. The positions of the restriction sites used in the construction and truncation of the probes, the TAA termination codon, and potential polyadenylation signals are indicated.

[Fig. 7 image: Southern blot autoradiograms (panels A, B1-B2, and C1-C4) with chicken and human lanes; size standards at 23.1, 9.4, 6.6, 4.4, 2.3, and 2.0 kb.]

FIG. 7. Identification of evolutionarily conserved sequences in the chicken α-smooth muscle actin gene. Chicken or human nuclear DNA (10 μg/lane) was digested to completion with EcoRI, electrophoresed on 0.8% agarose gels, and transferred onto nitrocellulose. Filters were hybridized to single-stranded ³²P-labeled probes and washed in 2× SSC, 0.5% SDS as described under "Experimental Procedures." Numbers next to arrows indicate the positions of standards (kb). A, autoradiogram of a filter hybridized to the general actin probe NMX4. B, autoradiograms of filters hybridized to 1, the 5' flanking region probe 7P4-1, and 2, the 5' flanking and untranslated region probe 7P6-6. C, autoradiograms of filters hybridized to the 3' untranslated region probes 1, 3V4Dr-8; 2, 3V4-3, DraI truncated; 3, 3V3Ac-8; and 4, 3Ac3-1. 7P4-1 is derived from a 629-bp PstI fragment which lies immediately 5' to the region encompassed by 7P6-6. Other probes are described in Figs. 5 and 6.

An examination of the sequences immediately upstream from the mRNA cap site of the chicken α-smooth muscle actin gene revealed that although this gene has a canonical TATA box in the expected position, the only potential CCAAT homology which could be identified was inverted relative to the orientation in which these elements are typically found. This is in contrast to the other chicken actin genes which have been analyzed to date, all of which contain CCAAT boxes in the usual orientation (3, 19, 22, 23, 25). An inverted CCAAT sequence may still be functional, however, since it has recently been demonstrated that an inverted CCAAT box in the herpes simplex virus thymidine kinase gene is necessary for optimal transcription efficiency (73, 74) and interacts with a CCAAT-binding transcription factor (74). Jones et al. (74) found that the thymidine kinase gene CCAAT box was asymmetrically positioned within an area which was protected in a DNase-footprinting experiment, which led them to suggest that the signal recognized by this factor may be larger than simply the CCAAT homology previously recognized. If this hypothesis is correct, it could explain the observation that the sequences surrounding the CCAAT boxes of the α-skeletal (59), α-cardiac, and α-smooth muscle actin genes are conserved. Alternatively, these additional conserved nucleotides could be involved in the binding of a distinct transcription factor which recognizes all three of these chicken muscle actin genes. We have also noted that the chicken α-skeletal, α-cardiac, and α-smooth muscle actin genes contain upstream sequences which resemble moderately conserved inverted repeats of the region surrounding their respective CCAAT boxes. Deletion of a portion of this upstream repeat in the α-skeletal muscle actin gene has been found to cause a severe reduction in developmental activation and in induction in primary myoblast cultures, but not to influence the efficiency of transcription in Xenopus oocytes (75). We are currently attempting to determine both whether deletion of the upstream repeats has a similar effect in the α-cardiac and α-smooth muscle actin genes and whether these elements interact with CCAAT-binding transcription factors. Finally, we would point out that these sequences are not of sufficient size to account for the hybridization which we observed when sequences at the 5' end of the chicken α-smooth muscle actin gene were used to probe human DNA (Fig. 7B). The strength of hybridization noted in these experiments suggests the presence of additional features which are involved in transcriptional or post-transcriptional regulation of this gene.
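An inverted CCAAT box lies on the opposite strand, so on the strand normally shown it reads as its reverse complement, ATTGG. The minimal sketch below scans both orientations; the function names and the short test sequences are hypothetical and are not the chicken promoter sequence. The same reverse-complement operation describes how probe NMX4 relates to NMX1 (Fig. 6).

```python
# Complement table for upper- and lower-case DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq: str, motif: str = "CCAAT") -> list[tuple[int, str]]:
    """Return (0-based position on `seq`, orientation) for each motif hit.

    'forward' means the motif reads 5'->3' on the strand given; 'inverted' means
    it lies on the complementary strand, so its reverse complement appears here.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "forward"))
        # For a palindromic motif a window would be reported in both orientations;
        # CCAAT is not palindromic (its reverse complement is ATTGG).
        if window == rc:
            hits.append((i, "inverted"))
    return hits


if __name__ == "__main__":
    print(reverse_complement("CCAAT"))           # ATTGG
    print(find_motif_both_strands("GGCCAATCT"))  # [(2, 'forward')]
    print(find_motif_both_strands("TTATTGGAA"))  # [(2, 'inverted')]
```

A scan restricted to one orientation would report no CCAAT box at all for the second toy sequence, which is why orientation matters when comparing promoters.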

Structural characterizations of representative genes from several vertebrate multigene families have led to the observation that, in many cases, intron positions (but not necessarily sequences) are conserved (49). However, examination of actin genes isolated from a wide variety of organisms has revealed that while intron positions are somewhat conserved in deuterostomes (19, 21), such conservation is much less apparent in protostomes (44). These observations have led to much disagreement about whether the intron positions found in modern actin genes are the result of (a) the loss of some introns from a common ancestral actin gene which originally had many introns, (b) insertion of new introns into an intronless primordial actin gene, or (c) some combination of intron insertion and deletion. A comparison of the intron positions in deuterostome actin genes to those found in the α-smooth muscle actin gene sheds new light on this controversy. We have demonstrated that the structural sequences of the chicken α-smooth muscle actin gene are interrupted by eight introns, seven within the coding sequence and one in the 5' untranslated region. Examination of the intron positions in vertebrate α-cardiac (20, 22, 23), α-skeletal (19, 21, 24), and cytoplasmic (3, 25-27) actin genes, as well as those found in sea urchin actin genes (45, 46), revealed that the intron positions in these genes represent subsets of the intron positions found in the chicken α-smooth muscle actin gene (Table II). Our demonstration of an actin gene which contains all of the intron positions found in three other distinct deuterostome actin gene lineages (vertebrate striated muscle, vertebrate cytoplasmic, and echinoderm) is most consistent with a scheme involving the loss of introns from common ancestral sites. We therefore concur, at least for the deuterostome actin genes, with the suggestion that intron deletion has been the dominant process influencing the placement of introns in modern actin genes (21, 76).
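The subset argument can be checked mechanically. The sketch below encodes the intron positions of Table II as Python sets (the two human rows with an unresolved 5' UTR entry are left out), verifies that each set is contained in the chicken α-smooth muscle set, and lists the sites each gene lacks, which are the losses a pure intron-loss model would imply. The function and variable names are hypothetical, and the sets are a transcription of the table rather than independent data.

```python
# Intron positions transcribed from Table II. "5'UTR" is the intron in the 5'
# untranslated region; other labels give the interrupted codon or codon pair.
SITE_ORDER = ["5'UTR", "41/42", "84/85", "121/122", "150", "204", "267", "327/328"]
ALPHA_SMOOTH = set(SITE_ORDER)  # chicken alpha-smooth muscle actin: all eight sites

GENES = {
    "alpha-skeletal": {"5'UTR", "41/42", "150", "204", "267", "327/328"},
    "alpha-cardiac (chicken)": {"5'UTR", "41/42", "150", "204", "267", "327/328"},
    "beta-cytoplasmic": {"5'UTR", "41/42", "121/122", "267", "327/328"},
    "nonmuscle type 5": {"5'UTR"},
    "SpG28": {"41/42", "121/122", "204", "267"},
    "SpG17": {"121/122", "204"},
    "SfA": {"121/122", "204"},
}


def absent_sites(sites: set[str], reference: set[str] = ALPHA_SMOOTH) -> list[str]:
    """Reference sites missing from a gene, in 5'->3' order.

    Raises ValueError if the gene has a site the reference lacks, i.e. if it is
    not a subset and a loss-only explanation fails.
    """
    if not sites <= reference:
        raise ValueError(f"sites not in the reference gene: {sorted(sites - reference)}")
    return [s for s in SITE_ORDER if s in reference and s not in sites]


def minimal_ancestor(gene_sets) -> set[str]:
    """Smallest intron set from which every given gene could arise by loss alone."""
    union: set[str] = set()
    for sites in gene_sets:
        union |= sites
    return union


if __name__ == "__main__":
    for name, sites in GENES.items():
        print(name, absent_sites(sites))
    # Without the smooth muscle gene, the 84/85 site never appears:
    print(sorted(ALPHA_SMOOTH - minimal_ancestor(GENES.values())))  # ['84/85']
```

The last line shows why, before a smooth muscle actin gene was characterized, the intron between codons 84 and 85 could look like a late insertion: none of the other deuterostome genes in the table carries it.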

TABLE II
A comparison of the intron positions of the chicken α-smooth muscle actin gene to those of other deuterostome actin genes

Actin gene | Organism | 5' UTR(a) | 41/42 | 84/85 | 121/122 | 150 | 204 | 267 | 327/328 | Refs.
α-Smooth | Chicken | X | X | X | X | X | X | X | X | This paper
α-Smooth | Human | ? | X | X | X | X | X | X | X | 28
α-Skeletal | Chicken, rat, mouse | X | X |  |  | X | X | X | X | 19, 21, 24
α-Cardiac | Chicken | X | X |  |  | X | X | X | X | 22, 23
α-Cardiac | Human | ? | X |  |  | X | X | X | X | 20
β-Cytoplasmic | Chicken, rat, human | X | X |  | X |  |  | X | X | 25-27
Nonmuscle type 5 | Chicken | X |  |  |  |  |  |  |  | 3
SpG28 | Sea urchin |  | X |  | X |  | X | X |  | 45
SpG17 | Sea urchin |  |  |  | X |  | X |  |  | 45
SfA | Sea urchin |  |  |  | X |  | X |  |  | 46

(a) UTR, untranslated region.

[Fig. 8 diagram: nodes for the primordial deuterostome actin gene; the chordate muscle actin progenitor gene; chordate cytoplasmic and echinoderm actin progenitor gene(s); the α-smooth muscle actin gene (no introns lost) and γ-smooth muscle actin gene (no introns lost?); a primitive striated vertebrate muscle actin gene (introns 3 and 4 lost) leading to the α-skeletal and α-cardiac muscle actin genes; the β-cytoplasmic actin gene (introns 3, 5, and 6 lost); the chicken nonmuscle type 5 actin gene (all introns lost except 1); the SpG28-type actin gene (introns 1, 3, 5, and 8 lost); and the SpG17- and SfA-type actin genes (introns 1, 2, 3, 5, 7, and 8 lost).]

FIG. 8. A hypothetical "family tree" for the deuterostome actin genes based on intron positions found in modern actin genes. Solid arrows indicate evolutionary relationships which seem well established. Dashed arrows indicate possible alternative evolutionary pathways. Coding regions are represented by solid areas, mRNA untranslated regions by open bars, flanking DNA by dashed lines, and introns by thin lines. Numbers above schematic diagrams indicate intron numbers, which are based on the intron-numbering scheme presented in Table I. For the primordial deuterostome actin gene, small numbers beneath the schematic diagram indicate the amino acid codon either directly preceding or interrupted by the intron. The positions of the ATG initiation codon and the termination codon are also indicated for the primordial deuterostome actin gene. All schematic diagrams are oriented with the 5' end of the gene to the left.

The chicken α-smooth muscle actin gene contains one intron (IV3, between the codons for amino acids 84 and 85) which has previously been found only in the human α-smooth muscle actin gene (28). Those authors, noting that an intron at this position had not been reported among either vertebrate or invertebrate actin genes, proposed that this intron was inserted relatively late in evolution. This proposal must now, at the least, be modified. Since this intron is present in both the human and chicken genes, it seems reasonable to suggest that it was present in the gene found in their last common ancestor, the stem reptiles (about 300-350 million years ago (77)). Additionally, it has been reported that the exon/intron organization of the human γ-smooth muscle actin gene is identical to that of the human α-smooth muscle actin gene (28). This would imply that this intron was in place in the primitive smooth muscle actin gene prior to its divergence into α- and γ-isoforms, an event which has been proposed to have occurred either in the stem reptiles or in their amphibian ancestors (10). These data therefore provide an estimate of the latest possible date at which this intron could have been inserted, if it in fact originated in such a manner.

Although an intron site between codons 84 and 85 has so far been identified only in smooth muscle actin genes, this does not necessarily indicate that it represents a recently inserted sequence. It is at least equally possible that this intron is an ancient sequence which has survived among the deuterostome actins only in the smooth muscle actin gene lineage. The demonstration of an identically positioned intron in a distantly related actin gene would be persuasive evidence that this intervening sequence was not the result of an insertion in the primitive smooth muscle actin gene. We would suggest, therefore, that the intron positions found in modern vertebrate and echinoderm actin loci could have arisen by the excision of introns from descendants of a primordial deuterostome actin gene having a minimum of eight introns (Fig. 8).

Such a "family tree" cannot at present be constructed for protostome actin genes. The reasons for the wide range of intron positions found in these genes are not yet clear, although the same possibilities of intron loss versus insertion can be considered for these genes as well. Determining whether one of these hypotheses, or a combination of the two, applies to these genes will probably require the examination of actin genes from a wider variety of protostomes. We would also emphasize that our description of deuterostome actin gene evolution is based solely on a consideration of intron placements. Data derived from a comparison of protein primary structures (10) or from the use of amino-terminal dipeptides as evolutionary markers (3, 47) raise serious questions about the evolutionary relationships between the actin genes of protostomes and deuterostomes, which will hopefully be clarified as genes from a wider variety of organisms are analyzed.

Acknowledgments: We gratefully acknowledge Dr. Anthony G. DiLella for the gift of human genomic DNA, Dr. Gayle Slaughter for performing the densitometer scan, David Scarff for the preparation of illustrations, Jim Grichnik for assistance with computer software, and Ruth Decker for expert technical assistance.

REFERENCES
1. Vandekerckhove, J., and Weber, K. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 1106-1110
2. Vandekerckhove, J., Franke, W. W., and Weber, K. (1981) J. Mol. Biol. 152, 413-426
3. Bergsma, D. J., Chang, K. S., and Schwartz, R. J. (1985) Mol. Cell. Biol. 5, 1151-1162
4. Gunning, P., Ponte, P., Blau, H., and Kedes, L. (1983) Mol. Cell. Biol. 3, 1985-1995
5. Hayward, L. J., and Schwartz, R. J. (1986) J. Cell Biol. 102, 1485-1493
6. Vandekerckhove, J., and Weber, K. (1978) J. Mol. Biol. 126, 783-802
7. Vandekerckhove, J., and Weber, K. (1979) Differentiation 14, 123-133
8. Vandekerckhove, J., and Weber, K. (1981) Eur. J. Biochem. 113, 595-603
9. Gabbiani, G., Schmid, E., Winter, S., Chaponnier, C., de Chastonay, C., Vandekerckhove, J., Weber, K., and Franke, W. W. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 298-302
10. Vandekerckhove, J., and Weber, K. (1984) J. Mol. Biol. 179, 391-413
11. Storti, R. V., Horovitch, S. J., Scott, M. P., Rich, A., and Pardue, M. L. (1978) Cell 13, 589-598
12. Saborio, J. L., Segura, M., Flores, M., Garcia, R., and Palmer, E. (1979) J. Biol. Chem. 254, 11119-11125
13. Ordahl, C. P., Tilghman, S. M., Ovitt, C., Fornwald, J., and Largen, M. T. (1980) Nucleic Acids Res. 8, 4989-5005
14. Schwartz, R. J., and Rothblum, K. N. (1981) Biochemistry 20, 4122-4129
15. Fyrberg, E. A., Mahaffey, J. W., Bond, B. J., and Davidson, N. (1983) Cell 33, 115-123
16. Minty, A. J., Alonso, S., Caravatti, M., and Buckingham, M. E. (1982) Cell 30, 185-192
17. Garcia, R., Paz-Aliaga, B., Ernst, S. G., and Crain, W. R., Jr. (1984) Mol. Cell. Biol. 4, 840-845
18. Bruskin, A. M., Tyner, A. L., Wells, D. E., Showman, R. M., and Klein, W. H. (1981) Dev. Biol. 87, 308-318
19. Fornwald, J. A., Kuncio, G., Peng, I., and Ordahl, C. P. (1982) Nucleic Acids Res. 10, 3861-3876
20. Hamada, H., Petrino, M. G., and Kakunaga, T. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 5901-5905
21. Zakut, R., Shani, M., Givol, D., Neuman, S., Yaffe, D., and Nudel, U. (1982) Nature 298, 851-859
22. Chang, K. S., Rothblum, K. N., and Schwartz, R. J. (1985) Nucleic Acids Res. 13, 1223-1237
23. Eldridge, J., Zehner, Z., and Paterson, B. M. (1985) Gene 36, 55-63
24. Hu, M. C.-T., Sharp, S. B., and Davidson, N. (1986) Mol. Cell. Biol. 6, 15-25
25. Kost, T. A., Theodorakis, N., and Hughes, S. H. (1983) Nucleic Acids Res. 11, 8287-8301
26. Nudel, U., Zakut, R., Shani, M., Neuman, S., Levy, Z., and Yaffe, D. (1983) Nucleic Acids Res. 11, 1759-1771
27. Ng, S.-Y., Gunning, P., Eddy, R., Ponte, P., Leavitt, J., Shows, T., and Kedes, L. (1985) Mol. Cell. Biol. 5, 2720-2732
28. Ueyama, H., Hamada, H., Battula, N., and Kakunaga, T. (1984) Mol. Cell. Biol. 4, 1073-1078
29. Chang, K. S., Zimmer, W. E., Jr., Bergsma, D. J., Dodgson, J. B., and Schwartz, R. J. (1984) Mol. Cell. Biol. 4, 2498-2508
30. Birnboim, H. C., and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523
31. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, p. 93, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
32. Messing, J., and Vieira, J. (1982) Gene 19, 269-276
33. Yanisch-Perron, C., Vieira, J., and Messing, J. (1985) Gene 33, 103-119
34. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
35. Maxam, A. M., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560
36. Queen, C., and Korn, L. J. (1984) Nucleic Acids Res. 12, 581-599
37. Lawson, G. M., Knoll, B. J., March, C. J., Woo, S. L. C., Tsai, M.-J., and O'Malley, B. W. (1982) J. Biol. Chem. 257, 1501-1507
38. Schwartz, R. J., and Rothblum, K. N. (1980) Biochemistry 19, 2506-2514
39. Aviv, H., and Leder, P. (1972) Proc. Natl. Acad. Sci. U. S. A. 69, 1408-1412
40. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517
41. Smith, G. E., and Summers, M. D. (1980) Anal. Biochem. 109, 123-129
42. Thomas, P. S. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 5201-5205
43. Thomas, P. S. (1983) Methods Enzymol. 100, 255-266
44. Fyrberg, E. A., Bond, B. J., Hershey, N. D., Mixter, K. S., and Davidson, N. (1981) Cell 24, 107-116
45. Cooper, A. D., and Crain, W. R., Jr. (1982) Nucleic Acids Res. 10, 4081-4092
46. Foran, D. R., Johnson, P. J., and Moore, G. P. (1985) J. Mol. Evol. 22, 108-116
47. Gunning, P., Ponte, P., Okayama, H., Engel, J., Blau, H., and Kedes, L. (1983) Mol. Cell. Biol. 3, 787-795
48. Hamada, H., Petrino, M. G., and Kakunaga, T. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 6465-6469
49. Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383
50. Ghosh, P. K., Reddy, V. B., Piatak, M., Lebowitz, P., and Weissman, S. M. (1980) Methods Enzymol. 65, 580-595
51. Shenk, T. (1981) Curr. Top. Microbiol. Immunol. 93, 25-46
52. Ponte, P., Gunning, P., Blau, H., and Kedes, L. (1983) Mol. Cell. Biol. 3, 1783-1791
53. Gunning, P., Mohun, T., Ng, S.-Y., Ponte, P., and Kedes, L. (1984) J. Mol. Evol. 20, 202-214
54. Yaffe, D., Nudel, U., Mayer, Y., and Neuman, S. (1985) Nucleic Acids Res. 13, 3732-3737
55. Hughes, A. F. W. (1943) J. Anat. 77, 266-287
56. Proudfoot, N. J., and Brownlee, G. G. (1976) Nature 263, 211-214
57. Birnstiel, M. L., Busslinger, M., and Strub, K. (1985) Cell 41, 349-359
58. Ponte, P., Ng, S.-Y., Engel, J., Gunning, P., and Kedes, L. (1984) Nucleic Acids Res. 12, 1687-1696
59. Ordahl, C. P., and Cooper, T. A. (1983) Nature 303, 348-349
60. Setzer, D. R., McGrogan, M., Nunberg, J. H., and Schimke, R. T. (1980) Cell 22, 361-370
61. Tosi, M., Young, R. A., Hagenbuchle, O., and Schibler, U. (1981) Nucleic Acids Res. 9, 2313-2323
62. Parnes, J. R., Robinson, R. R., and Seidman, J. G. (1983) Nature 302, 449-452
63. Aho, S., Tate, V., and Boedtker, H. (1983) Nucleic Acids Res. 11, 5443-5450
64. Zehner, Z. E., and Paterson, B. M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 911-915
65. Capetanaki, Y. G., Ngai, J., Flytzanis, C. N., and Lazarides, E. (1983) Cell 35, 411-420
66. Nevins, J. R., Blanchard, J.-M., and Darnell, J. E., Jr. (1980) J. Mol. Biol. 144, 377-386
67. Hofer, E., and Darnell, J. E., Jr. (1981) Cell 23, 585-593
68. Weintraub, H., Larsen, A., and Groudine, M. (1981) Cell 24, 333-344
69. Sheffery, M., Marks, P. A., and Rifkind, R. A. (1984) J. Mol. Biol. 172, 417-436
70. Mather, E. L., Nelson, K. J., Haimovich, J., and Perry, R. P. (1984) Cell 36, 329-338
71. Frayne, E. G., Leys, E. J., Crouse, G. F., Hook, A. G., and Kellems, R. E. (1984) Mol. Cell. Biol. 4, 2921-2924
72. Amara, S. G., Evans, R. M., and Rosenfeld, M. G. (1984) Mol. Cell. Biol. 4, 2151-2160
73. McKnight, S. L., Kingsbury, R. C., Spence, A., and Smith, M. (1984) Cell 37, 253-262
74. Jones, K. A., Yamamoto, K. R., and Tjian, R. (1985) Cell 42, 559-572
75. Bergsma, D. J., Grichnik, J. M., Gossett, L. M. A., and Schwartz, R. J. (1986) Mol. Cell. Biol., in press
76. Blake, C. (1983) Nature 306, 535-537
77. Romer, A. S., and Parsons, T. S. (1977) The Vertebrate Body, 5th Ed., pp. 34-91, W. B. Saunders Co., Philadelphia