Proc. Natl. Acad. Sci. USA
Vol. 83, pp. 3351-3355, May 1986
Developmental Biology

Structure and developmentally regulated expression of a Strongylocentrotus purpuratus collagen gene

(sea urchin/exon/intron/development/R loop)

MALABI VENKATESAN*, FLORA DE PABLO*, GABRIEL VOGELI†, AND ROBERT T. SIMPSON*

*Laboratory of Cellular and Developmental Biology, National Institute of Arthritis, Diabetes, and Digestive and Kidney Diseases, and †Laboratory of Developmental and Molecular Biology, National Eye Institute, National Institutes of Health, Bethesda, MD 20892

Communicated by Elizabeth D. Hay, January 22, 1986

ABSTRACT We have isolated and partially characterized an approximately 20-kilobase-pair Strongylocentrotus purpuratus genomic clone, using a mouse α1 (type IV) collagen cDNA probe. A 1-kilobase-pair HindIII fragment of the clone hybridizes strongly to the probe; this has been subcloned and sequenced. It contains 212 base pairs of sequence coding for (Gly-Xaa-Yaa)n (where Xaa and Yaa are different unspecified amino acids), characteristic of all known collagen genes. There is a single point of discontinuity within the repeating pattern in this exon, similar to the genomic structure of mouse type IV collagen. The (Gly-Xaa-Yaa)n-encoding element is flanked by consensus splicing sequences, and the intervening sequences on either side of it have multiple in-phase termination codons. Electron microscopy of R loops between the phage λ recombinant clone and poly(A)+ RNA reveals multiple short exons, a feature also seen in vertebrate collagen genes. The (Gly-Xaa-Yaa)n protein-encoding sequence hybridizes to a developmentally regulated 9-kilobase mRNA; the message appears during the morula stage, rises sharply in abundance at the blastula stage, and decreases in proportion to total RNA later in development.

Collagens are a set of closely related structural proteins present in all metazoan connective tissues.
They provide the extracellular framework for most tissues and organs and are known to have inductive and morphogenetic functions in development. Besides the well-characterized interstitial collagens (types I-III), basement membrane collagen (type IV), and type V collagen isolated from placenta and cornea (for review, see refs. 1 and 2), several other structurally distinct types of collagen proteins have been recently described. The genomic structure of a number of vertebrate collagen genes reveals a complex organization with multiple small exons separated by large introns. The total length of the chicken pro-α2 (type I) gene, which serves as a prototype for a vertebrate collagen gene, is about 39 kilobase pairs (kbp) of DNA containing 51 introns. The highly unusual organization of collagen genes in vertebrates poses interesting questions about the evolution of this gene family (3). Comparative studies, particularly those involving nonvertebrate collagens, may reveal proteins with structural modifications compatible with the basic definition of collagen that serve different functions over a wide range in the evolutionary scale.

Collagen proteins and genes have been isolated from Drosophila melanogaster (4-6) and the nematode Caenorhabditis elegans (7-9). Collagen-like proteins are also known to occur in other invertebrates (10). The sea urchin embryo has provided one of the best systems for the study of early embryogenesis, from the unfertilized egg to the free-swimming/feeding larval stage called the pluteus. The general pattern of embryonic development in echinoderms, to which the sea urchin belongs, parallels in some respects the events in early development of higher deuterostomes, including vertebrates (11). The manner in which gastrulation occurs as well as the origin and disposition of mesoderm are similar for the two groups.
Therefore, from the point of view of both evolution and developmental biology, the presence or absence of the collagen protein family in the sea urchin is of interest.

There is some evidence to suggest that collagen is present in the developing sea urchin embryo. Based on incorporation of labeled amino acids, collagen synthesis is thought to begin late during cleavage and to undergo an increase of several fold during gastrulation and formation of the larval endoskeleton (12). Inhibitors of collagen biosynthesis inhibit development (13). Other studies have implicated collagen-like proteins as constituents of the organic matrix of the skeletal structure (14). Morphological observations also have been consistent with the presence of collagen during urchin embryogenesis. Fibrils, identified as collagen on the basis of characteristic banding periodicity, have been observed in the larval blastocoele (15), and unstriated 50-Å microfibrils, thought to be unassembled collagen, have been observed associated with the basement membrane in Arbacia punctulata (16). Antibodies to types I, III, and IV collagen from vertebrates react with regions of Lytechinus variegatus embryos (17). However, to date, there is no firm biochemical evidence demonstrating the presence of the protein in any species of sea urchin.

MATERIALS AND METHODS

Embryo Cultures. Sea urchins (Strongylocentrotus purpuratus) were purchased from Pacific Biomarine (Venice, CA). Gametes were collected after intracoelomic injection of 0.5 M KCl. Eggs were fertilized and incubated at 16°C in Instant Ocean artificial sea water at a concentration of 10⁷ per liter until the blastula stage. For subsequent stages, the cultures were diluted to a concentration of 2 × 10⁶ per liter, and a mixture of penicillin and streptomycin was added to concentrations of 25 units/ml and 25 μg/ml, respectively. When possible, embryos from a single-pair mating were used to obtain all of the stages of development.
Isolation and Characterization of a λ Phage Recombinant Containing an S. purpuratus Collagen Gene. Size-fractionated, 15- to 23-kbp, partially Mbo I-digested S. purpuratus DNA was ligated into BamHI-digested λJ1 (18). The library was screened (19) by using a nick-translated mouse α1 (type IV) cDNA clone (20). Single positive plaques were obtained after four cycles of screening.

Abbreviations: kbp, kilobase pair(s); kb, kilobase(s); bp, base pair(s).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Restriction enzymes were purchased from Bethesda Research Laboratories or New England Biolabs and were used according to the manufacturers' protocols. M13 vectors, mp8 and mp9, were used to generate several recombinants for


sequencing as well as for use as probes (21). Sequencing was performed by the chain-termination method (22) using α-³⁵S-labeled deoxyadenosine 5'-[α-thio]triphosphate (dATP[α-³⁵S]).

Isolation of RNA and Blot Hybridizations. Total RNA was isolated by using the guanidinium thiocyanate method (23). RNA was fractionated on formaldehyde-agarose gels and transferred to nitrocellulose filters in 20× SPE buffer (1× SPE = 0.18 M NaCl/10 mM sodium phosphate, pH 7.7/1 mM EDTA). The 550-base-pair (bp) HindIII-HincII fragment cloned into M13 mp9 was labeled as a probe to hybridize to the RNA blots. Fifteen micrograms of cloned DNA carrying the mRNA strand was hybridized to 60 ng of a 15-bp primer (Bethesda Research Laboratories), and the probe was prepared by the method of Church and Gilbert (24). The reaction mixture was fractionated on a 6% polyacrylamide urea gel, and primer-extended DNA of 90-130 nucleotides was isolated. Filters were washed at 65°C in 1× SPE/0.1% NaDodSO4.

Electron Microscopy. Heteroduplexes between equal amounts of DNA from the recombinant phage λSUDcol-17 and λJ1 were formed in 50% formamide/0.1 M Tris chloride, pH 8.5/0.01 M EDTA at a total DNA concentration of 5 μg/ml. The mixture was heated to 85°C for 10 min and cooled, and then poly(A)+ RNA from blastula-stage embryos was hybridized to the heteroduplex in 70% formamide/0.5 M NaCl/0.1 M Tris chloride, pH 8.0/5 mM EDTA at 52°C for 16 hr. Samples were spread for electron microscopy as described by Tilghman et al. (25). Double-stranded circular simian virus 40 DNA was included as an internal size standard.

RESULTS

Isolation and Characterization of a S. purpuratus Collagen Gene. The mouse cDNA probe contains no poly(A)+ sequences and is derived solely from the conserved, triple-helical region of the collagen molecule (20). Initial hybridization to the sea urchin genomic library at low stringency led to identification of seven positive plaques; rescreening of these was carried out to yield single positive clones. To test the specificity of hybridization, the genomic library was probed with pCO2A, a plasmid that contains the early histone genes of S. purpuratus (26). Filters were also hybridized to labeled total RNA from 16-cell embryos to test the possibility that hybridization was to G+C-rich regions of the DNA. None of the positive clones obtained with the mouse collagen cDNA probe coincided with positives from either of these controls. After an initial restriction digest of DNA from the positive clones, one, λSUDcol-17 (Sea Urchin DNA collagen), was used for further studies.

λSUDcol-17 DNA was digested with various restriction enzymes, fractionated, and hybridized with the cDNA probe. The size of the inserted fragment determined from EcoRI or HindIII digests is nearly 20 kbp. Three of the EcoRI fragments and all four of the HindIII fragments, with sizes approximately 9, 8, 1, and 0.8 kbp, hybridized to the probe at low stringency. This suggested that perhaps the gene was spread widely within the cloned DNA. The 1-kbp HindIII fragment, which hybridized strongly to the mouse probe at higher stringency, was subcloned into pUC12 to generate pSUDcol17-1. Fig. 1 shows a partial restriction map of this fragment.

Restriction enzymes such as BstNI and Sau96I with recognition specificities of -CC(A/T)GG- and -GGNCC-, respectively, are often used to identify putative coding sequences for collagen. These enzymes cut sequences predicted to be abundant in the triple-helical region but relatively infrequent elsewhere. The Dde I-Hph I region of pSUDcol17-1 has four and eight sites for Sau96I and BstNI, respectively. Sequencing showed this region to contain coding information for collagen. Several fragments were purified from pSUDcol17-1 and sequenced, as indicated in Fig. 1.
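As a rough illustration of the site-counting screen described above, the following sketch counts BstNI (CC(A/T)GG) and Sau96I (GGNCC) recognition sites in a short stretch of sequence. The function name `count_sites` and the choice of input (one 60-nucleotide line of the exon shown in Fig. 1) are assumptions made for this example; it is not the procedure used in the original analysis, which relied on restriction digests.

```python
import re

# Recognition sequences written as regular expressions.
# BstNI recognizes CC(A/T)GG; Sau96I recognizes GGNCC (N = any base).
SITES = {
    "BstNI": "CC[AT]GG",
    "Sau96I": "GG[ACGT]CC",
}


def count_sites(sequence, pattern):
    """Count occurrences of a recognition pattern, allowing overlaps."""
    # A lookahead matches at every start position without consuming bases,
    # so overlapping sites are each counted once.
    return len(re.findall(f"(?={pattern})", sequence.upper()))


# Illustrative input (assumption): one line of the (Gly-Xaa-Yaa)n exon of Fig. 1.
segment = (
    "AAAGGTGCCCAAGGCATCCCAGGAGACGTAGGACAACCTGGTCAACCTGGACCAACCGGA"
)

counts = {enzyme: count_sites(segment, pattern) for enzyme, pattern in SITES.items()}
print(counts)
```

On this 60-nucleotide stretch the sketch finds three BstNI sites and one Sau96I site, in line with the observation that such sites cluster in Gly-Xaa-Yaa coding sequence.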

Sequence Analysis. The sequence of 896 bp of the insert in pSUDcol17-1 was determined (Fig. 1). A stretch of 212 bp codes for the (Gly-Xaa-Yaa)n amino acid repeated sequence characteristic of the helical region of all collagen molecules. There is a single point of discontinuity, indicated in the sequence, which is interpreted as a deletion of a glycine residue from the repeating pattern. Overall, the coding sequence contains information for 70⅔ amino acid residues with 23 Gly-Xaa-Yaa repeats. The G+C content of this region is 57%. In addition to having glycine in every third position, the encoded amino acid sequence has other features characteristic of collagen. Proline constitutes 18% of the amino acids, histidine and cysteine are absent, and very little tyrosine is present. These features are fairly typical of the exons coding for the triple-helical region of interstitial collagens from vertebrates (27). While it is somewhat premature to speculate on the distribution of amino acids in the Xaa and Yaa positions, there are nonetheless similarities between this derived sequence and those of other sequenced collagens. Thus, asparagine and serine are found only in the X position, while lysine is restricted to the Y position, as is also seen in vertebrates (27). Sequencing of other exons should bear more on this subject. The codon usage for glycine is biased towards GGA (57%) and GGT (30%), while proline is evenly distributed between CCA and CCT.
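The glycine periodicity described above can be checked on a small piece of the exon. The sketch below translates one 60-nucleotide line of the coding sequence from Fig. 1 with the standard genetic code and tests whether every third residue is glycine; it also reproduces the 70⅔-residue figure from the 212-bp exon length. The helper name `translate` and the choice of segment are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string (spaces allowed) in frame 1, ignoring a trailing partial codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Illustrative input (assumption): one line of the exon as printed in Fig. 1.
segment = (
    "AAA GGT GCC CAA GGC ATC CCA GGA GAC GTA "
    "GGA CAA CCT GGT CAA CCT GGA CCA ACC GGA"
)

protein = translate(segment)
# In this frame glycine occupies residues 2, 5, 8, ... (indices 1, 4, 7, ...).
glycine_every_third = all(residue == "G" for residue in protein[1::3])

# A 212-bp exon holds 70 whole codons plus two bases of a 71st (70 2/3 residues).
whole_codons, leftover_bases = divmod(212, 3)

print(protein, glycine_every_third, whole_codons, leftover_bases)
```

The translated segment corresponds to the Lys-Gly-Ala-Gln-Gly-... line of the protein sequence shown under the nucleotides in Fig. 1.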

Flanking the 212-bp coding region are nucleotides whose sequences are homologous to known splice junction sequences observed for collagen and other eukaryotic genes (28), as shown below, in which a slash represents the splice junctions:

Donor:     Consensus   (C/A)AG/GT(G/A)AGT
           Found       AAG/GTGAGT
Acceptor:  Consensus   CCCCCTNCAG/GT
           Found       TGGTCTACAG/GA

At the 5' end of the exon, the first nucleotide, guanosine, in the Gly-Xaa-Yaa coding unit participates in the acceptor splice sequence. The intervening sequences on either side of the Gly-Xaa-Yaa repeat have multiple termination codons in phase with the exon reading frame (Fig. 1).
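To make the comparison of the found splice junctions with the consensus sequences concrete, the sketch below scores each pair position by position. It assumes that the two alternative bases printed above the donor consensus belong at positions 1 and 6, giving (C/A)AG/GT(G/A)AGT, written here with the IUPAC codes M and R; the acceptor consensus is taken exactly as printed. The function name `count_matches` is hypothetical.

```python
# IUPAC nucleotide codes used in the consensus strings below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",    # A or C
    "R": "AG",    # A or G
    "N": "ACGT",  # any base
}


def count_matches(consensus, found):
    """Return how many positions of `found` are allowed by `consensus`."""
    if len(consensus) != len(found):
        raise ValueError("sequences must have the same length")
    return sum(base in IUPAC[code] for code, base in zip(consensus, found))


# Splice junction slashes are omitted so that positions line up.
junctions = {
    # Assumption: donor alternatives placed at positions 1 and 6.
    "donor": ("MAGGTRAGT", "AAGGTGAGT"),
    # Acceptor consensus as printed in the text.
    "acceptor": ("CCCCCTNCAGGT", "TGGTCTACAGGA"),
}

scores = {
    name: (count_matches(consensus, found), len(found))
    for name, (consensus, found) in junctions.items()
}
print(scores)
```

Under these assumptions the donor matches at all 9 positions and the acceptor at 7 of 12, with the invariant AG/G of the acceptor and GT of the donor conserved.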

In order to determine the genomic arrangement of this sequence, the 1-kbp insert in pSUDcol17-1 was hybridized to HindIII-digested genomic DNA after Southern blotting. Intense hybridization was detected over a broad size range of fragments as well as to a discrete 1-kbp fragment (data not shown). The nondiscrete nature of the signal suggested the presence of repeated elements in the sequence. Blots were hybridized with probes that contained only intron sequences, Hph I-HincII, and primarily exon sequences, HincII-Hph I. The exon probe labeled a 1-kbp fragment and, with much less intensity, a 2-kbp fragment (pattern identical to that shown in Fig. 2c). This suggests that the coding sequences for collagen are present in a single copy or at most two copies in the total genome. The single-copy nature of the gene is consistent with copy-number determinations of collagen genes from higher organisms (29). We are not sure whether the 2-kbp band represents a closely similar sequence elsewhere in the genome or whether it represents another collagen gene of homologous sequence. The intron fragment hybridized to a wide selection of DNA fragments over a size range of 1-7 kbp. Thus, a repeated element flanks the collagen sequence at one end, a feature also seen in other collagen genes (28, 30). The functional significance of the repeated sequences present in this intron remains to be established.



       AGT TGG CCA AAC TCT GTA TAG GAT TSA CCC TGC ACC AGC TAT GGT GCA AGA GGC AAA CCA
   64  AAC CAG GTA AAC TTT CTA TTG ATT GTG AAG AAA AAA GTA ATT TTC ATA GCG TAC AAA GTT
  124  ACT TAT AAG TAT TTA AAS 6SG GAC TS TTA AAA TGC TGAT TAC TGT TAA ATT GTG ATC
       TGG TCT ACA GGA AAT CAG GGA CTT CCG GGT GAC CAA GGT CCG GAT GGA TAC CAA GGT GAG
                   Gly Asn Gln Gly Leu Pro Gly Asp Gln Gly Pro Asp Gly Tyr Gln Gly Glu
                                                       -------
  244  AAG GGC TCC CAA GGA CAA TCA GGA CCT CCA GGA AAC TCC ATT CCA GGT TCT CCA GGA GAG
       Lys Gly Ser Gln Gly Gln Ser Gly Pro Pro Gly Asn Ser Ile Pro Gly Ser Pro Gly Glu
  304  AAA GGT GCC CAA GGC ATC CCA GGA GAC GTA GGA CAA CCT GGT CAA CCT GGA CCA ACC GGA
       Lys Gly Ala Gln Gly Ile Pro Gly Asp Val Gly Gln Pro Gly Gln Pro Gly Pro Thr Gly
  364  CCA CTG GGT AAC CCT GGA ATC CCT GGA GCT TTC GGC GAG AAG GTG AGT GAC TCC ACA TTG
       Pro Leu Gly Asn Pro Gly Ile Pro Gly Ala Phe Gly Glu Lys
  424  ATA TCC CAC CGC TGC CAC CAT ATA ATG TGS GST TGT TTA AGC TGT ATG ATT AGT ACT GAA
  484  ATS ATG ACA TAT AGA GAArTSA TCA GAG TTA GST TTC AAT CCA GCA AGC CCG OTT ATA GTG
  544  SST ATA GAA GAG TCC CAC ACA CAG TOT ACA TTA CTA TGT GTA TGT GTG TOT AGC ATT GAG
  604  TCA TOT ATA AGA TCA CAT ATT ACA ATT AGC AAA CTG TAT E CTG GAT GST T TTG TCT
  664  AGA TTT AAC TCT TTC AGT CSA CTA CGA TOT GAT AGC AAG TGT TTT T TAT TCA TCA ACA
  724  TGT ATA CAC AAA TCA TAC CTA TCA TTG CTC r CAT TTT GAT TCT TAT AAT TAT CCA CCA
  784  TAN TCC CAT AAC TTC AAA GCT CAC ASG ACA A6T TGC CTT r AAC TCC TTC CAC ACT AA
  944  ACA AAA ATA ACT CTT GSA SAT T6C OCT TTT GST TOT ACT GCT CCA AA6 CTT SCA TGC
                                                                         ↑ HindIII

FIG. 1. (Upper) Partial restriction map and strategy used to establish the DNA sequence of pSUDcol17-1. (Lower) Partial sequence of pSUDcol17-1. The sequence from the Hae III site near one end to the HindIII site at the other is given. The protein sequence corresponding to the Gly-Xaa-Yaa-encoding frame is indicated below the nucleotide sequence. Splice junctions are shown as straight lines above the nucleotide sequence. The dotted line above the DNA sequence indicates the point of discontinuity in the (Gly-Xaa-Yaa)n repeat. Boxed areas represent termination codons.

Developmental Expression. RNA was isolated from unfertilized eggs and from 16-cell, morula, blastula, gastrula, and pluteus embryos, fractionated, and transferred to nitrocellulose. An M13 mp9 clone containing the mRNA strand of the 550-bp HindIII-HincII fragment was primer extended with [α-³²P]dCTP as described. The first 150 bp of this fragment is exclusively the coding region for Gly-Xaa-Yaa. Thus, the probe was enriched in these sequences. With this probe, a discrete band of approximately 9 kilobases (kb) was detected in blot hybridizations, appearing initially at the morula stage, increasing sharply in abundance at the blastula stage, and becoming progressively less abundant at the gastrula and pluteus stages (Fig. 2a). There is a 2-fold difference in the intensity of hybridization between blastula and gastrula (Fig. 2a). No signal could be seen under these conditions either in the unfertilized egg or at the 16-cell stage.

To have a control for the integrity of the RNA at the earlier stages of development, the same filter was hybridized to pCO2A. The expected pattern of the early histone gene expression was observed (Fig. 2b). These genes reach a maximum level of expression at the morula stage and then are turned off. To ensure that the collagen exon probe had the same specificity as that used above, a genomic blot was hybridized to the probe. The same two bands were detected that previously had reacted with the coding region of the sea urchin collagen gene (Fig. 2c). At low stringency of washing, the sea urchin collagen probe also hybridized to RNA from a mouse teratocarcinoma cell line, which strongly expresses type IV collagen and whose mRNA has been determined to be ~7.4 kb on our gels. The sea urchin collagen mRNA was larger than this mouse type IV collagen transcript. These data indicate that the 9-kb transcript is a collagen message.


Electron Microscopy of R Loops. Marked differences exist between the organization of collagen genes from vertebrates, which have multiple small exons and large intervening sequences, and from other invertebrates, which have large exons with one or two short introns (7, 14, 28, 31). Therefore, it was of interest to determine the organization of a collagen gene in the genome of an invertebrate that lies on the evolutionary path to the vertebrates. Fig. 3 presents the results of an analysis of the structure of this S. purpuratus gene. The structure reveals the presence of multiple (at least 15) short 200- to 400-bp exons interrupted by intervening sequences that range in size from 400 bp to 1300 bp. Although the introns appear to be shorter than those observed for vertebrate collagen genes, the overall pattern of multiple introns is far more reminiscent of vertebrate than the other characterized invertebrate collagen genes.

DISCUSSION

We have described the isolation of a genomic clone from S. purpuratus, which was selected by hybridization to a murine type IV collagen cDNA probe, and characterized a segment within it. The subclone is a single-copy sequence and contains a stretch of 212 bp that codes for 23 Gly-Xaa-Yaa repeats, is rich in proline, is flanked by appropriate splice junctions, and is transcribed into a large RNA. Taken together, these data suggest that the sequence described here is part of a collagen gene. A bona fide collagen gene also must code for a protein that functions in the extracellular matrix (32); we have not yet tested this criterion for the sea urchin gene.

FIG. 2. Developmental expression of the sea urchin collagen gene. Twenty micrograms of total RNA isolated from unfertilized eggs (lanes E), 16-cell stage (lanes 16), morula (lanes M), blastula (lanes B), gastrula (lanes G), and pluteus (lanes P) were electrophoretically separated and transferred onto a nitrocellulose filter. (a) Hybridization with the primer-extended HindIII-HincII fragment. (b) The same filter as in a hybridized to pCO2A, a histone gene-containing plasmid. (c) HindIII-digested S. purpuratus genomic DNA hybridized to the same probe as in a. Except for the 28S and 18S fragments, sizes are shown in kb.

Several features of the S. purpuratus gene are similar to vertebrate collagen genes, particularly to the basement membrane type IV collagen gene. The triple-helical coding exons for types I, II, and III interstitial collagen genes have a (Gly-Xaa-Yaa)n-encoding sequence with no discontinuities. Most of the exons in these genes range in size from 45 bp to 162 bp (multiples of 9) and are related to a basic 54-bp unit, leading to the suggestion that a primordial 18-amino acid-encoding sequence has been duplicated and varied to generate the vertebrate collagen gene (8, 17). The Gly-Xaa-Yaa-encoding exons of the interstitial collagen genes always start with an intact glycine codon and end with an intact Yaa codon. In contrast, exons of type IV collagen genes from mouse (20, 33), like the sea urchin collagen exon described here, contain interruptions of the Gly-Xaa-Yaa repeat sequence, which allow the protein to form an amorphous network-like structure commonly found in basement membranes (33, 34). The 64-, 123-, and 182-bp exon sizes reported for the mouse α2 (IV) collagen gene (35), like the 212-bp exon size of the sea urchin gene, do not conform to the 54-bp basic coding unit. Three of the four exons in the murine type IV collagen gene start with a two-thirds-intact glycine codon; the first guanosine in these exons participates in the splice sequence, a feature shared with the sea urchin collagen exon. It is of interest to note that the structure of type IX collagen genes shares some features with mouse and sea urchin type IV collagen genes. Here too, noncollagenous domains interrupt triple-helical coding sequences (36, 37). The sizes of exons, which vary from 33 bp to 1100 bp, are different from the 54-bp exon unit of the types I-III collagen genes (37). Five of 11 exons of α1 (type IX) and α2 (type IX) start with a two-thirds-intact glycine codon (37). Based on these features, it has been proposed that the type IX collagen genes belong to a class distinct from the fibrillar collagen genes types I-III (37). Further characterization of the sea urchin collagen gene as well as vertebrate type IV genes is necessary to see whether these fall within the same class.

FIG. 3. Electron microscopy of the hybrid between λSUDcol-17 DNA and blastula mRNA. A heteroduplex of λSUDcol-17 DNA with λJ1 was first formed, and then poly(A)+ RNA was hybridized to this heteroduplex. The micrograph and an interpretation are shown.

Although the lack of a 54-bp basic coding unit and interruptions of the Gly-Xaa-Yaa repeat are features that the sea urchin collagen gene shares with those of D. melanogaster (18) and C. elegans (7, 8), genes from these three invertebrates differ considerably in structure. The D. melanogaster gene, which is also considered to be of the nonfibrillar type, has a single short intron flanked by two Gly-Xaa-Yaa-encoding sequences of 662 and 726 nucleotides (4). The C. elegans collagen genes, which encode cuticle collagens, are 1-1.5 kb and contain one or two small introns outside the helical coding region (7, 8). As yet, no evidence has been obtained in these protostomial invertebrates for the multiple-intron and -exon structure spread over a considerable stretch of DNA that is seen for the sea urchin and vertebrate collagen genes.

The structure of the sea urchin gene suggests that the multiple-exon approach was a very early decision in the vertebrate line of evolution. Interestingly, sea urchin actin genes contain intervening sequences at exactly the same sites as in mammalian and avian actin genes, while introns are located at different positions in the Drosophila actin gene (38), another indication of the similarities that exist between vertebrate and invertebrate deuterostomes and extend even into gene structure.

The mRNA encoded by the sea urchin collagen gene is developmentally regulated in its expression. If the coding sequence represents a type IV gene, then its developmental expression coincides with the synthesis of basal lamina, at the blastula stage. A similar pattern of expression is seen in early mouse embryogenesis, where the formation of basement membranes coincides with synthesis of type IV collagen (39). The absence of the collagen mRNA in sea urchin eggs is interesting in view of the fact that antibodies to vertebrate type I, III, and IV collagens stain L. variegatus eggs uniformly. Subsequently, the antigens are localized to the lining of the blastocoele wall. At the mesenchyme blastula stage, the basal lamina stains intensely for these antigens (17). It is possible that collagen proteins exist as a small stored maternal pool in the egg, and, at the blastula stage, the same or another gene is activated to produce the large amounts of protein required at the time of deposition of the basal lamina; this situation is not dissimilar to that known for the histone genes of sea urchins (26). The decrease in the abundance of the mRNA at the gastrula and pluteus larval stages may reflect dilution of the transcript because of an increase in the number of cells that do not synthesize collagen. Hybridization in situ may help to determine if cell type-specific variations in collagen synthesis exist during sea urchin development and account for these quantitative changes in whole-embryo RNA abundance.
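The exon-size comparison made earlier in this Discussion comes down to simple arithmetic, which the sketch below spells out. It tests whether an exon length is a whole number of 9-bp Gly-Xaa-Yaa units, as in the interstitial collagen exons related to the 54-bp unit, using only sizes quoted in the text. The helper name `classify_exon` and the grouping labels are hypothetical.

```python
TRIPLET_BP = 9      # one Gly-Xaa-Yaa repeat = 3 codons = 9 bp
BASIC_UNIT_BP = 54  # proposed primordial unit (18 amino acids)


def classify_exon(size_bp):
    """Split an exon length into whole Gly-Xaa-Yaa repeats and leftover bases."""
    whole_repeats, leftover_bp = divmod(size_bp, TRIPLET_BP)
    return {
        "size_bp": size_bp,
        "whole_repeats": whole_repeats,
        "leftover_bp": leftover_bp,
        "multiple_of_9": leftover_bp == 0,
    }


# Exon sizes quoted in the text.
exons = [
    ("interstitial, smallest", 45),
    ("interstitial, basic unit", BASIC_UNIT_BP),
    ("interstitial, largest", 162),
    ("mouse alpha2(IV)", 64),
    ("mouse alpha2(IV)", 123),
    ("mouse alpha2(IV)", 182),
    ("sea urchin", 212),
]

for label, size in exons:
    print(label, classify_exon(size))
```

The 212-bp sea urchin exon gives 23 whole repeats with 5 bp left over, consistent with the 23 Gly-Xaa-Yaa repeats reported and with an exon that does not fit the 9-bp periodicity.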

Collagens are thought to influence morphogenesis and development through their roles in cell adhesion, cell migration, and tissue differentiation. Sea urchins provide a convenient system for a study of these processes in early embryogenesis. Cloned collagen genes such as that described here will provide sensitive molecular probes for studies of the role of these ubiquitous proteins in development and differentiation.

We are indebted to Dr. Michael Clarke for his expert advice, Dr. Mike Cashel for advice on M13 sequencing, and our colleagues in the laboratory for helpful discussions. We thank Margery Sullivan for the R-loop analysis. We also acknowledge Ms. Bonnie Richards for preparation of the manuscript.

1. Bornstein, P. & Sage, H. (1980) Annu. Rev. Biochem. 49, 957-1003.

2. Miller, E. J. & Gray, S. (1982) Methods Enzymol. 82A, 3-32.

3. Yamada, Y., Avedimento, E. V., Mudryj, M., Ohkubo, H., Vogeli, G., Irani, M., Pastan, I. & Crombrugghe, B. (1980) Cell 22, 887-892.

4. Monson, J. M., Natzle, J., Friedman, J. & McCarthy, B. J. (1982) Proc. Natl. Acad. Sci. USA 79, 1761-1765.

5. Natzle, J. E., Monson, J. M. & McCarthy, B. J. (1982) Nature (London) 296, 368-371.

6. Lunstrum, G. & Fessler, J. H. (1980) J. Supramol. Struct. 4, 183-190.

7. Kramer, J. M., Cox, G. N. & Hirsh, D. (1982) Cell 30, 599-606.

8. Kramer, J. M., Cox, G. N. & Hirsh, D. (1985) J. Biol. Chem. 260, 1945-1951.

9. Cox, G. N. & Hirsh, D. (1985) Mol. Cell. Biol. 5, 363-372.

10. Adams, E. (1978) Science 202, 591-598.

11. Davidson, E., Hough-Evans, B. R. & Britten, R. J. (1982) Science 217, 1-26.

12. Golob, R., Chetsanga, C. J. & Doty, P. (1974) Biochim. Biophys. Acta 349, 135-141.

13. Mintz, G. R., DeFrancesco, S. & Lennarz, W. J. (1981) J. Biol. Chem. 256, 13105-13111.

14. Pucci-Minafra, I., Fanara, M. & Minafra, S. (1980) J. Submicrosc. Cytol. 12, 267-273.

15. Crise-Benson, N. & Benson, S. C. (1979) Wilhelm Roux's Arch. Dev. Biol. 186, 65-70.

16. Gibbins, J., Tilney, L. & Porter, K. (1969) J. Cell Biol. 41, 201-226.

17. Wessel, G. M., Marchase, R. B. & McClay, D. (1984) Dev. Biol. 103, 235-245.

18. Loenen, W. A. & Bormann, W. J. (1980) Gene 20, 249-254.

19. Woo, S. L. C. (1979) Methods Enzymol. 68, 389-395.

20. Nath, P., Laurent, M., Horn, E., Sobel, M. E., Zon, G. & Vogeli, G. (1986) Gene, in press.

21. Messing, J. (1983) Methods Enzymol. 101, 20-77.

22. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.

23. Cathala, G., Savonnet, J. F., Mendez, B., West, B. L., Kann, M., Martial, I. F. & Baxter, J. D. (1983) DNA 2, 329-335.

24. Church, G. M. & Gilbert, W. (1984) Proc. Natl. Acad. Sci. USA 81, 1991-1995.

25. Tilghman, S. M., Custis, P., Tiemeier, D. C. & Leder, P. (1978) Proc. Natl. Acad. Sci. USA 75, 1309-1313.

26. Kedes, L. H., Cohn, R. H., Lowry, J. C., Chang, A. C. Y. & Cohen, S. N. (1975) Cell 6, 359-370.

27. Hoffmann, H., Fietzek, P. P. & Kuhn, K. (1980) J. Mol. Biol. 141, 293-314.

28. Wozney, J., Hanahan, D., Tate, V., Boedtker, H. & Doty, P. (1981) Nature (London) 294, 129-135.

29. Dalgleish, R., Trapnell, B. C. & Crystal, R. G. (1982) J. Biol. Chem. 257, 293-314.

30. Monson, J. M. & McCarthy, B. J. (1981) DNA 1, 59-69.

31. Yamada, Y., Mudryj, M., Sullivan, M. & deCrombrugghe, B. (1983) J. Biol. Chem. 258, 2758-2761.

32. Ninomiya, Y., Showalter, A. M. & Olsen, B. R. (1984) in The Role of Extracellular Matrix in Development, ed. Trelstad, R. L. (Liss, New York), pp. 255-275.

33. Kefalides, N. A., Alper, R. & Clark, C. C. (1979) Int. Rev. Cytol. 61, 167-228.

34. Hay, E. D. (1981) in Cell Biology of Extracellular Matrix, ed. Hay, E. D. (Plenum, New York), pp. 379-409.

35. Kurkinen, M., Bernard, M. P., Barlow, D. P. & Chow, L. T. (1985) Nature (London) 317, 177-179.

36. van der Rest, M., Mayne, R., Ninomiya, Y., Seidah, N. G., Chreiten, M. & Olsen, B. R. (1985) J. Biol. Chem. 260, 220-225.

37. Lozano, G., Ninomiya, Y., Thompson, H. & Olsen, B. R. (1985) Proc. Natl. Acad. Sci. USA 82, 4050-4054.

38. Davidson, E. H., Thomas, T. L., Scheller, R. H. & Britten, R. J. (1982) in Genome Evolution, eds. Dover, G. & Flavell, R. B. (Academic, New York), pp. 177-192.

39. Adamson, E. D. & Ayers, S. E. (1979) Cell 16, 953-965.
