
THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1988 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 263, No. 23, Issue of August 15, pp. 11111-11116, 1988. Printed in U.S.A.

Characterization of the Mouse SPARC/Osteonectin Gene INTRON/EXON ORGANIZATION AND AN UNUSUAL PROMOTER REGION*

(Received for publication, January 26, 1988)

John H. McVey‡, Shintaro Nomura, Paul Kelly, Ivor J. Mason§, and Brigid L. M. Hogan¶
From the Laboratory of Molecular Embryology, National Institute for Medical Research, Mill Hill, London NW7 1AA, United Kingdom

Two overlapping cosmids have been isolated that together contain the entire murine gene for SPARC (osteonectin), a Ca2+-binding, phosphorylated glycoprotein associated with extracellular matrix synthesis and remodeling. The gene contains 10 exons and covers 26.5 kilobase pairs of DNA. Exon analysis shows that the two N-terminal glutamic acid-rich sequences, which are predicted to undergo a conformational change upon binding calcium, and the C-terminal EF-hand Ca2+-binding domain are each encoded by a single exon. Comparative analysis of the exon sequences does not support the idea that the SPARC gene evolved by shuffling of exons from other Ca2+-binding proteins. The 5’ flanking region of the SPARC gene, which promotes transcription when placed in front of the bacterial chloramphenicol acetyltransferase gene, contains neither “TATA” nor “CAAT” box sequences. However, unlike most other genes lacking these motifs, the SPARC gene has only one major and one minor transcription start site, as shown by mapping its 5’ end with RNase protection and primer extension analysis. The upstream region to -120 includes six repeats of the sequence GGAGG, two repeats of the sequence 5’ GGAGG A/C GGAGGG 3‘, and a potential binding site for transcription factor AP-2.

SPARC, a secreted, phosphorylated, Ca2+-binding glycoprotein of Mr = 43,000, was first identified and cloned as a major product of parietal endoderm cells of the mouse embryo (Mason et al., 1986a). These cells are specialized for the synthesis and remodeling of a thick basement membrane known as Reichert’s membrane. Northern analysis and in situ hybridization have shown that SPARC is also differentially expressed in a wide variety of adult and embryonic tissues (Holland et al., 1987; Mason et al., 1986b; Nomura et al., 1988). For example, high levels of SPARC RNA are found in developing bone, odontoblasts, deciduum, connective tissue sheaths of whisker follicles, the zona fasciculata of the adrenal gland, interstitial cells of the testis, and thecal and corpus luteal cells of the ovary. In addition, the steady-state level of SPARC RNA increases 20-fold when F9 teratocarcinoma stem cells are induced to differentiate into parietal endoderm by treatment with retinoic acid and dibutyryl cAMP. This induction is accompanied by an increase in SPARC gene transcription (Mason et al., 1986b).

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) 503913.

‡ Present address: Haemostasis Research Group, Clinical Research Centre, Harrow, Middlesex HA1 3UJ, UK.

§ Present address: MRC Developmental Neurobiology Programme, University College London, London WC1E 6BT, UK.

¶ To whom correspondence and reprint requests should be addressed: Dept. of Cell Biology, Vanderbilt University Medical School, Nashville, TN 37232.

Work from a number of laboratories has shown that the amino acid sequence of SPARC is highly conserved between species, and that several proteins are identical with SPARC. These include osteonectin, a protein originally isolated from bone and mineralized tissue (Termine et al., 1981; Young et al., 1986; Bolander et al., 1988); BM-40, synthesized at high levels by the transplantable Engelbreth-Holm-Swarm mouse tumor, which secretes an abundant extracellular matrix (Mann et al., 1987); and the Mr = 43,000 glycoprotein produced by bovine aortic endothelial cells in culture (Mason et al., 1986a). Because of the observed pattern of SPARC/osteonectin expression in embryonic and adult tissues, we have suggested that the protein plays a role in some calcium-dependent process involved in extracellular matrix assembly, turnover, or remodeling (Mason et al., 1986b; Engel et al., 1987; Holland et al., 1987).

Analysis of the predicted amino acid sequence of SPARC and physicochemical studies on purified BM-40 protein have localized four structural domains that may be functionally distinct (Engel et al., 1987). The first is a glutamic acid-rich N-terminal domain, analogous to the Ca2+-binding Gla domain of the vitamin K-dependent clotting factors. The second is a cysteine-rich region that shows limited sequence similarity to the third domain of the turkey ovomucoid Pst-I-type protease inhibitor. Domain III is predominantly α-helical, while domain IV contains a single Ca2+-binding motif known as an EF-hand. This motif consists of a 12-amino acid loop flanked by two α-helices. It was first described in a family of partially homologous, high-affinity Ca2+-binding cytoplasmic proteins, including parvalbumin, calmodulin, intestinal Ca2+-binding protein, and troponin C (Kretsinger, 1980). Isolating the SPARC gene therefore makes it possible to study the regulation of its expression. It also allows us to test whether the different domains of the protein arose during evolution by shuffling of exons from other Ca2+-binding proteins, in the manner proposed by Gilbert (1978).
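The 12-residue EF-hand loop described above can be searched for with a sequence pattern. This is a minimal sketch, not a method used in this paper. It assumes the PROSITE EF-hand calcium-binding pattern (PS00018), as recalled here, and applies it to the 22-residue SPARC segment that the Results place at Phe-251 to Phe-272. That segment, FFETCDLDNDKYIALEEWAGCF, was read from the amino acid translation in Fig. 1B. The function name `find_ef_hand_loops` is hypothetical.

```python
import re

# Assumption: PROSITE PS00018 (EF-hand Ca2+-binding loop), recalled as
#   D-{W}-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]
# transcribed position by position: {..} = "any residue except", [..] = "one of",
# x = any residue.
EF_HAND_LOOP = re.compile(
    r"D[^W][DNS][^ILVFYW][DENSTG][DNQGHRK][^GP][LIVMC][DENQSTAGC]..[DE]"
)


def find_ef_hand_loops(seq: str, first_residue: int = 1) -> list[tuple[int, int, str]]:
    """Return (start, end, loop) for each non-overlapping 12-residue loop match,
    numbering residues from first_residue."""
    hits = []
    for m in EF_HAND_LOOP.finditer(seq.upper()):
        hits.append((first_residue + m.start(), first_residue + m.end() - 1, m.group()))
    return hits


# Mouse SPARC Phe-251..Phe-272, as read from the translation in Fig. 1B.
sparc_segment = "FFETCDLDNDKYIALEEWAGCF"
print(find_ef_hand_loops(sparc_segment, first_residue=251))
# -> [(256, 267, 'DLDNDKYIALEE')]

# Calmodulin EF-hand loop I, used only as a positive control.
print(find_ef_hand_loops("DKDGDGTITTKE"))
# -> [(1, 12, 'DKDGDGTITTKE')]
```

A regular expression is enough here because the PROSITE pattern has a fixed length of 12 residues. It only flags the loop and does not check that the flanking α-helices are present.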

MATERIALS AND METHODS

Isolation of Recombinant Cosmids-Approximately 6 × 10⁶ cosmid colonies from a 129/Sv mouse liver DNA genomic library in pcos2EMBL (Poustka et al., 1984) (kindly supplied by Dr. Anne-Marie Frischauf, Imperial Cancer Research Fund, London) were screened by standard methods (Maniatis et al., 1982). Inserts were subcloned into Gemini plasmid vectors (Promega Biotec) for restriction mapping and for nucleotide sequence analysis of both strands, using either chain termination (Sanger et al., 1977) or chemical cleavage (Maxam and Gilbert, 1980).

Filter Hybridization and RNase Mapping-Radiolabeled RNA probes for Southern analysis were synthesized as recommended by Promega Biotec, using [α-32P]UTP (approximately 600 Ci/mmol; Du Pont-New England Nuclear) and either SP6 or T7 RNA polymerase. Hybridization and washing were as previously described (Krumlauf et al., 1987). For RNase protection experiments, antisense RNA was uniformly labeled with [α-32P]UTP using SP6 polymerase. Probe (1 × 10⁵ dpm; 220 pg) was hybridized with 6 µg of total RNA at 55 °C for 16 h. The hybrids were then digested with a mixture of RNase A and RNase T1 at 33 °C for 75 min. All conditions were as recommended by Promega Biotec. Protected fragments were extracted with phenol/chloroform/isoamyl alcohol and ethanol-precipitated. After denaturation, they were analyzed on an 8% acrylamide-urea gel, followed by autoradiography.
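As a worked check of the RNase protection input, the probe's specific activity and activity in microcuries work out roughly as below. This assumes that the 220 pg in parentheses is the mass of probe carrying 1 × 10⁵ dpm, and uses the standard conversion of 2.22 × 10⁶ dpm per µCi.

```latex
\frac{1\times10^{5}\ \text{dpm}}{220\ \text{pg}} \approx 4.5\times10^{2}\ \text{dpm/pg} = 4.5\times10^{8}\ \text{dpm}/\mu\text{g},
\qquad
\frac{1\times10^{5}\ \text{dpm}}{2.22\times10^{6}\ \text{dpm}/\mu\text{Ci}} \approx 0.045\ \mu\text{Ci}
```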

RNA Isolation-RNA was isolated from tissues of adult and embryonic CBA or C3H/He mice and from F9 cells, using a modification of the 3 M LiCl, 6 M urea method of Auffray and Rougeon (1980) as described (Krumlauf et al., 1987). Mouse F9 teratocarcinoma stem cells were cultured in Dulbecco’s modified Eagle’s medium supplemented with 10% w/v fetal bovine serum. Differentiated cells were harvested 5 days after the addition of 5 × 10⁻… M retinoic acid, 10⁻… M dibutyryl cAMP, and … M isobutylmethylxanthine, as described (Hogan et al., 1986).

Primer Extension-Primer extension was performed as described (Williams and Mason, 1985) using a 22-nucleotide strand-separated primer (HpaII-DdeI fragment, nucleotides 127-148; Mason et al., 1986a). Briefly, 17.5 pg of primer, 5' end-labeled with 32P using T4 polynucleotide kinase, was hybridized to 1 µg of poly(A)+ RNA at 70 °C overnight in a sealed glass capillary. The hybridization products were expelled into extension buffer (Williams and Mason, 1985) and extended from the primer with reverse transcriptase at 42 °C for 1 h. The reaction products were denatured and analyzed on an 8% urea/acrylamide sequencing gel, followed by autoradiography (Maniatis et al., 1982). To sequence the extension products, the reaction was scaled up using 40 µg of poly(A)+ RNA from differentiated F9 cells, and the products were isolated from the 8% sequencing gel by electroelution. The 44-nucleotide product was sequenced using all four chemical cleavage reactions of Maxam and Gilbert (1980). However, the 148-nucleotide extension product carried enough radioactivity only for the purine- and pyrimidine-specific sequencing reactions.

Plasmid Construction-The plasmid pUC-cat (kindly provided by Dr. Roger Watson, St. Bartholomew’s Hospital, London) was derived by inserting the 1.6-kb¹ HindIII-BamHI fragment of pSV2-cat (Gorman et al., 1982) into the XbaI-BamHI site of pUC18. pSP/639-cat was derived by inserting the NcoI-PstI fragment of the SPARC gene upstream region (-639 to +11) into the HindIII-PstI site of pUC-cat in the sense orientation.

Cell Transfections and Assay of Chloramphenicol Acetyltransferase Activity-On the day before transfection, mouse PYS parietal endoderm cells and LD1 fibroblasts (which secrete SPARC protein; data not shown) were seeded at 0.5 × 10⁶ and 1 × 10⁶ cells per 90-mm dish, respectively. Transfection was carried out essentially as described (Gorman et al., 1982), using 15 µg of supercoiled DNA per plate. The one change was that cells were cultured with the CaPO4/DNA co-precipitate for 10 h before a 3-min glycerol shock (15% glycerol in HEPES-buffered saline: 140 mM NaCl, 0.75 mM Na2HPO4, 25 mM HEPES, pH 7.1). Cells were harvested 48 h later and assayed for chloramphenicol acetyltransferase activity as described (Gorman et al., 1982). The exception was that the reaction was carried out at 37 °C for 1 h in a final volume of 150 µl containing 50 µl of cell extract, 0.2 µCi of [14C]chloramphenicol (Amersham International), and 20 µl of 4 mM acetyl-CoA.
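As a worked check of the chloramphenicol acetyltransferase reaction above, the final acetyl-CoA concentration and the dilution of the cell extract are shown below. This assumes that the 20 µl of 4 mM acetyl-CoA and the 50 µl of extract are both counted within the 150-µl final volume.

```latex
[\text{acetyl-CoA}]_{\text{final}} = 4\ \text{mM}\times\frac{20\ \mu\text{l}}{150\ \mu\text{l}} \approx 0.53\ \text{mM},
\qquad
\text{extract fraction} = \frac{50\ \mu\text{l}}{150\ \mu\text{l}} = \frac{1}{3}
```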

RESULTS

Isolation and Analysis of SPARC Genomic Clones-A single cosmid, p30g3, was found to contain most of the SPARC coding region but not the first exon, which was located in a second, overlapping cosmid, pcosSP13 (Fig. 1A). When probed with full-length SPARC cDNA, mouse genomic DNA digested with BamHI gave the same hybridization pattern as these two cosmids (data not shown).

¹ The abbreviations used are: kb, kilobase pairs; bp, base pairs; HEPES, 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid.

Structural Analysis of the SPARC Gene-The murine SPARC gene spans 26.5 kb of DNA and contains 10 exons (Fig. 1, A and B). The 5’ donor and 3’ acceptor splice sites of all the introns conform to the consensus sequences for exon-intron boundaries (Shapiro and Senapathy, 1987). Exon 1 is 74 bp long. It specifies only a 5' untranslated segment of SPARC mRNA and is followed by an intron of about 10 kb. Exon 2, 70 nucleotides long, specifies the remaining 13 untranslated nucleotides and the first 19 amino acids of the protein, 17 of which form the signal peptide (Mason et al., 1986a). Exons 3 and 4 encode amino acid residues 3-23 and 24-51, respectively. These residues make up the two segments of the N-terminal glutamic acid-rich domain I. Both segments are thought to change conformation from random coil to α-helix upon binding Ca2+ (Engel et al., 1987). An intron of approximately 1 kb divides the sequence of this domain into two halves, at the position of a predicted β-turn in the protein structure.
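The exon 2 figures can be checked against each other:

- 13 untranslated nucleotides plus 19 codons account exactly for the 70-nucleotide exon. This assumes, as the numbers imply, that the exon 2/intron boundary falls between codons.
- Because 17 of the 19 residues are signal peptide, exon 2 must end with mature residues 1 and 2. This is consistent with exon 3 beginning at residue 3.

```latex
\underbrace{13}_{\text{untranslated nt}} + 3\times\underbrace{19}_{\text{codons}} = 13 + 57 = 70\ \text{nt},
\qquad
19 = \underbrace{17}_{\text{signal peptide}} + \underbrace{2}_{\text{mature residues 1 and 2}}
```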

Exons 5 and 6 encode amino acid residues 52-92 and 93-132, which make up the cysteine-rich domain II (Engel et al., 1987). Exons 7 and 8 encode residues 133-177 and 178-227, covering the third structural domain of the protein, which is predicted to be mainly α-helical. The single EF-hand Ca2+-binding domain, extending from Phe-251 to Phe-272, lies entirely within exon 9 (residues 228-276). Finally, exon 10 encodes the last nine amino acids of SPARC and the entire 3’ untranslated region of the mRNA.
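The exon-to-residue assignments above can be encoded as a small lookup table. The table makes it easy to check which exons encode a given domain. This is a minimal sketch, and the function names are hypothetical. It assumes two conventions: signal-peptide residues are numbered -17 to -1 with no residue 0, and exon 10 covers residues 277-285. The second point is an inference from "the last nine amino acids" following residue 276, not a figure stated in the text.

```python
# Mature-protein residue ranges per exon, as given in the text.
# Assumed convention: signal-peptide residues are -17..-1 (no residue 0).
# Residues whose codons straddle an intron are assigned as in the text.
# Exon 1 is entirely 5' untranslated and is omitted.
EXON_RESIDUES: dict[int, tuple[int, int]] = {
    2: (-17, 2),
    3: (3, 23),
    4: (24, 51),
    5: (52, 92),
    6: (93, 132),
    7: (133, 177),
    8: (178, 227),
    9: (228, 276),
    10: (277, 285),  # inferred: nine residues after 276
}


def exon_for_residue(residue: int) -> int:
    """Return the exon that the table assigns to a residue."""
    if residue == 0:
        raise ValueError("there is no residue 0 in this numbering")
    for exon, (first, last) in EXON_RESIDUES.items():
        if first <= residue <= last:
            return exon
    raise ValueError(f"residue {residue} is outside the mapped range")


def exons_spanning(first: int, last: int) -> list[int]:
    """Sorted exons encoding any residue in first..last (inclusive)."""
    return sorted({exon_for_residue(r) for r in range(first, last + 1) if r != 0})


print(exons_spanning(251, 272))  # EF-hand, Phe-251..Phe-272 -> [9]
print(exons_spanning(3, 51))     # glutamic acid-rich domain I -> [3, 4]
print(exons_spanning(52, 132))   # cysteine-rich domain II -> [5, 6]
```

The same lookup shows at a glance that domain I is split across exons 3 and 4, whereas the EF-hand is confined to exon 9.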

Mapping the 5’ End of the SPARC Gene-“TATA” and “CAAT” boxes were not found in the 639 bp upstream of the putative first exon (Fig. 1B and data not shown). Many other characterized eukaryotic genes have these two sequence elements approximately 30 and 80 bp upstream of the RNA initiation site, respectively (Breathnach and Chambon, 1981). It was therefore important to establish that this was indeed the 5‘ end of the SPARC gene. Two methods were used.

First, a 747-nucleotide single-stranded RNA probe spanning the first exon (Probe 1, Fig. 2A) was used in an RNase protection assay. The RNA was isolated from embryonic and adult mouse tissues and from differentiated and undifferentiated F9 teratocarcinoma cells. In all tissues expressing SPARC RNA, a major fragment of about 75 bp was protected from RNase digestion (Fig. 2B). The size of this fragment is consistent with the 5’ end of the RNA determined by cDNA cloning (Mason et al., 1986a). A minor fragment of about 90 bp was also protected. Identical results were obtained with a 1500-nucleotide probe that extended further 5’ (Fig. 2A, BamHI/HindIII fragment, Probe 2; data not shown).

Second, primer extension with a single-stranded primer (see “Materials and Methods”) was used to analyze poly(A)+ RNA from three sources: mouse embryo parietal endoderm, differentiated F9 cells, and the parietal endoderm cell line F9AcC19. In all cases, two major extension products (44 and 148 nucleotides long) and a minor product (180 nucleotides long) were detected (Fig. 2C and data not shown). The nucleotide sequences of both major products were consistent with the sequence of murine SPARC cDNA (Mason et al., 1986a). We assume that the 44-nucleotide product terminated prematurely because of RNA secondary structure. These results are consistent with one major transcription start site (Fig. 1B) and a minor start site at approximately -20 nucleotides.
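The arithmetic that links extension-product length to a 5' end can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure. First, the primer spans cDNA nucleotides 127-148, so its labeled 5' end lies opposite nucleotide 148. Second, cDNA nucleotide 1 coincides with the major start site (+1), which the 148-nucleotide product suggests. Third, upstream positions are numbered -1, -2, and so on, with no position 0. The name `extension_end` is hypothetical.

```python
# Assumptions: primer covers cDNA nt 127-148 (labeled 5' end opposite nt 148);
# cDNA nt 1 = major start site (+1); no position 0 upstream of +1.
PRIMER_5P_END = 148


def extension_end(product_len: int, primer_5p: int = PRIMER_5P_END) -> int:
    """Template position reached by an extension product of product_len nt."""
    pos = primer_5p - product_len + 1
    return pos if pos >= 1 else pos - 1  # skip the nonexistent position 0


for length in (148, 44, 180):
    print(length, extension_end(length))
# 148 1    (major product: the +1 start site)
# 44 105   (premature stop, attributed to RNA secondary structure)
# 180 -32  (minor product, about 32 nt upstream of +1)

# RNase protection gives a second estimate: the ~90-bp minor fragment
# extends roughly 90 - 75 = 15 nt beyond the ~75-bp major one.
print(90 - 75)
```

Under these assumptions, the two assays place the minor start roughly 15 and 32 nt upstream of +1. These two estimates bracket the "approximately -20" position given above. Fragment sizes read from gels support only an approximate value.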

Promoter Activity of the 5’ Region-To confirm that the 5’ end of the murine SPARC gene has promoter activity, a -639 to +11 fragment was placed upstream of the bacterial chloramphenicol acetyltransferase gene (pSP/639-cat, see “Materials and Methods”). This construct directed chloramphenicol acetyltransferase activity when transfected into murine PYS and LD1 cells (Fig. 3, lanes 2, 3, 7, and 8). No activity was seen with the pUC-cat DNA alone (Fig. 3, lanes 1 and 6). This shows that the vector contains no promoter that the 5' SPARC region could merely be enhancing.

[Figure 1: A, restriction map of the SPARC gene (0-30 kb scale) showing exons 1-10 and the cosmids p30g3 and pcosSP13; B, partial nucleotide sequence of the gene]

FIG. 1. A, restriction map of the murine SPARC gene. All BamHI (B), EcoRI (E), and HindIII (H) sites are displayed. The 10 exons are represented by solid vertical bars. The arrows show the approximate 5' and 3' limits of the cosmid clones p30g3 and pcosSP13. B, partial nucleotide sequence of the SPARC gene. The probable transcription initiation site (C at +1) was determined by RNase protection and primer extension analysis (see Fig. 2). The repeated motif 5'GGAGG3' and the EF-hand domain are underlined, the 9-bp inverted repeat is indicated by arrows, the 5'GCCT3' repeat is boxed, and the nucleotides corresponding to the transcription factor AP-2 consensus sequence are marked by filled circles.

DISCUSSION

The data presented here are entirely consistent with the existence of a single Sparc gene, which we have previously assigned to mouse chromosome 11, band 1D (Mason et al., 1986b). The pattern of hybridization of SPARC cDNAs to large genomic DNA fragments separated by pulsed field gel electrophoresis is also consistent with a single gene in both mouse (Barlow et al., 1987) and human (Swaroop et al., 1988). These results argue convincingly that SPARC and osteonectin are identical proteins and not products of a closely related gene family. In human tissues, Northern analysis and cDNA cloning detect a major 2.2-kb and a minor 3-kb SPARC RNA. The larger RNA results from run-through transcription and the use of an alternative polyadenylation signal (Swaroop et al., 1988). In mouse, however, there is no evidence for such alternative 3' ends.

[Figure 2 image: A, restriction map of the exon 1 region with probes P1 and P2; B, RNase protection gel; C, primer extension gel]

FIG. 2. Determination of the 5' end of the SPARC gene. A, the first exon is depicted by a box. The bars below (P1 (747 bp) and P2 (1500 bp)) indicate the probes used in the RNase protection experiments. H = HindIII; N = NcoI; Pv = PvuII; P = PstI. B, RNase protection. Hybridization was carried out with 1 × 10⁵ dpm of antisense RNA (NcoI-HindIII) and 6 μg of total RNA at 55 °C for 16 h. RNase digestion was performed at 33 °C for 75 min. Protected fragments were analyzed on an 8% sequencing gel. The molecular weight markers are denatured MspI fragments of pBR322. SK = skull (calvaria) of 16.5-day postcoitum embryo; LG = adult lung; AM = 13.5-day amnion; PL = 13.5-day placenta; F9- and F9+ = undifferentiated and differentiated F9 cells, respectively; TS = adult testis; AD = adult adrenal; -RNA = no-RNA control. C, primer extension. The primer was a 22-nucleotide strand-separated fragment (see "Materials and Methods") ³²P-labeled at the 5' end. Annealing was carried out with 17.5 pg of primer, with (lane 2) and without (lane 1) 1 μg of parietal endoderm poly(A)+ RNA at 70 °C overnight, and extension was carried out at 42 °C for 1 h. Products were analyzed on an 8% sequencing gel. The molecular weight markers are denatured HinfI fragments of pAT153. Similar results were obtained with poly(A)+ RNA from differentiated F9 cells and the parietal endoderm cell line F9AcC19.

[Figure 3 image: CAT assay autoradiograph, lanes 1-9 (PYS and L cells), with spots for CM, 1-ACM, and 3-ACM]

FIG. 3. Chloramphenicol acetyltransferase enzyme activity in mouse PYS and L cells transfected with pSP/639-cat. PYS cells (lanes 1-4) and L cells (lanes 6-9) were transfected with 15 μg of either the promoterless pUC-cat DNA (lanes 1 and 6) or pSP/639-cat (lanes 2 and 3, and 7 and 8) as described under "Materials and Methods." Controls are plates receiving no DNA (lanes 4 and 9) and a reaction mix with 0.4 units of chloramphenicol acetyltransferase enzyme (lane 5). CM, chloramphenicol; 1-ACM, 1-acetyl chloramphenicol; 3-ACM, 3-acetyl chloramphenicol.

The mouse SPARC gene spans about 26.5 kb of DNA and consists of 10 exons interrupted by nine introns. One question we have asked is whether these exons correspond to the structural domains predicted for the protein, some of which could be "shared" with other proteins. The N-terminal glutamic acid-rich domain, consisting of two α-helical segments, is analogous to the Ca2+-binding Gla domain of the vitamin K-dependent clotting factors in that it carries a similar number and density of negative charges and is predicted to bind calcium (Engel et al., 1987). However, the Gla domain of the vitamin K-dependent clotting factors is encoded by a single exon, whereas in SPARC the glutamic acid-rich domain is encoded by exons 3 and 4. A comparison of exons 3 and 4 with the sequence encoding the Gla domain of human prothrombin (Friezner Degen and Davie, 1987) reveals no nucleotide sequence similarity, nor is there any similarity between the nucleotide sequences of exons 3 and 4 themselves. Thus, the glutamic acid-rich domain does not appear to be homologous to the Gla domains, nor has it arisen by tandem duplication of either exon 3 or exon 4. The first glutamic acid-rich segment is the least conserved region of the mouse, human, and bovine SPARC/osteonectin proteins: there is only 66% nucleotide sequence identity between the murine and human sequences of exon 3, compared with an overall identity of 90% for the coding exons. Nevertheless, the number and density of negatively charged amino acids are maintained at the protein level.
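The comparison above turns on two numbers: percent nucleotide identity between aligned exons, and the density of acidic residues in a protein segment. The Python sketch below computes both. It is illustrative only and makes several assumptions: the sequences are already aligned (gaps written as '-'), gapped columns are skipped (one convention among several), and only Asp (D) and Glu (E) count as negatively charged. The function names are hypothetical, and the example strings are toy inputs, not SPARC data.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped; this is an
    assumed convention, not necessarily the one used in the paper.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        matches += x == y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def negative_charge_density(peptide: str) -> float:
    """Fraction of residues that are acidic (Asp, D, or Glu, E)."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("empty peptide")
    acidic = sum(peptide.count(aa) for aa in "DE")
    return acidic / len(peptide)


if __name__ == "__main__":
    # Toy inputs for illustration only.
    print(percent_identity("ACGT", "ACGA"))  # → 75.0
    print(negative_charge_density("EEDEKAE"))
```

Computing the two measures separately makes the paper's point concrete: two segments can have low nucleotide identity while keeping a similar fraction of acidic residues.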

Parvalbumin, calmodulin, intestinal Ca2+-binding protein, troponin C, and a growing list of other proteins share a common Ca2+-binding structure, designated an EF-hand. The EF-hand structures of these proteins are thought to have evolved from an ancestral gene by duplication events (Kretsinger, 1980; Nojima, 1987), giving rise to four EF-hand domains in calmodulin, troponin C, and intestinal Ca2+-binding protein. Parvalbumin shows homology to these proteins but retains only two functional Ca2+-binding domains. The SPARC protein has a single EF-hand structure, encoded by a single exon (exon 9). We have analyzed all the exon and intron sequences surrounding exon 9, and the 3' untranslated region of the RNA, for internal repeats or the presence of redundant EF-hand sequences. No significant sequence similarities to rat calmodulin (Nojima, 1987), rat parvalbumin (Epstein et al., 1986), rabbit troponin C (Putney et al., 1983), or a consensus sequence for the third Ca2+-binding domain of troponin C (Hardin et al., 1987) were found. It is therefore possible that the EF-hand domain of SPARC evolved independently of the EF-hand structures of these Ca2+-binding proteins.
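A search for internal repeats of the kind described above can be illustrated with the exact-match core of a self dot plot. The idea is to index every k-mer in the sequence, report pairs of positions that share a k-mer, and merge hits that run along the same diagonal into repeat segments. This Python sketch is not the authors' procedure, which the text does not specify. The word size k and the helper names are assumptions, and practical repeat searches also tolerate mismatches and score diagonals statistically.

```python
from collections import defaultdict


def internal_repeats(seq: str, k: int = 12) -> list[tuple[int, int]]:
    """Return sorted (i, j) pairs, i < j, whose k-mers are identical.

    Exact-match core of a self dot plot; k = 12 is an arbitrary default,
    not a value taken from the paper.
    """
    seq = seq.upper()
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def repeat_segments(hits: list[tuple[int, int]], k: int) -> list[tuple[int, int, int]]:
    """Merge consecutive hits on the same diagonal (constant j - i) into
    (start_i, start_j, length) segments."""
    segments: list[tuple[int, int, int]] = []
    for i, j in sorted(hits, key=lambda h: (h[1] - h[0], h[0])):
        if segments:
            si, sj, length = segments[-1]
            # The next k-mer on the same diagonal starts one base after the
            # last k-mer already included in the segment.
            if j - i == sj - si and i == si + length - k + 1:
                segments[-1] = (si, sj, length + 1)
                continue
        segments.append((i, j, k))
    return segments


if __name__ == "__main__":
    # Toy sequence containing GATTACA twice (not SPARC data).
    hits = internal_repeats("GATTACATTTGATTACA", k=4)
    print(repeat_segments(hits, k=4))  # → [(0, 10, 7)]
```

Indexing k-mers in a dictionary avoids the all-against-all comparison of a literal dot plot. The cost is sensitivity: diverged copies of an ancestral EF-hand would need a mismatch-tolerant or alignment-based search to be detected.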

The 639-bp sequence upstream of the SPARC RNA start site contains neither a TATA box nor a CAAT box (Fig. 1B and data not shown). Many promoters recognized by RNA polymerase II have these two sequence elements located around -30 and -80, respectively (Breathnach and Chambon, 1981). In vitro transcription studies have shown that the TATA box serves to fix the site at which transcription starts (McKnight and Kingsbury, 1982), and most genes that lack TATA boxes in their promoter regions have multiple transcription initiation sites. It is therefore unexpected that only a single major transcription start site is seen for SPARC (Fig. 2). In many eukaryotic genes that lack both TATA and CAAT box sequences, e.g. hamster 3-hydroxy-3-methylglutaryl-CoA reductase (Reynolds et al., 1984), human hypoxanthine-guanine phosphoribosyltransferase (Patel et al., 1986), human adenosine deaminase (Valerio et al., 1985), mouse thy-1 (Giguère et al., 1985), mouse hypoxanthine-guanine phosphoribosyltransferase (Melton et al., 1984), and the human epidermal growth factor receptor gene (Ishii et al., 1985), the 5' flanking sequences are GC-rich. In contrast, the 5' flanking sequence of the murine SPARC gene is purine-rich.
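The promoter comparison above rests on two simple measurements. The first is whether TATA- and CAAT-like elements sit near -30 and -80. The second is whether the flanking sequence is GC-rich or purine-rich. The Python sketch below computes both, but several pieces are assumptions made for illustration:

* the core patterns (TATA[AT]A and CCAAT);
* the search windows;
* the convention that the last base of the input is position -1.

Real promoter analysis uses position weight matrices and searches both strands.

```python
import re

# Simplified core motifs, assumed for illustration; they are not
# consensus definitions taken from the paper.
TATA_CORE = re.compile(r"(?=TATA[AT]A)")  # lookahead finds overlapping starts
CAAT_CORE = re.compile(r"(?=CCAAT)")


def composition(seq: str) -> dict[str, float]:
    """Fractions of G+C and of purines (A+G) on the strand as given."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    return {
        "gc": (seq.count("G") + seq.count("C")) / n,
        "purine": (seq.count("A") + seq.count("G")) / n,
    }


def find_boxes(upstream: str, pattern: re.Pattern, window: tuple[int, int]) -> list[int]:
    """Positions of motif starts relative to the transcription start site,
    taking the last base of `upstream` as -1, kept only if they fall within
    window = (lo, hi) inclusive. Only the given strand is searched."""
    upstream = upstream.upper()
    n = len(upstream)
    lo, hi = window
    return [m.start() - n for m in pattern.finditer(upstream)
            if lo <= m.start() - n <= hi]


if __name__ == "__main__":
    # Toy upstream sequence (not SPARC) with a TATA-like element at -31.
    upstream = "GGGG" + "TATAAA" + "G" * 25
    print(find_boxes(upstream, TATA_CORE, (-40, -20)))   # → [-31]
    print(find_boxes(upstream, CAAT_CORE, (-100, -60)))  # → []
    print(composition(upstream))
```

GC content is the same on both strands, but purine richness is strand-specific: a sequence that is purine-rich on one strand is pyrimidine-rich on the other. A statement such as "purine-rich" therefore implicitly refers to the strand shown in the sequence figure.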

Sequence analysis of the 5' flanking region of the SPARC gene has revealed a number of interesting sequence motifs. The sequence GGAGG is repeated six times between positions -60 and -170 (Fig. 1B). This sequence is the complement of the motif CCTCC, which has been found in the promoter regions of the mouse hypoxanthine-guanine phosphoribosyltransferase gene (Melton et al., 1984), the human epidermal growth factor receptor gene (Ishii et al., 1985), the human α1(I) collagen gene (Chu et al., 1985), and both the chicken and mouse α2(I) collagen genes (McKeon et al., 1984). In the chicken and mouse α2(I) collagen genes, direct repeats of the sequence CCCTCCC correspond to an S1 nuclease-sensitive site in cloned DNA (McKeon et al., 1984) and a DNase I-hypersensitive site in chromatin (Merlino et al., 1983). It has been proposed that these tandem repeats form a staggered loop structure, and a similar structure could be formed in the SPARC gene between the two tandem repeats at positions -90 and -69 (Fig. 1B). A 9-bp inverted repeat centered about nucleotide -99 was also noted. Finally, the sequence 5'-GGCCAGGGGA-3' between positions -117 and -108 differs by only 1 bp from an inverted consensus binding site for the transcription factor AP-2, which is found in a number of promoters activated during F9 cell differentiation (SV40, polyoma, and H-2Kb) and in response to phorbol ester or cAMP (Imagawa et al., 1987).
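Several observations in this paragraph reduce to string operations on the two DNA strands:

* counting a motif together with its complementary form;
* testing whether two arms form an inverted repeat;
* finding sites that differ from a reference by a single base.

The Python sketch below implements these with exact comparisons and Hamming distance. The function names are hypothetical. Because the AP-2 consensus itself is not given here, the mismatch search is demonstrated with the SPARC sequence 5'-GGCCAGGGGA-3' as the query rather than with the consensus.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (the other strand, read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(seq: str, motif: str) -> int:
    """Count exact, possibly overlapping, occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def count_motif(seq: str, motif: str, both_strands: bool = True) -> int:
    """Count a motif on the given strand and, optionally, on the other
    strand (i.e. occurrences of its reverse complement in seq)."""
    seq, motif = seq.upper(), motif.upper()
    total = count_occurrences(seq, motif)
    rc = revcomp(motif)
    if both_strands and rc != motif:  # avoid double-counting palindromes
        total += count_occurrences(seq, rc)
    return total


def is_inverted_repeat(left: str, right: str) -> bool:
    """True if `right` is the reverse complement of `left`, so the two arms
    could pair to form a stem (hairpin or cruciform)."""
    return right.upper() == revcomp(left.upper())


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def near_matches(seq: str, site: str, max_mismatches: int = 1) -> list[tuple[int, str, int]]:
    """Return (0-based start, strand, mismatches) for each window of seq that
    matches `site` ('+') or its reverse complement ('-') within the limit."""
    seq, site = seq.upper(), site.upper()
    k = len(site)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for strand, target in (("+", site), ("-", revcomp(site))):
            mm = hamming(window, target)
            if mm <= max_mismatches:
                hits.append((i, strand, mm))
    return hits


if __name__ == "__main__":
    # Toy inputs, not the SPARC sequence.
    print(count_motif("GGAGGAGGCCTCC", "GGAGG"))                    # → 3
    print(near_matches("TTGGCCTGGGGATT", "GGCCAGGGGA", 1))         # → [(2, '+', 1)]
```

Counting overlapping occurrences matters for G-rich promoter motifs. In GGAGGAGG, two copies of GGAGG share a base, and a non-overlapping search would report only one of them.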

Exon 1 specifies only 5' untranslated sequence. However, within this sequence the motif 5'-GCCT-3' is repeated seven times, and it occurs four times in the corresponding human (Swaroop et al., 1988) and bovine (Bolander et al., 1988) SPARC/osteonectin RNAs. This conserved sequence may be important either in the regulation of the gene or in the stability of the RNA. Sequences that confer mRNA lability have been identified in the 3' untranslated regions of a series of lymphokine, cytokine, and proto-oncogene mRNAs (Shaw and Kamen, 1986). Recently, sequences within the coding region of the first exon of β-tubulin have been shown to confer autoregulation of RNA stability (Gay et al., 1987). Experiments are under way to determine the sequence elements that are important in the tissue-specific regulation of SPARC gene expression during normal development and in regulating induction during the differentiation of F9 teratocarcinoma cells.

Acknowledgments. We thank David Murphy for help in the initial isolation of genomic clones, Anne-Marie Frischauf for the mouse genomic cosmid library, and Elaine Ward for skilled technical assistance. We also thank Jurgen Engel, Robb Krumlauf, Peter Rigby, and Jeffrey Williams for many helpful suggestions and discussions throughout the work, and Lydia Pearson for preparing the manuscript.

REFERENCES

Auffray, C., and Rougeon, F. (1980) Eur. J. Biochem. 107, 303-314

Barlow, D. P., Bućan, M., Lehrach, H., Hogan, B. L. M., and Gough, N. M. (1987) EMBO J. 6, 617-623

Bolander, M. E., Young, M. F., Fisher, L. W., Yamada, Y., and Termine, J. D. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 2919-2923

Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383

Chu, M.-L., de Wet, W., Bernard, M., and Ramirez, F. (1985) J. Biol. Chem. 260, 2315-2320

Engel, J., Taylor, W., Paulsson, M., Sage, H., and Hogan, B. L. M. (1987) Biochemistry 26, 6958-6965

Epstein, P., Means, A. R., and Berchtold, M. W. (1986) J. Biol. Chem. 261, 5886-5891

Friezner Degen, S. J., and Davie, E. W. (1987) Biochemistry 26, 6165-6177

Gay, D. A., Yen, T. J., Lau, J. T. Y., and Cleveland, D. W. (1987) Cell 50, 671-679

Giguère, V., Isobe, K.-I., and Grosveld, F. (1985) EMBO J. 4, 2017-2024

Gilbert, W. (1978) Nature 271, 501

Gorman, C. M., Moffat, L. F., and Howard, B. H. (1982) Mol. Cell. Biol. 2, 1044-1051

Hardin, S. H., Keast, M. J., Hardin, P. E., and Klein, W. H. (1987) Biochemistry 26, 3518-3523

Hogan, B. L. M., Costantini, F., and Lacy, E. (1986) Manipulating the Mouse Embryo, pp. 262-265, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY

Holland, P. W. H., Harper, S., McVey, J. H., and Hogan, B. L. M. (1987) J. Cell Biol. 105, 473-482

Imagawa, M., Chiu, R., and Karin, M. (1987) Cell 51, 251-260

Ishii, S., Xu, Y.-H., Stratton, R. H., Roe, B. A., Merlino, G. T., and Pastan, I. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 4920-4924

Kretsinger, R. H. (1980) Crit. Rev. Biochem. 8, 119-174

Krumlauf, R., Holland, P. W. H., McVey, J. H., and Hogan, B. L. M. (1987) Development 99, 603-617

Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY

Mann, K., Deutzmann, R., Paulsson, M., and Timpl, R. (1987) FEBS Lett. 218, 167-172

Mason, I. J., Taylor, A., Williams, J. G., Sage, H., and Hogan, B. L. M. (1986a) EMBO J. 5, 1465-1472

Mason, I. J., Murphy, D., Munke, M., Francke, U., Elliott, R. W., and Hogan, B. L. M. (1986b) EMBO J. 5, 1831-1837

Maxam, A. M., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560

McKeon, C., Schmidt, A., and de Crombrugghe, B. (1984) J. Biol. Chem. 259, 6636-6640

McKnight, S. L., and Kingsbury, R. (1982) Science 217, 316-324

Melton, D. W., Konecki, D. S., Brennand, J., and Caskey, C. T. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 2147-2151

Merlino, G. T., McKeon, C., de Crombrugghe, B., and Pastan, I. (1983) J. Biol. Chem. 258, 10041-10048

Nojima, H. (1987) FEBS Lett. 217, 187-190

Nomura, S., Wills, A. J., Edwards, D. R., Heath, J. K., and Hogan, B. L. M. (1988) J. Cell Biol. 106, 441-450

Patel, P. I., Framson, P. E., Caskey, C. T., and Chinault, A. C. (1986) Mol. Cell. Biol. 6, 393-403

Poustka, A.-M., Rackwitz, H.-R., Frischauf, A.-M., Hohn, B., and Lehrach, H. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 4129-4133

Putney, S. D., Herlihy, W. C., and Schimmel, P. (1983) Nature 302, 718-721

Reynolds, G. A., Basu, S. K., Osborne, T. F., Chin, D. J., Gil, G., Brown, M. S., Goldstein, J. L., and Luskey, K. L. (1984) Cell 38, 275-285

Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467

Shapiro, M. B., and Senapathy, P. (1987) Nucleic Acids Res. 15, 7155-7174

Shaw, G., and Kamen, R. (1986) Cell 46, 659-667

Swaroop, A., Hogan, B. L. M., and Francke, U. (1988) Genomics 2, 37-47

Termine, J. D., Kleinman, H. K., Whitson, S. W., Conn, K. M., McGarvey, M. L., and Martin, G. R. (1981) Cell 26, 99-105

Valerio, D., Duyvesteyn, M. G. C., Dekker, B. M. M., Weeda, G., Berkvens, Th. M., van der Voorn, L., van Ormondt, H., and van der Eb, A. J. (1985) EMBO J. 4, 437-443

Williams, J. G., and Mason, P. J. (1985) in Nucleic Acid Hybridization (Hames, B. D., and Higgins, S. J., eds) pp. 139-160, IRL Press, Oxford

Young, M. F., Bolander, M. E., Day, A. A., Ramis, C. I., Robey, P. G., Yamada, Y., and Termine, J. D. (1986) Nucleic Acids Res. 14, 4483-4497