

© 1994 by The American Society for Biochemistry and Molecular Biology, Inc. THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 269, No. 52, Issue of December 30, pp. 32999-33008, 1994

Printed in U.S.A.

Characterization of the Complete Genomic Structure of the Human Versican Gene and Functional Analysis of Its Promoter*

(Received for publication, August 23, 1994)

Michael F. Naso‡, Dieter R. Zimmermann§, and Renato V. Iozzo¶

From the Department of Pathology and Cell Biology and the Jefferson Cancer Institute, Thomas Jefferson University, Philadelphia, Pennsylvania 19107 and the §Institute of Clinical Pathology, Department of Pathology, University of Zurich, CH-8091 Zurich, Switzerland

Versican is a modular proteoglycan involved in the control of cellular growth and differentiation. To understand versican gene regulation and transcriptional control, we have isolated genomic clones spanning the entire gene locus including 5'- and 3'-flanking sequences. Versican was encoded by 15 exons encompassing over 90 kilobase pairs of continuous DNA. The exon organization corresponded to the protein subdomains encoded by homologous proteins, with a remarkable conservation of exon size and intron phase. We discovered an additional exon just proximal to the glycosaminoglycan-binding region that was identical to a recently identified splice variant of versican (Dours-Zimmermann, M. T., and Zimmermann, D. R. (1994) J. Biol. Chem. 269, 32992-32998). The versican promoter harbored a typical TATA box located approximately 16 base pairs upstream of the transcription start site and binding sites for a number of transcription factors involved in regulated gene expression. This promoter was shown to be highly functional in transiently transfected cells of both mesenchymal and epithelial origin. Stepwise 5' deletions identified a strong enhancer element between -209 and -445 base pairs and a strong negative element between -445 and -632 base pairs. This study provides the molecular basis for discerning the transcriptional control of the versican gene and offers the opportunity to investigate genetic disorders linked to this important human gene.


Versican is a large modular chondroitin sulfate proteoglycan, originally isolated from fibroblasts, consisting of a protein core to which 12-15 chondroitin sulfate side chains are covalently attached (1, 2). It is expressed by fibroblasts (1-3), as well as by keratinocytes (4) and arterial smooth muscle cells (5, 6). Immunohistochemical experiments have revealed versican protein in close association with blood vessels (7) and in connective tissues of various organs, including the epidermis and dermis of the skin (4). This gene product is also developmentally regulated in the central nervous system (8-10). Versican belongs to

* This work was supported by National Institutes of Health Grants CA-39481 and CA-47282 (to R. V. I.) and by the Swiss National Science Foundation Grant 31-28882.90 and the Cancer Foundation of the Canton of Zurich (to D. R. Z.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) U15963.

‡ This work is for partial fulfillment of a doctoral thesis in pathology and cell biology.

¶ Recipient of a Faculty Research Award from the American Cancer Society (FRA-376). To whom correspondence and reprint requests should be addressed: Dept. of Pathology and Cell Biology, Thomas Jefferson University, Rm. 249, Jefferson Alumni Hall, 1020 Locust St., Philadelphia, PA 19107. Tel.: 215-955-2208; Fax: 215-923-7969.

a growing family of extracellular proteoglycans that bind to hyaluronan, of which aggrecan, the cartilage-specific proteoglycan, is the prototype (11, 12). Other members of this family include a brain proteoglycan, neurocan (13), a developmentally regulated chicken proteoglycan (PG-M)¹ (14), and brevican (15), a proteoglycan from bovine brain containing an abbreviated glycosaminoglycan-binding region. As deduced from cDNA cloning, all the members of this gene family have a characteristic hyaluronan-binding domain at their amino terminus and a selectin-like domain at their carboxyl terminus. These terminal globular domains are connected by an extended domain of varying length that harbors the glycosaminoglycan attachment region (2, 12-15). A proposed function for versican is to connect cells with the extracellular matrix by binding some as yet unidentified cell surface moiety through its carboxyl-terminal end and binding hyaluronan through its amino-terminal end (2). As a result, versican may take part in the regulation of cell motility, growth, and differentiation (2, 16). This is in contrast to aggrecan, which is believed to function primarily in the resistance of compressional forces acting on cartilage by reversibly binding large amounts of water while remaining anchored to the matrix through interaction with hyaluronan (17). Unlike aggrecan, a molecule endowed with over 100 chondroitin sulfate chains and a number of keratan sulfate chains, versican contains just a few glycosaminoglycans in a similarly extended protein core (1, 2). This supports the idea that tissue hydration may not be the sole function of versican.

We have previously shown that the versican gene is hypomethylated in colon carcinoma tissue (18), a mechanism by which versican may be overexpressed. More recently we have isolated partial human genomic clones and used them for mapping the versican gene (chondroitin sulfate proteoglycan 2) to human chromosome 5q12-14 (19). In order to understand in more detail how versican gene expression is regulated, we have now determined the complete organization of the human versican gene and investigated the functional activity of its promoter. The results revealed a modular gene encoded by 15 exons spanning more than 90 kb of genomic DNA. Of note, exon sizes and phases corresponded to exons coding for similar domains in homologous molecules. Using cDNA probes generated by anchored PCR, we discovered an additional exon of approximately 3 kb that was identical to a recently identified spliced variant of versican (65) and strongly homologous to the alternatively spliced region of PG-M (14). Because of the lack of cysteines and the presence of GAG binding sites, we have designated this region as GAG-α, while the more distal region

¹ The abbreviations used are: PG-M, proteoglycan-M; PCR, polymerase chain reaction; GAG, glycosaminoglycan; EGF, epidermal growth factor; SCR, short consensus repeat; CRP, complement regulatory protein; kb, kilobase pair(s); bp, base pair(s); CAT, chloramphenicol acetyltransferase; XRE, xenobiotic responsive element; PIPES, 1,4-piperazinediethanesulfonic acid; AP2, activator protein 2.


FIG. 1. Schematic representation of the genomic clones encompassing the human versican gene (top) and intron/exon organization (bottom). The position and size of each of the 11 genomic clones (VS) encompassing the 15 exons of the versican gene are shown. Genomic clones isolated from the lung fibroblast library are represented by shaded rectangles, whereas clones isolated from the chromosome 5-specific library are represented by open rectangles. Exons are represented by filled rectangles, and introns by thin lines. Empty boxes signify untranslated regions. Open circles in introns indicate that the exact size of that intron is not known. The four major domains are indicated by roman numerals. SP, signal peptide; HBR, hyaluronan-binding region; GAG-α, glycosaminoglycan α-binding domain; GAG-β, glycosaminoglycan β-binding domain; EGF, epidermal growth factor repeat; CRP, complement regulatory protein repeat; UTR, untranslated region.

encoded by exon 8 was designated GAG-β. The versican promoter contained a variant TATA box as well as some other significant cis-acting elements and displayed functional activity in transient transfection assays. Deletion analysis revealed the presence of a strong enhancer element within this promoter and a possible negative regulatory element. These results provide molecular clues as to the transcriptional control of versican and offer the opportunity to study its regulation and role in human pathology.

EXPERIMENTAL PROCEDURES

Materials-All reagents were of molecular biology grade. Radionucleotides [α-³²P]dCTP, [α-³²P]dUTP, and [γ-³²P]ATP (~3000 Ci/mmol; 1 Ci = 37 GBq) and [α-³⁵S]dATP (~1000 Ci/mmol) were obtained from Amersham Corp.

Isolation and Characterization of Genomic Clones-To isolate the entire human versican gene, we screened three different genomic libraries, including two MboI-generated phage libraries in Lambda Fix I and II (Stratagene) and one EcoRI-generated chromosome 5-specific library in Charon 21A (ATCC 57720). All probes were generated by PCR based on the previously isolated cDNA (2) and were random-prime labeled (20). Approximately 10⁶ recombinant clones were screened from each of the Lambda Fix libraries, and 10⁴ clones were screened from the Charon 21A library. Genomic clones positive on quaternary screening were analyzed by Southern blotting, and the pertinent fragments were subcloned into pBluescript (Stratagene). DNA was sequenced by a modified dideoxynucleotide chain termination method or by an automated sequencing system (Applied Biosystems) using primers based on either the T3 or T7 polylinker sequences of pBluescript or synthetic oligonucleotides. Computer analyses were performed using the GCG or PC/GENE computer packages as described (21).

Polymerase Chain Reaction Amplification of cDNA-To amplify alternatively spliced forms of the human versican message, cDNA prepared from a λ Zap CRL1262 cDNA library (Clontech) was subjected to the polymerase chain reaction using primers from the 3' end of exon 6 or the 5' end of exon 8 with T3 or T7 in the λ phage vector. The reaction was performed under the conditions of 3 cycles at 94 °C for 45 s, 55 °C for 45 s, and 72 °C for 90 s and then 35 cycles at 94 °C for 45 s, 53 °C for 45 s, and 72 °C for 90 s. The products were then analyzed by agarose gel electrophoresis. Relevant bands were subcloned into pBluescript and sequenced by an automated system as above with the primers used for the original amplification.
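The two-stage cycling program described above can be written down as data and tallied. The following is only an illustrative sketch: the names `PROGRAM`, `total_cycles`, and `total_hold_seconds` are hypothetical, and the total counts programmed hold times only, since ramp times and any initial denaturation or final extension are not stated in the text.

```python
# Cycling program as stated in the text:
# each stage is (number of cycles, [(temperature in deg C, hold in seconds), ...]).
PROGRAM = [
    (3,  [(94, 45), (55, 45), (72, 90)]),   # first 3 cycles, annealing at 55 deg C
    (35, [(94, 45), (53, 45), (72, 90)]),   # next 35 cycles, annealing at 53 deg C
]

def total_cycles(program):
    """Total number of cycles across all stages."""
    return sum(cycles for cycles, _ in program)

def total_hold_seconds(program):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)

if __name__ == "__main__":
    seconds = total_hold_seconds(PROGRAM)
    print(total_cycles(PROGRAM), "cycles,", seconds / 60, "min of hold time")
```

The real run time on a thermal cycler would be longer than this figure because of ramping.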

RNase and S1 Nuclease Protection Assays-For RNase protection assay, approximately 50 µg of total RNA isolated from normal human diploid primary skin fibroblasts was hybridized to an 876-nucleotide riboprobe. The riboprobe was complementary to the sense strand of the first 220 bases of the versican cDNA (2) and the first 656 bases of the 5'-flanking region. It was prepared by cloning the respective PstI/EcoRI fragment from clone VS51 downstream of the T7 promoter of pBluescript using the corresponding sites in the polylinker of pBluescript. The in vitro transcription reaction was carried out in the presence of ATP, GTP, CTP (all 0.5 mM), UTP (10 µM), [α-³²P]UTP (3000 mCi/mmol), and T7 RNA polymerase for 30 min at 37 °C. The resulting riboprobe was purified through a G-50 column (5 Prime 3 Prime, Inc.), followed by NaOAc/ethanol precipitation. About 5 × 10⁶ cpm of the riboprobe was then hybridized with RNA overnight at 45 °C in hybridization buffer (40 mM PIPES, 400 mM NaCl, 1 mM EDTA, 80% deionized formamide, pH 6.4). The sample was then diluted with RNase digestion buffer (10 mM Tris-HCl, 5 mM EDTA, 300 mM NaOAc, pH 7.5) and digested for 30 min at 30 °C with 10 units each of RNase T1 and RNase A. The digestion was stopped with SDS (0.5%) and proteinase K (0.125 µg/µl) for 15 min at 37 °C and purified with phenol/chloroform/isoamyl alcohol (25:24:1) extraction and NaOAc/ethanol precipitation. The precipitated samples were resuspended in formamide-containing buffer and analyzed on a 6% polyacrylamide/urea sequencing gel.

For the S1 nuclease protection assay, we used approximately 50 µg of total RNA from normal human diploid primary skin fibroblasts and a 5' end-labeled 749-bp BglII/EcoRI fragment from genomic clone VS51 containing the first 93 bp of the versican cDNA (2) and 656 bp of 5'-flanking region. The RNA was hybridized to 5 × 10⁶ cpm of the 5'-labeled fragment for 30 min at 37 °C in 40 mM HEPES, pH 6.5, 0.4 M NaCl, 1 mM EDTA, 80% formamide. After hybridization, the samples were diluted in S1 digestion buffer (500 mM NaCl, 10 mM ZnSO₄, 50% glycerol, 300 mM NaOAc, pH 4.6) and incubated with 500 units of S1 nuclease for 30 min at 37 °C. The samples were precipitated with ethanol, resuspended in formamide-containing buffer, and analyzed on a 6% polyacrylamide/urea denaturing gel.
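As a quick consistency check on the two protection probes, the stated probe lengths can be compared with the sum of their cDNA and 5'-flanking portions. This is a minimal sketch; the dictionary layout and the function name `probe_is_consistent` are illustrative choices, and only the lengths come from the text.

```python
# Probe compositions as stated in the text (all lengths in nucleotides / bp).
PROBES = {
    "RNase protection riboprobe": {"cdna": 220, "flank": 656, "stated_total": 876},
    "S1 nuclease fragment":       {"cdna": 93,  "flank": 656, "stated_total": 749},
}

def probe_is_consistent(probe):
    """True when cDNA portion plus 5'-flanking portion equals the stated length."""
    return probe["cdna"] + probe["flank"] == probe["stated_total"]

if __name__ == "__main__":
    for name, probe in PROBES.items():
        print(name, "consistent:", probe_is_consistent(probe))
```

Both probes share the same 656 bases of 5'-flanking sequence and differ only in how far they extend into the cDNA.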

Construction of Nested Deletions in the Promoter of the Versican Gene-Clone VS51, which contained exon 1 and 632 bp of the 5'-flanking region, was excised with EcoRI using the site in pBluescript and rendered blunt ended with the Klenow fragment of DNA polymerase I. This linear plasmid was then cleaved with PstI using the site in exon 1, and the resulting 876-bp fragment was purified on a 0.8% agarose gel. The pUCCAT vector was cleaved with HindIII, made blunt ended as above, and digested with PstI. The resulting linearized vector was purified on a 0.8% agarose gel, after which the two fragments were ligated to make VCAT(+), which contained 244 bp of exon 1 (as determined from above) and 632 bp of the 5'-untranslated region. Alternatively, the 876-bp EcoRI/PstI fragment was ligated into the EcoRI/PstI polylinker sites of pBluescript for later use. To make Vdel 1, which contained 244 bp of exon 1 and 30 bp of the 5'-flanking region, VS51 was cut with PstI and RsaI, and the resulting 274-bp fragment was purified on a 0.8% agarose gel. This was ligated into pUCCAT cut with PstI and HindIII that was previously rendered blunt ended. Vdel 2, which contained 244 bp of exon 1 and 209 bp of the 5'-flanking region, and Vdel 3, which contained 244 bp of exon 1 and 445 bp of the 5'-flanking region, were made by PCR amplification using primers based on the sequence of the versican 5'-flanking region and T7 of pBluescript. Primer Vdel 2 was complementary to bp -209 to -193 of the 5'-flanking region, and primer Vdel 3 was complementary to bp -445 to -429. Both contained an additional HindIII site at their 5' end. Primer Vdel 2 and T7 were used to amplify a 540-bp piece of versican 5'-flanking region, while primer Vdel 3 and T7 were used to amplify a 776-bp piece using the original 876-bp fragment cloned into pBluescript as a template. The amplified fragments were cut with PstI and HindIII, purified on an agarose gel, and ligated into the respective sites in pUCCAT. VCAT(-) was prepared by cutting the 876-bp fragment in pBluescript with PstI and SalI, followed by agarose gel purification and ligation into pUCCAT after it was digested with PstI and SalI and purified. Each construct was sequenced to determine correct orientation.

Domain        Exon  Size (bp)  5' splice donor  Intron size  3' splice acceptor  Codon phase  Amino acid
5' UTR        1     284        CCAAGgtaat       >8 kb        tctagGCCAA          -            -
SP            2     76         TAAAGgtgag       5.7 kb       tctagTCAAA          I            Val-24
I (HBR)       3     375        GGATGgtaag       3.1 kb       ggcagGGGTT          I            Gly-149
I (HBR)       4     175        GTCAGgtaag       84 bp        cccagATATC          II           Arg-207
I (HBR)       5     128        GGATGgtaag       >6 kb        tccagGTGAT          I            Gly-250
I (HBR)       6     294        TAAACgtaag       6.2 kb       tctagCTAAA          I            Pro-348
II (GAG-α)    7     2961       GACAGgtaag       >4 kb        ttcagGTCGA          I            Gly-1335
III (GAG-β)   8     5262       ACCAGgtaag       3.3 kb       ggtagGACCT          I            Gly-3089
IVa (EGF)     9     114        ACTTGgtaag       2.0 kb       ttcagATTTT          I            Asp-3127
IVa (EGF)     10    114        GCAAGgtaag       5.2 kb       ctcagATACC          I            Asp-3165
IVb (LECTIN)  11    159        TAATCgtatg       1.5 kb       cttagGTGTG          I            Arg-3218
IVb (LECTIN)  12    83         CACTGgtaag       >21.9 kb     tacagCAATA          0            Lys/Gln-3245
IVb (LECTIN)  13    145        AACAGgtaat       6.6 kb       tttagTTGCT          I            Val-3294
IVc (CRP)     14    183        GAACCgtaag       144 bp       tgtagCATCT          I            Pro-3355
3' UTR        15    1612       -                -            -                   -            -

FIG. 2. Analysis of the intron/exon junctions of the human versican gene. Exon sequences are represented by uppercase letters, while intron sequences are represented by lowercase letters, as determined by sequence analysis. Introns that do not interrupt codon triplets are indicated by codon phase 0; introns that interrupt a codon triplet after the first nucleotide are indicated by codon phase I; and introns that interrupt a codon triplet after the second nucleotide are indicated by codon phase II. Amino acids are numbered according to Dours-Zimmermann and Zimmermann (65). Intron sizes were determined by either direct sequencing (introns 4 and 14), amplification of intronic DNA within genomic clones by PCR utilizing cDNA primers, or estimation from restriction mapping analysis of genomic clones with overlapping genomic probes.
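The insert sizes of the promoter constructs follow from adding the 244 bp of exon 1 to the length of 5'-flanking sequence retained in each. The sketch below tabulates them; the names `EXON1_BP`, `FLANK_BP`, and `insert_size` are illustrative. The comparison with the stated PCR product sizes is an observation drawn from the numbers in the text, and reading the constant difference as vector and primer sequence is an assumption, not something the text states.

```python
EXON1_BP = 244  # portion of exon 1 present in every construct, per the text

# 5'-flanking sequence retained in each construct (bp), per the text.
FLANK_BP = {"VCAT(+)": 632, "Vdel 3": 445, "Vdel 2": 209, "Vdel 1": 30}

# PCR product sizes stated for the two constructs made by amplification (bp).
PCR_PRODUCT_BP = {"Vdel 3": 776, "Vdel 2": 540}

def insert_size(name):
    """Versican-derived insert length: exon 1 portion plus retained flank."""
    return EXON1_BP + FLANK_BP[name]

def extra_in_pcr_product(name):
    """Stated PCR product length minus insert length.

    Assumption: the difference reflects pBluescript sequence up to the T7
    primer and the added primer tail; the text does not break this down.
    """
    return PCR_PRODUCT_BP[name] - insert_size(name)

if __name__ == "__main__":
    for name in FLANK_BP:
        print(name, insert_size(name), "bp")
    for name in PCR_PRODUCT_BP:
        print(name, "PCR product exceeds insert by", extra_in_pcr_product(name), "bp")
```

The computed insert sizes for VCAT(+) and Vdel 1 match the 876-bp and 274-bp fragments named in the text, and the two PCR products exceed their inserts by the same amount.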

Transient Cell Transfections and CAT Assays-Transient transfections of HeLa cells and skin fibroblasts in suspension were performed by the calcium phosphate procedure as described previously (22). Briefly, cells were transfected with 20 µg of promoter/pUCCAT plasmid DNA, along with 10 µg of pSV-βgal plasmid DNA to normalize for transfection efficiency. The cells were incubated for 12 h at 37 °C in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum and 4 mM L-glutamine. After 12 h, the medium was changed, and the cells were incubated at 37 °C for an additional 48 h. Cells were then washed twice with Hanks' balanced salt solution (Ca²⁺- and Mg²⁺-free) and assayed for β-galactosidase according to standard procedures. The level of β-galactosidase activity was used to normalize the amount of cell extract used in the CAT assay. CAT assays were performed as described before (22), run on preslotted thin layer chromatography plates in a chloroform/methanol (95:5) mobile phase, and exposed to film. The resultant autoradiograms were then subjected to scanning laser densitometry for quantification of the results. Additional experimental details are provided in the text and in the legends to the figures.
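One way to read the normalization step is that each CAT assay receives a volume of extract containing the same amount of β-galactosidase activity. The sketch below illustrates that idea only; the function name `extract_volumes`, the sample names, the units, and all numbers are hypothetical and are not taken from the paper, whose exact procedure is given in reference 22.

```python
def extract_volumes(bgal_units_per_ul, target_units):
    """Volume (in µl) of each extract that contains `target_units` of
    beta-galactosidase activity, so that extracts with lower transfection
    efficiency contribute proportionally more volume to the CAT assay."""
    return {
        sample: target_units / activity
        for sample, activity in bgal_units_per_ul.items()
    }

if __name__ == "__main__":
    # Hypothetical beta-galactosidase activities (arbitrary units per µl).
    measured = {"sample_1": 2.0, "sample_2": 4.0, "sample_3": 8.0}
    for sample, volume in extract_volumes(measured, target_units=40.0).items():
        print(sample, volume, "µl")
```

With equal β-galactosidase input per assay, differences in CAT signal are then attributed to promoter activity rather than to transfection efficiency.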

RESULTS AND DISCUSSION

Isolation, Mapping, and Exon/Intron Organization of the Human Versican Gene-Approximately 2 × 10⁶ recombinant phage clones from three separate genomic libraries were screened under high stringency with probes spanning the entire length of the versican cDNA. Eleven clones were analyzed in detail using restriction endonuclease digestion, Southern blotting, and sequencing. These 11 human genomic DNA clones spanned the entire length of the versican gene including 5'- and 3'-flanking sequences (Fig. 1). In all, the versican gene contained 15 exons spanning more than 90 kb of genomic DNA, a conservative estimate inasmuch as some of the clones did not overlap. The exons ranged in size from 76 to 5262 bp, and all followed the AG/GT rule for splicing junctions (Fig. 2). With the exception of exon 1, which encoded the entire 5'-untranslated region, the exons of versican corresponded to discrete functional motifs in the core protein (Figs. 1 and 2). We will, therefore, discuss the exon organization of the human versican domains vis-à-vis that of the homologous protein domains.

FIG. 3. Comparison of the human versican gene organization with genes of the corresponding homologous proteins. A, comparison of the genomic organization of the versican hyaluronan-binding region with the genomic organization of other molecules containing hyaluronan-binding regions. The size of exon 4 of rat and chicken link protein include only coding sequences. B, comparison of the genomic organization of the versican selectin-like region with the genomic organization of other molecules containing selectin-like regions. Exons are represented by boxes with their number in the gene above and their size in bp below. Homologous exons share the same shading. CRP1 denotes the first of several CRP repeat encoding exons in the gene. Introns are not drawn to scale.
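The statement that every junction follows the AG/GT rule can be checked directly against the junction sequences listed in Fig. 2, where intron bases are lowercase and exon bases uppercase. This is a small sketch with hypothetical helper names; it assumes the donor printed as "AACAGqtaat" in the scanned figure is an OCR rendering of "AACAGgtaat".

```python
# Junction sequences from Fig. 2, in intron order 1-14.
# Uppercase = exon, lowercase = intron.
DONORS = [
    "CCAAGgtaat", "TAAAGgtgag", "GGATGgtaag", "GTCAGgtaag", "GGATGgtaag",
    "TAAACgtaag", "GACAGgtaag", "ACCAGgtaag", "ACTTGgtaag", "GCAAGgtaag",
    "TAATCgtatg", "CACTGgtaag", "AACAGgtaat",  # assumed correction of "qtaat"
    "GAACCgtaag",
]
ACCEPTORS = [
    "tctagGCCAA", "tctagTCAAA", "ggcagGGGTT", "cccagATATC", "tccagGTGAT",
    "tctagCTAAA", "ttcagGTCGA", "ggtagGACCT", "ttcagATTTT", "ctcagATACC",
    "cttagGTGTG", "tacagCAATA", "tttagTTGCT", "tgtagCATCT",
]

def intron_part(junction):
    """Keep only the lowercase (intronic) bases of a junction sequence."""
    return "".join(base for base in junction if base.islower())

def follows_gt_ag(donor, acceptor):
    """True when the intron starts with 'gt' and ends with 'ag'."""
    return intron_part(donor).startswith("gt") and intron_part(acceptor).endswith("ag")

if __name__ == "__main__":
    results = [follows_gt_ag(d, a) for d, a in zip(DONORS, ACCEPTORS)]
    print(sum(results), "of", len(results), "introns follow the GT/AG rule")
```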

Signal Peptide-Following exon 1, which encoded 284 bp of the 5'-untranslated region (see below), the versican gene contained a 76-bp exon encoding the signal peptide and the first few amino acids of the mature protein core. This exon was interrupted by a phase I intron and was very similar to exon 2 of rat aggrecan, which is 77 bp in size (11).
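The codon phases and interrupted residues reported in Fig. 2 can be cross-checked from the exon sizes alone. The sketch below starts from the junction after exon 2 (phase I within Val-24, as listed in Fig. 2) and adds each downstream exon length; the names `EXON_BP` and `junctions_from_exon_sizes` are illustrative, and for a phase 0 intron the function reports the last complete codon before the intron, which is an assumed reading of the "Lys/Gln-3245" entry.

```python
# Sizes (bp) of coding exons 3-14, from Fig. 2.
EXON_BP = {3: 375, 4: 175, 5: 128, 6: 294, 7: 2961, 8: 5262,
           9: 114, 10: 114, 11: 159, 12: 83, 13: 145, 14: 183}

def junctions_from_exon_sizes(exon_bp, start_codon, start_phase):
    """Return {exon: (codon number, phase)} for the intron following each exon.

    start_codon / start_phase describe the intron preceding the first exon
    in `exon_bp`: phase 1 means the codon is split after its first base.
    """
    # Coding nucleotides already used before the first listed exon.
    coding_nt = (start_codon - 1) * 3 + start_phase
    out = {}
    for exon in sorted(exon_bp):
        coding_nt += exon_bp[exon]
        phase = coding_nt % 3
        # A non-zero phase splits codon (coding_nt // 3 + 1); phase 0 falls
        # immediately after complete codon (coding_nt // 3).
        codon = coding_nt // 3 + (1 if phase else 0)
        out[exon] = (codon, phase)
    return out

if __name__ == "__main__":
    for exon, (codon, phase) in junctions_from_exon_sizes(EXON_BP, 24, 1).items():
        print(f"after exon {exon}: codon {codon}, phase {phase}")
```

Run with the Fig. 2 values, the computed positions agree with the residues listed in the figure, including the single phase II intron after exon 4 and the single phase 0 intron after exon 12.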

Domain I: The Hyaluronan-binding Region-Exons 3-6 coded for the amino-terminal hyaluronan-binding region of the first globular domain (G1). The corresponding protein module binds avidly to hyaluronan, thus establishing a strong functional correlation with cartilage aggrecan (3). Within this domain, there was an Ig-like fold encoded by exon 3 and interrupted by a phase I intron. This is quite different from the Ig-like repeats found in the human perlecan gene, where each repeat is encoded by two exons separated by a phase 0 intron (23). Although the repeats in the versican and perlecan genes are both flanked on either side by phase I introns, the presence of the phase 0 intron interrupting the perlecan repeats suggests that these two proteoglycans evolved separately and that the Ig-like repeat of versican is not of the neural cell adhesion molecule type. Also within this hyaluronan-binding domain of versican, there were two repeats of about 100 amino acid residues each, known to mediate binding to hyaluronan (24); the first was encoded by exons 4 and 5, which were interrupted by a phase II and a phase I intron, respectively, whereas the second was encoded by exon 6, which was interrupted by a phase I intron (Fig. 2). This pattern of exon sizing and intron phasing is interesting when compared with other genes encoding hyaluronan-binding motifs (Fig. 3A). For instance, rat (25) and chicken (26) link proteins, as well as rat aggrecan (11), contain all three motifs encoded by three exons, in which the exons encoding the Ig-like fold and the second tandem repeat are nearly identical in size to the corresponding exons in versican. However, the exon encoding the first repeat in these molecules (exon 3 in link proteins and exon 4 in aggrecan) is identical in size (303 bp) to the sum of versican's first repeat encoded by exons 4 and 5 (175 + 128 = 303). Also, the former exons are separated by phase I introns, similar to the introns separating versican's exons 3 and 4 and exons 5 and 6 (Fig. 2). Additionally, CD44 (27), which contains a region homologous to the first repeat of versican (exons 4 and 5), has this region encoded by two exons of similar size (166 and 134 bp, respectively) and separated by a phase II intron as in versican (Fig. 3A).

FIG. 4. Southern blot and schematic representation of clone VS5LF depicting the presence of an additional exon. Clone VS5LF was subjected to restriction enzyme digestion with SacI (lanes 1 and 4), EcoRI (not completely digested, lanes 2 and 5), or SacI/EcoRI (lanes 3 and 6) followed by agarose gel electrophoresis and Southern blotting of the resulting fragments. The restriction enzyme SacI was used to liberate the genomic insert from the phage arms. The membrane was then hybridized with a PCR-generated probe spanning bp 417 to 1149 of the versican cDNA (probe A) or with a PCR product amplified from a CRL1262 cDNA library using a sense primer complementary to the 3' end of exon 6 and the T7 primer of the phage vector (probe B). Probe A hybridized to a 2.1-kb EcoRI fragment originally identified to contain exon 6 (lanes 2 and 3), while probe B hybridized to 1.5 kb EcoRI and 5.5 kb EcoRI fragments (lanes 5 and 6). Sequence analysis of these two fragments and an additional 2.3-kb EcoRI/SacI fragment revealed a contiguous stretch of genomic DNA containing a 2961-bp exon identical to the alternatively spliced region recently identified (65). The top panel depicts schematically the organization of genomic clone VS5LF, its relation to the cDNA containing the alternatively spliced exon 7, and the position of the PCR probes used to analyze the Southern blot. Exons or cDNA are represented by filled boxes, and introns by thin lines. Homologous regions share the same shading. PCR-generated probes are represented by open boxes.

A model for the evolution of hyaluronan-binding regions has been recently proposed (28). According to this hypothesis, a primordial hyaluronan-binding module equivalent to either exons 4 and 5 or exon 6 of versican duplicated itself and thereby generated the hyaluronan-binding proteins we know today. This model suggests that CD44 and TSG-6, another hyaluronan-binding protein containing a tandem repeat similar to versican (29), diverged early in the evolution of these genes and therefore evolved separately from the hyaluronan-binding proteoglycans and link proteins (28). The latter molecules, by way of exon duplication, acquired two repeats of this motif and subsequently obtained an Ig-like motif through recombination with a phase I intron (28). However, the similarity between exons 4 and 5 of versican and exons 2 and 3 of CD44 with regard to the presence of a phase II intron separating them is in apparent disagreement with this model. It seems relatively unlikely that versican and CD44 could have evolved similar introns interrupting their repeats at similar locations during completely distinct evolutionary pathways. It will be interesting to determine if the genomic organization of the TSG-6 gene also has this motif encoded by two exons separated by a phase II intron.
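The exon-size and intron-phase comparison above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the usual convention that an intron's phase is the number of codon nucleotides preceding it (0, 1, or 2), so the phase after an exon equals the phase before it plus the exon length, modulo 3. The function and variable names are hypothetical; the exon sizes are those quoted in the text.

```python
def next_phase(entry_phase: int, exon_bp: int) -> int:
    """Phase of the intron that follows an exon, given the phase of the
    intron that precedes it (0, 1, or 2) and the exon length in bp."""
    return (entry_phase + exon_bp) % 3

# Exon sizes (bp) quoted in the text.
VERSICAN_EXON4_BP, VERSICAN_EXON5_BP = 175, 128
CD44_REPEAT_EXONS_BP = (166, 134)

# The intron between versican exons 3 and 4 is phase I.
phase_after_exon4 = next_phase(1, VERSICAN_EXON4_BP)
phase_after_exon5 = next_phase(phase_after_exon4, VERSICAN_EXON5_BP)

# Combined size of the two exons encoding the first tandem repeat.
versican_first_repeat_bp = VERSICAN_EXON4_BP + VERSICAN_EXON5_BP
cd44_repeat_bp = sum(CD44_REPEAT_EXONS_BP)

print(phase_after_exon4, phase_after_exon5)      # intron phases after exons 4 and 5
print(versican_first_repeat_bp, cd44_repeat_bp)  # combined repeat sizes in bp
```

Under this convention the computed phases agree with the phase II and phase I introns reported for exons 4 and 5, and the combined exon size matches the 303-bp single exon of link protein and aggrecan.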

Domain II: The GAG-α-binding Region. The presence of an alternatively spliced region between the hyaluronan-binding region and GAG-binding region of PG-M, a chicken proteoglycan similar to versican, led us to believe that this region may be contained in a separate exon in human versican. In an attempt to identify and isolate this exon, we first amplified a cDNA fragment from a human fibroblast cDNA library using a primer complementary to the 3' end of exon 6 and a primer present in the phage vector. The resulting fragment was not present in any part of the published human versican cDNA (2) but was identical to a 3-kb region present in an alternatively spliced form of human versican between the hyaluronan-binding region and the GAG-binding region that had significant similarity to the alternatively spliced region of PG-M (65). This fragment hybridized strongly to a 1.5-kb EcoRI fragment and a 5.5-kb EcoRI fragment from genomic clone VS5LF (Fig. 4, probe B) that was previously shown to contain exon 6 by hybridization with a probe covering the hyaluronan-binding region (Fig. 4, probe A). Cloning and sequencing of these fragments and an additional 2.3-kb EcoRI/SacI fragment revealed a contiguous stretch of genomic DNA containing an exon of 2961 bp (exon 7).

FIG. 5. Sequence analysis of the versican gene promoter and exon 1. All the cis-acting elements, including the TATA box, XRE, CCAAT-enhancer binding protein (C/EBP), AP2, SP1, CTF/CBF, CBP, and cyclic AMP-responsive element-binding protein (CREB), are underlined and in boldface. The TATA box was correctly predicted by the computer program FIND contained in the GCG package. The major transcription start site as determined by S1 nuclease protection assay is indicated by +1, whereas the site predicted by RNase protection assay is indicated by an asterisk. The sequence of exon 1 is shaded.

This region lacked cysteine residues and contained several serine-glycine dipeptides flanked by acidic amino acid residues that could be the attachment sites of chondroitin sulfate chains. The deduced protein moiety encoded by this exon would have a molecular mass of 108 kDa, and thus the total molecular mass of human versican could reach 365 kDa. Interestingly, exon 7 was flanked by phase I introns, a fact that supports the existence of alternative splicing variants of versican. The presence of this exon in the human versican gene, along with the overall strong similarity with PG-M and the presence of a single gene in both the avian (14) and human (19) genomes, supports the concept that PG-M is the avian version of mammalian versican.
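Two of the statements above lend themselves to a quick numerical check: that an exon flanked by introns of the same phase can be included or skipped without shifting the reading frame, and that a 2961-bp exon encodes roughly 108 kDa of protein. The following is a rough sketch, not the authors' calculation. The average residue mass of 110 Da is a common rule-of-thumb assumption and is not taken from the paper, and the names used are hypothetical.

```python
EXON7_BP = 2961
AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average mass per amino acid

def is_frame_preserving(exon_bp: int) -> bool:
    """An exon whose length is a multiple of 3 is flanked by introns of the
    same phase and can be spliced in or out without a frameshift."""
    return exon_bp % 3 == 0

# Number of codons contributed by exon 7 and the approximate protein mass.
exon7_codons = EXON7_BP // 3
exon7_mass_kda = exon7_codons * AVG_RESIDUE_MASS_DA / 1000.0

print(is_frame_preserving(EXON7_BP), exon7_codons, round(exon7_mass_kda, 1))
```

With these assumptions the estimate falls close to the 108 kDa quoted for the protein moiety encoded by exon 7; the glycosaminoglycan chains are not included in such an estimate.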

Domain III: The GAG-β-binding Region. This domain was encoded by a uniquely large, single exon of 5262 bp (exon 8), followed by an interrupting phase I intron. This domain contained numerous potential attachment sites for chondroitin sulfate side chains through the serine-glycine or glycine-serine residues (2). Similar but smaller exons have been found in the corresponding region of rat and human aggrecan (11); the former possesses an exon of 3741 bp, whereas the latter has an exon of 3942 bp (11). However, the rat and human aggrecan GAG attachment regions contain several repeating amino acid sequences that are absent in versican (11, 30), suggesting that these exons evolved separately. It has been proposed that one of the functions of introns is to limit the amplification of sequences in genes that cannot tolerate variations in size, such as the collagen genes (31). Consequently, genes that code for products in which the exact length or number of repeats is not important are likely to contain large exons (30, 31). Such exons would also allow for more efficient RNA processing (30). This may be the case for many members of the hyaluronan-binding proteoglycan gene family.

Domain IV: Homology to Selectins. The carboxyl-terminal region of the second globular domain (G3), which contains two EGF repeats, a lectin-like motif, and a CRP-like motif, was encoded by exons 9-14. Interestingly, these motifs are found in the family of leukocyte homing and cell adhesion molecules known as selectins (32). All the members of this family, which includes E-selectin (endothelial leukocyte adhesion molecule 1) (32, 33), L-selectin (Mel-14) (32, 34), and P-selectin (granule membrane protein 140) (32, 35), contain a lectin-like motif followed by an EGF-like repeat and a series of CRP-like motifs (32) (Fig. 3B). These motifs are thought to participate in cell-cell or cell-matrix interactions by binding carbohydrate moieties on glycoproteins or other glycosylated molecules (36). We have arbitrarily divided this domain into subdomains called IVa, IVb, and IVc because of the distinct structural properties each subdomain manifests, even though they more than likely act in concert with each other.

Subdomain IVa: The Epidermal Growth Factor-like Repeats. The two EGF-like repeats were encoded by two exons of identical size (114 bp each), exons 9 and 10, which were interrupted by phase I introns. Likewise, the EGF-like repeats in P-selectin (37) and E-selectin (33) are both 108 bp, whereas the EGF repeat in coagulation factor IX is 114 bp (38). All are separated by phase I introns. These EGF repeats are thought to mediate protein-protein interactions in proteins containing these motifs (39). Of interest is the observation that aggrecan, originally thought to have only one EGF-like repeat (12), has now been shown to contain two such repeats that can be differentially spliced (40, 41). The second EGF-like repeat of aggrecan contains amino acid residues that could mediate Ca2+ binding, whereas the first EGF-like repeat lacks these residues (40). Similarly, versican possesses the same Ca2+-binding residues in its second EGF-like repeat and lacks them in its first repeat. It is possible that the second repeat in versican and aggrecan binds Ca2+, which could be required for functional interaction with other proteins, whereas the first, Ca2+-independent EGF-like repeat, along with the lectin-like and CRP-like motifs, is required for determining the specificity of this interaction.

FIG. 6. Determination of the transcription start site of the human versican gene by RNase (A) and S1 nuclease (B) protection assays. A, to prepare the riboprobe for use in the RNase protection assay, an 876-bp EcoRI/PstI fragment from clone VS51 was ligated in front of the T7 promoter site of pBluescript. This fragment contained the first 220 bp of the cDNA and the first 656 bp of the 5'-flanking sequence. After transcription by T7 RNA polymerase in the presence of radiolabeled UTP, the 876-nucleotide riboprobe was hybridized to total RNA from human skin fibroblasts. After RNase treatment, the duplex RNA was analyzed on a 6% denaturing sequencing gel and revealed a protected fragment of 220 nucleotides (lane 2, arrowhead), indicating a transcription start site at about 40 bp distal to the TATA box. Hybridization with tRNA did not reveal any bands (lane 3). B, a 749-bp radiolabeled EcoRI/BglII fragment from VS51 was used for S1 nuclease protection assay. This fragment contained the first 93 bp of the cDNA and 656 bp of the 5'-flanking sequence. After hybridization to total RNA from human skin fibroblasts and digestion with S1 nuclease, the hybrid nucleic acids were analyzed on a denaturing 6% sequencing gel. The protected band of 117 bp (lane 2, arrowhead) indicated a transcription start site at about 16 bp distal to the TATA box. Hybridization to tRNA did not reveal any bands (lane 3). These results indicate that the major site for transcription initiation is about 16 bp downstream of the TATA box. The bottom panels are schematic representations of the probes used for the respective assays. Sizes are in base pairs, and an asterisk (*) indicates radiolabeled fragments. Lanes 1 in A and B represent DNA size markers and are in base pairs.

Interestingly, about 70% of aggrecan transcripts in human articular cartilage or Swarm chondrosarcoma cells lack either EGF-like repeat, whereas about 25% contain the Ca2+-independent repeat, 5% contain the Ca2+-dependent repeat, and only 1-1.5% have both repeats (40). Perhaps in cartilage there is less need for aggrecan to bind to the cell surface or extracellular matrix through its carboxyl end than there is for it to interact freely as a supramolecular complex with link protein and hyaluronan. As a result, the functional elements at the carboxyl end would be spliced out in order to prevent such interactions. This would aid in its role of hydrating tissues and maintaining resiliency. Versican, in contrast, contains both EGF-like repeats and therefore may be very important in binding to cells or the extracellular matrix through its carboxyl end and connecting them to hyaluronan.

Subdomain IVb: The Lectin-like Motif. The lectin-like motif was encoded by three distinct exons (exons 11-13) of 159, 83, and 145 bp, respectively. Exons 11 and 13 were interrupted by phase I introns, whereas exon 12 was interrupted by a phase 0 intron. This tri-exonic organization is identical to that found in aggrecan (11). In addition, there are a number of other proteins belonging to the C-type lectin family that bind Ca2+, including the Kupffer cell receptor and chicken hepatic lectin, that also have this genomic organization (42). A number of other proteins that contain C-type lectin motifs, including E-selectin and P-selectin, have such motifs encoded by a single exon (42). However, the size of this exon is nearly identical to the size of the 3-exon lectin motifs combined. Similarly, the phase of the introns flanking this exon is often the same as the phase of the outermost introns for the 3-exon lectin motifs (42). Analysis of the amino acids encoded by these exons revealed a number of highly conserved residues. These residues are probably very important for folding and/or carbohydrate recognition and binding. In combining genomic organization and sequence similarity, those molecules containing C-type lectin motifs can be divided into four distinct groups with proteoglycans, including aggrecan and versican, in group 1 (42). This implies that the lectin motifs in these molecules evolved from an ancestral lectin motif that duplicated before the exon shuffling/intron recombination event took place to produce the various groups of progenitors (42). The lectin-coding motif for versican is in agreement with this and therefore supports this grouping of related genes. Consequently, this could be supportive of a distinct carbohydrate-binding function for this domain in versican. Interestingly, an in vitro expressed carboxyl-terminal portion of human aggrecan has been shown to bind various sugar residues (43).
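The comparison between tri-exonic and single-exon lectin motifs rests on two quantities: the combined exon length and the phases of the outermost introns. As a hedged illustration, the sketch below summarizes a multi-exon module by those quantities, using the versican exon sizes given above. The helper name `module_signature` is hypothetical, and the entry phase of 1 reflects the phase I intron that follows exon 10.

```python
from typing import List, Tuple

def module_signature(entry_phase: int, exon_sizes_bp: List[int]) -> Tuple[int, int, List[int]]:
    """Summarize a run of exons as (total length, exit phase, phase after each exon)."""
    phases = []
    phase = entry_phase
    for size in exon_sizes_bp:
        phase = (phase + size) % 3  # phase of the intron following this exon
        phases.append(phase)
    return sum(exon_sizes_bp), phases[-1], phases

# Versican exons 11-13 (lectin-like motif), sizes from the text.
LECTIN_EXONS_BP = [159, 83, 145]
lectin_total_bp, lectin_exit_phase, lectin_phases = module_signature(1, LECTIN_EXONS_BP)

print(lectin_total_bp, lectin_exit_phase, lectin_phases)
```

A single exon of about the same total length, entered and exited in phase I, would be interchangeable with this three-exon module as far as the reading frame is concerned, which is the basis of the comparison with the selectin genes.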

Subdomain IVc: The Complement Regulatory Protein-like Motif. The CRP-like motif was encoded by a single exon (exon 14) of 183 bp interrupted by a phase I intron. This exon coded for a structural motif found in a number of complement regulatory proteins, namely C4b binding protein, factor H, and decay accelerating factor (2). This motif, which is usually repeated several times in the other proteins as a short consensus repeat (SCR), consists of about 60 amino acid residues with a series of highly conserved residues (44). The function of the CRPs is to interact with C3b and C4b and regulate the C3/C5 convertases (44). As a requirement, they must associate with their target molecules through protein-protein interactions, probably involving the SCR in this association. The SCR motifs are usually encoded by single exons of virtually identical size to the SCR in versican (45-47). In addition, they are also separated by phase I introns as in versican (45-47). E-selectin (33) and P-selectin (34) also contain several copies of these SCRs encoded by discrete exons of size and phase similar to versican's exon 14. Aggrecan also contains a copy of this CRP-like sequence encoded by a single exon of nearly identical size to that of versican (11). However, this exon in aggrecan apparently is subjected to differential splicing events (48, 49). Perhaps, as with the EGF-like repeats, this motif may be important in determining the specificity of the lectin-like interactions by contacting the protein portion of a glycoprotein. Detailed biochemical and protein chemical studies, along with a precise modeling analysis, must be performed in order to determine the precise function(s) of this domain. Likewise, isolation of the receptor or ligand for this domain would greatly accelerate the elucidation of its function as well as the nature of its interactions.

The Carboxyl-terminal End and 3'-Untranslated Region. Exon 15 encoded 42 amino acid residues of the carboxyl terminus, the translation stop codon, and the 3'-untranslated region. We sequenced several kb of the 3'-flanking region and found three canonical polyadenylation signals, AAUAAA (not shown). The first was located about 510 bp from the 5' end of exon 15 as was reported in the original cDNA (2). The other two were clustered together at about 1 kb distal from the first. The significance of these different signals is not known, but analysis of the sequence separating the first and second polyadenylation signals revealed three repeats of the sequence ATTTA (not shown), which have been shown to regulate mRNA stability in other genes (50).
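Locating polyadenylation signals and ATTTA repeats is a simple motif search. The sketch below shows one way to do it, assuming a DNA-sense sequence (so AAUAAA appears as AATAAA). The sequence `TOY_3P_REGION` is a made-up toy string built for illustration and is not the versican 3'-flanking sequence, which is not shown in the paper; the function name is likewise hypothetical.

```python
import re
from typing import List

def find_motif(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of every occurrence of motif,
    including overlapping ones (zero-width lookahead)."""
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical toy sequence: two poly(A) signals with ATTTA repeats between them.
TOY_3P_REGION = "GGC" + "AATAAA" + "GCT" + "ATTTATTTA" + "CCG" + "ATTTA" + "GG" + "AATAAA" + "TT"

polya_sites = find_motif(TOY_3P_REGION, "AATAAA")
attta_sites = find_motif(TOY_3P_REGION, "ATTTA")

# Keep only the ATTTA repeats lying between the first and second signals.
between = [p for p in attta_sites if polya_sites[0] < p < polya_sites[1]]
print(polya_sites, attta_sites, len(between))
```

The lookahead pattern matters here because ATTTA repeats can overlap (ATTTATTTA contains two), and a plain non-overlapping search would undercount them.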

Determination of the Transcription Start Site by RNase and S1 Nuclease Protection Assay. The published versican cDNA sequence includes 260 bp of 5'-untranslated region (2), which we found to be encoded by exon 1. Sequence analysis of the 5'-flanking region revealed a canonical TATA box 40 bp upstream from the published 5' end (Fig. 5), which could serve as a site of transcription initiation. To establish the transcription start site(s), RNase and S1 nuclease protection assays were performed using total RNA from human skin fibroblasts (Fig. 6, A and B). After RNase protection with an 876-nucleotide riboprobe containing the first 220 bp of the versican cDNA and 656 bp of the 5'-flanking region, gel analysis revealed a protected fragment of about 220 nucleotides, which placed the transcription start site approximately 40 bp downstream from the TATA box (Fig. 6A, lane 2). In contrast, no protected fragment was observed when tRNA was used as the template (Fig. 6A, lane 3). This analysis was repeated three times, and a major band was always observed at around 220 bp; a weaker albeit constant band at 240 bp was also seen. It is important to note that because it is well established that RNA migrates slower than DNA in urea/polyacrylamide gels, this size may actually be smaller. On the other hand, RNA:RNA hybrids are also very sensitive to digestion time, enzyme concentration, and base composition, which could produce smaller fragments than expected. To confirm this topology, an S1 nuclease protection assay was also performed using a 749-bp genomic fragment that included the first 93 bp of the versican cDNA and the first 656 bp of the 5'-flanking region. After S1 digestion and gel analysis, the protected fragment was 117 bp long (Fig. 6B, lane 2) with an estimated transcription start site about 16 bp downstream of the TATA box. Taken together, the results predict the likely site of transcription initiation to be about 16 bp downstream of the TATA box.

FIG. 7. Functional activity of the human versican promoter in transient cell transfection assays. An 876-bp EcoRI/PstI fragment from VS51 including 632 bp of the 5'-flanking region and 244 bp of exon 1 (represented schematically) was ligated directly 5' of the CAT gene. The construct was transfected into either HeLa cells or IMR-90 embryonic lung fibroblasts. CAT activity was normalized by β-galactosidase cotransfection and assayed by thin layer chromatography. The results of the versican promoter CAT activity (lane 3 for HeLa cells; lane 6 for IMR-90 cells) are compared with SV40 promoter CAT activity (lane 1 for HeLa; lane 4 for IMR-90) and promoterless CAT activity (lane 2 for HeLa; lane 5 for IMR-90). A schematic of the versican promoter/CAT construct is indicated at the bottom of the figure with symbols representing potential regulatory elements, which are identified.
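The start-site estimates from the two protection assays reduce to simple arithmetic on probe composition and protected-fragment length. The sketch below reproduces that reasoning under simplifying assumptions: the TATA box is treated as a single point 40 bp upstream of the published cDNA 5' end, and gel-based size estimates are taken at face value. The function name and constant are hypothetical.

```python
TATA_TO_CDNA_5P_BP = 40  # TATA box lies 40 bp upstream of the published cDNA 5' end

def start_site_downstream_of_tata(protected_bp: int, cdna_bp_in_probe: int) -> int:
    """Distance (bp) from the TATA box to the inferred transcription start site.

    Any protection beyond the cDNA portion of the probe extends the
    transcript upstream of the published cDNA 5' end."""
    upstream_extension = protected_bp - cdna_bp_in_probe
    return TATA_TO_CDNA_5P_BP - upstream_extension

# Probe composition: cDNA portion plus 656 bp of 5'-flanking sequence.
rnase_probe_bp = 220 + 656
s1_probe_bp = 93 + 656

rnase_estimate = start_site_downstream_of_tata(protected_bp=220, cdna_bp_in_probe=220)
s1_estimate = start_site_downstream_of_tata(protected_bp=117, cdna_bp_in_probe=93)

print(rnase_probe_bp, s1_probe_bp, rnase_estimate, s1_estimate)
```

The two probe lengths come out at 876 and 749, matching the fragment sizes given in the text, and the two estimates correspond to the roughly 40-bp and 16-bp positions downstream of the TATA box discussed above.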

The 5'-Flanking Region of the Human Versican Gene, a Complex Promoter Active in Transient Transfection Assays. Given the results presented above on the site of transcription initiation and the presence of putative cis-acting elements, we wished to determine whether this region could act as a functional promoter in transient cell transfection assays. To this end, an 876-bp genomic fragment containing 244 bp of exon 1 (basically all untranslated region) and 632 bp of the 5'-flanking region was cloned upstream from a CAT reporter gene. This construct was transfected into HeLa cells or IMR-90 embryonic lung fibroblasts along with a β-galactosidase reporter plasmid driven by the SV40 promoter, which was used for normalization of transfection efficiency. As a positive control, we cotransfected the SV40-CAT construct along with the β-galactosidase plasmid (22). After normalization based on β-galactosidase activity, the versican promoter construct exhibited considerable CAT activity in HeLa cells (Fig. 7, lane 3) when compared with a promoterless construct (Fig. 7, lane 2) or the SV40-driven CAT construct (Fig. 7, lane 1). Because versican expression has not been established in HeLa cells, the same experiment was carried out in the embryonic fibroblast cell line IMR-90, the mRNA of which was originally used to clone versican (1, 2). Similar to the HeLa cells, the versican CAT construct exhibited high promoter activity (Fig. 7, lane 6) when compared with the promoterless construct (Fig. 7, lane 5) or the SV40 CAT construct (Fig. 7, lane 4). The results clearly showed that the 5'-flanking region of the versican gene could act as a functional promoter and contained the cis-acting elements necessary for driving the expression of the versican gene in two different cell types derived from epithelial (HeLa) or mesenchymal (IMR-90) tissues.

FIG. 8. Deletion constructs of the versican gene promoter and summary of CAT activity. Schematic representation of the 5' stepwise deletion constructs used to test the functional activity of the versican promoter in transient transfection assays and relative CAT activity of each construct are shown. Numbers on the left of each construct indicate the 5' end of the promoter fragment relative to the transcription start site (+1). Each construct is fused to the CAT gene at position +244 relative to the transcription start site in exon 1 (shaded box). The regulatory motifs in the versican promoter are schematically represented at the top by various symbols, which are identified. HeLa cells were cotransfected with the various versican promoter CAT constructs and SV40-β-galactosidase plasmid. CAT activity was assayed as described in the text and is expressed as a percentage relative to the maximum CAT activity produced by the Vdel3 construct (-445). The value represents the normalized mean of four different experiments. Error bars indicate standard deviation.

Deletion Analysis of the Versican Promoter. Close analysis of the 5'-flanking sequence revealed a clustering of AP2-binding sites upstream of the TATA box (Figs. 5 and 7). Deletion analysis of this region revealed that these AP2 sites may act as an enhancer, as their presence dramatically increased the level of versican promoter activity in HeLa cells (Fig. 8, Vdel3), as compared with promoter activity lacking all or most of these AP2 sites (Fig. 8, Vdel1 and Vdel2). However, a versican CAT construct in the opposite orientation showed no CAT activity at all (Fig. 8, VCAT(-)). This signifies that the correct orientation of the TATA box is important for the regulated expression of versican. AP2 is developmentally regulated and can bind to methylated DNA (51, 52), but its role in versican expression is not known. In the developing mouse, AP2 is concentrated in neural crest cells and its derivatives, as well as in surface ectoderm (51). This is consistent with its role in the regulation of genes involved in the morphogenesis of the peripheral nervous system, the face, limbs, skin, and nephric tissues (51). Chondroitin sulfate proteoglycans, including versican, have been shown to be developmentally regulated in the central nervous system (8, 53), as well as in the precartilaginous mesenchyme (8). This regulation could be a direct result of AP2 transcriptional activity. AP2 has also been shown to be involved in epidermal gene expression (54), and there is extensive evidence for versican expression in both the dermis and epidermis of normal human skin (4). The deletion analysis also revealed the presence of a possible negative regulatory element between -445 and -632 bp as versican promoter constructs containing this region have markedly reduced promoter activity in HeLa cells (Fig. 8, VCAT(+)) as compared with a construct without it (Fig. 8, Vdel3). Present in this region is a binding site for xenobiotic responsive element (XRE)-binding factor. A repressor protein, unrelated to XRE-binding factor, has been shown to bind to XREs and down-regulate P450 levels in fibroblasts after treatment with polycyclic aromatic hydrocarbons (55). Such a factor could be involved in the down-regulation of versican expression through this XRE. Negative and positive cis-acting elements have been identified in the promoter of the mouse serglycin gene (56) that appear to be important for its cell-specific expression. Such elements in the versican promoter may also regulate its cell-specific expression.
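The relative promoter activities discussed above come from a two-step normalization: each CAT reading is divided by the cotransfected β-galactosidase reading, and the result is expressed as a percentage of the most active construct. The sketch below illustrates the calculation only. All numeric readings are invented placeholders, not data from this study, and the function and variable names are hypothetical.

```python
from typing import Dict

def relative_promoter_activity(cat: Dict[str, float], bgal: Dict[str, float]) -> Dict[str, float]:
    """Normalize CAT activity to beta-galactosidase activity for each construct,
    then express each value as a percentage of the highest normalized activity."""
    normalized = {name: cat[name] / bgal[name] for name in cat}
    highest = max(normalized.values())
    return {name: 100.0 * value / highest for name, value in normalized.items()}

# Hypothetical raw readings (arbitrary units); NOT measurements from the paper.
cat_readings = {"VCAT(+)": 60.0, "Vdel3": 50.0, "Vdel2": 20.0, "Vdel1": 5.0}
bgal_readings = {"VCAT(+)": 2.0, "Vdel3": 0.5, "Vdel2": 1.0, "Vdel1": 1.0}

relative = relative_promoter_activity(cat_readings, bgal_readings)
for construct, percent in relative.items():
    print(f"{construct}: {percent:.0f}%")
```

The placeholder readings were chosen so that a construct with a high raw CAT signal but high transfection efficiency ends up below one with a lower raw signal, which is the reason for normalizing before comparing deletion constructs.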

Also present in the 5'-flanking region of the human versican gene were binding sites for CCAAT-binding transcription factor, SP1, CCAAT enhancer-binding protein, and cyclic AMP-responsive element-binding protein. CCAAT-binding transcription factor has been shown to be involved in the transforming growth factor β activation of the type I plasminogen activator inhibitor gene (57). It is possible that this element also contributes to the observed up-regulation of versican expression in response to transforming growth factor β (5, 58). An SP1-binding site was also present in the promoter region. This transcription factor is ubiquitously expressed during development (59), and, although it is thought to be responsible for the transcription of a number of housekeeping genes (59), it may also play an important role in regulating cellular processes during differentiation (59). A liver-specific DNA-binding protein, EBP20, has been found to recognize the CCAAT enhancer-binding protein site in the promoters of a number of other genes, including transthyretin, albumin, and α1-antitrypsin (60, 61). This protein may participate in the regulated expression of versican. Finally, cyclic AMP-responsive element-binding protein has been shown to be developmentally regulated in rat oligodendrocytes; peak expression is seen just preceding myelinogenesis at postnatal day 14 (62, 63). Interestingly, versican expression in rat spinal cord also precedes myelinogenesis on postnatal day 10 (8). In addition, versican mRNA expression in rat brain also appears most prominently on postnatal days 3, 12, and 20 (10). Versican expression has also been shown to be enhanced by platelet-derived growth factor (5) and inhibited by interleukin 10 (64). It will be interesting to establish whether any or all of these trans- and cis-acting elements are involved in the regulation of the human versican gene in response to growth factors and cytokines.

Conclusions. We have analyzed the genomic organization of the human versican gene and characterized the functional activity of its promoter. The results indicate that the exons encoding versican correspond to discrete functional motifs in the protein and that this organization is shared by homologous proteins including aggrecan, link protein, CD44, P-selectin, and E-selectin. The similarities between the versican gene and genes of homologous proteins suggest that versican may have evolved from a primordial gene through the process of exon shuffling and intron recombination. As a result, the various domains of versican may have a function analogous to other proteins harboring similar motifs and sharing the same genomic organization. There was also an additional exon not present in the original cDNA that was identical to a spliced variant of human versican and similar to the alternatively spliced region of avian PG-M. We also found that the 5'- and 3'-untranslated regions appear to contain sequences necessary for the regulated expression of versican. There is one major transcription initiation site, and the 5'-flanking region expresses functional promoter activity in transient transfection assays using cells derived from epithelial or mesenchymal tissues. The promoter has potential binding sites for various transcription factors that could alter its expression in various physiological and pathological conditions. Overall, our results provide insight into the evolution of this gene and allow further exploration into the function and regulation of versican.

Acknowledgments. We are grateful to C. C. Clark and K. Danielson for their critical reading of the manuscript. We thank all of the members of Dr. Iozzo's laboratory for their advice and Dr. E. Ruoslahti for providing the versican cDNAs used in the original screening.

REFERENCES

1. Krusius, T., Gehlsen, K. R., and Ruoslahti, E. (1987) J. Biol. Chem. 262, 13120-13125
2. Zimmermann, D. R., and Ruoslahti, E. (1989) EMBO J. 8, 2975-2981
3. Lebaron, R. G., Zimmermann, D. R., and Ruoslahti, E. (1992) J. Biol. Chem. 267, 10003-10010
4. Zimmermann, D. R., Dours-Zimmermann, M. T., Schubert, M., and Bruckner-Tuderman, L. (1994) J. Cell Biol. 124, 817-825
5. Schonherr, E., Jarvelainen, H. T., Sandell, L. J., and Wight, T. N. (1991) J. Biol. Chem. 266, 17640-17647
6. Tan, E. M. L., Dodge, G. R., Sorger, T., Kovalszky, I., Unger, G. A., Yang, L., Levine, E. M., and Iozzo, R. V. (1991) Lab. Invest. 64, 474-482
7. Yao, L. Y., Moody, C., Schonherr, E., Wight, T. N., and Sandell, L. J. (1994) Matrix Biol. 14, 213-225
8. Bignami, A., Perides, G., and Rahemtulla, F. (1993) J. Neurosci. Res. 34, 97-106
9. Tona, A., Perides, G., Rahemtulla, F., and Dahl, D. (1993) J. Histochem. Cytochem. 41, 593-599
10. Crawford, T. J., Melhado, I. G., and Jirik, F. R. (1993) Dev. Brain Res. 76, 264-267
11. Doege, K., Sasaki, M., and Yamada, Y. (1990) Biochem. Soc. Trans. 18, 200-202
12. Doege, K. J., Sasaki, M., Kimura, T., and Yamada, Y. (1991) J. Biol. Chem. 266, 894-902
13. Rauch, U., Karthikeyan, L., Maurel, P., Margolis, R. U., and Margolis, R. K. (1992) J. Biol. Chem. 267, 19536-19547
14. Shinomura, T., Nishida, Y., Ito, K., and Kimata, K. (1993) J. Biol. Chem. 268, 14461-14469
15. Yamada, H., Watanabe, K., Shimonaka, M., and Yamaguchi, Y. (1994) J. Biol. Chem. 269, 10119-10126
16. Esko, J. D. (1991) Curr. Opin. Cell Biol. 3, 805-816
17. Gallagher, J. T. (1989) Curr. Opin. Cell Biol. 1, 1201-1218
18. Adany, R., and Iozzo, R. V. (1990) Biochem. Biophys. Res. Commun. 171, 1402-1413
19. Iozzo, R. V., Naso, M. F., Cannizzaro, L. A., Wasmuth, J. J., and McPherson, J. D. (1992) Genomics 14, 845-851
20. Feinberg, A. P., and Vogelstein, B. (1984) Anal. Biochem. 137, 266-267
21. Dodge, G. R., Kovalszky, I., Chu, M. L., Hassell, J. R., McBride, O. W., Yi, H. F., and Iozzo, R. V. (1991) Genomics 10, 673-680
22. Santra, M., Danielson, K. G., and Iozzo, R. V. (1994) J. Biol. Chem. 269, 579-587
23. Cohen, I. R., Grassel, S., Murdoch, A. D., and Iozzo, R. V. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 10404-10408
24. Goetinck, P. F., Stirpe, N. S., Tsonis, P. A., and Carlone, D. (1987) J. Cell Biol. 105, 2403-2408
25. Rhodes, C., Doege, K., Sasaki, M., and Yamada, Y. (1988) J. Biol. Chem. 263, 6063-6067
26. Kiss, I., Deak, F., Mestric, S., Delius, H., Soos, J., Dekany, K., Argraves, W. S., Sparks, K. J., and Goetinck, P. F. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 6399-6403
27. Screaton, G. R., Bell, M. V., Jackson, D. G., Cornelis, F. B., Gerth, U., and Bell, J. I. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 12160-12164
28. Barta, E., Deak, F., and Kiss, I. (1993) Biochem. J. 292, 947-949
29. Lee, T. H., Wisniewski, H.-G., and Vilcek, J. (1992) J. Cell Biol. 116, 545-558
30. Upholt, W. B., Chandrasekaran, L., and Tanzer, M. L. (1993) Experientia 49,

31. Alexander, F., Young, P. R., and Tilghman, S. M. (1984) J. Mol. Biol. 173,

32. Bevilacqua, M. E, and Nelson, R. M. (1993) J. Clin. Znuest. 91, 379-387 33. Collins, T., Williams, A,, Johnston, G. I., Kim, J., Eddy, R., Shows, T.,

Gimbrone, M. A,, Jr., and Bevilacqua, M. P. (1991) J. Bid . Chem. 266, 24662473

34. Lasky, L. A,, Singer, M. S., Yednock, T. A,, Dowbenko, D., Fennie, C., Rodriquez, H., Nguyen, T., Stachel, S., and Rosen, S. (1989) Cell 66, 1045- 1055

384-392

159-176

35. Johnston, G. I., Cook, R. G., and McEver, R. P. (1989) Cell 66, 1033-1044 36. Stoolman, L. M. (1989) Cell 66,907-910 37. Johnston, G. I., Bliss, G. A,, Newman, P. J., and McEver, R. P. (1990) J. Biol.

38. Yoshitake, S., Schach, B. G., Foster, D. C., Davie, E. W., and Kurachi, K. (1985)

39. Appella, E., Weber, I. T., and Blasi, E (1988) FEBS Lett. 291, 1-4 40. Fulop, C., Walcz, E., Valyon, M., and Glant, T. T. (1993) J. Biol. Chem. 268,

41. Baldwin, C. T., Reginato, A. M., and Prockop, D. J . (1989) J. Biol. Chem. 264,

42. Besouska, K, Crichlow, G. V, Rose, J. M., Taylor, M. E., and Drickamer, K.

43. Halberg, D. F., Proulx, G., Doege, K., Yamada, Y., and Drickamer, K (1988)

44. Muller-Eberhard, H. J. (1988)Annu. Reu. Biochem. 67,321347 45. Hillarp, A., Pardo-Manuel, F., Ruiz, R. R., Rodriquez de Cordoba, S., and

46. Rodriquez de Cordoba, S., Sanchez-Corral, P., and Rey-campos, J. (1991) Dahlback, B. (1993) J. B i d . Chem. 268,15017-15023

47. Post, T. W., Arce, M. A,, Liszewski, M. K., Thompson, E. S., Atkinson, J. P., and J. Exp. Med. 173,1073-1082

48. Grover, J., and Roughley, P. J. (1993) Biochem. J. 291,361367 Lublin, D. M. (1990) J. Immunol. 144,740-744

49. Boyd, C. D., Pierce, R. A., Schwarzbauer, J. E., Doege, K., and Sandell, L. J.

50. Peltz, W. S., and Jacobson, A. (1992) Cum Opin. Cell Bid. 4,979-983 51. Mitchell, P. J., Timmons, P. M., Hebert, J. M., Rigby, P. S. J., and Tijian, R.

53. Kruecer, R. C., Jr., Hennig, A. K., and Schwartz, N. B. (1992) J . Biol. Chem. 52. Comb, M., and Goodman, H. M. (1990) Nucleic Acids Res. 18, 3975-3982

Chem. 266,21381-21385

Biochemistry 24,3736-3750

17377-17383

15747-15750

(1991) J. Biol. Chem. 266,11604-11609

J. Biol. Chem. 263,9486-9490

(1993) Matrix 13,457-469

(1991) Genes & Deu. 6,105-119

267, 12149-12161

7948-7952 54. Leask, A., Byrne, C., and Fuchs, E. (1991) Proc. Natl. Acad. Sci. U. S. A. 88,

55. Gradin, K., Wilhelmsson, A,, Poellinger, L., and Berghard, A. (1993) J. Biol.

56. Avraham, S., Avraham, H., Austen, K. F., and Stevens, R. L. (1992) J. Biol.

57. Riccio, A,, Pedone, P. V., Lund, L. R., Olesen, T., Olesen, H. S., and Andreasen,

58. KahBri, V M., Larjava, H., and Uitto, J. (1991) J. Biol. Chem. 266, 10608-

59. SafTer, J. D., Jackson, S. P., and Annarella, M. B. (1991) Mol. Cell. Bid. 11,

Chern. 268,4061-4068

Chem. 267,610417

P. A. (1992) Mol. Cell. Biol. 12, 18461855

10615

2189-2199 60. Costa, R. H., Grayson, D. R., Xanthopouls, G. K., and Darnell, J . E. (1988)

61. Johnson, P. F., Lanschulz, W. H., Graves, B. J., and McKnight, S. L. (1987) Proc. Natl. Acad. Sci. U. S. A. 86, 3840-3844

Genes & Deu. 1, 133-146 62. Sato-Bigbee, C., and Yu, R. K. (1993) J. Neurochem. Bo, 21062110 63. Herdexen, T., Fiallos-Esteada, C., Schmid, W., Bravo, R., and Zimmerman, M.

64. Qwarnstrom, E. E., Jarvelainen, H. T., KinseIla, M. G., Ostberg, C. O., (1992) Neurosci. Lett. 142, 57-61

Sandell, L. J., Page, R. C., and Wight, T. N. (1993) Biochem. J. 294, 613420

65. Dours-Zimmermann, M. T., and Zimmermann, D. R. (1994)J. Biol. Chem. 269, 32992-32998