genomic organization of human surfactant protein d - journal of



Page 1: Genomic Organization of Human Surfactant Protein D - Journal of

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1993 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 268, No. 4, Issue of February 5, pp. 2976-2983, 1993 Printed in U.S.A.

Genomic Organization of Human Surfactant Protein D (SP-D) SP-D IS ENCODED ON CHROMOSOME 10q22.2-23.1*

(Received for publication, August 28, 1992)

Edmond Crouch‡§, Kevin Rust‡, Rosalie Veile¶, Helen Donis-Keller¶, and Leonard Grosso‡ From the ‡Department of Pathology, Jewish Hospital at Washington University Medical Center and the ¶Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110

Surfactant protein D (SP-D) is a member of the family of mammalian C-type lectins. SP-D is secreted into the pulmonary airspaces by lung epithelial cells and is believed to contribute to the lung’s defense against inhaled microorganisms. We have previously characterized cDNAs specific for human SP-D (hSP-D). We now describe the partial characterization of genomic clones for hSP-D and present evidence for an SP-D gene with coding sequences spanning >11 kilobases on the long arm of chromosome 10. Genomic sequencing demonstrated that the signal peptide/amino-terminal domain, the carbohydrate recognition domain, and the linking sequence between the collagen domain and carbohydrate recognition domain are each encoded by a single exon, as for surfactant protein A and the mannose-binding protein C. However, sequencing also demonstrated a unique intron-exon structure for the collagen domain, which is encoded on five exons, including four tandem exons of 117 bp. The latter exons show marked conservation in the predicted distribution of hydrophilic amino acids, consistent with tandem replication of this collagen gene sequence during evolution. Segregation analysis of HindIII digests of genomic DNA using specific cDNA probes demonstrated selective hybridization of radiolabeled hSP-D cDNA to chromosome 10- and 10q-containing human/hamster somatic hybrids. The presence of SP-D gene sequences was confirmed by DNA amplification using oligomers specific for sequences within the collagen domain of the hSP-D gene. Fluorescence in situ hybridization of metaphase chromosomes using genomic probes gave selective labeling of 10q22.2-23.1. We speculate that SP-D is encoded at a locus on 10q that includes the genes for surfactant protein A.

The insoluble components of alveolar lining material contain at least four genetically different pneumocyte-derived proteins. These include surfactant protein A (SP-A¹), a collagenous glycoprotein with carbohydrate and lipid binding properties, and two highly hydrophobic proteins, referred to as SP-B and SP-C (Hawgood, 1989; Weaver and Whitsett, 1991). We have recently described an additional protein designated surfactant protein D (SP-D) (Persson et al., 1988, 1989, 1990; Crouch et al., 1991a, 1991b, 1991c).

* This work was supported by Program Project Grant HL-29594 and National Institutes of Health Grants HL-44015 and HG-00304. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) L05483-L05485.

§ To whom correspondence should be addressed: Dept. of Pathology, Jewish Hospital, 216 S. Kingshighway, St. Louis, MO 63110. Tel.: 314-454-8462.

¹ The abbreviations used are: SP, surfactant protein; hSP, human surfactant protein; kb, kilobase pair(s); CRD, carbohydrate recognition domain; bp, base pair(s); MBP, mannose-binding protein; UTR, untranslated region.

SP-D is a collagenous glycoprotein (43 kDa, reduced) that is structurally and functionally related to other members of the collagenous C-type lectin family (Persson et al., 1990; Lu et al., 1992; Rust et al., 1991; Shimizu et al., 1992). This family includes SP-A, the major surfactant protein, as well as serum conglutinin and the serum mannose-binding proteins (MBP) (Drickamer, 1988, 1989; Thiel and Reid, 1989). These proteins, like SP-D, are multivalent carbohydrate-binding proteins that consist of multimers of trimeric subunits composed of homologous peptide chains. Each of the constituent chains is characterized by a short amino-terminal non-collagenous domain that participates in interchain disulfide bond formation, a collagenous domain ((Gly-X-Y)n), a short amphipathic linking or connecting sequence, and a highly conserved carboxyl-terminal C-type carbohydrate-binding domain (CRD) that contains intra- but not interchain disulfide bonds. Unlike SP-A and MBP, which have short and interrupted collagen domains, both rat and human SP-D possess comparatively long collagen domains composed of 59 consecutive -Gly-X-Y- triplets. The collagenous lectins also show differences in supramolecular organization, as well as apparent ligand specificity.

Recent studies indicate that SP-D can specifically interact with a variety of microorganisms (Kuan et al., 1992) and can specifically bind to the surface of alveolar macrophages, suggesting that it may function as a pulmonary host defense protein. In addition, the marked structural homology of this protein with the major surfactant protein (SP-A), and its capacity to associate with certain classes of surfactant phospholipids in vitro,² suggest that SP-D could participate in the extracellular reorganization or turnover of pulmonary surfactant.

Previous studies have localized the genes for human SP-A and MBP to the long arm of chromosome 10 (Bruns et al., 1987; Fisher et al., 1987; Sastry et al., 1991). In this paper, we describe the characterization of genomic clones for human SP-D and the use of somatic-cell and fluorescence in situ hybridization assays to establish the chromosomal localization of sequences coding for human SP-D.

MATERIALS AND METHODS

SP-D cDNA Probes-Complementary DNAs to human SP-D were cloned and characterized as previously described (Rust et al., 1991). For purposes of blotting, a truncated fragment of the largest cDNA clone was prepared by digesting H13 cDNA (subcloned into pGEM3Z, Promega, Madison, WI) with the restriction enzymes EcoRI and BglII. An 886-base pair fragment (H13-886) was isolated on a 1.5% agarose gel, excised, and purified with glass beads (Gene Clean II, Bio 101, La Jolla, CA). The resultant cDNA contained most of the available coding sequence of human SP-D, including the majority of the collagen sequence and most of the 3' carbohydrate-binding domain, terminating at a BglII site 96 base pairs from the 3' end of the coding sequence. The cDNA insert was characterized by Northern and Southern hybridizations and DNA sequencing as previously described (Rust et al., 1991). In particular, the radiolabeled insert showed specific hybridization to SP-D mRNA in Northern blots of human lung RNA.

² Persson, A., Gibbons, B., Shoemaker, J. D., Moxley, M., and Longmore, W. (1992) Biochemistry, manuscript in press.

Isolation and Subcloning of Genomic Clones-Human genomic clones were selected from a library derived from human placenta (Stratagene, La Jolla, CA). The library was titered and screened according to the manufacturer's protocol. The nitrocellulose lifts were hybridized overnight with 4 × 10⁸ cpm of nick-translated H13-886 cDNA and washed, with the final high stringency wash in 0.1 × SSC, 0.5% SDS at 60 °C. The air-dried filters were exposed to Kodak X-OMAT AR film overnight at -80 °C with an intensifying screen. Selected positive clones were purified to uniformity. Two such clones, designated H5 and H6, were subjected to restriction digestion with either EcoRI or SacI (Promega, Madison, WI). The restricted DNA was extracted with phenol-chloroform, ethanol precipitated in the presence of 0.3 M sodium acetate, redissolved in distilled water, and ligated into the plasmid vector pGEM3Z, which had been predigested with the appropriate restriction enzyme. Subcloned genomic inserts were initially characterized by Southern blotting with specific oligomers. Southern blotting and sequencing were performed as previously described (Rust et al., 1991).

Radiolabeling of cDNA and Oligomeric Probes-The gel-purified H13-886 insert was labeled for blotting using the random priming method and extension with Klenow (Boehringer Mannheim) to incorporate [α-³²P]dCTP (ICN, Irvine, CA) into newly formed chains. Labeled cDNAs were purified on P30 (Bio-Rad) columns before use. The specific activity of the labeled probe ranged from 5 × 10[?] to 1.2 × 10⁹ cpm/µg DNA.

Synthetic oligomers were also used in some of the blotting experiments. The oligomers were end labeled by incubation with 1-2 units of T4 polynucleotide kinase (Promega) in the recommended buffer and in the presence of [γ-³²P]ATP (ICN). The labeling reactions were terminated after 30 min by heating to 70 °C. Unincorporated label was removed using a P30 column. The specific activity ranged from 5 × 10[?] to 2 × 10[?] cpm/µg.

Somatic Cell Hybrids-Southern blots of restriction digests of human and hamster genomic DNA and of genomic DNA from a panel of hamster/human somatic cell hybrids were obtained from BIOS Corporation (New Haven, CT) (Table I). Genomic DNA from selected somatic hybrids was also provided by BIOS Corporation. DNA from a hybrid containing the long arm of human chromosome 10 and an intact chromosome 12 (640-34p6-1C) was kindly provided by Dr. Carol Jones (Fisher et al., 1987; Jones, 1975).

Isolation of Genomic DNA-Human leukocytes were purified by Ficoll gradient centrifugation from the blood of adult volunteers. DNA was then isolated by the method of Herrmann and Frischauf (1987).

Southern Blotting-The BIOS blots were prehybridized according to the manufacturer's instructions and hybridized for 16-18 h at 65 °C with heat-denatured cDNA probe (1-5 × 10⁶ cpm/ml). Blots were washed twice in 2 × SSC, 0.5% SDS for 10 min at room temperature, once in 1 × SSC, 1% SDS for 15 min at 65 °C, and twice for 15 min in 0.1 × SSC, 1% SDS, also at 65 °C. The damp blots were covered with plastic wrap and exposed to film (Kodak X-AR) for 1-10 days with an intensifying screen.

For other experiments, isolated genomic DNA was transferred to a charged nylon membrane (Genescreen Plus, New England Nuclear) and fixed by exposure to UV light according to the manufacturer's specifications. The blots were prehybridized for at least 1 h at 65 °C in 1% SDS, 1 M NaCl, 10% dextran sulfate (Sigma), and then hybridized overnight at 65 °C with heat-denatured cDNA probe in the presence of 100 µg/ml denatured salmon sperm DNA. The blots were washed twice at room temperature for 5 min in 2 × SSC, twice for 30 min in 2 × SSC, 1.0% SDS at 65 °C, and twice for 30 min in 0.1 × SSC at room temperature. The procedure for blotting with labeled oligomers was the same except that the prehybridization and hybridization reactions were conducted at 37 °C, and the high temperature washes were conducted at 42 °C.

Oligomeric DNA Synthesis and DNA Amplification-Amplification primers were synthesized using an Applied Biosystems Synthesizer, detritylated, and purified by denaturing polyacrylamide gel electrophoresis. The specificity of the synthetic oligomer was confirmed by Northern and/or Southern blotting using empirically optimized conditions of stringency.

Genomic DNA was amplified by thermal cycling with Taq DNA polymerase (Promega) in the presence of the characterized primers using a Perkin Elmer-Cetus DNA thermal cycler. The DNA was denatured for 2 min at 94 °C and amplified with 25-30 cycles as follows: 30 s at 94 °C, 30 s at 55 °C, and 2 min at 72 °C. After the last cycle, the reactions were incubated for 3 min at 72 °C to finish incomplete strands. Reactions were then held at 4 °C until they could be phenol-chloroform extracted and precipitated in ethanol-acetate. Thermally amplified samples were run on 1.5 or 2% agarose gels in 1 × TAE (0.04 M Tris-acetate, 0.002 M EDTA, pH 7.2) buffer (Davis et al., 1986) for approximately 80 V/h in the presence of ethidium bromide, examined and photographed under UV, then denatured and transferred to Gene Screen Plus as described. Positive amplification controls included H13 cDNA and human genomic DNA; negative controls consisted of total hamster genomic DNA or DNA from hamster/human hybrids lacking the chromosome of interest.
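As a rough check on the cycling program just described, the programmed hold times can be totalled. The following Python sketch is illustrative only: the names `Step` and `program_seconds` are assumptions introduced here and are not part of the original methods, and ramp times between temperatures and the final 4 °C hold are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # hold temperature in degrees Celsius
    seconds: int  # programmed hold time in seconds


# Temperatures and times are taken from the protocol text;
# the data structure itself is a hypothetical illustration.
INITIAL_DENATURATION = Step(94, 120)
CYCLE = (Step(94, 30), Step(55, 30), Step(72, 120))
FINAL_EXTENSION = Step(72, 180)


def program_seconds(n_cycles: int) -> int:
    """Total programmed hold time, excluding ramping and the 4 °C hold."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return (INITIAL_DENATURATION.seconds
            + n_cycles * per_cycle
            + FINAL_EXTENSION.seconds)


for n in (25, 30):
    print(f"{n} cycles: {program_seconds(n) / 60:.0f} min of programmed holds")
```

Under these assumptions the stated range of 25-30 cycles corresponds to roughly 80 to 95 min of programmed hold time.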

Fluorescence in Situ Hybridization-Fluorescence in situ hybridizations were performed essentially as described by Lichter et al. (1988). Briefly, human prometaphase chromosome spreads were prepared from cultured phytohemagglutinin-stimulated peripheral blood lymphocytes from a male of normal karyotype (46, XY). Extended chromosomes were produced by standard colchicine treatment. Several SP-D genomic sequences were used as probes, including the intact genomic clones H5 and H6, and subclones encoding the collagen (H5E5) or CRD (H6S4 + H6E3) domains. The probes were labeled with Biotin-11-dUTP by nick translation (Rigby et al., 1977), and each slide containing the spreads was hybridized with ~150 ng of labeled probe. For fluorochrome detection, slides were incubated with 5 µg/ml fluorescein isothiocyanate-conjugated goat anti-avidin DCS antibody (Vector Laboratories) and counterstained with 200 ng/ml 4,6-diamidino-2-phenylindole dihydrochloride and 200 ng/ml propidium iodide. Cytogenetic banding patterns were observed by staining the slides with Giemsa following fluorescence hybridization.

RESULTS

Partial Characterization of Genomic Clones-Nucleotide sequencing and restriction mapping of overlapping clones H5 (~18 kb) and H6 (~35 kb) indicate that the translated sequence of human SP-D is encoded by seven exons, with the transcribed sequences of the gene spanning >11 kb (Figs. 1-5). The first protein encoding exon (UTR/S/N/C1) includes sequences corresponding to the signal peptide, amino-terminal non-collagen domain, and the first seven Gly-X-Y triplets of the collagen domain of SP-D (total of 59 triplets) (Fig. 2). The remainder of the collagen domain is encoded by four exons (Fig. 3, see below). Separate exons also encode the C-type lectin CRD and a predicted α-helical linking sequence (L) between the collagen- and carbohydrate-binding domains (Figs. 3 and 4).

The primary sequences predicted by genomic and cDNA sequencing were in agreement except for two apparent base substitutions: one at the codon for amino acid residue 11 following the initiator Met (CTG versus CTA), and the second at the codon for residue 11 in the mature protein (ATG versus ACG), resulting in the substitution of a Met for a Thr (Lu et al., 1992). Three additional differences between genomic and H13 cDNA-derived sequence (resulting in three amino acid substitutions) were found within the collagen domain (Fig. 5).
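The consequences of the two codon differences named above can be checked against the standard genetic code. The short Python sketch below is an illustration added here, not part of the original analysis; `CODON_TABLE` holds only the four codons mentioned in the text, and `classify_substitution` is a hypothetical helper name.

```python
# Subset of the standard genetic code covering the codons named in the text.
CODON_TABLE = {"CTG": "Leu", "CTA": "Leu", "ATG": "Met", "ACG": "Thr"}


def classify_substitution(codon_a: str, codon_b: str) -> str:
    """Report whether two codons encode the same amino acid."""
    aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
    if aa_a == aa_b:
        return f"{codon_a}/{codon_b}: silent ({aa_a})"
    return f"{codon_a}/{codon_b}: {aa_a} versus {aa_b}"


print(classify_substitution("CTG", "CTA"))
print(classify_substitution("ATG", "ACG"))
```

In the standard code CTG and CTA both specify leucine, so the first difference would not change the protein, whereas ATG versus ACG distinguishes Met from Thr, in line with the substitution described.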

Complete sequencing of a SacI fragment derived from clone 6 (H6S2) and an EcoRI fragment from clone 5 (H5E5) demonstrated four 117-bp exons encoding 52 triplets of uninterrupted collagen sequence corresponding to bases 140-607 of the hSP-D open reading frame (Fig. 3). There was 50% homology in predicted amino acid sequence for these exons with the deduced sequences for the collagen-encoding exons of human SP-A and MBP-C and 40% homology for the exon encoding the linking sequence (L). Significantly, the 117-bp exons begin and end with split glycine codons (Fig. 3). The associated introns, although similar in size, showed only short stretches of base sequence homology by matrix analysis (not shown).

[Figure 1 diagram. Exon order, 5' to 3': 5' UTR, UTR/S/N/C1, C2, C3, C4, C5, L, CRD, UTR. Scale bar: 1000 bp.
Clone H5 (~18 kb), subclones: H5E7 (~7 kb), H5E5 (~5 kb), H5E2 (2 kb).
Clone H6 (~35 kb), subclones: H6SB3 (~300 bp), H6E3 (~10 kb), H6S2 (~2 kb), H6S1 (~1 kb), H6S4 (~2.4 kb).]

FIG. 1. The human SP-D gene. The figure diagrams the genomic organization of the human SP-D gene as deduced from sequencing of two overlapping genomic clones (H5 and H6). Subcloned fragments derived from EcoRI (E) and SacI (S) digests of the two genomic clones are shown. Protein encoding exons (boxes) were identified by comparison with cDNA sequences for rat and human SP-D (Rust et al., 1991; Lu et al., 1992; Shimizu et al., 1992). These include exons encoding the signal peptide, amino terminus, and a hydrophilic amino-terminal collagen sequence (S/N/C1); the remainder of the collagen domain (C2-C5); a C-type lectin carbohydrate binding domain (CRD); and a linking peptide (L) between the collagen domain and CRD. The 5'-UTR is interrupted by at least one intron. The single consensus for polyadenylation (aataaa) is shown at right. The location and orientation of sequencing oligomers is shown by small arrows at the top. Intron sizes were determined by a combination of direct sequencing, restriction mapping, and DNA amplification.

[Figure 2 sequence. The exon nucleotide sequence is not recoverable from the source; the flanking intron sequences and the predicted amino acid sequence are given below.]

5' flanking intron: ...tcacctctagaagctgagccctaagccctaaaccatgtccatgaagcataagcatgtcttttctctttctgttcacctctgcAG

-20 Met Leu Leu Phe Leu Leu Ser Ala Leu Val Leu Leu Thr Gln Pro Leu Gly Tyr Leu Glu
  1 Ala Glu Met Lys Thr Tyr Ser His Arg Thr Met Pro Ser Ala Cys Thr Leu Val Met Cys
 21 Ser Ser Val Glu Ser Gly Leu Pro Gly Arg Asp Gly Arg Asp Gly Arg Glu Gly Pro Arg
 41 Gly Glu Lys Gly Asp Pro (Gly)

3' flanking intron: gtagggtggggccctgggcttatcctgctggggaggaatggtcattggaactgtaactagcccagcaactctgggtactttgttat...

FIG. 2. The first exon containing translated sequence is shown together with associated flanking intron sequences. The exon encodes three bases of UTR (GCC), a short signal peptide, the amino-terminal non-collagen domain of the mature protein, and a conserved hydrophilic collagenous sequence corresponding to the first seven triplets of the collagen domain of SP-D (C1). Numbering at left identifies the amino acid residue or base corresponding to the full-length sequence reported by Lu et al. (1992).

Given the identical size of collagen exons C2-C5 (117 bp), we were prompted to compare their nucleotide sequences. Although overall homology in predicted amino acid sequence between each pair of exons ranged from 31 to 61%, the deduced protein sequences for each of the four exons showed striking conservation of charged amino acids at several posi- tions within the distal half of the exon (Fig. 6). Thus, the sequence data predict an approximately 39 (117/3) residue periodicity in hydrophilic amino acids within the collagen domain of SP-D.
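The exon arithmetic behind this periodicity can be verified directly from the figures quoted in the text. The Python sketch below is an added illustration; the variable names are assumptions introduced here, while every number comes from the surrounding description.

```python
EXON_BP = 117                    # size of each tandem collagen exon (C2-C5)
TANDEM_EXONS = 4                 # number of 117-bp exons
FIRST_EXON_TRIPLETS = 7          # Gly-X-Y triplets encoded by exon C1
ORF_START, ORF_END = 140, 607    # bases of the open reading frame spanned

residues_per_exon = EXON_BP // 3                        # codons per exon
triplets_per_exon = residues_per_exon // 3              # Gly-X-Y triplets per exon
tandem_triplets = TANDEM_EXONS * triplets_per_exon      # triplets in C2-C5
total_triplets = FIRST_EXON_TRIPLETS + tandem_triplets  # whole collagen domain
span_bp = ORF_END - ORF_START + 1                       # inclusive base count

print(residues_per_exon, triplets_per_exon, tandem_triplets, total_triplets, span_bp)
```

Each exon therefore encodes 39 residues (13 triplets), the four exons together encode 52 triplets, and adding the seven triplets of the first exon gives the 59 triplets of the full collagen domain. The inclusive span of bases 140-607 is 468 bp, which equals 4 × 117. Because 117 is a multiple of 3, an exon that begins partway through a codon also ends at the same position within a codon, which is consistent with the split glycine codons at both exon boundaries.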

Southern Blotting of Genomic DNA-Southern blotting of restriction digests of human and hamster genomic DNA at high stringency revealed a relatively simple hybridization profile most consistent with a single gene (Fig. 7). Hybridization of radiolabeled SP-D cDNA to HindIII digests of human genomic DNA revealed a single labeled species with an apparent size of approximately 8 kb relative to internal DNA standards; no corresponding signal was observed in parallel blots of hamster DNA. Based on these results, subsequent characterizations of genomic DNA from human/hamster hybrids were performed by Southern blotting of HindIII digests.

Chromosomal Localization by Synteny Analysis-Screening of HindIII digests of genomic DNA isolated from a panel of human/hamster somatic hybrids revealed hybridization with only three members of the clone panel (Table I). In each case, the hybridization signal comigrated with the major hybridizable fragment in the HindIII digests of total human genomic DNA. All other hybrids showed no specific hybridization with the probe. There was complete concordance between the presence or absence of the specific hybridization signal and the presence or absence of chromosome 10, respectively.

DNA Amplification Using SF-D Specific Primers-To fur- ther confirm the presence of sequences encoding SP-D in the chromosome 10-containing hybrids, segments of genomic DNA from selected hybrids were amplified by thermal cycling, using Taq DNA polymerase, and DNA primers (oligomers 19C and 4N) corresponding to sequence in the predicted open reading frame of the H13 cDNA (see Fig. 3, Rust et al., 1991). Amplification of total human genomic DNAs yielded a single major product of approximately 685 bp as detected by ethid- ium bromide staining, consistent with the predicted size based on genomic sequencing. The identity of the amplified DNA was confirmed by Southern hybridization with H13 cDNA and by blotting with an internal oligonucleotide probe (oli-

Page 4: Genomic Organization of Human Surfactant Protein D - Journal of

FIG. 3. The four collagenous ex- ons encoding the 52 carboxyl-ter- minal triplets of the collagen do- main of SP-D (C2-C5), and the sin- gle exon encoding the linking peptide (L) are shown with flanking and intervening intron sequences. Predicted amino acid sequence is shown for each exon. Note that each of the 117- bp (39 amino acids) collagenous exons (C2-C4) begins and ends with a split glycine codon. Sequences corresponding t o primer and detection oligomers used for polymerase chain reaction of somatic hybrid DNAs are shown in italics. Oli- gomer sequences are identified a t right; the orientations of the priming oligomers are indicated with arrows.

Chromosomal Localization of SP-D

257 LB 86 I G l y ) P -0 G - Y

GGB G l Y GGI

G ! " GAA

A l a G K m

Le "

165 491

2979

Oliqo 15C*

Oliqo 32C

Oligo 31C

0liqo 19c*

01igo 18

olipo drc

223 667

gaartrctaa ctgcacqtttgctgcttqtqtqactttqaqcctctaqcttctctaaactccagtactctccatctqtaaqtgqggaatcatgqtgtcacctca cagqqtagqtcatgqtqqacqqgqgctqttctggaacaatctctcttqcattc~ctctggcatcctgatqcgqtqcctgtttggqaqqtctqa aaggctcaqqt tggagcacagagqtc tcacaqcccc tgaqc t t tqgccagqtga tqccacaaccccaacc tqac t t tc t tc t tc tcc tac tAG

231 (Val) G l u L e u P h e P r o Asn G l y G l n Ser Val G l y G l u L y s I l e P h e L y s T h r Ala G l y P h e 692 ?X E B G ~ X L C C C A A U G G € C A A B G T G T G G E E G B G B B G ~ T L C A A G A C A G C A G G € ~

251 Val L y s P r o P h e T h r G l u Ala G l n L e u L e u C y s T h r G l n Ala G l y G l y G l n L e u Ala Ser 751 ~BBB~TTT&XEBGLCAGAG€TGCTG2GC&3~U~EEBCAGTTEEu:~

271 Pro Arq Ser Ala Ala G1u Asn Ala Ala L e u G l n G l n L e u Val Val Ala L y s Asn G l u Ala 811 u~mGL;I:uEBGBBT~GL;I:~.aAcAG~uGTBmBBGBBL:GBGGI;T 291 Ala P h e L e u Ser Met T h r Asp Ser L y s T h r G l u G l y L y s P h e T h r T ~ K P r o T h r G l y G l u 871 G E ; T ~ S X . G A G . C B T G & X G B T Z X A A G B c B G B G G G € B B G ~ ~ I A G C ! X B L ; B G E B E B G

311 Ser L e u Val TYK Ser Asn Trp Ala P r o G l y G l u P r o Asn Asp Asp G l y G l y Ser G l u Asp 931 ~UGTcTA2ZXAALrrS,~~GEEGBG€C.CAACGBTGBTGG€GEEXAGAG~

331 C y s Val G l u I l e P h e T h r Asn G l y L y s T r p Asn Asp Arq Ala C y s G l y G l u L y s Arq L e u 991 m~EBGAKmuBBT~BBErrS,BBT~BGGUmEEB~BBG~m

1051 ETG GTI: TGG EBG DX TGA 351 Val Val C y s G l u P h e Stop

qccaactgqqqtqqgtgqqqcaqtgcttqqcccaggagtttqqcaaqaagtcaaqqcttaqaccctcatqctqccaatatct~aatgq aqaccatc

FIG. 4. Sequences for the C-type lectin domain (CRD) and contiguous 3'-UTR are shown together with as- sociated flanking intron sequences. The translated and UTR sequences are identical to the corresponding cDNA se- quences (Rust et al., 1991; Lu et al., 1992.) The single predicted consensus for polyadenylation (aataaa) is under- lined.

gomer 18). Hybridizable products of identical size were am- using hamster DNA or DNA extracted from hamster/human plified from genomic DNA isolated from all three of the hybrid 867 (containing chromosomes 1, 5, 13, 14, 18, and 19) chromosome 10-containing hybrids. However, there was no or hybrid 423, containing chromosome 3 (Fig. 8A) . detectable hybridizable product in amplification reactions The specificity of the observed Southern hybridization for

Page 5: Genomic Organization of Human Surfactant Protein D - Journal of

3 5 5

3 2 6

FIG. 5 . The deduced amino acid sequence for the genomic clones (HSP-D) is compared to deduced amino acid sequence for rat SP-D cDNA ( top, Shimizu et al., 1992) and the H 1 3 HSP-D cDNA (bottom, Rust et al., 1992). The amino-terminal alanine of'the mature protein is shown in bold. The translated exon sequences are identical to the translated sequences for the human cDNA isolated by Lu c't a/ . (1992) except for the predicted methionine position a t residue 11 of the mature protein (underlined). However, there are three additional discrepancies between the collagenous sequence predicted from the genomic clones and the sequence of H13 cDNA; none of the "substitutions" would interfere with helix formation.

FIG. 6. Comparison of the predicted distribution of hydrophilic amino acids for the aligned collagen exons C2-C5. The distribution of the basic residues lysine (K) and arginine (R) is shown at the top; the distribution of the acidic residues aspartic acid (D) and glutamic acid (E) is shown at the bottom.

FIG. 7. Southern blots of restriction digests of human and hamster genomic DNA. Aliquots of restricted human (lanes 1, 3, 5, 7, and 9) or hamster (lanes 2, 4, 6, and 8) genomic DNA were hybridized with radiolabeled SP-D probe as described under "Materials and Methods." Lane 9 is a longer exposure of a second HindIII digest. Note that the SP-D probe hybridizes to a single large HindIII fragment (arrow). There is no detectable hybridization to hamster DNA.

SP-D gene sequences was further confirmed by amplification of the expected 620-bp fragment using a combination of exon- and intron-specific primers and detection oligomers (Fig. 8B). The use of intron-specific primers was considered to be a necessary precaution to further exclude possible amplification of related collagenous sequences in conglutinin or other collagen proteins which have not yet been characterized at the genetic level. Specifically, DNA from selected hybrids was amplified using a primer corresponding to bases cDNA sequence (5'-oligomer, 15C) and a primer corresponding to the reverse complement of intron sequence within the human SP-D gene (3'-oligomer, 31). Amplified DNA from these reactions was detected with a second, internal, intron-derived oligomer (32C) (Fig. 8B). Hybridizable products of identical size were amplified from DNA isolated from the three chromosome 10-


TABLE I. Synteny analysis of hamster-human hybrids using human SP-D cDNA

Human chromosome

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y cDNA

DNA/hybrid 212 P 324 + 423 507 + + + + + + 683 + + + + + +

+ - 734 + + + 750 + + + + + 756 + + + + + + + + + 803 + + + + + + 811 + 860 862 + + 854 + 867 904 + + + + + 909 + + + + + 937 + + + + + + 940 967 + + + 968 + + + 983 + + +

+ - - + -

- - -

+ - - + + -

+ + + + + + + - + -

+ + + + + + - + - -

- + + -

- + -

1006 + + + + + + + + 1049 1079 + + + + + + 1099 + + + + + +

- + + -

Hamster -

640-34p6-1C q + - +

FIG. 8. Southern blots of DNAs amplified by thermal cycling in the presence of Taq DNA polymerase. A, DNAs were amplified using primers corresponding to known exon-specific sequences of human SP-D, and amplified DNAs were detected with a radiolabeled internal exon-specific oligomer as described under "Materials and Methods" and "Results." Lanes 1 and 2, aliquots of human genomic DNA from two donors; lane 3, total hamster genomic DNA; lane 4, DNA from Bios hybrid clone 867 (lacking chromosome 10); lanes 5-7, DNA from Bios clones 860, 983, and 1079 (containing chromosome 10). B, DNAs were amplified using a primer corresponding to known exon- and intron-specific sequences of human SP-D. Amplified DNAs were detected with an internal, intron-derived oligomer. Lane 1, total human genomic DNA; lane 2, total hamster genomic DNA; lane 3, DNA from Bios hybrid clone 867 (lacking chromosome 10); lane 4, hybrid clone 640-34-p6-1C (containing 10q); lanes 5 and 6, two independent isolates of human genomic DNA from volunteers. In both experiments, the major amplified DNAs comigrate with species amplified from the human genomic clones (H5 and H6). The expected product in B is 620 bp.

containing hybrids, as well as from DNA isolated from hybrid 640-34-p6-1C, which contains only the long arm of chromosome 10 and an intact chromosome 12 (Fisher et al., 1987). There was no detectable hybridizable product in amplification reactions using hamster DNA or DNA extracted from hamster/human hybrid 867.
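The logic of this assignment is a concordance test: a candidate chromosome should be present in every hybrid that yields the product and absent from every hybrid that does not. The Python sketch below illustrates that test only. The dictionary and function names are hypothetical, and the chromosome contents are limited to what is stated in the text; the full complement of hybrids 860, 983, and 1079 is not reproduced here, so only the result for chromosome 10 is meaningful.

```python
# Toy concordance check. Chromosome contents are limited to what the text
# states; other chromosomes carried by hybrids 860, 983 and 1079 are not
# listed here, so only the chromosome 10 result is meaningful.
HYBRIDS = {
    # name: (human chromosomes present, SP-D product detected)
    "867": ({1, 5, 13, 14, 18, 19}, False),
    "423": ({3}, False),
    "860": ({10}, True),
    "983": ({10}, True),
    "1079": ({10}, True),
    "640-34-p6-1C": ({10, 12}, True),  # carries only the long arm of 10
}


def discordant_hybrids(chromosome, hybrids):
    """Return hybrids whose signal disagrees with presence of the chromosome."""
    return sorted(
        name
        for name, (present, signal) in hybrids.items()
        if (chromosome in present) != signal
    )


# An empty list means the chromosome is fully concordant with the signal.
print(discordant_hybrids(10, HYBRIDS))
```

With the partial panel above, chromosome 10 shows no discordant hybrids, which is the pattern the amplification results describe.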

Fluorescence in Situ Hybridization of Metaphase Chromosomes. To more precisely localize the SP-D gene sequences on chromosome 10, we performed fluorescence in situ hybridization on metaphase spreads using the full-length biotinylated genomic clones (H5 and H6) as probes (Fig. 9). To further exclude possible cross-hybridization with other collagen proteins or C-type lectins, parallel studies were performed using small genomic subclones encoding the CRD (H6S4 and H6E3) or collagen exons (H5E5) (data not shown). These spreads showed much weaker signals but identical localization. Giemsa banding confirmed the localization of all probes to the long arm of chromosome 10 at 10q22.2-23.1. More than 50 metaphase spreads were examined, and in all cases no consistent secondary-site hybridizations were observed.

DISCUSSION

We have isolated and characterized genomic clones encoding human SP-D. Sequences for seven exons obtained from two overlapping genomic clones show nearly complete identity with the human cDNA sequence. Sequencing demonstrated a hybrid exon encoding the 3' end of the 5'-UTR, signal peptide, amino-terminal non-collagen sequence, and a short hydrophilic collagen sequence; four homologous 117-bp collagen-encoding exons; a single exon encoding the CRD and 3'-UTR; and a short exon encoding the connecting or linking peptide between the collagen domain and CRD.

Sequences for the linking peptide and predicted CRD are identical, and only four discrepancies were identified between available genomic and cDNA sequences, one within the amino-terminal non-collagen domain and three others within the collagen-encoding exons. One previously documented difference between human SP-D protein sequence and deduced amino acid sequence (Phe → Pro at the 5' end of the H13


cannot be explained by simple tandem replication of a 54- or 45-bp unit), Runnegar (1985) hypothesized that fiber-forming collagens evolved through replication of 702-bp exons. Such an evolutionary scheme would, of course, require the reintroduction of introns during subsequent evolution, perhaps to minimize homologous recombination events among collagen proteins. Thus, it is possible that the collagen domain of SP-D evolved from a primitive 117-bp exon that was also replicated to form larger exon units (e.g. 6 × 117 = 702) encoding a collagen sequence with a 39-amino-acid charge periodicity. In any case, the highly conserved distribution of hydrophilic residues within these exons raises the possibility of lateral interactions between the collagen domains.
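The arithmetic behind this scheme can be checked directly. The Python sketch below is illustrative only: the constant and function names are hypothetical, and it assumes nothing beyond the exon sizes quoted above and the Gly-X-Y triplet repeat of collagen.

```python
# Illustrative arithmetic only; names are hypothetical, not from the paper.
EXON_BP = 117        # size of each tandem collagen exon
ANCESTRAL_BP = 702   # exon size proposed by Runnegar (1985)
CODON_BP = 3         # nucleotides per codon
TRIPLET_AA = 3       # residues per Gly-X-Y collagen repeat


def exon_summary(exon_bp: int) -> dict:
    """Return the residues and Gly-X-Y triplets encoded by an in-frame exon."""
    if exon_bp % CODON_BP:
        raise ValueError("exon length is not a whole number of codons")
    amino_acids = exon_bp // CODON_BP
    return {
        "amino_acids": amino_acids,
        "gly_x_y_triplets": amino_acids // TRIPLET_AA,
        "whole_triplets": amino_acids % TRIPLET_AA == 0,
    }


summary = exon_summary(EXON_BP)
copies, remainder = divmod(ANCESTRAL_BP, EXON_BP)
print(summary)            # 39 residues, i.e. 13 complete Gly-X-Y triplets
print(copies, remainder)  # 6 copies of the 117-bp exon, no remainder
```

A 117-bp exon therefore encodes 39 residues in 13 complete triplets, which matches the 39-amino-acid charge periodicity, and six copies account exactly for a 702-bp unit.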

One potential difficulty in interpreting the chromosome localization studies relates to possible cross-hybridization of the cDNA or genomic probes with sequences encoding other C-type lectins and/or collagenous proteins on chromosome 10. Our previous studies have established a close structural and functional relationship of the carboxyl-terminal domain of SP-D with the CRD of the C-type lectins. For example, SP-D demonstrates spatial conservation of all 17 of the "invariant residues" that have been used to define the putative CRD of the C-type lectins, including 4 invariant cysteine residues that participate in intrachain disulfide bonds and that appear to be necessary for relatively high affinity interactions with saccharide ligands. On the other hand, the CRD of SP-D shows less than 50% homology (no gaps) in amino acid sequence with the corresponding domain in human MBP-C and SP-A (Rust et al., 1991), and only 76% (19/25) homology in amino acid sequence with a known peptide sequence of bovine SP-D.
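The percentages quoted here are ungapped, position-by-position comparisons. A minimal Python sketch of that calculation follows; the function name is hypothetical, it treats "homology" as simple identity, and the two short strings in the usage lines are made-up placeholders rather than SP-D sequence.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# The bovine peptide comparison in the text: 19 identical positions out of 25.
bovine_identity = 100.0 * 19 / 25
print(round(bovine_identity))  # 76

# Placeholder strings (not real SP-D sequence): 4 of 5 positions match.
print(percent_identity("GAPGK", "GAPGR"))
```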

Several lines of evidence support our contention that the hybridizing sequences on chromosome 10 correspond to SP-D. First, Northern assays using the SP-D cDNA-derived probe under conditions of high stringency showed no evidence for cross-hybridization with mRNAs encoding other C-type lectins (including SP-A in lung or MBP in liver). Second, given the chromosomal dispersion of collagen proteins and other closely related C-type lectin genes, the selective hybridization of our cDNA-derived probe (and related genomic sequences) to chromosome 10 or chromosome 10-containing hybrids indicates that the hybridization conditions are stringent. Third, Taq polymerase amplification of positive somatic hybrids using primers corresponding to SP-D-specific collagen domain sequences yielded the expected amplification product. Because the protein and cDNA sequences of human conglutinin are not known, we also used a combination of collagen exon- and intron-specific primers to amplify SP-D sequences with Taq polymerase from the chromosome 10-containing somatic hybrids (Fig. 8). Assuming that intron sequences are divergent, the demonstration of the expected amplification product argues against cross-hybridization with a conglutinin gene.

Thus, the present studies provide strong evidence for the localization of the SP-D gene on human chromosome 10q, specifically in region 10q22.2-23.1. Two of the three previously characterized collagenous lectins, SP-A and MBP, have also been localized to the long arm of human chromosome 10, to the regions of 10q21-24 and 10q11.2-q21, respectively (Bruns et al., 1987; Fisher et al., 1987; Sastry et al., 1991). The "close proximity" of these genes on 10q is particularly noteworthy given that the genes for collagen proteins and C-type lectins are widely dispersed in the human genome. Only two other collagen proteins have been localized to chromosome 10, type XIII collagen (10q22) (Shows et al., 1989) and bullous pemphigoid antigen (10q24) (Li et al., 1991), and both show a different intron/exon structure than SP-D. No C-type lectins other than SP-A and MBP-C have been localized to this chromosome. Interestingly, the gene for the α-chain of prolyl hydroxylase, the enzyme required for 4-prolyl hydroxylation in collagen proteins, has also been localized at 10q21.3-23.1 (Pajunen et al., 1989). Although the existence of a cluster of collagenous C-type lectin genes could have involved duplication of an ancestral gene encoding sequential collagenous and "mannose-specific" carbohydrate recognition domains, the comparatively low homology of the 117-bp collagen-encoding exons of SP-D with the collagen-encoding exons of SP-A and MBP, and the observed sequence differences in the exon encoding the linking peptide (L), indicate more complicated evolutionary relationships.
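The reported band assignments can be compared in a simple way. The Python sketch below is a crude illustration under a stated assumption: band labels on 10q are treated as decimal numbers (q22.2 becomes 22.2), which ignores the fact that cytogenetic bands are not a linear physical scale. The names are hypothetical, and the ranges are those quoted in the text.

```python
# Crude illustration: 10q band labels are treated as decimal numbers.
# This compares reported ranges only; it says nothing about physical distance.
REPORTED_10Q = {
    "SP-D": ("22.2", "23.1"),
    "SP-A": ("21", "24"),
    "MBP": ("11.2", "21"),
    "prolyl hydroxylase alpha-chain": ("21.3", "23.1"),
}


def band_interval(start: str, end: str) -> tuple:
    """Convert a reported band range into a numeric (low, high) interval."""
    low = float(start)
    # A whole band given as an end point (e.g. "24") covers its sub-bands.
    high = float(end) if "." in end else float(end) + 0.9
    return low, high


def overlaps(a: tuple, b: tuple) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


spd = band_interval(*REPORTED_10Q["SP-D"])
for gene, (start, end) in REPORTED_10Q.items():
    if gene != "SP-D":
        print(gene, overlaps(spd, band_interval(start, end)))
```

Under this simplification the SP-D range falls within the range reported for SP-A and coincides with part of the prolyl hydroxylase range, while the MBP range lies proximal to it without overlapping.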

Acknowledgments. We thank Rhonda Taylor for technical assistance with the genomic sequencing and DNA amplification experiments, Dr. Carol Jones (Denver, Colorado) for providing the 640-34-p6-1C hybrid, and Janet North for excellent secretarial assistance.

REFERENCES

Bruns, G., Stroh, H., Veldman, G. M., Latt, S. A., and Floros, J. (1987) Hum. Genet. 76, 58-62
Crouch, E. C., Persson, A., Chang, D., and Parghi, D. (1991a) Am. J. Pathol. 139, 765-776
Crouch, E. C., Rust, K., Chang, D., Persson, A., Parghi, D., and Mariencheck, W. (1991b) Am. J. Respir. Cell Mol. Biol. 5, 13-18
Crouch, E. C., Rust, K., Persson, A., Mariencheck, W., Moxley, M., and Longmore, W. (1991c) Am. J. Physiol. Lung Cell Mol. Physiol. 260, L247-
Crouch, E., Persson, A., and Chang, D. (1993) Am. J. Pathol. 142, 241-248
Davis, L. G., Dibner, M. D., and Battey, J. F. (1986) Basic Methods in Molecular Biology, pp. 58-61, Elsevier Science Publishing Co., New York
Doyle, B. B., Hukins, D. W. L., Hulmes, D. J. S., Miller, A., and Woodhead-Galloway, J. (1975) J. Mol. Biol. 91, 79-99
Drickamer, K. (1988) J. Biol. Chem. 263, 9557-9560
Drickamer, K. (1989) Ciba Found. Symp. 145, 45-61
Exposito, J-Y., Le Guellec, D., Lu, Q., and Garrone, R. (1991) J. Biol. Chem. 266
Fisher, J. H., Kao, F. T., Jones, C., White, R. T., Benson, B. J., and Mason, R. J. (1987) Am. J. Hum. Genet. 40, 503-511
Floros, J., Steinbrink, R., Jacobs, K., Phelps, D., Kriz, R., Recny, M., Sultzman, L., Jones, S., Taeusch, H. W., Frank, H. A., and Fritsch, E. F. (1986) J. Biol. Chem. 261, 9029-9033
Hawgood, S. (1989) Am. J. Physiol. Lung Cell Mol. Physiol. 1, L13-L22
Herrmann, B. G., and Frischauf, A. M. (1987) Methods Enzymol. 152, 180-182
Hoffmann, H., Fietzek, P. P., and Kuhn, K. (1980) J. Mol. Biol. 141, 293-314
Jones, C. (1975) Somatic Cell Genet. 1, 345-354
Kuan, S. F., Rust, K., and Crouch, E. C. (1992) J. Clin. Invest. 90, 97-106
Li, K., Sawamura, D., Giudice, G. J., Diaz, L. A., Mattei, M-G., Chu, M-L., and Uitto, J. (1991) J. Biol. Chem. 266, 24064-24069
Lichter, P., Cremer, T., Borden, J., Manuelidis, L., and Ward, D. C. (1988) Hum. Genet. 80, 224-234
Lu, J., Willis, A. C., and Reid, K. B. M. (1992) Biochem. J. 284, 795-802
Pajunen, L., Jones, T. A., Helaakoski, T., Pihlajaniemi, T., Solomon, E., Sheer, D., and Kivirikko, K. I. (1989) Am. J. Hum. Genet. 45, 829-834
Persson, A., Rust, K., Chang, D., Moxley, M., Longmore, W., and Crouch, E. (1988) Biochemistry 27, 8576-8584
Persson, A., Chang, D., Rust, K., Moxley, M., Longmore, W., and Crouch, E. (1989) Biochemistry 28, 6361-6367
Persson, A., Chang, D., and Crouch, E. (1990) J. Biol. Chem. 265, 5755-5760
Rigby, P. W., Dieckmann, M., Rhodes, C., and Berg, P. (1977) J. Mol. Biol. 113, 237-251
Runnegar, B. (1985) J. Mol. Evol. 22, 141-149
Rust, K., Grosso, L., Zhang, V., Chang, D., Persson, A., Longmore, W., Cai, G. Z., and Crouch, E. C. (1991) Arch. Biochem. Biophys. 290, 116-126
Sandell, L. J., and Boyd, C. D. (1990) in Extracellular Matrix Genes (Sandell, L. J., and Boyd, C. D., eds) pp. 1-56, Academic Press, Inc., San Diego, CA
Sastry, K., Herman, G. A., Day, L., Deignan, E., Bruns, G., Morton, C. C., and Ezekowitz, R. A. B. (1989) J. Exp. Med. 170, 1175-1189
Sellar, G. C., Blake, D. J., and Reid, K. B. M. (1991) Biochem. J. 274, 481-490
Shimizu, H., Fisher, J. H., Papst, P., Benson, B., Lau, K., Mason, R. J., and Voelker, D. R. (1992) J. Biol. Chem. 267, 1853-1857
Shows, T. B., Tikka, L., Byers, M. G., Eddy, R. L., Haley, L. L., Henry, W. M., Prockop, D. J., and Tryggvason, K. (1989) Genomics 5, 128-133
Taylor, M. E., Brickell, P. M., Craig, R. K., and Summerfield, J. A. (1989) Biochem. J. 262, 763-771
Thiel, S., and Reid, K. B. M. (1989) FEBS Lett. 250, 78-84
Weaver, T. E., and Whitsett, J. A. (1991) Biochem. J. 273, 249-264
White, R. T., Damm, D., Miller, J., Spratt, K., Schilling, J., Hawgood, S., Benson, B., and Cordell, B. (1985) Nature 317, 361-363