mosaic structure of globular domains in the human type

9
The EMBO Journal vol.9 no.2 pp.385-393, 1990 Mosaic structure of globular domains in the human type VI collagen oz3 chain: similarity to von Willebrand Factor, fibronectin, actin, salivary proteins and aprotinin type protease inhibitors Mon-Li Chu, Rui-Zhu Zhang, Te-cheng Pan, David Stokes, Dorothy Conway, Huey-Ju Kuo1, Robert Glanville', Ulrike Mayer2, Karlheinz Mann2, Rainer Deutzmann2 and Rupert Timpl2 Departments of Biochemistry and Molecular Biology and Dermatology, Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, PA 19107, IShriners Hospital for Crippled Children, Portland, OR 97201, USA and 2Max-Planck-Institut flir Biochemie, D-8033 Martinsried, FRG Communicated by R.Timpl Human collagen a3(VI) chain mRNA (-10 kb) was cloned and shown by sequence analysis to encode a 25 residue signal peptide, a large N-terminal globule (1804 residues), a central triple helical segment (336 residues) and a C-terminal globule (803 residues). Some of the sequence was confirmed by Edman degradation of peptides. The N-terminal globular segment consists of nine consecutive 200 residue repeats (Ni to N9) showing internal homology and also siificant identity (17-25%) to the A domains of von Willebrand Factor and similar domains present in some other proteins. Deletions were found in the N3 and N9 domains of several cDNA clones suggesting variation of these structures by alternative splicing. The C-terminal globule starts immediately after the triple helical segment with two domains Cl (184 residues) and C2 (248 residues) being similar to the N domains. They are followed by a proline rich, repetitive segment C3 of 122 residues, with similarity to some salivary proteins, and domain C4 (89 residues), which is similar to the type III repeats present in fibronectin and tenascin. The most C-terminal domain C5 (70 residues) shows 40-50% identity to a variety of serine protease inhibitors of the Kunitz type. The whole sequence contains 29 cysteines which are mainly clustered in short segments connecting domains Ni, Cl, C2 and the triple helix, and in the inhibitor domain. Five putative Arg-Gly-Asp cell-binding sequences are exclusively localized in the triple helical segment. The globular domains of human cal(VI) and a2(VI) chain [Chu et al. (1989) EMBO J., 8, 1939-1946] are analogous to Ni, Cl and C2 of a3(VI) chain, indicating that its extra domains have special functions in type VI collagen. Key words: alternative splicing/microfibrillar collagen/ polypeptide sequences/protease inhibitor/von Willebrand Factor Introduction Type VI collagen is a major component of a group of ubiquitously occurring tissue microfibrils and consists of three polypeptide chains, al(VI), a2(VI) and a3(VI) (Timpl Oxford University Press and Engel, 1987). As shown by electron microscopy (Furthmayr et al., 1983; Jander et al., 1984; von der Mark et al., 1984) these chains are assembled into a short triple helix of 105 nm in length, flanked on each side by a large globular domain. This structure was confirmed by complete sequence analysis of human (Chu et al., 1988, 1989) and chicken (Bonaldo et al., 1989; Koller et al., 1989) aI(VI) and a2(VI) chains each containing - 1000 amino acid residues with about one-third of them contributing to the triple helix. Both chains share - 35 % identical residues (Chu et al., 1989) and their genes are located in close proximity on human chromosome 21 (Weil et al., 1988). Their globular structures consist of three 200-residue repeats with similarity to domains found in von Willebrand Factor, cartilage matrix protein, some integrins and complement components (Bonaldo et al., 1989; Chu et al., 1989; Koller et al., 1989). This was interpreted to indicate homo- and heterotypic binding of these globular domains during self assembly and matrix formation. The a3(VI) chain is apparently quite different from the a1(VI) and a2(VI) chains with molecular mass estimates ranging from 200 kd in tissues (Trueb and Winterhalter, 1986) to 250-260 kd in cell cultures (Engvall et al., 1986; Colombatti and Bonaldo, 1987; Colombatti et al., 1987). The size of the triple helical sequences is virtually identical in all three chains but there are some structures contributed exclusively by the a3(VI) chain which are essential for oligomer formation (Chu et al., 1988)' In addition, the selective down-regulation of a3(VI) chain synthesis by -y-interferon was shown to be rate limiting for the secretion and matrix deposition of type VI collagen (Heckmann et al., 1989). These data predict further unique structural elements in the a3(VI) chain not existing in the other type VI collagen chains. In the present study we have determined the whole sequence of the a3(VI) chain. The data show structures homologous to the a 1 (VI) and a2(VI) chains and 11 additional domains including some with similarity to von Willebrand Factor, fibronectin, the trypsin inhibitor aprotinin, and an unusual 122 residue sequence repeat. Results Isolation and characterization of cDNA clones Three cDNA clones (P24, P108 and F376) (Figure 1), covering 3 kb of a3(VI) chain mRNA, have been isolated previously from a placenta and a fibroblast cDNA library and were partially sequenced (Chu et al., 1988). Three strategies were used to extend the clone repertoire to nearly the full mRNA length. The insert of P108 was used to screen an oligo(dT) primed fibroblast cDNA library. Three clones (FM12, FM20 and FM22) with insert sizes of 3.5 -3.8 kb were characterized and they all shared a 2.2 kb EcoRI fragment with a short poly(A)+ tail at the 3' end. The sequence upstream of this EcoRI site matched the sequence at the 3' end of P108. Using a 0.4 kb EcoRI-HindHI 385

Upload: lengoc

Post on 10-Feb-2017

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mosaic structure of globular domains in the human type

The EMBO Journal vol.9 no.2 pp.385-393, 1990

Mosaic structure of globular domains in the human typeVI collagen oz3 chain: similarity to von WillebrandFactor, fibronectin, actin, salivary proteins and aprotinintype protease inhibitors

Mon-Li Chu, Rui-Zhu Zhang, Te-cheng Pan,David Stokes, Dorothy Conway, Huey-JuKuo1, Robert Glanville', Ulrike Mayer2,Karlheinz Mann2, Rainer Deutzmann2 andRupert Timpl2Departments of Biochemistry and Molecular Biology and Dermatology,Jefferson Institute of Molecular Medicine, Thomas JeffersonUniversity, Philadelphia, PA 19107, IShriners Hospital for CrippledChildren, Portland, OR 97201, USA and 2Max-Planck-Institut flirBiochemie, D-8033 Martinsried, FRG

Communicated by R.Timpl

Human collagen a3(VI) chain mRNA (-10 kb) wascloned and shown by sequence analysis to encode a 25residue signal peptide, a large N-terminal globule (1804residues), a central triple helical segment (336 residues)and a C-terminal globule (803 residues). Some of thesequence was confirmed by Edman degradation ofpeptides. The N-terminal globular segment consists ofnine consecutive 200 residue repeats (Ni to N9) showinginternal homology and also siificant identity (17-25%)to the A domains of von Willebrand Factor and similardomains present in some other proteins. Deletions werefound in the N3 and N9 domains of several cDNA clonessuggesting variation of these structures by alternativesplicing. The C-terminal globule starts immediately afterthe triple helical segment with two domains Cl (184residues) and C2 (248 residues) being similar to the Ndomains. They are followed by a proline rich, repetitivesegment C3 of 122 residues, with similarity to somesalivary proteins, and domain C4 (89 residues), whichis similar to the type III repeats present in fibronectinand tenascin. The most C-terminal domain C5 (70residues) shows 40-50% identity to a variety of serineprotease inhibitors of the Kunitz type. The wholesequence contains 29 cysteines which are mainly clusteredin short segments connecting domains Ni, Cl, C2 andthe triple helix, and in the inhibitor domain. Five putativeArg-Gly-Asp cell-binding sequences are exclusivelylocalized in the triple helical segment. The globulardomains of human cal(VI) and a2(VI) chain [Chu et al.(1989) EMBO J., 8, 1939-1946] are analogous to Ni,Cl and C2 of a3(VI) chain, indicating that its extradomains have special functions in type VI collagen.Key words: alternative splicing/microfibrillar collagen/polypeptide sequences/protease inhibitor/von WillebrandFactor

IntroductionType VI collagen is a major component of a group ofubiquitously occurring tissue microfibrils and consists ofthree polypeptide chains, al(VI), a2(VI) and a3(VI) (Timpl

Oxford University Press

and Engel, 1987). As shown by electron microscopy(Furthmayr et al., 1983; Jander et al., 1984; von der Market al., 1984) these chains are assembled into a short triplehelix of 105 nm in length, flanked on each side by a largeglobular domain. This structure was confirmed by completesequence analysis of human (Chu et al., 1988, 1989) andchicken (Bonaldo et al., 1989; Koller et al., 1989) aI(VI)and a2(VI) chains each containing - 1000 amino acidresidues with about one-third of them contributing to thetriple helix. Both chains share - 35% identical residues (Chuet al., 1989) and their genes are located in close proximityon human chromosome 21 (Weil et al., 1988). Their globularstructures consist of three 200-residue repeats with similarityto domains found in von Willebrand Factor, cartilage matrixprotein, some integrins and complement components(Bonaldo et al., 1989; Chu et al., 1989; Koller et al., 1989).This was interpreted to indicate homo- and heterotypicbinding of these globular domains during self assembly andmatrix formation.The a3(VI) chain is apparently quite different from the

a1(VI) and a2(VI) chains with molecular mass estimatesranging from 200 kd in tissues (Trueb and Winterhalter,1986) to 250-260 kd in cell cultures (Engvall et al., 1986;Colombatti and Bonaldo, 1987; Colombatti et al., 1987).The size of the triple helical sequences is virtually identicalin all three chains but there are some structures contributedexclusively by the a3(VI) chain which are essential foroligomer formation (Chu et al., 1988)' In addition, theselective down-regulation of a3(VI) chain synthesis by-y-interferon was shown to be rate limiting for the secretionand matrix deposition of type VI collagen (Heckmann et al.,1989). These data predict further unique structural elementsin the a3(VI) chain not existing in the other type VI collagenchains. In the present study we have determined the wholesequence of the a3(VI) chain. The data show structureshomologous to the a 1 (VI) and a2(VI) chains and 11additional domains including some with similarity to vonWillebrand Factor, fibronectin, the trypsin inhibitoraprotinin, and an unusual 122 residue sequence repeat.

ResultsIsolation and characterization of cDNA clonesThree cDNA clones (P24, P108 and F376) (Figure 1),covering 3 kb of a3(VI) chain mRNA, have been isolatedpreviously from a placenta and a fibroblast cDNA libraryand were partially sequenced (Chu et al., 1988). Threestrategies were used to extend the clone repertoire to nearlythe full mRNA length. The insert of P108 was used to screenan oligo(dT) primed fibroblast cDNA library. Three clones(FM12, FM20 and FM22) with insert sizes of 3.5 -3.8 kbwere characterized and they all shared a 2.2 kb EcoRIfragment with a short poly(A)+ tail at the 3' end. Thesequence upstream of this EcoRI site matched the sequenceat the 3' end of P108. Using a 0.4 kb EcoRI-HindHI

385

Page 2: Mosaic structure of globular domains in the human type

M.-L.Chu et al.

Protein,' M9,@mN 7,N7 6N MS, M4, N3, H2 ,,N I TH ,,C1 ,, C?2 pC3C411C5

I EXHcONA

1

L*,# ss

L, ss,

P B

2

P E P P E P E H XIP H1 n' I I I , I,

3 4 5 6S 27 *^% -*-sP 308

R 3011R314,

P E P

7 8FM 12

FM 20

FM 22

Tx9 10 kb

AA

F 3761

P 108

I P24FO 1

FO 4FO 13

FO 19FO 22F 024FO 9

E: Eco R IP: Pst IH: Hind IJlB: Bom HIX: Xbo I

~ FFOR 6 i

Fig. 1. Alignment and restriction maps of cDNA clones encoding -9.9 kb of human a3(VI) chain mRNA. Potential splice sites in clones S27, FOI,F04 and F013 are indicated by a dashed line. A denotes a poly(A) tail. SP 308 locates the position of a primer used to extend the library. On topthe cDNA sequence is correlated with various globular domains (NI to N9, Cl to C5) and the triple helical segment (TH) predicted for the a3(VI)chain (see Figure 7). ATG and TAA show the beginning and end of the open reading frame.

CAGTTTGGAGCICAGTCTTCCACCAAAGGC 30

CGTTCAGTTCTCCTGGGC TCCAGCCTCC TGCAAGGAC TGCAAGAG TTTTcc TCCGCAGCTCTGAGTCTCCACTT T T TY TGGTGGAGAAGGCIGCAAWGAAWGAGACGCAGTGAGTGGGAAAAGTATGCATCCTATTCAAACCTAATTGAATCGAGGAGCCCAGGGACACACGCCTTCAGGT T ITGCTCAGGGGTTCATATTTGGTGCTTAGACAAATTCAAhTGAGGAAACATCGGCAC TTGCCCTTAGTGGCCGTC TTTI"CTCrcTTTCTCTCAGGC TTTCCTACAACTCATGCC

M R K H R H L P LV AV F( )L F L S G F P T H A

CAGCAGCAGCAAGCAGCACAAGACTCTGCTGACAT tATTTTCCTTATTGATGGATCAACAACACCGGAAGTGTCAAT T tCGCAGTCAT TC TCGAC TTCCTTGTAMATC TCCTTGAGA4ACTCCCMATTGGAAC TCAGCAGATCCGAGTGQ Q Q Q A A Q D S A D I I F L I D G SIN N TIG S V N F A V I L D F L V N L L E K L P I G T Q Q I R V

GGGGTGGTCCAGTTTAGCGATGAGCCCAGAACCATGTTTTCCT TGGACACC TACTCCACCAAGGCCCAGG TTC TGGGTGCAGTGAAGCCC TCGGTTL TGGTGGGAGTTGGCCAATATCGCTCGCCTTGATTTCGTGGTGGAGAACCACG Y V Q F S D E P R T H F S L a T Y S T K A Q V L G A V K A LGC )WW E L A N I A R L D f V V E N H

TTCACCCGGGCAGGGGGCAGCCGCGTGGAGGAAGGGGTTCCCCAGGTGCTGGTCCTCATAMGTGCCGGGCC TTCTAGTGACGAGATTCGCTACGGGGTGGTAGCACTGAAGCAGGC TAGCGTGTTC TCATTCGGCCTTGGAGCCCAGGCCF T R A G G S R V E E G V P Q V L V L I S A G P S S 0 E I R Y G V V A L K Q A S V F S F G L G A Q A

GCCTCCAGGGCAGAGCTTCAGCACATAGCTACCGATGACAACTTGGTG TTTACTGTCCCGGAATTCCGTAGCTTTGGGGACCTCCAGGAGAAATTAL TGCCGTACATTGTTGGCGTGGCCCAAGGCACATTGTC 7 TGAAACCGCCAACCA S R A E L Q H I A T D D N L V F T V P E F R S F G D L Q E K L L P Y I V G V A Q R H I V L K P P T

ATTGTCACACAAFTCATTGAAGTCAACAAGAGAGACATAGTCTTCCTGGTFGGATGGCTCATC TGCACTGGGAC TGGCCFAACT CAATGCCATCCGAGACTTCATTGCTAAGCATCCAGAGGC TGGATCLGGACAGGATCTTATCCAGI v T Q V I E V N K R D I V F L V D G S S A L G L A N F N A I R D F I A K V I Q R L E I G Q D L I Q

:TAGACTTTGTTCGTV A V A Q Y A D T V R P E F Y F N T H P T K R E V I T A V R K N K P L D G S A L Y T G S A L D F V R

AACAACCTATTCACGAGTTCAGCCGGCTACCGGGC TGCCGAGGGGAT'TCCTAAGCTTTTGGTGCTGATCACAGGTGGTAAG TCCC TAGATGMAAACAGCCAGCC TGCCCAGGAGC TGAAGAGAGCAGCATAATGGCC TTTGCCATTGGGN N L F T S S A G Y R A A E G I P K L L V L I T G G K S L D E I S Q P A Q E L K R S S I H A F A I G

AACAAGGGTGCCGATCAGGCTGAGCTGGAAGAGATCGCTTTCGACTCCTCCCTGGTGTTCATCCCAGCTGAGT TCCGAGCCGCCCCATTGCAAGGCATGCTGCCTGGCTTGCTGGCACCTCTCAGGACCC TCTCTGGAACCCC TGAAGTTN K G A D Q A E L E E I A F D S S L V F I P A E F R A A P L Q G M L P G L L A P L R T L S G T P E V

CACTCAAACAAAAGAGATATCATCTTTCTTTTGGATGGATCAGCCAACGTTGGAAAAACCAATTTCCCTTATGTGCGCGACTTTGTAATGAACCTAG TAACAGCC fTTGATATTGGAAATGACAATATTCGTGTTGGTTTAGTGCAATTTH S N K R 0 I I F L L D G S A N V G K T N F P Y V R D F V N N L V H S L D I G N D N I R V G L V Q F

AGTGACACTCCTGTAACGGAGTTCTCTTTAAACACATACCAGACCAAGTCAGATATCCTTGGTCATCTGAGGCAGC TGCAGCTCCAGGGAGGTTCGGGCCTGAACACAGGCTCAGCCCTAAGCTATGTCTATGCCAACCACTTCACGGAAs o T P v T E f S L N T Y Q T K S D I L G H L R Q L Q L Q G G S G L N T G S A L s Y v Y A N H F T E

GCTGGCGGCAGCAGGATCCGTGAACACGTGCCGCAGCTCCTGCTTCTGCTCACAGCTGGGCAGTCTGAGGACTCC TATTTGCAAGCTGCCAACGCCT'GACACGCGCGGGCATCCTGACTIRXGTGGGAGCTAGCCAGGCGAATAAGA G G S R I R E H V P Q L L L L L T A G Q S E D S Y L Q A A N A L T R A G I L T F jWV G A S Q A N K

GCAGAGCTTGAGCAGATTGCMTAMCCCAAGC TGGTGTATCTCATGGATGAMCAGCTCCCTGCCAGCTTTGCCTCAGCAGCTGATTCAGCCCCTAACCACATATGT-rAGTGGAGGTGTGGAGGAAGTACCACTCGCTCAGCCAGAGA E L E Q I A F IN P S IL V Y L M D D F S S L P A L P Q Q L I Q P L T T Y V S G G V E E V P L A Q P E

AGCAAGCGAGACATTCTGTTCCTCrTTTGACGGCTCAGCCAATCTTGTGGGCCAGT'CCCTGT-TGTCCGTGACTTTCCTCACAAGATTATCGATGAGCTCAATGTGAAGCCAGAGGGGACCCGAATTGCGGTGGCTCAGTACAGCGATGATS K R D I L F L f D G S A N L V G Q F P V V R 0 F L Y K I I D E L N V K P E G T R I A V A Q Y S D D

V K V E S R F D E H Q S K P E I L N L V K R N K I

AGCCGGATCGAGGACGGAGTGCTTCAGTTCCTGGTGCTGCTGGTCGCAGGAAGGTCATCTGACCGTCTGCGATGGGS R I E D G V L Q f L V L L V A G R S S D R V 0 G

GcAGCAGATCTGCTGTCTCCAGCGTTTATCCTGGCTGCAGAGTCGCTTCCCAAGAITGGAGATCTTCATCCACAGJE Q I V L S P A F I L A A E S L P K I G D L H P Q

CTTGATGGCtCTGAGGGCGTCAGGAGCGGCTTCCCTCTGTTGAAAGAGMGTCCAGAGAGTGGTGGAGCCTGCL D C S E G V R S G F P L L K E f V Q R V V E S L

TCATGTACAACCACCGTCAACGCTCGCCCCAGCIGACCCTGCTGGGACGGCCGACCCCCAACACCS Y N N K Q D V V N A V R Q L T L L G G P T P N T

CAGCTGCTGATCGTCCTCACGGCCGACAGGTCTGGGGATGATGTGCGGAACCCCTCCGTGGTCGTGAAGAGGGGTQ L L I V L T A D R S G D D V R IN P 51V V V K R G

ITTGCCGTGGCCATTCCCACCTTTCGCCAGCTGGGGACCGTCCAACAGGTCATCTCTGAGAGGGTGACCCAGCTCF A V A I P T F R Q L G T V Q Q V I S E R V T Q L

GTGGCTCTTCTCATCGATGGGTCCCAAAGTGCCGGGCCTGACTTCCAGTACG TTCGCACCCTCATAGAGAGGC TGV V F L I D G S Q S A G P E F Q Y V R T L I E R L

TTCCTGCTGAACGCCCATTCCAGCAAGGATGAAGTGCAGAACGCGGTGCAGCGGC TGAGGCCCAAGGGAGGGCGGF L L N A N S S KD E V QN A V Q R L R P K G G R

K T G K A L N L G Y A L 0 Y A

:CAGCAAGTAACCTGAAGCAGAGTGGGGTTGTGCCTTTCATCTTC(P A S H L K Q S G V V P F I f

kTAGTGAACTCTTAAAATCAGTGCACAACGGAGCACCAGCACCAII V N L L K S V HN G A P A P

ATGTGGGCCAGGACCGGGTCCGCGTGGCCGTGGTGCAGTACAGCCD V C Q D R V R V A V V Q Y S

iGGGCCGCCCTGGAGMGTCCTCGAACATCCTGCTCAGCCTCC A A L E F V L R N I L V S S

;GGGCTGTGCCCATCGGCATTGGCATCGGGAACGCCGACATCACAIG A V P I G I G I G N A D I T

CACCCGCGAGGAGCTGAGCAGGCTGCAGCCGGT R E E L S R L Q P

GTTGACTACCTGGACGTGGGCTCTGACACCV 0 Y L 0 V G F 0 T

ACAACGTGGGCAATGCCCTGGAGTACQ I N V G N A L E Y

Q R Y I F

:AAGCCAAGAACGCAQ AK N A

iTTTCAGGTGAAAAGAV S C E K

iACCGGACCAGGCCCCD R T R P

;CGCGCAGCACCATAA C S R I

AGATGCAGACCATCE N Q I

6TGAAGTCTGCTGGC 2430V K S A G 700

6ACCCTGCTGAGTTA 2580D P A E L 750

6ACGTGGTG6TTCTG 2730D V V F L 800

;AGTTCTACCTGAAT 2880E F Y L N 850

kCACAGGTGTGCCC 3030T E C V P 900

TCCTTCATCCCGGAC 3183S F I P D 950

STGTTGCAGCCTCTACCGAGCCCAGGTGTTGGTGGCAAGACGGAC 3330V L Q P L P S P G V G G K R 0 1000

kCCCGGGTGGC TGTCATCCAGTTCAGCGATGACCCCAAGGCGGAG 3480T R V A V I Q F S 0 0 P K A E 1050

STGTCCAGGAACATCTTCAAGAGGCCCCTGGGGAGCCGCATTGAA 3630V S R N I F K R P L G S R I E 1100

180

330-1

48350

630100

760150

930200

1080250

1230300

1380350

1530400

1680450

1830500

1980550

2130600

2280650

386

GTGGCAGTGGCCCAGTATGCAGACACTGTGAGGCCTGAATTTTATTTCAATACCCATCCAACAAAAAGGGAAGTCATAACCGC 7G TGCGGAAAATGAAGCCCCTGGACGGCTCGGCCCTGTACACGGGCTCTGCTC

GTCAAGGTGGAGTCCCGTTTTGATGAGCACCAGAGTAAGCCTGAGATCCTGAATCTTGTGAAGAGAATGAAGATCAAGACGGGCAAAGCCCTCAACCTGGGCTACGCGCTGGACTATGCACAGAGGTACATTTTTG

ArG:CC;Cc

GGGlrA

CG:GA)

AAArGctG

CT(kGArG(

CG'

GG

GC

Page 3: Mosaic structure of globular domains in the human type

Structure of globular domains in type VI collagen a3 chain

GAGGCTCCCAGTCTGGTCCCACTGTCGGAAGTCGA GTGGG GGTC T CCGGCG TGGGC CAGCATTGGGTGC CCTTTACATCCCAGACG AGCCGGAGACTGTAAAT TCG3TG 808E O P Q F L V I I S S G K 5 0 0 E V V V P A V E I K Q F G V A P F T I A R N A 0 Q K E L V K I S 1 1150

VS P E Y V F S V 5 T F R E I P 5 L E Q K L I T P I T T L T S E Q I Q K I L A S T R Y P P P A A E S D 1200

GCTGAGAATTGCTTCTGTCGAAGCCTGGGGATTAGCCGATGCTTGCAATATCGGAMTTTACAGATTGTCG.AGCTCACACGGCCCATA.AGTGGAGTGGGTCGGCATTCGCAAG 40808A A 0 I V F I I 0 5 5 E G V R P 0 G F A H I R D F V S R I V R R L N I G P S K V R V G V V Q F S N 0 1250

GTCTTCCAAATTCTATTGAACCACGATCCGGCCCG TG TGACC ATCG CCCTGGGCTCGAGGGG TCCCC GAAACTGGAAGC C T GAA TTG G CAAGAACT TTTGTTAG CTC 423023VF P E F Y 1 K T Y R S Q A P V 1 0 A I R R L R 1 R G G S P 1 N T G K A L E F V A R N L F V K S A G 1300

8 I E 0 G V P Q H L V L V 1 G G K S Q 0 0 V 5 R F A Q V I R S S G I V 5 L G V G 0 R N 0 R T E 1 1350

QT I T N 0 P R L V F T V R E F R E L P N I E E R I M N S F G P S A A T P A P P G V 0 T P P P S R P 1400

EK K K A 0 I V F L 1 0 G S I N F R R 0 S F Q E V L R F V S E V 0 T V Y K 0 G 0 S Q V G L V Q Y 1450

N 5 0 P T 0 E F F I K D F S T K R Q I I 0 A I N K V V 0 K G G R H A N T K V G L E H L R V N H F V P 1500

GAGGAGGAGCGCCGGACAGGGGCCCTCAATTCCTTGTGTCAGGGGGAAGTGGTGAAATGACAGATGGAGCTGCCCTCACCAAGGGGGTCAAG G T TG TGTTGAGTAGGATACGAT4G480E A C S R 1 0 Q R V P Q I A F V I T G G K 5 V E 0 A Q 0 V S L A L T Q R G V K V F A V G V R N I 0 S 1550

EEVG K I A S N S AT A F ROVG N V Q E L SE I S E Q VILE T F 00D A ID 0E ITL(C P GOVT 0 A A K 1600

AIJC N LO0V I L G FOG SR OQ N VF VA Q KG FE SKOVO0A IL N R I 50Q9 H R VSCC)S G G R S P 1650

TV R V 5 V V A N T P 5 G P 0 E A F 0 F 0 E T Q P E N I E K F R N N R 5 Q H P 0 V L T E 0 T 1 K 0 T 1700

I N K F R Q 5 5 P 0 5 V K 0 0 I H F T 0 G A 0 G 0 1 A 0 1 H R A S E N L 8 Q E G 0 R A I L 0 G I E 1750

RV V N I E R L N H I E F G R G F N Y 0 R P 1 R I N 1 1 0 1 0 T E I A E Q 1 0 N I A E K A C(-X)G~ V P 1800

K0SG Q ff-C]JR G P I G S G P K G I P G E 0 G Y R G Y P 6 0 E G G P 6 E 8 G P P G V E~Q G 1850

F Q CC)P GEQ R G60K G S R C F P'GGE'KGE V G E I161L0 G 1 0 G E 060DK G 1 P G S S G E K G N P 1900

6 R ~ K G P R G E K S E V G P G N P G Q 3 S Q E 0 G P K G, E T G 3 1 G P 1 G V P 5 053o

R 0 G V P G G P G E T G K H G G F G R R G P P'S V K'GS1 K'G G P G Q P G F E G E G T R G A Q G P V 2::73

A0GGTVCCTCTGGTCCCTCCGGGCTGATACGGAGAASCAGGC TTCTG ACVCTGG GGCVGGA"CGGAGGGCCCGTGGCkCCkTCCTGG4"kCAGGVC~t,-CCCSTcCC~C 7C~kCSCCCTGSG,CCTCACACCTkAkVTC2ACI:A 5533

I S H P G P R G ET GODD G RODG VG S SEG 80R G K K GE V GCF 7'G Y F'G P K'G N P G E P G L 27c3

I - -SG:CC4CTCCATCCA A7~CCCTC0CCVJAGCAVC 57083TG P K' G R G R R G N S S P P C V G Q K G V P C 0 P G P V G P 4 G5 VASDLC I Q SI 2153

K O K KWPCXQ) 0C PLE CS) P V F POTE I A F 01006T E G d N Q 00T F GOM903V V LS I V N V I T 22033

I A E S N E)PP06 80 AVA 00VT0 NHE VOTTE IR F A00 K R K S V 110DK I1K N L0 V AVLT SK 2250

QQS I E 0 A H S F V A R N T F K R V R N G F I H R K V V V F F S N 0 P T R A S P Q 1 R E A V I K 1 2333

T,rG.ICATGMT TTGGACATC1,GCACATCGACCCATCCT GCGTGGCTCTCGGCGAACGAGATAGG~CTGCTGTTACTGCGGTAACCACTTC 7530T5)H VkE) 101 (P) N 10 P 5 (C)GSFGS H R P S FRODRR AA GSODV n 040 F ILDS AKE OT TIT F 2400

CAGTCAATAGATAAGAGTAATAGGTACTGGTAGACACTGACAGAGCCAr.TCCCAGGCTCCAGCATTCGCAGGTGGAGTTTGCACACGGCCTCTGGTCCTGG,,CAAGCCGCATCCACTG 76838QF N E H K K 0 I A Y 1 V R Q 1 0 M S P 0 P K A S Q H F A R V A V V Q H A P 5 E S V 0 M~3 P P V 2450

AAGGGGAnCTCCTGCTGATATGCTCAAGAGAACTGTGGATTCTCAGAGGGAATACAAGTTCAGGI,ACAGGCC TAGCAGGCCTTGATACCCATGAGATGT TTGWGTCCCAAACC 38303K V E F 5 L T 0 Y G 5 K E K I V 0 F 1 5 R G N T Q 1 Q G T V A I S S A E Y T E N V F E S A P N P 2500

CGGGACCTGBAAMTTGTGGTCCTGATGCTGACGGGCG.AGGTGCCGGAGCAGCAGCTGGAGG.AGGCCCAGAG.AGTCATCCTGCAGGCC0 tA.AGG~A.SGCTACTTCTTCGTGGTCCTGGGCATTGGCAGGAGGTOG0,ACATCA.AGGAGGTA 7080R0 1 K I V V I M 1 T G E V P E Q Q 1 E E A Q R V I 1 0 A K ) K G Y F F V V I S G 8 K V N I K E A 2550

TACACTTGCCGTGGCCACGCGTTTTTCAATAGTGACAGTCACGAGTCACGGGACC TTGTGCCTCGGAGG T TTCCGCC TCTCACAG GAA.AGC MTA TTTCCCAGTATA 81303Y 0 F A S K P N 0 V F F K I V 0 K 5 0 K I N K K P 1 N 8 F G 8 1 1 P 5 F V S S K N A F Y 1 5 P D I 8 2600

AAACGUJATTGTTCAAGGGCCACCCCAAGAACTGTGAGTTGG CACACAGTAATGTCCAATACGTACTTC.AGTCCTACTCCACCCGTGACAC ACGAGCGGTAC ACGCGAGCCG 82808K Q(9)D W F Q C 0 Q P 0 K N I V K F S H K Q V N V P N S S P 0 5 N P V 0 0 0 K P V 0 0 0 K P V 2650

ACCACACACAGCCTTMCACCCMCAAGCTGGACATTTAATCAGCATTGTAAGCAGCGC GCAAGCCGCCC T CGAACCTGTGCTGCAACC T TGCCCWGAGGCAC GTTAACCC 843030 0 0 K P V 0 0 0 0 K P V 0 I N Q P 5 V K P A A A K P A P A K P V A A K P V A 0 K 0 A 0 V 8 P P 2700

GTGCGGTGAGCCGCACACAGGAACCGTGCGCAAGCAGCGCG AAGCCCC GC GCTGCTGCAAACAGT GC ACCAAGC7GAGTCCTGGCACAGCAGCCAACCACTCCCCAGCCGC 858058V A V K P A 0 A A K P V A A K P A A V R P P A A A A K P V A 0 K P K V P 8 P Q A A K P A A 0 K P A 0 2750

ACTAGCCGTGTTMATGTCCTGAGTCAGGGTTGAGTMCr.AAACGCGCWCTCCCTGGAGGGC TGACCCCCG T CTATT TTTGACTACCG 7CACTAGCCATATCGTCCTGTT 873073T K P V V K H 1 R K V Q V F K I 0 K N S A K I H W K 8 P K P P G P Y F Y 0 1 0 V 0 5 A H 0 Q S I V 1 2800

AAGCAGAACCTCACGGTCACGGACCGCGTCATTGGAGGCCTGCTCGCTGGGCAGACATACCATGTGGCTGTGGTC IGKTACCTGAGGTCTCAGGTCAGAGCCACCTACCACGGAAGTTTCAGTACVAAGAAATCTCAGCCCCCACCTCCA 8880K Q I~ V 0 D R V I G G I I A G Q 0 Y H V A V V(~V 1 8R S V R A 0 Y H 6 5 F 5 0 K K S Q P P P P 2850

TCAGCTTTAGTTCACCATCTCTAATGTGAGCCAGAACATTGGCCTCACTMA""TAAT,GTGTGCCGGAACGTAGAACTTVAGGr.MCATATA TATGAACTATGTCCAMCCC 9030Q P AR S A SS S T IN I N V STE P1410K E 001 (~)K 4. P K D0KG 0TS)R80 F I LK W400Y0 P N 0 2900

MAVAGCTN5GCAAGATTCTGGTATGGAGGTT TGGTGGAAACGAANAACHAATTTGGATCACAGAAAGAAT TGAAAAGGTMO GCTCCTGTGCTCGCCAAACCCGGAGSTCATCAGTGTGATGGGAACCTAAGCGTGGGTGGCCAACATC 9180K S ) A R F HOY GEG(9G G N E N K F G SQK K Ej) K K V (90A P V I A K P GOV I S V M 60' 2943

ATATCCTTTGAGAAAAGAGTAGCATCGCAATTGCTCGTAGAGCCCGGTGAGATCCCTTGAC GTATATTCATCTTTGAACA TCAAC CGGAGGAACTCC GC GCAGAC TATAGTT 93303

TGCTATGGTCGTGACCTCGTCTCGTCTCCGCATTCTCCGATACGAAGTGTGTAAAGTAGCACTCTGTG TTAGTAACTTC TACAACCAATTCC CTGAGTTTATGrTAGCCGTG 94808

GGCAATGTAAGCTAGAAAAATGAAAGTCAGGCACTCATAACTTTGCTGGTCArTTMTCCTTTAGTAAGATGC TTAGAGGGAGCCTGTGATCTGGAAAAAAGAACCAC MTCATC 96303

TGCCCCATTTCCAGCTAGTTTAAGCTAATCTMCTGAGCCCTACAAAGATTAGTCAMGGAGCAAATCCTTATCTTGTCACCTTGGACCATGCTTATTAAGATTTAAAAGTTTA 97808

AGAGATATTTTGCATCCTTCATGTTTGTTTTTTTTTTTTGTGTGAGGGGGGATTTATTTAATTTAAAT TTTAGAAAMTACAAGAAC T TTATAAGTAATTAAAG TTAAAAA 90303

Fig. 2. Nucleotide and deduced am-ino acid sequence of the human ci3(VI) chain. A polyadenylation signal in the 3' non-coding region is underlined.Open arrow heads denote two deletions found in several cDNA clones (see Figure 1). The predicted signal peptide cleavage site (von Heijne, 1986)is numbered -1. The second best prediction (-51-6) is less compatible with the overall structure of signal peptides (von Heijne, 1985). Thebeginning and end of the triple helical s,egment (Chui et al.,- 1988) is, marked by solid arrow heads. Cys, residues are circled and potentialN-glycosylation sites (NXT, NXS) and cell binding sites (RGD) are shown in boxes. Peptide sequences confirmed by Edman degradation areunderlined. An asterisk after P and K indicates hydroxylation as determined previously by Edman degradation (Chu et al., 1988). Overlinedsequences in the 3' non-coding region are T-rich segments implicated in rnRNA stability.

387

Page 4: Mosaic structure of globular domains in the human type

M.-L.Chu et al.

fragment from the 5' end of F376, we isolated from a primerextension library an overlapping 1.6 kb clone S27. Probinga random primed cDNA library with the insert of S27 yieldedseveral more clones with insert sizes of either 0.6, 1.2 or1.8 kb (R314, R303 or R301) (Figure 1). All these clonesincluding S27 did not extend beyond an internal EcoRI siteof a3(VI) mRNA ( - 2.9 kb from 5' end) which apparentlywas not well protected by EcoRI methylase. In order toovercome this obstacle, we constructed another cDNAlibrary using EcoRI adaptors instead of linkers, therebyeliminating the subsequent EcoRI digestion step. Screeningof this library with a 0.4 kb EcoRI-PstI fragment from the5' end of S27 yielded 30 cDNA clones (FO series) (Figure1) with inserts in the range 2.0-5.5 kb and containingseveral internal EcoRI sites. Together all these clones cover-9.9 kb of a3(VI) mRNA (Figure 1).Restriction enzyme mapping and/or sequence analysis of

the various cDNA clones showed that some of themcontained one or two deletions of 501 and 597 nt respectively(Figures 1 and 2). Since identical deletions were found inat least two independent clones they do not represent artifactsof the cDNA libraries but rather indicate alternative splicingof a3(VI) mRNA (see Discussion).

Nucleotide sequence of a3(VI) chain mRNA anddeduced amino acid sequenceThe various cDNA clones allowed us to determine acontinuous sequence of 9930 nt with an open reading frameof 8904 nt (Figure 2). The open reading frame is preceededat the 5' end by a short non-coding region (255 nt) and isfollowed at the 3' end by another 759 nt non-coding regionwith several stop codons in each reading frame and a shortpoly(A)+ tail. A typical polyadenylation signal (AATAAA)starts 21 nt upstream of the poly(A) tail. The deduced amino

01l

E 12213)

C1

,,,0.050 11(2211)

CDU) ~~~~~~~~~~~~~~~~~~~(2277(1

(2289)

acid sequence indicates a short signal peptide (25 residues),a large N-terminal (1804 residues) and C-terminal globule(803 residues), which are connected by a short triple helicalsegment (336 residues) characterized by repetitive Gly-X-Y triplets (Figure 2).There is an apparent clustering of some functional amino

acid residues in certain sections of the analyzed sequence(Figure 2). The 29 Cys residues are mainly restricted to thetriple helical and C-terminal globular segments with somemore found in the proximal and more distal portions of theN-terminal globule. A similar unequal distribution apparentlyexists for Asn residues present in nine N-linked glycosyla-tion sites. The five putative RGD cell binding sites areexclusively localized within the triple helical segment.An interesting feature of the 3' non-coding region is its

high AT content (72%) in the last 200 nt. In particular, longruns of Ts and a motif of one A followed by three or moreTs occur frequently in a region upstream of the poly(A)+tail (overlined in Figure 2). Similar sequence motifs havebeen found in the 3' non-coding regions of many transientlyexpressed mRNAs, e.g. those coding for oncogenes,cytokines and lymphokines (Shaw and Kamen, 1986; Caputet al., 1986) and were implicated as a recognition signal forthe selective degradation of these mRNAs.

Sequence of peptides obtained from the globulardomains of a3(VI) chainAbout 50% of the triple helical sequence of the a3(VI) chainwas previously supported by Edman degradation of peptides(Chu et al., 1987, 1988) and has allowed the unequivocalidentification of cDNA clones. Additional support wasobtained from a globular segment NC1 (Odermatt et al.,1983; Kuo et al., 1989) which showed a characteristic HPLCprofile of its tryptic peptides (Figure 3). Most of the major

Elution time, min

Fig. 3. HPLC separation of tryptic peptides obtained from fragment NC1 which corresponds to the Cl domain of c3(VI) chain. The startingpositions (see Figure 2) of identified sequences are denoted on top of individual peaks. The digest was separated on a C18 column equilibrated in0.1% trifluoroacetic acid which was eluted by an acetonitril (B) gradient (0-95%) as indicated by the dashed line.

388

Page 5: Mosaic structure of globular domains in the human type

Structure of globular domains in type VI collagen a3 chain

peaks were sequenced and shown to fit exclusively to asection of the C-terminal globule within positions 2140-2365 (Figure 2) which is in agreement with the size estimatefor this NC1 fragment. Sequence data for the N-terminalglobule were obtained from 14 small peptides obtained fromsome other collagenase-resistant peptides with Lys-Cprotease and from pepsin solubilized collagen (Chu et al.,1987, 1989). The peptide sequences fitted within positions341-1804 of the sequence (Figure 2). Thus, - 20% of thecDNA sequence was confirmed for the globular domainsavailable in the form of large fragments.

Comparison of amino acid sequences within distinctdomains of a3(VI) chainSequence comparison of both globules for internal repeatsand in comparison with other proteins revealed a numberof interesting features, indicating that the globules of a3(VI)chain are organized in 14 different domains referred to asN9 to Nl and C1 to C5 (Figures 1 and 7). Domains N9to N2 found in the N-terminal region are of rather uniformsize (-200 residues) and show 31-40% identity (Figure4). These domains show a distinct similarity to the three Adomains of von Willebrand Factor (Titani et al., 1986; Titaniand Walsh, 1988) and two similar domains found in a

cartilage matrix protein (Kiss et al., 1989) which share17-25% identical residues. As discussed previously (Chuet al., 1989) some integrins and complement componentspossess similar domains.Three more domains, NI, Cl and C2, which are more

variable in size (184-248 residues) are located directly atthe N- and C-terminal sites of the triple helix. They showa lower internal identity and identity to domain N2 of - 21 %(alignment scores 2.1-5.8 SD; Table I) but are still ofdistinct similarity to N9 to N2. Their identities to the Adomains of von Willebrand Factor and cartilage matrixprotein are, however, rather low (11-14%). These threedomains occupy positions in a3(VI) chain corresponding tothe whole globular segments of the al(VI) and a2(VI) chains(Bonaldo et al., 1989; Chu et al., 1989; Koller et al., 1989).A comparison of these regions in all three chains (Table I)shows identity in the range 18-42% with a large variationof alignment scores (0.9-19.3 SD). The highest scores wereobserved in the comparison of analogous domains of a 1 (VI)and c2(VI) chains while the corresponding domains ofa3(VI) were distinctly less similar. A particularly low andprobably insignificant score (0.9-2.2 SD) was obtained forthe NI domains. Among the comparisons with other proteinsa remarkable 32% identity was observed between a central

N N T G SV NS A L G L A NA N V GKT NA N L VG Q -E G V R S - GQ S A G P E -E GVR P D GI N F R R D SS R L S E A E

FFFIFIFFFFIF

A V I L D F L V N L L E K L P I G T Q Q IN A I Rk D F I A K V I Q R L E I GIQ D L IPYVRD F I M NLWMN S L DIGJNDNIP V V ID F L Y K I I D E L N V K P E G TPLLK E FJfQ R V E S LD V Q D R VQ Y V[1T L I E R L V]D Y L D VIGIF D T TA H I [Rjl S R I|V|R R LI NM P S K VQE- VL R F S E I V D T V Y E DE V L K A Wyj V D N N E R L R S Q K W -

- - - - vI AI

Q

IRIIVI LJVMIQIRL¶]T

RVAVVEQ- |E-|-- f-g-EVGjy

F IJ[f E PR T NF S[D T Y ST K A Q M[UG V - K AILIG®W WE LAN IA R - -

rAIDITVRP FIYFNTHPTIKIRE [I TIAJV- RKNKPLD nSALLY YSiGS[F DIOITPVTEFSEN]NTYQT|K|SDI MGHL-RQ IiQ Q Q GjGSGL[Wf|lSIA|[MISIIIDD V K V E S R F D E H Q SIKIP E I J N L V - K R N K I TIGI- K A LINILIGIYIAFS DDR T RPKEFL[Y N S Y M NIKlQD -V N T V - R Q RTL LKG GRP T PIN|E3|GA|AFS RITPDP K AIEIF L MLMNA H S S QODEIV NIAIV - Q RILIRP KG|G|R Q IINIVIG KNAIFISIN D V F PEIFYILIK T Y R S Q A P'V DIAII - R R L R L RRGIGIS P L TN JGJK A

N S D P T D JEFFL K D F S TKR Q I I DQIIDI - N K V V Y K GGLUR H A MTK V GH g G S H A Y I G IL K D R K R P S E - E R R I A S Q V K Y A LG S Q V A S S E V

L D FVNV EW H FNLID F VIR N NILIFIL S Y VYY A N H FLID Y A Q R Y IFL IE F V L R NWI LL E YIVIS RIN|I F

LIE F IA RINIL FlLIEHLRVIR JHIFILK Y T - - - LE

T R A G - S1T S S A -IGIT E A G - IGI1V K - S A G||S|V S - S A IGIISIK R - P LGA SVK - SAPG1 SV P - E A lQ I F S K I Di

VJE G1R]JPVIM II S A P DS E I R Y G VAAIElI[IPIKL LVIL I TG G K S L D E I S Q P AI R EHVP QIL LLLL T A G Q S E 0DS Y L Q A A

IT G V Q LWIIV L T A D R SSG D D - V R N PI E G V P Q F L I S S [K SOD D|E - V V V Pj G V P Q H LIJ V L G G K QDO- V S R FL D Q R lJ - IA FVI TjG K S V E D A Q D VP[flA S R I A LE L N A S Q E P Q R N S R N F V R

V A -[KQ A SE1F SFG LGA QQ E -ILlIR S S I M AIF IIGN KN A -ILIT R A G]I L TIFF)VG A SA S NM[KQ S G|1V P F A KS VV V IKIR GIGIA V P IG Ij I GA V EJI]JQ FIG|EA PEfjT I A RNA Q V I R S SIGII V S L G VI OD RS L A T Q RIGI K VL[fIA VIGIV RY Q G[E K K K I V I P VIJI G

A A S R A E Q H I A T D D N L TL[JP E R S F G DOrQ K L P Y -[V GG A PJQ A EIL E E I A F D S S L V I P A E FI R A A[P L G N - - L P G L L A PQIAIN K AIEIILIE QIIA P S L[Y L MD D FS SLIP A LP Q Q -|LII Q P L T TN AOP A EI&E QIIV L S P A - - FI L A A E - SLPK I G D - -L H P QIV NN I T M Q TLL S F I P D - -F AVi A I P T FR -LI G T - V Q V -IIS EAO Q EELVKISLSPE Y - FSVS T FR E S QK L T P U T TN I IR T WEE]QT T N D P R L IF T VR E FR ELP N I EIE|R I M N S F G P SN I SUENV G K A S S A T A FJR LV G N V Q E S E IS EQ V E3 E T F D D AP H A N L K Q I R L I E K Q A P E N K A F V L S S V D E - U Q Q R D E -WfV S

V A Q R H I VL Q G M L P GY V S G G V EL L K S V H NR V T Q L T RL T S E Q I QA A T P A P PI D T L - -

Y L(D L A P

Fig. 4. Alignment of the sequences of domains N9 to N2 of a3(VI) chain and comparison with the Al domain of human von Willebrand Factor(vWF-A1). Residues which are identical in at least five out of the nine compared sequences are shown in boxes. Odd Cys residues of N9, N7 andvWF-Al not found in the other sequences are shown by circles. The sequence of vWF (Titani et al., 1986) was taken from published data. Positionnumbers refer to those shown in Figure 2.

389

N9N8N7N6NSN4N3N2vWF-Al

9210404602794998

12011404512

S AK RIKIRKRE KF RA AKAL L

I II L

VIVI

Vj L

_ILV

GGGGGLGIS

IVLFLIILL

0

DD

0000

N9N8N7N6N5N4N3N2vWF-Al

55256450647839104312471450558

N9N8N7N6N5N4N3N2vWF-Al

102305499696888109212961499604

N9N8N7N6N5N4N3N2vWF-Al

150353547744936

114013441547655

L K PL L AE V PG A PE E LK L LG V D

E A P

ssssssssLS

Page 6: Mosaic structure of globular domains in the human type

M.-L.Chu et al.

Table I. Sequence comparison of the 200 residue domains N2, Ni, Cl and C2 of cx3(VI) chain with each other and with homologous domains ofa2(VI) and cal(VI) chain

% identity (alignment score) in comparison withDomain a3(VI) a2(VI) c I (VI)

NI C1 C2 NI C1 C2 NI C1 C2

a3(VI)N2 21(2.8) 20(5.7) 21(3.0) 23(4.4) 27(5.1) 21(6.0) 22 (3.9) 28 (6.5) 21 (4.4)ce3(VI)N1 - 21(5.8) 21(2.9) 18(2.2) 21(4.4) 19(4.6) 20 (0.9) 19 (2.3) 23 (4.8)ot3(VI)CI - - 22(2.1) 21(1.9) 25(5.7) 21(5.6) 21 (1.3) 21 (5.9) 19 (4.2)cO3(VI)C2 - - - 19(1.3) 21(8.1) 19(5.0) 17 (1.8) 25 (6.8) 19 (7.3)ox2(VI)Nl - - - - 26(4.8) 26(4.3) 35(14.2) 24 (5.8) 18 (3.5)cx2(VI)Cl - - - - - 21(6.4) 26 (5.4) 42(19.3) 24 (6.2)ct2(VI)C2 - - - - - - 19 (3.2) 17 (5.5) 22(10.6)al(VI)N I - - - - - - 29 (8.6) 24 (6.5)a I (VI)C1 - - - - - - 23 (6.7)

Determined with the program PIRALIGN and expressed as % identical residues and alignment scores given in SD units. Values of 3.09, 4.75, 6 and9.26 SD correspond to probabilities of 10-3, 10-6, 10-9 and 10-20 respectively, for being similar by chance. Sequence data were taken fromFigure 2 and from Chu et al. (1989) for the human ctl(VI) and oe2(VI) chains.

2754 V V K L R E V Q V F E EN A K L HE R E P P - G PYF1326 G L D S P T G I D F S DA LF T V I ALR A T I T GWR2803 N L T V T D R V I GG [L A1Q T[H A VMC Y L R S Q V R A T1376 P H S R N S I T L T NT P T E Vy S I A L N G R E E S P L

Y D L T V T S A H D Q S L V L K QI R H H P E H F S GR P R E D R V

*

Y H[S FL I Q Q

C5 2878 IK PKIK E T[ D L K IE] DPM6 NS[S 1E W YGGCIGMNE MN FGS E E K[ICHUTI 82 C IEV I IR GPC R IQ L A FD V KG C VL FIPIYIIGIGCGICQGIINIIKFFYYE K CIR E YICIHAP 291 C S A E G R M I S f V T E C P F F Y G G CG|G MRI FD E YFCIJA [JCHLCI 125 C F E E P I C EJ Y I TIRR YYT N Q Q C E F K Y G G C L G MN NINIF E T L EE C K N I CAP 40 C L BP YEIG C K LR I I Y F Y K A L CQ TLF V _ GSR A K§N NE K ALDgE R T

C:C1 2282 S N T P T RAnSP Q L R E |ijKjSD A I fP L F L T R Q E R Q L I N A L QIN N T - V GACT 126 gET F N V P Y V A I Q UVUS Y A SSR T G I V L D S G G V T H N V P Y E G Y L PC1 2331 nu L V - P - 16j F E N iACT 175 O I R jD L JUjI2Y NMK I

Fig. 5. Comparison of the C-terminal globular domains Cl, C4 and C5 of cs3(VI) chain with homologous protein sequences. (A) FN denotes theninth type Ill repeat in human fibronectin (Komblihtt et al., 1985). Asterisks above the sequence denote residues which are almost invariant in thetype III repeats found in human fibronectin and chick tenascin/cytotactin (Kornblihtt et al., 1985; Jones et al., 1989). (B) Comparison with bovineaprotinin (AP) (Kassell and Laskowski, 1965), the human urinary trypsin inhibitor (HUTI) (Wachter and Hochstrasser, 1981) and similar domainsfound in a human amyloid protein (HAP) (Ponte et al., 1988) and a lipoprotein associated coagulation inhibitor (HLCI) (Wun et al., 1988). Anasterisk denotes R or K within the active site. (C) Comparison of the C-terminal end of Cl with an internal sequence of human ca-actin (ACT)(Hanauer et al., 1983). A 14 residue sequence motif found in several actin binding proteins (Tellam et al., 1989) is underlined. Identical residues areboxed; numbers refer to the positions of Figure 2 and those in the compared protein sequences.

segment of a-actin (Hanauer et al., 1983) and a 78-residuesequence located at the C-terminal end of the a3(VI) chainCl domain (Figure 5C).

Three other domains C3, C4 and CS are predicted for themost distal sequence of the a3(VI) C-terminal globule. Theylack any relationship to the other globular domains of thea3(VI) chain and are not found in the al(VI) and a2(VI)chains. The 122-residue domain C3 shows a uniquecomposition with Lys, Pro, Thr, Ala and Val contributingto 90% of the sequence. This domain is characterized byrather invariant 22-fold repeats of the Lys-Pro sequence (KPrepeat) which are separated from each other by two to fiveother residues (Figure 6). A similar (37% identity), but

somewhat larger domain with less variable repeats (spacingsmainly Thr-Thr-Thr), is found in the salivary glue proteinsgs-3 of Drosophila (Garfinkel et al., 1983). Furthersignificant identities (22-31 %) were found for severalhuman proline rich salivary proteins (Ayen et al., 1984;Kauffman et al., 1986), calpain inhibitor (Emori et al., 1987)and several HI histones (Golas and Wells, 1985; Ohe et al.,1986). Such identities may, however, reflect the uniformsequence rather than structural or functional relationships.A 89 residue segment C4 directly adjacent to C3 has

features typical of type III domains originally detected infibronectin (Peterson et al., 1983). This identification ismainly based on the preservation of size and several invariant

390

A:C4FMC4FN

B:

Page 7: Mosaic structure of globular domains in the human type

Structure of globular domains in type VI collagen a3 chain

S P T S NP

V T T T T K P

A P A K P

P V A VIK P

P A A A AIK P

A A TIK P

V T T TKP

V T I I N QP

V A A K P

A T A A K P

V A T K P

A T T K P

V T T T K P V T

S V K P

V A T KT

V A A KP

E V P R P

Fig. 6. Internal alignment of the repetitive sequence present in domain C3 of the cO3(VI) chain. The alignment in segments of 20-25 residuesfollowed a similar one for human proline rich salivary protein (Ayen et al., 1984) keeping the most conserved KP repeat (shown in boxes) ininvariant positions. The numbering is different to that of Figure 2.

0 300 600 900 1200 1500 1800 2100 2400 2700 3000 a aI~ ~~~~~~II I I I Il1 -I

NH2

Cnlirinn

p'~~~~~~~~~~~~Nl TH C 1 C 2

3C 6C 3C

2C 2C 6C .C

N9 N8 N7 N6 N5 N4 N3 N2 Nl TH Cl C2 C3C4 C51 I I IpI p T2 SC 5I C 6C

JIlJ I llyeIKhomology

- COOH

mxl

oa2

oa3

,,, N,

vWF-A. integrin, CMP, complement collogen internal, KP III APactin repeot

Fig. 7. Schematic diagram of the mosaic structure of the human O3(VI) chain which consists of N-terminal globular domains N9 to N1, a triplehelical segment (TH) and C-terminal globular domains Cl to C5 and comparison to the domain structures of the c (VI) and c2(VI) chains. Shorthorizontal lines indicate small connecting segments. Localization of N-glycosylation sites (0) and cysteine residues (vertical lines) with additionalnumbers when clustered are indicated. Most obvious homologies to other proteins are indicated, including A domains of von Willebrand Factor(vWF-A), cartilage matrix protein (CMP), type III domain of fibronectin (1I1) and aprotinin (AP). KP repeat refers to a repetitive sequence similar tothose of salivary proteins. The domain organization of human al(VI) and ce2(VI) chains was reported by Chu et al. (1989). The two splice sitesrefer to those shown for several clones in Figure 1. The scale of amino acid numbers (aa) is shown above.

amino acid residues (Figure 5A) since the identity (18%)to the best fitting domain of human fibronectin (Kornblihttet al., 1985) is rather low. The identification of severalsimilar type III domains in the extracellular matrix proteintenascin/cytotactin has been based on analogous considera-tions (Jones et al., 1989; Spring et al., 1989).The last 70 residue segment of the C-terminal globule

(domain CS) shows six cysteines located within 51 residuesand 42% identity (Figure SB) to a comparable segment ofthe trypsin inhibitor aprotinin (Kassell and Laskowski, 1965).Even higher identities up to 50% were found for some relatedproteins, e.g. human urinary trypsin inhibitor (Wachter andHochstrasser, 1981) and a lipoprotein associated coagulationinhibitor (Wun et al., 1988) among many other serineprotease inhibitors (Laskowski and Kato, 1980).Homologous proteins include also a recently describedamyloid protein (Ponte et al., 1988; Tanzi et al., 1988;Kitaguchi et al., 1988). The identity to a3(VI) domain CSis not restricted to just the Cys residues but involves severalmore invariant Gly, Asn, Tyr, Phe residues (Figure 5B) anda basic amino acid in the active inhibitor (P1) site (position11 of CS in Figure SB, corresponding to position 1S inaprotinin).

Discussion

The whole amino acid sequence of the mature humancollagen a3(VI) chain (2943 positions) predicts a molecularmass of - 340 kd (including some glycosylation). This isa 30% higher mass than estimated previously from electro-phoretic analyses (Engvall et al., 1986; Trueb andWinterhalter, 1986; Colombatti et al., 1987). It couldindicate a low accuracy of the electrophoretic assay as foundbefore in similar molecular mass estimates for fibronectin(Kornblihtt et al., 1985; Ruoslahti, 1988) or limitedprocessing of the a3(VI) chain. Results of Edmandegradation (Figure 2) demonstrated the presence of globulardomains NI to N8 and Cl (Figure 7) in the tissue form oftype VI collagen. The N-terminal domains very likelycorrespond to a 160 kd fragment released from the a3(VI)chain by collagenase (Trueb and Winterhalter, 1986), whilethe same study failed to identify a 90-100 kd fragmentwhich could correspond to the whole C-terminal globule.These findings suggest processing of the C-terminal globuleincluding the release of the putative inhibitor domain CSdespite the failure to show substantial processing in cellculture (Colombatti and Bonaldo, 1987). No such processing

391

T T TK P

A A A K P

A T V P

A A V R P

Q A AP

25

48

68

90

112

122

Page 8: Mosaic structure of globular domains in the human type

M.-L.Chu et al.

has been found, however, for the a l(VI) and a2(VI) chain(Chu et al., 1989).

Further differences between the three chains of type VIcollagen exist in their sizes (Figure 7). They share a triplehelical segment of identical length and three similar globulardomains, Ni, Cl and C2. The similarity of these domainsis larger between a 1(VI) and a2(VI) chains when comparedwith the a3(VI) chain (Table I). Assuming that all threechains originated from a common ancestral precursor, thisobservation suggests an early divergence of the O3(VI) fromthe ctl(VI) and a2(VI) chains during evolution. This couldbe reflected by a different chromosomal localization in thehuman genome being 2q37 for the a3(VI) and 21q223 forboth ca I(VI) and a2(VI) chains (Weil et al., 1988). Yet, theother eight domains, N2 to N9, show a higher internalhomology compared with N1, Cl and C2 and may haveevolved from a more recent multiplication event.These various globular domains shared by all three chains

show distinct homology to the A domains of von WillebrandFactor for which binding to collagen, heparin and cell surfacestructures has been reported (Titani and Walsh, 1988). Thishas raised the possibility that the similar type VI collagendomains have a functional role in the assembly of microfibrilsand their heterotypic binding to large collagen fibrils andbasement membrane collagen (Bonaldo et al., 1989; Chuet al., 1989; Koller et al., 1989). Their abundance in thea3(VI) chain, however, also indicates that not all of themmay have binding activity and alternatively might serve asspacer elements.Three more domains unique to the a3(VI) chain, C3 to

C5 (Figure 7), complete the mosaic structure of type VIcollagen and could be important for different functions. TheKP repeat C3 may, as in other salivary proteins (Beckendorfand Kafatos, 1976; Korge, 1977; Garfinkel et al., 1983),be utilized for O-glycosylation. Another 31 residue segmentrich in Pro, Thr and Ser (see Figure 2, positions 2842-2873)preceeding C5 could also be used for the same purpose. Thetype III repeat of C4 is found 15-17 times in fibronectin(Kornblihtt et al., 1985; Ruoslahti et al., 1988), where theyform rod-like elements with binding sites for heparin andcells. The characteristic cell binding sequence RGD is,however, not found in C4; rather it is present in the triplehelical segment of all three type VI collagen chains (Figure2) (Chu et al., 1988; Aumailley et al., 1989).The best functional prediction, based on sequence

comparisons, exists for the inhibitor-related domain C5. Thepresence of an Arg in its presumed active site indicates thatthe domain, if active, should bind to proteases with trypsinlike specificity. A similar inhibitor motif present in apredicted amyloid precursor was shown recently byrecombinant technology to be active against trypsin(Kitaguchi et al., 1988). Most inhibitors of this type areexpressed as small proteins containing no extra domains(Laskowski and Kato, 1980), underscoring the possibilitythat CS could be released from type VI collagen in tissues.These possibilities can be studied now by using recombinantproducts and antibodies raised against them.A further variability of human the a3(VI) chain structure

was suggested by Northern hybridization which showed twoto four bands in the size range 8.3 -9.2 kb (Chu et al.,1987). The upper value could also be an underestimatebecause of the predicted maximal size (9.9 kb) shown here.The variations could be due to the use of different poly-

adenylation signals, yet three independent clones (FM 12,FM20 and FM22) showed the same 3' non-coding regionof 759 nt, with only one typical polyadenylation site(Figure 2). Other, more 5' clones show one or two deletionsnot seen in other clones (Figure 1). These deletions eliminate167 amino acid residues of domain N3 and almost entirelydomain N9. This suggests alternative splicing of the a3(VI)mRNA. Structural variants which differ in the C-terminuswere also predicted for the a2(VI) chain (Chu et al., 1989).They were shown to arise by the mutually exclusive use oftwo exons encoding domain C2 and the 3' non-coding region(Chu et al., 1990). No evidence exists so far for similarvariability at the 3' end of the al(VI) and a3(VI) chainmRNA. The functional consequences and tissue specificityof such variations remain to be determined.

Materials and methodsIsolation of cDNA clones and nucleotide sequencingTotal RNA was isolated from human fibroblasts (GM 3349) (Coriell Institutefor Medical Research, Camden, NJ) by acid guanidine thiocyanate-phenol-chloroform extraction. Poly(A)+ was then selected on anoligo(dT)-cellulose column (Aviv and Leder, 1972). Three XZAP cDNAlibraries were prepared from 5 jig each of poly(A)+ RNA using eitheroligo(dT)12-18 or mixed hexanucleotides (Pharmacia, Piscataway, NJ)as primers by established procedures (Gubler and Hoffmann, 1983) providedin cDNA synthesis kits (Bethesda Research Laboratories, Gaithersburg, MDand Pharmacia, Piscataway, NJ). The double stranded cDNA was sizeselected by chromatography on a Bio-Gel A-50m column, ligated with EcoRIlinker or adaptor, and cloned into the EcoRI site of XZAPII vector(Stratagene, La Jolla, CA). An additional primer extension library wasconstructed using as primer an oligonucleotide (19 mer) synthesized accordingto the cDNA sequence (SP308, Figure 1). The specific primer was annealedto the poly(A)+ RNA at 65°C for 30 min, prior to the synthesis of thefirst strand. The unamplified cDNA libraries were screened with 32P_labeled fragments from previously isolated cDNA clones by standardprocedures (Rigby et al., 1977; Benton and Davis, 1977). Following plaquepurification, the positive cDNA clones in Bluescript plasmids were excisedfrom the phage vectors by infecting with single strand helper phage R408,according to the protocol provided by Stratagene.The cDNA clones were sequenced by dideoxynucleotide chain termina-

tion method (Sanger et al., 1977) using [35S]thio-dATP (New EnglandNuclear, Boston, MA) and the modified T7 polymerase (Sequenase kit,USB, Cleveland, OH). Some sequences were determined with fluorescentM13 primers and analyzed using an automated DNA sequencer (AppliedBiosystems, Foster City, CA). The double stranded plasmids were firstsequenced from the 5 and 3' directions using two primers flanking theBluescript vector (Chen and Seeburg, 1985). To obtain the completesequence, the cDNA inserts were digested with appropriate restrictionenzymes and then subcloned into the single stranded M13 vectors. The singlestranded DNA templates were sequenced by using M 13 universal primers(Amersham, Arlington Heights, IL) or specific primers complementary tointernal cDNA sequences, which were synthesized with a DNA synthesizer(Dupont, Coder 300).

Peptide isolation and Edman degradationA 30 kd fragment NC1 was isolated from a collagenase digest of pepsinsolubilized human type VI collagen (Odermatt et al., 1983) and recentlyshown to comprise a portion of the C-terminal globule of a3(VI) chain (Kuoet al., 1989). The reduced and alkylated fragment was cleaved withTPCK-trypsin (2 h, 37°C) and the digest separated by HPLC on a reversedphase column (Chu et al., 1987). Another set of small peptides were obtainedfrom a Lys-C protease digest of collagenase resistant fragments obtainedfrom a 140 kd pool of human type VI collagen chains (Chu et al., 1989).The various purified peptides were subjected to Edman degradation (15-30cycles) in a gas phase sequencer (model 470A, Applied Biosystems) aspreviously described (Chu et al., 1987, 1989).

Comparison of amino acid sequencesThe deduced amino acid sequence was searched in the protein sequencedata bank (MIPSX Data base, F.Pfeiffer, Martinsried Institute of ProteinSequences) for homologous proteins by the FASTP program (Lipman andPearson, 1985). Alignment of two particular sequences was achieved with

392

Page 9: Mosaic structure of globular domains in the human type

Structure of globular domains in type VI collagen a3 chain

the PIRALIGN program (Dayhoff et al., 1983; George et al., 1986) usingthe mutation data matrix with a bias of +6 and a gap penalty of 6. Thealignments were compared with 100 random permutations to obtain thealignment scores. Multiple sequences were then aligned based on these com-parisons.

AcknowledgementsWe thank Loretta Renkart, Cynthia Bohan, Bruce Donaldson and KerryMaddox for excellent technical assistance. The study was supported by theNational Institutes of Health (grants AR 38912, AR 38923, AR 38188 andAR 39740), the Deutsche Forschungsgemeinschaft (SFB 266) and ShrinersHospital for Crippled Children.

References

Aumailley,M., Mann,K., von der Mark,H. and Timpl,R. (1989) Exp. CellRes., 181, 463-474.

Aviv,H. and Leder,P. (1972) Proc. Natl. Acad. Sci. USA, 69, 1408-1412.Ayen,E., Lyons,K.M., McGonigal,T., Barett,N.L., Clements,L.S.,

Maeda,N., Vanin,E.F., Carlson,M.D. and Smithies,O. (1984) Proc. Natl.Acad. Sci. USA, 81, 5561-5565.

Beckendorf,S.K. and Kafatos,F.C. (1976) Cell, 9, 365-373.Benton,W.D. and Davis,R.W. (1977) Science, 196, 180-182.Bonaldo,P., Russo,V., Bucciotti,F., Bressan,G.M. and Colombatti,A. (1989)

J. Biol. Chem., 264, 5575-5580.Caput,D., Beutler,B., Hartog,K., Thayer,R., Brown-Shimer,S. and

Cerami,A. (1986) Proc. Natl. Acad. Sci. USA, 83, 1670-1674.Chen,E.Y. and Seeburg,P.H. (1985) DNA, 4, 165-170.Chu,M.-L., Mann,K., Deutzmann,R., Pribula-Conway,D., Hsu-

Chen,C.-C., Bernard,M.P. and Timpl,R. (1987) Eur. J. Biochem., 168,309-317.

Chu,M.-L., Conway,D., Pan,T.-C., Baldwin,C., Mann,K., Deutzmann,R.and Timpl,R. (1988) J. Biol. Chem., 263, 18601-18606.

Chu,M.-L., Pan,T.-C., Conway,D., Kuo,H.-J., Glanville,R.W., Timpl,R.,Mann,K. and Deutzmann,R. (1989) EMBO J., 8, 1939-1946.

Chu,M.-L., Pan,T.-C., Conway,D., Saitta,B., Stokes,D., Kuo,H.-J.,Glanville,R.W., Timpl,R., Mann,K. and Deutzmann,R. (1990) Ann. N. YAcad. Sci., in press.

Colombatti,A. and Bonaldo,P. (1987) J. Biol. Chem., 262, 14461-14466.Colombatti,A., Bonaldo,P., Ainger,K., Bressan,G.M. and Volpin,D. (1987)

J. Biol. Chem., 262, 14454-14460.Dayhoff,M.D., Barker,W.C. and Hunt,L.T. (1983) Methods Enzymol., 91,524-545.

Emori,Y., Kawasaki,H., Imajoh,S., Imahori,K. and Suzuki,K. (1987) Proc.NatI. Acad. Sci. USA, 84, 3590-3594.

Engvall,E., Hessle,H. and Klier,G. (1986) J. Cell Biol., 102, 703-710.Furthmayr,H., Wiedemann,H., Timpl,R., Odermatt,E. and Engel,J. (1983)

Biochem. J., 211, 303-311.Garfinkel,M.D., Pruitt,R.E. and Meyerowitz,E.M. (1983) J. Mol. Biol.,

168, 765-789.George,D.G., Barker,N.C. and Hunt,L.T. (1986) Nucleic Acids Res., 14,

11-15.Golas,L.S. and Wells,J.R.E. (1985) Nucleic Acids Res., 13, 585-594.Gubler,U. and Hoffmann,B.J. (1983) Gene, 25, 263-269.Hanauer,A., Levin,M., Heilig,R., Daegelen,D., Kahn,A. and Mandel,J.L.

(1983) Nucleic Acids Res., 11, 3503-3516.Heckmann,M., Aumailley,M., Hatamochi,A., Chu,M.-L., Timpl,R. and

Krieg,T. (1989) Eur. J. Biochem., 182, 716-726.Jander,R., Troyer,D. and Rauterberg,J. (1984) Biochemistry, 23,

3675 -3681.Jones,F.S., Hoffman,S., Cunningham,B.A. and Edelman,G.M. (1989) Proc.

Natl. Acad. Sci. USA, 86, 1905-1909.Kassell,B. and Laskowski,M. (1965) Biochem. Biophvs. Res. Commun.,

20, 463-468.Kauffman,D., Hofmann,T., Bennick,A. and Keller,P. (1986) Biochemistry,

25, 2387-2392.Kiss,I., Deak,F., Holloway,R.G., Delius,H., Mebust,K.A., Frimberger,E.,

Argraves,W.S., Tsonis,P.A., Winterbottom,N. and Goetinck,P.F. (1989)J. Biol. Chem., 264, 8126-8134.

Kitaguchi,N., Takahashi,Y., Tokushima,Y., Shiojiri,S. and Ito,H. (1988)Nature, 331, 530-532.

Koller,E., Winterhalter,K.H. and Truieb,B. (1989) EMBO J., 8,1073- 1077.

Korge,G. (1977) Dev. Biol., 58, 339-355.

Kornblihtt,A.R., Umeyawa,K., Vibe-Pedersen,K. and Baralle,F.E. (1985)EMBO J., 4, 1755-1759.

Kuo,H.-J., Keene,D.R. and Glanville,R.W. (1989) Biochemistry, 28,3757-3762.

Laskowski,M. and Kato,I. (1980) Annu. Rev. Biochem., 49, 593-626.Lipman,D.J. and Pearson,W.R. (1985) Science, 227, 1435-1441.Odermatt,E., Risteli,J., van Delden,V. and Timpl,R. (1983) Biochem. J.,

211, 295-302.Ohe,Y., Hayashi,H. and Iwai,K. (1986) J. Biochem., 100, 359-368.Peterson,T.E., Thogersen,H.C., Skorstengaard,K., Vibe-Pedersen,K.,

Sahl,P., Sottrup-Jensen,L. and Magnusson,S. (1983) Proc. Natl. Acad.Sci. USA, 80, 137-141.

Ponte,P., Gonzalez-DeWhitt,P., Schilling,J., Miller,J., Hsu,D.,Greenberg,B., Davis,K., Wallace,W., Lieberburg,I., Fuller,F. andCordell,B. (1988) Nature, 331, 525-527.

Rigby,P.W.J., Dieckmann,M., Rhodes,C. and Berg,P. (1977) J. Mol. Biol.,113, 237-251.

Ruoslahti,E. (1988) Annu. Rev. Biochem., 57, 375 -413.Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA,

74, 5463-5467.Shaw,G. and Kamen,R. (1986) Cell, 46, 659-667.Spring,J., Beck,K. and Chiquet-Ehrisman,R. (1989) Cell, 59, 325-334.Tanzi,R.E., McClatchey,A.I., Lamperti,E.D., Villa-Komaroff,L.,

Gusella,J.F. and Neve,R.L. (1988) Nature, 331, 528-530.Tellam,R.L., Morton,D.J. and Clarke,F.M. (1989) Trends Biochem. Sci.,

14, 130-133.Timpl,R. and Engel,J. (1987) In Mayne,R. and Burgeson,R.E. (eds),

Structure and Function of Collagen Types. Academic Press, Orlando,FL, pp. 105-143.

Titani,K. and Walsh,K.A. (1988) Trends Biochem. Sci., 13, 94-97.Titani,K., Kumar,S., Takio,K., Ericsson,L.H., Wade,R.D., Ashida,K.,

Walsh,K.A., Chopek,M.W., Sadler,J.E. and Fujikawa,K. (1986)Biochemistry, 25, 3171-3184.

Trueb,B. and Winterhalter,K. (1986) EMBO J., 5, 2815-2819.von der Mark,H., Aumailley,M., Wick,G., Fleischmajer,R. and Timpl,R.

(1984) Eur. J. Biochem., 142, 493-502.von Heijne,G. (1985) J. Mol. Biol., 184, 99-105.von Heijne,G. (1986) Nucleic Acids Res., 14, 4683-4690.Wachter,E. and Hochstrasser,K. (1981) Hoppe-Seyler 's Z. Physiol. Chem.,

362, 1351-1355.Weil,D., Mattei,M.-G., Passage,E., Van Cong,N., Pribula-Conway,D.,

Mann,K., Deutzmann,R., Timpl,R. and Chu,M.-L. (1988) Am. J. Hum.Genet., 42, 435-445.

Wun,T.-Z., Kretzmer,K.K., Girard,I.J., Miletich,J.P. and Broze,G.J. (1988)J. Biol. Chem., 263, 6001-6004.

Received on October 9, 1989; revised on November 17, 1989

393