Proc. Natl. Acad. Sci. USA
Vol. 90, pp. 6751-6755, July 1993
Evolution

Sequence analysis of the β-N-acetylhexosaminidase gene of Vibrio vulnificus: Evidence for a common evolutionary origin of hexosaminidases

CHARLES C. SOMERVILLE* AND RITA R. COLWELL†

University of Maryland, Center of Marine Biotechnology, Maryland Biotechnology Institute, Baltimore, MD 20742

Communicated by Christian B. Anfinsen, March 1, 1993

*Present address: Advanced Sciences Inc., P.O. Box 40070, Building 1117, Tyndall Air Force Base, FL 32403-0070.
†To whom reprint requests should be addressed.
‡The sequence reported in this paper has been deposited in the GenBank database (accession no. L04544).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: MU, 4-methylumbelliferyl; orf, open reading frame; PIR, Protein Identification Resource.

ABSTRACT DNA cloned from the marine bacterium Vibrio vulnificus into Escherichia coli HB101 can hydrolyze chitin oligomer analogs in the recipient. The nucleotide sequence of the cloned DNA was determined and a single long open reading frame of 2541 base pairs (initiation codon through termination codon) was found. The nucleotide sequence predicts a gene product of 847 amino acids and a molecular mass of 94.3 kDa. In vitro transcription and translation analyses indicated a single protein of 94 kDa encoded by the cloned DNA. The gene product hydrolyzes methylumbelliferyl β-D conjugates of chitotriose, chitobiose, N-acetylglucosamine, and N-acetylgalactosamine and has, therefore, been termed a β-N-acetylhexosaminidase. The predicted protein shares a high degree of sequence similarity with the chitobiase of Vibrio harveyi and limited similarity with the α chain of human β-hexosaminidase. Cluster analyses suggest a common evolutionary ancestor for all known hexosaminidase enzymes, with no detectable relationship to known chitinases.

Chitin, a homopolymer of β1-4-linked N-acetyl-D-glucosamine (GlcNAc) residues, is highly abundant in nature. Each year billions of tons of chitin are produced by fungi, insects, and crustaceans in both terrestrial and aquatic habitats (1-3). This abundance, along with the establishment of a variety of commercial uses for chitin and chitin derivatives (4), has stimulated research into the isolation and characterization of chitinolytic enzymes from different sources.

The complete hydrolysis of chitin is thought to proceed via a two-enzyme system (5, 6). Chitinase hydrolyzes polymers of GlcNAc (particularly tetramers and above, with reduced activity against trimers) to chitobiose. Chitobiose is further hydrolyzed to GlcNAc by chitobiase. Studies of prokaryotic chitobiases include those by Soto-Gil and Zyskind, who reported the cloning (7) and sequence analysis (8) of a chitobiase gene from the marine bacterium Vibrio harveyi. Their analyses indicated that the V. harveyi gene product is homologous with human hexosaminidase (8, 9). Wortman et al. (10) described the isolation of chitobiase activity from Vibrio vulnificus. Here we report further analysis of the V. vulnificus gene and characterize the gene product as a β-N-acetylhexosaminidase (β-N-acetyl-D-hexosaminide N-acetylhexosaminohydrolase, EC 3.2.1.52). We demonstrate that known hexosaminidases, including the chitobiase of V. harveyi (8) and the α chain of human β-hexosaminidase (9), form a phylogenetically coherent group.

MATERIALS AND METHODS

The insert region of the recombinant plasmid pATW501 (constructed by the ligation of Sau3AI-cut V. vulnificus genomic DNA to BamHI-cut pBR322, as described in ref. 10) was subcloned in vectors M13mp18 and M13mp19 (11) and transfected into Escherichia coli JM109 (12). Single-stranded DNA was subsequently isolated and sequenced by the dideoxy chain-termination method (13, 14) using the M13 universal primer and custom oligomer primers constructed at the University of Maryland Center of Marine Biotechnology. Sequences were aligned and regions of overlap were identified by using the NUCALN program of Wilbur and Lipman (15).

Plasmids pATW501, pATWδAV, pATWδEK, pATWδH10-B, pATWδH10-C, and pATWδSX (10) were transcribed and translated in vitro with S-30 extracts of E. coli (Amersham). Translation took place in the presence of [35S]methionine, and the resultant proteins were separated by SDS/PAGE along with 14C-labeled protein molecular weight standards. Samples without DNA and vector-only reactions were included as controls.

The gene product of pATW501 was also tested for the ability to hydrolyze a variety of 4-methylumbelliferyl (MU) conjugates using cultures grown in the presence or absence of chitin. Cultures from Luria-Bertani (LB) agar or LB agar overlaid with colloidal chitin (16) were spread on Whatman 3MM filter paper wetted with solutions of the MU conjugates (17). Hydrolysis of these substrates results in the liberation of 4-methylumbelliferone, which fluoresces under ultraviolet illumination.

Where entire gene sequences were aligned for determination of sequence similarity (termed overall sequence similarity), no attempt was made to optimize the alignments manually. Thus, data representing percent overall similarity are only approximations. All shorter alignments were optimized by eye (DNA sequences) or by using the Needleman and Wunsch algorithm (peptide sequences; ref. 18) to yield definitive similarity measurements. The DM5 version 5.0 (Genetics PC-Software Center, Department of Molecular and Cellular Biology, University of Arizona, Tucson) and DNAstar (DNAstar, Madison, WI) sequence-analysis packages were used to search for open reading frames (orfs), translate nucleotide sequences into amino acid sequences, and detect regions of sequence similarity. Dot matrix comparisons were made between the sequences of the chitobiase of V. harveyi, the α chain of human β-hexosaminidase, and the hexosaminidase of V. vulnificus, in order to identify regions of protein sequence similarity.

The SEQBOOT program (PHYLIP version 3.4, J. Felsenstein, University of Washington, Seattle) was used to generate 20 resampled variations of the V. harveyi chitobiase and human hexosaminidase protein sequences. These bootstrap files were then aligned (18) with the native V. vulnificus and V. harveyi protein sequences, and similarity scores were assigned by




using the PAM250 replaceability matrix (19). The significance of the similarity of the native sequences was assessed by calculating a Zboot value essentially as described by Lipman and Pearson (19), where Zboot = (similarity of native sequences - mean similarity of bootstrap to native sequences)/(standard deviation of bootstrap-to-native comparisons).
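The Zboot definition above can be illustrated with a minimal Python sketch. The function name `z_boot`, the example scores, and the use of the sample (n - 1) standard deviation are assumptions made for illustration; the text does not state which standard deviation estimator was used, and the numbers below are not data from this study.

```python
from statistics import mean, stdev


def z_boot(native_score, bootstrap_scores):
    """Standard deviation units by which the native alignment score
    exceeds the mean of the bootstrap-to-native alignment scores."""
    if len(bootstrap_scores) < 2:
        raise ValueError("at least two bootstrap scores are required")
    spread = stdev(bootstrap_scores)  # assumption: sample (n - 1) estimator
    if spread == 0:
        raise ValueError("bootstrap scores have zero spread")
    return (native_score - mean(bootstrap_scores)) / spread


# Hypothetical similarity scores, for illustration only.
example_bootstrap = [12.0, 15.0, 9.0, 14.0, 10.0]
example_native = 60.0
print(round(z_boot(example_native, example_bootstrap), 2))
```

With the 20 resampled sequences described above, `bootstrap_scores` would hold 20 values per comparison.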

Searches of the entire National Biomedical Research Foundation Protein Identification Resource (PIR, release 30) database were performed with the PROSCAN program of DNAstar with input sequences from the V. vulnificus, V. harveyi, and human protein sequences. Three distinct comparisons were made to the database for each of these test sequences. The first comparison used the entire region of sequence similarity identified by the dot matrix analysis (the "S1" region for each protein; see Fig. 5), the second comparison was made with a subset of this region (S2), and the third comparison used a subset of S2, which was termed S3. The three regions of each of the three proteins were compared with the 33,989 entries in the database plus the V. vulnificus sequence. Each of the subsequent alignments was individually optimized using the Needleman and Wunsch (18) subroutine of the DNAstar PROSCAN program, and the sequence files were sorted by optimized alignment scores. Those sequence files that were among the top 10 numerical scores for any of the nine database comparisons were compiled into a rectangular data matrix consisting of 114 sequence files and their alignment scores relative to each of the test sequence subsets. Addition of the alignment scores relative to the V. vulnificus sequence resulted in a data matrix of 115 cases and nine variables. Partitioned, exclusive clusters were identified from the data matrix by using the KMEANS program of the Systat package, version 5.2 (Systat, Evanston, IL).
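The partitioning step can be pictured with a small k-means sketch in Python. This is a generic Lloyd-style k-means and is not claimed to reproduce the Systat KMEANS program; the entry names, the three-column score rows, and the deterministic seeding rule are all hypothetical, chosen only to show how rows of alignment scores fall into exclusive clusters.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k=2, max_iter=100):
    """Assign each row of alignment scores to one of k exclusive clusters."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    # Assumed seeding rule: take rows spread evenly across the ranking by total score.
    order = sorted(range(len(rows)), key=lambda i: sum(rows[i]))
    centers = [list(rows[order[j * (len(rows) - 1) // (k - 1)]]) for j in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical alignment scores against three test regions (not study data).
scores = {
    "hexosaminidase_like_1": [310, 180, 95],
    "hexosaminidase_like_2": [295, 170, 90],
    "hexosaminidase_like_3": [300, 175, 88],
    "unrelated_1": [40, 35, 12],
    "unrelated_2": [38, 30, 15],
    "unrelated_3": [45, 33, 10],
}
labels = kmeans(list(scores.values()), k=2)
for name, label in zip(scores, labels):
    print(name, label)
```

In the analysis described above, each row would carry nine scores (three regions for each of three test proteins) and the matrix would hold 115 cases.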

RESULTS AND DISCUSSION

The extent and direction of individual sequence determinations and the position of the long orf are shown in Fig. 1. The nucleotide sequence of the orf and flanking regions, as well as the predicted protein sequence, is given in Fig. 2. Fig. 3 shows results from the in vitro transcription and translation analyses. A single major protein of ≈94 kDa was produced by the insert DNA of plasmids pATW501 and pATWδSX, both of which direct the hydrolysis of MU conjugates of chitotriose, chitobiose, and GlcNAc. A truncated protein was produced by pATWδAV that had no detectable hydrolytic activity. Truncated proteins were also produced from pATWδEK, pATWδH10-B, and pATWδH10-C (data not shown), none of which hydrolyzed MU-chitotriose, MU-chitobiose, or MU-GlcNAc. From the sequence and the in vitro transcription and translation data, it was concluded that a single protein was responsible for the hydrolytic activity detected by Wortman et al. (10).

Both induced (grown on LB/chitin agar) and uninduced (grown on LB agar) cultures of E. coli JM109(pATW501) hydrolyzed MU conjugates of N-acetyl-β-D-glucosamine, N-acetyl-β-D-galactosamine, and chitotriose. The same strain had no detectable activity against MU conjugates of xylose, glucose (α- or β-), or cellobiose regardless of the culture medium. Neither induced nor uninduced cultures of E. coli JM109 or JM109 transformed with pBR322 hydrolyzed any of the MU substrates tested. V. vulnificus ATCC 27562 hydrolyzed the glucosamine and galactosamine analogs only, and only after growth on LB/chitin agar. These data support the identification of the gene product as an N-acetyl-β-D-hexosaminidase. Furthermore, they indicate that the gene is inducible in the native configuration but is expressed constitutively in the cloned state.
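The sequence comparisons in Fig. 4 rest on windowed dot matrix comparisons in which only identities are scored (see Materials and Methods). A minimal Python sketch of such a comparison follows. It assumes that a dot is recorded at the start of each pair of windows that differ at no more than the allowed number of positions; the exact convention used by the DM5 and DNAstar packages is not stated in the text. The function name `dot_matrix` is hypothetical, and the two fragments are taken from the Fig. 5 alignment as transcribed.

```python
def dot_matrix(seq_a, seq_b, window=10, max_mismatches=4):
    """Return (i, j) window start positions where the two windows
    differ at no more than max_mismatches positions (identities only)."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            mismatches = sum(
                1
                for x, y in zip(seq_a[i:i + window], seq_b[j:j + window])
                if x != y
            )
            if mismatches <= max_mismatches:
                hits.append((i, j))
    return hits


# Fragments of the V. vulnificus (v) and V. harveyi (h) rows of Fig. 5.
v_fragment = "LDQMAAYKMNKFHFHLADDEGWRL"
h_fragment = "LDQMAAYKMNKLHLHLTDDEGWRL"
hits = dot_matrix(v_fragment, h_fragment, window=10, max_mismatches=4)
print(hits)
```

Hits that line up along a diagonal (equal offsets in both sequences) correspond to colinear similarity; raising `window` while holding the mismatch fraction roughly constant gives a more stringent plot.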

Fig. 4 shows dot matrix comparisons of the predicted amino acid sequences for the chitobiase gene of V. harveyi (8), the hexosaminidase gene of V. vulnificus, and the α chain of human β-hexosaminidase (9). Only amino acid identities are scored in these comparisons. Fig. 4a represents a match (window size, 10; mismatches allowed, 4) between the V. harveyi and V. vulnificus enzymes and indicates significant, colinear sequence similarity between the two proteins. A more stringent match (window size, 50; mismatches allowed, 18), shown in Fig. 4b, indicates a conserved region common to the two bacterial enzymes. The dot matrix in Fig. 4c indicates two regions of sequence similarity between the human and bacterial hexosaminidase enzymes (window size, 20; mismatches allowed, 10). The homologous regions identified in Fig. 4c coincide with the region of strong sequence similarity between the bacterial enzymes. Fig. 5 shows an amino acid alignment of all three enzymes which encompasses part of the region of strong sequence similarity between the Vibrio enzymes and the two regions of sequence similarity with the human hexosaminidase. The alignments in Fig. 5 are annotated to indicate the degree of "replaceability" between two amino acid residues at analogous positions in the alignment, based on the PAM250 matrix (19). Positive scores between amino acids in the PAM250 matrix indicate an increased probability that a substitution involving one amino acid for the other would occur in related proteins, whereas negative scores indicate a decreased likelihood that such a substitution would occur.

The alignments of native sequences and bootstrap-resampled data sets were used to calculate Zboot values as indicated in Materials and Methods. For comparison of the chitobiase of V. harveyi to the N-acetyl-β-hexosaminidase of V. vulnificus, Zboot was equal to 48.99, or nearly 49 standard deviation units above the mean for resampled data sets. Comparing the α chain of human β-hexosaminidase with the V. vulnificus enzyme yielded a Zboot value of 10.22. The comparison between the chitobiase and human enzymes gave a Zboot value of 8.43. For randomized data sets, Lipman and Pearson (19) found that z values greater than 6 were probably significant, and those greater than 10 indicated significant sequence similarity. Researchers using the PAM250 replaceability matrix typically use z values greater than 5 to indicate protein sequences of homologous origin (19, 20). The bootstrap technique appears to yield a more conservative estimate of similarity compared to randomized sequences, as Soto-Gil and Zyskind (8) reported a value of 19.1 standard deviation units above the mean randomized alignment score for the chitobiase/human hexosaminidase comparison.

FIG. 1. Restriction map of pATW501 (10). Thin open boxes represent vector DNA (pBR322). The thick open box indicates the extent of the insert DNA derived from V. vulnificus ATCC 27562, and the black portion of the insert indicates the location of the long orf. Arrows under the restriction map indicate the extent and direction of individual sequence determinations. Positions of restriction sites were determined from the nucleotide sequence and supersede the placement of restriction sites determined by Wortman et al. (10). A, Ava I; C, Cla I; D, Dra I; E, EcoRV; H, HindIII; H', Hpa I; K, Kpn I; S, Sph I; S', Ssp I; S", Spe I; X, Xba I.

[Fig. 2 sequence listing (nucleotide sequence with predicted amino acid sequence) not reproduced; see GenBank accession no. L04544.]

FIG. 2. Sequence of the N-acetyl-β-D-hexosaminidase gene of V. vulnificus and the 3' and 5' flanking regions. Putative control regions are underlined and include the RNA polymerase binding sites (at -35 and -10), the initiation of transcription (0), and the ribosome binding site [Shine-Dalgarno (SD) sequence]. The predicted amino acid sequence is shown in three-letter code above the DNA sequence. Amino acid residues are numbered relative to the N-terminal methionine, and nucleotide residues are numbered relative to the transcription initiation site. The position of a putative stem-loop structure downstream of the translational stop codon is indicated by underlining. The stem portions of the structure are also overlined, and the position of the loop is noted. These data, along with further 3' and 5' flanking sequences, have been entered in GenBank under accession no. L04544.

From the amino acid sequences, it is clear that the two Vibrio enzymes share a recent common ancestor, and their similarity is also apparent at the level of the nucleic acids. The overall DNA sequence similarity between the two bacterial genes was ≈57.7%, increasing to 67.8% in sequences coding the S1 regions, 78.8% in the S2 coding regions, and 87.7% in


the S3 comparisons. In addition, the two bacterial genes share a 33-bp stretch of sequence identity which codes for the amino acids LDQMAAYKMNK (part of the S3 region shown in Fig. 5).

FIG. 3. Autoradiograph from an SDS/polyacrylamide gel of the in vitro transcription and translation products produced from pATW501 and related plasmids. Lanes: 1, [14C]methylated protein molecular weight standards; 2, pBR322; 3, pATW501; 4, pATWδAV; 5, pATWδSX; 6, no-DNA control. Details of the construction of the plasmids and the position of deletions relative to plasmid pATW501 are given in ref. 10.

The similarities between the bacterial genes and the sequences encoding the α chain of human β-hexosaminidase are less striking but follow a similar trend. The overall similarity between the V. vulnificus and human genes was ≈49.5%. For those sequences encoding the S1, S2, and S3 regions of the enzymes (Fig. 5), nucleotide sequence similarities were 52.3%, 60.4%, and 74.1%, respectively. While the two Vibrio genes share a similar codon bias for 14 of 18 amino acids (methionine and tryptophan excluded), the human gene and that of V. vulnificus have a similar codon bias for only 8 of 18 amino acids. Isofunctional enzymes of disparate evolutionary origin may share limited protein sequence similarity at their active centers through convergent evolution, but one would not expect to find convergence at the level of the gene sequence. This is particularly true when the genes can be demonstrated to differ in codon usage bias. In this case, the similarity at the level of the amino acid sequence correlates well with that found at the level of the DNA, arguing against a convergent evolution of the amino acid sequences. Both protein and nucleic acid sequence data support the theory that all three enzymes share a common evolutionary ancestor. All three proteins catalyze the cleavage of β linkages between hexosamine residues. The strong

VULNIF PRO250 500

0

'U

IC

.750

[Fig. 5 alignment panel. Residue blocks are shown as extracted; the identity and similarity symbols between rows and the dashed and solid lines marking the S2 and S3 regions could not be recovered.]

    v  L 'VFRF LDQMA AYKMNKFHFH LADD EGWRLI
    m  SIILDT LDVMA YNKLNVFHWH LVDD PSFPY.
    h  ALILAT LDQMA AYKMNKLHLH LTDD EGWRL

    v  EINGLPEL:TQV GAHRCHDVEQ NKCMMPQLGS
    m  ESFTFPEL'MRK GSYNPVTHIY TAQDVKEVIE
    h  EIPGLPEL'TEV GANRCFDTQE KSCLLPQLGS

    v  GAELPNNGSG YYTREDYKEI L
    m  YARLRGIRVL AEFDTPGHTL S
    h  GPTTDNFGSG YFSKADYVEI L

FIG. 5. Partial protein sequence alignments of the β-N-acetylhexosaminidase of V. vulnificus (v; this study), the α chain of human β-hexosaminidase (m; ref. 9), and the chitobiase of V. harveyi (h; ref. 8). Amino acid identities are indicated by asterisks between sequences. Amino acid residue comparisons that have a positive score in the PAM250 amino acid replaceability matrix (19) are designated by double dots, and those comparisons which have a PAM score of zero are designated by single dots. No indications are made between residue pairs with a negative PAM score. The S1 region for each protein comprises the entire amino acid sequence shown here. The S2 and S3 regions are indicated by the dashed and the solid lines, respectively. Whereas the S1 region was identified by dot matrix comparisons of the three amino acid sequences, the S2 and S3 regions were chosen by visual inspection of the S1 alignments.
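The symbol convention in the Fig. 5 legend can be sketched in a few lines of Python. This is an illustration only, not the software used in the study: `match_line`, `toy_score`, and `TOY_SCORES` are hypothetical names, and the toy scores are placeholders rather than real PAM250 values. The two 19-residue strings are the ungapped stretch read from the v and h rows of the extracted figure.

```python
from typing import Callable


def match_line(a: str, b: str, score: Callable[[str, str], int]) -> str:
    """Build the symbol row drawn between two aligned sequences.

    '*' marks identical residues, ':' a positive substitution score,
    '.' a score of zero, and ' ' a negative score.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for x, y in zip(a, b):
        if x == y:
            symbols.append("*")
        else:
            s = score(x, y)
            symbols.append(":" if s > 0 else "." if s == 0 else " ")
    return "".join(symbols)


# Hypothetical placeholder scores, NOT real PAM250 values.
TOY_SCORES = {frozenset(("K", "R")): 3, frozenset(("F", "W")): 0}


def toy_score(x: str, y: str) -> int:
    """Look up an unordered residue pair; unknown pairs score negative."""
    return TOY_SCORES.get(frozenset((x, y)), -1)


# Ungapped stretch read from the v and h rows of the extracted Fig. 5.
v_fragment = "LDQMAAYKMNKFHFHLADD"
h_fragment = "LDQMAAYKMNKLHLHLTDD"
symbols = match_line(v_fragment, h_fragment, toy_score)
identities = symbols.count("*")
```

To reproduce the published symbols, a real PAM250 lookup would have to be passed in place of `toy_score`.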

conservation of amino acid sequences at and around the site that is common to both bacterial enzymes and that is largely conserved in the human protein suggests that this region may be important in the formation of the active site.

The database comparisons and cluster analyses described in Materials and Methods were made to test the hypothesis that the amino acid sequences shown in Fig. 5, particularly those in the region designated S3, are conserved among hexosaminidase enzymes. The results of the cluster analyses are summarized in Table 1. These data indicate that the S1 and S2 alignment scores are not sufficient to form two clusters that are significant relative to all three variables. Alignments with the S3 regions do, however, produce two statistically significant clusters. In this case, the smaller of the two clusters contains the two Vibrio sequences, all of the full-length eukaryotic hexosaminidase sequences in the database, all of the partial hexosaminidase sequences that overlap this region, and no other sequences. No other PIR

FIG. 4. Dot matrix comparisons between the amino acid sequences of V. vulnificus hexosaminidase (VULNIF PRO), V. harveyi chitobiase (HARVEYI PRO), and the α chain of human β-hexosaminidase (HUMAN PRO). Parameters (window size/mismatches allowed): in a, 10/4; in b, 50/18; in c, 20/10.
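The window size/mismatches allowed parameters in the Fig. 4 legend describe a standard windowed dot matrix. A minimal sketch of that idea follows, assuming a dot is placed wherever two equal-length windows differ at no more than the allowed number of positions. The function name `dot_matrix` is hypothetical, and the program actually used for the figure may differ in detail.

```python
def dot_matrix(seq_a, seq_b, window, max_mismatches):
    """Return (i, j) window start positions that qualify as dots.

    A dot is recorded when seq_a[i:i+window] and seq_b[j:j+window]
    differ at no more than max_mismatches positions.
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            mismatches = 0
            for k in range(window):
                if seq_a[i + k] != seq_b[j + k]:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # this window pair already fails
            if mismatches <= max_mismatches:
                dots.append((i, j))
    return dots


# Short fragments read from the v and h rows of Fig. 5, for illustration.
a = "LDQMAAYKMNKFHFH"
b = "LDQMAAYKMNKLHLH"
strict = dot_matrix(a, b, window=10, max_mismatches=0)
relaxed = dot_matrix(a, b, window=10, max_mismatches=4)
```

With the 10/4 setting of panel a, every window along the main diagonal of this pair qualifies, whereas requiring exact matches keeps only the first two. Larger windows with proportionally more mismatches allowed, as in panels b and c, trade sensitivity to short motifs for lower background.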


Table 1. Summary of cluster analyses based on optimized alignment scores

                                                                                             Summary statistics
Scores clustered*        Clusters allowed   Clusters†                                        Variable‡   F ratio    Probability
115 cases by S1 scores   2                  (38,119) + (all others)                          vulS1       666.176    0.999 × 10^-15
                                                                                             harS1       583.2%     0.999 × 10^-15
                                                                                             humS1       0.106      0.745
115 cases by S2 scores   2                  (38,119) + (all others)                          vulS2       169.989    0.999 × 10^-15
                                                                                             harS2       149.257    0.999 × 10^-15
                                                                                             humS2       1.295      0.258
115 cases by S3 scores   2                  (4, 7, 24, 25, 26, 27, 36, 38, 89, 115)§         vulS3       21.601     0.915 × 10^-5
                                            + (all others)¶                                  harS3       44.123     0.113 × 10^-8
                                                                                             humS3       155.330    0.999 × 10^-15

*The 115 cases were selected from the PIR database as described in the text. S1, S2, and S3 regions refer to the amino acid sequences indicated in Fig. 5.
†Partitioned exclusive clusters identified by using the KMEANS cluster analysis program of Systat version 5.2.
‡The abbreviations vul, har, and hum correspond to the V. vulnificus, V. harveyi, and human enzymes, respectively.
§Case numbers refer to the following: 4, PIR entry A22081, β-N-acetylhexosaminidase β-chain fragment, human (21); 7, PIR entry A23842, β-N-acetylhexosaminidase α-chain precursor fragment, human (22); 24, PIR entry A30153, β-N-acetylhexosaminidase β-chain precursor, human (23); 25, PIR entry A30766, β-N-acetylhexosaminidase A precursor, slime mold (24); 26, PIR entry A31250, β-N-acetylhexosaminidase β chain, human (25); 27, PIR entry A31778, β-N-acetylhexosaminidase A precursor, slime mold (24); 36, PIR entry A36511, β-N-acetylglucosaminidase precursor, V. harveyi (8); 38, PIR entry AOHUBA, β-N-acetylhexosaminidase α-chain precursor, human (9); 89, PIR entry S01328, β-N-acetylhexosaminidase β chain, mouse (26); 115, β-N-acetylhexosaminidase, V. vulnificus (this study).
¶This cluster contains no hexosaminidase sequences.

entries which contain hexosaminidase sequence data were found among the 115 cases included in the cluster analyses, due to alignment scores which were too low to be among the top 10 scores for any of the nine database scans. All hexosaminidase entries that are not among the 115 cases contain fragmentary sequences of human proteins which are represented by full-length sequences in the hexosaminidase cluster. It is important to note that the database searched included cellulases, lysozymes, chitinases, and amylases among several other eukaryotic and prokaryotic protein sequences, and yet hexosaminidase enzymes consistently clustered together to the exclusion of all other types. Furthermore, a region of only 19 amino acid residues is required as a basis for these clusters. While enzymes with hexosaminidase activity may yet be found which are not related to those discussed here, it seems clear that those described to date form a phylogenetically coherent group.
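The two-cluster partition and per-variable F ratios summarized in Table 1 can be illustrated with a small sketch. It assumes a plain Lloyd-style k-means with k = 2 and a one-way analysis-of-variance F ratio; the Systat KMEANS program may differ in initialization and other details. The function names and the score triples below are hypothetical and are not data from the study.

```python
def kmeans_two(points, iterations=100):
    """Partition score tuples into two clusters; returns a 0/1 label per point."""
    # Deterministic start: the points with the smallest and largest score sums.
    centroids = [min(points, key=sum), max(points, key=sum)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min((0, 1), key=lambda c: sum((p - q) ** 2
                                          for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in (0, 1):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


def f_ratio(values, labels):
    """Between-cluster over within-cluster mean square for one variable.

    Raises ZeroDivisionError if there is no within-cluster variation.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                  for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2
                 for g in groups.values() for v in g)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical (vul, har, hum) alignment scores for five database entries.
scores = [(10, 12, 11), (11, 13, 9), (9, 11, 10), (50, 55, 48), (52, 53, 50)]
labels = kmeans_two(scores)
f_vul = f_ratio([s[0] for s in scores], labels)
```

A large F ratio for a variable means that the scores against that query sequence separate the two clusters well, which is how the S3 rows of Table 1 differ from the S1 and S2 rows for the human variable.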

In conclusion, this study supports the observation of Soto-Gil and Zyskind (8) regarding the relatedness of the chitobiase of V. harveyi and the α chain of human β-hexosaminidase. Furthermore, the hexosaminidase of V. vulnificus shares a high degree of similarity with the chitobiase, and these similarities have allowed us to identify a region of 19 amino acid residues which is largely conserved among known hexosaminidases. The data suggest that those hexosaminidase enzymes that have been characterized by gene sequence analysis form a phylogenetically coherent group. Finally, the lack of similarity with known chitinases suggests that the classic two-enzyme system of chitin hydrolysis has evolved from at least two nonhomologous hydrolase ancestors.

We thank Dr. Tamar Barkay (Environmental Protection Agency, Gulf Breeze, FL) for access to the Systat software package. We thank Dr. Wade Jeffrey (University of West Florida Center for Environmental Diagnostics and Bioremediation) for use of the DNAstar system. This work has been supported in part by National Science Foundation Grant BSR-9020268 and Environmental Protection Agency Cooperative Agreement CR817791-01.

1. Alexander, M. (1977) in Introduction to Soil Microbiology, ed. Alexander, M. (Wiley, New York), 2nd Ed., pp. 188-202.
2. Campbell, L. L. & Williams, O. B. (1951) J. Gen. Microbiol. 5, 894-905.
3. Zobell, C. E. & Rittenburg, S. C. (1938) J. Bacteriol. 35, 275-287.

4. Muzzarelli, R. A. A. (1977) Chitin (Pergamon, Oxford).
5. Jeuniaux, C. (1966) Methods Enzymol. 8, 644-650.
6. Monreal, J. & Reese, E. T. (1969) Can. J. Microbiol. 15, 689-696.
7. Soto-Gil, R. & Zyskind, J. W. (1984) in Chitin, Chitosan and Related Enzymes, ed. Zikakis, J. P. (Academic, New York), pp. 169-177.
8. Soto-Gil, R. & Zyskind, J. W. (1989) J. Biol. Chem. 264, 14778-14783.
9. Myerowitz, R. R., Piekarz, P., Neufeld, E. F., Shows, T. B. & Suzuki, K. (1985) Proc. Natl. Acad. Sci. USA 82, 7830-7834.
10. Wortman, A. T., Somerville, C. C. & Colwell, R. R. (1986) Appl. Environ. Microbiol. 52, 142-145.

11. Norrander, J., Kempe, T. & Messing, J. (1983) Gene 26, 101.
12. Yanisch-Perron, C., Vieira, J. & Messing, J. (1985) Gene 33, 103-119.
13. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
14. Tabor, S. & Richardson, C. C. (1987) Proc. Natl. Acad. Sci. USA 84, 4767-4771.
15. Wilbur, W. J. & Lipman, D. J. (1983) Proc. Natl. Acad. Sci. USA 80, 726-730.
16. Lingappa, Y. & Lockwood, J. L. (1962) Phytopathology 52, 317-323.
17. O'Brien, M. & Colwell, R. R. (1987) Appl. Environ. Microbiol. 53, 1718-1720.
18. Needleman, S. B. & Wunsch, C. D. (1970) J. Mol. Biol. 48, 443-453.
19. Lipman, D. J. & Pearson, W. R. (1985) Science 227, 1435-1441.
20. Doolittle, R. F. (1986) Of Urfs and Orfs: A Primer on How To Analyze Derived Amino Acid Sequences (University Science Books, Mill Valley, CA).

21. O'Dowd, B. F., Quan, F., Willard, H. F., Lamhonwah, A. M., Korneluk, R. G., Lowden, J. A., Gravel, R. A. & Mahuran, D. J. (1985) Proc. Natl. Acad. Sci. USA 82, 1184-1188.
22. Korneluk, R. G., Mahuran, D. J., Neote, K., Klavins, M. H., O'Dowd, B. F., Tropak, M., Willard, H. F., Anderson, M. J., Lowden, J. A. & Gravel, R. A. (1986) J. Biol. Chem. 261, 8407-8413.
23. Proia, R. L. (1988) Proc. Natl. Acad. Sci. USA 85, 1883-1887.
24. Graham, T. R., Zassenhaus, H. P. & Kaplan, A. (1988) J. Biol. Chem. 263, 16823-16829.
25. Neote, K., Bapat, B., Dumbrille-Ross, A., Troxel, C., Schuster, S. M., Mahuran, D. J. & Gravel, R. A. (1988) Genomics 3, 279-286.
26. Bapat, B., Ethier, M., Neote, K., Mahuran, D. & Gravel, R. A. (1988) FEBS Lett. 237, 191-195.
