j. biol. chem.-1984-uhlén-1695-702


  • 7/24/2019 J. Biol. Chem.-1984-Uhln-1695-702

    1/8

    THE JOURNAL OF BIOLOGICAL CHEMISTRY
    © 1984 by The American Society of Biological Chemists, Inc.
    Vol. 259, No. 3, Issue of February, pp. 1695-1702, 1984
    Printed in U.S.A.

    Complete Sequence of the Staphylococcal Gene Encoding Protein A
    A GENE EVOLVED THROUGH MULTIPLE DUPLICATIONS*

    (Received for publication, August 4, 1983)

    Mathias Uhlén, Bengt Guss, Björn Nilsson, Sten Gatenbeck, Lennart Philipson, and Martin Lindberg**

    From the Department of Biochemistry, Royal Institute of Technology, S-100 44 Stockholm, Sweden and the Department of Microbiology, University of Uppsala, The Biomedical Center, Box 581, S-751 23 Uppsala, Sweden

    The gene coding for protein A from Staphylococcus aureus has been isolated by molecular cloning, and a subclone containing a 1.8-kilobase insert was found to give a functional protein A in Escherichia coli. The complete nucleotide sequence of the insert, including the structural gene and the 5' and 3' flanking sequences, has been determined. Starting from a TTG initiator codon, an open reading frame comprising 1527 nucleotides gives a preprotein of 509 amino acids and a predicted M_r = 58,703. The structural gene is flanked on both sides by palindromic structures followed by a stretch of T residues, suggesting transcriptional termination signals. Thus, it appears that protein A is translated from a monocistronic mRNA.

    The sequence reveals extensive internal homologies involving a 58-amino acid unit, responsible for IgG binding, repeated 5 times and an 8-amino acid unit, possibly responsible for binding to the cell wall of S. aureus, repeated 12 times. Comparisons between the repeated regions show a marked preference for silent mutations, indicating an evolutionary pressure to keep the amino acid sequence preserved. The structure of the gene also suggests how the gene has evolved.

    Evolution by gene duplication is a well known phenomenon among eukaryotic genes. The globin clusters, the immunoglobulins, and the interferon genes probably all have ancestral genes which have been duplicated and then diverged into functionally distinct genes (1). Examples of internally repetitive sequences have also been reported; rabbit skeletal tropomyosin contains a 7-residue amino acid periodicity throughout the molecule (2), and similar repeats have been reported for chicken fibronectin (3) and mammalian serum albumin (4). Among prokaryotes, most reports of duplicated genes have involved in vitro constructions (5), which seem to be stable in Escherichia coli, but dramatically unstable in Bacillus subtilis (6). However, the amino acid sequences of a few cell wall-bound proteins from Gram-positive bacteria have revealed remarkable periodicity, i.e. staphylococcal protein A (7, 8) and streptococcal M protein (9).

    We have earlier reported on the molecular cloning of the gene for staphylococcal protein A in E. coli (10). This protein interacts with the Fc (constant part of immunoglobulins) domain of several immunoglobulins from many species including man and has therefore been used extensively for quantitative and qualitative immunological techniques (11). Amino acid sequence analysis of protein A revealed two functionally distinct regions of the molecule (7, 8). Both regions have remarkably repetitive structures.

    The NH2-terminal part contains four or five homologous IgG-binding units consisting of approximately 58 amino acids each. The COOH-terminal part, which is thought to bind to the cell wall of Staphylococcus aureus, consists of several repeats of an octapeptide (Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys) (8).

    In a previous report (10), we determined the nucleotide sequence of the promoter region, as well as the region coding for the NH2-terminal part of the protein. Here we report the complete nucleotide sequence of the protein A gene including the 5' and 3' flanking regions from the S. aureus strain 8325-4. The structural gene is 1,527 nucleotides long, giving a preprotein consisting of 509 amino acids and an M_r = 58,703. The repetitive structure of the gene has been clarified, which suggests how the gene has evolved.

    * The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
    Supported by grants from the Swedish National Board for Technical Development.
    Present address, European Molecular Biology Laboratory, Heidelberg, Federal Republic of Germany.
    ** Supported by grants from the Swedish Medical Research Council and Pharmacia Fine Chemicals, Uppsala.

    EXPERIMENTAL PROCEDURES

    Bacterial Strains and Plasmids-E. coli strains HB101 (12) and GM161 (13) were used as bacterial hosts. The plasmid vectors were pBR322 (14), pTR262 (15), and pEMBL9 (16).

    DNA Preparations-Plasmid DNA was prepared by the alkaline extraction method (17). Transformation of E. coli was made as described by Morrison (18). Restriction endonucleases, T4 DNA ligase (New England Biolabs), alkaline phosphatase, and T4 polynucleotide kinase (Boehringer-Mannheim) were used according to the suppliers' recommendations.

    Isolation of the 2.15-kilobase DNA fragment containing the entire protein A gene was made by digesting the plasmid pSPA3 (10) with EcoRV. The digested material was electrophoresed on a 5% polyacrylamide gel, and the 2.15-kilobase fragment was eluted electrophoretically. The isolated fragment was passed over an anion exchange column, eluted, and precipitated with ethanol. The precipitated material was washed in 80% ethanol, dried, resuspended in water, and used for DNA sequence analyses.

    DNA Sequencing Determinations-DNA fragments were sequenced by the method of Maxam and Gilbert (19) or Sanger et al. (20). The samples were analyzed on 6, 8, and 20% denaturing polyacrylamide gels using the thermostatic LKB Macrophor system.

    Computer Analysis-All the sequencing analyses were performed on a Hewlett-Packard desktop computer (HP-85) equipped with a HP7225A plotter. The software was constructed by M. Uhlén.

    RESULTS AND DISCUSSION

    DNA Sequence-We have earlier reported that the protein A gene from S. aureus strain 8325-4 is located on a 1.8-kilobase insert of staphylococcal DNA cloned in the plasmid pBR322 (21). The plasmid was designated pSPA8 and is shown schematically in Fig. 1. Expression of the gene was demonstrated in E. coli. The sequence of the promoter region and the 5' end of the structural gene has been reported (10) as well as the sequence of the repetitive region X which probably is responsible for the cell wall binding of the protein in S. aureus.

    Using the strategy outlined in Fig. 2C, the entire insert was sequenced according to the method of Maxam and Gilbert (19). It was not possible to obtain sequence on both strands in all parts of the gene, and therefore additional sequencing using the enzymatic method (20, 16) was performed in order to confirm the sequence in these parts. As no palindromic sequence indicating transcription termination was found in the 3' end of the gene, the sequence a few hundred nucleotides downstream from the EcoRV site on the original plasmid pSPA1 (10) was determined using both methods (19, 20). The complete nucleotide sequence of the protein A gene is shown in Fig. 3. Note that the previously published sequence of Löfdahl et al. (10) lacks one of the three thymidines at position 183-185.

    FIG. 1. Structure of plasmid pSPA8 with relevant restriction sites. The protein A gene is contained in a 1.8-kilobase TaqI-EcoRV insert in the plasmid pBR322. Boxes show the positions of the replication origin (ORI) and the genes coding for protein A (PROT A) and β-lactamase (AMP).

    FIG. 2. Restriction map and sequencing strategy of the insert. A, schematic drawing of the gene coding for protein A with its different regions. S is a signal sequence, A-D are IgG-binding regions, E is a region homologous to A-D, and X is the COOH-terminal part of protein A which lacks IgG-binding activity. B, partial restriction map of the corresponding DNA sequence. C, sequencing strategy of the 1.8-kilobase insert.

    ¹ Guss, B., Uhlén, M., Nilsson, B., Lindberg, M., Sjöquist, J., and Sjödahl, J. (1984) Eur. J. Biochem., in press.

    Starting from a TTG codon at nucleotide 184, there is an open reading frame of 1,527 nucleotides terminating in a TAG stop codon at nucleotide 1,711. The preprotein, including the putative signal peptide, consists of 509 amino acids giving an M_r = 58,703. Although we have not shown that the codon at nucleotide 184 is the translational start, there are several reasons to postulate this. First, TTG is a common start codon in Gram-positive bacteria (21), unlike E. coli in which it is very rare (22). Second, this start codon gives a putative signal peptide with a reasonable size (36 amino acids) and structure (a few basic residues followed by a stretch of 23 hydrophobic residues). Third, this codon is preceded by a possible Shine-Dalgarno sequence (23) that has many features in common with other Gram-positive ribosomal binding sequences (24). Eight out of 11 nucleotides are complementary to the 3' end of B. subtilis 16 S rRNA, similar to other Gram-positive genes (25). In addition, the space between the last G in this sequence and the start codon is seven nucleotides, also similar to other Gram-positive genes (24, 25).
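    The reading-frame scan described above can be illustrated with a minimal Python sketch. It is not the authors' HP-85 software: the function name `open_reading_frame` and the toy sequence are assumptions made for illustration only. Positions are 1-based and the stop is reported as the first base of the stop codon, the same convention under which a start at nucleotide 184 and a stop at nucleotide 1,711 span 1,527 nucleotides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frame(seq, start):
    """Scan codon by codon from the 1-based position `start`.

    Returns (stop_position, n_codons), where stop_position is the 1-based
    position of the first base of the first in-frame stop codon and
    n_codons is the number of sense codons read before it.
    Returns None when no in-frame stop codon is found.
    """
    seq = seq.upper()
    i = start - 1          # convert to a 0-based index
    n_codons = 0
    while i + 3 <= len(seq):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 1, n_codons
        n_codons += 1
        i += 3
    return None


# Hypothetical toy sequence: 2 leader bases, TTG start, 4 lysine codons, TAG stop.
toy = "CC" + "TTG" + "AAA" * 4 + "TAG" + "GG"
print(open_reading_frame(toy, 3))  # (18, 5)
```

    The start codon is passed in explicitly rather than searched for, because the text argues for TTG at nucleotide 184 on biological grounds instead of assuming an ATG.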

    Two upstream overlapping promoter sequences similar to the consensus sequences (TTGACA and TATAAT) of prokaryotes (26) have been indicated in Fig. 3, although the first -35 sequence shows relatively poor complementarity (only three out of six) with TTGACA. The gene is both preceded and followed by palindromic sequences indicating transcription terminations. These are indicated in Fig. 3, and the possible mRNA hairpin structures that can be formed are schematically drawn in Fig. 4. Both palindromes are followed by a T-rich stretch of residues (TTTATTTT). Although we do not have any experimental data to show where the transcription of the protein A mRNA starts or terminates, it thus appears likely that protein A is translated from a monocistronic mRNA.
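    The two sequence checks used in this paragraph, counting matches against a promoter consensus and recognising a palindrome followed by a T-rich stretch, can be sketched as follows. This is an illustrative simplification, not the authors' procedure: the function names, the example hexamer `TTGTTT`, the stem arms and the `min_t` threshold are all hypothetical, and the stem test requires perfect base pairing, which real terminator hairpins need not show. Only the consensus TTGACA and the tail TTTATTTT are taken from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def consensus_matches(site, consensus):
    """Count positions at which `site` agrees with `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a == b for a, b in zip(site.upper(), consensus.upper()))


def looks_like_terminator(left_arm, right_arm, tail, min_t=5):
    """True when the two arms can pair into a perfect stem and the
    downstream tail contains at least `min_t` T residues."""
    stem_pairs = reverse_complement(left_arm) == right_arm.upper()
    return stem_pairs and tail.upper().count("T") >= min_t


# Hypothetical -35 candidate scored against the consensus quoted in the text.
print(consensus_matches("TTGTTT", "TTGACA"))                        # 3
# Hypothetical stem arms; the tail is the T-rich stretch quoted in the text.
print(looks_like_terminator("GCCTAAG", "CTTAGGC", "TTTATTTT"))      # True
```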

    Amino Acid Sequence-The amino acid sequence deduced from the DNA sequence as well as amino acids that differ in the partial protein sequence established by Sjödahl (27) are also indicated in Fig. 3. Among the IgG-binding regions D, A, B, and C, a high degree of homology exists and only 4 out of the 235 amino acids comprising all four regions vary. All these changes can be explained by single point mutations. Since the DNA sequence was obtained from strain 8325-4 and the protein sequence from strain Cowan I, the divergence is probably due to strain variation. The partial amino acid sequence of region X also shows high similarity to the deduced sequence although about 10% of the amino acids are different.¹ The amino acid numbering starts with the alanine at nucleotide 292 which has been shown to be the first amino acid of the mature protein A.² The stop codon at nucleotide 1,711 thus gives a mature protein A of 473 amino acids and a resulting M_r = 52,752.
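    The lengths quoted for the preprotein, the mature protein, and the signal peptide follow directly from the nucleotide coordinates given in the text. A short Python check, assuming (as the text implies) that a position marks the first base of its codon; the helper name `codons_between` is hypothetical:

```python
def codons_between(first_base, stop_base):
    """Number of sense codons from `first_base` up to, but not including,
    the stop codon whose first base is `stop_base` (1-based positions)."""
    span = stop_base - first_base
    if span <= 0 or span % 3:
        raise ValueError("positions are not in the same reading frame")
    return span // 3


preprotein = codons_between(184, 1711)   # TTG start to TAG stop
mature = codons_between(292, 1711)       # first alanine to TAG stop
signal_peptide = preprotein - mature

print(preprotein, mature, signal_peptide)  # 509 473 36
```

    The three values agree with the 509-residue preprotein, the 473-residue mature protein, and the 36-residue putative signal peptide stated in the text.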

    ² U. Hellman, unpublished results.

    [FIG. 3. Complete nucleotide sequence of the protein A gene with the deduced amino acid sequence; the sequence listing is not legible in this transcript.]

    FIG. 4. Hypothetical secondary structures at the 5' and 3' regions flanking the protein A coding sequence. The numbers refer to nucleotides in Fig. 3.

    Amino Acid Composition-Attempts to determine the protein sequence of protein A have involved digestion of staphylococcal cell walls with lysostaphin (28) or analyzing protein A from mutant bacteria which secrete the product (8). In order to compare the sequences deduced from the DNA sequence with those obtained experimentally, the amino acid compositions of different parts of the protein, as deduced from the DNA sequence, are tabulated in Table I. The amino acid compositions of purified protein A from different strains of S. aureus are also presented in Table I. A direct comparison of structures from deduced and purified proteins is difficult, due to strain differences and proteolytic digestion during isolation of the protein. According to Sjödahl (27) and Lindmark et al. (8), there are only a few amino acids NH2-terminal of region D in protein A isolated from cell walls of Cowan I. However, the exact NH2-terminal sequence could not be obtained due to a blocked terminus (27). Table I shows that the size of the deduced protein from 8325-4 is larger than two independent determinations of the protein from Cowan I even if region E is omitted (A-E). At present, it is unclear if this difference in size and amino acid composition is due to proteolysis both in the NH2-terminal and COOH-terminal parts of the protein or if it reflects genomic differences. The protein A gene of Cowan I has recently been cloned in our laboratory, which will help to clarify this point.

    In contrast, it appears likely that the secreted form of protein A from strain A676 does contain region E. The NH2-terminal sequence of this protein (8) fits well with the NH2-terminus of protein A from strain 8325-4 when determined both by Edman degradation of the purified protein² and by DNA sequence starting at nucleotide 292 in Fig. 3. The size of protein A from A676 would then indicate that the protein is truncated at the COOH-terminal, lacking approximately 80 amino acids. The amino acid composition, as deduced from the DNA sequence, of a mature protein A lacking 107 amino acids in the COOH-terminal part shows good agreement with the composition of purified protein A from strain A676 as shown in Table I. However, the DNA sequence does not contain the COOH-terminal -Val-Ala-Lys which has been reported for A676 (8).
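    Deducing the composition of a chosen part of the protein, as done for the columns of Table I, amounts to counting residues over a 1-based inclusive range of the deduced sequence. The sketch below is a minimal illustration, not the authors' software. The function name `composition` and the 20-residue toy peptide are hypothetical; only the octapeptide Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys (EDGNKPGK in one-letter code) is taken from the text.

```python
from collections import Counter


def composition(protein, first=1, last=None):
    """Count residues in the 1-based inclusive segment first..last
    of a one-letter-code protein sequence."""
    last = len(protein) if last is None else last
    if not 1 <= first <= last <= len(protein):
        raise ValueError("segment lies outside the sequence")
    return Counter(protein[first - 1:last])


# Hypothetical toy peptide: four leader residues followed by two copies
# of the octapeptide repeat quoted in the text.
toy = "MKKA" + "EDGNKPGK" * 2

segment = composition(toy, 5, 20)      # only the two octapeptide repeats
print(sorted(segment.items()))
# [('D', 2), ('E', 2), ('G', 4), ('K', 4), ('N', 2), ('P', 2)]
```

    With the real deduced sequence, a range such as 57-473 or 1-366 would select the segment that Table I labels A-E or A-X, respectively.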

    TABLE I
    Amino acid composition of deduced protein A gene or purified protein from different strains of S. aureus

                       Deduced protein A from 8325-4         Purified protein A
    Amino acids        Prot-Aᵃ  Mat-Aᵇ  A-Eᶜ   A-Xᵈ     Cowan Iᵉ  Cowan Iᶠ  A676ᵍ
    Lysine               69       65      62     45        52        53       48
    Histidine             7        7       6      3         4         4        3
    Arginine              6        5       4      5         5         4        4
    Aspartic acid       105      103       …      …        82        83       82
    Threonine            10        7       7      2         5         6        4
    Serine               25        …      18      …         …        16       16
    Glutamic acid        78       78      67     68         …         …       64
    Proline              31       30      27     24         …         …        …
    Glycine              33       28       …      …        30         …        …
    Alanine              42       38      31     31         …         …        …
    Valine               15       12      10      4         5         8        7
    Methionine            6        6       5      3         2         3        3
    Isoleucine           18        …       …      …         …         …        …
    Leucine              41        …       …      …         …         …        …
    Tyrosine              9        8       7      5         5         4        4
    Phenylalanine        14        …       …      …         …         …        …
    Total               509      473     417    366       381         …        …

    ᵃ Protein A including the signal peptide.
    ᵇ Mature protein A, amino acids 1-473 in Fig. 3.
    ᶜ Mature protein A except region E, amino acids 57-473.
    ᵈ Mature protein A except COOH-terminal part, amino acids 1-366.
    ᵉ From Movitz (28), isolated by lysostaphin treatment of bacteria.
    ᶠ From Lindmark et al. (8), isolated by lysostaphin treatment of bacteria.
    ᵍ From Lindmark et al. (8), extracellular protein A produced by a methicillin-resistant strain.

    Codon Usage-The codon usage for the preprotein of protein A is compared in Table II with other Gram-positive genes. Chromosomal genes are represented by four Bacillus genes and plasmid-coded genes by the four putative proteins encoded by the staphylococcal plasmid vector pC194 (26). Also indicated by + or - are the codon pairs which, according to Grosjean and Fiers (33), are most likely to be preferred or not preferred, respectively, by highly expressed genes. Their hypothesis predicts that efficient in-phase translation is facilitated by proper choice of degenerate codewords, and the codon pairs marked in Table II are most dependent on maximal codon-anticodon interaction energy.

    Table II shows that among the chromosomal genes the codon usage is randomly distributed. The per cent G/C of the degenerate third base is 42%, similar to the overall GC content of the Bacillus species involved, which is 42-47% (34). In contrast, the plasmid-coded genes have a marked preference for A/U bases, only 22% G/C. Although the repetitive nature of the protein A gene makes statistical analysis risky, it seems to exhibit a clear preference for third position A/U bases with a few exceptions, UUC (Phe), AAC (Asn), and AGC (Ser). Two of these exceptions can be explained by the Grosjean and Fiers (33) hypothesis. Furthermore, among the four codon pairs in which, according to the theory, selection for C is preferred, this nucleotide is indeed chosen 64% of the time (67/105). In contrast, the four codon pairs with predicted selection for U show a reversed ratio, and only 21% C (18/85) can be found. The GC content at the third base of the codons is 32%, similar to the GC content of chromosomal DNA from S. aureus which is 30-33% (34). Therefore, the codon usage of the protein A gene shows a preference for A/U bases adapting to the overall GC content of the host cell with some exceptions, mainly following the Grosjean-Fiers (33) rules for highly expressed genes.
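    The third-base G/C percentage quoted for the protein A gene can be recomputed from codon counts. The sketch below is an illustration only: the counts are a reading of the Prot-A column of Table II from this transcript (they sum to the stated 509 codons, but the transcription itself should be treated as an assumption), and the names `parse_counts` and `gc3_percent` are hypothetical. As in the table's footnote, AUG, UGG, and AUA are left out of the percentage.

```python
# Prot-A codon counts as read from Table II (sense codons only).
PROT_A_COUNTS = """
UUU 2  UUC 12  UUA 20  UUG 5   CUU 7   CUC 1   CUA 6   CUG 2
AUU 8  AUC 9   AUA 1   AUG 6   GUU 5   GUC 2   GUA 6   GUG 2
UCU 5  UCC 0   UCA 3   UCG 2   CCU 21  CCC 0   CCA 8   CCG 2
ACU 5  ACC 1   ACA 4   ACG 0   GCU 25  GCC 1   GCA 11  GCG 5
UAU 8  UAC 1   CAU 6   CAC 1   CAA 38  CAG 2   AAU 20  AAC 45
AAA 51 AAG 18  GAU 21  GAC 19  GAA 37  GAG 1   UGU 0   UGC 0
UGG 0  CGU 3   CGC 3   CGA 0   CGG 0   AGU 3   AGC 12  AGA 0
AGG 0  GGU 18  GGC 14  GGA 1   GGG 0
"""


def parse_counts(text):
    """Turn 'CODON count CODON count ...' into a {codon: count} dict."""
    tokens = text.split()
    return {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}


def gc3_percent(counts, omit=("AUG", "UGG", "AUA")):
    """Per cent of codons ending in G or C, ignoring the codons in `omit`."""
    used = {codon: n for codon, n in counts.items() if codon not in omit}
    gc = sum(n for codon, n in used.items() if codon[2] in "GC")
    return 100.0 * gc / sum(used.values())


counts = parse_counts(PROT_A_COUNTS)
print(sum(counts.values()))            # 509
print(round(gc3_percent(counts)))      # 32
print(round(100 * 67 / 105), round(100 * 18 / 85))  # 64 21
```

    The last line repeats the two ratios quoted in the text for the codon pairs with predicted selection for C and for U.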

    TABLE II

                  Prot-Aᵃ  Chromᵇ  Plasmidᶜ                 Prot-Aᵃ  Chromᵇ  Plasmidᶜ
    Phe   UUU        2       45      39        Tyr   UAU        8       49      29
          UUC       12       20      11              UAC        1       33       9
    Leu   UUA       20       34      35        Term  UAA        0        …       …
          UUG        5       22      13              UAG        0        …       …
          CUU        7       31      10        His   CAU        6       27      17
          CUC        1        7       4              CAC        1        8       1
          CUA        6        3       5        Gln   CAA       38        …      16
          CUG        2       31       4              CAG        2       35       6
    Ile   AUU        8       38      27        Asn   AAU       20       68      43
          AUC        9       30       5              AAC       45        …      12
          AUA        1       12      18        Lys   AAA       51       79      56
    Met   AUG        6       29      12              AAG       18       26      12
    Val   GUU        5       21      12        Asp   GAU       21       81      22
          GUC        2       21       1              GAC       19        …       5
          GUA        6       21      14        Glu   GAA       37        …      19
          GUG        2       30       4              GAG        1       35      10
    Ser   UCU        5       20      16        Cys   UGU        0        2       7
          UCC        0       21       1              UGC        0        2       4
          UCA        3       31       7        Term  UGA        0        …       …
          UCG        2       22       4        Trp   UGG        0       35       9
    Pro   CCU       21       16      10        Arg   CGU        3       18       4
          CCC        0       11       5              CGC        3        5       1
          CCA        8       11       3              CGA        0       10       3
          CCG        2       25       1              CGG        0        9       0
    Thr   ACU        5       13      14        Ser   AGU        3       19      13
          ACC        1       16       4              AGC       12        …       3
          ACA        4       48      15        Arg   AGA        0       11      11
          ACG        0       45       5              AGG        0       14       4
    Ala   GCU       25       29       9        Gly   GGU       18       22      11
          GCC        1       36       1              GGC       14        …       2
          GCA       11       40       6              GGA        1       46       3
          GCG        5       38       1              GGG        0       20       3

    Sum                                                        509     1654     655
    Per cent G/Cᵉ                                               32       42      22

    ᵃ Protein A including the signal peptide (preprotein).
    ᵇ The sum of four Bacillus chromosomal genes, B. amyloliquefaciens α-amylase (25), B. subtilis α-amylase (29), B. subtilis spo0F (30), and B. licheniformis penicillinase (31).
    ᶜ Four putative proteins of pC194 (32). As the start codons are yet to be identified, the total open reading frames are taken into account.
    ᵈ The eight codon pairs which are most likely to be preferred (+) or not preferred (-) by highly expressed genes (33).
    ᵉ Per cent G/C in the third degenerate base. The codons AUG (Met), UGG (Trp), and AUA (Ile) are omitted.

    Homology Plot Analysis-In order to search for homologous regions, the DNA sequence and its deduced amino acid sequence were scanned by a computer program. Every point in the homology plots represents an identical residue (1). The nucleotide triplets and the deduced amino acids are compared in Fig. 5, A and B, respectively. As the sequence is compared with itself, a line of identity occurs from the left upper corner to the right lower corner, and homologous repeats show up as parallel lines, which disappear when no homology exists. The plots reveal two structurally distinct regions with internal homology, flanked by unique sequences without homology in the 5' and the 3' ends of the structural gene. Thus, the part of the gene coding for the signal peptide (S) as well as the promoter region (5') seems to be totally unrelated to the IgG-binding regions (E, D, A, B, and C) located in the middle of the gene. The part of the gene coding for the COOH-terminal part of region X as well as the 3' flanking sequence seems to be unrelated to both the repetitious region X and the IgG-binding regions. Comparisons between the plots show that the homology lines in Fig. 5A are more broken than those in Fig. 5B, which means that many of the nucleotide changes between the codons in the homologous regions have occurred in bases giving no amino acid change. These results strongly support the previously suggested hypothesis (27) of an evolutionary pressure in these regions keeping the amino acid sequence preserved.
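    A self-comparison dot matrix of the kind described here can be sketched in a few lines of Python. This is an illustrative reconstruction of the general technique, not the authors' HP-85 program: the function names and the toy sequence are hypothetical, and a dot is placed wherever two windows of `word` residues are identical (three bases for a nucleotide plot, following the legend of Fig. 5). Direct repeats then appear as diagonals parallel to the line of identity, and counting dots per diagonal offset exposes the repeat period.

```python
from collections import Counter


def dot_matrix(seq, word=3):
    """Return the set of 0-based (i, j) pairs whose windows
    seq[i:i+word] and seq[j:j+word] are identical."""
    positions = {}
    for i in range(len(seq) - word + 1):
        positions.setdefault(seq[i:i + word], []).append(i)
    return {(i, j) for hits in positions.values() for i in hits for j in hits}


def diagonal_counts(dots):
    """Count dots on each diagonal above the line of identity,
    keyed by the offset j - i."""
    return Counter(j - i for i, j in dots if j > i)


# Hypothetical toy sequence: an 8-base unit repeated three times.
toy = "ACGTTGCA" * 3
offsets = diagonal_counts(dot_matrix(toy))
print(offsets.most_common(2))  # [(8, 14), (16, 6)]
```

    Running the same comparison on a translated sequence with `word=1` would give the amino acid plot; diagonals that stay continuous there but break up in the nucleotide plot correspond to the silent changes discussed in the text.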

    Structure of IgG-binding Regions-The IgG-binding regions of protein A have been defined by trypsin cleavage of the mature protein into functional IgG-binding units D, A, B, and C (7, 27). Recently, we showed (10) that strain 8325-4 also contains a fifth region E homologous to the four repetitive regions earlier identified by protein sequencing. In Fig. 6 the sequences of the regions are aligned to enable comparisons. In order to achieve maximal homology, the boundary of these regions has been moved 15 nucleotides towards the 3' end of the gene. This choice is of course arbitrary as the 5' end and the 3' end of the repetitive region have diverged slightly. However, although the last five amino acids of region C (292-296) are changed compared to region B, more than half of the nucleotides (8/15) are homologous, indicating a relationship. The same holds for the other end of the repetitive region located in the beginning of region E. Although the first three amino acids are different from region D, five out of nine nucleotides are identical. The cleavage points for trypsin are marked with arrows. There exists a nine-nucleotide insertion in region E giving three amino acid residues (59-61) not homologous to the other regions. Also shown in Fig. 6 are the sequences flanking the repetitive regions. As already pointed out in the homology analysis (Fig. 5, A and B) these regions seem to be nonhomologous to the IgG-binding regions.

    A changed nucleotide compared to region B in Fig. 6 is marked with an asterisk, and a changed amino acid is underlined. Table III summarizes the amino acid changes and Table IV the codon changes between the regions. A comparison of the five regions with respect to mutual relationship reveals a pronounced homology gradient along the protein molecule, i.e. the closer the location of two regions, the higher the degree of homology. As already pointed out by Sjödahl (27), one interpretation of this phenomenon is that the primordial structural gene coding for the IgG-binding part of protein A has been subjected to stepwise gene duplications involving only one region followed by a period in which point mutations have occurred, thus generating slightly dissimilar nucleotide and amino acid sequences. As a result of these evolutionary events, a homology gradient will evolve. The fact that codons (Table IV) have changed much faster than amino acids (Table III) indicates that an evolutionary pressure exists to keep the amino acid sequence preserved. Since the number of total changes of codons is lowest for region B (Table IV), this region was chosen for the comparison in Fig. 6.

    FIG. 5. Dot matrix comparisons of the protein A sequence. A, the entire nucleotide sequence and the immediate 5' and 3' flanking sequences are compared with itself. Each dot represents the center of a three-base identity, and direct repeats appear as parallel lines across the grid. B, the deduced amino acid sequence compared with itself.

    FIG. 6. Comparisons of the IgG-binding regions and flanking regions. The sequences of the repetitive regions have been aligned to achieve maximal homology. The comparison is based on region B, and a nucleotide is marked with an asterisk and an amino acid is underlined when different from the B region. The cleavage points for trypsin are marked with arrows.

    TABLE III
    Comparison of amino acids of the IgG-binding regions
    The values listed represent the number of changed amino acids of identically positioned residues when the regions are compared in pairs.

    Region     E     D     A     B     C    Total
    E          0    11    12    14    21     57
    D         11     0     7    11    17     46
    A         12     …     0     5    15     40
    B         14    11     5     0    10     41
    C         21    17    15    10     0     64

    TABLE IV
    Comparison of codons of the IgG-binding regions
    The values listed represent the number of changed nucleotide triplets of identically positioned codons when the regions are compared in pairs.

    Region     E     D     A     B     C    Total
    E          0    31    25    26    36    118
    D         31     0    21    25    28    105
    A         25    21     0    14    30    101
    B         26    25    14     0    20     86
    C         36    28    30    20     0    115
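    The pairwise counts behind Tables III and IV rest on one operation: walk two aligned regions codon by codon, count codons that differ, and count how many of those differences also change the amino acid. The Python sketch below shows that operation under stated assumptions. It uses the standard genetic code, requires the two regions to be pre-aligned, in frame, and of equal length (so the nine-nucleotide insertion in region E is not handled), and the function name and both toy sequences are hypothetical. The first toy sequence encodes the octapeptide Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys quoted earlier in the text.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_changes(region_1, region_2):
    """Return (codon_changes, amino_acid_changes) for two aligned,
    in-frame DNA regions of equal length."""
    if len(region_1) != len(region_2) or len(region_1) % 3:
        raise ValueError("regions must be aligned, in frame, and equally long")
    codon_changes = amino_changes = 0
    for i in range(0, len(region_1), 3):
        codon_1 = region_1[i:i + 3].upper()
        codon_2 = region_2[i:i + 3].upper()
        if codon_1 != codon_2:
            codon_changes += 1
            if CODON_TABLE[codon_1] != CODON_TABLE[codon_2]:
                amino_changes += 1
    return codon_changes, amino_changes


# Toy pair: three silent third-base changes and one Lys -> Asn replacement.
unit_1 = "GAAGATGGTAACAAACCTGGTAAA"   # E  D  G  N  K  P  G  K
unit_2 = "GAAGACGGCAACAAGCCTGGTAAT"   # E  D  G  N  K  P  G  N
print(count_changes(unit_1, unit_2))  # (4, 1)
```

    A codon count that runs well ahead of the amino acid count, as in this toy pair, is the pattern the text interprets as selection for a preserved amino acid sequence.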

    Structural studies of protein A have suggested that 11
    amino acids of the IgG-binding regions are essential for binding
    to the Fc part of the immunoglobulins (35). Most of these
    amino acids are assumed to be located in two α-helical regions
    (35). In region B, the corresponding residues are 183-192
    and 198-211. As seen in Fig. 6, there are striking homologies
    in these two α-helices between the different regions, suggesting
    an evolutionary pressure to keep these residues intact.
    The changes observed are often outside the two helical areas,
    for instance, the change of His-Leu at positions 193-194 of
    region B to Asn-Met in regions E, D, and A. This pressure
    is even more pronounced when comparing the residues in
    these α-helices that interact with IgG. In region B, these
    amino acids are 184-186 (Gln-Gln-Asn), 188-189 (Phe-Tyr),
    192 (Leu), 203 (Asn), 206-207 (Ile-Glu), and 210 (Lys). As
    seen in Fig. 6, there is a serine instead of asparagine at position
    70, but all the other 49 residues are identical. Clearly, there
    is a strong pressure to keep these amino acids preserved.

    Apart from the mutual homology between the five regions,
    there also seem to exist internal homologies in each region, as
    revealed by traces of lines in Fig. 5, A and B. Hence, the
    nucleotide sequences coding for amino acids 179 (Lys) to 188
    (Phe) and 196 (Asn) to 205 (Phe), all within region B, contain
    24 identical out of 30 nucleotides. Another subregion of interest
    is the nine-nucleotide insert, giving the amino acids 59-61,
    which has been observed in protein A from both S. aureus
    Cowan I and 8325-4. This subregion (residues 57-62) is possibly
    related to other regions, like amino acids 4-9 in the
    beginning of region E. A comparison nucleotide by nucleotide
    reveals that 14 out of 18 bases are identical between these
    two regions.

    Structure of Region X-The repetitive nature of region X is
    indicated as multiple lines in Fig. 5, A and B, giving an
    approximately 300-base pair repetitive region (Xr) followed by a
    constant region coding for 81 amino acids (Xc). In Fig. 7, the
    24-nucleotide repeats are aligned and compared with one another.
    Again, a changed nucleotide is marked with an asterisk, and a
    changed amino acid is underlined. The 3' end of the repetitive
    region is obviously located at amino acid 392 (see Fig. 7), which
    is directly followed by the constant

    FIG. 7. Comparison of the repetitive units of region X and
    flanking regions. The sequences of the repetitive region have been
    aligned to achieve maximal homology. The comparison is based on
    region X1; an altered nucleotide is marked with an asterisk and
    an altered amino acid is underlined. The cleavage point for trypsin
    which defines region X (7, 20) is immediately before amino acid 292
    (Glu). The numbers refer to the amino acids in Fig. 3.

    region. Since region C terminates at amino acid 296, the
    repetitive part of region X consists of exactly 12 units, each
    with a length of 24 nucleotides. The boundary between region
    C and region X is, however, not clearly defined since the last 12
    nucleotides, coding for the last four amino acids of region
    C, are identical with the corresponding amino acids of region
    X1 (Fig. 7).

    Structural studies based on cleavage with trypsin (7,
    20) have suggested that region X starts at amino acid 292,
    which differs by five amino acids from the boundary chosen in
    Fig. 7. As discussed above, the end of region C is probably
    related to the other IgG-binding regions, but this region has
    obviously diverged in the COOH-terminal end, generating a
    few amino acids identical with region X1. Therefore, structurally
    the octapeptide of region X seems to be repeated 12.5
    times.
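
The repeat arithmetic in the two paragraphs above can be checked with a few lines of Python. This is only a worked calculation under one reading of the text: it assumes the repetitive part runs from residue 297 (the residue after the end of region C) to residue 392, and that the "12.5" figure arises from adding the four region C residues shared with X1. The variable names are illustrative.

```python
NT_PER_UNIT = 24                  # length of one repeat in nucleotides (from the text)
AA_PER_UNIT = NT_PER_UNIT // 3    # 8 amino acids, i.e. the octapeptide

first_repeat_residue = 297        # assumed: residue after the end of region C (296)
last_repeat_residue = 392         # end of the repetitive region (Fig. 7)

repeat_residues = last_repeat_residue - first_repeat_residue + 1
units = repeat_residues / AA_PER_UNIT
repeat_nucleotides = repeat_residues * 3

# Assumed reading: the last four residues of region C match region X1,
# which adds half an octapeptide to the 12 complete units.
shared_with_region_c = 4
structural_units = (repeat_residues + shared_with_region_c) / AA_PER_UNIT

print(repeat_residues, units, repeat_nucleotides, structural_units)
```

Under these assumptions the repetitive part spans 96 residues (288 nucleotides, consistent with "approximately 300 base pairs"), giving 12 complete units, or 12.5 when the shared tetrapeptide is included.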

    A comparison of the 12 repeated units reveals striking
    homologies. The first six amino acids (Lys-Pro-Gly-Lys-Glu-Asp)
    are identical throughout the Xr region. The last two
    amino acids change in a regular pattern between Asn-Asn,
    Gly-Asn, or Asn-Lys. Although the biological function
    of this extremely conserved octapeptide is not known, clearly
    there has been a strong pressure to preserve its amino acid
    sequence. Hence, 12 nucleotides have changed when comparing
    the six conserved amino acids in the 12 Xr compartments,
    all occurring in a wobble position and therefore representing
    silent mutations.
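
To make the notion of a silent wobble-position change concrete, the following Python sketch classifies the difference between two codons using the standard genetic code. The names `CODON_TABLE` and `classify_change` are illustrative, and the example codons are generic, not taken from the protein A sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_change(codon_a, codon_b):
    """Return (changed_positions, is_silent) for two codons.

    changed_positions lists the 1-based codon positions that differ;
    is_silent is True when both codons encode the same amino acid.
    """
    changed = [i + 1 for i, (x, y) in enumerate(zip(codon_a, codon_b))
               if x != y]
    silent = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return changed, silent

# Lysine is encoded by AAA and AAG: a third-position (wobble) A/G change.
lys_change = classify_change("AAA", "AAG")
# A third-position change is not always silent: AAC (Asn) versus AAA (Lys).
asn_lys_change = classify_change("AAC", "AAA")
```

Here `lys_change` reports a change at position 3 that leaves the amino acid unchanged, whereas `asn_lys_change` shows that a third-position change can still alter the residue.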

    Apart from the distinct 24-nucleotide repeat, there are also
    signs of a 48-nucleotide repeat. Thus, the wobble base A/G in
    the codon coding for the first lysine is changed periodically
    in regions X7 to X12, and amino acid 7 is changed periodically
    between Asn and Gly in regions X5 to X10 (see Fig. 6).
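
A feature that alternates between consecutive 24-nucleotide units repeats every second unit, which is what a 48-nucleotide periodicity means. The sketch below shows this with a hypothetical helper, `smallest_period`; the example list is an idealized alternation chosen for illustration and is not read from the figure.

```python
def smallest_period(values):
    """Return the smallest p with values[i] == values[i + p] for all valid i."""
    n = len(values)
    for p in range(1, n):
        if all(values[i] == values[i + p] for i in range(n - p)):
            return p
    return n  # no shorter period found

NT_PER_UNIT = 24  # length of one repeat unit in nucleotides (from the text)

# Hypothetical, idealized residue 7 of six consecutive octapeptide units.
residue7 = ["Asn", "Gly", "Asn", "Gly", "Asn", "Gly"]

period_in_units = smallest_period(residue7)
period_in_nucleotides = period_in_units * NT_PER_UNIT
```

For a strictly alternating feature the period is two units, i.e. 48 nucleotides.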

    There also seems to be some evidence for a homology
    gradient throughout the X region, although the gradient must
    be based on a 48-nucleotide repeat rather than the primordial
    24-nucleotide sequence.

    In conclusion, the evolution of the repetitive part of region
    X probably involved stepwise gene duplications of an ancestral
    24- or 48-nucleotide long sequence. How this evolved at
    the molecular level is unclear, but the nucleotide sequence of
    the protein A gene from other strains, as well as genes coding
    for proteins with similar repeated structures, may help in
    resolving the molecular events causing stepwise multiple DNA
    duplications.

    Acknowledgments-We are grateful to Dr. John Sjoquist for critical
    comments and advice. We thank Hans-Olof Pettersson and Bjorn
    Jansson for skillful technical assistance and Christina Pellettieri and
    Gerd Benson for patient secretarial help. We also thank Dr. Andras
    Gaal for introducing us to the thermostatic LKB Macrophor system
    and Dr. Stephen Fahnestock for a correction of the nucleotide
    sequence.

    REFERENCES

    1. Jeffreys, A. J. (1981) in Genetic Engineering (Williamson, R., ed) Vol. 2, pp. 1-48, Academic Press, New York
    2. Fischetti, V. A., and Manjula, B. N. (1982) Semin. Infect. Dis. 4, 411-418
    3. Hirano, H., Yamada, Y., Sullivan, M., de Crombrugghe, B., Pastan, I., and Yamada, K. M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 46-50
    4. Ohno, S. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 7657-7661
    5. Hartley, J. L., and Gregori, T. J. (1981) Gene (Amst.) 13, 347-353
    6. Tanaka, T. (1979) J. Bacteriol. 139, 775-782
    7. Sjodahl, J. (1977) Eur. J. Biochem. 73, 343-351
    8. Lindmark, R., Movitz, J., and Sjoquist, J. (1977) Eur. J. Biochem. 74, 623-628
    9. Beachey, E. H., Seyer, J. M., and Kang, A. H. (1982) Semin. Infect. Dis. 4, 401-410
    10. Lofdahl, S., Guss, B., Uhlen, M., Philipson, L., and Lindberg, M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 697-701
    11. Langone, J. J. (1982) Adv. Immunol. 32, 157-252
    12. Boyer, H. W., and Roulland-Dussoix, D. (1969) J. Mol. Biol. 41, 459-472
    13. Marinus, M. G. (1973) Mol. Gen. Genet. 127, 47-55
    14. Bolivar, F., Rodriguez, R. L., Greene, P. J., Betlach, M. C., Heyneker, H. L., Boyer, H. W., Crosa, J. H., and Falkow, S. (1977) Gene (Amst.) 2, 95-113
    15. Roberts, T. M., Swanberg, S. L., Poteete, A., Riedel, G., and Backman, K. (1980) Gene (Amst.) 12, 123-127
    16. Dente, L., Cesareni, G., and Cortese, R. (1983) Nucleic Acids Res. 11, 1645-1655
    17. Birnboim, H. C., and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523
    18. Morrison, D. A. (1979) Methods Enzymol. 68, 326-331
    19. Maxam, A. M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 560-564
    20. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
    21. Uhlen, M., Nilsson, B., Guss, B., Lindberg, M., Gatenbeck, S., and Philipson, L. (1983) Gene (Amst.) 23, 369-378
    22. Kozak, M. (1983) Microbiol. Rev. 47, 1-45
    23. Shine, J., and Dalgarno, L. (1975) Nature (Lond.) 254, 34-38
    24. McLaughlin, J. R., Murray, C. L., and Rabinowitz, J. C. (1981) J. Biol. Chem. 256, 11283-11291
    25. Takkinen, K., Pettersson, R. F., Kalkkinen, N., Palva, I., Soderlund, H., and Kaariainen, L. (1983) J. Biol. Chem. 258, 1007-1013
    26. Johnson, W. C., Moran, C. P., and Losick, R. (1983) Nature (Lond.) 302, 800-804
    27. Sjodahl, J. (1977) Eur. J. Biochem. 78, 471-490
    28. Movitz, J. (1976) Eur. J. Biochem. 68, 291-299
    29. Yang, M., Galizzi, A., and Henner, D. (1983) Nucleic Acids Res. 11, 237-249
    30. Shimotsu, H., Kawamura, F., Kobayashi, Y., and Saito, H. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 658-662
    31. Neugebauer, K., Sprengel, R., and Schaller, H. (1981) Nucleic Acids Res. 9, 2577-2588
    32. Horinouchi, S., and Weisblum, B. (1982) J. Bacteriol. 150, 815-825
    33. Grosjean, H., and Fiers, W. (1982) Gene (Amst.) 18, 199-209
    34. Fasman, G. D. (ed) (1976) CRC Handbook of Biochemistry and Molecular Biology: Nucleic Acids Section, 3rd Ed., Vol. II, pp. 69-183, CRC Press, Inc., Boca Raton, FL
    35. Deisenhofer, J. (1981) Biochemistry 20, 2361-2370