
  • 8/9/2019 Genetic polymorphism and sequence evolution of an alternatively spliced exon of the glial fibrillary acidic protein g


    Genetic polymorphism and sequence evolution of an alternatively spliced exon of the glial fibrillary acidic protein gene, GFAP

    Ripudaman Singh a, Anders L. Nielsen a,b, Marianne G. Johansen a, and Arne L. Jørgensen a,*

    a Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark

    b Department of Molecular Biology, University of Aarhus, DK-8000 Aarhus C, Denmark

    Received 5 September 2002; accepted 29 March 2003

    Abstract

    Isoform GFAPε of the human cytoskeletal protein GFAP carries, as the result of alternative splicing of exon 7a of GFAP, a novel 42-amino-acid-long C-terminal region with binding capacity for the presenilin proteins. Here we show that exon 7a is present in a variety of mammals but absent from GFAP of chicken and fish. Comparison of the mouse and human GFAP exons showed an increased rate of nonsynonymous nucleotide substitutions in exon 7a compared to the other exons. This resulted in 10 nonconservative and 2 conservative amino acid substitutions and suggests that exon 7a has evolved under different functional constraints. The exon 7a sequences of humans and higher primates are 100% identical apart from alanine codon 426, which is conserved in only 9% of the human alleles, while 21 and 70% of the alleles, respectively, have a valine or a threonine codon at that position. Threonine represents a potential phosphorylation site, and positive selection for that effect could explain the high allele frequency.

    © 2003 Elsevier Science (USA). All rights reserved.

    Keywords: Alternative splicing; Polymorphism; Allele frequency; Selection; Evolution

    Glial fibrillary acidic protein (GFAP) is the principal intermediate filament (IF) protein of the mature astrocytes of the central nervous system. It belongs to type 3 of the IF protein family and has a characteristic monomeric structure composed of a highly conserved central α-helical rod domain flanked by nonhelical head and tail domains. The monomers form homodimers and homotetramers, or heterotetramers with other IF proteins. Further multimerization produces the intermediate filaments of the cytoskeleton. Thus, GFAP provides structural stability to the astrocyte and may take part in modulating its shape and motility. Regulatory elements directing astrocyte-specific transcription have been identified, and synthesis of GFAP is rapidly upregulated in activated astrocytes. The cell-restricted expression of GFAP is the basis for the routine and widespread use of the protein as an antigen marker specific for the astrocyte [1–5].

    The human GFAP is a 432-amino-acid-long polypeptide of 55 kDa encoded by the nine exons of GFAP, which extend over 10 kb on chromosome 17q21 [6–8]. GFAP is phylogenetically old. Compared with mouse Gfap [9], the nucleotide sequence and exon/intron organization of the human gene are highly conserved, and the polypeptide shows more than 90% homology to mouse and pig GFAP and about 85% homology to goldfish GFAP [6,8,10]. Accordingly, antimammalian GFAP antibodies have been used successfully in comparative immunohistochemical studies of astrocytes in the brains of birds, reptiles, and fish [11–15].

    We have previously characterized a novel human GFAP isoform, designated GFAPε [16]. This isoform results from alternative splicing of a novel exon embedded in intron 7, termed exon 7a, and from use of a new polyadenylation signal present in this exon. As a result, the tail region of the classical isoform GFAPα, encoded by exons 8 and 9, is replaced by a new tail region encoded by exon 7a. The generated isoform GFAPε can bind the presenilin proteins in vitro [16]. In the present study we show that exon 7a is also present in GFAP of higher primates, the pig, and the mouse, but absent from GFAP of chicken, zebrafish, and goldfish. Interspecies comparison showed that the coding region of exon 7a has been under evolutionary constraints different from those on the other exons of the gene, and we discovered a high-frequency polymorphism in this exon among humans. We will argue that exon 7a is mammalian-specific and propose that it may confer new and advantageous functions on the GFAPε isoform.

    Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession Nos. AY142187–AY142200.

    * Corresponding author. Fax: 45-86123173.

    E-mail address: [email protected] (A.L. Jørgensen).

    Available online at www.sciencedirect.com

    Genomics 82 (2003) 185–193 www.elsevier.com/locate/ygeno

    0888-7543/03/$ – see front matter © 2003 Elsevier Science (USA). All rights reserved.

    doi:10.1016/S0888-7543(03)00106-X

    Results

    Species comparison of the nucleotide sequences of exon 7a

    The head domain and especially the highly conserved rod domain of IF proteins secure proper dimer and tetramer formation and higher order polymerization, while the less conserved tail domains are available for interaction with other cytosolic proteins [17]. Fig. 1A shows the exon/intron organization of the 3′ end of human GFAP, with the two mRNA splice forms GFAPα and GFAPε indicated. Exon 7a contains a functional polyadenylation site, and GFAPε is created by splicing of exon 7a directly onto exon 7 [16]. This results in a tail domain of the isoform GFAPε whose amino acid sequence is different from, and one amino acid shorter than, the tail domain of GFAPα (Fig. 1B).

    To study whether the nucleotide sequence of exon 7a has been conserved during evolution, we obtained genomic DNA from nonhuman primates, including pygmy chimpanzee (Pan paniscus), common chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), orangutan (Pongo pygmaeus), and baboon (Papio), and from the domestic pig (Sus scrofa domesticus), the mouse (Mus musculus), the rat (Rattus norvegicus), the chicken (Gallus gallus domesticus), the goldfish (Carassius auratus), and the zebrafish (Danio rerio), and used these DNAs to identify and sequence the coding region and part of the 3′ UTR of exon 7a of GFAP. The primers used to PCR amplify and sequence exon 7a are described in Table 4 and under Materials and methods.

    We were able to identify exon 7a only in the mammalian species. For the nonmammalian species we amplified and sequenced the entire intron 7 of GFAP. Intron 7 is about 2.3 kb long in the human and the mouse gene, but only 88 and 82 bp long in goldfish and zebrafish, respectively, and 675 bp long in chicken (Fig. 2). The nonmammalian intron 7 sequences contained no indication of exon 7a or of other alternative splicing and polyadenylation signals (accession numbers for the zebrafish, goldfish, and chicken intron 7 sequences are given under Materials and methods).

    Fig. 3A shows the nucleotide sequences of the coding regions of exon 7a identified in the species listed. The human sequence represents 12 unrelated individuals having identical sequences apart from a polymorphism at codon 426, of which the most frequent codon is shown. The sequence of the common chimpanzee represents four unrelated individuals whose exon 7a sequences were 100% identical, while the other sequences represent one individual from each species.

    Fig. 1. Alternative splicing of human GFAP. (A) Exon/intron organization of the 3′ end of the gene and the corresponding two mRNA splice forms GFAPα and GFAPε. Note the polyadenylation signal pA in exon 7a. (B) Amino acid sequences of the tail domains of GFAPα and GFAPε. Sequences were obtained from Nielsen et al. [16].

    The human exon 7a nucleotide sequence is 100% identical to the exon 7a sequences of the three most closely related higher primates (pygmy chimpanzee, common chimpanzee, and gorilla) except at codon 426. This codon encodes alanine in all the nonhuman species listed: in the nonhuman higher primates the alanine codon is GCG, in the baboon it reads GCA, and in the pig and the mouse it reads GCC. Alanine at position 426 of the polypeptide therefore appears to be conserved. In humans, codon 426 can be a threonine codon, ACG (shown in Fig. 3A), a valine codon, GTG, or the ancestral alanine codon, GCG. The threonine codon results from a G-to-A transition at the first position of the GCG alanine codon and represents a nonconservative amino acid substitution, while a C-to-T transition at the second position creates the valine codon and represents a conservative amino acid substitution. The tyrosine codon TAT at position 406 is found only in humans, the chimpanzees, and the gorilla and most likely results from a C-to-T transition at the first position of the histidine codon CAT present in the orangutan, the baboon, and the pig. The mouse has a proline codon, CCG, at position 406.
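    The codon changes described above can be checked mechanically. The Python sketch below is illustrative only: it assumes the standard genetic code, and `describe_change` is a hypothetical helper that reports, for two codons differing at one position, the position, the base change, whether it is a transition (purine to purine or pyrimidine to pyrimidine) or a transversion, and the resulting amino acid change.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
PURINES = {"A", "G"}


def describe_change(ancestral, derived):
    """Classify a single-base difference between two codons (hypothetical helper)."""
    diffs = [i for i in range(3) if ancestral[i] != derived[i]]
    if len(diffs) != 1:
        raise ValueError("expected codons differing at exactly one position")
    pos = diffs[0]
    old, new = ancestral[pos], derived[pos]
    # A transition keeps the base class: purine<->purine or pyrimidine<->pyrimidine.
    kind = "transition" if (old in PURINES) == (new in PURINES) else "transversion"
    return pos + 1, f"{old}->{new}", kind, f"{CODE[ancestral]}->{CODE[derived]}"


# Codon 426 (alanine to threonine, alanine to valine) and codon 406 (histidine to tyrosine).
for anc, der in [("GCG", "ACG"), ("GCG", "GTG"), ("CAT", "TAT")]:
    print(anc, der, describe_change(anc, der))
```

    All three changes come out as transitions, which matches the text; whether a given amino acid change is "conservative" is a separate judgment that the sketch does not attempt.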

    In addition to the species-specific A in the third position of alanine codon 426, the baboon sequence contains the proline codon CCA at position 428, shared only with the mouse, while the other higher primates have the proline codon CCG at this position. Thus, the pattern of sequence deviations in exon 7a among the primates, including humans, is consistent with their evolutionary relatedness.

    The pig sequence has accumulated only one nucleotide change not shared by the other species, namely the neutral T of the glycine codon GGT at position 400. The corresponding glycine codon in the mouse reads GGC, while humans and the nonhuman primates have the asparagine codon AAT at that position. All other deviations of the pig sequence from the human and nonhuman primate sequences are shared by the mouse: the glutamic acid codon GAA at position 397, the glutamine codon CAA at position 413, the alanine codon GCC at position 426, and the leucine codon CTC at position 430.

    Five codons of the mouse sequence encode amino acids not shared by any of the other species at these positions: glutamine codon CAA at position 401, proline codon CCT at position 406, valine codon GTC at position 415, glutamic acid codon GAA at position 423, and proline codon CCT at position 431. However, the mouse sequence contains no neutral nucleotide deviation from the human sequence that is not shared by at least the rat.

    The rat sequence is unique in having acquired an insertion of the dinucleotide GC between codons 420 and 421 (Fig. 3A). The resulting shift in reading frame has changed the specificity of codons 421, 422, and 423 and created a stop codon, TAA, from the TA of codon 423 and the first A of codon 424. The tail region of rat GFAP is therefore not only truncated but also contains four amino acids at the very C-terminus that are not found in any of the other species.
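    The frameshift mechanism can be sketched in Python. The five-codon sequence below is hypothetical, not the rat or human exon 7a sequence; it was chosen only to mirror the geometry described above: a GC insertion after codon 1 (cf. codon 420), after which the TA ending codon 4 and the A starting codon 5 (cf. codons 423 and 424) are read together as the stop codon TAA. The helpers `translate` and `insert_after_codon` are illustrative names.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(seq):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for p in range(0, len(seq) - 2, 3):
        aa = CODE[seq[p:p + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_after_codon(seq, n_codons, insertion):
    """Insert bases after the first n_codons codons of an in-frame sequence."""
    cut = 3 * n_codons
    return seq[:cut] + insertion + seq[cut:]


# Hypothetical stretch: GAG CAT CTG CTA AGC (codon 4 ends in TA, codon 5 starts with A).
original = "GAGCATCTGCTAAGC"
mutant = insert_after_codon(original, 1, "GC")
print(translate(original), translate(mutant))
```

    A two-base insertion shifts every downstream codon, so the amino acids encoded after the insertion change until the new frame happens to read a stop codon; here the mutant protein ends after three altered residues.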

    Fig. 2. Species comparison of intron 7 of GFAP. Exon 7a is present only in intron 7 of the mammalian species and is flanked by direct repeats (arrows) in the mouse gene. Numbers refer to lengths in base pairs. UTR, 3′ untranslated region of exon 7a, i.e., from the stop codon to the polyadenylation signal pA. Mouse and rat intron 7 sequences were obtained from Refs. [9] and [18]. Accession numbers for determined sequences are given under Materials and methods.


    The amino acid sequences encoded by exon 7a of the different species are aligned in Fig. 3B. Threonine at position 426 represents the most frequent of the three amino acid variants (threonine, valine, alanine) of the human-specific polymorphism at that position. Otherwise, the amino acid sequences are identical among the higher primates except at position 406, where the orangutan shares histidine, instead of tyrosine, with the baboon and the pig. The amino acid sequences of humans and the mouse differ by about 30%, i.e., amino acid substitutions at 12 of 41 positions. Ten of these changes are nonconservative; only the changes of glutamic acid to aspartic acid at position 397 and valine to isoleucine at position 415 are conservative. By contrast, the only amino acid substitution that has occurred in the corresponding 42-amino-acid-long tail region of the isoform GFAPα is a conservative aspartic acid to glutamic acid substitution at position 423 [16].

    The coding region of exon 7a has accumulated a unique pattern of nucleotide changes

    We compared the sequences of all 10 exons of human and mouse GFAP. The numbers listed in Table 1 show that synonymous substitutions are more frequent than nonsynonymous ones in all exons except exon 7a, for which the pattern is reversed, with 15 nonsynonymous and 5 synonymous substitutions. Exons 8 and 9 together contain only 1 nonsynonymous and 7 synonymous substitutions. Table 1 also lists the numbers of nonsynonymous and synonymous sites in exons 7a, 8, and 9. Synonymous and nonsynonymous sites are counted as follows: if the number of possible synonymous changes at a particular position in a codon is i, then this site is counted as i/3 synonymous and (3 − i)/3 nonsynonymous. The numbers of synonymous and nonsynonymous sites are counted in both the human and the mouse sequence, and the average is calculated. From these numbers we calculated the frequency of nonsynonymous substitutions per nonsynonymous site (KA), the frequency of synonymous substitutions per synonymous site (KS), and their ratio (Table 2).

    Fig. 3. Species comparison of the coding region of exon 7a. (A) Nucleotide sequences relative to the human sequence from codon 391 to the stop codon TAG at position 432, indicated by an asterisk. Codon 426, which is polymorphic in the human population, is marked by a dot. Note the GC insertion in the rat sequence between codons 420 and 421. (B) Amino acid sequences derived from the nucleotide sequences in (A). Alanine at position 426, marked by a dot, is conserved among the nonhuman species. In humans, this position is most frequently occupied by threonine, less frequently by valine, and only rarely by the ancestral alanine. Note the truncated rat sequence due to the GC insertion indicated in (A). The asterisk corresponds to the stop codon in (A). Abbreviations: C. and P. chimpanzee, common and pygmy chimpanzee. Accession numbers for determined sequences are given under Materials and methods.

    More synonymous than nonsynonymous nucleotide substitutions are expected to accumulate over time in a coding sequence, and the tighter the functional constraint, the fewer nonsynonymous substitutions are allowed. Comparisons between human and mouse genes have found KA/KS ratios <1, with an average of 0.2 [19,20]; in genes encoding highly conserved amino acid sequences, KS may exceed KA by more than 25 times [21]. Accordingly, we found that KS exceeds KA by some 30 times in the tail region of GFAPα, encoded by exons 8 and 9 (KA/KS = 0.0344). In exon 7a, the nonsynonymous substitution rate is some 18 times higher than in the combined exons 8 and 9 (0.1819 vs 0.0103), and the synonymous substitution rate is lower (0.1873 vs 0.2997). Thus, the KA/KS ratio of exon 7a is 0.9716, which is close to the theoretical ratio of 1 expected for a sequence under no functional constraint. A KA/KS ratio >1 is normally regarded as a sign of positive selection, since nonsynonymous substitutions are far more likely than synonymous substitutions to improve the function of a protein [19,21].
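    The site-counting rule and the per-site rates can be made concrete with a short Python sketch. `codon_sites` applies the i/3 rule stated above under the standard genetic code and counts a change to a stop codon as nonsynonymous, an assumption the text does not spell out. The text also does not say whether KA and KS were corrected for multiple substitutions; here we assume a Jukes–Cantor correction, d = −(3/4) ln(1 − 4p/3), because applied to the Table 1 counts it reproduces the Table 2 values to within rounding. For each exon the larger of the two site counts in Table 1 is taken as the nonsynonymous one (92 5/6 nonsynonymous and 30 1/6 synonymous sites for exon 7a), since most positions in a coding sequence are nonsynonymous sites; the synonymous substitutions of exons 8 and 9 are taken from Table 1 (5 + 2).

```python
import math

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_sites(codon):
    """(synonymous, nonsynonymous) sites of one codon by the i/3 rule."""
    syn = 0.0
    for pos in range(3):
        # i = number of the three possible base changes at this position
        # that leave the encoded amino acid unchanged (stops count as changes).
        i = sum(CODE[codon[:pos] + b + codon[pos + 1:]] == CODE[codon]
                for b in BASES if b != codon[pos])
        syn += i / 3
    return syn, 3 - syn


def sequence_sites(seq):
    """Sum the codon sites over an in-frame coding sequence."""
    pairs = [codon_sites(seq[p:p + 3]) for p in range(0, len(seq) - 2, 3)]
    return sum(s for s, _ in pairs), sum(n for _, n in pairs)


def jukes_cantor(p):
    """Correct an observed proportion of differing sites for multiple hits."""
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_subst, nonsyn_sites, syn_subst, syn_sites):
    ka = jukes_cantor(nonsyn_subst / nonsyn_sites)
    ks = jukes_cantor(syn_subst / syn_sites)
    return ka, ks, ka / ks


# Counts from Table 1; exons 8 and 9 are pooled.
exon7a = ka_ks(15, 92 + 5 / 6, 5, 30 + 1 / 6)
exons8_9 = ka_ks(0 + 1, 66 + 31 + 2 / 3, 5 + 2, 21 + 7 + 1 / 3)
print("exon 7a:       KA=%.4f  KS=%.4f  KA/KS=%.4f" % exon7a)
print("exons 8 and 9: KA=%.4f  KS=%.4f  KA/KS=%.4f" % exons8_9)
```

    Without the correction the raw proportions (for example 15/92.83 ≈ 0.16 for exon 7a) fall noticeably below the reported KA, which is why we read Table 2 as corrected values; this remains an inference.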

    Table 1 lists the numbers of CpG dinucleotides present in the exons of human and mouse GFAP. Seven CpGs are present in exon 7a of the human gene but none in exon 7a of the mouse gene. This discrepancy is unique to exon 7a, as the numbers of CpGs in all the other exons of human and mouse GFAP proved to be similar. We also counted the CpGs in the intronic sequences between exon 7 and exon 7a, which are presumably under no functional constraint, and found no difference between the human and the mouse sequences (data not shown). Because of spontaneous deamination of the methylated C residue of CpG dinucleotides, these dinucleotides tend to change to TpG or CpA, especially when present in a sequence that is no longer subject to any functional constraint. In this respect it is interesting that the seven CpG dinucleotides present in the human sequence occur as TpG or CpA in the mouse sequence, suggesting that the human sequence is under different functional constraints.
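    The CpG comparison amounts to locating CpGs in one sequence and inspecting the aligned dinucleotide in the other. The Python sketch below assumes a gap-free alignment of equal-length sequences; the two short fragments are hypothetical, not GFAP sequence, and the function names are illustrative. Deamination of a methylated C gives TpG on the same strand, and the same event on the opposite strand appears as CpA.

```python
def cpg_positions(seq):
    """0-based start positions of CpG dinucleotides on the given strand."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_fates(human, mouse):
    """Mouse dinucleotide aligned to each human CpG (gap-free alignment assumed)."""
    labels = {"CG": "CpG retained", "TG": "TpG", "CA": "CpA"}
    return {i: (mouse[i:i + 2], labels.get(mouse[i:i + 2], "other"))
            for i in cpg_positions(human)}


# Hypothetical aligned fragments, not GFAP sequence.
human = "ACGTTCGA"
mouse = "ATGTTCAA"
print(len(cpg_positions(human)), len(cpg_positions(mouse)), cpg_fates(human, mouse))
```

    Real exon 7a alignments would need gap handling before positions can be compared in this way.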

    Table 1
    Characteristics of the nucleotide changes in human and mouse GFAP

    Exon  Species  Amino acids  Syn. subst.  Nonsyn. subst.  CpG  Syn. sites  Nonsyn. sites
                   (n)          (n)          (n)             (n)  (n)         (n)
    1     Human    154a         48           25              26
          Mouse    153                                       27
    2     Human    20           4            0               1
          Mouse    20                                        1
    3     Human    32           7            5               4
          Mouse    32                                        2
    4     Human    54           17           9               9
          Mouse    54                                        8
    5     Human    51           17           2               15
          Mouse    51                                        16
    6     Human    65           23           3               14
          Mouse    65                                        12
    7     Human    14           5            0               2
          Mouse    14                                        1
    7a    Human    41           5            15              7    30 1/6      92 5/6
          Mouse    41                                        0
    8     Human    29           5            0               3    21          66
          Mouse    29                                        3
    9     Human    13           2            1               0    7 1/3       31 2/3
          Mouse    14b                                       2

    Note. Abbreviations: Syn. and Nonsyn. subst., synonymous and nonsynonymous substitutions; Syn. and Nonsyn. sites, synonymous and nonsynonymous sites, averaged over the human and mouse sequences.
    a Human exon 1 carries a duplication of alanine codon 9.
    b The last valine codon is duplicated in the mouse gene.

    Table 2
    Exon 7a has a distinct nucleotide substitution profile

                    Amino acids  KA      KS      KA/KS
    Exon 7a         41           0.1819  0.1873  0.9716
    Exons 8 and 9   42           0.0103  0.2997  0.0344

    Note. KA, nonsynonymous substitutions per nonsynonymous site; KS, synonymous substitutions per synonymous site; KA/KS, the ratio between KA and KS.

    Codon 426 is polymorphic in the human population

    The first exon 7a of human GFAP that we sequenced had a threonine codon instead of the evolutionarily conserved alanine codon at position 426. Additional exon 7a sequences obtained from DNA of 12 unrelated individuals confirmed that the human sequence deviates from the higher primate sequence only at codon 426 and that the site is polymorphic, with two variant codons: the threonine codon ACG and the valine codon GTG. We did not find the primate alanine codon GCG in this sample, suggesting that the frequency of this ancestral allele is less than 10%. The frequencies of the two variants, and possibly of the ancestral allele, were determined by genotyping 64 unrelated healthy individuals of Danish extraction with respect to codon 426. In our screening assay we took advantage of an HhaI recognition site, GCGC, created by the alanine codon GCG at position 426 and the C in the first position of codon 427, and of another HhaI site 41 bp farther downstream in the 3′ UTR of exon 7a (P1 and P2 in Fig. 4A; see also Materials and methods). A 342-bp PCR product including these restriction sites will be cut by HhaI at codon 426 only if it contains the alanine codon GCG (Fig. 4B). A PCR product that is not cut there will contain either the threonine codon ACG or the valine codon GTG. These codons differ from each other at positions 1 and 2, which allowed us to distinguish between the two alleles by a subsequent PCR assay with allele-specific primers (Fig. 4C).

    Table 3 lists the observed genotypes and the distribution of individuals among these genotypes. The frequencies of the three alleles containing the threonine codon ACG, the valine codon GTG, or the alanine codon GCG were calculated from these figures. Assuming Hardy–Weinberg equilibrium, we calculated the expected numbers of the genotypes, also listed in Table 3, and found no significant deviation from the observed numbers (p 0.7, using Wilcoxon nonparametric test). The threonine-containing allele, with a frequency of 0.70, was by far the most frequent allele in the human population, followed by the allele with the valine codon (0.21), while the frequency of the ancestral allele with the alanine codon was only 0.09.
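    The allele frequencies and Hardy–Weinberg expectations in Table 3 can be recomputed from the observed genotype counts. In the minimal Python sketch below, each homozygote contributes two copies of its allele and each heterozygote one copy of each; the expected numbers are p²N for homozygotes and 2pqN for heterozygotes, with N = 64. It reproduces the counts only; the significance test reported above is not re-implemented.

```python
from itertools import combinations_with_replacement

ALLELES = ("ACG", "GTG", "GCG")   # threonine, valine, ancestral alanine codons
# Observed genotype counts at codon 426 (Table 3, 64 unrelated individuals).
OBSERVED = {
    ("ACG", "ACG"): 35, ("ACG", "GTG"): 12, ("ACG", "GCG"): 8,
    ("GTG", "GTG"): 6, ("GTG", "GCG"): 3, ("GCG", "GCG"): 0,
}


def allele_counts(genotypes):
    """Each individual contributes one copy per allele in its genotype."""
    counts = {a: 0 for a in ALLELES}
    for (a, b), n in genotypes.items():
        counts[a] += n
        counts[b] += n
    return counts


def hardy_weinberg_expected(freqs, n_individuals):
    """Expected numbers: p^2 N for homozygotes, 2pq N for heterozygotes."""
    return {(a, b): (1 if a == b else 2) * freqs[a] * freqs[b] * n_individuals
            for a, b in combinations_with_replacement(ALLELES, 2)}


counts = allele_counts(OBSERVED)
total_alleles = sum(counts.values())
freqs = {a: c / total_alleles for a, c in counts.items()}
expected = hardy_weinberg_expected(freqs, sum(OBSERVED.values()))

for genotype, n_obs in OBSERVED.items():
    print("/".join(genotype), n_obs, round(expected[genotype], 1))
print({a: round(f, 2) for a, f in freqs.items()})
```

    Rounded to whole individuals, the expected values agree with the Expected column of Table 3, and the allele frequencies round to 0.70, 0.21, and 0.09.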

    Discussion

    We found that exon 7a is present in GFAP of humans, higher primates, the pig, and rodents, but absent from GFAP of zebrafish, goldfish, and chicken. Exon 7a may thus have originated in the common ancestor of the mammals. In this context it is interesting that 10-bp direct repeats have been identified in intron 7 of the mouse gene, flanking a 1.4-kb pyrimidine- and repeat-rich sequence that contains the entire exon 7a, including the polyadenylation signal. Flanking direct repeats are the signature of an insertion event, and we found the 3′ flanking repeat at the same position in the human and the rat sequences, whereas the 5′ repeat could not be identified (Fig. 2). The polypyrimidine tract just upstream of exon 7a may be significant because polypyrimidine tract-binding proteins have been found to regulate tissue-specific alternative splicing [22].
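    The insertion signature described above, identical direct repeats bracketing an inserted segment, can be searched for with a simple k-mer scan. The Python sketch below is a hypothetical illustration, not the procedure used in this study: k = 10 matches the repeat length mentioned above, while the minimum spacing `min_span` and the toy sequence are assumptions.

```python
from collections import defaultdict


def flanking_direct_repeats(seq, k=10, min_span=100):
    """Pairs of identical k-mers on the same strand whose starts lie at least
    min_span bases apart: candidate direct repeats flanking an insertion."""
    starts = defaultdict(list)
    for i in range(len(seq) - k + 1):
        starts[seq[i:i + k]].append(i)
    hits = []
    for kmer, positions in starts.items():
        for n, a in enumerate(positions):
            for b in positions[n + 1:]:
                if b - a >= min_span:
                    hits.append((a, b, kmer))
    return sorted(hits)


# Hypothetical toy: a 10-bp repeat flanking a 12-bp "insert"; not GFAP sequence.
repeat = "ACGTTGCAAT"
seq = "GG" + repeat + "CCATAGGCTTAC" + repeat + "GG"
print(flanking_direct_repeats(seq, k=10, min_span=10))
```

    On real intron sequence, low-complexity and repetitive stretches (such as the pyrimidine-rich region mentioned above) would produce many spurious pairs, so candidate hits would need filtering by position relative to the inserted element.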

    We have previously shown that exon 7a is alternatively spliced in-frame to exon 7 of human GFAP, creating a novel isoform, termed GFAPε, with a new tail domain. This tail domain was shown to have a different protein-binding capacity compared to the tail domain of isoform GFAPα, which is encoded by the evolutionarily conserved exons 8 and 9. A comparison of the human and mouse sequences shows that the coding part of exon 7a has accumulated more nucleotide changes than have exons 8 and 9 (Table 1). Furthermore, 75% of these changes are nonsynonymous (15 of 20), which is the opposite of the distribution found in the other exons of GFAP. In fact, the number of nonsynonymous substitutions per nonsynonymous site (KA) almost equals the number of synonymous substitutions per synonymous site (KS), which could suggest that the sequence has accumulated nucleotide changes at random and may have lost function (Table 2). We will argue, however, that the nucleotide changes in exon 7a result from positive (adaptive) selection of a new function conferred by exon 7a on isoform GFAPε.

    Fig. 4. Genotyping human individuals for the polymorphic codon 426 of exon 7a of GFAP. (A) Map (not drawn to scale) of the positions of primers and of the polymorphic HhaI restriction sites P1 and P2 used to study the codon 426 polymorphism. HhaI recognizes the sequence 5′-GCGC-3′ and will cut at P1 only if codon 426 is the alanine codon GCG. Primer CHK1 will prime if codon 426 is the threonine codon ACG, and primer CHK2 will prime if it is the valine codon GTG. The gray area represents the coding region and the white area the 3′ untranslated region (see also Materials and methods). (B) HhaI cutting assay for the codon 426 polymorphism. A 342-bp PCR product including the coding region of GFAP exon 7a was amplified from human genomic DNA, purified, and cut with HhaI. Lanes 2 and 3 represent uncut and cut DNA, respectively, for a DNA sample with cutting at P1 on one allele and at P2 on the other allele. Lanes 4 and 5 represent uncut and cut DNA, respectively, for a DNA sample with cutting at P2 on both alleles. No DNA samples were detected with cutting at P1 on both alleles. Lane 1 contains a DNA size marker with the fragment sizes indicated to the left. (C) PCR assay to distinguish between ACG and GTG codons at position 426. S2R was used as the reverse primer in combination with each of two new forward primers, CHK1 and CHK2 (Table 4), in which the last two nucleotides at the 3′ end are specific for either the ACG allele (CHK1) or the GTG allele (CHK2). In lanes 1 to 6, PCR fragments obtained from three different DNAs were analyzed by agarose gel electrophoresis. A DNA size marker was loaded in lane 7, with the fragment sizes indicated to the right.
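    The logic of the two-step codon 426 genotyping can be written out as a small Python sketch. The amplicon length and fragment sizes come from the text (cutting at P1 gives 179 and 163 bp, cutting at P2 gives 220 and 122 bp, with P2 41 bp downstream of P1); placing the cuts at offsets 179 and 220 from the SFP2 end is our assumption, chosen to be consistent with those sizes. The sketch treats the reported linkage disequilibrium between P1 and P2 as complete, i.e., each allele is cut at exactly one of the two sites; that simplification is ours. `call_genotype` is a hypothetical helper, not software used in the study.

```python
AMPLICON_BP = 342                  # SFP2/S2R PCR product length
CUT_AT = {"P1": 179, "P2": 220}    # assumed cut offsets from the SFP2 end


def digest(length, cuts):
    """Fragment lengths after cutting a linear product at the given offsets."""
    edges = [0, *sorted(cuts), length]
    return [b - a for a, b in zip(edges, edges[1:])]


def allele_fragments(codon426):
    """HhaI fragments for one allele: GCG plus the C of codon 427 forms GCGC
    at P1; an allele lacking P1 is assumed to be cut at P2 instead."""
    site = "P1" if codon426 + "C" == "GCGC" else "P2"
    return digest(AMPLICON_BP, [CUT_AT[site]])


def call_genotype(p1_alleles, chk1_band, chk2_band):
    """Combine HhaI (number of alleles cut at P1) with allele-specific PCR:
    a CHK1 band indicates an ACG allele, a CHK2 band a GTG allele."""
    present = [c for c, band in (("ACG", chk1_band), ("GTG", chk2_band)) if band]
    other = 2 - p1_alleles          # alleles lacking the GCG codon
    if other == 0 and not present:
        return "GCG/GCG"
    if other == 1 and len(present) == 1:
        return present[0] + "/GCG"
    if other == 2 and len(present) == 2:
        return "ACG/GTG"
    if other == 2 and len(present) == 1:
        return present[0] + "/" + present[0]
    raise ValueError("HhaI and allele-specific PCR results disagree")


print(allele_fragments("GCG"), allele_fragments("ACG"))
print(call_genotype(1, True, False), call_genotype(0, True, True))
```

    The HhaI step alone counts ancestral GCG alleles; only the allele-specific PCR separates ACG from GTG, which is why both results are needed to call all six genotypes in Table 3.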

    Positive selection is often defined by a KA/KS ratio >1. However, a cutoff of 1 means that significant functional changes in proteins will be missed, as illustrated by a number of adaptively evolving genes whose KA/KS values lie between 0.6 and 1 [21]. Accordingly, the KA/KS value of 0.9716 found for exon 7a is not inconsistent with positive selection.

    All but 2 of the 12 amino acid substitutions that result from the 15 nonsynonymous nucleotide changes in exon 7a are nonconservative. Survival of these 10 nonconservative substitutions is more likely to result from positive selection than from relaxed selection following loss of function, since all but one are present in the primates and some of them are shared by the pig. Note that all deviations of the pig amino acid sequence from the mouse sequence are shared by the primates; no amino acid substitution is found only in the pig (Fig. 3B). The discrepancy in the numbers of CpG dinucleotides in exon 7a of human and mouse GFAP (7 vs 0) is also unique (Table 1): no such discrepancy was found in the adjacent noncoding intronic sequences or in the other exons. Together, these observations may indicate that exon 7a, with respect to sequence evolution since the split from the mouse lineage, has been under a constraint that does not apply to other exonic or noncoding sequences.

    The human polymorphism at codon 426 in exon 7a may elucidate this point (Fig. 4 and Table 3). In all mammals studied, codon 426 encodes alanine, and in all nonhuman higher primates the alanine codon is GCG. In the human population, however, the frequency of this ancestral allele is only 0.09. A variant allele carrying a valine codon, GTG, as the result of a nonsynonymous C-to-T transition at the second position of the alanine codon, is more frequent (0.21). The most frequent allele (0.70) carries a threonine codon, ACG, at position 426, created by a G-to-A transition at the first position of the alanine codon. It is intriguing that the only nucleotide differences between exon 7a of humans and that of the nonhuman primates are two nonsynonymous substitutions that have occurred in codon 426 and have created two variant alleles more frequent than the ancestral allele. In general, nonsynonymous substitutions are functionally disadvantageous and selected against, more so for nonconservative amino acid substitutions [19]. But positive selection may act on nonsynonymous substitutions in a sequence that has conferred a new function on a protein. It may therefore be significant that by far the most frequent allele carries a nonconservative alanine-to-threonine substitution. Thereby, GFAPε acquires a potential phosphorylation site that may have an advantageous functional effect. Positive selection for this effect could explain the high frequency of the human-specific threonine allele.

    Materials and methods

    PCR amplification and sequencing of the coding region of exon 7a

    PCR-based analyses were done on genomic DNA puri-

    fied from whole blood from humans and primates, brain

    tissue from the pig, lever tissue from the chicken, and whole

    organisms of zebrafish and goldfish. In Table 4 are listed the

    primers used for PCR amplification and sequencing for each

    species. The primer combinations used were for humans,

    PCR primers 3 and 7 (annealing temperature (Ta) 56C),

    sequencing primers 6 and 7; common chimpanzee, PCR

    primers 2 and 5 (Ta

    56C), sequencing primers 3 and 7;

    pygmy chimpanzee, PCR primers 1 and 5 (Ta 56C), se-quencing primers 3 and 7; gorilla, PCR primers 2 and 4 (Ta56C), sequencing primers 6 and 7; orangutan, PCR primers

    2 and 4 (Ta 56C), sequencing primers 6 and 7; baboon,

    PCR primers 2 and 4 (Ta

    56C), sequencing primers 6 and

    7; domestic pig, PCR primers 8 and 9 (Ta

    57C), sequencing

    primers 8 and 9; chicken, PCR primers 10 and 11 ( Ta

    55C),

    sequencing primers 10 and 11; goldfish, PCR primers 12

    and 13 (Ta

    56C), sequencing primers 12 and 13; zebrafish,

    PCR primers 12 and 13 (Ta 56C), sequencing primers 12

    and 13; mouse, PCR primers 16 and 17 (Ta 60C), sequenc-

    ing primers 16 and 17; and rat PCR primers 18 and 19 (Ta

    60C), sequencing primers 19 and 20. The amplificationprogram, using Taq DNA polymerase (Amersham Pharma-

    cia Biotech, Inc.), was as follows: An initial denaturation

    step at 94C for 2 min followed by 30 cycles of PCR (94 C

    for 2 min, annealing at temperatures as indicated for 1 min,

    and extension at 72C for 1 min). Quality and quantity of

    each PCR product was evaluated by electrophoresis of 1/10

    of its volume in a 1.25% agarose gel along with a known

    amount of a 100 bp DNA ladder. The PCR product was

    purified from the rest of the amplification solution (45 l)

    using a GFX PCR DNA and gel band amplification kit

    (Amersham Pharmacia Biotech, Inc.) and dissolved in 40 l

    of double-distilled water and quantified again. DNA sequencing of both strands was done following the protocol of the DYEnamic ET Terminator Cycle Sequencing Kit (Amersham Pharmacia Biotech, Inc.).

    Table 3
    Genotype distribution of the polymorphic codon 426 of exon 7a of human GFAP

    Genotype    Individuals (n)        Observed alleles (n) carrying
                Observed   Expected    ACG    GTG    GCG
    ACG/ACG     35         32          70     0      0
    ACG/GTG     12         19          12     12     0
    GTG/GTG     6          3           0      12     0
    ACG/GCG     8          8           8      0      8
    GTG/GCG     3          2           0      3      3
    GCG/GCG     0          0           0      0      0
    Sum         64         64          90     27     11
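The expected genotype counts in Table 3 agree, after rounding, with Hardy–Weinberg proportions computed from the observed allele counts (90 ACG, 27 GTG and 11 GCG among 128 alleles, i.e., about 70, 21 and 9%). The sketch below recomputes them on that assumption; it is an illustration rather than the authors' analysis code, the function names are hypothetical, and it performs no goodness-of-fit test.

```python
from itertools import combinations_with_replacement

# Observed genotype counts at codon 426, copied from Table 3 (64 individuals).
ALLELES = ("ACG", "GTG", "GCG")
OBSERVED = {
    ("ACG", "ACG"): 35, ("ACG", "GTG"): 12, ("ACG", "GCG"): 8,
    ("GTG", "GTG"): 6, ("GTG", "GCG"): 3, ("GCG", "GCG"): 0,
}


def allele_counts(genotypes):
    """Count alleles: each diploid individual contributes two."""
    counts = dict.fromkeys(ALLELES, 0)
    for (a, b), n in genotypes.items():
        counts[a] += n
        counts[b] += n
    return counts


def hardy_weinberg_expected(counts, n_individuals):
    """Allele frequencies and expected genotype counts (p^2, 2pq, ...) under HWE."""
    total = 2 * n_individuals
    freq = {a: c / total for a, c in counts.items()}
    expected = {}
    for a, b in combinations_with_replacement(ALLELES, 2):
        p = freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
        expected[(a, b)] = n_individuals * p
    return freq, expected


n = sum(OBSERVED.values())          # 64 individuals
counts = allele_counts(OBSERVED)    # ACG 90, GTG 27, GCG 11
freq, expected = hardy_weinberg_expected(counts, n)
for genotype, e in expected.items():
    # genotype, observed count, expected count rounded as in Table 3
    print("/".join(genotype), OBSERVED[genotype], round(e))
```

Computing expected counts from allele frequencies, rather than copying them from the table, makes the internal consistency of Table 3 easy to verify.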

    Assay for codon 426 polymorphism

    DNA samples collected from 64 unrelated healthy adults of Danish extraction were PCR amplified using primers SFP2 and S2R and the protocol described above. The primers define a 342-bp-long fragment that contains the coding sequence of exon 7a and adjacent 3′ UTR sequences (Fig. 4A). The ancestral alanine codon GCG at position 426 and the first C of the proline codon CCG at position 427 together form the HhaI recognition site 5′-GCGC-3′ (P1 in Fig. 4A).
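The P1 logic can be checked mechanically. The sketch below is illustrative and not part of the authors' protocol: it tests whether codon 426 plus the first base of proline codon 427 (CCG) spells the HhaI recognition sequence GCGC for each of the three codon-426 alleles, and reports how each allele differs from the ancestral GCG. The helper names `has_p1_site` and `substitution` are hypothetical.

```python
# Illustrative check of HhaI site P1 (5'-GCGC-3') spanning codons 426-427.
HHAI_SITE = "GCGC"
CODON_427 = "CCG"    # proline codon at position 427 (from the text)
ANCESTRAL = "GCG"    # ancestral alanine codon at position 426
AMINO_ACID = {"GCG": "Ala", "ACG": "Thr", "GTG": "Val"}


def has_p1_site(codon_426: str) -> bool:
    """True if codon 426 plus the first base of codon 427 spells GCGC."""
    return codon_426 + CODON_427[0] == HHAI_SITE


def substitution(codon: str, ancestral: str = ANCESTRAL) -> str:
    """List single-base differences from the ancestral codon (1-based positions)."""
    diffs = [f"{a}->{b} at position {i + 1}"
             for i, (a, b) in enumerate(zip(ancestral, codon)) if a != b]
    return ", ".join(diffs) or "ancestral"


for codon in ("GCG", "ACG", "GTG"):
    print(codon, AMINO_ACID[codon], has_p1_site(codon), substitution(codon))
# GCG Ala True ancestral
# ACG Thr False G->A at position 1
# GTG Val False C->T at position 2
```

Either single-base change destroys the GCGC motif, which is why loss of cutting at P1 alone cannot tell the threonine and valine alleles apart.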

    Another HhaI recognition site is located 41 bp farther downstream in the 3′ UTR (P2 in Fig. 4A). Both HhaI sites are polymorphic, and cutting at P1 is in linkage disequilibrium with absence of cutting at P2 and vice versa. Cutting at P1 results in two fragments of 179 and 163 bp, and cutting at P2 produces two fragments of 220 and 122 bp (Fig. 4B). With a combination of PCR amplification and HhaI digestion it is possible to detect homozygosity and heterozygosity for the presence or absence of the ancestral alanine codon at position 426. One-fifth of the PCR product was cut by HhaI under conditions recommended by the supplier (New England BioLabs, Inc.), and the restriction fragments were visualized as bands by electrophoresis in an ethidium bromide-stained 2% agarose gel. Among the 64 samples, we never observed a banding pattern consistent with HhaI cutting at P1 on both alleles, i.e., homozygosity for the ancestral alanine codon GCG. Samples that showed a heterozygous banding pattern (Fig. 4B, lane 3) had either ACG or GTG on the other allele and were genotyped by sequencing.
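The expected banding patterns follow from the fragment sizes given above. In the sketch below, the cut coordinates (179 bp for P1 and 220 bp for P2, counted from the SFP2 end of the 342-bp amplicon) are derived from those sizes; they are the only placement consistent with P2 lying 41 bp downstream of P1. The model also assumes complete linkage disequilibrium, i.e., a GCG allele is cut only at P1 and an ACG or GTG allele only at P2. It predicts fragment sizes only, not how well the 179- and 163-bp bands resolve on the gel; function names are hypothetical.

```python
AMPLICON_BP = 342  # SFP2/S2R product length (from the text)
P1_CUT = 179       # derived: cutting at P1 gives 179 + 163 bp
P2_CUT = 220       # derived: cutting at P2 gives 220 + 122 bp (P1 + 41 bp)


def digest(length, cuts):
    """Fragment sizes from cutting a linear DNA of `length` bp at `cuts`."""
    edges = [0, *sorted(cuts), length]
    return [right - left for left, right in zip(edges, edges[1:])]


def allele_fragments(codon_426):
    # Assumes complete linkage disequilibrium: a GCG allele is cut only at P1,
    # an ACG or GTG allele only at P2.
    cut = P1_CUT if codon_426 == "GCG" else P2_CUT
    return digest(AMPLICON_BP, [cut])


def banding_pattern(genotype):
    """Distinct band sizes (largest first) expected for a genotype like 'ACG/GCG'."""
    first, second = genotype.split("/")
    bands = set(allele_fragments(first)) | set(allele_fragments(second))
    return sorted(bands, reverse=True)


for g in ("GCG/GCG", "ACG/GCG", "ACG/GTG"):
    print(g, banding_pattern(g))
# GCG/GCG [179, 163]
# ACG/GCG [220, 179, 163, 122]
# ACG/GTG [220, 122]
```

A heterozygote shows all four bands, while ACG/ACG, ACG/GTG and GTG/GTG samples give the same two-band pattern, which is why a second assay is needed for them.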

    Absence of the GCG alanine codon on both alleles produces the HhaI banding pattern shown in Fig. 4B, lane 5. Lack of HhaI cutting at P1 is due to either a G-to-A substitution at position 1 or a C-to-T substitution at position 2 of the GCG alanine codon and hence either an ACG threonine or a GTG valine codon at position 426. To distinguish between these two possibilities we employed a PCR assay using S2R as reverse primer in combination with each of two new forward primers, CHK1 and CHK2 (Fig. 4A and Table 4), in which the last 2 nucleotides at the 3′ end have specificity for either the ACG allele (CHK1) or the GTG allele (CHK2).
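Why the two forward primers discriminate the alleles can be illustrated with an idealized model. CHK1 and CHK2 share their first 18 nucleotides (Table 4) and differ only in the last two, which the Table 4 note ties to codon 426. The sketch therefore assumes that this shared 18-nt stretch lies immediately 5′ of codon 426 on the sense strand, followed by the codon and by CCG (codon 427). It further assumes that extension requires a perfect match at the two 3′-terminal positions; real allele-specific PCR can tolerate some 3′ mismatches. `template` and `primes` are hypothetical helper names.

```python
# Assumed sense-strand context: the 18 nt shared by CHK1 and CHK2 (Table 4)
# immediately 5' of codon 426, then codon 426, then CCG (codon 427).
SHARED_PREFIX = "CACCAGATTGTAAATGGA"
CHK1 = "CACCAGATTGTAAATGGAAC"   # 3' end AC: intended for ACG (Thr)
CHK2 = "CACCAGATTGTAAATGGAGT"   # 3' end GT: intended for GTG (Val)


def template(codon_426: str) -> str:
    """Hypothetical sense-strand template for one allele."""
    return SHARED_PREFIX + codon_426 + "CCG"


def primes(primer: str, tmpl: str, n3: int = 2) -> bool:
    """Idealized allele-specific priming: the 5' part of the primer must anneal,
    and its last n3 nucleotides must match the template exactly."""
    site = tmpl.find(primer[:-n3])
    if site < 0:
        return False
    start = site + len(primer) - n3
    return tmpl[start:start + n3] == primer[-n3:]


for codon in ("ACG", "GTG", "GCG"):
    t = template(codon)
    print(codon, primes(CHK1, t), primes(CHK2, t))
# ACG True False
# GTG False True
# GCG False False
```

Under this model neither primer extends on the ancestral GCG allele, so a product reports only the non-ancestral codon it was designed for.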

    Each sample was tested in two corresponding PCRs, performed essentially as described above. Production of a PCR fragment of 184 bp using CHK1 as forward primer and absence of a PCR product using CHK2 as forward primer indicated the presence of the ACG (threonine) allele; the opposite result indicated the presence of the GTG (valine) allele, while production of a PCR product with each of the forward primers would indicate the presence of both the ACG and the GTG allele in the sample tested (Figs. 4A and 4C).
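The genotyping workflow of this section can be summarized as a decision procedure: the HhaI pattern at P1 is read first, heterozygotes are sent to sequencing, and samples lacking P1 cutting on both alleles are resolved with the two allele-specific PCRs. The function below is a hypothetical sketch of that logic. Its name and arguments are illustrative, and reading a CHK1-only result in a P1-uncut sample as ACG/ACG is an inference from the text rather than a rule the authors state explicitly.

```python
from typing import Optional


def call_codon_426(p1_cut_alleles: int,
                   chk1_product: Optional[bool] = None,
                   chk2_product: Optional[bool] = None) -> str:
    """Hypothetical genotype call for codon 426.

    p1_cut_alleles: number of alleles (0, 1 or 2) cut by HhaI at P1, i.e.
    carrying the ancestral GCG codon, as read from the banding pattern.
    chk1_product / chk2_product: whether the CHK1/S2R and CHK2/S2R PCRs gave
    the 184-bp product (needed only when p1_cut_alleles == 0).
    """
    if p1_cut_alleles == 2:
        return "GCG/GCG"               # never observed among the 64 samples
    if p1_cut_alleles == 1:
        return "GCG/other: sequence"   # other allele (ACG or GTG) resolved by sequencing
    if p1_cut_alleles != 0:
        raise ValueError("p1_cut_alleles must be 0, 1 or 2")
    # Neither allele carries GCG, so each is ACG or GTG.
    if chk1_product is None or chk2_product is None:
        raise ValueError("both allele-specific PCR results are required")
    if chk1_product and chk2_product:
        return "ACG/GTG"
    if chk1_product:
        return "ACG/ACG"
    if chk2_product:
        return "GTG/GTG"
    raise ValueError("no allele-specific product: repeat the assay")


print(call_codon_426(0, True, False))   # ACG/ACG
print(call_codon_426(0, True, True))    # ACG/GTG
print(call_codon_426(1))                # GCG/other: sequence
```

Raising an error when neither allele-specific PCR gives a product keeps a failed reaction from being silently reported as a genotype.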

    Accession numbers.

    The DNA sequences determined have the following accession numbers: human exon 7a GTG polymorphism (AY142187), human exon 7a GCG polymorphism (AY142188), human exon 7a ACG polymorphism (AY142191), baboon exon 7a (AY142190), common chimpanzee exon 7a (AY142192), pygmy chimpanzee exon 7a (AY142189), gorilla exon 7a (AY142193), orangutan exon 7a (AY142196), pig exon 7a (AY142199), rat exon 7a (AY142198), mouse exon 7a (AY142200), chicken intron 7 (AY142197), goldfish intron 7 (AY142194), zebrafish intron 7 (AY142195).

    Table 4
    Primer description

    No.  Name         Orientation  Location          Sequence
    1    FW-1         Forward      Exon 7            CTC TCC CTC TGC TTT CTT TC
    2    SFP-1        Forward      Exon 7            CTG CTT TCT TTC AGG ATC AC
    3    SFP-2        Forward      Intron 7          CTG CAG ATC CCT GAG CAA G
    4    SRP-2        Reverse      UTR of exon 7a    CAG TTA CTC TGT ACC ACG TC
    5    SRP-3        Reverse      UTR of exon 7a    GAA CTG AGT CAG CAC TGA G
    6    S2F          Forward      Intron 7          GCC CTT CTG AGT GTT TTC TG
    7    S2R          Reverse      UTR of exon 7a    CTG CAG TTC CTG GGA AAA TG
    8    Pig-F        Forward      Intron 7          CTT CTC CAA TCT GCA GAT CC
    9    PR1          Reverse      Exon 7a           pA ARC ATA AAR CTT TAT TCA CT
    10   ZOO-E7       Forward      Exon 7            AGA ATC ACY RTT CCK GTR CAG A
    11   ZOO-E8       Reverse      Exon 8            ACC TCT CCA TCM CGM RTC TCM AC
    12   FISH-Ex7     Forward      Exon 7            CAG AAC TTC ACC AAC TTA CAG
    13   FISH-Ex8     Reverse      Exon 8            CGG TTC GCA CAA CTA TGC TCC
    14   CHK1         Forward      Exon 7a           CAC CAG ATT GTA AAT GGA AC
    15   CHK2         Forward      Exon 7a           CAC CAG ATT GTA AAT GGA GT
    16   Delta        Forward      Intron 7          TAT GCT AAA GGT TAG GTT GTA TTA AC
    17   Delta        Reverse      Exon 7            TTA AAA TGA ACA GCA GGG AGC ATA A
    18   R-f (int)    Forward      Intron 7          GGT CTG CAA GCC ATG AAC AA
    19   R-f (exon)   Forward      Exon 7a           GGG GCA AAG CAC CAA AGA
    20   R-Rev        Reverse      Intron 7a         CAA GCC GGG AAA AGT ACA CA

    Note. The last 2 nucleotides of primers 14 and 15 are specific for threonine and valine, respectively, at codon 426.
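Several primers in Table 4, for example ZOO-E7 and ZOO-E8 (primers 10 and 11, used for chicken), contain IUPAC ambiguity codes such as R, Y, K and M, so each is really a mixture of oligonucleotides. A small sketch, assuming the standard IUPAC nucleotide code, counts and lists the concrete sequences a degenerate primer stands for; the function names are illustrative.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences a degenerate primer stands for."""
    return prod(len(IUPAC[base]) for base in primer.replace(" ", "").upper())


def expand(primer: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.replace(" ", "").upper())
    return ["".join(p) for p in product(*choices)]


ZOO_E7 = "AGA ATC ACY RTT CCK GTR CAG A"    # primer 10, Table 4
ZOO_E8 = "ACC TCT CCA TCM CGM RTC TCM AC"   # primer 11, Table 4
print(degeneracy(ZOO_E7), degeneracy(ZOO_E8))  # 16 16
print(expand("ACY"))                            # ['ACC', 'ACT']
```

Each of the two ZOO primers carries four twofold-ambiguous positions and therefore represents 16 sequences, which is the trade-off that lets one primer pair anneal to GFAP from different species.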

    Acknowledgments

    The Danish Medical Research Council (Ældreforskning II Grant 9502112) supported this work. We thank Samir Deeb (University of Washington, Seattle, WA, USA) for the primate samples. The study was done in accordance with the guidelines of the Aarhus County Research Ethical Committee.

    References

    [1] E. Fuchs, K. Weber, Intermediate filaments: structure, dynamics, functions, and disease, Annu. Rev. Biochem. 63 (1994) 345–382.
    [2] L.F. Eng, R.S. Ghirnikar, Y.L. Lee, Glial fibrillary acidic protein: GFAP-thirty-one years (1969–2000), Neurochem. Res. 25 (2000) 1439–1451.
    [3] F. Besnard, et al., Multiple interacting sites regulate astrocyte-specific transcription of the human gene for glial fibrillary acidic protein, J. Biol. Chem. 266 (1991) 18877–18883.
    [4] R. Kaneko, N. Sueoka, Tissue-specific versus cell type-specific expression of the glial fibrillary acidic protein, Proc. Natl. Acad. Sci. USA 90 (1993) 4698–4702.
    [5] R. Kaneko, N. Hagiwara, K. Leader, N. Sueoka, Glial-specific cAMP response of the glial fibrillary acidic protein gene in the RT4 cell lines, Proc. Natl. Acad. Sci. USA 91 (1994) 4529–4533.
    [6] S.A. Reeves, L.J. Helman, A. Allison, M.A. Israel, Molecular cloning and primary structure of human glial fibrillary acidic protein, Proc. Natl. Acad. Sci. USA 86 (1989) 5178–5182.
    [7] E. Bongcam-Rudloff, et al., Human glial fibrillary acidic protein: complementary DNA cloning, chromosome localization, and messenger RNA expression in human glioma cell lines of various phenotypes, Cancer Res. 51 (1991) 1553–1560.
    [8] A. Isaacs, M. Baker, F. Wavrant-De Vrieze, M. Hutton, Determination of the gene structure of human GFAP and absence of coding region mutations associated with frontotemporal dementia with parkinsonism linked to chromosome 17, Genomics 51 (1998) 152–154.
    [9] J.M. Balcarek, N.J. Cowan, Structure of the mouse glial fibrillary acidic protein gene: implications for the evolution of the intermediate filament multigene family, Nucleic Acids Res. 13 (1985) 5527–5543.
    [10] I. Cohen, M. Schwartz, cDNA clones from fish optic nerve, Comp. Biochem. Physiol. 104B (1993) 439–447.
    [11] M. Kalman, A.D. Szekely, A. Csillag, Distribution of glial fibrillary acidic protein-immunopositive structures in the brain of the domestic chicken (Gallus domesticus), J. Comp. Neurol. 330 (1993) 221–237.
    [12] M. Kalman, M.B. Pritz, Glial fibrillary acidic protein-immunopositive structures in the brain of a crocodilian, Caiman crocodilus, and its bearing on the evolution of astroglia, J. Comp. Neurol. 431 (2001) 460–480.
    [13] R.C. Marcus, S.S. Easter, Expression of glial fibrillary acidic protein and its relation to tract formation in embryonic zebrafish (Danio rerio), J. Comp. Neurol. 359 (1995) 365–381.
    [14] M. Kalman, Astroglial architecture of the carp (Cyprinus carpio) brain as revealed by immunohistochemical staining against glial fibrillary acidic protein (GFAP), Anat. Embryol. 198 (1998) 409–433.
    [15] M. Kalman, R.M. Gould, GFAP-immunopositive structures in spiny dogfish, Squalus acanthias, and little skate, Raia erinacea, brains: differences have evolutionary implications, Anat. Embryol. 204 (2001) 59–80.
    [16] A.L. Nielsen, et al., A new spliceform of glial fibrillary acidic protein, GFAPε, interacts with the presenilin proteins, J. Biol. Chem. 277 (2002) 29983–29991.
    [17] E. Fuchs, D.W. Cleveland, A structural scaffolding of intermediate filaments in health and disease, Science 279 (1998) 514–519.
    [18] D.F. Condorelli, et al., Structural features of the rat GFAP gene and identification of a novel alternative transcript, J. Neurosci. Res. 56 (1999) 219–228.
    [19] D. Graur, W.-H. Li, Fundamentals of Molecular Evolution, 2nd ed., Sinauer, Sunderland, MA, 2000.
    [20] W. Makalowski, M.S. Boguski, Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences, Proc. Natl. Acad. Sci. USA 95 (1998) 9407–9412.
    [21] D.A. Liberles, D.R. Schreiber, S. Govindarajan, S.G. Chamberlin, S.A. Benner, The adaptive evolution database (TAED), Genome Biol. 2 (2001) 1–6.
    [22] A.D. Polydorides, H.J. Okano, Y.Y.L. Yang, G. Stefani, R.B. Darnell, A brain-enriched polypyrimidine tract-binding protein antagonizes the ability of Nova to regulate neuron-specific alternative splicing, Proc. Natl. Acad. Sci. USA 97 (2000) 6350–6355.
