encyclopedia of life sciences || non-b dna structure and mutations causing human genetic disease

15
Non-B DNA Structure and Mutations Causing Human Genetic Disease Albino Bacolla, The University of Texas M.D. Anderson Cancer Center, Smithville Texas, USA David N Cooper, Institute of Medical Genetics, School of Medicine, Cardiff University Cardiff, UK Karen M Vasquez, The University of Texas M.D. Anderson Cancer Center, Smithville Texas, USA In addition to the canonical right-handed double helix, several noncanonical deoxyribonucleic acid (DNA) sec- ondary structures have been characterised, including quadruplexes, triplexes, slipped/hairpins, Z-DNA and cruciforms. The formation of these structures is mediated by repetitive sequence motifs, such as G-rich sequences, purine/pyrimidine tracts, direct (tandem) repeats, alternating purinepyrimidines and inverted repeats, respectively. Such repeats are abundant in the human genome and are found in association with specific classes of genes, supporting a role for them in gene regulation or protein function. Repetitive sequence motifs are also commonly found at sites of chromosomal alteration, including gross rearrangements and copy number vari- ations (CNVs) associated with both disease and pheno- typic variation. Finally, variable number tandem repeats (VNTRs) or microsatellites are present in many gene regulatory regions. Characterised by an inherent capacity to expand spontaneously, such sequences are not only known to cause 430 neurological diseases but may also contribute to human disease susceptibility. The formation of alternative non-B DNA structures is believed to promote genomic alterations by impeding efficient DNA repli- cation and repair. Introduction Soon after the discovery that deoxyribonucleic acid (DNA) was an antiparallel right-handed double helix (Watson and Crick, 1953), work on synthetic single-stranded DNA molecules of defined sequence composition revealed the formation of a three-stranded structure, in addition to the expected duplex (Felsenfeld and Rich, 1957). This implied the existence of more than one type of DNA conformation. During the subsequent 50 years, the repertoire of con- formations in which synthetic DNA molecules of repeating sequence composition were found to assemble, increased steadily. To date, several unique DNA structures, quite distinct from the canonical B-form, have been character- ised, including left-handed Z-DNA, cruciforms, looped- out or slipped folds, parallel DNA, triplexes, quadruplexes and higher-order arrangements. These conformations are collectively called non-B DNA. In parallel to these biophysical investigations, DNA sequencing has revealed the existence in chromosomes of several different types of repetitive motifs known to adopt non-B forms in vitro (Christophe et al., 1985; Lyamichev et al., 1985; Wells et al., 1988), spurring speculation as to their biological function. Pioneering work in bacteria (Glickman and Ripley, 1984) suggested a role for non-B DNA-forming sequences in mediating chromosomal dele- tions in vivo. However, it was not until comparatively recently (Bacolla et al., 2004; Kurahashi and Emanuel, 2001; Repping et al., 2002; Wang and Vasquez, 2004; Wells and Ashizawa, 2006) that the concept of DNA secondary structure as a promoter of genomic rearrangements acquired broad support. Critical to these developments was the discovery of microsatellite repeat diseases (MRDs), a novel class of neurological disorders caused by the expan- sion of triplet (and other) microsatellite repeats (Brouwer et al., 2009; Dion and Wilson, 2009; Lee and Cooper, 2009; Lopez Castel et al., 2010; Wells and Ashizawa, 2006). The completion of the human and other mammalian genome-sequencing projects (Lander et al., 2001) has made it possible to explore the frequencies and locations of Advanced article Article Contents . Introduction . Non-B DNA Structures . Non-B DNA-forming Repeats and Human Genes . DNA Structure, Phenotypic Variation and Genetic Disease . Mechanisms Underlying Non-B DNA Induced Mutations . Prospects for the Future . Acknowledgements Online posting date: 18 th October 2010 ELS subject area: Genetics and Disease How to cite: Bacolla, Albino; Cooper, David N; and Vasquez, Karen M (October 2010) Non-B DNA Structure and Mutations Causing Human Genetic Disease. In: Encyclopedia of Life Sciences (ELS). John Wiley & Sons, Ltd: Chichester. DOI: 10.1002/9780470015902.a0022657 ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 1

Upload: albino

Post on 09-Dec-2016

214 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

Non-B DNA Structure andMutations Causing HumanGenetic DiseaseAlbino Bacolla, The University of Texas M.D. Anderson Cancer Center, Smithville

Texas, USA

David N Cooper, Institute of Medical Genetics, School of Medicine, Cardiff University

Cardiff, UK

Karen M Vasquez, The University of Texas M.D. Anderson Cancer Center, Smithville

Texas, USA

In addition to the canonical right-handed double helix,

several noncanonical deoxyribonucleic acid (DNA) sec-

ondary structures have been characterised, including

quadruplexes, triplexes, slipped/hairpins, Z-DNA and

cruciforms. The formation of these structures is mediated

by repetitive sequence motifs, such as G-rich sequences,

purine/pyrimidine tracts, direct (tandem) repeats,

alternating purine–pyrimidines and inverted repeats,

respectively. Such repeats are abundant in the human

genome and are found in association with specific classes

of genes, supporting a role for them in gene regulation or

protein function. Repetitive sequence motifs are also

commonly found at sites of chromosomal alteration,

including gross rearrangements and copy number vari-

ations (CNVs) associated with both disease and pheno-

typic variation. Finally, variable number tandem repeats

(VNTRs) or microsatellites are present in many gene

regulatory regions. Characterised by an inherent capacity

to expand spontaneously, such sequences are not only

known to cause 430 neurological diseases but may also

contributeto humandisease susceptibility. Theformation

ofalternativenon-BDNAstructures isbelievedtopromote

genomic alterations by impeding efficient DNA repli-

cation and repair.

IntroductionSoon after the discovery that deoxyribonucleic acid (DNA)was an antiparallel right-handed double helix (Watson andCrick, 1953), work on synthetic single-stranded DNAmolecules of defined sequence composition revealed theformation of a three-stranded structure, in addition to theexpected duplex (Felsenfeld and Rich, 1957). This impliedthe existence ofmore than one type ofDNA conformation.During the subsequent 50 years, the repertoire of con-formations inwhich syntheticDNAmolecules of repeatingsequence composition were found to assemble, increasedsteadily. To date, several unique DNA structures, quitedistinct from the canonical B-form, have been character-ised, including left-handed Z-DNA, cruciforms, looped-out or slipped folds, parallel DNA, triplexes, quadruplexesand higher-order arrangements. These conformations arecollectively called non-B DNA.In parallel to these biophysical investigations, DNA

sequencing has revealed the existence in chromosomes ofseveral different types of repetitive motifs known to adoptnon-B forms in vitro (Christophe et al., 1985; Lyamichevet al., 1985; Wells et al., 1988), spurring speculation as totheir biological function. Pioneering work in bacteria(Glickman and Ripley, 1984) suggested a role for non-BDNA-forming sequences in mediating chromosomal dele-tions in vivo. However, it was not until comparativelyrecently (Bacolla et al., 2004; Kurahashi and Emanuel,2001; Repping et al., 2002;Wang and Vasquez, 2004;Wellsand Ashizawa, 2006) that the concept of DNA secondarystructure as a promoter of genomic rearrangementsacquired broad support. Critical to these developmentswasthe discovery of microsatellite repeat diseases (MRDs), anovel class of neurological disorders caused by the expan-sion of triplet (and other) microsatellite repeats (Brouweret al., 2009; Dion andWilson, 2009; Lee and Cooper, 2009;Lopez Castel et al., 2010; Wells and Ashizawa, 2006).The completion of the human and other mammalian

genome-sequencing projects (Lander et al., 2001) hasmadeit possible to explore the frequencies and locations of

Advanced article

Article Contents

. Introduction

. Non-B DNA Structures

. Non-B DNA-forming Repeats and Human Genes

. DNA Structure, Phenotypic Variation and Genetic

Disease

. Mechanisms Underlying Non-B DNA Induced Mutations

. Prospects for the Future

. Acknowledgements

Online posting date: 18th October 2010

ELS subject area: Genetics and Disease

How to cite:Bacolla, Albino; Cooper, David N; and Vasquez, Karen M (October

2010) Non-B DNA Structure and Mutations Causing Human Genetic

Disease. In: Encyclopedia of Life Sciences (ELS). John Wiley & Sons, Ltd:Chichester.

DOI: 10.1002/9780470015902.a0022657

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 1

Page 2: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

chromosomal sites containing non-B DNA-formingsequences, as well as their evolutionary conservation inorthologous genomes. From a number of studies (reviewedin Zhao et al., 2010) it can be concluded that such sitesoccurmuchmore frequently than expected by chance aloneand that different non-BDNA-formingmotifs are found inassociation with different genomic regions. Hence, a novelview of genome structure and function has emerged inwhich repetitive DNA, once regarded as ‘junk DNA’,participates in the regulation of genes, telomere mainten-ance, protein function and genome stability.

The human genome sequencing project (HGSP) has alsoled to the realisation thatmany individual genomes exist (incontrast to the early concept of a nearly identical standardgenome common to all human beings) in which a largenumber of gross variations involving duplicated anddeleted regions (copy number variations or CNVs) com-prising large (41 kb in length, on average) segments ofDNA distinguish one individual from another. The adventof array comparative genomic hybridisation (aCGH)techniques over the past few years has made it possible toscreen the human genome for CNVs in genes/gene regionsthat either lead/predispose to disease or give rise tophenotypic variation in healthy individuals and popu-lations. The picture emerging is that some forms of non-BDNA, along with other sequence signatures, have a role inpromoting CNV formation. Indeed, these composite datasuggest that cells may have exploited the dynamicallypolymorphic nature of DNA to create functional genomicdiversity; at the same time, the instability of certainsequences can give rise to deleterious changes that lead, orpredispose, to genetic instability and disease.

Herein, we review the main forms of non-B DNAstructure, outline the distribution of the underlyingrepetitive motifs in the human genome, and present recentwork on the relationship between repetitiveDNAsequenceand human genetic disease as well as phenotypic variationthat may predispose to disease.

Non-B DNA Structures

To date, at least five types of non-B DNA structures havebeen associated with human genetic disease: cruciform,triplex, slipped/hairpin DNA, quadruplex and Z-DNA(Figure 1).MostDNA sequences in the human genome existin the B-form. However, repetitive sequences may alsoadopt alterative conformations as a consequence of mul-tiple hydrogen bonding interactions between bases or,as in the case of Z-DNA, rotational freedom about theN-glycosidic bond of G residues. Triplex and quadruplexDNA also rely on additional hydrogen bonding donor andacceptor groups existing in purines (A and G).

Cruciform DNA

Inverted repeats (IR), defined as a series of nucleotidesfollowed on the same strand by their complementary bases

and usually separated by a spacer, are required for cruci-form formation (Figure 1a). The term ‘complementary’ hererefers to the fact that A always pairs with T whereas Galways pairs with C in B-form DNA (so-called ‘Watson–Crick’ base-pairing).The IR symmetry makes it possible for the bases along

the same strand of DNA to pair with each other, ratherthan with the complementary strand, thereby giving riseto a cross-shaped structure (Figure 1a). In solution, twointerconvertible conformations have been observed: anextended conformation, in which each cruciform armoccupies the vertex of a simple tetrahedron, and a closedconformation, in which the two pairs of arms lie almostparallel to each other. The closed conformation is favouredat physiological salt concentrations because of the shield-ing of negatively charged phosphates. Thus, it is considereda good approximation of cruciform (and Holliday junc-tion) structures in vivo.

Triplex DNA (H-DNA)

Triplex DNA is a three-stranded structure in which a third(DNA or ribonucleic acid (RNA)) strand binds to B-DNAby occupying its wide major groove. Watson–Crickhydrogen bonds are not altered in H-DNA; rather, purinebases along one strand of duplex DNA engage theincoming third strand throughHoogsteen-hydrogenbondsusing available exocyclic groups not involved in Watson–Crick interactions. This third strand may originate from anearby sequence on the same molecule (intramoleculartriplex) or from separatemolecules (intermolecular triplex),comprising either RNA, DNA or exogenously added tri-plex-forming oligonucleotides (TFOs). Hence, triplexDNArequires a succession of purines on the same strandofDNA. Four types of Hoogsteen-hydrogen bonds typicallyoccur: A:T, A:A, G:C+ (C+, protonated base) and G:G(Figure 1b and c), which yield two types of triplexes: a YRYtype, inwhich the third strand is pyrimidine-rich (Figure 1c),and an RRY type, in which the third strand is purine-rich(Figure1b). In intramolecular triplexes, amirror repeat (MR)symmetry is required in duplex DNA, whereby bases fromone repeat separate from the complementary strand andfold back onto the second repeat to engage in Hoogsteenpairing. This reaction requires energy, which can be pro-vided by negative supercoiling. If it is the pyrimidine-richstrand that engages in Hoogsteen base-pairing (such asoccurs at low pH), the third strand runs parallel to thepurine-rich accepting strand; however, if it is the purine-rich strand that engages in Hoogsteen pairing (at neutralpH), the third strand runs antiparallel to the purine-richaccepting partner, as shown in Figure 1a.

Slipped/hairpin DNA

The reiteration of bases, such as CTGCTGCTG, or directrepeats (DR), provides the basis for the bulging out ofsmall loops in duplex DNA when, after separation ofthe complementary strands, the repeat array reanneals

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net2

Page 3: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

out-of-register. Thus, slipped DNA occurs during DNAreplication and transcription, two processes that neces-sarily entail strand separation. An alternative source ofsmall loops in DNA (particularly at mononucleotide runs,such as AAA) originates from stuttering or skippingduring DNA replication. Slippage during DNA repli-cation is believed to lead to large expansions of triplet

repeat sequences, and hence represents an impor-tant mechanism of mutation associated with MRDs.In such cases, long looped-out sequences may fold intodouble helices (hairpins) stabilised by mismatches, suchas T.T base pairs (Zheng et al., 1996), in addition tothe Watson–Crick A.T and G.C base pairs along thehairpins.

anti

G

G

G

GGG

GG G

GG G

( YR YR )n

AAGAGGxGGAGAAT TCTCC C CTC T Ty

(R Y)n mirror

repeatsTriplex

Left-handedZ - DNA

Cruciform Invertedrepeats

TCGGTAGCCA

C AGA CG TCT G

xy

QuadruplexOligo (G)n

tractsAG3(T2AG3)3

Single strand

Slipped(hairpin)structure

Directrepeats

B-Z junctions

Name Conformation General seq.requirements

Sequence

C GTCGCGTG G TGG CAGCGCAC C AC

C TGTCGGTT GG ACAGCCAA C

(a)

(b)

(c) (d)

L

D

CR

syn

Figure 1 Non-B DNA structures formed by genomic repetitive sequences. (a) Most common non-B DNA conformations, ribbon models of helical foldings,

repetitive motifs requirement and example of sequences. Center dot, Watson–Crick hydrogen bond interactions; x,y, nucleotides in the spacer between

repeats; L, lateral loop; D, diagonal loop and CR, chain reversal loop. For cruciform DNA, an extended conformation is shown. For triplex DNA, a 3’ RRY

isomer is depicted in which the 3’-half of the purine-rich strand folds back to form the Hoogsteen-bound third strand. For quadruplex DNA, an idealised

structure is drawn to highlight the loop characteristics and the relative orientation of the syn and anti N-glycosidic configurations. (b) RRY base triplets

showing the Hoogsteen-bound base (left). Thymine can be incorporated into RRY triplexes due to the symmetry of the carbonyl groups. (c) YRY base triplets

showing the Hoogsteen-bound pyrimidine and the stabilisation afforded by cytosine protonation. (d) G-tetrad.

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 3

Page 4: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

Quadruplex (tetraplex, G4) DNA

G4 DNA has received considerable attention duringthe past few years, in part because human telomeres,whose function is often dysregulated in cancer, are com-posed exclusively of hexameric (TTAGGG)n repeatsequences capable of forming this type of non-B DNAstructure. For quadruplexes to form, the sequence patternrequired comprises four closely spaced sets of 2–4Gs, suchas the 3-setGGG(n1)GGG(n2)GGG(n3)GGG,where n1, n2and n3 represent 1–7bases of anykind (loop). Thebasic unitin G4 DNA, a G-tetrad or G-quartet, comprises a planararray of four guanines (one from each of the four sets)connected to eachother throughHoogsteen-type hydrogenbonds (Figure 1d). A core of at least two (three in theexample shown in Figure 1a) G-quartets stack on top ofone another, stabilised by cation coordination (potassiumion is most effective) between any two stacks while theloops provide strand connectivity along the edges ofthe stacked G-quartets. A high degree of structural poly-morphism has been revealed, in which loops connect ina lateral, diagonal or chain reversal fashion, strands runparallel or antiparallel to one another, and guanine re-sidues may adopt either the syn or the anti N-glycosidicconformation (Neidle, 2009; Figure 1a). Thus, compet-ing conformations usually form in solution. The mostcommon isomer, as evidenced by nuclear magneticresonance (NMR) and X-ray crystallography, is repre-sented by the (3+1)mixed topology, characterised by onechain reversal and two lateral loops and by a three syn plusone anti-arrangement of guanines per G-tetrad (Neidle,2009).

Z-DNA

Z-DNA is unusual among the non-B DNA structures inthat strandedness is reversed by the rotation from the anti(in right-handed B-DNA) to the syn (in left-handed Z-DNA) conformation of every G residue within tracts ofalternating GY (Y, pyrimidine) sequences, such as (GC)nor (GC)m(GT)n. In sharp contradistinction to B-DNA,which possesses both amajor and aminor groove, Z-DNAis characterised by a single deep and narrow groove and anoverall tube-like shape (Gessner et al., 1989). X-ray crys-tallography has also shown that the base pairs located atthe junctions between the B- andZ-sections (B-Z junctions,Figure 1a) have the tendency to flip out of the double-helix(Ha et al., 2005), thereby providing a substrate for basemodification or cleavage (Burrows and Muller, 1998).See also: DNA Structure

Non-B DNA-forming Repeats andHuman Genes

Completion of the HGSP (Lander et al., 2001) has madepossible the search for repetitive non-B DNA sequences

with the goal not only of determining their genome-wide distribution but also of elucidating their putativefunctional role(s). The first such study (Warburton et al.,2004) reported that large IRs (4100 kb) are present mostlyon the sex chromosomes, with male-specific genes andgene families essential for male fertility located at sym-metrical positions along the IR arms (Table 1; Skaletskyet al., 2003). The maintenance of gene function, particu-larly for the Y-chromosome that lacks a homologuefor recombination, is believed to depend on the form-ation of large cruciform structures by IR sequences,which potentiate the correction of mutations and dou-ble-strand breaks by intrachromosomal recombinat-ion (Lange et al., 2009; Rozen et al., 2003; Zhao et al.,2010).Quadruplex-forming motifs are strongly over-repre-

sentednear (+ 500 bp) transcription start sites and at the 3’end of genes (1/4 to 1/3 of � 20 000 human genes) associ-atedwith regulatory functions, including proto-oncogenes,such as MYC, KRAS and KIT (Du et al., 2008; Huppertet al., 2008; Table 1). Genome-wide gene expression ana-lyses (Du et al., 2008; Fernando et al., 2009) support theconclusion that these sequences play an active role intranscriptional regulation through the formation ofquadruplex structures (Figure 2a).The first hint that long (41 kb) runs of homo-

purines.homopyrimidines might exist in the human gen-ome came from the sequencing of the polycystic kidneydisease 1 (PKD1) gene (Van Raay et al., 1996). A total of228 genes were subsequently found to contain such tracts(� 250 bases in length) in introns (Bacolla et al., 2006),and these genes disproportionately encoded proteins witha function in cell communication and synaptic transmis-sion of the nerve impulse (Table 1). These gene classes arealso enriched in GGAA, GAAA and GGGA tetranucleo-tide repeats, which have the capacity to form stabletriplexes and are characterised by unusually strongbase stacking. Although these types of genes are gene-rally weakly transcribed (Zhao et al., 2010; Figure 2b), ahigh proportion of them is preferentially expressed in thebrain. Thus, homopurines.homopyrimidines may playspecific roles in brain-associated gene function andregulation.The distribution of short tandem repeats (slipped/

hairpinDNA) in protein-coding sequences is dominatedbytriplet repeats encoding homopolymeric runs of spe-cific amino acids, such as polyglutamine, in transcri-ption factors and gene-regulatory proteins that bind DNAand RNA (Table 1; Bacolla et al., 2008). Homopolymericruns of amino acids are known to play critical roles inprotein–protein and protein–DNA/RNA interactions(Liu et al., 2006) and their number at specific loci appears toincrease as one progresses from the genomes of simplerspecies to the more complex. Thus, DNA slippage andhairpin/loop formation may have been exploited overevolutionary time as a means to acquire, or fine-tune,protein function. See also: Genetic Variation: Polymorph-isms and Mutations

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net4

Page 5: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

Table 1 Non-BDNA-forming repeats and human genes.Most relevant repetitive DNA sequences associated with human genes

or gene classes

Gene/gene families in the largest (4100 kb) inverted repeats (IR)

Chromosome IR arm size (kb) Gene/gene class

Tissues with predominant

expression

Y palindrome P1 1450 DAZ Testes

Y palindrome P5 495.5 CDY Testes

Y palindrome P3 283.0 PRY Testes

Y palindrome P4 190.2 HSFY Testes

Xp11.22 142.2 GAGE-D2,3 Testes

Xq22.1 140.6 NXF2 Testes

Y palindrome P2 122.0 DAZ Testes

Xq13.1 119.3 DMRTC1 Testes, kidney, pancreas

11q14.3 103.9 RNF18 Testes, kidney, spleen

Purine:pyrimidine tracts in introns of genes

Gene category/function P-value

� 250 nt (228 genes) � 100 nt (1951 genes)

Ion channel activity 1.95� 10205 5.92� 10209

Protein binding 3.14� 10203 6.25� 10215

Glutamate receptor activity 6.11� 10204 1.92� 10207

Cell adhesion 1.11� 10204 3.36� 10212

Cell communication 2.19� 10204 5.24� 10215

Transmission of nerve impulse 1.83� 10204 5.24� 10208

Synapse 2.18� 10202 7.69� 10205

Alternative splicing ND 2� 10282

Chromosomal translocations ND 1� 10207

Tetranucleotide repeats (TR) in introns of genes

Gene category/function/attribute P-value

Localisation to the membrane 1� 10207–5� 10230 (Range for 10 gene groups

containing: groups 1–8, 8–15 TR

units; group 9, 16 and 17 TR units;

group 10, � 18 TR units; 190–1423

genes/group)

Ion channel 5� 10202–1� 10213

Cell adhesion 8� 10204–2� 10237

Alternative splicing 1� 10264 � 8TR units (4182 genes)

Chromosomal translocations 2� 10207 � 8TR units (4182 genes)

Micro/minisatellites (2–11 nt repeats) in cDNAs

Gene category/function P-value (coding plus noncoding exons) (2626 genes)

Transcription regulator activity 2.0� 10240

Regulation of cellular processes 2.3� 10238

Protein binding 2.0� 10233

Sequence-specific DNA binding 3.8� 10223

Nuclear localisation 9.3� 10222

RNA polymerase II transcription factor activity 1.2� 10216

Axon guidance 2.3� 10205

MAPK signalling pathway 2.1� 10204

WNT signalling pathway 2.4� 10204

(Continued )

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 5

Page 6: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

DNA Structure, Phenotypic Variationand Genetic Disease

A number of studies have implicated the formation ofnon-B DNA conformations as a source of genomicrearrangements causing human genetic disease, includingFabry disease (Kornreich et al., 1990), mental retardation(Bonaglia et al., 2009; Rooms et al., 2007), ornithine tran-scarbamylase deficiency (Quental et al., 2009), uniparentaldisomy 14(mat) (Bena et al., 2010) and spermatogenicfailure (Repping et al., 2002) among others (reviewed inBacolla and Wells, 2004). As an example, Emanuel syn-drome (MIM #609029), characterised by severe mentalretardation, facial abnormalities and heart and kidneydefects, is a rare disease caused by the inheritance of asupernumerary der(22) chromosome from a parent carry-ing a constitutional translocation between chromosomes11 and 22 (t(11;22)(q23;q11)) (Figure 3). Cloning of thegenomic regions involved in the translocation revealed thatthe breakpoints typically occurred within narrow regionsonboth chr11 and chr22, at the centre of large (� 450bponchr11 and � 590 bp on chr22) IR structures comprisingalmost exclusively A and T bases. These recurring breaks,at the centre of specific IRs termed PATRR11 andPATRR22 (palindromic AT-rich regions) respectively, areconsistent with the formation of large cruciform structureson both chromosomes (Figure 3). This conclusion is sup-ported by a number of observations (Kurahashi et al.,2010). First, the PATRR22 sequence was shown to be bothpolymorphic and intrinsically unstable in the generalpopulation, such that deletions and duplications reducingor disrupting IR symmetry were commonly observed.Analyses of t(11;22) frequencies in sperm cells fromhealthyindividuals yielded an estimate of � 1.5� 1025 for the full-length IR chromosomes, but an � 10-fold reduction forthose chromosomes in which IR symmetry was disrupted,implying that cruciform structures were responsible forpromoting the translocation event. Second, fluorescence insitu hybridisation indicated that the 22q11 cluster wasinvolved in additional translocations, including 17q11,4q35.1, 1p21.2 and 8q24.1. In all cases, repetitive sequenceswith IR symmetry were detected on the partner

chromosome, with translocation frequencies decreasingwith decreasing length of the IR sequences. Thus, size (andhence stability) of cruciform structures correlates with thelikelihood of chromosomal breaks. Third, experiments inwhich two plasmids, one containing the PATRR11sequence and the other containing the PATRR22sequence, were transfected in human cells in culture dis-played recombination between PATRR11 and PATRR22.Taken together, these data support the conclusion thatlarge cruciforms are extruded from the PATRR11 andPATRR22 sequences, which are then cleaved at the centre,generating double-stranded broken ends that are sealed,yielding the der(22) and der(11) chromosomes (Figure 3).The number of identified human disorders associated

with gains (duplications) and losses (deletions) of geneticmaterial, types of mutation once believed to occur onlyrarely (Perry et al., 2008), has increased considerably overthe past few years (Zhang et al., 2010). The recent appli-cation of aCGH (Conrad et al., 2010; Perry et al., 2008) topatients afflicted with either single-gene disorders, such asthe CFTR gene associated with cystic fibrosis (Quemeneret al., 2010), or multigene disorders (Vissers et al., 2009;Zhang et al., 2010), indicates thatCNVscanalsobe involvedin these conditions. Additional genome-wide studies sup-port the conclusion that CNVs affect thousands of loci,contributing in all likelihood to the myriad phenotypic dif-ferences observed between individuals (Conrad et al., 2010;Perry et al., 2008). Bioinformatic analyses indicate that thedensity of repetitive DNA sequence motifs capable ofadopting non-BDNA conformations is significantly higheratCNVbreakpoints than thegenomeaverage, implying thatthe formation of local DNA secondary structure may rep-resent a common mechanism for CNV-mediated deletions,duplications and inversions (Conrad et al., 2010; Perry et al.,2008; Vissers et al., 2009).Gene conversion is a form of recombination that uses

homologous DNA sequences to repair a double-strandbreak (DSB) and involves the nonreciprocal transfer ofDNA from a ‘donor’ (the intact homologous sequence) toan ‘acceptor’ (the broken DNA end) sequence. Whensuch transfer inactivates functional genes, pathologicalconsequences can ensue. A comprehensive meta-analysis

Table 1 Continued

G-quadruplex in both 5’- and 3’-UTR

Gene category/function P-value

5’-UTR 3’-UTR

Guanyl-nt exchange factor activity 7.9� 10213 6.3� 10212

Rho guanyl-nt exchange factor activity 7.9� 10210 1.6� 10209

Regulation of Rho signal transduction 2.0� 10210 1.6� 10209

Transcription factor activity 6.3� 10205 3.2� 10210

Sequence-specific DNA binding 2.5� 10206 3.2� 10208

Note: ND, not determined.Source: With kind permission from Springer Science+Business Media (Zhao et al., 2010).

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net6

Page 7: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

Ln gene expression levels

2 4 6 8 10 12

0

2

4

6

8

10

12

14

16

18

Control genesGenes with ≥18 Tetra units

(a)

(b)

9.3

PG4MD500-negative genes

PG4MD500-positive genes

9.29.1

98.98.8

8.7

8.6

8.58.4

8.38.2

8.1

87.9

7.87.7

7.6

Gen

e ex

pre

ssio

n le

vel

Ova

ryC

iliar

y ga

nglio

nA

trio

vent

ricul

ar n

ode

Skin

DRG

Glo

bus

pal

lidus

Cer

ebel

lum

Who

le b

rain

Panc

reas

Subt

hala

mic

nuc

leus

Trig

emin

al g

angl

ion

Sup

erio

r ce

rvic

al g

angl

ion

Test

is in

ters

titia

lU

teru

sKi

dney

Occ

ipita

l lob

eTr

ache

aA

dren

al g

land

Ap

pen

dix

Col

orec

tal a

deno

carc

inom

aSa

livar

y gl

and

Med

ulla

obl

onga

taFe

tal l

ung

Thym

usTe

stis

sem

inife

rous

tub

ule

Plac

enta

Pons

Tem

por

al lo

beC

ereb

ellu

m p

edun

cles

Test

is le

ydig

cel

lFe

tal l

iver

Cin

gula

te C

orte

xTo

ngue

Test

is g

erm

cel

lBM

-CD

71+

early

ery

thro

idLy

mp

hnod

eO

lfact

ory

bulb

Adr

enal

cor

tex

Am

ygda

laTo

nsil

Parie

tal l

obe

Leuk

emia

lym

pho

blas

tic (

mol

t4)

Cau

date

nuc

leus

Test

isA

dip

ocyt

eSk

elet

al m

uscl

eLe

ukem

ia c

hron

ic m

yelo

geno

usBo

ne m

arro

wLu

ngH

eart

Ute

rus

corp

usFe

tal b

rain

Leuk

emia

pro

mye

locy

tic(h

l60)

Smoo

th m

uscl

eFe

tal t

hyro

idW

hole

blo

odH

ypot

hala

mus

Live

rPr

osta

tePr

efro

ntal

cor

tex

Thal

amus

Pitu

itary

PB-C

D8+

T c

ells

Panc

reat

ic Is

lets

Bron

chia

l ep

ithel

ial c

ells

PB-B

DC

A4+

Den

triti

c _C

ells

PB-C

D14

+ M

onoc

ytes

PB-C

D4+

135

T c

ells

Spin

al c

ord

PB-C

D19

+ B

cells

BM-C

D10

5+ E

ndot

helia

lTh

yroi

dBM

-CD

33+

mye

loid

PB-C

D56

+NKC

ells

Car

diac

myo

cyte

s72

1-B-

lym

pho

blas

tsBu

rkitt

’s ly

mp

hom

a (R

aji)

Burk

itt’s

lym

pho

ma

(Dau

di)

BM-C

D34

+

Perc

ent

of g

ene

exp

ress

ion

data

Figure 2 Non-B DNA-forming repeats and genome-wide gene expression. (a) The gene expression profile of 13 237 nonredundant annotated human

(RefSeq) genes (y-axis) was examined in 79 tissues/cancer/cell types (x-axis) and the average values plotted for the 8124 genes that contained quadruplex-

forming repeats (filled squares) within +500 base pairs of the main transcription start site (TSS) and the remaining genes that did not contain such elements

within +500 base pairs of the main TSS (open squares). (b) The gene expression levels for 16 146 gene probes in the 70 human tissues/cell types included in

(a) was plotted as percent of data (y-axis) falling within specific intervals of gene expression (x-axis) for control genes (black) and for 190 genes (red) that

contained triplex-forming tetranucleotide repeats �72 base pairs long. With kind permission from Springer Science+Business Media (Zhao et al., 2010).

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 7

Page 8: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

(Chuzhanova et al., 2009) of the DNA sequences flankingthe regions of DNA exchange in 27 known disease-associ-ated gene conversion events indicated that non-B DNA-forming sequences, particularly IRs, and recombination-promoting motifs occurred more frequently than expectedby chancealone.Hence, at least somegene conversion eventsmay be initiated by DSBs at sites of non-B DNA structure.

MRDs represent a growing class of pathological con-ditions, currently numbering 430, caused by the expan-sion of tandem repeats,mostly trinucleotide repeats,withinhuman genes (Table 2; Brouwer et al., 2009; Lopez Castelet al., 2010). In most cases, the formation of DNA sec-ondary structures, including slipped/hairpin DNA, quad-ruplex and triplex structures is believed to drive theexpansion process (Lopez Castel et al., 2010;Mirkin, 2007;Wells and Ashizawa, 2006). MRDs may be broadly clas-sified into two types: type 1, in which expansions are large(usually hundreds of copies of the repeat) and occur innonprotein-coding regions (untranslated and intronic) andtype 2, in which expansions are less severe but occur inprotein-coding regions thereby altering protein sequence.See also: Chromosomal Genetic Disease: StructuralAberrations; Segmental Duplications and Genetic Disease

Type 1 MRDs

In fragile X syndrome (FXS), a CGG repeat is expanded inthe 5’ untranslated (UTR) region of the FMR1 gene. InFriedreich ataxia (FA), expansions of a GAA repeat occurin the first intron of the frataxin (FRDA) gene, whereasmyotonic dystrophy (DM) is caused by the expansion ofeither a CTG repeat in the 3’-UTR of the DMPK gene(myotonic dystrophy type 1, DM1) or a CCTG repeat inthe first intron of theZNF9 gene (myotonic dystrophy type2, DM2).

In FXS and FA patients, repeat expansion is associatedwith histone markers specific for heterochromatin (i.e.nontranscribed DNA), including deacetylation of histonesH3 and H4 and methylation of histone H3 at lysine 9(H3K9me), supporting the notion that loss-of-functiondue to gene silencing is the likelymechanism responsible forthese diseases. In the case of DM,most of the pathogenesishas been recapitulated in mouse models by experimentsthat showRNAgain-of-function and the altered activity oftwo proteins, MBLN1 (muscleblind-like 1) and CUGBP1(CUG-binding protein 1) (Figure 4).MBNL1 is a key regulator of alternative splicing that

binds to double-stranded RNA hairpins, including thoseformed by CUG repeats (Lee and Cooper, 2009). In cellsfrom DM1 patients, most MBNL1 is found in discretenuclear foci where it is sequestered by expanded CUG (orCCUG) hairpins. Several transcripts are aberrantly splicedin DM1 mouse models, including Clcn1 (chloride channel1), Insr (insulin receptor) and Tnnt2 (cardiac troponin T).Thus, loss-of-function of MBLN1 mediated by long RNAhairpins is thought to contribute to DM pathology (Figure

4a). The second protein affected is CUGBP1, which bindssingle-stranded CUG repeats and becomes hyperpho-sphorylated by PKC (protein kinase C) on CUG/CCUGrepeat expansion through an unknown mechanism. Acti-vation of CUGBP1 exacerbates the splicing defects ofMBNL1 deficiency. It also increases the translation ofembryonic transcripts, such as MEF2A, and prevents thedecay of short-lived mRNAs, including c-FOS and TNFa(tumour necrosis factor alpha). The antagonistic activitiesof MBNL1 and CUGBP1 are critical for the shifts inalternative splicing that accompany the transition from theembryonic to the adult developmental stages; DM maytherefore entail a reversal of this developmental pattern(Figure 4b). RNA gain-of-function is also thought to

chr22

chr11 der(11)

der(22)

Translocation

PATR

R22

PATR

R11

Figure 3 Cruciform-mediated chromosomal t(11;22) translocation and the Emanuel syndrome. The PATRR sequences on human chromosome 11 (green)

and 22 (black) are proposed to fold into large cruciform structures at some frequency during gametogenesis and be cleaved at the single-stranded tips,

resulting in double-strand breaks (left insets). The broken chromosomal ends (middle) join aberrantly, yielding the derivative chromosomes der(11) and

der(22) (right). Occasional inheritance of der(22), in addition to a normal karyotype, is responsible for the Emanuel syndrome in the offspring.

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net8

Page 9: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

underlie the pathology of fragile X-associated tremor/ataxia syndrome (FXTAS), affecting older patients carry-ing moderate CGG expansions in the frataxin gene, as wellas SCA8, SCA10 and SCA12 (Brouwer et al., 2009).

Type 2 MRDs

In type 2 diseases, triplet repeat expansions occur exclu-sively in coding exons andhence they alter protein sequenceby increasing the length of homopolymeric amino acidruns, typically polyglutamine and polyalanine (Table 2;Brouwer et al., 2009; Lopez Castel et al., 2010). Homo-polymeric amino acid runs contribute to ‘protein disorder’,a structural property that often plays a critical role inmediating protein–protein and protein–DNA inter-actions, particularly in protein families comprising tran-scription factors and regulators of transcription and DNAreplication (Uversky et al., 2009). Expansion beyond acritical threshold destabilises protein structure, causingaggregation or loss of the binding affinities with partnermolecules (Messaed and Rouleau, 2009).

The number of tandem repeats at a given locus is oftenpolymorphic in the general population (variable number oftandem repeats or VNTR), a process thatmay be driven byslipped hairpins or aberrant DNA replication. The pres-ence of a VNTR within an intron or the regulatory regionof a gene is often associated with a change in its tran-scription, thereby yielding functionally different alleles thatmay contribute to variable phenotypic traits (e.g. bloodpressure, heart rate and muscular tension) and suscepti-bility to multigenic diseases, such as brain disorders,obesity and stroke (Bacolla et al., 2008; Hannan, 2010).Thus, slipped DNA may also contribute to phenotypicvariability and susceptibility to disease.

In summary, the formation of aberrant DNA secondarystructure is the root cause of a number of quite differenthuman inherited diseases, where mutagenesis is elicited invarious ways: DNA breakage and stimulation of recom-bination leading to genomic rearrangements, altered RNAsecondary structure, mRNA codon changes resulting inimpaired protein function and altered transcriptionaffecting protein levels. See also: Trinucleotide RepeatExpansions: Disorders

Mechanisms Underlying Non-B DNAInduced Mutations

A large number of studies have been conducted in modelorganisms, including bacteria, fly, yeast, mammalian cellculture andmice to address themechanisms responsible fornon-BDNA-induced genetic instability, particularly in thecontext of repeat expansion related to MRDs. Knockingdown the expression of proteins involved in DNA repli-cation, repair and recombination affected repeat stabilityby either increasing or decreasing instability (depending onthe specific enzyme), suggesting that these biochemical

processes are all potentially involved (Lopez Castel et al.,2010; Mirkin, 2007; Wells and Ashizawa, 2006). Despitethese important advances, the mechanisms restricting thetiming of repeat instability and genomic rearrangements toearly development remain elusive. In transgenic mice, thepresence ofMsh2 andMsh3, two proteins of the mismatchrepair pathway, was found to be critical for elicitingmicrosatellite repeat expansion during all stages of devel-opment (Figure 5; Dion andWilson, 2009). A role restrictedto adult somatic tissues has been noted for ataxia talan-giectasia and Rad3 related (Atr), post-meiotic segregationincreased 2 (Pms2) and the Ogg1 glycosylase. Atr, Pms2and Ogg1 are involved in DNA damage responses andDNA repair. Thus, both DNA damage and DNA repairmay play a critical role in eliciting non-B DNA-inducedgenetic instability, at least in the context of MRDs (Dionand Wilson, 2009).Concerning genomic rearrangements, nonhomologous

end-joining (NHEJ), a DSB repair pathway that can beerror-prone is believed to act on DSBs induced at sites ofnon-B DNA structures (Kha et al., 2010). These DSBs arelikely to result from the exposure, on non-B DNA for-mation, of single-stranded nucleotides to oxidative damageby endogenous reactive oxygen species or the activity ofsingle-stranded-specific endonucleases. See also: Trinu-cleotide Repeat Expansions: Mechanisms and DiseaseAssociations

Prospects for the Future

The relevance of unusual or aberrant DNA structures tohuman genetic disease has extended our interest in this areafrom the narrow confines of the nucleic acid biochemistry/physicochemical laboratory to a much wider communityoperating in medicine, clinical diagnostics, genetics andbioinformatics, thereby propelling the field into main-stream biomedical research. Paradoxically, the transientnature of DNA secondary structures remains a challengefor the development ofmethods aimed at identifying non-BDNA structures in living human cells.The advent of aCGH is likely to reveal more instances of

the co-localisation of repetitive DNA with CNVs, both inthe context of phenotypic variation, as well as in the sphereof susceptibility to infectious disease, cancer and inheriteddisease. As more information is acquired from mousemodels on the timing of repeat instability during develop-ment and its relationships with chromatin remodelling,CpG methylation-linked epigenetic reprogramming, andDNA damage and repair, experiments can be designed toaddress the biochemical pathways involved in non-B-dir-ected genetic instability.Considerable effort is currently being invested in

sequencing cancer genomes. This information will be cru-cial to address the question as to whether non-B DNA isalso responsible for at least some of the genomicrearrangements associated with cancer. The formation oftriplex DNA through the delivery of triplex-forming

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 9

Page 10: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

Table 2 Microsatellite repeat diseases (MRDs). Type 1 and type 2 microsatellite repeat diseases

Disease Gene

Chromosome

location Amplet

Normal copy

length or

number

Expanded

copy length or

number Location

Disease

symbol

Type 1 diseases

FRAXE-associated mental

retardation

AFF2 Xq28 CCG 4–39 4200 5’-UTR FRAXE

Spinocerebellar ataxia 10 ATXN10 22q13.31 ATTCT 529 4800 Intron 1 SCA10

Spinocerebellar ataxia 8 ATXN8OS 13q21 CTG 15–50 110–130 3’-UTR SCA8

Jacobsen syndrome CBL2 11q23.3 CCG 11 700–800 ? FRA11B

Myotonic dystrophy type 2 CNBP 3q21 CCTG 104–176 bp 75–11 000 Intron 1 DM2

Epilepsy progressive myoclonic CSTB 21q2.3 CCCCGCCCCGCG 2–3 30–75 Promoter EPM1

Mental retardation DIP2B 12q13.13 CGG 6–23 250–285 5’-UTR FRA12A

Myotonic dystrophy type 1 DMPK 19q13.32 CTG 530 50–2000 3’-UTR DM1

Fragile X mental retardation

syndrome

FMR1 Xq27.3 CGG 5–52 4200 5’-UTR FXS

Fragile X-associated tremor

ataxia syndrome

FMR1 Xq27.3 CGG 555 55–200 5’-UTR FXTAS

Friedreich ataxia FXN 9q21.11 GAA 7–20 200–900 Intron 1 FA

Spinocerebellar ataxia 12 PPP2R2B 5q32 CAG 7–31 55–78 5’-UTR SCA12

Cataract formation in myotonic

dystrophy

SIX5 19q13.32 CTG 5–37 � 50 Promoter SIX5

Spinocerebellar ataxia 31 TK2/BEAN 16q22 TGGAA 0 110 Intron SCA31

No

n-B

DN

AStru

cture

and

Mu

tation

sC

ausin

gH

um

anG

enetic

Disease

EN

CYC

LOPED

IAO

FLIFE

SC

IEN

CES&

2010,Jo

hn

Wile

y&

Son

s,Ltd

.w

ww

.els.n

et

10

Page 11: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

Type 2 diseases

Spino-bulbar muscular atrophy

(Kennedy disease)

AR Xq12 CAG 17–26 40–52 Coding region SBMA

Mental retardation and epilepsy ARX Xp21.3 GCG 10–16 17–33 Exon 1 XLMR

Dentatorubro-pallidoluysian

atrophy (Haw river fever)

ATN1 12p13.31 CAG 3–36 48–93 Coding region DRPLA

Spinocerebellar ataxia 1 ATXN1 6p22.3 CAG 6–44 37–91 Coding region SCA1

Spinocerebellar ataxia 2 ATXN2 12q24.12 CAG 14–31 32–500 Coding region SCA2

Machado–Joseph disease ATXN3 14q32.12 CAG 13–47 53–86 Coding region SCA3

Spinocerebellar ataxia 7 ATXN7 3p14.1 CAG 7–35 36–300 Coding region SCA7

Spinocerebellar ataxia 6 CACNA1A 19p13.13 CAG 4–18 19–33 Coding region SCA6

Blepharophimosis syndrome

and premature ovarian failure 3

FOXL2 3q22.3 GCN 14 22–24 Coding region BPES

Hand-foot-genital syndrome HOXA13 7p15.2 GCN 14–18 22–30 Coding region HFGS

Brachydactyly syndactyly

syndrome

HOXD13 2q31.1 GCN 15 22–29 Coding region SPD

Huntington disease HTT 4p16.3–4p16.2 CAG 535 40–400 Coding region HD

Huntington disease-like 2 JPH3 16q24.2 CTG 6–27 44–57 Exon 1 HDL2

Oculopharyngeal muscular

dystrophy

PABPN1 14q11.2 GCG 6 7–13 Coding region OPMD

Congenital central

hypoventilation syndrome

PHOX2B 4p13 GCN 20 25–33 Coding region CCHS

Cleidocranial dysplasia RUNX2 6p12.3 GCK 17 27 Coding region CCD

Spinocerebellar ataxia 17 TBP 6q27 CAG 25–42 45–63 Coding region SCA17

Holoprosencephaly ZIC2 13q32.3 GCN 15 25 Coding region HPA

Source: With kind permission from Springer Science+Business Media (Zhao et al., 2010).

No

n-B

DN

AStru

cture

and

Mu

tation

sC

ausin

gH

um

anG

enetic

Disease

EN

CYC

LOPED

IAO

FLIFE

SC

IEN

CES&

2010,Jo

hn

Wile

y&

Son

s,Ltd

.w

ww

.els.n

et

11

Page 12: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

oligonucleotides (TFOs) is being pursued with the aim ofknocking-down gene expression or directly inactivatingtarget genes (Jain et al., 2008). This technology has thepotential to complement the arsenal of tools currently usedto combat cancer. Finally, studies aimed at developingreagents that can identify non-B DNA structures in cellsare ongoing. Major advances are therefore to be expectedin the coming years. See also: Chromosome 22; Compara-tive Genetics of Trinucleotide Repeats in the Human and

ApeGenomes; DNAMismatch Repair: Eukaryotic; DNARecombination; DNAStructure: A-, B- and Z-DNAHelixFamilies; DNA Structure: Sequence Effects; DNA Topol-ogy:Fundamentals; EukaryoticRecombination: Initiationby Double-strand Breaks; Human-specific Changes ofGenome Structure; Microdeletions and Microduplica-tions: Mechanism; Microsatellites; Microsatellite Instabil-ity; Nucleic Acids: General Properties; RecombinationalDNA Repair in Eukaryotes; Relevance of Copy Number

MBNL1

CUGBP1

Normal(adult)

DM1(embryonic)

Nucleus

STOP

CIC-1

5cTNT

IR Insulinresistance

Myotonia

Cardiacdefects ?

CIC-1

cTNT

11IR

PCUGBP1

(b)

CUG

CUG

CUG

CUG

CUG

CUG

CUG

CUGCUGCUGCUGCUG

CU

G

DMPK 3′-UTR

MBNL1

Alternativesplicing

Sequestration

?PKC

activation

PCUGBP1

PCUGBP 1

Increasedsteady state

level

Alternativesplicing

mRNAtranslation

mRNAdecay

MBNL1 loss-of-function CUGBP1 gain-of-function

(a)

Figure 4 Triplet repeat expansion alters mRNA function. (a) In DM1, CTG expansion in the 3’-UTR of the DMPK gene causes the ensuing mRNA to fold into

a large and stable double-stranded hairpin stabilised by U.U and G.C base pairs, which recruits muscleblind-like (Drosophila) (MBNL1), a mediator of pre-

mRNA alternative splicing regulation. CUG-hairpins also stimulate CUG RNA-binding protein 1 (CUGBP1) hyperphosphorylation and stabilisation, which

alter several events related to alternative splicing, mRNA tanslation and mRNA decay. (b) Sequestration of MBNL1 and CUGBP1 activation shift alternative

splicing programs from the adult stage towards embryonic-specific patterns, including activation of exon 5 inclusion of cardiac isoforms of TNNT2 (cTNT)

during heart remodeling, exclusion of exon 11 in the insulin receptor (IR) pre-mRNA and inclusion of stop-containing exon in chloride channel 1 transcripts.

Adapted from Lee and Cooper (2009) with kind permission by Portland Press Ltd. Copyright & the Biochemical Society.

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net12

Page 13: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

Variation to Human Genetic Disease; Simple Repeats;Structural Diversity of the Human Genome and DiseaseSusceptibility; Telomeric and Subtelomeric RepeatSequences;TrinucleotideRepeatExpansions:Mechanismsand Disease Associations

Acknowledgements

This work was supported by NIH/NCI Grant CA093729to Karen M Vasquez and in part with federal funds fromthe National Cancer Institute, National Institutes ofHealth, under Contract No. HHSN261200800001E toAlbino Bacolla. The content of this publication does notnecessarily reflect the viewsof policies of theDepartment ofHealth and Human Services, nor does mention of tradenames, commercial products or organisations implyendorsement by the U.S. Government.

References

Bacolla A, Collins JR, Gold B et al. (2006) Long homo-

purine�homopyrimidine sequences are characteristic of genes

expressed in brain and the pseudoautosomal region. Nucleic

Acids Research 34: 2663–2675.

Bacolla A, Jaworski A, Larson JE et al. (2004) Breakpoints of

gross deletions coincide with non-B DNA conformations.

Proceedings of the National Academy of Sciences of the USA

101: 14162–14167.

Bacolla A, Larson JE and Collins JR (2008) Abundance and

length of simple repeats in vertebrate genomes are determined

by their structural properties.Genome Research 18: 1545–1553.

Bacolla A and Wells RD (2004) Non-B DNA conformations,

genomic rearrangements, and human disease. Journal of Bio-

logical Chemistry 279: 47411–47414.

BenaF,Gimelli S,Migliavacca E et al. (2010) A recurrent 14q32.2

microdeletion mediated by expanded TGG repeats. Human

Molecular Genetics 19: 1967–1973.

Bonaglia MC, Giorda R, Massagli A et al. (2009) A familial

inverted duplication/deletion of 2p25.1-25.3 provides new clues

on the genesis of inverted duplications. European Journal of

Human Genetics 17: 179–186.

Brouwer JR, Willemsen R and Oostra BA (2009) Microsatellite

repeat instability and neurological disease. BioEssays 31: 71–83.

Burrows CJ and Muller JG (1998) Oxidative nucleobase modifi-

cations leading to strand scission. Chemical Reviews 98: 1109–

1152.

ChristopheD,Cabrer B, BacollaA et al. (1985)Anunusually long

poly(purine)-poly(pyrimidine) sequence is located upstream

from the human thyroglobulin gene.Nucleic Acids Research 13:

5127–5144.

ChuzhanovaN,Chen JM,BacollaA et al. (2009)Gene conversion

causing human inherited disease: evidence for involvement of

non-B-DNA-forming sequences and recombination-pro-

moting motifs in DNA breakage and repair. Human Mutation

30: 1189–1198.

ConradDF,PintoD,RedonR et al. (2010)Origins and functional

impact of copy number variation in the human genome.Nature

464: 704–712.

Dion V andWilson JH (2009) Instability and chromatin structure

of expanded trinucleotide repeats. Trends in Genetics 25:

288–297.

Du Z, Zhao Y and Li N (2008) Genome-wide analysis reveals

regulatory role of G4 DNA in gene transcription. Genome

Research 18: 233–241.

Felsenfeld G and Rich A (1957) Studies on the formation of two-

and three-stranded polyribonucleotides. Biochimica et Biophy-

sica Acta 26: 457–468.

FernandoH, Sewitz S,Darot J et al. (2009)Genome-wide analysis

of a G-quadruplex-specific single-chain antibody that regulates

gene expression. Nucleic Acids Research 37: 6716–6722.

Gessner RV, Frederick CA, Quigley GJ, Rich A and Wang AH

(1989) The molecular structure of the left-handed Z-DNA

double helix at 1.0-A atomic resolution. Geometry, conform-

ation, and ionic interactions of d(CGCGCG). Journal of Bio-

logical Chemistry 264: 7921–7935.

Glickman BW and Ripley LS (1984) Structural intermediates

of deletion mutagenesis: a role for palindromic DNA. Pro-

ceedings of the National Academy of Sciences of the USA 81:

512–516.

Ha SC, Lowenhaupt K, Rich A, Kim YG and Kim KK (2005)

Crystal structure of a junction between B-DNA and Z-DNA

reveals two extruded bases. Nature 437: 1183–1186.

HannanAJ (2010) Tandem repeat polymorphisms:modulators of

disease susceptibility and candidates for ‘missing heritability’.

Trends in Genetics 26: 59–65.

Huppert JL, Bugaut A, Kumari S and Balasubramanian S (2008)

G-quadruplexes: the beginning and end ofUTRs.Nucleic Acids

Research 36: 6260–6268.

Pms2

Ogg1

Somatic

Msh2

Msh3

AtrAtr

Dnmt1

Germline

Msh2

Msh3

Embryo

Msh2

Msh3

Figure 5 Modulators of triplet repeat expansion in mouse models of

MRDs. The mismatch repair proteins Msh2 and Msh3 are required for

triplet repeat instability throughout all developmental stages. DNA

cytosine-5-methyltransferase 1 (Dnmt1) protects against expansions in an

expanded CAG model in the germline. Ataxia talangiectasia and Rad3

related (Atr) protein prevent expansion in the female germline and in

somatic tissues of a CGG repeat mouse model. Post-meiotic segregation

increased 2 (Pms2) and 8-oxoguanine glycosylase (Ogg1) are selectively

involved in age-dependent somatic instability. Reproduced from Dion and

Wilson (2009) with permission from Elsevier.

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 13

Page 14: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

Jain A, Wang G and Vasquez KM (2008) DNA triple helices:

biological consequences and therapeutic potential. Biochimie

90: 1117–1130.

Kha DT, Wang G, Natrajan N, Harrison L and Vasquez KM

(2010) Pathways for double-strand break repair in genetically

unstable Z-DNA-forming sequences. Journal of Molecular

Biology 398: 471–480.

Kornreich R, Bishop DF and Desnick RJ (1990) Alpha-gala-

ctosidase A gene rearrangements causing Fabry disease. Iden-

tification of short direct repeats at breakpoints in an Alu-rich

gene. Journal of Biological Chemistry 265: 9319–9326.

Kurahashi H and Emanuel BS (2001) Long AT-rich palindromes

and the constitutional t(11;22) breakpoint. Human Molecular

Genetics 10: 2605–2617.

Kurahashi H, Inagaki H, Ohye T et al. (2010) The constitutional

t(11;22): implications for a novel mechanism responsible for

gross chromosomal rearrangements. Clinical Genetics. [Epub

ahead of print; doi: 10.1111/j.1399-0004.2010.01445.x].

Lander ES, Linton LM, Birren B et al. (2001) Initial sequencing

and analysis of the human genome. Nature 409: 860–921.

Lange J, Skaletsky H, van Daalen SK et al. (2009) Isodicentric

Y chromosomes and sex disorders as byproducts of homolo-

gous recombination that maintains palindromes. Cell 138:

855–869.

Lee JE and Cooper TA (2009) Pathogenic mechanisms of

myotonic dystrophy. Biochemical Society Transactions 37:

1281–1286.

Liu J, Perumal NB, Oldfield CJ et al. (2006) Intrinsic disorder in

transcription factors. Biochemistry 45: 6873–6888.

Lopez Castel A, Cleary JD and Pearson CE (2010) Repeat

instability as the basis for human diseases and as a potential

target for therapy. Nature Reviews. Molecular Cell Biology 11:

165–170.

Lyamichev VI, Mirkin SM and Frank-Kamenetskii MD (1985)

A pH-dependent structural transition in the homopurine-

homopyrimidine tract in superhelical DNA. Journal of Bio-

molecular Structure and Dynamics 3: 327–338.

Messaed C and Rouleau GA (2009) Molecular mechanisms

underlying polyalanine diseases. Neurobiology of Disease 34:

397–405.

Mirkin SM (2007) Expandable DNA repeats and human disease.

Nature 447: 932–940.

Neidle S (2009) The structures of quadruplex nucleic acids and

their drug complexes. Current Opinion in Structural Biology 19:

239–250.

PerryGH, Ben-Dor A, TsalenkoA et al. (2008) The fine-scale and

complex architecture of human copy-number variation.

American Journal of Human Genetics 82: 685–695.

Quemener S, Chen JM, Chuzhanova N et al. (2010) Complete

ascertainment of intragenic copy numbermutations (CNMs) in

the CFTR gene and its implications for CNM formation at

other autosomal loci. Human Mutation 31: 421–428.

Quental R, Azevedo L, Rubio V, Diogo L and Amorim A (2009)

Molecular mechanisms underlying large genomic deletions in

ornithine transcarbamylase (OTC) gene. Clinical Genetics 75:

457–464.

Repping S, Skaletsky H, Lange J et al. (2002) Recombination

between palindromes P5 and P1 on the human Y chromosome

causes massive deletions and spermatogenic failure. American

Journal of Human Genetics 71: 906–922.

Rooms L, Reyniers E and Kooy RF (2007) Diverse chromosome

breakage mechanisms underlie subtelomeric rearrangements, a

common cause of mental retardation. Human Mutation 28:

177–182.

Rozen S, Skaletsky H,Marszalek JD et al. (2003) Abundant gene

conversion between arms of palindromes in human and ape Y

chromosomes. Nature 423: 873–876.

Skaletsky H, Kuroda-Kawaguchi T, Minx PJ et al. (2003) The

male-specific region of the humanY chromosome is amosaic of

discrete sequence classes. Nature 423: 825–837.

Uversky VN, Oldfield CJ, Midic U et al. (2009) Unfoldomics of

human diseases: linking protein intrinsic disorder with diseases.

BMC Genomics 10(suppl. 1): S7.

Van Raay TJ, Burn TC, Connors TD et al. (1996) A 2.5 kb

polypyrimidine tract in the PKD1 gene contains at least 23 H-

DNA-forming sequences.Microbial andComparativeGenomics

1: 317–327.

Vissers LE, Bhatt SS, Janssen IM et al. (2009) Rare pathogenic

microdeletions and tandem duplications are microhomology-

mediated and stimulated by local genomic architecture.Human

Molecular Genetics 18: 3579–3593.

Wang G and Vasquez KM (2004) Naturally occurring H-DNA-

forming sequences are mutagenic in mammalian cells. Pro-

ceedings of the National Academy of Sciences of the USA 101:

13448–13453.

Warburton PE, Giordano J, Cheung F, Gelfand Y and Benson G

(2004) Inverted repeat structure of the human genome: the

X-chromosome contains a preponderance of large, highly

homologous inverted repeats that contain testes genes.Genome

Research 14: 1861–1869.

Watson JD and Crick FH (1953) Molecular structure of nucleic

acids; a structure for deoxyribose nucleic acid. Nature 171:

737–738.

Wells RD and Ashizawa T (2006) Genetic Instabilities and

Neurological Diseases, 2nd edn. San Diego, CA: Elsevier/Aca-

demic Press.

Wells RD, Collier DA, Hanvey JC, Shimizu M and Wohlrab F

(1988) The chemistry and biology of unusual DNA structures

adopted by oligopurine.oligopyrimidine sequences. FASEB

Journal 2: 2939–2949.

Zhang F, Seeman P, Liu P et al. (2010) Mechanisms for non-

recurrent genomic rearrangements associated with CMT1A or

HNPP: rare CNVs as a cause for missing heritability.American

Journal of Human Genetics 86: 892–903.

Zhao J, BacollaA,WangGandVasquezKM(2010)Non-BDNA

structure-induced genetic instability and evolution.Cellular and

Molecular Life Sciences 67: 43–62.

Zheng M, Huang X, Smith GK, Yang X and Gao X (1996)

Genetically unstableCXG repeats are structurally dynamic and

have a high propensity for folding. An NMR and UV spec-

troscopic study. Journal of Molecular Biology 264: 323–336.

Further Reading

Balasubramanian S and Neidle S (2009) G-quadruplex nucleic

acids as therapeutic targets. Current Opinion in Chemical Biol-

ogy 13: 345–353.

D’Angelo CS, Gajecka M, Kim CA et al. (2009) Further delin-

eation of nonhomologous-based recombination and evidence

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net14

Page 15: Encyclopedia of Life Sciences || Non-B DNA Structure and Mutations Causing Human Genetic Disease

for subtelomeric segmental duplications in 1p36 rearrange-

ments. Human Genetics 125: 551–563.

Karlin S, Brocchieri L, Bergman A, Mrazek J and Gentles AJ

(2002) Amino acid runs in eukaryotic proteomes and disease

associations.Proceedings of theNationalAcademyofSciences of

the USA 99: 333–338.

Orr HT and Zoghbi HY (2007) Trinucleotide repeat disorders.

Annual Review of Neuroscience 30: 575–621.

Rich A and Zhang S (2003) Timeline: Z-DNA: the long road to

biological function. Nature Reviews. Genetics 4: 566–572.

Sinden RR (1994) DNA Structure and Function. San Diego:

Academic Press.

Stankiewicz P and Lupski JR (2010) Structural variation in the

human genome and its role in disease. Annual Review of Medi-

cine 61: 437–455.

Wang G, Christensen LA and Vasquez KM (2006) Z-DNA-

forming sequences generate large-scale deletions inmammalian

cells. Proceedings of the National Academy of Sciences of the

USA 103: 2677–2682.

Wang G and Vasquez KM (2009) Models for chromosomal rep-

lication-independent non-B DNA structure-induced genetic

instability. Molecular Carcinogenesis 48: 286–298.

Non-B DNA Structure and Mutations Causing Human Genetic Disease

ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 15