encyclopedia of life sciences || non-b dna structure and mutations causing human genetic disease
TRANSCRIPT
Non-B DNA Structure andMutations Causing HumanGenetic DiseaseAlbino Bacolla, The University of Texas M.D. Anderson Cancer Center, Smithville
Texas, USA
David N Cooper, Institute of Medical Genetics, School of Medicine, Cardiff University
Cardiff, UK
Karen M Vasquez, The University of Texas M.D. Anderson Cancer Center, Smithville
Texas, USA
In addition to the canonical right-handed double helix,
several noncanonical deoxyribonucleic acid (DNA) sec-
ondary structures have been characterised, including
quadruplexes, triplexes, slipped/hairpins, Z-DNA and
cruciforms. The formation of these structures is mediated
by repetitive sequence motifs, such as G-rich sequences,
purine/pyrimidine tracts, direct (tandem) repeats,
alternating purine–pyrimidines and inverted repeats,
respectively. Such repeats are abundant in the human
genome and are found in association with specific classes
of genes, supporting a role for them in gene regulation or
protein function. Repetitive sequence motifs are also
commonly found at sites of chromosomal alteration,
including gross rearrangements and copy number vari-
ations (CNVs) associated with both disease and pheno-
typic variation. Finally, variable number tandem repeats
(VNTRs) or microsatellites are present in many gene
regulatory regions. Characterised by an inherent capacity
to expand spontaneously, such sequences are not only
known to cause 430 neurological diseases but may also
contributeto humandisease susceptibility. Theformation
ofalternativenon-BDNAstructures isbelievedtopromote
genomic alterations by impeding efficient DNA repli-
cation and repair.
IntroductionSoon after the discovery that deoxyribonucleic acid (DNA)was an antiparallel right-handed double helix (Watson andCrick, 1953), work on synthetic single-stranded DNAmolecules of defined sequence composition revealed theformation of a three-stranded structure, in addition to theexpected duplex (Felsenfeld and Rich, 1957). This impliedthe existence ofmore than one type ofDNA conformation.During the subsequent 50 years, the repertoire of con-formations inwhich syntheticDNAmolecules of repeatingsequence composition were found to assemble, increasedsteadily. To date, several unique DNA structures, quitedistinct from the canonical B-form, have been character-ised, including left-handed Z-DNA, cruciforms, looped-out or slipped folds, parallel DNA, triplexes, quadruplexesand higher-order arrangements. These conformations arecollectively called non-B DNA.In parallel to these biophysical investigations, DNA
sequencing has revealed the existence in chromosomes ofseveral different types of repetitive motifs known to adoptnon-B forms in vitro (Christophe et al., 1985; Lyamichevet al., 1985; Wells et al., 1988), spurring speculation as totheir biological function. Pioneering work in bacteria(Glickman and Ripley, 1984) suggested a role for non-BDNA-forming sequences in mediating chromosomal dele-tions in vivo. However, it was not until comparativelyrecently (Bacolla et al., 2004; Kurahashi and Emanuel,2001; Repping et al., 2002;Wang and Vasquez, 2004;Wellsand Ashizawa, 2006) that the concept of DNA secondarystructure as a promoter of genomic rearrangementsacquired broad support. Critical to these developmentswasthe discovery of microsatellite repeat diseases (MRDs), anovel class of neurological disorders caused by the expan-sion of triplet (and other) microsatellite repeats (Brouweret al., 2009; Dion andWilson, 2009; Lee and Cooper, 2009;Lopez Castel et al., 2010; Wells and Ashizawa, 2006).The completion of the human and other mammalian
genome-sequencing projects (Lander et al., 2001) hasmadeit possible to explore the frequencies and locations of
Advanced article
Article Contents
. Introduction
. Non-B DNA Structures
. Non-B DNA-forming Repeats and Human Genes
. DNA Structure, Phenotypic Variation and Genetic
Disease
. Mechanisms Underlying Non-B DNA Induced Mutations
. Prospects for the Future
. Acknowledgements
Online posting date: 18th October 2010
ELS subject area: Genetics and Disease
How to cite:Bacolla, Albino; Cooper, David N; and Vasquez, Karen M (October
2010) Non-B DNA Structure and Mutations Causing Human Genetic
Disease. In: Encyclopedia of Life Sciences (ELS). John Wiley & Sons, Ltd:Chichester.
DOI: 10.1002/9780470015902.a0022657
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 1
chromosomal sites containing non-B DNA-formingsequences, as well as their evolutionary conservation inorthologous genomes. From a number of studies (reviewedin Zhao et al., 2010) it can be concluded that such sitesoccurmuchmore frequently than expected by chance aloneand that different non-BDNA-formingmotifs are found inassociation with different genomic regions. Hence, a novelview of genome structure and function has emerged inwhich repetitive DNA, once regarded as ‘junk DNA’,participates in the regulation of genes, telomere mainten-ance, protein function and genome stability.
The human genome sequencing project (HGSP) has alsoled to the realisation thatmany individual genomes exist (incontrast to the early concept of a nearly identical standardgenome common to all human beings) in which a largenumber of gross variations involving duplicated anddeleted regions (copy number variations or CNVs) com-prising large (41 kb in length, on average) segments ofDNA distinguish one individual from another. The adventof array comparative genomic hybridisation (aCGH)techniques over the past few years has made it possible toscreen the human genome for CNVs in genes/gene regionsthat either lead/predispose to disease or give rise tophenotypic variation in healthy individuals and popu-lations. The picture emerging is that some forms of non-BDNA, along with other sequence signatures, have a role inpromoting CNV formation. Indeed, these composite datasuggest that cells may have exploited the dynamicallypolymorphic nature of DNA to create functional genomicdiversity; at the same time, the instability of certainsequences can give rise to deleterious changes that lead, orpredispose, to genetic instability and disease.
Herein, we review the main forms of non-B DNAstructure, outline the distribution of the underlyingrepetitive motifs in the human genome, and present recentwork on the relationship between repetitiveDNAsequenceand human genetic disease as well as phenotypic variationthat may predispose to disease.
Non-B DNA Structures
To date, at least five types of non-B DNA structures havebeen associated with human genetic disease: cruciform,triplex, slipped/hairpin DNA, quadruplex and Z-DNA(Figure 1).MostDNA sequences in the human genome existin the B-form. However, repetitive sequences may alsoadopt alterative conformations as a consequence of mul-tiple hydrogen bonding interactions between bases or,as in the case of Z-DNA, rotational freedom about theN-glycosidic bond of G residues. Triplex and quadruplexDNA also rely on additional hydrogen bonding donor andacceptor groups existing in purines (A and G).
Cruciform DNA
Inverted repeats (IR), defined as a series of nucleotidesfollowed on the same strand by their complementary bases
and usually separated by a spacer, are required for cruci-form formation (Figure 1a). The term ‘complementary’ hererefers to the fact that A always pairs with T whereas Galways pairs with C in B-form DNA (so-called ‘Watson–Crick’ base-pairing).The IR symmetry makes it possible for the bases along
the same strand of DNA to pair with each other, ratherthan with the complementary strand, thereby giving riseto a cross-shaped structure (Figure 1a). In solution, twointerconvertible conformations have been observed: anextended conformation, in which each cruciform armoccupies the vertex of a simple tetrahedron, and a closedconformation, in which the two pairs of arms lie almostparallel to each other. The closed conformation is favouredat physiological salt concentrations because of the shield-ing of negatively charged phosphates. Thus, it is considereda good approximation of cruciform (and Holliday junc-tion) structures in vivo.
Triplex DNA (H-DNA)
Triplex DNA is a three-stranded structure in which a third(DNA or ribonucleic acid (RNA)) strand binds to B-DNAby occupying its wide major groove. Watson–Crickhydrogen bonds are not altered in H-DNA; rather, purinebases along one strand of duplex DNA engage theincoming third strand throughHoogsteen-hydrogenbondsusing available exocyclic groups not involved in Watson–Crick interactions. This third strand may originate from anearby sequence on the same molecule (intramoleculartriplex) or from separatemolecules (intermolecular triplex),comprising either RNA, DNA or exogenously added tri-plex-forming oligonucleotides (TFOs). Hence, triplexDNArequires a succession of purines on the same strandofDNA. Four types of Hoogsteen-hydrogen bonds typicallyoccur: A:T, A:A, G:C+ (C+, protonated base) and G:G(Figure 1b and c), which yield two types of triplexes: a YRYtype, inwhich the third strand is pyrimidine-rich (Figure 1c),and an RRY type, in which the third strand is purine-rich(Figure1b). In intramolecular triplexes, amirror repeat (MR)symmetry is required in duplex DNA, whereby bases fromone repeat separate from the complementary strand andfold back onto the second repeat to engage in Hoogsteenpairing. This reaction requires energy, which can be pro-vided by negative supercoiling. If it is the pyrimidine-richstrand that engages in Hoogsteen base-pairing (such asoccurs at low pH), the third strand runs parallel to thepurine-rich accepting strand; however, if it is the purine-rich strand that engages in Hoogsteen pairing (at neutralpH), the third strand runs antiparallel to the purine-richaccepting partner, as shown in Figure 1a.
Slipped/hairpin DNA
The reiteration of bases, such as CTGCTGCTG, or directrepeats (DR), provides the basis for the bulging out ofsmall loops in duplex DNA when, after separation ofthe complementary strands, the repeat array reanneals
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net2
out-of-register. Thus, slipped DNA occurs during DNAreplication and transcription, two processes that neces-sarily entail strand separation. An alternative source ofsmall loops in DNA (particularly at mononucleotide runs,such as AAA) originates from stuttering or skippingduring DNA replication. Slippage during DNA repli-cation is believed to lead to large expansions of triplet
repeat sequences, and hence represents an impor-tant mechanism of mutation associated with MRDs.In such cases, long looped-out sequences may fold intodouble helices (hairpins) stabilised by mismatches, suchas T.T base pairs (Zheng et al., 1996), in addition tothe Watson–Crick A.T and G.C base pairs along thehairpins.
anti
G
G
G
GGG
GG G
GG G
( YR YR )n
AAGAGGxGGAGAAT TCTCC C CTC T Ty
(R Y)n mirror
repeatsTriplex
Left-handedZ - DNA
Cruciform Invertedrepeats
TCGGTAGCCA
C AGA CG TCT G
xy
QuadruplexOligo (G)n
tractsAG3(T2AG3)3
Single strand
Slipped(hairpin)structure
Directrepeats
B-Z junctions
Name Conformation General seq.requirements
Sequence
C GTCGCGTG G TGG CAGCGCAC C AC
C TGTCGGTT GG ACAGCCAA C
(a)
(b)
(c) (d)
L
D
CR
syn
Figure 1 Non-B DNA structures formed by genomic repetitive sequences. (a) Most common non-B DNA conformations, ribbon models of helical foldings,
repetitive motifs requirement and example of sequences. Center dot, Watson–Crick hydrogen bond interactions; x,y, nucleotides in the spacer between
repeats; L, lateral loop; D, diagonal loop and CR, chain reversal loop. For cruciform DNA, an extended conformation is shown. For triplex DNA, a 3’ RRY
isomer is depicted in which the 3’-half of the purine-rich strand folds back to form the Hoogsteen-bound third strand. For quadruplex DNA, an idealised
structure is drawn to highlight the loop characteristics and the relative orientation of the syn and anti N-glycosidic configurations. (b) RRY base triplets
showing the Hoogsteen-bound base (left). Thymine can be incorporated into RRY triplexes due to the symmetry of the carbonyl groups. (c) YRY base triplets
showing the Hoogsteen-bound pyrimidine and the stabilisation afforded by cytosine protonation. (d) G-tetrad.
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 3
Quadruplex (tetraplex, G4) DNA
G4 DNA has received considerable attention duringthe past few years, in part because human telomeres,whose function is often dysregulated in cancer, are com-posed exclusively of hexameric (TTAGGG)n repeatsequences capable of forming this type of non-B DNAstructure. For quadruplexes to form, the sequence patternrequired comprises four closely spaced sets of 2–4Gs, suchas the 3-setGGG(n1)GGG(n2)GGG(n3)GGG,where n1, n2and n3 represent 1–7bases of anykind (loop). Thebasic unitin G4 DNA, a G-tetrad or G-quartet, comprises a planararray of four guanines (one from each of the four sets)connected to eachother throughHoogsteen-type hydrogenbonds (Figure 1d). A core of at least two (three in theexample shown in Figure 1a) G-quartets stack on top ofone another, stabilised by cation coordination (potassiumion is most effective) between any two stacks while theloops provide strand connectivity along the edges ofthe stacked G-quartets. A high degree of structural poly-morphism has been revealed, in which loops connect ina lateral, diagonal or chain reversal fashion, strands runparallel or antiparallel to one another, and guanine re-sidues may adopt either the syn or the anti N-glycosidicconformation (Neidle, 2009; Figure 1a). Thus, compet-ing conformations usually form in solution. The mostcommon isomer, as evidenced by nuclear magneticresonance (NMR) and X-ray crystallography, is repre-sented by the (3+1)mixed topology, characterised by onechain reversal and two lateral loops and by a three syn plusone anti-arrangement of guanines per G-tetrad (Neidle,2009).
Z-DNA
Z-DNA is unusual among the non-B DNA structures inthat strandedness is reversed by the rotation from the anti(in right-handed B-DNA) to the syn (in left-handed Z-DNA) conformation of every G residue within tracts ofalternating GY (Y, pyrimidine) sequences, such as (GC)nor (GC)m(GT)n. In sharp contradistinction to B-DNA,which possesses both amajor and aminor groove, Z-DNAis characterised by a single deep and narrow groove and anoverall tube-like shape (Gessner et al., 1989). X-ray crys-tallography has also shown that the base pairs located atthe junctions between the B- andZ-sections (B-Z junctions,Figure 1a) have the tendency to flip out of the double-helix(Ha et al., 2005), thereby providing a substrate for basemodification or cleavage (Burrows and Muller, 1998).See also: DNA Structure
Non-B DNA-forming Repeats andHuman Genes
Completion of the HGSP (Lander et al., 2001) has madepossible the search for repetitive non-B DNA sequences
with the goal not only of determining their genome-wide distribution but also of elucidating their putativefunctional role(s). The first such study (Warburton et al.,2004) reported that large IRs (4100 kb) are present mostlyon the sex chromosomes, with male-specific genes andgene families essential for male fertility located at sym-metrical positions along the IR arms (Table 1; Skaletskyet al., 2003). The maintenance of gene function, particu-larly for the Y-chromosome that lacks a homologuefor recombination, is believed to depend on the form-ation of large cruciform structures by IR sequences,which potentiate the correction of mutations and dou-ble-strand breaks by intrachromosomal recombinat-ion (Lange et al., 2009; Rozen et al., 2003; Zhao et al.,2010).Quadruplex-forming motifs are strongly over-repre-
sentednear (+ 500 bp) transcription start sites and at the 3’end of genes (1/4 to 1/3 of � 20 000 human genes) associ-atedwith regulatory functions, including proto-oncogenes,such as MYC, KRAS and KIT (Du et al., 2008; Huppertet al., 2008; Table 1). Genome-wide gene expression ana-lyses (Du et al., 2008; Fernando et al., 2009) support theconclusion that these sequences play an active role intranscriptional regulation through the formation ofquadruplex structures (Figure 2a).The first hint that long (41 kb) runs of homo-
purines.homopyrimidines might exist in the human gen-ome came from the sequencing of the polycystic kidneydisease 1 (PKD1) gene (Van Raay et al., 1996). A total of228 genes were subsequently found to contain such tracts(� 250 bases in length) in introns (Bacolla et al., 2006),and these genes disproportionately encoded proteins witha function in cell communication and synaptic transmis-sion of the nerve impulse (Table 1). These gene classes arealso enriched in GGAA, GAAA and GGGA tetranucleo-tide repeats, which have the capacity to form stabletriplexes and are characterised by unusually strongbase stacking. Although these types of genes are gene-rally weakly transcribed (Zhao et al., 2010; Figure 2b), ahigh proportion of them is preferentially expressed in thebrain. Thus, homopurines.homopyrimidines may playspecific roles in brain-associated gene function andregulation.The distribution of short tandem repeats (slipped/
hairpinDNA) in protein-coding sequences is dominatedbytriplet repeats encoding homopolymeric runs of spe-cific amino acids, such as polyglutamine, in transcri-ption factors and gene-regulatory proteins that bind DNAand RNA (Table 1; Bacolla et al., 2008). Homopolymericruns of amino acids are known to play critical roles inprotein–protein and protein–DNA/RNA interactions(Liu et al., 2006) and their number at specific loci appears toincrease as one progresses from the genomes of simplerspecies to the more complex. Thus, DNA slippage andhairpin/loop formation may have been exploited overevolutionary time as a means to acquire, or fine-tune,protein function. See also: Genetic Variation: Polymorph-isms and Mutations
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net4
Table 1 Non-BDNA-forming repeats and human genes.Most relevant repetitive DNA sequences associated with human genes
or gene classes
Gene/gene families in the largest (4100 kb) inverted repeats (IR)
Chromosome IR arm size (kb) Gene/gene class
Tissues with predominant
expression
Y palindrome P1 1450 DAZ Testes
Y palindrome P5 495.5 CDY Testes
Y palindrome P3 283.0 PRY Testes
Y palindrome P4 190.2 HSFY Testes
Xp11.22 142.2 GAGE-D2,3 Testes
Xq22.1 140.6 NXF2 Testes
Y palindrome P2 122.0 DAZ Testes
Xq13.1 119.3 DMRTC1 Testes, kidney, pancreas
11q14.3 103.9 RNF18 Testes, kidney, spleen
Purine:pyrimidine tracts in introns of genes
Gene category/function P-value
� 250 nt (228 genes) � 100 nt (1951 genes)
Ion channel activity 1.95� 10205 5.92� 10209
Protein binding 3.14� 10203 6.25� 10215
Glutamate receptor activity 6.11� 10204 1.92� 10207
Cell adhesion 1.11� 10204 3.36� 10212
Cell communication 2.19� 10204 5.24� 10215
Transmission of nerve impulse 1.83� 10204 5.24� 10208
Synapse 2.18� 10202 7.69� 10205
Alternative splicing ND 2� 10282
Chromosomal translocations ND 1� 10207
Tetranucleotide repeats (TR) in introns of genes
Gene category/function/attribute P-value
Localisation to the membrane 1� 10207–5� 10230 (Range for 10 gene groups
containing: groups 1–8, 8–15 TR
units; group 9, 16 and 17 TR units;
group 10, � 18 TR units; 190–1423
genes/group)
Ion channel 5� 10202–1� 10213
Cell adhesion 8� 10204–2� 10237
Alternative splicing 1� 10264 � 8TR units (4182 genes)
Chromosomal translocations 2� 10207 � 8TR units (4182 genes)
Micro/minisatellites (2–11 nt repeats) in cDNAs
Gene category/function P-value (coding plus noncoding exons) (2626 genes)
Transcription regulator activity 2.0� 10240
Regulation of cellular processes 2.3� 10238
Protein binding 2.0� 10233
Sequence-specific DNA binding 3.8� 10223
Nuclear localisation 9.3� 10222
RNA polymerase II transcription factor activity 1.2� 10216
Axon guidance 2.3� 10205
MAPK signalling pathway 2.1� 10204
WNT signalling pathway 2.4� 10204
(Continued )
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 5
DNA Structure, Phenotypic Variationand Genetic Disease
A number of studies have implicated the formation ofnon-B DNA conformations as a source of genomicrearrangements causing human genetic disease, includingFabry disease (Kornreich et al., 1990), mental retardation(Bonaglia et al., 2009; Rooms et al., 2007), ornithine tran-scarbamylase deficiency (Quental et al., 2009), uniparentaldisomy 14(mat) (Bena et al., 2010) and spermatogenicfailure (Repping et al., 2002) among others (reviewed inBacolla and Wells, 2004). As an example, Emanuel syn-drome (MIM #609029), characterised by severe mentalretardation, facial abnormalities and heart and kidneydefects, is a rare disease caused by the inheritance of asupernumerary der(22) chromosome from a parent carry-ing a constitutional translocation between chromosomes11 and 22 (t(11;22)(q23;q11)) (Figure 3). Cloning of thegenomic regions involved in the translocation revealed thatthe breakpoints typically occurred within narrow regionsonboth chr11 and chr22, at the centre of large (� 450bponchr11 and � 590 bp on chr22) IR structures comprisingalmost exclusively A and T bases. These recurring breaks,at the centre of specific IRs termed PATRR11 andPATRR22 (palindromic AT-rich regions) respectively, areconsistent with the formation of large cruciform structureson both chromosomes (Figure 3). This conclusion is sup-ported by a number of observations (Kurahashi et al.,2010). First, the PATRR22 sequence was shown to be bothpolymorphic and intrinsically unstable in the generalpopulation, such that deletions and duplications reducingor disrupting IR symmetry were commonly observed.Analyses of t(11;22) frequencies in sperm cells fromhealthyindividuals yielded an estimate of � 1.5� 1025 for the full-length IR chromosomes, but an � 10-fold reduction forthose chromosomes in which IR symmetry was disrupted,implying that cruciform structures were responsible forpromoting the translocation event. Second, fluorescence insitu hybridisation indicated that the 22q11 cluster wasinvolved in additional translocations, including 17q11,4q35.1, 1p21.2 and 8q24.1. In all cases, repetitive sequenceswith IR symmetry were detected on the partner
chromosome, with translocation frequencies decreasingwith decreasing length of the IR sequences. Thus, size (andhence stability) of cruciform structures correlates with thelikelihood of chromosomal breaks. Third, experiments inwhich two plasmids, one containing the PATRR11sequence and the other containing the PATRR22sequence, were transfected in human cells in culture dis-played recombination between PATRR11 and PATRR22.Taken together, these data support the conclusion thatlarge cruciforms are extruded from the PATRR11 andPATRR22 sequences, which are then cleaved at the centre,generating double-stranded broken ends that are sealed,yielding the der(22) and der(11) chromosomes (Figure 3).The number of identified human disorders associated
with gains (duplications) and losses (deletions) of geneticmaterial, types of mutation once believed to occur onlyrarely (Perry et al., 2008), has increased considerably overthe past few years (Zhang et al., 2010). The recent appli-cation of aCGH (Conrad et al., 2010; Perry et al., 2008) topatients afflicted with either single-gene disorders, such asthe CFTR gene associated with cystic fibrosis (Quemeneret al., 2010), or multigene disorders (Vissers et al., 2009;Zhang et al., 2010), indicates thatCNVscanalsobe involvedin these conditions. Additional genome-wide studies sup-port the conclusion that CNVs affect thousands of loci,contributing in all likelihood to the myriad phenotypic dif-ferences observed between individuals (Conrad et al., 2010;Perry et al., 2008). Bioinformatic analyses indicate that thedensity of repetitive DNA sequence motifs capable ofadopting non-BDNA conformations is significantly higheratCNVbreakpoints than thegenomeaverage, implying thatthe formation of local DNA secondary structure may rep-resent a common mechanism for CNV-mediated deletions,duplications and inversions (Conrad et al., 2010; Perry et al.,2008; Vissers et al., 2009).Gene conversion is a form of recombination that uses
homologous DNA sequences to repair a double-strandbreak (DSB) and involves the nonreciprocal transfer ofDNA from a ‘donor’ (the intact homologous sequence) toan ‘acceptor’ (the broken DNA end) sequence. Whensuch transfer inactivates functional genes, pathologicalconsequences can ensue. A comprehensive meta-analysis
Table 1 Continued
G-quadruplex in both 5’- and 3’-UTR
Gene category/function P-value
5’-UTR 3’-UTR
Guanyl-nt exchange factor activity 7.9� 10213 6.3� 10212
Rho guanyl-nt exchange factor activity 7.9� 10210 1.6� 10209
Regulation of Rho signal transduction 2.0� 10210 1.6� 10209
Transcription factor activity 6.3� 10205 3.2� 10210
Sequence-specific DNA binding 2.5� 10206 3.2� 10208
Note: ND, not determined.Source: With kind permission from Springer Science+Business Media (Zhao et al., 2010).
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net6
Ln gene expression levels
2 4 6 8 10 12
0
2
4
6
8
10
12
14
16
18
Control genesGenes with ≥18 Tetra units
(a)
(b)
9.3
PG4MD500-negative genes
PG4MD500-positive genes
9.29.1
98.98.8
8.7
8.6
8.58.4
8.38.2
8.1
87.9
7.87.7
7.6
Gen
e ex
pre
ssio
n le
vel
Ova
ryC
iliar
y ga
nglio
nA
trio
vent
ricul
ar n
ode
Skin
DRG
Glo
bus
pal
lidus
Cer
ebel
lum
Who
le b
rain
Panc
reas
Subt
hala
mic
nuc
leus
Trig
emin
al g
angl
ion
Sup
erio
r ce
rvic
al g
angl
ion
Test
is in
ters
titia
lU
teru
sKi
dney
Occ
ipita
l lob
eTr
ache
aA
dren
al g
land
Ap
pen
dix
Col
orec
tal a
deno
carc
inom
aSa
livar
y gl
and
Med
ulla
obl
onga
taFe
tal l
ung
Thym
usTe
stis
sem
inife
rous
tub
ule
Plac
enta
Pons
Tem
por
al lo
beC
ereb
ellu
m p
edun
cles
Test
is le
ydig
cel
lFe
tal l
iver
Cin
gula
te C
orte
xTo
ngue
Test
is g
erm
cel
lBM
-CD
71+
early
ery
thro
idLy
mp
hnod
eO
lfact
ory
bulb
Adr
enal
cor
tex
Am
ygda
laTo
nsil
Parie
tal l
obe
Leuk
emia
lym
pho
blas
tic (
mol
t4)
Cau
date
nuc
leus
Test
isA
dip
ocyt
eSk
elet
al m
uscl
eLe
ukem
ia c
hron
ic m
yelo
geno
usBo
ne m
arro
wLu
ngH
eart
Ute
rus
corp
usFe
tal b
rain
Leuk
emia
pro
mye
locy
tic(h
l60)
Smoo
th m
uscl
eFe
tal t
hyro
idW
hole
blo
odH
ypot
hala
mus
Live
rPr
osta
tePr
efro
ntal
cor
tex
Thal
amus
Pitu
itary
PB-C
D8+
T c
ells
Panc
reat
ic Is
lets
Bron
chia
l ep
ithel
ial c
ells
PB-B
DC
A4+
Den
triti
c _C
ells
PB-C
D14
+ M
onoc
ytes
PB-C
D4+
135
T c
ells
Spin
al c
ord
PB-C
D19
+ B
cells
BM-C
D10
5+ E
ndot
helia
lTh
yroi
dBM
-CD
33+
mye
loid
PB-C
D56
+NKC
ells
Car
diac
myo
cyte
s72
1-B-
lym
pho
blas
tsBu
rkitt
’s ly
mp
hom
a (R
aji)
Burk
itt’s
lym
pho
ma
(Dau
di)
BM-C
D34
+
Perc
ent
of g
ene
exp
ress
ion
data
Figure 2 Non-B DNA-forming repeats and genome-wide gene expression. (a) The gene expression profile of 13 237 nonredundant annotated human
(RefSeq) genes (y-axis) was examined in 79 tissues/cancer/cell types (x-axis) and the average values plotted for the 8124 genes that contained quadruplex-
forming repeats (filled squares) within +500 base pairs of the main transcription start site (TSS) and the remaining genes that did not contain such elements
within +500 base pairs of the main TSS (open squares). (b) The gene expression levels for 16 146 gene probes in the 70 human tissues/cell types included in
(a) was plotted as percent of data (y-axis) falling within specific intervals of gene expression (x-axis) for control genes (black) and for 190 genes (red) that
contained triplex-forming tetranucleotide repeats �72 base pairs long. With kind permission from Springer Science+Business Media (Zhao et al., 2010).
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 7
(Chuzhanova et al., 2009) of the DNA sequences flankingthe regions of DNA exchange in 27 known disease-associ-ated gene conversion events indicated that non-B DNA-forming sequences, particularly IRs, and recombination-promoting motifs occurred more frequently than expectedby chancealone.Hence, at least somegene conversion eventsmay be initiated by DSBs at sites of non-B DNA structure.
MRDs represent a growing class of pathological con-ditions, currently numbering 430, caused by the expan-sion of tandem repeats,mostly trinucleotide repeats,withinhuman genes (Table 2; Brouwer et al., 2009; Lopez Castelet al., 2010). In most cases, the formation of DNA sec-ondary structures, including slipped/hairpin DNA, quad-ruplex and triplex structures is believed to drive theexpansion process (Lopez Castel et al., 2010;Mirkin, 2007;Wells and Ashizawa, 2006). MRDs may be broadly clas-sified into two types: type 1, in which expansions are large(usually hundreds of copies of the repeat) and occur innonprotein-coding regions (untranslated and intronic) andtype 2, in which expansions are less severe but occur inprotein-coding regions thereby altering protein sequence.See also: Chromosomal Genetic Disease: StructuralAberrations; Segmental Duplications and Genetic Disease
Type 1 MRDs
In fragile X syndrome (FXS), a CGG repeat is expanded inthe 5’ untranslated (UTR) region of the FMR1 gene. InFriedreich ataxia (FA), expansions of a GAA repeat occurin the first intron of the frataxin (FRDA) gene, whereasmyotonic dystrophy (DM) is caused by the expansion ofeither a CTG repeat in the 3’-UTR of the DMPK gene(myotonic dystrophy type 1, DM1) or a CCTG repeat inthe first intron of theZNF9 gene (myotonic dystrophy type2, DM2).
In FXS and FA patients, repeat expansion is associatedwith histone markers specific for heterochromatin (i.e.nontranscribed DNA), including deacetylation of histonesH3 and H4 and methylation of histone H3 at lysine 9(H3K9me), supporting the notion that loss-of-functiondue to gene silencing is the likelymechanism responsible forthese diseases. In the case of DM,most of the pathogenesishas been recapitulated in mouse models by experimentsthat showRNAgain-of-function and the altered activity oftwo proteins, MBLN1 (muscleblind-like 1) and CUGBP1(CUG-binding protein 1) (Figure 4).MBNL1 is a key regulator of alternative splicing that
binds to double-stranded RNA hairpins, including thoseformed by CUG repeats (Lee and Cooper, 2009). In cellsfrom DM1 patients, most MBNL1 is found in discretenuclear foci where it is sequestered by expanded CUG (orCCUG) hairpins. Several transcripts are aberrantly splicedin DM1 mouse models, including Clcn1 (chloride channel1), Insr (insulin receptor) and Tnnt2 (cardiac troponin T).Thus, loss-of-function of MBLN1 mediated by long RNAhairpins is thought to contribute to DM pathology (Figure
4a). The second protein affected is CUGBP1, which bindssingle-stranded CUG repeats and becomes hyperpho-sphorylated by PKC (protein kinase C) on CUG/CCUGrepeat expansion through an unknown mechanism. Acti-vation of CUGBP1 exacerbates the splicing defects ofMBNL1 deficiency. It also increases the translation ofembryonic transcripts, such as MEF2A, and prevents thedecay of short-lived mRNAs, including c-FOS and TNFa(tumour necrosis factor alpha). The antagonistic activitiesof MBNL1 and CUGBP1 are critical for the shifts inalternative splicing that accompany the transition from theembryonic to the adult developmental stages; DM maytherefore entail a reversal of this developmental pattern(Figure 4b). RNA gain-of-function is also thought to
chr22
chr11 der(11)
der(22)
Translocation
PATR
R22
PATR
R11
Figure 3 Cruciform-mediated chromosomal t(11;22) translocation and the Emanuel syndrome. The PATRR sequences on human chromosome 11 (green)
and 22 (black) are proposed to fold into large cruciform structures at some frequency during gametogenesis and be cleaved at the single-stranded tips,
resulting in double-strand breaks (left insets). The broken chromosomal ends (middle) join aberrantly, yielding the derivative chromosomes der(11) and
der(22) (right). Occasional inheritance of der(22), in addition to a normal karyotype, is responsible for the Emanuel syndrome in the offspring.
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net8
underlie the pathology of fragile X-associated tremor/ataxia syndrome (FXTAS), affecting older patients carry-ing moderate CGG expansions in the frataxin gene, as wellas SCA8, SCA10 and SCA12 (Brouwer et al., 2009).
Type 2 MRDs
In type 2 diseases, triplet repeat expansions occur exclu-sively in coding exons andhence they alter protein sequenceby increasing the length of homopolymeric amino acidruns, typically polyglutamine and polyalanine (Table 2;Brouwer et al., 2009; Lopez Castel et al., 2010). Homo-polymeric amino acid runs contribute to ‘protein disorder’,a structural property that often plays a critical role inmediating protein–protein and protein–DNA inter-actions, particularly in protein families comprising tran-scription factors and regulators of transcription and DNAreplication (Uversky et al., 2009). Expansion beyond acritical threshold destabilises protein structure, causingaggregation or loss of the binding affinities with partnermolecules (Messaed and Rouleau, 2009).
The number of tandem repeats at a given locus is oftenpolymorphic in the general population (variable number oftandem repeats or VNTR), a process thatmay be driven byslipped hairpins or aberrant DNA replication. The pres-ence of a VNTR within an intron or the regulatory regionof a gene is often associated with a change in its tran-scription, thereby yielding functionally different alleles thatmay contribute to variable phenotypic traits (e.g. bloodpressure, heart rate and muscular tension) and suscepti-bility to multigenic diseases, such as brain disorders,obesity and stroke (Bacolla et al., 2008; Hannan, 2010).Thus, slipped DNA may also contribute to phenotypicvariability and susceptibility to disease.
In summary, the formation of aberrant DNA secondarystructure is the root cause of a number of quite differenthuman inherited diseases, where mutagenesis is elicited invarious ways: DNA breakage and stimulation of recom-bination leading to genomic rearrangements, altered RNAsecondary structure, mRNA codon changes resulting inimpaired protein function and altered transcriptionaffecting protein levels. See also: Trinucleotide RepeatExpansions: Disorders
Mechanisms Underlying Non-B DNAInduced Mutations
A large number of studies have been conducted in modelorganisms, including bacteria, fly, yeast, mammalian cellculture andmice to address themechanisms responsible fornon-BDNA-induced genetic instability, particularly in thecontext of repeat expansion related to MRDs. Knockingdown the expression of proteins involved in DNA repli-cation, repair and recombination affected repeat stabilityby either increasing or decreasing instability (depending onthe specific enzyme), suggesting that these biochemical
processes are all potentially involved (Lopez Castel et al.,2010; Mirkin, 2007; Wells and Ashizawa, 2006). Despitethese important advances, the mechanisms restricting thetiming of repeat instability and genomic rearrangements toearly development remain elusive. In transgenic mice, thepresence ofMsh2 andMsh3, two proteins of the mismatchrepair pathway, was found to be critical for elicitingmicrosatellite repeat expansion during all stages of devel-opment (Figure 5; Dion andWilson, 2009). A role restrictedto adult somatic tissues has been noted for ataxia talan-giectasia and Rad3 related (Atr), post-meiotic segregationincreased 2 (Pms2) and the Ogg1 glycosylase. Atr, Pms2and Ogg1 are involved in DNA damage responses andDNA repair. Thus, both DNA damage and DNA repairmay play a critical role in eliciting non-B DNA-inducedgenetic instability, at least in the context of MRDs (Dionand Wilson, 2009).Concerning genomic rearrangements, nonhomologous
end-joining (NHEJ), a DSB repair pathway that can beerror-prone is believed to act on DSBs induced at sites ofnon-B DNA structures (Kha et al., 2010). These DSBs arelikely to result from the exposure, on non-B DNA for-mation, of single-stranded nucleotides to oxidative damageby endogenous reactive oxygen species or the activity ofsingle-stranded-specific endonucleases. See also: Trinu-cleotide Repeat Expansions: Mechanisms and DiseaseAssociations
Prospects for the Future
The relevance of unusual or aberrant DNA structures tohuman genetic disease has extended our interest in this areafrom the narrow confines of the nucleic acid biochemistry/physicochemical laboratory to a much wider communityoperating in medicine, clinical diagnostics, genetics andbioinformatics, thereby propelling the field into main-stream biomedical research. Paradoxically, the transientnature of DNA secondary structures remains a challengefor the development ofmethods aimed at identifying non-BDNA structures in living human cells.The advent of aCGH is likely to reveal more instances of
the co-localisation of repetitive DNA with CNVs, both inthe context of phenotypic variation, as well as in the sphereof susceptibility to infectious disease, cancer and inheriteddisease. As more information is acquired from mousemodels on the timing of repeat instability during develop-ment and its relationships with chromatin remodelling,CpG methylation-linked epigenetic reprogramming, andDNA damage and repair, experiments can be designed toaddress the biochemical pathways involved in non-B-dir-ected genetic instability.Considerable effort is currently being invested in
sequencing cancer genomes. This information will be cru-cial to address the question as to whether non-B DNA isalso responsible for at least some of the genomicrearrangements associated with cancer. The formation oftriplex DNA through the delivery of triplex-forming
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 9
Table 2 Microsatellite repeat diseases (MRDs). Type 1 and type 2 microsatellite repeat diseases
Disease Gene
Chromosome
location Amplet
Normal copy
length or
number
Expanded
copy length or
number Location
Disease
symbol
Type 1 diseases
FRAXE-associated mental
retardation
AFF2 Xq28 CCG 4–39 4200 5’-UTR FRAXE
Spinocerebellar ataxia 10 ATXN10 22q13.31 ATTCT 529 4800 Intron 1 SCA10
Spinocerebellar ataxia 8 ATXN8OS 13q21 CTG 15–50 110–130 3’-UTR SCA8
Jacobsen syndrome CBL2 11q23.3 CCG 11 700–800 ? FRA11B
Myotonic dystrophy type 2 CNBP 3q21 CCTG 104–176 bp 75–11 000 Intron 1 DM2
Epilepsy progressive myoclonic CSTB 21q2.3 CCCCGCCCCGCG 2–3 30–75 Promoter EPM1
Mental retardation DIP2B 12q13.13 CGG 6–23 250–285 5’-UTR FRA12A
Myotonic dystrophy type 1 DMPK 19q13.32 CTG 530 50–2000 3’-UTR DM1
Fragile X mental retardation
syndrome
FMR1 Xq27.3 CGG 5–52 4200 5’-UTR FXS
Fragile X-associated tremor
ataxia syndrome
FMR1 Xq27.3 CGG 555 55–200 5’-UTR FXTAS
Friedreich ataxia FXN 9q21.11 GAA 7–20 200–900 Intron 1 FA
Spinocerebellar ataxia 12 PPP2R2B 5q32 CAG 7–31 55–78 5’-UTR SCA12
Cataract formation in myotonic
dystrophy
SIX5 19q13.32 CTG 5–37 � 50 Promoter SIX5
Spinocerebellar ataxia 31 TK2/BEAN 16q22 TGGAA 0 110 Intron SCA31
No
n-B
DN
AStru
cture
and
Mu
tation
sC
ausin
gH
um
anG
enetic
Disease
EN
CYC
LOPED
IAO
FLIFE
SC
IEN
CES&
2010,Jo
hn
Wile
y&
Son
s,Ltd
.w
ww
.els.n
et
10
Type 2 diseases
Spino-bulbar muscular atrophy
(Kennedy disease)
AR Xq12 CAG 17–26 40–52 Coding region SBMA
Mental retardation and epilepsy ARX Xp21.3 GCG 10–16 17–33 Exon 1 XLMR
Dentatorubro-pallidoluysian
atrophy (Haw river fever)
ATN1 12p13.31 CAG 3–36 48–93 Coding region DRPLA
Spinocerebellar ataxia 1 ATXN1 6p22.3 CAG 6–44 37–91 Coding region SCA1
Spinocerebellar ataxia 2 ATXN2 12q24.12 CAG 14–31 32–500 Coding region SCA2
Machado–Joseph disease ATXN3 14q32.12 CAG 13–47 53–86 Coding region SCA3
Spinocerebellar ataxia 7 ATXN7 3p14.1 CAG 7–35 36–300 Coding region SCA7
Spinocerebellar ataxia 6 CACNA1A 19p13.13 CAG 4–18 19–33 Coding region SCA6
Blepharophimosis syndrome
and premature ovarian failure 3
FOXL2 3q22.3 GCN 14 22–24 Coding region BPES
Hand-foot-genital syndrome HOXA13 7p15.2 GCN 14–18 22–30 Coding region HFGS
Brachydactyly syndactyly
syndrome
HOXD13 2q31.1 GCN 15 22–29 Coding region SPD
Huntington disease HTT 4p16.3–4p16.2 CAG 535 40–400 Coding region HD
Huntington disease-like 2 JPH3 16q24.2 CTG 6–27 44–57 Exon 1 HDL2
Oculopharyngeal muscular
dystrophy
PABPN1 14q11.2 GCG 6 7–13 Coding region OPMD
Congenital central
hypoventilation syndrome
PHOX2B 4p13 GCN 20 25–33 Coding region CCHS
Cleidocranial dysplasia RUNX2 6p12.3 GCK 17 27 Coding region CCD
Spinocerebellar ataxia 17 TBP 6q27 CAG 25–42 45–63 Coding region SCA17
Holoprosencephaly ZIC2 13q32.3 GCN 15 25 Coding region HPA
Source: With kind permission from Springer Science+Business Media (Zhao et al., 2010).
No
n-B
DN
AStru
cture
and
Mu
tation
sC
ausin
gH
um
anG
enetic
Disease
EN
CYC
LOPED
IAO
FLIFE
SC
IEN
CES&
2010,Jo
hn
Wile
y&
Son
s,Ltd
.w
ww
.els.n
et
11
oligonucleotides (TFOs) is being pursued with the aim ofknocking-down gene expression or directly inactivatingtarget genes (Jain et al., 2008). This technology has thepotential to complement the arsenal of tools currently usedto combat cancer. Finally, studies aimed at developingreagents that can identify non-B DNA structures in cellsare ongoing. Major advances are therefore to be expectedin the coming years. See also: Chromosome 22; Compara-tive Genetics of Trinucleotide Repeats in the Human and
ApeGenomes; DNAMismatch Repair: Eukaryotic; DNARecombination; DNAStructure: A-, B- and Z-DNAHelixFamilies; DNA Structure: Sequence Effects; DNA Topol-ogy:Fundamentals; EukaryoticRecombination: Initiationby Double-strand Breaks; Human-specific Changes ofGenome Structure; Microdeletions and Microduplica-tions: Mechanism; Microsatellites; Microsatellite Instabil-ity; Nucleic Acids: General Properties; RecombinationalDNA Repair in Eukaryotes; Relevance of Copy Number
MBNL1
CUGBP1
Normal(adult)
DM1(embryonic)
Nucleus
STOP
CIC-1
5cTNT
IR Insulinresistance
Myotonia
Cardiacdefects ?
CIC-1
cTNT
11IR
PCUGBP1
(b)
CUG
CUG
CUG
CUG
CUG
CUG
CUG
CUGCUGCUGCUGCUG
CU
G
DMPK 3′-UTR
MBNL1
Alternativesplicing
Sequestration
?PKC
activation
PCUGBP1
PCUGBP 1
Increasedsteady state
level
Alternativesplicing
mRNAtranslation
mRNAdecay
MBNL1 loss-of-function CUGBP1 gain-of-function
(a)
Figure 4 Triplet repeat expansion alters mRNA function. (a) In DM1, CTG expansion in the 3’-UTR of the DMPK gene causes the ensuing mRNA to fold into
a large and stable double-stranded hairpin stabilised by U.U and G.C base pairs, which recruits muscleblind-like (Drosophila) (MBNL1), a mediator of pre-
mRNA alternative splicing regulation. CUG-hairpins also stimulate CUG RNA-binding protein 1 (CUGBP1) hyperphosphorylation and stabilisation, which
alter several events related to alternative splicing, mRNA tanslation and mRNA decay. (b) Sequestration of MBNL1 and CUGBP1 activation shift alternative
splicing programs from the adult stage towards embryonic-specific patterns, including activation of exon 5 inclusion of cardiac isoforms of TNNT2 (cTNT)
during heart remodeling, exclusion of exon 11 in the insulin receptor (IR) pre-mRNA and inclusion of stop-containing exon in chloride channel 1 transcripts.
Adapted from Lee and Cooper (2009) with kind permission by Portland Press Ltd. Copyright & the Biochemical Society.
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net12
Variation to Human Genetic Disease; Simple Repeats;Structural Diversity of the Human Genome and DiseaseSusceptibility; Telomeric and Subtelomeric RepeatSequences;TrinucleotideRepeatExpansions:Mechanismsand Disease Associations
Acknowledgements
This work was supported by NIH/NCI Grant CA093729to Karen M Vasquez and in part with federal funds fromthe National Cancer Institute, National Institutes ofHealth, under Contract No. HHSN261200800001E toAlbino Bacolla. The content of this publication does notnecessarily reflect the viewsof policies of theDepartment ofHealth and Human Services, nor does mention of tradenames, commercial products or organisations implyendorsement by the U.S. Government.
References
Bacolla A, Collins JR, Gold B et al. (2006) Long homo-
purine�homopyrimidine sequences are characteristic of genes
expressed in brain and the pseudoautosomal region. Nucleic
Acids Research 34: 2663–2675.
Bacolla A, Jaworski A, Larson JE et al. (2004) Breakpoints of
gross deletions coincide with non-B DNA conformations.
Proceedings of the National Academy of Sciences of the USA
101: 14162–14167.
Bacolla A, Larson JE and Collins JR (2008) Abundance and
length of simple repeats in vertebrate genomes are determined
by their structural properties.Genome Research 18: 1545–1553.
Bacolla A and Wells RD (2004) Non-B DNA conformations,
genomic rearrangements, and human disease. Journal of Bio-
logical Chemistry 279: 47411–47414.
BenaF,Gimelli S,Migliavacca E et al. (2010) A recurrent 14q32.2
microdeletion mediated by expanded TGG repeats. Human
Molecular Genetics 19: 1967–1973.
Bonaglia MC, Giorda R, Massagli A et al. (2009) A familial
inverted duplication/deletion of 2p25.1-25.3 provides new clues
on the genesis of inverted duplications. European Journal of
Human Genetics 17: 179–186.
Brouwer JR, Willemsen R and Oostra BA (2009) Microsatellite
repeat instability and neurological disease. BioEssays 31: 71–83.
Burrows CJ and Muller JG (1998) Oxidative nucleobase modifi-
cations leading to strand scission. Chemical Reviews 98: 1109–
1152.
ChristopheD,Cabrer B, BacollaA et al. (1985)Anunusually long
poly(purine)-poly(pyrimidine) sequence is located upstream
from the human thyroglobulin gene.Nucleic Acids Research 13:
5127–5144.
ChuzhanovaN,Chen JM,BacollaA et al. (2009)Gene conversion
causing human inherited disease: evidence for involvement of
non-B-DNA-forming sequences and recombination-pro-
moting motifs in DNA breakage and repair. Human Mutation
30: 1189–1198.
ConradDF,PintoD,RedonR et al. (2010)Origins and functional
impact of copy number variation in the human genome.Nature
464: 704–712.
Dion V andWilson JH (2009) Instability and chromatin structure
of expanded trinucleotide repeats. Trends in Genetics 25:
288–297.
Du Z, Zhao Y and Li N (2008) Genome-wide analysis reveals
regulatory role of G4 DNA in gene transcription. Genome
Research 18: 233–241.
Felsenfeld G and Rich A (1957) Studies on the formation of two-
and three-stranded polyribonucleotides. Biochimica et Biophy-
sica Acta 26: 457–468.
FernandoH, Sewitz S,Darot J et al. (2009)Genome-wide analysis
of a G-quadruplex-specific single-chain antibody that regulates
gene expression. Nucleic Acids Research 37: 6716–6722.
Gessner RV, Frederick CA, Quigley GJ, Rich A and Wang AH
(1989) The molecular structure of the left-handed Z-DNA
double helix at 1.0-A atomic resolution. Geometry, conform-
ation, and ionic interactions of d(CGCGCG). Journal of Bio-
logical Chemistry 264: 7921–7935.
Glickman BW and Ripley LS (1984) Structural intermediates
of deletion mutagenesis: a role for palindromic DNA. Pro-
ceedings of the National Academy of Sciences of the USA 81:
512–516.
Ha SC, Lowenhaupt K, Rich A, Kim YG and Kim KK (2005)
Crystal structure of a junction between B-DNA and Z-DNA
reveals two extruded bases. Nature 437: 1183–1186.
HannanAJ (2010) Tandem repeat polymorphisms:modulators of
disease susceptibility and candidates for ‘missing heritability’.
Trends in Genetics 26: 59–65.
Huppert JL, Bugaut A, Kumari S and Balasubramanian S (2008)
G-quadruplexes: the beginning and end ofUTRs.Nucleic Acids
Research 36: 6260–6268.
Pms2
Ogg1
Somatic
Msh2
Msh3
AtrAtr
Dnmt1
Germline
Msh2
Msh3
Embryo
Msh2
Msh3
Figure 5 Modulators of triplet repeat expansion in mouse models of
MRDs. The mismatch repair proteins Msh2 and Msh3 are required for
triplet repeat instability throughout all developmental stages. DNA
cytosine-5-methyltransferase 1 (Dnmt1) protects against expansions in an
expanded CAG model in the germline. Ataxia talangiectasia and Rad3
related (Atr) protein prevent expansion in the female germline and in
somatic tissues of a CGG repeat mouse model. Post-meiotic segregation
increased 2 (Pms2) and 8-oxoguanine glycosylase (Ogg1) are selectively
involved in age-dependent somatic instability. Reproduced from Dion and
Wilson (2009) with permission from Elsevier.
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 13
Jain A, Wang G and Vasquez KM (2008) DNA triple helices:
biological consequences and therapeutic potential. Biochimie
90: 1117–1130.
Kha DT, Wang G, Natrajan N, Harrison L and Vasquez KM
(2010) Pathways for double-strand break repair in genetically
unstable Z-DNA-forming sequences. Journal of Molecular
Biology 398: 471–480.
Kornreich R, Bishop DF and Desnick RJ (1990) Alpha-gala-
ctosidase A gene rearrangements causing Fabry disease. Iden-
tification of short direct repeats at breakpoints in an Alu-rich
gene. Journal of Biological Chemistry 265: 9319–9326.
Kurahashi H and Emanuel BS (2001) Long AT-rich palindromes
and the constitutional t(11;22) breakpoint. Human Molecular
Genetics 10: 2605–2617.
Kurahashi H, Inagaki H, Ohye T et al. (2010) The constitutional
t(11;22): implications for a novel mechanism responsible for
gross chromosomal rearrangements. Clinical Genetics. [Epub
ahead of print; doi: 10.1111/j.1399-0004.2010.01445.x].
Lander ES, Linton LM, Birren B et al. (2001) Initial sequencing
and analysis of the human genome. Nature 409: 860–921.
Lange J, Skaletsky H, van Daalen SK et al. (2009) Isodicentric
Y chromosomes and sex disorders as byproducts of homolo-
gous recombination that maintains palindromes. Cell 138:
855–869.
Lee JE and Cooper TA (2009) Pathogenic mechanisms of
myotonic dystrophy. Biochemical Society Transactions 37:
1281–1286.
Liu J, Perumal NB, Oldfield CJ et al. (2006) Intrinsic disorder in
transcription factors. Biochemistry 45: 6873–6888.
Lopez Castel A, Cleary JD and Pearson CE (2010) Repeat
instability as the basis for human diseases and as a potential
target for therapy. Nature Reviews. Molecular Cell Biology 11:
165–170.
Lyamichev VI, Mirkin SM and Frank-Kamenetskii MD (1985)
A pH-dependent structural transition in the homopurine-
homopyrimidine tract in superhelical DNA. Journal of Bio-
molecular Structure and Dynamics 3: 327–338.
Messaed C and Rouleau GA (2009) Molecular mechanisms
underlying polyalanine diseases. Neurobiology of Disease 34:
397–405.
Mirkin SM (2007) Expandable DNA repeats and human disease.
Nature 447: 932–940.
Neidle S (2009) The structures of quadruplex nucleic acids and
their drug complexes. Current Opinion in Structural Biology 19:
239–250.
PerryGH, Ben-Dor A, TsalenkoA et al. (2008) The fine-scale and
complex architecture of human copy-number variation.
American Journal of Human Genetics 82: 685–695.
Quemener S, Chen JM, Chuzhanova N et al. (2010) Complete
ascertainment of intragenic copy numbermutations (CNMs) in
the CFTR gene and its implications for CNM formation at
other autosomal loci. Human Mutation 31: 421–428.
Quental R, Azevedo L, Rubio V, Diogo L and Amorim A (2009)
Molecular mechanisms underlying large genomic deletions in
ornithine transcarbamylase (OTC) gene. Clinical Genetics 75:
457–464.
Repping S, Skaletsky H, Lange J et al. (2002) Recombination
between palindromes P5 and P1 on the human Y chromosome
causes massive deletions and spermatogenic failure. American
Journal of Human Genetics 71: 906–922.
Rooms L, Reyniers E and Kooy RF (2007) Diverse chromosome
breakage mechanisms underlie subtelomeric rearrangements, a
common cause of mental retardation. Human Mutation 28:
177–182.
Rozen S, Skaletsky H,Marszalek JD et al. (2003) Abundant gene
conversion between arms of palindromes in human and ape Y
chromosomes. Nature 423: 873–876.
Skaletsky H, Kuroda-Kawaguchi T, Minx PJ et al. (2003) The
male-specific region of the humanY chromosome is amosaic of
discrete sequence classes. Nature 423: 825–837.
Uversky VN, Oldfield CJ, Midic U et al. (2009) Unfoldomics of
human diseases: linking protein intrinsic disorder with diseases.
BMC Genomics 10(suppl. 1): S7.
Van Raay TJ, Burn TC, Connors TD et al. (1996) A 2.5 kb
polypyrimidine tract in the PKD1 gene contains at least 23 H-
DNA-forming sequences.Microbial andComparativeGenomics
1: 317–327.
Vissers LE, Bhatt SS, Janssen IM et al. (2009) Rare pathogenic
microdeletions and tandem duplications are microhomology-
mediated and stimulated by local genomic architecture.Human
Molecular Genetics 18: 3579–3593.
Wang G and Vasquez KM (2004) Naturally occurring H-DNA-
forming sequences are mutagenic in mammalian cells. Pro-
ceedings of the National Academy of Sciences of the USA 101:
13448–13453.
Warburton PE, Giordano J, Cheung F, Gelfand Y and Benson G
(2004) Inverted repeat structure of the human genome: the
X-chromosome contains a preponderance of large, highly
homologous inverted repeats that contain testes genes.Genome
Research 14: 1861–1869.
Watson JD and Crick FH (1953) Molecular structure of nucleic
acids; a structure for deoxyribose nucleic acid. Nature 171:
737–738.
Wells RD and Ashizawa T (2006) Genetic Instabilities and
Neurological Diseases, 2nd edn. San Diego, CA: Elsevier/Aca-
demic Press.
Wells RD, Collier DA, Hanvey JC, Shimizu M and Wohlrab F
(1988) The chemistry and biology of unusual DNA structures
adopted by oligopurine.oligopyrimidine sequences. FASEB
Journal 2: 2939–2949.
Zhang F, Seeman P, Liu P et al. (2010) Mechanisms for non-
recurrent genomic rearrangements associated with CMT1A or
HNPP: rare CNVs as a cause for missing heritability.American
Journal of Human Genetics 86: 892–903.
Zhao J, BacollaA,WangGandVasquezKM(2010)Non-BDNA
structure-induced genetic instability and evolution.Cellular and
Molecular Life Sciences 67: 43–62.
Zheng M, Huang X, Smith GK, Yang X and Gao X (1996)
Genetically unstableCXG repeats are structurally dynamic and
have a high propensity for folding. An NMR and UV spec-
troscopic study. Journal of Molecular Biology 264: 323–336.
Further Reading
Balasubramanian S and Neidle S (2009) G-quadruplex nucleic
acids as therapeutic targets. Current Opinion in Chemical Biol-
ogy 13: 345–353.
D’Angelo CS, Gajecka M, Kim CA et al. (2009) Further delin-
eation of nonhomologous-based recombination and evidence
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net14
for subtelomeric segmental duplications in 1p36 rearrange-
ments. Human Genetics 125: 551–563.
Karlin S, Brocchieri L, Bergman A, Mrazek J and Gentles AJ
(2002) Amino acid runs in eukaryotic proteomes and disease
associations.Proceedings of theNationalAcademyofSciences of
the USA 99: 333–338.
Orr HT and Zoghbi HY (2007) Trinucleotide repeat disorders.
Annual Review of Neuroscience 30: 575–621.
Rich A and Zhang S (2003) Timeline: Z-DNA: the long road to
biological function. Nature Reviews. Genetics 4: 566–572.
Sinden RR (1994) DNA Structure and Function. San Diego:
Academic Press.
Stankiewicz P and Lupski JR (2010) Structural variation in the
human genome and its role in disease. Annual Review of Medi-
cine 61: 437–455.
Wang G, Christensen LA and Vasquez KM (2006) Z-DNA-
forming sequences generate large-scale deletions inmammalian
cells. Proceedings of the National Academy of Sciences of the
USA 103: 2677–2682.
Wang G and Vasquez KM (2009) Models for chromosomal rep-
lication-independent non-B DNA structure-induced genetic
instability. Molecular Carcinogenesis 48: 286–298.
Non-B DNA Structure and Mutations Causing Human Genetic Disease
ENCYCLOPEDIA OF LIFE SCIENCES & 2010, John Wiley & Sons, Ltd. www.els.net 15