copy number variation

33
Copy Number Variation in Human Health, Disease, and Evolution Feng Zhang, 1 Wenli Gu, 1,5 Matthew E. Hurles, 2 and James R. Lupski 1,3,4 1 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030 2 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom 3 Department of Pediatrics, Baylor College of Medicine, Houston, Texas 77030 4 Texas Children’s Hospital, Houston, Texas 77030 5 Institute of Human Genetics, Ludwig-Maximilians-University, School of Medicine, Munich 80336, Germany; email: [email protected] Annu. Rev. Genomics Hum. Genet. 2009. 10:451–81 The Annual Review of Genomics and Human Genetics is online at genom.annualreviews.org This article’s doi: 10.1146/annurev.genom.9.081307.164217 Copyright c 2009 by Annual Reviews. All rights reserved 1527-8204/09/0922-0451$20.00 Key Words FoSTeS, genomic disorder, genomotype/phenotype correlations, MMBIR, NAHR, NHEJ Abstract Copy number variation (CNV) is a source of genetic diversity in hu- mans. Numerous CNVs are being identified with various genome analysis platforms, including array comparative genomic hybridiza- tion (aCGH), single nucleotide polymorphism (SNP) genotyping plat- forms, and next-generation sequencing. CNV formation occurs by both recombination-based and replication-based mechanisms and de novo locus-specific mutation rates appear much higher for CNVs than for SNPs. By various molecular mechanisms, including gene dosage, gene disruption, gene fusion, position effects, etc., CNVs can cause Mendelian or sporadic traits, or be associated with complex diseases. However, CNV can also represent benign polymorphic variants. CNVs, especially gene duplication and exon shuffling, can be a predominant mechanism driving gene and genome evolution. 451 Annu. Rev. Genom. Human Genet. 2009.10:451-481. Downloaded from arjournals.annualreviews.org by Indiana University - Purdue University Indianapolis - School of Medicine on 04/19/10. For personal use only.

Upload: maria-ileana-leon

Post on 20-Apr-2017

216 views

Category:

Documents


0 download

TRANSCRIPT

ANRV386-GG10-21 ARI 19 August 2009 20:44

Copy Number Variationin Human Health, Disease,and EvolutionFeng Zhang,1 Wenli Gu,1,5 Matthew E. Hurles,2

and James R. Lupski1,3,4

1Department of Molecular and Human Genetics, Baylor College of Medicine, Houston,Texas 770302Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB101SA, United Kingdom3Department of Pediatrics, Baylor College of Medicine, Houston, Texas 770304Texas Children’s Hospital, Houston, Texas 770305Institute of Human Genetics, Ludwig-Maximilians-University, School of Medicine,Munich 80336, Germany; email: [email protected]

Annu. Rev. Genomics Hum. Genet. 2009.10:451–81

The Annual Review of Genomics and Human Geneticsis online at genom.annualreviews.org

This article’s doi:10.1146/annurev.genom.9.081307.164217

Copyright c© 2009 by Annual Reviews.All rights reserved

1527-8204/09/0922-0451$20.00

Key Words

FoSTeS, genomic disorder, genomotype/phenotype correlations,MMBIR, NAHR, NHEJ

AbstractCopy number variation (CNV) is a source of genetic diversity in hu-mans. Numerous CNVs are being identified with various genomeanalysis platforms, including array comparative genomic hybridiza-tion (aCGH), single nucleotide polymorphism (SNP) genotyping plat-forms, and next-generation sequencing. CNV formation occurs byboth recombination-based and replication-based mechanisms and denovo locus-specific mutation rates appear much higher for CNVs thanfor SNPs. By various molecular mechanisms, including gene dosage,gene disruption, gene fusion, position effects, etc., CNVs can causeMendelian or sporadic traits, or be associated with complex diseases.However, CNV can also represent benign polymorphic variants. CNVs,especially gene duplication and exon shuffling, can be a predominantmechanism driving gene and genome evolution.

451

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

CNV: copy numbervariation

SV: structuralvariation

NAHR: nonallelichomologousrecombination

NHEJ:nonhomologousend-joining

FoSTeS: fork stallingand template switching

INTRODUCTION

The tremendous variability of human genomeshighlights the almost absurd notion of a singlereference human genome sequence. Watson-Crick base-pair changes have long been wellknown; those with frequency >1% are referredto as single nucleotide polymorphisms (SNPs).Millions of SNPs have been revealed in humanpopulations, and many more are being discov-ered with the determination of each personalgenome sequence (Table 1) (10, 81, 159, 174,177). Although numerous rare point mutationsare known to cause sporadic disease, and SNPsare associated with common human diseases(http://www.genome.gov/gwastudies/), fewgene rearrangements had been associated withdisease traits until relatively recently. For ex-ample, it was not until the early 1990s that rel-atively large (>1 Mb) but submicroscopic ge-nomic duplications and deletions causing geneCNV (copy number variation) were shown tocause Mendelian traits (16, 97, 99). CNVs likeSNPs can also represent benign polymorphicvariants present in >1% of the population.However, the extent to which large duplica-tion and deletion CNVs contribute to humangenetic diversity, and may or may not conveyphenotypes, is still being unraveled.

In 2004, with the advent of genome-wideanalysis tools that could be used to interrogateDNA content, two studies revealed that copynumber variations (CNVs) [i.e., DNA segmentsthat present at variable copy number in com-parison to a reference genome with the usual

Table 1 Copy number variation (CNV) versus single nucleotide polymorphism (SNP)

CNV (Database of Genomic Variants,http://projects.tcag.ca/variation/) SNP (dbSNP, http://www.ncbi.nlm.nih.gov/SNP/)

Total number 38,406a (Mar 11, 2009) 14,708,752 (Build 129)Size 100 bp to 3 Mb Mostly 1 bpType Deletion, duplication, complex Transition, transversion, short deletion, short insertionEffects on genes Gene dosage, interruption, etc. Missense, nonsense, frameshift, splice sitePercentage of the referencegenome covered

29.74%b <1%

aIncluding inversions.bThis value may be overestimated owing to the resolution limitation of the technologies (e.g., BAC aCGH) used in CNV screening but also could beunderestimated because of the inability to resolve smaller CNVs (1- to 20-kb range) and the limitations of the current reference genome.

copy number of N = 2 (35)] are widespreadin human genomes and represent a significantsource of genetic variation (57, 138). With theaid of genome-wide technologies of higher res-olution, over 38,000 CNVs (>100 bp in size)and many other structural variations (SVs, in-cluding balanced inversions and translocations)have been reported (Table 1). In terms of to-tal bases involved, SVs may account for moredifferences among individuals than do SNPs(132). In addition, CNVs appear to have a muchhigher de novo locus-specific mutation ratethan SNPs based on estimations from CNV-associated disease prevalence (92) and observa-tions from PCR assays of pooled sperm DNA(166).

How has the human genome come tohave such a great amount of CNV? Tworecombination-based mechanisms, i.e., nonal-lelic homologous recombination (NAHR) andnonhomologous end-joining (NHEJ) (98), andretrotransposition (63, 64, 68, 181), have beenimplicated in genomic rearrangements andthe formation of CNVs. Recently, a novelreplication-based mechanism, fork stalling andtemplate switching (FoSTeS), has been pro-posed to account for the observed complex ge-nomic rearrangements that cannot be readilyexplained by NAHR, NHEJ, or retrotransposi-tion (78). Notably, breakpoint sequencing dataalso suggest that a portion of CNV likely occursby a mechanism consistent with FoSTeS (125).

In addition to their association with spo-radic and Mendelian diseases in humans, CNVs

452 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

have also recently been shown to be associatedwith human complex traits such as susceptibil-ity to HIV infection, autism, and schizophrenia(Table 2). Since CNV can encompass part orall of a gene, or be a genomic segment contain-ing several genes, some CNVs are likely to havea role in the alteration of human physiologi-cal functions, Thus, as well as causing disease,human-specific CNVs may be responsible forthe emergence of advantageous human-specifictraits, such as cognition and endurance running(24, 91). They therefore are likely to be subjectto evolutionary pressures such as selection aswell as genetic drift (119).

COPY NUMBER VARIATION ANDSTRUCTURAL VARIATION: THEPHENOMENON

In 2004, two studies reported that CNVs ofmany large DNA genomic segments exist be-tween normal human individuals. Sebat et al.(138) employed a technology termed ROMA(representational oligonucleotide microarrayanalysis) with 85,000 interrogating probes withan average spacing of 35 kb to study the large-scale (>100-kb) copy number differences be-tween 20 normal individuals. In total, 221 copynumber changes were detected at 76 CNVloci. Using a BAC (bacterial artificial chro-mosome) CGH array (127) with resolution of∼1 Mb, Iafrate et al. (57) investigated large-scale CNVs in 55 unrelated individuals andidentified 255 clones with copy number gainor loss. Interestingly, 102 (41%) of these copynumber changes are observed in more than oneindividual (57), which suggests that these vari-ations are not rare but may represent polymor-phic variations. Furthermore, examination ofthe genomic content of such CNVs revealedthat these genomic regions are not “junk” butinclude many functional genes involved in reg-ulation of cell growth and metabolism (57), thusimplicating CNVs in human traits, disease, andevolution.

These two early studies provided evidencefor a more expanded view of human ge-netic variation. However, owing to the limited

ROMA:representationaloligonucleotidemicroarray analysis

BAC: bacterialartificial chromosome

DGV: Database ofGenomic Variants

subject sample size, assay resolution, andgenome coverage, the newly discovered CNVsdetected only a small fraction of the CNVs ex-pected to exist in human genomes. In subse-quent years, many additional studies using amultitude of different high-resolution genomeanalysis platforms have advanced our knowl-edge of CNVs, and a comprehensive CNV mapis beginning to emerge.

SNP genotyping data have also been used toinvestigate deletion polymorphisms. By assum-ing that genotypes of SNPs included in CNVswill violate the rules of Mendelian transmis-sion, as initially determined for the CMT1Aduplication CNV (97, 101). Conrad et al. (18)examined the transmission patterns of SNPgenotypes in 30 European-derived trios (i.e.mother, father, and child) from Utah and in 30Yoruba African trios. A total of 586 deletion lociwere identified, ranging in size from 300 bp to1200 kb (18). Intriguingly, the size versus fre-quency distribution of these deletions followsan L-shaped curve; i.e., there are more smalldeletions and fewer large ones (18). A simi-lar CNV distribution pattern was also found inother studies (Figure 1).

In a related study, McCarroll et al. (102)investigated the physically clustered patternsof null genotypes, apparent Mendelian incon-sistencies and Hardy-Weinberg disequilibriumfor the Phase I data (1.3 million SNPs, 269 in-dividuals) of the International HapMap Project(159). They identified 541 deletion variantsranging from 1 kb to 745 kb in size, 278 of whichare detected in multiple individuals. Hinds et al.(54) directly examined on resequencing arraysthe reduced intensity of the hybridization signalcaused by deletions and identified 215 relativelysmall deletions (70 bp to 10 kb) in 24 unrelatedindividuals. The deletion variations identifiedin these three studies added over 1000 CNVs(26) to a database where structural variation iscataloged (the Database of Genomic Variantsor DGV, http://projects.tcag.ca/variation/).

In 2006, Redon et al. (132) constructeda first-generation CNV map of the humangenome. They employed both SNP genotypingarrays (Affymetrix GeneChip Human Mapping

www.annualreviews.org • CNV in Human Health, Disease, Evolution 453

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

Table 2 Examples of copy number variations (CNVs) and conveyed genomic disordersa

Phenotype OMIM Locus CNV Referencesa

Mendelian (autosomal dominant)b

Williams-Beuren syndrome 194050 7q11.23 del S47q11.23 duplication syndrome 609757 7q11.23 dup S62Spinocerebellar ataxia type 20 608687 11q12 dup S30Smith-Magenis syndrome 182290 17p11.2/RAI1 del S13Potocki-Lupski syndrome 610883 17p11.2 dup S49HNPP 162500 17p12/PMP22 del S11CMT1A 118220 17p12/PMP22 dup S41Miller-Dieker lissencephaly syndrome 247200 17p13.3/LIS1 del S10, S50Mental retardation 601545 17p13.3/LIS1 dup S6DGS/VCFS 188400/192430 22q11.2/TBX1 del S16, S55Microduplication 22q11.2 608363 22q11.2 dup S17, S47, S75Adult-onset leukodystrophy 169500 LMNB1 dup S48Mendelian (autosomal recessive)Familial juvenile nephronophthisis 256100 2q13/NPHP1 del S31, S53Gaucher disease 230800 1q21/GBA del S5Pituitary dwarfism 262400 17q24/GH1 del S9, S24Spinal muscular atrophy 253300 5q13/SMN1 del S43, S51beta-thalassemia 141900 11p15/beta-globin del S29alpha-thalassemia 141750 16p13.3/HBA del S25Mendelian (X-linked)Hemophilia A 306700 F8 inv/del S2Hunter syndrome 309900 IDS del/inv S8, S70, S72Ichthyosis 308100 STS del S56Mental retardation 300706 HUWE1 dup S21Pelizaeus-Merzbacher disease 312080 PLP1 del/dup/tri S14, S28, S37, S38,

S71Progressive neurological symptoms(MR+SZ)

300260 MECP2 dup S3, S15, S65

Red-green color blindness 303800 opsin genes del S46Complex traitsAlzheimer disease 104300 APP dup S52Autism 612200 3q24 inherited homozygous del S45

611913 16p11.2 del/dup S34, S42, S54, S68Crohn disease 266600 HBD-2 copy number loss S20

612278 IRGM del S44HIV susceptibility 609423 CCL3L1 copy number loss S23, S33Mental retardation 612001 15q13.3 del S58

610443 17q21.31 del S32, S57, S59

300534 Xp11.22 dup S21Pancreatitis 167800 PRSS1 tri S36Parkinson disease 168600 SNCA dup/tri S12, S19, S22, S27,

S61(Continued )

454 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

Table 2 (Continued )

Phenotype OMIM Locus CNV Referencesa

Psoriasis 177900 DEFB copy number gain S26Schizophrenia 612474 1q21.1 del S63, S64, S67

181500 15q11.2 del S63

612001 15q13.3 del S63, S64Systemic lupus erythematosus 152700 FCGR3B copy number loss S1, S18, S69

120810 C4 copy number loss S74

aFor the supplemental references, follow the Supplemental Material link from the Annual Reviews home page at http://www.annualreviews.org.bAlso de novo sporadic.

500K, 474642 SNPs) for comparative analysisof hybridization intensities and Whole GenomeTilePath BAC arrays (WGTP, 26574 large in-sert BAC clones) for CGH in 270 HapMapindividuals (159). A total of 1447 CNV re-gions were identified and approximately halfof these were detected in multiple individuals(132).

The CNVs included in the DGV databaseare reported to account for 29.7% of the hu-man genome (Table 1). However, in some ofthe initial studies, CNV size was often overes-timated because of the relative low resolution ofsome platforms (e.g., BAC arrays) used in CNVscreening. With the aid of high-resolution(∼1 kb) aCGH, Perry et al. (125) studied 2191known CNV regions in 30 individuals from4 HapMap populations. They detected copynumber changes in 1153 loci and narrowed theboundaries of 1020 (88%) CNV regions (125).Reduced CNV sizes were also reported in an-other study of McCarroll et al. (104) on theHapMap cohort via hybrid SNP-CNV geno-typing arrays. Hence, the large-scale CNVsmay affect less than previously proposed andCNVs perhaps encompass 5% of the humangenome (104), although this estimate is stillchallenged by methods that do not resolve endpoints to the base-pair level and by arrays thatare based on a reference genome that is limitedin scope.

In addition to the array-based platforms(SNP or CGH arrays), CNV can also be in-vestigated by DNA sequencing. By mining in-sertion and deletion polymorphisms from splitcapillary reads of DNA resequencing traces that

CNV size distributions

2.0

1.5

1.0

0.5

0

Den

sity

Log10 (CNV length)2 3 4 5 6

Venter >100

Watson >100

Kidd deletions

Korbel deletions

Redon-500K

Figure 1Size distribution of copy number variations (CNVs) larger than 100 bp. Red,CNVs in the Venter genome (81); purple, CNVs in the Watson genome (177);blue, CNVs in Redon et al. (132); green, deletion CNVs in Korbel et al. (68);yellow, deletion CNVs in Kidd et al. (64). Note the greater detection ofsmaller-sized CNVs with higher-resolution genome analysis.

aCGH: arraycomparative genomichybridization

were generated by shotgun sequencing the ge-nomic DNA of 36 individuals, Mills et al. (110)discovered 415,436 nonredundant indels andCNVs, ranging from 1 to 9989 bp in size.

Balanced SVs such as inversions that cannotbe identified by CGH are detectable bypaired-end sequencing techniques. Eichler andcolleagues (167) first compared fosmid (a phagecloning vector with DNA packaging limited to∼40 kb) DNA sequences from a library con-structed from the genomic DNA of individualNA15510 and identified 297 potential SVsvarying in size from 8 kb to 1.9 Mb. In a related

www.annualreviews.org • CNV in Human Health, Disease, Evolution 455

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

PEM: paired-endmapping

LCR: low-copy repeat

PRIS: potentialrecombinogenicinverted sequences

study (64), Eichler and colleagues constructednew fosmid libraries from eight HapMap sam-ples (four Yoruba Africans and four non-Africanindividuals) and sequenced both ends (termedpaired-end sequencing), of approximately onemillion clones per genome. Combined with theprevious observations in the NA15510 fosmidlibrary (167), they validated 1695 SVs across9 diploid human genomes, including 747 dele-tions, 724 insertions, and 224 inversions. 50%of these were found in multiple libraries (64).

By using next-generation sequencing tech-nology to derive sequence of paired ends of3-kb DNA fragments and computational map-ping of DNA reads onto the reference genome,Korbel et al. (68) developed a high-throughputmethod, paired-end mapping (PEM), to in-vestigate SVs in two female individuals,one African (NA18505), the other putativelyEuropean (NA15510). This strategy can iden-tify deletions, inversions, mated insertions, andunmated insertions of ∼3 kb or larger, as well assimple insertions of 2 to 3 kb (68). In total, 1297SVs, including 853 deletions, 322 insertions,and 122 inversions, were identified. Korbelet al. confirmed only 41% of the previouslyreported deletions and inversions of NA15510mainly due to 62% coverage of NA15510;however, they also detected 407 new SVs inNA15510 (68, 167). By comparing the SVsidentified in NA15510 with those in NA18505,these authors found that 45% SVs were sharedbetween NA15510 and NA18505 (68).

Recently, comparison of the complete ge-nomic sequence of Craig Venter (a de novobuild of human diploid genome delineated withconventional Sanger dideoxy technology) (81),James D. Watson (177), an African (NA18507)(10), and Asian individual (174) (the latter threewith next-generation sequencing following acomparison to the human reference genome)has revealed millions of SNPs, many shortindels and large SVs. These studies showedconclusively what had long been suspected:there is a continuous size distribution of SVsin the human genome, with smaller variantsbeing the most frequent at almost all scales(Figure 1).

Interestingly, the size distribution of dele-tions observed in the Watson and Ventergenomes (81, 177) exhibits a marked enrich-ment in the size range of 300–350 basesowing to the known retrotransposition-basedpolymorphism of Alu elements, and a lessmarked enrichment at ∼6 kb owing to theretrotransposition-based polymorphism of L1elements (Figure 1). Array-based methodswere used to identify large (e.g., >50 kb) CNVsin these genomes. For the Watson genome,the read depth of 454 sequencing confirmedmany array-detected CNVs. However, manyCNVs < 50 kb in size potentially remain to beidentified. The paired-end sequencing methodsused in the African and Asian genomes enabledsmaller CNVs to be identified than was pos-sible with array-based methods. However, dueto the limit of read length (average 35 bases),large-scale CNVs (especially duplications) mayhave been missed by massively parallel sequenc-ing (10, 174). Thus it remains a challenge todetect all CNVs in an individual genome usingone single technology since CNV can range insize from single exons (∼100 bp) to millions ofbase pairs. However, in principle it should bepossible to use new sequencing technologies toidentify all forms of SV by combining paired-read analyses with read-depth analyses and denovo assembly. Hence our current inability tocapture all of these variants remains, in part,an analytical challenge rather than a technicalone.

In addition to sequencing-based plat-forms, some inversions can also be studied byPCR-based approaches (39). NAHR betweenintrachromosomal low-copy repeat (LCR)sequences in reverse orientation can lead toinversion (154). Flores et al. (39) searched suchpotential recombinogenic inverted sequences(PRIS) in the human reference genome andidentified a total of 24,547 PRIS, varying insize from 400 to 74868 nucleotides, whichrepresent a huge resource of potential locisusceptible to inversion variation. Due to thePCR efficiency and technological limitations,these authors (39) selected eight PRIS for fur-ther study and identified inversion variations

456 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

at six PRIS loci (75%), in two of which genestructures were affected by inversions.

To date (March 2009), the Database ofGenomic Variants comprises 38,406 SVs(Table 1). Database limitations include theuse of multiple platforms of varying degreesof genome resolution. Therefore, most CNVsare not resolved to the nucleotide level andhave thus not been genotyped in differentpopulations.

The contributions of CNVs, relative toSNPs, to human phenotypes, especially com-plex diseases, are still unknown. Recent stud-ies attempted to partially explore the question:What are the relative contributions to pheno-type of CNV and SNP when the phenotypicoutput measured is changes of gene expression?In a study of the genetic contribution to vari-ation in gene expression of cell lines of 210HapMap individuals, Stranger et al. (158) com-pared the SNP data of HapMap Phase I (160)to the CNV data of Redon et al. (132). Aftermeasuring the expression levels of 14,072 genes(14,925 transcripts), at least 8.75% to 17.7% ofthe variation in gene expression could be ex-plained by CNVs (158) versus 92.5% to 83.6%that could be explained by SNPs. Remarkablyfew of the CNV associations were also observedat SNPs; i.e. there is minimal overlap betweenthe contributions from CNV versus SNP. Thiscould be potentially explained in part by thelow power in this study due to the relativelysmall sample sizes in each HapMap population(158). In addition, because only a small por-tion of existing CNVs were investigated, thecurrent observations in Stranger et al. (158)may underestimate the overall contribution ofCNV to gene expression level. Further com-prehensive research is expected to better elu-cidate the contributions of CNVs to humanphenotypes.

How Many Copy Number VariationsRemain to be Discovered?

Have we already identified all CNVs in the hu-man genome? The answer is a definite “no”.First, the characterization of the SV in a certain

SD: segmentalduplication

genomic region requires knowledge about thesequence of this region. The current humangenome sequence encompasses virtually all eu-chromatic regions; however, many gaps remainfor which we have no sequence information.In the most recent draft published in April2003 by the International Human Genome Se-quence Consortium (59), up to 6% of the to-tal genome is reported to remain uncovered,consisting of 341 sequencing gaps with 273 re-siding in euchromatin. An estimated 54% ofall gaps in euchromatin regions are flanked bysegmental duplications (SDs) or LCRs (5) andare thus prone to SVs (154). Many of theseare adjacent to complex LCRs (LCRs consist-ing of a cluster of different repeat subunitslying in either orientation, see Figure 2 be-low) (154), such as the 15 gaps in the 1q21.1deletion region recently found to be associatedwith microcephaly/macrocephaly and develop-mental/behavioral abnormalities (13, 106) andschizophrenia (Table 2). In aggregate, manyof the current sequence gaps of the humangenome can be expected to contain sequenceswith SV. In fact, SVs can be one of the factorsthat complicated the completion of sequencingin the gap regions because different haplotypesneed to be differentiated from each other (14,27). Most recently, Bovee et al. reported theclosure of 26 of the 273 remaining gaps in eu-chromatin regions (12). Indeed, 30.7% of theseclosed gaps are polymorphic and show SVs (12).

Even in the genomic regions with availablereference sequence, we may identify more novelSVs in the future. Genomic structures such asthe complex LCRs (Figure 2) containing mul-tiplex adjacent LCR elements in tandem andreverse orientations provide the structural basisfor a multitude of variations including duplica-tion, deletion, inversion of different length andin diverse combinations (14, 20, 47, 154). Thedetection of all of these SVs will require detailedanalyses of a large number of different hap-lotypes. Furthermore, complex LCRs, particu-larly those with inverted repeats, can also formcruciforms and may trigger genomic rearrange-ments, inducing nonrecurrent, sometimes com-plex SVs in some individuals. These variations

www.annualreviews.org • CNV in Human Health, Disease, Evolution 457

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

THOC3NY-REN-7 PROP1

A AB BCD DE EF FC

Sos-PREP (390-kb) Sos-DREP (429-kb)

telcen

50 kb

C'

Not

I (4)

Not

I (5)

THOC3 NY-REN-7

tel

330 kb 45 kb 330 kb

NPHP1cen330RI

175 kb

330RII45RII

804H10R

BtBmBc CmCc CtAtAmAccen tel

D7S

653

D7S

672

D7S

1816

BCST

R1

BAST

R1 (D

7S48

9Ct)

BBST

R1c

BBST

R2c

D7S

2490

BAST

R1 (D

7S48

9B)

CR16

TEL

N

BBST

R1m

BBST

R2m

BBST

R2t

BBST

R1t

BAST

R1 (D

7S48

9A)

BAST

R1 (D

7S48

9Ct)

BCST

R1

WBS

tel1

D7S

2518

HSB

055X

E5

D7S

1870

100 kb

SSN

C-A

SSN

C-A

Low-copy repeat (LCR)

Direct LCRs

Inverted LCRs

Complex LCRs

a LCR

b Complex LCR

NPHP1

WBS

Sos

45RI

Figure 2Low-copy repeats (LCRs) or segmental duplications (SDs) in the human genome. (a) LCR orientations (direct, reverse, and complexLCRs). (b) Examples of complex LCR. NPHP1, the shaded blue representing the inverted 330-kb repeats and the green arrows for thedirect 45-kb repeats (adapted from Reference 135). The WBS locus at 7q11.23, Blocks A, B, and C of centromeric (c), medial (m), andtelomeric (t) LCRs represented by black arrows (adapted from Reference 9). The Sos locus, six subunit LCRs between the proximal andthe distal Sos-REPs (A-F) and their orientation depicted by the arrowhead (adapted from Reference 72).

tend to be rare and require large sample sizesfor identification.

Finally, the reference sequence contains thedeleted allele at a number of common CNVs,manifesting as novel inserted sequences insequencing-based surveys (64), but missed bymicroarray-based surveys. These insertions can

often be found in the chimp genome sequence,which suggests that these apparent insertionsare indeed deletions in the human referencesequence.

Given the huge scientific and clinical sig-nificance of structural variations and humangenomics in general, multicenter collaborative

458 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

projects including the Human Genome Struc-tural Variation Group (28, 64) and the 1000Genomes Project (10, 174) have been launchedto extend our knowledge on this subject. We areoptimistic that these endeavors will unveil manynovel structural variations of our genomes inthe near future.

COPY NUMBER VARIATION:MECHANISMS FOR FORMATION

Four major mechanisms: NAHR, NHEJ,FoSTeS, and L1-mediated retrotransposition,

generate rearrangements in the human genomeand probably account for the majority of CNV(reviewed in References 47 and 51; Figure 3).

Nonallelic HomologousRecombination Occurs in Meiosisand Mitosis, Resulting in Duplication,Deletion, and Inversion

NAHR is caused by the alignment of and thesubsequent crossover between two nonallelic(i.e., paralogous) DNA sequence repeats shar-ing high similarity to each other (Figure 3)

FoSTeSNHEJNAHRFoSTeS × 2

OHP

TS

TSDTSD

L1 retrotransposition

NAHR NHEJ FoSTeS Retrotransposition

Structural variation type dup, del, inv dup, del dup, del, inv,complex

ins

Homology flanking breakpoint(before rearrangement)?

Yes (LCR/SD, Alu,L1, or pseudogene)

No No No

Breakpoint Inside homology Addition or deletionof basepairs, ormicrohomology

Microhomology No specification

Sequence undergoing SV Any Any Any Transcribedsequences

FoSTeS × 1

a

b

Figure 3Comparisons and characteristics of the four major mechanisms underlying human genomic rearrangementsand CNV formation. (a) Models for Non-Allelic Homologous Recombination (NAHR) between repeatsequences (LCRs/SDs, Alu, or L1 elements); Non-Homologous End-Joining (NHEJ), recombination repairof double strand break; Fork Stalling and Template Switching (FoSTeS), multiple FoSTeS events (×2 ormore) resulting in complex rearrangement and single FoSTeS event (×1) causing simple rearrangement; andretrotransposition. TS, target site; TSD, duplicated target site. Adapted from References (47, 123). Thickbars of different colors indicate different genomic fragments; completely different colors (as orange and red ororange/red/green in FoSTeS×2) symbolize that no homology between the two fragments is required. The twobars in two similar shades of blue indicate that the two fragments involved in NAHR should have extensivehomology with each other. The triangles symbolize short sequences sharing microhomologies. Eachgroup of triangles (either filled or empty) indicates one group of sequences sharing the same microhomologywith each other. (b) Characteristic features for each rearrangement mechanism. Specific features of certainmechanisms are shown in red. Abbreviations: dup, duplication; del, deletion; inv, inversion; ins, insertion.

www.annualreviews.org • CNV in Human Health, Disease, Evolution 459

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

(154). Repeats on the same chromosome and indirect orientation mediate duplication and/ordeletion, whereas inverted repeats mediate in-version of the genomic interval flanked by therepeats (88, 154). NAHR between sequence re-peats on different chromosomes can lead tochromosomal translocation (88, 154).

Substrates for NAHR are LCRs or SDsof >10 kb in length with 95%–97% similar-ity (Figure 2) (147, 154). Different groups ofLCRs are sometimes localized adjacent to eachother, with some subunits in tandem and oth-ers in reverse orientation resulting in complexLCRs (Figure 2) (15, 154). Complex LCRscan themselves undergo CNV in both meio-sis and mitosis, and specific structural variantsmay predispose susceptibility to chromosomalrearrangements (7, 14, 20, 90). In addition toLCR, repetitive sequences such as the retro-transposable L1 elements (48), Alu (2, 49, 139),or matching pseudogenes (65, 157) can act asNAHR substrates if from similar families orwith high enough sequence identity to facili-tate homologous recombination. Such shortersequence substrates tend to mediate NAHR ofshorter DNA fragments (48, 139), which areoften underestimated owing to the detectionlimit of many CNV assays (64, 68). A recom-bination hotspot of 28 bp has been detected inAlu sequence (139), whereas no recombinationhotspots have been found in L1 sequences (48).

NAHR hotspots can be closely related tothe allelic homologous recombination (AHR)hotspots (46, 60, 89, 163) and can overlapin the human genome and in our ancestralgenomes (86, 114, 131). A cis-acting sequencemotif CCNCCNTNNCCNC was found to beenriched in both AHR and NAHR hotspots(113) and may represent the analogue of Chi(GCTGGTGG) in prokaryotic homologousrecombination (152).

NAHR can occur in meiosis, where it resultsin unequal crossing over as revealed by segre-gation of marker genotypes and leads to con-stitutional genomic rearrangements that can bebenign polymorphisms or manifest sporadic, ifde novo, or inherited genomic disorders (92,98, 166). NAHR can also occur in mitosis,

resulting in mosaic populations of somatic cellscarrying copy number or SVs (39, 75, 76).

The recent use of the advanced array-basedtechniques enabled the identification of a num-ber of novel genomic disorders caused by thepredicted NAHR-mediated reciprocal genomicrearrangements (Table 2). The relative fre-quency of duplications and deletions mediatedby the same pairs of LCRs is thus an increas-ingly important and clinically relevant question.

To address this question directly, Turneret al. (166) used pooled sperm PCR assaysto analyze four NAHR loci related to well-studied genomic disorders, including Williams-Beuren syndrome (WBS; MIM 194050) dele-tion and 7q11.23 duplication, the azoospermiafactor a (AZFa; MIM 415000) deletion and itsreciprocal duplication, the hereditary neuropa-thy with liability to pressure palsies (HNPP;MIM 162500) deletion and Charcot-Marie-Tooth disease type 1A (CMT1A; MIM 118220)duplication, and the Smith-Magenis syndrome(SMS; MIM 182290) deletion and Potocki-Lupski syndrome (PTLS; MIM 610883) dupli-cation in the sperm populations of five males.Strikingly, they found that all five subjects con-sistently displayed an approximately 2:1 ratio ofdeletion versus duplication at all three autoso-mal loci and 4:1 for the AZFa locus on the Ychromosome (166). Thus, at least in meiosis onautosomal loci, the reciprocal deletion and du-plication tended to have the frequency ratio of2:1 (166), although the predominance of inter-chromosomal rearrangements in velocardiofa-cial syndrome (VCFS; MIM 192430) suggests amore equal ratio. Notably, the assays of Turneret al. (166) only measured the NAHR eventsacross known meiotic recombination hotspots.

Lam & Jeffreys (75, 76) also performedpooled sperm assays on the alpha-globin lo-cus in two subjects. Instead of measuring acrosshotspots in the LCR, they examined the en-tire globin locus and could thus also recordNAHR events outside the hotspots and othernon-NAHR rearrangements. In their study,one person showed the same deletion and du-plication frequency, whereas the other one hadmore duplication than deletion. This apparent

460 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

discrepancy between the two studies could bedue to experimental design or it could reflecttrue differences among different NAHR loci,potentially influenced by the local genomic ar-chitecture.

Eichler and colleagues designed an aCGHplatform to interrogate genomic regions witharchitectural features resulting in susceptibilityto NAHR. Bioinformatic analyses of the humangenome reference sequence were used to iden-tify genomic intervals with substrate require-ments (LCRs/SDs located on the same chro-mosome, directly oriented, and with high se-quence identity) for NAHR (88). They thendesigned arrays to interrogate NAHR sus-ceptible regions to investigate the genomicDNA from patients with mental retardation(MR) and/or multiple congenital anomalies(MCA) (143). In less than two years, they de-fined six new MR/MCA genomic disordersincluding microdeletions and/or microdupli-cations of 1q21.1 (106), 15q13.3 (52, 144),15q24 (145), 16p13.11 (50), 17q12 (105),and 17q21.31 (143). These and other studiesclearly demonstrate the importance of under-standing the mechanisms for human genomicrearrangements.

Some Simple Copy Number VariationsCan Be Explained by NonhomologousEnd-Joining

Nonhomologous end-joining (Figure 3) isutilized by human cells to repair DNA double-strand breaks (DSBs) caused by ionizing radia-tion or reactive oxygen species and for physio-logical V(D)J recombination (83, 84, 136). Twocharacteristic features of NHEJ are: (a) UnlikeNAHR, NHEJ does not require substrates withextended homology; (b) NHEJ can leave an ‘in-formation scar’ in the form of loss or additionof several nucleotides at the join point (82).

Breakpoints of NHEJ-mediated rearrange-ments often fall within repetitive elements suchas LTR, LINE, Alu, MIR, and MER2 DNA el-ements (120, 164). Moreover, sequence motifslike TTTAAA, known to be capable of causingDSB or curving DNA, are present in proximityto many of these junctions (120, 148, 164).

MCA: multiplecongenital anomalies

Also, many 17p translocations and othernonrecurrent deletions have one of their break-points in LCRs (155). These data suggestedthat although without any obligatory substraterequirement for LCRs, NHEJ may still bestimulated by certain genomic architectures(147, 155).

A DNA Replication–BasedMechanism, FoSTeS, Can Account forComplex Genomic Rearrangementsand Copy Number Variations

Recently, the observation of the often complexdetails of genomic rearrangements, enabled byaCGH techniques, led Lee et al. (78) to proposethe replication FoSTeS model as a mechanismfor human genomic rearrangements (Figure 3).

According to this model, the DNA replica-tion fork can stall, the lagging strand disengagesfrom the original template and switches to an-other replication fork and restarts DNA synthe-sis on the new fork by priming it via the micro-homology between the switched template siteand the original fork (78). The new templatestrand is not necessarily adjacent to the orig-inal replication fork in primary sequence, butlikely in three-dimensional physical proximity.Upon annealing, the transferred strand primesits own template-driven extension at the trans-ferred fork. Depending on the direction of thefork progression and whether the lagging orleading strand in the new fork was used as atemplate and copied, the erroneously incorpo-rated fragment from the new replication forkcould be in direct or inverted orientation to itsoriginal position. Furthermore, depending onwhether the new fork is located downstreamor upstream of the original fork, the templateswitching results in either a deletion or a du-plication. This procedure of disengaging, in-vading, and synthesizing may occur multipletimes in series (i.e., FoSTeS × 2, FoSTeS × 3,etc.), likely reflecting the poor processivity ofthe DNA polymerase involved, and resulting inthe observed complex rearrangements.

Array CGH data on genomic regions in-cluding the Pelizaeus-Merzbacher disease locus

www.annualreviews.org • CNV in Human Health, Disease, Evolution 461

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

MMBIR:microhomology-mediated break-induced replication

(78), the SMS/PTLS locus (130, 171, 186), theMECP2 locus (8, 22), the LIS1 locus (11), anda number of other loci have revealed the com-plex nature of many other nonrecurrent rear-rangements, some of which were thought to besimple deletion or tandem duplication beforethe analysis by higher-resolution genome tech-nologies. In addition, some complex chromoso-mal rearrangements unveiled by recent cytoge-netic experiments can also be better explainedby FoSTeS (171, 185).

In our studies on breakpoint sequencesof FoSTeS-mediated complex rearrangements,e.g., complex LIS1 duplications (11), micro-homologies shared between two Alu elementshave also been identified at one or more break-points. Thus, in addition to Alu-Alu-mediatedNAHR resulting in simple genomic rearrange-ment (2, 49, 139), microhomology shared be-tween Alu elements can also apparently be usedfor template switching and priming DNA repli-cation on a new fork. Since the prevalence ofAlu elements has been observed at the ends ofLCRs/SDs (6) and at breakpoints of subunitsof complex LCRs/SDs (2), FoSTeS events in-volving Alu elements may also be responsiblefor LCR/SD formation.

Not only can FoSTeS generate large ge-nomic duplications of several megabases (11,78), it also causes genic duplication/triplicationand even rearrangements of single exons (186),which implicate FoSTeS in gene duplicationand exon shuffling; two predominant mech-anisms driving gene and genome evolution(42, 121).

By reanalyzing the breakpoint sequence dataof 23 deletion CNVs published by Perry et al.(125), we have found that at least 5 (22%) CNVbreakpoints were highly complex and consis-tent with two or more FoSTeS events. Further-more, many of the remaining deletion CNVsshow microhomologies at breakpoints, whichare also consistent with FoSTeS × 1 (125).

Based upon observations regarding genomicrearrangements in human (FoSTeS), bacte-ria, yeast, and other model organisms, Hast-ings et al. (51) proposed a generalized replica-tive, template-switch model that may underlie

many structural variations in genomes andgenes from all domains of life. This is termedthe microhomology-mediated break-inducedreplication (MMBIR) model. MMBIR couldnot only lead to CNV by itself, but also cre-ate LCRs that provide the homology requiredfor NAHR and predispose to more genomic re-arrangements in future generations. MMBIRmay cause somatic events associated with can-cer and underlie the genomic rearrangementsand CNV associated with the emergence ofprimate-specific traits providing variation fornatural selection and evolution (51).

L1 Retrotransposition Contributesto Copy Number Variation

Long interspersed element-1 (L1) elementscover 16.89% of human genomic DNA andare the only currently active autonomoustransposons in the human genome (45, 63).Of the 516,000 copies of L1 in our genome,only about 80–100 copies are full length(about 6 kb in size) and have two intact openreading frames (ORF): ORF1 coding for aRNA-binding protein and ORF2 encodinga protein with both endonuclease (EN) andreverse transcriptase (RT) activity (3).

L1 transposition occurs via an RNA inter-mediate that is probably transcribed by RNApolymerase II (3). The reverse transcription andintegration are thought to occur in a coupledprocess called target primed reverse transcrip-tion (TPRT) (123) (Figure 3). The resultantinsertion is flanked by duplicated target sites(TSD), characteristic of TPRT (123).

L1s are also responsible for the mobiliza-tion of Alu elements and SVA elements (SINE-R, VNTR) (3, 45, 122) as well as retrogenes.Korbel et al. attributed 30% of the SV in-dels that they detected to retrotransposition;of these 90% were due to L1 elements them-selves and 8% to SVAs (68). Kidd et al. found15% of the SV events that they detected to bedue to retrotransposition (64). In the analysisof the mobile element-associated SVs in theVenter genome, Xing et al. found that ∼10%of the indels of > 100 bp are associated with

462 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

transposable DNA sequences, including L1,Alu, and SVA (181).

Almost all the sequenced SV breakpoints(N = 114) in Korbel et al. (68) bear sig-natures of either NAHR (surrounded byLCRs or other repetitive sequences), NHEJ/FoSTeS × 1 (microhomology at the junction),or retrotransposition (mostly L1 elements).Kidd et al. (64) also observed similar evidenceat breakpoints that can be interpreted as havingoccured by NAHR, NHEJ, FoSTeS, and retro-transposition mechanisms. Thus, these mech-anisms can explain the majority of the DNArearrangements occurring in our genomes.

COPY NUMBER VARIATION:MUTATION RATES

The mutation rate of point mutations is es-timated to be 1.8–2.5 × 10−8 per base pairper generation (67, 115). Although precise es-timates of CNV mutation rates are elusive, thelocus-specific mutation rates of some de novoCNVs have been estimated by different meth-ods, including studies of a single X-linked gene(DMD) (169), prevalence calculations, pooledsperm PCR assays, and aCGH analyses of trios.The estimates for CNV locus-specific muta-tion rates range from 1.7 × 10−6 to 1.0 ×10−4 per locus per generation (92, 169), i.e., 100to 10,000 times higher than nucleotide substi-tution rates! Distinct from the relatively con-stant mutation rates of SNPs across the humangenome [with the exception of a transition pointmutation causing SNPs at CpG dinucleotides(19)], mutation rates of CNVs can vary widelyat different loci, likely reflecting the differencesin CNV formation mechanism and local or re-gional genome architecture inciting genomicinstability (Figure 4).

For autosomal dominant genomic disordersthat do not result in embryonic lethality andare fully penetrant with fitness w, the per-locusper-generation rate (μ) of loss-of-function mu-tation can be estimated by the birth prevalenceB, i.e., μ = (1/2)(1 – w)B (92). According tothe assumed fitness of zero and the estimatedbirth prevalence, e.g., 1/4000 for DiGeorge

SNP rate

CMT1A(118220)DGS(188400)MDLS(247200)SMS(182290)WBS(194050)

CNV rate

CAH(201910)CTNS(219800)DMD(310200)HDR(146255)PMD(312080)

AIS(300068)CF(219700)MFS(154700)NS1(163950)NPS(161200)

Diseasetrait (MIM#)

CMT1A, Charcot-Marie-Tooth disease type 1A

DGS, DiGeorge syndrome

MDLS, Miller-Dieker lissencephaly syndrome

SMS, Smith-Magenis syndrome

WBS, Williams-Beuren syndrome

CAH, congenital adrenal hyperplasia

CTNS, nephropathic cystinosis

DMD, Duchenne muscular dystrophy

HDR, hypoparathyroidism, sensorineuraldeafness, and renal disease

PMD, Pelizaeus-Merzbacher disease

AIS, androgen insensitivity syndrome

CF, cystic fibrosis

MFS, Marfansyndrome

NS1, Noonan syndrome 1

NPS, Nail-Patella syndrome

Human genome CpG = 10 ×

de novo CNV 100–10,000 ×

Figure 4New mutation rates for SNP versus CNV. Constant SNP mutation rates andvariable CNV mutation rates across the human genome. Examples of humandisease traits (OMIM numbers are shown) with different contribution of CNVversus SNP. Note that for some diseases at specific loci, CNVs outweigh SNPsas a mutational cause for disease. CNV de novo locus-specific rates can varythroughout the genome, whereas SNP rates are essentially constant. The SNPrate for CpG dinucleotides is constant but about ten times higher than otherbases because of methyl-mediated deamination of cytosine to uracil, causingtransition mutations.

syndrome (DGS)/VCFS at 22q11.2, 1/10000for WBS at 7q11.23, and 1/25000 for SMS at17p11.2 (142), the estimated mutation rates forthese genomic rearrangements are 2 × 10−5 to1.25 × 10−4 per locus (92). As determined bypooled sperm PCR assays directly in the malegermline, the average rate for deletion CNV is4.20 × 10−5 for the CMT1A locus, 9.55 ×10−6 for the WBS locus, 1.87 × 10−6 for theLCR17p locus, and 2.16 × 10−5 for the Y-linked AZFa locus (166). Notably, the mutationrate of CMT1A duplication based on the exper-imentally determined pooled sperm PCR assay(1.73 ± 0.49 × 10−5) (166) correlates very wellwith that determined theoretically from preva-lence calculations (1.7 – 2.6 × 10−5) (92).

The overall rate of de novo CNV can alsobe explored via the genome-wide screeningfor de novo CNVs in trios. By reanalyzingthe oligonucleotide genome-wide tiling-pathaCGH data of Sebat et al. (137) from triosof autism cases and controls, the per-locus

www.annualreviews.org • CNV in Human Health, Disease, Evolution 463

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

mutation rate (μ) was estimated by μ × 3 ×103 = 1/2 (1 × 10−2), i.e., μ = 1.7 × 10−6,based on the observations of 2 de novo CNVs(∼1%) out of 196 controls and average CNVsize of ∼2 × 106 bp in the autism cohort (i.e.,2 × 106 such loci in the whole genome) (92).However, as the resolution limit of aCGH em-ployed in Sebat et al. (137) prevents the de-tection of numerous small CNVs, the calcu-lated CNV mutation rate of 1.7 × 10−6 fromtheir data may thus be an underestimate. Notealso that somatic mutation may contribute tosome of these observations, and it is importantto demonstrate germline transmission to distin-guish germline from somatic variants.

With the aid of accumulating CNV data,more accurate mutation rates for CNV willbe accessible. Nevertheless, existing data sug-gest the de novo rates for CNV may be or-ders of magnitude greater than for SNPs insome regions of the genome. However, giventheir dependency on genomic architecture,locus-specific CNV mutation rates are muchmore variable across the genome than areSNPs. The variable mutation rates of CNVsmay, in part, cause the relative contribution ofSNPs/CNVs to vary widely among Mendelianloci (Figure 4).

COPY NUMBER VARIATION:CONVEYED PHENOTYPES

Large chromosomal aberrations have long beenknown to be associated with human diseases.The best-known example is Down syndrome(MIM 190685) caused by trisomy of humanchromosome 21 (79). Such large chromosomeabnormalities are detectable by conventionalmicroscopy. Copy number changes caused bysubmicroscopic genomic deletions were subse-quently found to be involved in human diseasesand other traits including thalassaemia due toalpha globin gene rearrangements (53) and red-green color blindness; the latter affects approxi-mately 8% of Caucasian males. In 1986, the red-green pigment genes were mapped to Xq28 asa tandem array of one gene encoding red opsinand one or more genes encoding green opsin(118). The high degree of sequence similarity

between these genes makes them vulnerable toNAHR and the consequent deletion or duplica-tion polymorphisms, some of which can totallyeliminate or disrupt red or green opsin genesand lead to red-green color blindness (117). Be-sides deletion resulting in defects in gene func-tion, duplication involving a dosage-sensitivegene can also cause disease. In 1991, the firstdisease-associated submicroscopic duplicationswere identified in 17p12, and this duplicationcan lead to CMT1A (97, 99). Some examplesof clinical phenotypes conveyed by CNVs in-volving single or multiple genes are shown inTable 2. Smaller CNVs affecting single exonsalso account for a proportion of human diseases(30, 73, 74).

Clinical findings associated with submi-croscopic chromosomal imbalance (includingdeletions, duplications, insertions, transloca-tions, and inversions) have been archived inDECIPHER (DatabasE of Chromosomal Im-balance and Phenotype in Humans using En-sembl Resources; http://www.sanger.ac.uk/PostGenomics/decipher/) (37). To date (May2009), over 2,200 cases of >50 diseases havebeen included in DECIPHER.

CNV can be responsible for sporadic birthdefects (87), other sporadic traits, Mendeliandiseases, and complex traits (Table 2). In thisreview we expand on complex neurological dis-eases and susceptibility to complex diseases assporadic and Mendelian disease have been re-viewed elsewhere (88, 98, 154).

Complex Neurological Diseases

Parkinson disease. Parkinson disease (PD;MIM 168600) is a common neurodegenerativedisorder with manifestations including rest-ing tremor, muscular rigidity, bradykinesia, andpostural instability that affects approximately1% of the population over age 50 (128). Atleast 13 genomic loci have been reported tobe involved in PD, including SNCA, the alpha-synuclein gene on 4q21 (128). Singleton et al.(150) identified a novel triplication involvingSNCA linked to autosomal dominant PD in alarge family, which implicates the dosage ef-fect of SNCA in PD. In a further study of

464 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

gene expression in the patients with the SNCAtriplication, Singleton and colleagues, foundan approximately twofold increase of SNCAprotein in blood, a twofold increase of SNCAmRNA in brain tissue, and increased depositionof heavily aggregated SNCA protein in braintissue (109).

Later studies also confirmed the role ofSNCA copy number gain in PD. Chartier-Harlin et al. (17) discovered the SNCA trip-lication in one of nine families with autoso-mal dominant PD. SNCA triplication has alsobeen reported in a family of Swedish Amer-ican descent with autosomal dominant early-onset PD (32). In addition to the SNCA trip-lication, Fuchs et al. (41) also detected SNCAduplications in a Swedish family, which sharesa common ancestor with the former reportedfamily (32). Ibanez et al. (58) identified 2 withSNCA duplication out of 119 individuals fromPD families. These observations strongly sug-gest a dosage effect of SNCA in selected casesof PD.

Alzheimer disease. Alzheimer disease (AD;MIM 104300) is a neurodegenerative disordercompromising cognition in the elderly, whichis characterized by intracellular neurofibrillarytangles and extracellular amyloid plaques thataccumulate in vulnerable brain regions (140).AD-associated point mutations have been iden-tified in at least 15 genomic loci, including theAPP gene encoding the amyloid precursor pro-tein (43). Genetic variations in APP promotersequences are also associated with AD. The-uns et al. (162) identified eight novel APP pro-moter variants in late-onset AD, three of whichcan cause a nearly twofold neuron-specific in-crease in APP transcriptional activity in vitro.This observation suggests that increased APPlevel can result in AD and that APP is a dosage-sensitive gene. Also of interest, Down syndromedue to trisomy 21 is associated with early-onset Alzheimer disease; the APP gene mapsto chromosome 21 and so those with Downsyndrome have three copies. Thus, copy num-ber gain of APP is hypothesized to be one ofthe causes of AD. Duplication CNVs of theAPP gene have been reported in families with

autosomal dominant early-onset Alzheimer dis-ease (ADEOAD) and cerebral amyloid an-giopathy (CAA) (133). Rovelet-Lecrux et al.(133) identified five duplications of differingsizes and encompassing APP, which causedaccumulation of beta-amyloid peptides andthe consequent phenotype of ADEOAD withCAA. These APP duplications were presentonly in affected subjects and cosegregated withthe disease in families and were not foundin 100 healthy controls with normal cogni-tion over the age of 60 years. APP duplica-tions associated with dementia with CAA havealso been identified in a Dutch population(151).

Mental retardation. Mental retardation (MR)is a nonprogressive cognitive impairment.Many de novo genomic rearrangements havebeen identified in patients with MR (21, 146,153). Due to the relative small sample size in theearly studies, these de novo CNVs were iden-tified in single cases and each CNV was differ-ent (21, 146). Larger cohorts may help detectand confirm the roles of rare de novo CNVsin MR.

Recently, Froyen et al. (40) studied a largeset of 300 well-characterized families with X-linked mental retardation (XLMR) and identi-fied 6 overlapping duplications at Xp11.22 in6 unrelated males with predominantly nonsyn-dromic XLMR. Their maximal common dupli-cated region is about 320 kb and involves fourknown genes (SMC1A, RIBC1, HSD17B10, andHUWE1), three candidates of which may con-vey the phenotype of MR. SMC1A encodes thesubunit of cohesion complex, and point mu-tations in it can lead to Cornelia de Langesyndrome (MIM 300590) with facial dysmor-phisms, MR, and growth deficits in childhood(112). A silent mutation of HSD17B10 has beenreported to be a syndromic form of MR withchoreoathetosis (80). In addition to the du-plications involving HUWE1 (an E3 ubiquitinligase gene), Froyen et al. (40) also identifiedthree point mutations of the same gene in threeXLMR families, a finding that highlights therole of HUWE1 in XLMR and supports theconclusion that it is a dosage-sensitive gene, at

www.annualreviews.org • CNV in Human Health, Disease, Evolution 465

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

ASD: autismspectrum disorder

least partially responsible for the MR pheno-type in the duplication case.

Loss-of-function mutations of the X-linkedmethyl-CpG-binding protein 2 gene (MECP2)at Xq28 are associated with developmental de-lay (DD), MR, and fatal infantile encephalopa-thy in males. Recent findings suggest increasedMECP2 gene copy number can also convey aclinical phenotype, resulting in a DD/MR plusseizures phenotype in males (8, 15, 22, 168).

Recently, Bi et al. (11) identified sevenunrelated cases with submicroscopic duplica-tion in 17p13.3 involving the LIS1 and/or the14-3-3ε genes and using a ‘reverse ge-nomics’ approach characterized the clinicalconsequences of duplication. “Genomo-type”/phenotype correlations showed thatincreased LIS1 dosage can cause microcephaly,mild brain structural abnormalities, moderateto severe developmental delay, and failure tothrive, whereas duplication of 14–3-3ε in-creases the risk for macrosomia, mild develop-mental delay, pervasive developmental disorder,and results in shared facial dysmorphologies.

Autism. Autism (MIM 209850) is a child psy-chiatric disorder that is characterized by a triadof limited or absent verbal communication, alack of reciprocal social interaction or respon-siveness, and restricted, stereotypical, and ritu-alized patterns of interests and behavior (4). Toexamine if CNVs can convey autism spectrumdisorder (ASD), Sebat et al. (137) employed theROMA technology in 165 families affected byautism and in 99 control families. They foundsignificantly more spontaneous CNVs in ASDpatients (14/195) than in unaffected controls(2/196) (P = 0.0005). Furthermore, the twoCNVs in control subjects were duplications,whereas most of the CNVs (12/15) in ASD pa-tients were deletions (137). These observationssuggest that a high frequency of de novo CNVsin ASD patients, especially deletions, is a riskfactor for autism. Some of the de novo CNVsidentified by Sebat et al. (137) have been re-ported to be autism-associated or overlap withautism susceptibility (AUTS) loci, for example,the duplication of 15q11–13 (AUTS4; MIM

608636) and the deletion of 16p11.2 (AUTS14;MIM 611913).

The role of 16p11.2 deletion in autism wasalso confirmed in later studies. By reanalyzingthe data of the genome-wide association studyof 751 multiplex families from the Autism Ge-netic Resource Exchange (AGRE), Weiss et al.(175) observed five patients with a 593-kb denovo deletion on 16p11.2. Subsequent CGHrevealed 5 more cases of 16p11.2 deletion in512 children affected by DD, MR, or suspectedASD and 3 more deletions in 299 Icelandic pa-tients affected by autism, whereas only 2 suchdeletions were identified in 18,834 unscreenedIcelandic controls. Kumar et al. (70) discovered2 de novo 16p11.2 deletions out of 180 autismprobands but none in 372 controls. In the sub-sequent additional screening, they found a sim-ilar result (2 deletions in 532 probands and nodeletion in 465 controls). These observationssuggest a significant association of the 16p11.2deletion with autism (P = 0.044). Weiss et al.(175) also documented the reciprocal duplica-tions. Both the 16p11.2 deletion and its recip-rocal duplication were shown to be risk fac-tors for autism; CNV at 16p11.2 accounts forapproximately 1% of autism cases (175). Theassociation of reciprocal 16p11.2 deletion andduplication with autism was also confirmed byMarshall et al. (100) by a combined discovery offour 16p11.2 CNVs in 427 ASD patients versusnone in 1652 controls (P = 0.002).

Using homozygosity mapping in pedigreeswith shared ancestry, Morrow et al. (111) identi-fied several large, inherited, homozygous dele-tions, including an 886-kb 3q24 deletion affect-ing DIA1, whose level of expression changes inresponse to neuronal activity.

Schizophrenia. Schizophrenia (MIM 181500)is a chronic, debilitating illness with both neu-rological and psychiatric features. It is commonwith a lifetime prevalence of approximately 1%.An increased number of CNVs were found tobe associated with schizophrenia (173, 182).

In genome-wide studies of thousands of pa-tients, two related studies, Stefansson et al.(156) and the International Schizophrenia

466 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

Table 3 Odds ratios for statistically significant associations of psychiatric illness withmicrodeletion of a given genomic intervala

Data MicrodeletionStudy Cases Controls 22q11.2 (DGS/VCFS) 15q13.3 1q21.1Stefansson et al. (156) ∼4000 ∼40,000 ∞ 14.83 11.54Int. SCZ Consortium (161) 3391 3181 21.6 17.9 6.6

aFrom Reference 94.

Consortium (161), discovered new loci respon-sible for schizophrenia (Table 3). Stefanssonet al. (156) first screened and identified 66 denovo CNVs in 9878 transmission sets, none ofwhom had been diagnosed with schizophrenia.Then they tested these de novo CNVs for dis-ease association in a sample of 1433 patientswith schizophrenia and related psychoses and33,250 controls. Three deletions (located on1q21.1, 15q11.2, and 15q13.3) were found tobe nominally associated with schizophrenia andpsychosis (156). The associations of these threedeletions with schizophrenia were further con-firmed by a follow-up investigation in a secondsample of 3285 cases and 7951 controls. TheInternational Schizophrenia Consortium (161)performed a genome-wide survey of rare CNVsin 3391 patients with schizophrenia and in 3181ancestrally matched controls, and discoveredtwo associated deletions on 1q21.1 and 15q13.3,which are identical to two of three deletionsfound in the cohort reported by Stefansson et al.(94, 156). Furthermore, both studies also repli-cated the previously reported 22q11.2 deletionswith schizophrenia phenotype in DGS/VCFS(Table 3) (62). The International Schizophre-nia Consortium (161) also documented that thetotal CNV load was greater in patients withschizophrenia versus controls.

Vrijenhoek et al. (172) identified 90 CNVsin 54 patients with schizophrenia; 13 of theseare rare and not yet reported in human popula-tions. Some of these rare CNVs were found todisrupt schizophrenia-associated genes, such asMYT1L, CTNND2, and ASTN2 (172). There-fore, rare CNVs affecting functional genes canbe an important causative variation resulting incomplex diseases such as schizophrenia. How-ever, given the low frequency of these rare

CNVs, large sample sizes may be required toconfirm the association with disease.

Susceptibility to Other Complex Traits

It has long been known that the deletionof the alpha-globin gene can lead to alpha-thalassaemia (53) and protect against malaria(38). Recently, many more CNVs have been re-ported to affect disease susceptibility.

HIV susceptibility. Chemokines are secretedproteins involved in immunoregulatory and in-flammatory processes (116). The CCL3L1 geneon 17q12 encodes the potential ligand for CCchemokine receptor 5, the major coreceptor forHIV (107). Therefore, CCL3L1 can be a domi-nant HIV-suppressive chemokine (107). CNVsof CCL3L1 (from 0 to 10 copies) have beenreported in the Caucasian population (165).Gonzalez et al. (44) examined the effects of theCCL3L1 CNVs on the susceptibility to HIVand showed that low CCL3L1 copy number(below population average) is associated withmarkedly enhanced HIV/AIDS susceptibility.Interestingly, a strong association was detectedbetween higher infant CCL3L1 copies and re-duced susceptibility to HIV in the absence ofmaternal nevirapine (69).

Crohn disease and psoriasis. Crohn disease(CD), an inflammatory bowel disease, is achronic disorder that causes inflammation ofthe digestive tract (MIM 266600). It has beenshown that deficient expression of defensins,endogenous antimicrobial peptides protectingintestinal mucosa against bacterial invasion, canlead to chronic CD (34). Therefore, it washypothesized that low copy number of the

www.annualreviews.org • CNV in Human Health, Disease, Evolution 467

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

beta-defensin gene cluster may also be associ-ated with chronic CD (33). Fellermann et al.(33) showed a median of three HBD-2 (humanbeta-defensin 2) copies per genome in colonicCD patients, which was significantly lower thanthe median of four HBD-2 copies in healthycontrols (P = 0.002). Individuals with threeor lower copies of HBD-2 have a significantlyhigher risk of developing colonic CD than doindividuals with four or more copies (odds ratio3.06).

Different from the decreased risk of colonicCD in individuals with high beta-defensin genecopies, copy number gain of beta-defensingenes was shown to be associated with psoria-sis (MIM 177900), a chronic inflammatory der-matosis that affects approximately 2% of thepopulation (55).

SNPs around IRGM (immunity-relatedGTPase family, M) have been associated withCD (124, 176). Recently, McCarroll et al. (103)showed that a previously known 20-kb deletionpolymorphism upstream of IRGM is in perfectlinkage disequilibrium with a CD-associatedIRGM SNP. The deletion haplotype of IRGMwas shown to have a distinct expression patternof IRGM compared to the reference haplotype.Given that the IRGM expression can affectcellular autophagy of internalized bacteria, itwas suggested that the common deletion poly-morphism of IRGM may cause CD throughthe altered level of IRGM expression, affectingthe efficacy of autophagy (103).

Pancreatitis. In the 1990s, several genes weredetermined to be associated with pancreatitis(MIM 167800), including the cationic trypsino-gen gene PRSS1 (178). Considering that thepancreatitis-associated missense mutations ofPRSS1, for example, the R122H mutation, in-crease trypsin activity in vitro (134), it washypothesized that PRSS1 is a dosage-sensitivegene. When investigating a cohort of 34French families with hereditary pancreatitis, LeMarechal et al. (77) did not identify any knownpoint mutations in the pancreatitis-associatedgenes, but a 605-kb triplication encompassing

PRSS1 was discovered in five families, whichrepresents an identical-by-descent mutation.

Systemic lupus erythematosus andglomerulonephritis. Systemic lupus ery-thematosus (SLE; MIM 152700) is a chronic,remitting, relapsing, inflammatory, and oftenfebrile multisystemic disorder affecting skin,joints, kidneys, and serosal membranes, due tofailure in regulation of the immune system.

In an analysis of Fc receptor polymorphismsin northern European nuclear families withSLE, Aitman and colleagues found unexpectedMendelian errors at the FCGR3B gene in 14%of these families, which was hypothesized to becaused by CNV (1). Therefore, Aitman et al.(1) examined the potential FCGR3B CNV in30 individuals from 8 nuclear families, in whichFCGR3B polymorphisms showed Mendelianerrors, and identified significant variation inFCGR3B copy number. Though no associa-tion between FCGR3B CNV and SLE wasdetected, a weak association was identifiedwith lupus nephritis, an SLE subgroup withglomerulonephritis. This association was fur-ther strengthened in a later study by these au-thors with a larger sample size (P value reducedfrom 1 × 10−3 to 1.4 × 10−8) (31). Further-more, an increased risk for development of SLEin individuals with fewer than two copies ofFCGR3B was reported in the U.K. cohort (31).Willcocks et al. (179) confirmed the associationof SLE with low FCGR3B CNV in Caucasians.

Complement component 4 (C4, includingC4A and C4B) gene mutations have long beenknown to be associated with SLE (36). Yanget al. (184) examined the C4 CNVs in 1241Americans of European descent. The copynumber of C4 varied from 2 to 6 (C4A, 0 to5; C4B, 0 to 4), and the risk of SLE increased inthe subjects with low C4 copies but decreasedin those with high C4 copies (184).

Molecular Mechanisms by which CopyNumber Variations Convey Phenotype

CNVs caused by genomic rearrangements canconvey phenotypes by the following molecular

468 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

Table 4 Specific example of copy number variation (CNV) disease mechanisms

Mechanism Affected gene Disease Reference(s)a

Gene dosage PMP22 CMT1A/HNPP S11, S40, S41, S60Gene interruption F8 Hemophilia A S2Gene fusion CYP11B1, CYP11B2 Hypertension and GRAb S39Position effects SOX9 Campomelic dysplasia S7, S66Unmasking of recessive allelesor functional polymorphism

FXII Sotos syndrome S35

Potential transvection effect Rai1 SMS S73

aFor the list of supplemental references, follow the Supplemental Material link from the Annual Reviews home page athttp://www.annualreviews.org.bGRA, glucocorticoid-remediable aldosteronism.

mechanisms: (a) gene dosage, (b) gene inter-ruption, (c) gene fusion, (d ) position effects,(e) unmasking of recessive alleles or functionalpolymorphism, and ( f ) potential transvectioneffects (98) (Table 4).

CNVs involving dosage-sensitive genes,such as PMP22, can alter gene expression lev-els and cause consequent clinical phenotypes.PMP22 (encoding peripheral myelin protein)is located within the 1.4-Mb CMT1A region at17p12, whose duplication can lead to CMT1Aby PMP22 overexpression (96, 97, 99), whereasdeletion can result in HNPP by PMP22 under-expression (i.e., haploinsufficiency) (16, 96).

When the breakpoint of a deletion, inser-tion, or tandem duplication is located withina functional gene, it may interrupt the geneand cause a loss of function by inactivating agene as exemplified by red-green opsin genesand color blindness (117). Gene fusion causedby genomic rearrangements between differentgenes or their regulatory sequences can gen-erate a gain-of-function mutation. This mech-anism is prominent among cancers associatedwith specific somatic chromosomal transloca-tions. Disease-associated gene fusion has alsobeen found in hypertension (85). Genes en-coding aldosterone synthase and steroid 11beta-hydroxylase on 8q are candidate genesfor glucocorticoid-remediable aldosteronism(GRA, an autosomal dominant disorder thatis characterized by hypertension with variablehyperaldosteronism). These two genes have95% identity, and NAHR-caused gene fusion

between them segregates with GRA in a largekindred (85).

By removing or altering a regulatory se-quence, CNV can have an effect on expressionor regulation of a nearby gene out of the CNVregion, i.e., position effect (66). For example,mutations in SOX9 lead to campomelic dyspla-sia, but Velagaleti et al. (170) reported that twobalanced translocations, with breakpoints map-ping to approximately 900 kb upstream and 1.3Mb downstream of SOX9 can also cause disease.Many other CNVs have been identified to al-ter gene expression and cause human diseases(66).

Deletion removing one allele may unmaskanother recessive allele or functional polymor-phism. For instance, the activity of the plasmacoagulation factor 12 (FXII) in patients with thecommon Sotos syndrome deletion is predomi-nantly determined by the functional polymor-phism of the remaining hemizygous FXII allele(71).

Transvection, the influence of gene expres-sion by the pairing of alleles on homologouschromosomes, is one mechanism for trans regu-lation (25). Its effect is mediated via deletion ofregulatory elements required for communica-tion between alleles. When studying the mousemodels of SMS, Yan et al. (183) found that thepenetrance of craniofacial anomalies (a majorclinical manifestation of SMS) was modifiedby the 590-kb genomic sequence surroundingRai1, in which potential transvection or othertrans-regulatory factors may exist.

www.annualreviews.org • CNV in Human Health, Disease, Evolution 469

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

COPY NUMBER VARIATION:EVOLUTION

CNVs can lead to diseases or other human traitsby involving dosage-sensitive genes, disruptingfunctional genes, or other molecular mecha-nisms. Therefore, CNVs can also be potentiallyexposed to selection pressure during evolution,which has been confirmed by the observationsin humans and other primates, mice, and fruitflies.

Purifying Selection

Substantial evidence that the location of CNVsis biased away from functional sequences hasbeen found in human genomes, which suggestspurifying selection on CNVs within humans.By determining the proportion of SNPs withindeletion CNVs in coding sequence versus in-trons, Conrad et al. (18) found strong underrep-resentation of genic SNPs in deletion regionscompared with the HapMap average. This find-ing has been confirmed by the observations ofRedon et al. (132) that CNVs are preferentiallylocated outside of genes and ultraconserved el-ements in the human genome and that a signifi-cantly lower proportion of deletions than dupli-cations overlaps with disease-related genes andRefSeq genes.

Purifying selection on CNVs, especiallydeletions, was also observed in flies. Emersonet al. (29) used genome-tiling arrays to studythe CNVs in D. melanogaster and detected 2658independent CNVs, where duplications out-numbered deletions (dup/del = 2.5). This in-dicated purifying selection on deletion CNVs(29). Dopman & Hartl (23) also detected a sim-ilar strong purifying selection on the CNVs inthe Drosophila genome, especially those locatedin functionally constrained regions.

Gene Duplicationand Positive Selection

Gene duplication has long been thought tobe a central mechanism driving long-termevolutionary changes (56, 121). Selection hasalso been shown to shape the architecture of

segmental duplications during human genomeevolution (61). Studying CNVs, especially geneamplification favored by positive selection, dur-ing evolution may help us discover new func-tional genes and reveal the genomic alter-ation and environmental impact driving humanevolution.

Sikela and colleagues (24) used aCGH of41,126 human cDNA from 24,473 unique hu-man genes to study gene CNVs spanning60 million years of human and primate evo-lution. Gene duplications and losses were sur-veyed on a genome-wide scale across 10 pri-mate species including human. It was found that6,696 (27.4%) of the examined human genesrepresent CNVs in one or more of the 10primate species (24). Remarkably, gene gainstypically outnumbered losses (gains/losses =2.34), which suggests positive selection in pri-mate genome evolution. Furthermore, someCNVs discovered are lineage specific (LS) (24).Studying these human LS CNVs may reveal theevolutionary process driving the emergence ofhuman-specific traits such as cognition.

In their earlier study of the human LS ampli-fication (129), Sikela and colleagues found thatthe strongest signal comes from the multiple-copy protein domain, DUF1220. The copynumber of DUF1220 was shown to be highlyexpanded in humans, reduced in African greatapes, further reduced in orangutan and OldWorld monkeys, only single-copy in nonpri-mate mammals, and absent in nonmammalianspecies (129). Examination of expression inbrain showed that neuron-specific DUF1220signals were present in the cortical layers ofthe hippocampus and also abundant in neuronswithin the neocortex (129). Both evolution-ary and functional evidence suggests DUF1220and its expansion in the human lineage is criti-cal to higher cognitive functions. Notably, themajority of DUF1220 sequences are locatedat 1q21.1 (129), lying within regions of CNVthat are associated with MR (143), schizophre-nia (156, 161, 173), and microcephaly/macrocephaly (13). This suggests a link be-tween DUF1220 and human brain/cognitionfunction and behavior.

470 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

Other examples of human LS gene copy-number expansions are AQP7. This is impor-tant for increasing glycerol transport for mo-bilization of energy stores and possibly watertransport involved in exercise-induced sweat-ing (24) and the salivary amylase gene, AMY1,which is correlated positively with salivary amy-lase protein levels and amount of starch in thediet among humans (126).

These examples of human-specific orprimate-specific gene amplification illustratethat CNVs encompassing functional genescan be evolutionally favored because of theiradaptive benefits. Similar evidence for positiveselection on gene duplication has also beenfound in other species. In the study of Emersonet al. on flies, 56% of the CNVs were found toaffect genes and most notably high-frequencyduplication CNVs were found to involvetoxin-response genes (for example, Cyp6g1contributing to resistance to DTT). Thissuggested potential positive selection on theseCNVs (29). In inbred mouse strains, large andcomplex interstrain CNVs (mainly duplica-tions) were shown to be restricted to gene fam-ilies functional in spermatogenesis, pregnancy,and immune response (149). However, it wasrecently shown that the genic biases of CNVscould alternatively be explained by reducedefficiency of purifying selection in eliminatingdeleterious changes in humans (119). In addi-tion to gene duplication, CNV can also lead

to exon shuffling (42, 186), another hypothesisproposed to be responsible for the origin of newgenes.

CONCLUSION

CNV has been recognized as a predominantsource of genetic variation among humanindividuals. CNV encompasses more humangenomic content and has a higher per-locusmutation rate than does SNP. Recombination-based mechanisms (e.g., NAHR and NHEJ),retrotransposition, and a new replication-basedFoSTeS and/or MMBIR mechanism have beenshown to play roles in CNV formation. CNVcan convey sporadic diseases, Mendelian andcomplex traits by dosage effects [triplication,e.g., PLP1 (180), MECP2 (22), LIS1 (11),sometimes conveying more severe phenotypicconsequences than duplication], gene dis-ruption, position effect, and other molecularmechanisms. CNV is also thought to drivehuman genome evolution via gene duplicationand exon shuffling. However, due to the limitsof interrogating resolution and genome cov-erage of the present technologies employed inCNV studies, much more undiscovered CNVmay exist in the human genomes, and furthercomprehensive investigation is expected toadvance our knowledge of the distribution,formation, conveyed phenotype or geneticsusceptibility, selection, and evolution of CNV.

SUMMARY POINTS

1. Structural variation, including CNV, is responsible for a large fraction of human geneticvariation.

2. CNVs can represent benign polymorphic variations or convey clinical phenotypes bymechanisms such as altered gene dosage and gene disruption.

3. Human genome rearrangements can occur by several mechanisms that include both re-combination (NAHR and NHEJ) and replication (FoSTeS/MMBIR)-based mechanisms.The latter can result in complex genomic rearrangements.

4. Locus-specific de novo mutation rates for CNV can be 100 to 10,000 times more frequentthan for SNP.

www.annualreviews.org • CNV in Human Health, Disease, Evolution 471

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

5. The relative contribution of CNV to locus-specific mutation rate may vary through-out the human genome and can reflect local genome architecture resulting in regionalsusceptibility to genome instability.

6. CNV is important to human genome and gene evolution.

7. Gene duplication and triplication, and potentially exon shuffling, can occur by the FoS-TeS and/or MMBIR mechanism.

FUTURE ISSUES

1. How should we define CNV? CNVs have been defined as “a segment of DNA that is 1 kbor larger and is present at a variable copy number in comparison with a reference genome”(35). However, the cutoff of 1 kb is completely arbitrary. In fact, one might argue that2 bp or more is a CNV versus a SNP using a chemical definition that SNP changes only thebase in the DNA, whereas the sugar-phosphate backbone needs to be disrupted/alteredto make a CNV. Nevertheless, based on a functional definition, it may be better to choosean average exon size (∼100 bp) as a parameter for defining CNV. Recent observations inthe Watson and Venter genomes clearly indicate that the CNV size distributions show amarked enrichment in the range of 300 to 350 bp owing to the known retrotransposition-based Alu polymorphisms (81, 177) (Figure 1). As analyses of higher resolution areapplied, many more CNVs of smaller size ranges are likely to be discovered (Figure 1).

2. What will be the standard for determining if a copy number variant causes or is associatedwith susceptibility to a given phenotype? It is challenging to ascribe a phenotype to aCNV [genomotype/phenotype correlations (11)] because often there is no genetic codeto help establish causation as there can be for nucleotide changes that result in nonsenseor frameshift alleles. Penetrance for a CNV may not be complete and duplication CNVsmay confer milder phenotypes than deletion CNVs at a given locus. Finally, it is ex-tremely difficult to determine that a CNV does not influence a phenotype, challengingthe notion of CNVs as benign polymorphic variants. Bringing together the commondisease genetics community, which is undertaking large-scale association studies, and theclinical genetics community, which seeks to generate meaningful results for individualpatients with relatively rare phenotypes, into a unified view of the allelic architecture ofgenetic diseases must be a clear objective for the coming years.

3. Will forward and reverse genomics identify the molecular bases of many traits? Foralmost a century forward genetics has been used to map Mendelian traits to specificloci in many organisms. With the advent of recombinant DNA, reverse genetics hasenabled the creation of specific gene mutations and the elucidation of the phenotypicconsequences of single gene alterations. Currently, the forward genomics clinicalimplementation of genome-wide arrays to identify CNVs responsible for disease traitsis uncovering many regions of the human genome wherein genomic changes result indisease or susceptibility to disease. Can reverse genomics now be used to systematicallyanalyze phenotype(s) conveyed by either a deletion or duplication of a specific region ofthe human genome? To what extent will reverse genomics help elucidate gene functionfor the remaining ∼ 90% of annotated genes not yet assigned a function and also addressthe question: What is the genomic code (95)?

472 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

4. Will CNV provide novel routes to therapy? If a CNV can result in a disease phenotypeby virtue of gene dosage, will normalizing gene dosage through epigenetic manipulationcorrect the disease (93)? Studies of an animal model for the CMT1A duplication demon-strate that the neuropathy phenotype due to PMP22 overexpression can be mitigatedby treatment with a progesterone antagonist that reduces PMP22 expression (108, 141).Perhaps RNAi may be effective in reducing the expression of the dosage sensitive gene(s)in duplication disorders.

5. What will be the impact of CNV on evolution? Due to the low resolution of methodsutilized previously (e.g., BAC aCGH) in CNV screening, the majority of the identifiedCNVs have not yet been finely resolved to the nucleotide level. This precludes the abilityto perform CNV genotyping. For over 90% of reported CNVs, we do not know theirtrue population frequency because they have not been genotyped. To study the roles ofCNV in genome evolution, we will need large CNV genotype datasets from differentpopulations. Also, precise de novo CNV mutation rates throughout the genome arerequired to better understand the contribution of CNV versus SNP to genome evolution,particularly with respect to gene duplication/triplication and exon shuffling.

6. We need a better reference human genome. For designing genome-wide arrays andcomparisons of personal genome sequences, one requires some reference from whichto choose oligonucleotides to be placed on arrays and also to use for computationalcomparative analysis when sequencing. The current draft/finished genome must have itsgaps filled and robust sequences for both heterochromatic transition regions and hete-rochromatin determined. Furthermore, CNV information needs to be integrated withthe reference sequence such that nucleotide positions of breakpoints, rather than justgenomic regions, are detailed in the reference genome. Comparison with the currentreference genome can also lead to erroneous conclusions. For example, we have experi-enced in the studies on genomic duplications at the SMS locus that a de novo tandemduplication spanning across one end of an inversion allele can be misinterpreted to bea de novo complex duplication because only one inversion haplotype is in the currentreference genome (186). Also, as mentioned above, personal genome sequencing mayidentify genomic segments not present on the current reference sequence. Thus, be-cause the current reference genome sequence contains the deleted allele at a number ofcommon CNVs, these will be missed by microarray-based surveys given that the arraydesign utilizes the reference genome sequence.

DISCLOSURE STATEMENT

J.R.L. is a consultant for Athena Diagnostics, 23andMe, and Ion Torrent Systems Inc., and holdsmultiple U.S. and European patents for DNA diagnostics. Furthermore, the Department ofMolecular and Human Genetics at Baylor College of Medicine derives revenue from molecu-lar diagnostic testing (MGL, http://www.bcm.edu/geneticlabs/).

ACKNOWLEDGMENTS

We thank Drs. Nicola Bruneti-Pieri, Claudia Carvalho, Katrina Gwinn, and Pawel Stankiewicz fortheir critical reviews. Work in the Lupski laboratory has been sponsored by the National Institutesof Health, the March of Dimes, the Muscular Dystrophy Association, the Foundation Fighting

www.annualreviews.org • CNV in Human Health, Disease, Evolution 473

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

Blindness, and the Charcot-Marie-Tooth Association. Work in the Hurles laboratory is supportedby the Wellcome Trust. Wenli Gu is a Feodor-Lynen Research Fellow generously supported bythe Alexander-von-Humboldt Stiftung.

LITERATURE CITED

1. Aitman TJ, Dong R, Vyse TJ, Norsworthy PJ, Johnson MD, et al. 2006. Copy number polymorphismin Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature 439:851–55

2. Babcock M, Pavlicek A, Spiteri E, Kashork CD, Ioshikhes I, et al. 2003. Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution. Genome Res.13:2519–32

3. Babushok DV, Kazazian HH Jr. 2007. Progress in understanding the biology of the human mutagenLINE-1. Hum. Mutat. 28:527–39

4. Bailey A, Phillips W, Rutter M. 1996. Autism: towards an integration of clinical, genetic, neuropsycho-logical, and neurobiological perspectives. J. Child Psychol. Psychiatry 37:89–126

5. Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, et al. 2002. Recent segmental duplications in thehuman genome. Science 297:1003–7

6. Bailey JA, Liu G, Eichler EE. 2003. An Alu transposition model for the origin and expansion of humansegmental duplications. Am. J. Hum. Genet. 73:823–34

7. Barbouti A, Stankiewicz P, Nusbaum C, Cuomo C, Cook A, et al. 2004. The breakpoint region ofthe most common isochromosome, i(17q), in human neoplasia is characterized by a complex genomicarchitecture with large, palindromic, low-copy repeats. Am. J. Hum. Genet. 74:1–10

8. Bauters M, Van Esch H, Friez MJ, Boespflug-Tanguy O, Zenker M, et al. 2008. Nonrecurrent MECP2duplications mediated by genomic architecture-driven DNA breaks and break-induced replication repair.Genome Res. 18:847–58

9. Bayes M, Magano LF, Rivera N, Flores R, Perez Jurado LA. 2003. Mutational mechanisms of Williams-Beuren syndrome deletions. Am. J. Hum. Genet. 73:131–51

10. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, et al. 2008. Accurate whole humangenome sequencing using reversible terminator chemistry. Nature 456:53–59

11. Bi W, Sapir T, Shchelochkov OA, Zhang F, Withers MA, et al. 2009. Increased LIS1 expression affectshuman and mouse brain development. Nat. Genet. 41:168–77

12. Bovee D, Zhou Y, Haugen E, Wu Z, Hayden HS, et al. 2008. Closing gaps in the human genome withfosmid resources generated from multiple individuals. Nat. Genet. 40:96–101

13. Brunetti-Pierri N, Berg JS, Scaglia F, Belmont J, Bacino CA, et al. 2008. Recurrent reciprocal 1q21.1deletions and duplications associated with microcephaly or macrocephaly and developmental and behav-ioral abnormalities. Nat. Genet. 40:1466–71

14. Carvalho CM, Lupski JR. 2008. Copy number variation at the breakpoint region of isochromosome 17q.Genome Res. 18:1724–32

15. Carvalho CM, Zhang F, Liu P, Patel P, Sahoo T, et al. 2008. Some complex rearrangements in patientswith duplication of MECP2 may occur by fork stalling and template switching. Hum. Mol. Genet. 18:2188–203

16. Chance PF, Alderson MK, Leppig KA, Lensch MW, Matsunami N, et al. 1993. DNA deletion associatedwith hereditary neuropathy with liability to pressure palsies. Cell 72:143–51

17. Chartier-Harlin MC, Kachergus J, Roumier C, Mouroux V, Douay X, et al. 2004. Alpha-synuclein locusduplication as a cause of familial Parkinson’s disease. Lancet 364:1167–69

18. Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK. 2006. A high-resolution survey ofdeletion polymorphism in the human genome. Nat. Genet. 38:75–81

19. Cooper DN, Youssoufian H. 1988. The CpG dinucleotide and human genetic disease. Hum. Genet.78:151–55

20. Cusco I, Corominas R, Bayes M, Flores R, Rivera-Brugues N, et al. 2008. Copy number variation atthe 7q11.23 segmental duplications is a susceptibility factor for the Williams-Beuren syndrome deletion.Genome Res. 18:683–94

474 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

21. de Vries BB, Pfundt R, Leisink M, Koolen DA, Vissers LE, et al. 2005. Diagnostic genome profiling inmental retardation. Am. J. Hum. Genet. 77:606–16

22. del Gaudio D, Fang P, Scaglia F, Ward PA, Craigen WJ, et al. 2006. Increased MECP2 gene copy numberas the result of genomic duplication in neurodevelopmentally delayed males. Genet. Med. 8:784–92

23. Dopman EB, Hartl DL. 2007. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc.Natl. Acad. Sci. USA 104:19920–25

24. Dumas L, Kim YH, Karimpour-Fard A, Cox M, Hopkins J, et al. 2007. Gene copy number variationspanning 60 million years of human and primate evolution. Genome Res. 17:1266–77

25. Duncan IW. 2002. Transvection effects in Drosophila. Annu. Rev. Genet. 36:521–5626. Eichler EE. 2006. Widening the spectrum of human genetic variation. Nat. Genet. 38:9–1127. Eichler EE, Clark RA, She X. 2004. An assessment of the sequence gaps: unfinished business in a finished

human genome. Nat. Rev. Genet. 5:345–5428. Eichler EE, Nickerson DA, Altshuler D, Bowcock AM, Brooks LD, et al. 2007. Completing the map of

human genetic variation. Nature 447:161–6529. Emerson JJ, Cardoso-Moreira M, Borevitz JO, Long M. 2008. Natural selection shapes genome-wide

patterns of copy-number polymorphism in Drosophila melanogaster. Science 320:1629–3130. Erlandson A, Samuelsson L, Hagberg B, Kyllerman M, Vujic M, Wahlstrom J. 2003. Multiplex ligation-

dependent probe amplification (MLPA) detects large deletions in the MECP2 gene of Swedish Rettsyndrome patients. Genet. Test 7:329–32

31. Fanciulli M, Norsworthy PJ, Petretto E, Dong R, Harper L, et al. 2007. FCGR3B copy number variationis associated with susceptibility to systemic, but not organ-specific, autoimmunity. Nat. Genet. 39:721–23

32. Farrer M, Kachergus J, Forno L, Lincoln S, Wang DS, et al. 2004. Comparison of kindreds withparkinsonism and alpha-synuclein genomic multiplications. Ann. Neurol. 55:174–79

33. Fellermann K, Stange DE, Schaeffeler E, Schmalzl H, Wehkamp J, et al. 2006. A chromosome 8 gene-cluster polymorphism with low human beta-defensin 2 gene copy number predisposes to Crohn diseaseof the colon. Am. J. Hum. Genet. 79:439–48

34. Fellermann K, Wehkamp J, Herrlinger KR, Stange EF. 2003. Crohn’s disease: a defensin deficiencysyndrome? Eur. J. Gastroenterol. Hepatol. 15:627–34

35. Feuk L, Carson AR, Scherer SW. 2006. Structural variation in the human genome. Nat. Rev. Genet.7:85–97

36. Fielder AH, Walport MJ, Batchelor JR, Rynes RI, Black CM, et al. 1983. Family study of the majorhistocompatibility complex in patients with systemic lupus erythematosus: importance of null alleles ofC4A and C4B in determining disease susceptibility. Br. Med. J. 286:425–28

37. Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, et al. 2009. DECIPHER: DatabasE of Chro-mosomal Imbalance and Phenotype in Humans using Ensembl Resources. Am. J. Hum. Genet. 84:524–33

38. Flint J, Hill AV, Bowden DK, Oppenheimer SJ, Sill PR, et al. 1986. High frequencies of alpha-thalassaemia are the result of natural selection by malaria. Nature 321:744–50

39. Flores M, Morales L, Gonzaga-Jauregui C, Dominguez-Vidana R, Zepeda C, et al. 2007. RecurrentDNA inversion rearrangements in the human genome. Proc. Natl. Acad. Sci. USA 104:6099–106

40. Froyen G, Corbett M, Vandewalle J, Jarvela I, Lawrence O, et al. 2008. Submicroscopic duplications ofthe hydroxysteroid dehydrogenase HSD17B10 and the E3 ubiquitin ligase HUWE1 are associated withmental retardation. Am. J. Hum. Genet. 82:432–43

41. Fuchs J, Nilsson C, Kachergus J, Munz M, Larsson EM, et al. 2007. Phenotypic variation in a largeSwedish pedigree due to SNCA duplication and triplication. Neurology 68:916–22

42. Gilbert W. 1978. Why genes in pieces? Nature 271:50143. Goate A, Chartier-Harlin MC, Mullan M, Brown J, Crawford F, et al. 1991. Segregation of a missense

mutation in the amyloid precursor protein gene with familial Alzheimer’s disease. Nature 349:704–644. Gonzalez E, Kulkarni H, Bolivar H, Mangano A, Sanchez R, et al. 2005. The influence of CCL3L1

gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science 307:1434–4045. Goodier JL, Kazazian HH Jr. 2008. Retrotransposons revisited: the restraint and rehabilitation of para-

sites. Cell 135:23–35

www.annualreviews.org • CNV in Human Health, Disease, Evolution 475

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

46. Greenawalt DM, Cui X, Wu Y, Lin Y, Wang HY, et al. 2006. Strong correlation between meioticcrossovers and haplotype structure in a 2.5-Mb region on the long arm of chromosome 21. Genome Res.16:208–14

47. Gu W, Zhang F, Lupski JR. 2008. Mechanisms for human genomic rearrangements. PathoGenetics 1:448. Han K, Lee J, Meyer TJ, Remedios P, Goodwin L, Batzer MA. 2008. L1 recombination-associated

deletions generate human genomic variation. Proc. Natl. Acad. Sci. USA 105:19366–7149. Han K, Lee J, Meyer TJ, Wang J, Sen SK, et al. 2007. Alu recombination-mediated structural deletions

in the chimpanzee genome. PLoS Genet. 3:1939–4950. Hannes FD, Sharp AJ, Mefford HC, de Ravel T, Ruivenkamp CA, et al. 2009. Recurrent reciprocal

deletions and duplications of 16p13.11: The deletion is a risk factor for MR/MCA while the duplicationmay be a rare benign variant. J. Med. Genet. 46:223–32

51. Hastings PJ, Ira G, Lupski JR. 2009. A microhomology-mediated break-induced replication model forthe origin of human copy number variation. PLoS Genet. 5:e1000327

52. Helbig I, Mefford HC, Sharp AJ, Guipponi M, Fichera M, et al. 2009. 15q13.3 microdeletions increaserisk of idiopathic generalized epilepsy. Nat. Genet. 41:160–62

53. Higgs DR, Pressley L, Old JM, Hunt DM, Clegg JB, et al. 1979. Negro alpha-thalassaemia is caused bydeletion of a single alpha-globin gene. Lancet 2:272–76

54. Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA. 2006. Common deletions and SNPs are in linkagedisequilibrium in the human genome. Nat. Genet. 38:82–85

55. Hollox EJ, Huffmeier U, Zeeuwen PL, Palla R, Lascorz J, et al. 2008. Psoriasis is associated with increasedbeta-defensin genomic copy number. Nat. Genet. 40:23–25

56. Hurles M. 2004. Gene duplication: the genomic trade in spare parts. PLoS Biol. 2:e20657. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, et al. 2004. Detection of large-scale

variation in the human genome. Nat. Genet. 36:949–5158. Ibanez P, Bonnet AM, Debarges B, Lohmann E, Tison F, et al. 2004. Causal relation between alpha-

synuclein gene duplication and familial Parkinson’s disease. Lancet 364:1169–7159. Int. Human Genome Sequencing Consort. 2004. Finishing the euchromatic sequence of the human

genome. Nature 431:931–4560. Jeffreys AJ, Neumann R, Panayi M, Myers S, Donnelly P. 2005. Human recombination hot spots hidden

in regions of strong marker association. Nat. Genet. 37:601–661. Jiang Z, Tang H, Ventura M, Cardone MF, Marques-Bonet T, et al. 2007. Ancestral reconstruction of

segmental duplications reveals punctuated cores of human genome evolution. Nat. Genet. 39:1361–6862. Karayiorgou M, Morris MA, Morrow B, Shprintzen RJ, Goldberg R, et al. 1995. Schizophrenia suscep-

tibility associated with interstitial deletions of chromosome 22q11. Proc. Natl. Acad. Sci. USA 92:7612–1663. Kazazian HH Jr, Moran JV. 1998. The impact of L1 retrotransposons on the human genome. Nat. Genet.

19:19–2464. Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, et al. 2008. Mapping and sequencing of

structural variation from eight human genomes. Nature 453:56–6465. Kim PM, Lam HY, Urban AE, Korbel JO, Affourtit J, et al. 2008. Analysis of copy number variants

and segmental duplications in the human genome: evidence for a change in the process of formation inrecent evolutionary history. Genome Res. 18:1865–74

66. Kleinjan DA, van Heyningen V. 2005. Long-range control of gene expression: emerging mechanismsand disruption in disease. Am. J. Hum. Genet. 76:8–32

67. Kondrashov AS. 2003. Direct estimates of human per nucleotide mutation rates at 20 loci causingMendelian diseases. Hum. Mutat. 21:12–27

68. Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, et al. 2007. Paired-end mapping revealsextensive structural variation in the human genome. Science 318:420–26

69. Kuhn L, Schramm DB, Donninger S, Meddows-Taylor S, Coovadia AH, et al. 2007. African infants’CCL3 gene copies influence perinatal HIV transmission in the absence of maternal nevirapine. AIDS21:1753–61

70. Kumar RA, KaraMohamed S, Sudi J, Conrad DF, Brune C, et al. 2008. Recurrent 16p11.2 microdeletionsin autism. Hum. Mol. Genet. 17:628–38

476 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

71. Kurotaki N, Shen JJ, Touyama M, Kondoh T, Visser R, et al. 2005. Phenotypic consequences of geneticvariation at hemizygous alleles: Sotos syndrome is a contiguous gene syndrome incorporating coagulationfactor twelve (FXII) deficiency. Genet. Med. 7:479–83

72. Kurotaki N, Stankiewicz P, Wakui K, Niikawa N, Lupski JR. 2005. Sotos syndrome common deletionis mediated by directly oriented subunits within inverted Sos-REP low-copy repeats. Hum. Mol. Genet.14:535–42

73. Lalani SR, Thakuria JV, Cox GF, Wang X, Bi W, et al. 2009. 20p12.3 microdeletion predisposes toWolff-Parkinson-White syndrome with variable neurocognitive deficits. J. Med. Genet. 46:168–75

74. Lalic T, Vossen RH, Coffa J, Schouten JP, Guc-Scekic M, et al. 2005. Deletion and duplication screeningin the DMD gene using MLPA. Eur. J. Hum. Genet. 13:1231–34

75. Lam KW, Jeffreys AJ. 2006. Processes of copy-number change in human DNA: the dynamics of α-globingene deletion. Proc. Natl. Acad. Sci. USA 103:8921–27

76. Lam KW, Jeffreys AJ. 2007. Processes of de novo duplication of human alpha-globin genes. Proc. Natl.Acad. Sci. USA 104:10950–55

77. Le Marechal C, Masson E, Chen JM, Morel F, Ruszniewski P, et al. 2006. Hereditary pancreatitis causedby triplication of the trypsinogen locus. Nat. Genet. 38:1372–74

78. Lee JA, Carvalho CM, Lupski JR. 2007. A DNA replication mechanism for generating nonrecurrentrearrangements associated with genomic disorders. Cell 131:1235–47

79. Lejeune J, Gautier M, Turpin R. 1959. Etude des chromosomes somatiques de neuf enfants mongoliens.C. R. Acad. Sci. 248:1721–22

80. Lenski C, Kooy RF, Reyniers E, Loessner D, Wanders RJ, et al. 2007. The reduced expression of theHADH2 protein causes X-linked mental retardation, choreoathetosis, and abnormal behavior. Am. J.Hum. Genet. 80:372–77

81. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, et al. 2007. The diploid genome sequence of an individualhuman. PLoS Biol. 5:e254

82. Lieber MR. 2008. The mechanism of human nonhomologous DNA end joining. J. Biol. Chem. 283:1–583. Lieber MR, Lu H, Gu J, Schwarz K. 2008. Flexibility in the order of action and in the enzymology of the

nuclease, polymerases, and ligase of vertebrate nonhomologous DNA end joining: relevance to cancer,aging, and the immune system. Cell Res. 18:125–33

84. Lieber MR, Ma Y, Pannicke U, Schwarz K. 2003. Mechanism and regulation of human nonhomologousDNA end-joining. Nat. Rev. Mol. Cell Biol. 4:712–20

85. Lifton RP, Dluhy RG, Powers M, Rich GM, Cook S, et al. 1992. A chimeric 11 beta-hydroxylase/aldosterone synthase gene causes glucocorticoid-remediable aldosteronism and human hy-pertension. Nature 355:262–65

86. Lindsay SJ, Khajavi M, Lupski JR, Hurles ME. 2006. A chromosomal rearrangement hotspot can beidentified from population genetic variation and is coincident with a hotspot for allelic recombination.Am. J. Hum. Genet. 79:890–902

87. Lu XY, Phung MT, Shaw CA, Pham K, Neil SE, et al. 2008. Genomic imbalances in neonates with birthdefects: high detection rates by using chromosomal microarray analysis. Pediatrics 122:1310–18

88. Lupski JR. 1998. Genomic disorders: structural features of the genome can lead to DNA rearrangementsand human disease traits. Trends Genet. 14:417–22

89. Lupski JR. 2004. Hotspots of homologous recombination in the human genome: not all homologoussequences are equal. Genome Biol. 5:242

90. Lupski JR. 2006. Genome structural variation and sporadic disease traits. Nat. Genet. 38:974–7691. Lupski JR. 2007. An evolution revolution provides further revelation. BioEssays 29:1182–8492. Lupski JR. 2007. Genomic rearrangements and sporadic disease. Nat. Genet. 39:S43–S793. Lupski JR. 2007. Structural variation in the human genome. N. Engl. J. Med. 356:1169–7194. Lupski JR. 2008. Schizophrenia: incriminating genomic evidence. Nature 455:178–7995. Lupski JR. 2009. Genomic disorders ten years on. Genome Med. 1:4296. Lupski JR, Chance PF. 2005. Hereditary motor and sensory neuropathies involving altered dosage or

mutation of PMP22: the CMT1A duplication and HNPP deletion. In Peripheral Neuropathy, ed. PJ Dyck,PK Thomas, pp. 1659–80. Philadelphia: Elsevier

www.annualreviews.org • CNV in Human Health, Disease, Evolution 477

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

97. Lupski JR, de Oca-Luna RM, Slaugenhaupt S, Pentao L, Guzzetta V, et al. 1991. DNA duplicationassociated with Charcot-Marie-Tooth disease type 1A. Cell 66:219–32

98. Lupski JR, Stankiewicz P. 2005. Genomic disorders: molecular mechanisms for rearrangements andconveyed phenotypes. PLoS Genet. 1:e49

99. Lupski JR, Wise CA, Kuwano A, Pentao L, Parke JT, et al. 1992. Gene dosage is a mechanism forCharcot-Marie-Tooth disease type 1A. Nat. Genet. 1:29–33

100. Marshall CR, Noor A, Vincent JB, Lionel AC, Feuk L, et al. 2008. Structural variation of chromosomesin autism spectrum disorder. Am. J. Hum. Genet. 82:477–88

101. Matise TC, Chakravarti A, Patel PI, Lupski JR, Nelis E, et al. 1994. Detection of tandem duplicationsand implications for linkage analysis. Am. J. Hum. Genet. 54:1110–21

102. McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, et al. 2006. Common deletion polymor-phisms in the human genome. Nat. Genet. 38:86–92

103. McCarroll SA, Huett A, Kuballa P, Chilewski SD, Landry A, et al. 2008. Deletion polymorphism up-stream of IRGM associated with altered IRGM expression and Crohn’s disease. Nat. Genet. 40:1107–12

104. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, et al. 2008. Integrated detection andpopulation-genetic analysis of SNPs and copy number variation. Nat. Genet. 40:1166–74

105. Mefford HC, Clauin S, Sharp AJ, Moller RS, Ullmann R, et al. 2007. Recurrent reciprocal genomicrearrangements of 17q12 are associated with renal disease, diabetes, and epilepsy. Am. J. Hum. Genet.81:1057–69

106. Mefford HC, Sharp AJ, Baker C, Itsara A, Jiang Z, et al. 2008. Recurrent rearrangements of chromosome1q21.1 and variable pediatric phenotypes. N. Engl. J. Med. 359:1685–99

107. Menten P, Wuyts A, Van Damme J. 2002. Macrophage inflammatory protein-1. Cytokine Growth FactorRev. 13:455–81

108. Meyer zu Horste G, Prukop T, Liebetanz D, Mobius W, Nave KA, Sereda MW. 2007. Antiprogesteronetherapy uncouples axonal loss from demyelination in a transgenic rat model of CMT1A neuropathy. Ann.Neurol. 61:61–72

109. Miller DW, Hague SM, Clarimon J, Baptista M, Gwinn-Hardy K, et al. 2004. Alpha-synuclein in bloodand brain from familial Parkinson disease with SNCA locus triplication. Neurology 62:1835–38

110. Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, et al. 2006. An initial map of insertion anddeletion (INDEL) variation in the human genome. Genome Res. 16:1182–90

111. Morrow EM, Yoo SY, Flavell SW, Kim TK, Lin Y, et al. 2008. Identifying autism loci and genes bytracing recent shared ancestry. Science 321:218–23

112. Musio A, Selicorni A, Focarelli ML, Gervasini C, Milani D, et al. 2006. X-linked Cornelia de Langesyndrome owing to SMC1L1 mutations. Nat. Genet. 38:528–30

113. Myers S, Freeman C, Auton A, Donnelly P, McVean G. 2008. A common sequence motif associatedwith recombination hot spots and genome instability in humans. Nat. Genet. 40:1124–29

114. Myers SR, McCarroll SA. 2006. New insights into the biological basis of genomic disorders. Nat. Genet.38:1363–64

115. Nachman MW, Crowell SL. 2000. Estimate of the mutation rate per nucleotide in humans. Genetics156:297–304

116. Naruse K, Ueno M, Satoh T, Nomiyama H, Tei H, et al. 1996. A YAC contig of the human CC chemokinegenes clustered on chromosome 17q11.2. Genomics 34:236–40

117. Nathans J, Piantanida TP, Eddy RL, Shows TB, Hogness DS. 1986. Molecular genetics of inheritedvariation in human color vision. Science 232:203–10

118. Nathans J, Thomas D, Hogness DS. 1986. Molecular genetics of human color vision: the genes encodingblue, green, and red pigments. Science 232:193–202

119. Nguyen DQ, Webber C, Hehir-Kwa J, Pfundt R, Veltman J, Ponting CP. 2008. Reduced purifyingselection prevails over positive selection in human copy number variant evolution. Genome Res. 18:1711–23

120. Nobile C, Toffolatti L, Rizzi F, Simionati B, Nigro V, et al. 2002. Analysis of 22 deletion breakpoints indystrophin intron 49. Hum. Genet. 110:418–21

121. Ohno S. 1970. Evolution by Gene Duplication. Berlin: Springer-Verlag

478 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

122. Ostertag EM, Goodier JL, Zhang Y, Kazazian HH Jr. 2003. SVA elements are nonautonomous retro-transposons that cause disease in humans. Am. J. Hum. Genet. 73:1444–51

123. Ostertag EM, Kazazian HH Jr. 2001. Biology of mammalian L1 retrotransposons. Annu. Rev. Genet.35:501–38

124. Parkes M, Barrett JC, Prescott NJ, Tremelling M, Anderson CA, et al. 2007. Sequence variants in theautophagy gene IRGM and multiple other replicating loci contribute to Crohn’s disease susceptibility.Nat. Genet. 39:830–32

125. Perry GH, Ben-Dor A, Tsalenko A, Sampas N, Rodriguez-Revenga L, et al. 2008. The fine-scale andcomplex architecture of human copy-number variation. Am. J. Hum. Genet. 82:685–95

126. Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, et al. 2007. Diet and the evolution of humanamylase gene copy number variation. Nat. Genet. 39:1256–60

127. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, et al. 1998. High resolution analysis of DNA copynumber variation using comparative genomic hybridization to microarrays. Nat. Genet. 20:207–11

128. Polymeropoulos MH, Higgins JJ, Golbe LI, Johnson WG, Ide SE, et al. 1996. Mapping of a gene forParkinson’s disease to chromosome 4q21-q23. Science 274:1197–99

129. Popesco MC, Maclaren EJ, Hopkins J, Dumas L, Cox M, et al. 2006. Human lineage-specific amplifi-cation, selection, and neuronal expression of DUF1220 domains. Science 313:1304–7

130. Potocki L, Bi W, Treadwell-Deering D, Carvalho CM, Eifert A, et al. 2007. Characterization of Potocki-Lupski syndrome (dup(17)(p11.2p11.2)) and delineation of a dosage-sensitive critical interval that canconvey an autism phenotype. Am. J. Hum. Genet. 80:633–49

131. Raedt TD, Stephens M, Heyns I, Brems H, Thijs D, et al. 2006. Conservation of hotspots for recombi-nation in low-copy repeats associated with the NF1 microdeletion. Nat. Genet. 38:1419–23

132. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, et al. 2006. Global variation in copy number in thehuman genome. Nature 444:444–54

133. Rovelet-Lecrux A, Hannequin D, Raux G, Le Meur N, Laquerriere A, et al. 2006. APP locus duplicationcauses autosomal dominant early-onset Alzheimer disease with cerebral amyloid angiopathy. Nat. Genet.38:24–26

134. Sahin-Toth M. 2006. Biochemical models of hereditary pancreatitis. Endocrinol. Metab. Clin. North Am.35:303–12, ix

135. Saunier S, Calado J, Benessy F, Silbermann F, Heilig R, et al. 2000. Characterization of the NPHP1locus: mutational mechanism involved in deletions in familial juvenile nephronophthisis. Am. J. Hum.Genet. 66:778–89

136. Schwarz K, Ma Y, Pannicke U, Lieber MR. 2003. Human severe combined immune deficiency and DNArepair. BioEssays 25:1061–70

137. Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, et al. 2007. Strong association of de novo copynumber mutations with autism. Science 316:445–49

138. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, et al. 2004. Large-scale copy number polymorphismin the human genome. Science 305:525–28

139. Sen SK, Han K, Wang J, Lee J, Wang H, et al. 2006. Human genomic deletions mediated by recombi-nation between Alu elements. Am. J. Hum. Genet. 79:41–53

140. Sennvik K, Fastbom J, Blomberg M, Wahlund LO, Winblad B, Benedikz E. 2000. Levels of alpha-and beta-secretase cleaved amyloid precursor protein in the cerebrospinal fluid of Alzheimer’s diseasepatients. Neurosci. Lett. 278:169–72

141. Sereda MW, Meyer zu Horste G, Suter U, Uzma N, Nave KA. 2003. Therapeutic administration ofprogesterone antagonist in a model of Charcot-Marie-Tooth disease (CMT-1A). Nat. Med. 9:1533–37

142. Shaffer LG, Lupski JR. 2000. Molecular mechanisms for constitutional chromosomal rearrangementsin humans. Annu. Rev. Genet. 34:297–329

143. Sharp AJ, Hansen S, Selzer RR, Cheng Z, Regan R, et al. 2006. Discovery of previously unidentifiedgenomic disorders from the duplication architecture of the human genome. Nat. Genet. 38:1038–42

144. Sharp AJ, Mefford HC, Li K, Baker C, Skinner C, et al. 2008. A recurrent 15q13.3 microdeletionsyndrome associated with mental retardation and seizures. Nat. Genet. 40:322–28

145. Sharp AJ, Selzer RR, Veltman JA, Gimelli S, Gimelli G, et al. 2007. Characterization of a recurrent15q24 microdeletion syndrome. Hum. Mol. Genet. 16:567–72

www.annualreviews.org • CNV in Human Health, Disease, Evolution 479

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

146. Shaw-Smith C, Redon R, Rickman L, Rio M, Willatt L, et al. 2004. Microarray based comparativegenomic hybridisation (array-CGH) detects submicroscopic chromosomal deletions and duplications inpatients with learning disability/mental retardation and dysmorphic features. J. Med. Genet. 41:241–48

147. Shaw CJ, Lupski JR. 2004. Implications of human genome architecture for rearrangement-based disor-ders: the genomic basis of disease. Hum. Mol. Genet. 13(Spec No 1):R57–64

148. Shaw CJ, Lupski JR. 2005. Non-recurrent 17p11.2 deletions are generated by homologous and nonho-mologous mechanisms. Hum. Genet. 116:1–7

149. She X, Cheng Z, Zollner S, Church DM, Eichler EE. 2008. Mouse segmental duplication and copynumber variation. Nat. Genet. 40:909–14

150. Singleton AB, Farrer M, Johnson J, Singleton A, Hague S, et al. 2003. alpha-Synuclein locus triplicationcauses Parkinson’s disease. Science 302:841

151. Sleegers K, Brouwers N, Gijselinck I, Theuns J, Goossens D, et al. 2006. APP duplication is sufficientto cause early onset Alzheimer’s dementia with cerebral amyloid angiopathy. Brain 129:2977–83

152. Smith GR. 1998. Chi sites and their consequences. In Bacterial Genomes: Physical Structure and Analysis,ed. FJ de Bruijn, JR Lupski, GM Weinstock, pp. 49–66. New York: Chapman & Hall

153. Stankiewicz P, Beaudet AL. 2007. Use of array CGH in the evaluation of dysmorphology, malformations,developmental delay, and idiopathic mental retardation. Curr. Opin. Genet. Dev. 17:182–92

154. Stankiewicz P, Lupski JR. 2002. Genome architecture, rearrangements and genomic disorders. TrendsGenet. 18:74–82

155. Stankiewicz P, Shaw CJ, Dapper JD, Wakui K, Shaffer LG, et al. 2003. Genome architecture catalyzesnonrecurrent chromosomal rearrangements. Am. J. Hum. Genet. 72:1101–16

156. Stefansson H, Rujescu D, Cichon S, Pietilainen OP, Ingason A, et al. 2008. Large recurrent microdele-tions associated with schizophrenia. Nature 455:232–36

157. Steinmann K, Cooper DN, Kluwe L, Chuzhanova NA, Senger C, et al. 2007. Type 2 NF1 deletionsare highly unusual by virtue of the absence of nonallelic homologous recombination hotspots and anapparent preference for female mitotic recombination. Am. J. Hum. Genet. 81:1201–20

158. Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, et al. 2007. Relative impact of nucleotideand copy number variation on gene expression phenotypes. Science 315:848–53

159. The Int. HapMap Consort. 2003. The International HapMap Project. Nature 426:789–96160. The Int. HapMap Consort. 2005. A haplotype map of the human genome. Nature 437:1299–320161. The Int. Schizophrenia Consort. 2008. Rare chromosomal deletions and duplications increase risk of

schizophrenia. Nature 455:237–41162. Theuns J, Brouwers N, Engelborghs S, Sleegers K, Bogaerts V, et al. 2006. Promoter mutations that

increase amyloid precursor-protein expression are associated with Alzheimer disease. Am. J. Hum. Genet.78:936–46

163. Tiemann-Boege I, Calabrese P, Cochran DM, Sokol R, Arnheim N. 2006. High-resolution recombina-tion patterns in a region of human chromosome 21 measured by sperm typing. PLoS Genet. 2:e70

164. Toffolatti L, Cardazzo B, Nobile C, Danieli GA, Gualandi F, et al. 2002. Investigating the mechanismof chromosomal deletion: characterization of 39 deletion breakpoints in introns 47 and 48 of the humandystrophin gene. Genomics 80:523–30

165. Townson JR, Barcellos LF, Nibbs RJ. 2002. Gene copy number regulates the production of the humanchemokine CCL3-L1. Eur. J. Immunol. 32:3016–26

166. Turner DJ, Miretti M, Rajan D, Fiegler H, Carter NP, et al. 2008. Germline rates of de novo meioticdeletions and duplications causing several genomic disorders. Nat. Genet. 40:90–95

167. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, et al. 2005. Fine-scale structural variation of thehuman genome. Nat. Genet. 37:727–32

168. Van Esch H, Bauters M, Ignatius J, Jansen M, Raynaud M, et al. 2005. Duplication of the MECP2 regionis a frequent cause of severe mental retardation and progressive neurological symptoms in males. Am. J.Hum. Genet. 77:442–53

169. van Ommen GJ. 2005. Frequency of new copy number variation in humans. Nat. Genet. 37:333–34170. Velagaleti GV, Bien-Willner GA, Northup JK, Lockhart LH, Hawkins JC, et al. 2005. Position effects

due to chromosome breakpoints that map approximately 900 Kb upstream and approximately 1.3 Mbdownstream of SOX9 in two patients with campomelic dysplasia. Am. J. Hum. Genet. 76:652–62

480 Zhang et al.

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

ANRV386-GG10-21 ARI 19 August 2009 20:44

171. Vissers LE, Stankiewicz P, Yatsenko SA, Crawford E, Creswick H, et al. 2007. Complex chromosome17p rearrangements associated with low-copy repeats in two patients with congenital anomalies. Hum.Genet. 121:697–709

172. Vrijenhoek T, Buizer-Voskamp JE, Van Der Stelt I, Strengman E, Genetic Risk and Outcome in Psychosis(GROUP) Consort., et al. 2008. Recurrent CNVs disrupt three candidate genes in schizophrenia patients.Am. J. Hum. Genet. 83:504–10

173. Walsh T, McClellan JM, McCarthy SE, Addington AM, Pierce SB, et al. 2008. Rare structural variantsdisrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 320:539–43

174. Wang J, Wang W, Li R, Li Y, Tian G, et al. 2008. The diploid genome sequence of an Asian individual.Nature 456:60–65

175. Weiss LA, Shen Y, Korn JM, Arking DE, Miller DT, et al. 2008. Association between microdeletionand microduplication at 16p11.2 and autism. N. Engl. J. Med. 358:667–75

176. Wellcome WTCCC. 2007. Genome-wide association study of 14000 cases of seven common diseasesand 3000 shared controls. Nature 447:661–78

177. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, et al. 2008. The complete genome of anindividual by massively parallel DNA sequencing. Nature 452:872–76

178. Whitcomb DC, Gorry MC, Preston RA, Furey W, Sossenheimer MJ, et al. 1996. Hereditary pancreatitisis caused by a mutation in the cationic trypsinogen gene. Nat. Genet. 14:141–45

179. Willcocks LC, Lyons PA, Clatworthy MR, Robinson JI, Yang W, et al. 2008. Copy number of FCGR3B,which is associated with systemic lupus erythematosus, correlates with protein expression and immunecomplex uptake. J. Exp. Med. 205:1573–82

180. Wolf NI, Sistermans EA, Cundall M, Hobson GM, Davis-Williams AP, et al. 2005. Three or more copiesof the proteolipid protein gene PLP1 cause severe Pelizaeus-Merzbacher disease. Brain 128:743–51

181. Xing J, Zhang Y, Han K, Salem AH, Sen SK, et al. 2009. Mobile elements create structural variation:analysis of a complete human genome. Genome Res. In press. doi:10.1101/gr.091827.109

182. Xu B, Roos JL, Levy S, van Rensburg EJ, Gogos JA, Karayiorgou M. 2008. Strong association of denovo copy number mutations with sporadic schizophrenia. Nat. Genet. 40:880–85

183. Yan J, Bi W, Lupski JR. 2007. Penetrance of craniofacial anomalies in mouse models of Smith-Magenissyndrome is modified by genomic sequence surrounding Rai1: Not all null alleles are alike. Am. J. Hum.Genet. 80:518–25

184. Yang Y, Chung EK, Wu YL, Savelli SL, Nagaraja HN, et al. 2007. Gene copy-number variation andassociated polymorphisms of complement component C4 in human systemic lupus erythematosus (SLE):Low copy number is a risk factor for and high copy number is a protective factor against SLE susceptibilityin European Americans. Am. J. Hum. Genet. 80:1037–54

185. Zhang F, Carvalho CM, Lupski JR. 2009. Complex human chromosomal and genomic rearrangements.Trends Genet. In press, doi: 10.1016/j.tig.2009.05.005

186. Zhang F, Khajavi M, Connolly AM, Towne CF, Batish SD, Lupski JR. 2009. The DNA replicationFoSTeS/MMBIR mechanism can generate human genomic, genic, and exonic complex rearrangements.Nat. Genet. 41:849–53

www.annualreviews.org • CNV in Human Health, Disease, Evolution 481

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

AR386-GG10-FM ARI 6 August 2009 7:15

Annual Review ofGenomics andHuman Genetics

Volume 10, 2009

Contents

Chromosomes in Leukemia and Beyond: From Irrelevantto Central PlayersJanet D. Rowley � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 1

Unraveling a Multifactorial Late-Onset Disease: From GeneticSusceptibility to Disease Mechanisms for Age-Related MacularDegenerationAnand Swaroop, Emily Y. Chew, Catherine Bowes Rickman, and Goncalo R. Abecasis � � �19

Syndromes of Telomere ShorteningMary Armanios � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �45

Chronic Pancreatitis: Genetics and PathogenesisJian-Min Chen and Claude Ferec � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �63

The Genetics of Crohn’s DiseaseJohan Van Limbergen, David C. Wilson, and Jack Satsangi � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �89

Genotyping Technologies for Genetic ResearchJiannis Ragoussis � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 117

Applications of New Sequencing Technologies for TranscriptomeAnalysisOlena Morozova, Martin Hirst, and Marco A. Marra � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 135

The Posttranslational Processing of Prelamin A and DiseaseBrandon S.J. Davies, Loren G. Fong, Shao H. Yang, Catherine Coffinier,

and Stephen G. Young � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 153

Genetic Testing in Israel: An OverviewGuy Rosner, Serena Rosner, and Avi Orr-Urtreger � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 175

Stewardship of Human Biospecimens, DNA, Genotype, and ClinicalData in the GWAS EraStephen J. O’Brien � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 193

Schistosoma Genetics: New Perspectives on Schistosome Biologyand Host-Parasite InteractionZe-Guang Han, Paul J. Brindley, Sheng-Yue Wang, and Zhu Chen � � � � � � � � � � � � � � � � � � � � 211

Evolution of Genomic Imprinting: Insights from Marsupialsand MonotremesMarilyn B. Renfree, Timothy A. Hore, Geoffrey Shaw,

Jennifer A. Marshall Graves, and Andrew J. Pask � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 241

v

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.

AR386-GG10-FM ARI 6 August 2009 7:15

Methods for Genomic PartitioningEmily H. Turner, Sarah B. Ng, Deborah A. Nickerson, and Jay Shendure � � � � � � � � � � � � � 263

Biased Gene Conversion and the Evolution of MammalianGenomic LandscapesLaurent Duret and Nicolas Galtier � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 285

Inherited Variation in Gene ExpressionDaniel A. Skelly, James Ronald, and Joshua M. Akey � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 313

Genomic Analyses of Sex Chromosome EvolutionMelissa A. Wilson and Kateryna D. Makova � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 333

Sequencing Primate Genomes: What Have We Learned?Tomas Marques-Bonet, Oliver A. Ryder, and Evan E. Eichler � � � � � � � � � � � � � � � � � � � � � � � � � � � 355

Genotype ImputationYun Li, Cristen Willer, Serena Sanna, and Goncalo Abecasis � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 387

Genetics of Athletic PerformanceElaine A. Ostrander, Heather J. Huson, and Gary K. Ostrander � � � � � � � � � � � � � � � � � � � � � � � � 407

Genetic Screening for Low-Penetrance Variantsin Protein-Coding DiseasesJill Waalen and Ernest Beutler � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 431

Copy Number Variation in Human Health, Disease, and EvolutionFeng Zhang, Wenli Gu, Matthew E. Hurles, and James R. Lupski � � � � � � � � � � � � � � � � � � � � � 451

The Toxicogenomic Multiverse: Convergent Recruitment of ProteinsInto Animal VenomsBryan G. Fry, Kim Roelants, Donald E. Champagne, Holger Scheib,

Joel D.A. Tyndall, Glenn F. King, Timo J. Nevalainen, Janette A. Norman,Richard J. Lewis, Raymond S. Norton, Camila Renjifo,and Ricardo C. Rodríguez de la Vega � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 483

Genetics, Medicine, and the Plain PeopleKevin A. Strauss and Erik G. Puffenberger � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 513

Indexes

Cumulative Index of Contributing Authors, Volumes 1–10 � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 537

Cumulative Index of Chapter Titles, Volumes 1–10 � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 541

Errata

An online log of corrections to Annual Review of Genomics and Human Genetics articlesmay be found at http://genom.annualreviews.org/

vi Contents

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

9.10

:451

-481

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

Ind

iana

Uni

vers

ity -

Pur

due

Uni

vers

ity I

ndia

napo

lis -

Sch

ool o

f M

edic

ine

on 0

4/19

/10.

For

per

sona

l use

onl

y.