

LETTERS

Partial deficiency of the C-terminal-domain phosphatase of RNA polymerase II is associated with congenital cataracts facial dysmorphism neuropathy syndrome

Raymonda Varon1,16, Rebecca Gooding2,16, Christina Steglich1, Lorna Marns2, Hua Tang3, Dora Angelicheva2, Kiau Kiun Yong2, Petra Ambrugger1, Anke Reinhold1,4, Bharti Morar2, Frank Baas5, Marcel Kwa5, Ivailo Tournev6, Velina Guerguelcheva6, Ivo Kremensky7, Hanns Lochmüller8, Andrea Müllner-Eidenböck9, Luciano Merlini10, Luitgard Neumann1, Joachim Bürger1,11, Maggie Walter8, Kathryn Swoboda12, P K Thomas13, Arpad von Moers4, Neil Risch14,15 & Luba Kalaydjieva2

Congenital cataracts facial dysmorphism neuropathy (CCFDN) syndrome (OMIM 604168) is an autosomal recessive developmental disorder that occurs in an endogamous group of Vlax Roma (Gypsies; refs. 1–3). We previously localized the gene associated with CCFDN to 18qter, where a conserved haplotype suggested a single founder mutation4. In this study, we used recombination mapping to refine the gene position to a 155-kb critical interval. During haplotype analysis, we found that the non-transmitted chromosomes of some unaffected parents carried the conserved haplotype associated with the disease. Assuming such parents to be completely homozygous across the critical interval except with respect to the disease-causing mutation, we developed a new ‘not quite identical by descent’ (NQIBD) approach, which allowed us to identify the mutation causing the disease by sequencing DNA from a single unaffected homozygous parent. We show that CCFDN is caused by a single-nucleotide substitution in an antisense Alu element in intron 6 of CTDP1 (encoding the protein phosphatase FCP1, an essential component of the eukaryotic transcription machinery5,6), resulting in a rare mechanism of aberrant splicing and an Alu insertion in the processed mRNA. CCFDN thus joins the group of ‘transcription syndromes’7 and is the first ‘purely’ transcriptional defect identified that affects polymerase II–mediated gene expression.

CCFDN is characterized by a complex clinical phenotype with seemingly unrelated features involving multiple organs and systems1–3. Developmental abnormalities include congenital cataracts and microcorneae, hypomyelination of the peripheral nervous system, impaired physical growth, delayed early motor and intellectual development, facial dysmorphism and hypogonadism1–3. Central nervous system involvement, with cerebral and spinal cord atrophy, may be the result of disrupted development with superimposed degenerative changes1,2. Affected individuals are prone to severe rhabdomyolysis after viral infections3,8 and to serious complications related to general anesthesia (such as pulmonary edema and epileptic seizures). The disorder was originally described in affected Gypsies in Bulgaria1,4 and has since been diagnosed in many different countries. The Gypsy population of Europe is composed of numerous genetically isolated groups. Nearly all families known to have CCFDN belong to the same endogamous group, called the Rudari.

Analysis of the 13 families in whom CCFDN was originally described4 placed the gene in a 1-Mb interval on 18q23–qter containing 10 known and 28 predicted genes (Fig. 1a). We refined this interval by analyzing samples from 52 families (85 affected members) using a dense map of 170 polymorphisms, including 160 single-nucleotide polymorphisms (SNPs) and insertion-deletions (indels) identified by sequencing of positional candidate genes (Supplementary Table 1 online). Genetic homogeneity among the 52 families was supported by haplotype sharing and linkage analysis, with the highest combined two-point lod score of 14.88 at D18S1390. Recombination mapping placed the boundaries of the CCFDN critical region at position 1,623,325 centromeric (LOC125267, subsequently withdrawn) and 1,768,321 telomeric (EST AV728725) of NCBI contig NT_010879.13 (Fig. 1a,b). Other recombination breakpoints within 4–7 kb of these positions supported the new critical interval of ∼155 kb (Fig. 1c), in which all chromosomes associated with the disease shared identical 78-marker haplotypes.

1Institute of Human Genetics, Charité, Humboldt University, Berlin, Germany. 2Laboratory of Molecular Genetics, Western Australian Institute for Medical Research, University of Western Australia Centre for Medical Research, Perth, Australia. 3Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA. 4Department of Neuropediatrics, Charité, Humboldt University, Berlin, Germany. 5Department of Neurology, Academic Medical Centre, University of Amsterdam, The Netherlands. 6Department of Neurology and 7Laboratory of Molecular Pathology, Medical University, Sofia, Bulgaria. 8Friedrich-Baur Institut, Ludwig-Maximilians-Universität, München, Germany. 9Department of Ophthalmology, University Hospital of Vienna, Austria. 10Istituto Ortopedico Rizzoli, Bologna, Italy. 11Munich Re, Centre of Competence Biosciences, Munich, Germany. 12University of Utah School of Medicine, Salt Lake City, Utah, USA. 13Institute of Neurology and National Hospital for Neurology and Neurosurgery, London, UK. 14Department of Genetics, Stanford University School of Medicine, Stanford, California, USA. 15Division of Research, Kaiser Permanente, Oakland, California, USA. 16These two authors contributed equally to this work. Correspondence should be addressed to L.K. ([email protected]).

Published online 21 September 2003; doi:10.1038/ng1243

NATURE GENETICS VOLUME 35 | NUMBER 2 | OCTOBER 2003 185

©2003 Nature Publishing Group http://www.nature.com/naturegenetics

An unusual observation during haplotype analysis prompted a new strategy for identifying the disease-causing mutation. We identified several unaffected carrier parents (4 of 92 analyzed) who were homozygous with respect to the conserved disease-associated haplotype, with identity between transmitted and non-transmitted chromosomes supported by 133 markers over the 1-Mb region (Fig. 2a and Supplementary Table 1 online). Detailed clinical examination of these individuals found no symptoms of the disease. Sibship analysis in the families with CCFDN showed that all 18 siblings who shared both chromosomes identical by descent (IBD) with the index individual with CCFDN were affected, and all 38 who shared one chromosome (24) or no chromosomes (14) IBD with the index individual were unaffected. The families of the four homozygous parents included only affected siblings, who shared both chromosomes IBD. The findings ruled out incomplete penetrance and digenic inheritance.

We next investigated the epidemiology of the disease and the history of the population to explain the observed haplotype homozygosity of unaffected carriers. Unlike other ‘private’ Gypsy disorders9, CCFDN is almost exclusively confined to a single Gypsy group. The Rudari are a recently formed genetic isolate of limited diversity whose separation from other, closely related Vlax Gypsy groups dates to ∼300–500 years ago10,11. We reasoned that the mutation causing CCFDN occurred recently, around or after the time of founding of the Rudari population, on a common chromosomal haplotype whose two versions (normal and mutant) still occur today. Based on haplotype genealogy (Fig. 2b), we conservatively placed the origin of the mutation in an interval of (at most) 16 generations separating the upper bound of the coalescence time of all related (disease-associated and non-disease-associated) haplotypes from the lower bound of the coalescence time of just the disease-associated haplotypes. In this scenario, the two seemingly identical chromosomes of a homozygous parent were in fact not quite identical by descent (NQIBD) as they represented the two versions of the ancestral haplotype. NQIBD chromosomes were predicted to differ at a single position in the critical region, that being the mutation that causes the disease. Statistical analysis supported this contention, with the probability of more than one mutation in the 155-kb interval estimated at P < 0.0004. Our mutation identification strategy thus detected the single difference between the two chromosomes of an unaffected homozygous parent.

[Figure 1 labels. Genes: KIAA0863, FLJ22378, CTDP1, DIM1, PARD6G, KCNG2, ATP9B, NFATC1, FLJ10967, FLJ21172, LOC284241. Markers: D18S1141, 23090ta1, 1908ca1, LOC125267, EST AV728725, 21594at1, D18S1095, D18S1390, 68530gt1, 68530gt2, D18S70, 68530ca1. Orientation: cen to tel.]

Figure 1 Fine mapping of the gene associated with CCFDN on 18qter. (a) The 1-Mb CCFDN critical interval as defined in the initial mapping study4. (b) Haplotype analysis of disease chromosomes and recombination mapping of the critical interval. We observed two (rather than one) conserved haplotypes, Ht A and Ht B, that differed by ∼248 kb of divergent sequence (shaded area) documented by 14 informative markers. The two haplotypes occurred at approximately equal frequencies and had generated a similar degree of diversity. The critical region was defined by recombination breakpoints mapping to LOC125267 and EST AV728725. The new boundaries placed the divergent part of the conserved haplotype telomeric to the critical region. The haplotypes are shown in Supplementary Table 4 online. (c) The refined 155-kb CCFDN critical region, containing the gene encoding C-terminal-domain phosphatase 1 (CTDP1), predicted gene LOC284241 and 31 ESTs (arrows) not overlapping the genes.

Figure 2 NQIBD analysis. (a) The identity of the non-transmitted NQIBD parental chromosomes (outlined in green) to the conserved disease-associated haplotype Ht A (outlined in red) was confirmed by analysis of 133 polymorphisms across the 1-Mb region telomeric of D18S1141 (arrows indicate microsatellites, bars indicate SNPs and indels). The additional microsatellites centromeric of D18S1141 (not to scale) were analyzed to get a better representation of haplotype diversification over a ∼5-cM genetic distance (deCODE map). Details are presented in Supplementary Table 5 online. (b) Genealogy of NQIBD chromosomes. Our estimates of the haplotype coalescence times, for all related chromosomes (transmitted and nontransmitted) and for the transmitted disease chromosomes alone, conservatively placed the most recent common ancestors (MRCAs) at the 95% confidence intervals for the coalescence times T1 = 30 generations (upper bound for the MRCA of all related haplotypes) and T2 = 14 generations (lower bound for the MRCA of disease-associated haplotypes). We predicted that the mutation causing CCFDN, assumed to have originated in the interval of at most 16 generations separating the two MRCAs, would be the single difference between NQIBD chromosomes.

[Figure 3 labels. Wild-type sequence: TTACAGGCATGAGCC. Mutant sequence: TTACAGGTATGAGCC. Schematic: CTDP1 exon 6, exon 7, EST AA682622, antisense Alu, C→T. Gel: lanes 1–4; fragments of 461, 240 and 221 bp.]

Figure 3 The only difference between NQIBD chromosomes is a single-nucleotide substitution in an intronic Alu element. (a) Sequencing of genomic DNA identified a C→T transition in an antisense Alu element. The Alu repeat is part of EST AA682622, residing in intron 6 of CTDP1 and reading in the opposite direction. (b) The C→T Alu mutation can be detected by NlaIII restriction digestion of the 461-bp PCR product (see Supplementary Methods online), resulting in fragments of 240 bp and 221 bp in the wild-type sequence. The mutation abolishes the restriction site. Lane 1, pUC HpaII size ladder; lane 2, C/T heterozygote; lane 3, wild-type C/C homozygote; lane 4, mutant T/T homozygote.

[Figure 4 labels. Panel a: exon 6, exon 7, antisense Alu, IVS6+389C→T, 95-bp insertion, aa 287, amino acid sequence SKTGNLSTDGGFTVLARMASIS*. Panel b: lanes 1–10; products of 288, 244 and 193 bp; primers Ex6–F, Mut–F and Ex7–R. Panel c: NR of wild-type CTDP1 transcript in CCFDN cells relative to control cells (scale 0–1) for fibroblasts, lymphoblastoid cells, myoblasts and Schwann cells.]

Figure 4 The Alu C→T mutation causes aberrant splicing of CTDP1. (a) The C→T substitution creates a donor splice site, which complies fully with the consensus sequence. In the resulting abnormal mRNA processing, the polypyrimidine tract of the antisense Alu element and an adjacent AG dinucleotide are recognized as an acceptor site (with splicing of the upstream part of intron 6), and the sequence surrounding the mutation serves as the next donor splice site. The insertion of 95 bp of the Alu sequence in the processed CTDP1 transcript generates a premature termination signal 17 codons downstream of exon 6. (b) RT–PCR evidence of aberrant CTDP1 splicing in cultured cells from individuals with CCFDN. Competitive allele-specific amplification with a common reverse primer in exon 7 and two forward primers (located in CTDP1 exon 6 and in the Alu insertion) resulted in three products: 193 bp, wild-type; 288 bp, mutant containing the 95-bp insertion; 244 bp, part of the mutant product. Lane 1, 1-kb size ladder; lanes 2–4, lymphoblastoid cells from two individuals with CCFDN and a control individual, respectively; lanes 5 and 6, myoblasts from an individual with CCFDN and a control, respectively; lanes 7 and 8, Schwann cells from nerve biopsy samples from an individual with CCFDN and a control, respectively; lanes 9 and 10, fibroblasts from an individual with CCFDN and a control, respectively. Results from additional RT–PCR experiments are shown in Supplementary Figure 1 online. (c) Residual levels of the normal CTDP1 transcript. In the real-time PCR experiments, we compared the normalized ratios (NR) of CTDP1 versus 18S rRNA in cultured cells from individuals with CCFDN relative to control cells (NR values in control cells taken as 1). The level shown for fibroblasts is the average of six experiments on cell cultures from two affected subjects.

We sequenced all expressed sequences and flanking introns in the critical region. NQIBD chromosomes differed at only 1 of the 29,893 nucleotides analyzed: a C→T substitution in an intronic Alu element (Fig. 3a). The mutation abolishes an NlaIII restriction site, and we used a PCR-based restriction fragment–length polymorphism assay (Fig. 3b) to test for its presence in families with CCFDN and in control individuals. In the affected families, the T allele segregated perfectly with the disease phenotype (with a two-point lod score of 20.9): the 85 affected individuals were T/T homozygotes; 131 unaffected relatives were C/T heterozygotes (all parents, including those NQIBD, fell into this group) and 25 were C/C homozygotes. Screening of 887 unaffected population controls found a 6.9% carrier rate among the Rudari, in close agreement with predictions based on CCFDN prevalence; an average carrier rate of 0.6% in other Gypsy populations, compatible with a limited exchange of migrants; and a rate of 0% among non-Gypsy Europeans. Several individuals from Vlax Gypsy groups related to the Rudari, who carried the conserved haplotype associated with CCFDN on one or both chromosomes, were C/C homozygotes, in full agreement with our population genetics model. Thus, the genetic data supported the conclusion that the C→T substitution is the defect that causes CCFDN.
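The relationship between a carrier rate and the expected disease prevalence can be illustrated with a simple Hardy–Weinberg calculation. The sketch below is an illustration only and is not the calculation used in the study: it assumes random mating within the group (an assumption that an endogamous isolate may violate), it treats the 6.9% figure as the heterozygote frequency 2q(1 − q), and the function names are hypothetical.

```python
import math


def allele_frequency_from_carrier_rate(carrier_rate):
    """Solve 2*q*(1 - q) = carrier_rate for the smaller root q.

    Assumes Hardy-Weinberg proportions (illustrative assumption).
    """
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_rate)) / 2.0


def expected_prevalence(carrier_rate):
    """Expected frequency of affected homozygotes, q**2."""
    q = allele_frequency_from_carrier_rate(carrier_rate)
    return q * q


RUDARI_CARRIER_RATE = 0.069  # carrier rate reported in the text

q = allele_frequency_from_carrier_rate(RUDARI_CARRIER_RATE)
prevalence = expected_prevalence(RUDARI_CARRIER_RATE)
print(f"mutant allele frequency q = {q:.4f}")
print(f"expected prevalence = 1 in {1.0 / prevalence:.0f}")
```

Under these assumptions the mutant allele frequency comes out at roughly 0.036, corresponding to an expected prevalence on the order of 1 in 800 within the group.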

The mutated Alu element is part of EST AA682622, residing on the antisense strand of CTDP1 in intron 6 (Fig. 3a). Northern-blot and RT–PCR analysis did not identify a longer transcript (data not shown), thereby ruling out the possibility that EST AA682622 belongs to a novel gene reading in the opposite direction to CTDP1. At the same time, the Alu mutation creates a perfect donor splice site 389 bp downstream of the exon 6/intron 6 junction of CTDP1 (Fig. 4a). RT–PCR and sequencing analysis identified a rare mechanism of aberrant splicing in which the donor site created by the C→T transition activates an upstream cryptic acceptor site, resulting in the insertion of 95 nucleotides of the Alu sequence in the processed CTDP1 mRNA (Fig. 4a). This mechanism has been identified previously only in ornithine aminotransferase deficiency12, a modest contribution to the list of disease-causing mutations, given the role of Alu exonization in the general mechanisms of alternative splicing13. The insertion in the CTDP1 mRNA results in a premature termination signal 17 codons downstream of exon 6, with the mutant transcript expected to undergo nonsense-mediated decay or lead to a nonfunctional protein lacking the nuclear localization signal14. We observed the abnormal product in all cell types studied, regardless of their involvement in the clinical phenotype (Fig. 4b). The wild-type transcript was also present in all cells from individuals with CCFDN, indicating that normal and aberrant splicing both occurred and that the mutation causes partial deficiency. The levels of the wild-type transcript in cells from individuals with CCFDN, determined by real-time PCR, were 15–35% of those in control cells, with variation related to cell type and culture conditions (Fig. 4c).
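The sequence-level consequences described above can be checked directly on the 15-nucleotide window shown in Fig. 3a. The following is a minimal sketch, not part of the original analysis: it assumes the window is written in the CTDP1 reading direction, uses the standard NlaIII recognition sequence CATG, and the helper names are hypothetical.

```python
WILD_TYPE = "TTACAGGCATGAGCC"  # 15-nt window from Fig. 3a
MUTANT = "TTACAGGTATGAGCC"     # same window carrying the C->T substitution
NLAIII_SITE = "CATG"           # NlaIII recognition sequence


def differences(a, b):
    """Return (index, base_a, base_b) for every mismatched position."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def has_site(sequence, site=NLAIII_SITE):
    """True if the restriction site occurs anywhere in the sequence."""
    return site in sequence


diffs = differences(WILD_TYPE, MUTANT)
pos = diffs[0][0]

print("substitutions:", diffs)
print("NlaIII site in wild-type:", has_site(WILD_TYPE))
print("NlaIII site in mutant:", has_site(MUTANT))
# The substitution turns GC into the GT dinucleotide that begins a donor site.
print("dinucleotide at the new donor:", WILD_TYPE[pos - 1:pos + 1], "->",
      MUTANT[pos - 1:pos + 1])

# Product sizes quoted in the text and figure legends.
INSERTION_BP = 95
print("wild-type digest fragments sum to:", 240 + 221)
print("mutant RT-PCR product:", 193 + INSERTION_BP)
print("insertion length mod 3:", INSERTION_BP % 3)
```

A single substitution removes the CATG site and creates a GT dinucleotide, and because 95 is not a multiple of 3 the insertion shifts the reading frame, which is consistent with the premature termination signal reported downstream of exon 6.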

CTDP1 encodes a protein phosphatase called FCP1, whose substrates are the phosphorylated serine residues in the heptamer repeats of the C-terminal domain (CTD) of the largest RNA polymerase II subunit5,6. CTD phosphorylation is a key mechanism of regulation of gene expression in eukaryotes (reviewed in ref. 15). The CTD acts as the assembly platform for the transcription and mRNA-processing machinery, with the site and level of serine phosphorylation having a key role in the recruitment of proteins involved in different stages of the process. FCP1-mediated CTD dephosphorylation, regulated by general transcription factors IIF and IIB16, is essential for the recycling of the largest RNA polymerase II subunit after completion of the transcription cycle17. Experimental data suggest that, in addition to and independent of its phosphatase activity, FCP1 is involved in transcription regulation as a stoichiometric component of the elongation complex18,19, as a positive transcription regulator14 and as a factor opposing the Srb10-mediated repression of cell-cycle and growth-control genes18,20. By the criteria of classical inborn errors of metabolism, the residual FCP1 levels in cells from individuals with CCFDN would fall into the heterozygous range. This, together with recent evidence of the existence of other CTD phosphatases21,22, suggests that enzyme deficiency may not be the primary mechanism of pathogenesis in CCFDN, and that pathogenesis may involve other proposed role(s) of FCP1. The relative importance of these different functions in developmental regulation and subsequent control of gene expression is unclear, and CCFDN may provide useful clues.

Vermeulen et al.7 first predicted the existence of the category of genetic diseases named ‘transcription syndromes’, which they expected to result from partial defects of different components of the transcription machinery. Our data classify CCFDN as a new member of this category and as the first ‘pure’ defect of polymerase II–mediated transcription. Notably, the list of candidate disorders7 included Marinesco-Sjögren syndrome (MSS), whose phenotype is very similar to that of CCFDN3,8. Our analysis of 14 individuals with MSS of diverse (non-Gypsy) ethnicity detected no mutations in CTDP1. Recently, a gene associated with MSS was mapped to chromosome 5q32 (ref. 23), a region containing several genes encoding transcription and elongation factors. Although CCFDN and MSS share phenotypic features predicted to be common to transcription syndromes, such as impaired physical and intellectual development and infertility7, the disease- and tissue-specific manifestations are most likely to promote understanding of transcription regulation.

METHODS

Subjects and samples. The study included 241 individuals (85 affected) from 52 Gypsy families with CCFDN residing in Bulgaria, Germany, Austria, Romania, Hungary, Italy, the Czech Republic, Serbia and the USA. Population screening for the mutation causing CCFDN included a total of 887 control subjects: 105 non-Gypsy Europeans and 782 individuals of Gypsy ethnicity from Bulgaria representing 13 different Gypsy groups. Cultured cells from individuals with CCFDN used for RNA studies included skin fibroblasts from three individuals, lymphoblastoid cells from two individuals, Schwann cells from one sural nerve biopsy sample and myoblasts from one muscle biopsy sample. Individuals with MSS were referred by neurological and genetics departments in Australia, Germany, Bulgaria and Japan. None of these individuals were of Gypsy ethnic background. We obtained informed consent from all participants or, in the case of minors, from their parents. The study complies with the ethical guidelines of the institutions involved.

Genetic and physical mapping of the CCFDN critical region. We detected novel microsatellites by searching the published 18qter sequence for the common microsatellite repeat motifs. We identified SNPs and indels during the sequencing of positional candidate genes in the CCFDN critical region. We carried out sequencing reactions on PCR products using the amplification primers and BigDye 3.0 (Applied Biosystems). We ran the reactions on the ABI 377 and ABI PRISM 3100 DNA Analyzers. Detailed information can be found in Supplementary Table 1 online.

We genotyped microsatellites and indels using PCR amplification with fluorescently labeled primers and length separation on an ABI 377 DNA Analyzer. For SNP typing, we carried out PCR amplification and restriction fragment–length polymorphism analysis (where possible) or direct sequencing of the PCR products. We constructed haplotypes manually from family genotyping data. The order of markers followed the published physical map of chromosome 18 (NCBI contig NT_010879.13).

For the linkage analysis, we used the microsatellite and indel genotyping data, an autosomal recessive model with complete penetrance, a gene frequency of 0.05 and polymorphic allele frequencies determined from the data set. The order of markers followed the published physical map, with genetic distances taken from the deCODE map for the known markers and extrapolated for the markers identified in the present study. We used MLINK24,25 for two-point lod score analysis and GENEHUNTER26 for multipoint lod score analysis.

Mutation analysis. We used a panel of DNA samples for the PCR amplification and sequencing of CTDP1 exons and flanking intronic sequences (see Supplementary Table 2 online for primers and conditions). The NlaIII restriction assay for the detection of the mutation causing CCFDN is described in Supplementary Methods online.

Expression studies. We used a phenol-chloroform procedure (Trizol, Invitrogen) to extract RNA from cultured cells of individuals with CCFDN and controls. For first-strand cDNA synthesis, we used MMLV reverse transcriptase and random hexamer primers in a final volume of 20 µl containing 0.2–0.3 µg total RNA (10 min at 20 °C, 40 min at 42 °C, 6 min at 98 °C). Subsequent PCR reactions used 2 µl of the cDNA product. PCR primers and cycling conditions for the expression studies can be found in Supplementary Table 3 online.


To sequence the abnormal CTDP1 transcript, we carried out PCR amplification with primers located in exons 6 and 8, separated products by electrophoresis in a 2% agarose gel, excised the bands and cloned the eluted products with the TOPO Cloning Kit (Invitrogen). We used M13 primers for the sequencing. Competitive allele-specific PCR amplification used a common reverse primer in exon 7 and two forward primers. An exon 6 primer amplified the normal product as well as the mutant product containing the full insertion. The second forward primer, located in the insertion, amplified part of the mutant product alone. Results of additional experiments showing the abnormal splicing product are presented in Supplementary Figure 1 online.

In the real-time PCR analysis, CTDP1 was the test gene and 18S ribosomal RNA served as the control. We designed the CTDP1 primers to amplify a 102-bp fragment derived only from the normal transcript. We used the Rotorgene (Corbett) PCR machine and 10 µl reaction volumes containing 1.5 mM MgCl2, 1× SYBR green (Molecular Probes), 0.25 mM dNTPs, 20 ng of each primer, 0.025 U Hotstar Taq (Qiagen) and 2 µl cDNA. We analyzed the melting curves in the 60–99 °C range using the Rotorgene software (Corbett). We plotted the threshold cycle values obtained for cDNA samples against a standard curve. To prepare the standard curves, we used standard PCR amplification of the above CTDP1 and 18S rRNA fragments followed by cloning of the products and a range of dilutions from 10⁻¹ to 10⁻⁹. We expressed normalized ratios as concentration of the test gene divided by concentration of 18S rRNA.
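The standard-curve quantification described above can be sketched in a few lines. This is a generic illustration rather than the Rotorgene software's procedure: the threshold cycle (Ct) values, the curve parameters and all function names are hypothetical and chosen only to show the arithmetic, in which Ct is fitted as a linear function of log10 concentration and then inverted for each sample.

```python
def fit_standard_curve(log10_conc, ct):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    n = len(ct)
    mean_x = sum(log10_conc) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_conc, ct))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(ct, curve):
    """Invert the standard curve to get a relative concentration."""
    slope, intercept = curve
    return 10 ** ((ct - intercept) / slope)


def normalized_ratio(ct_test, ct_ref, curve_test, curve_ref):
    """Concentration of the test gene divided by that of the reference."""
    return concentration(ct_test, curve_test) / concentration(ct_ref, curve_ref)


# Hypothetical standards: dilutions from 10^-1 to 10^-9 (made-up Ct values).
dilutions = [-1, -2, -3, -4, -5, -6, -7, -8, -9]
ctdp1_curve = fit_standard_curve(dilutions, [12.0 - 3.3 * d for d in dilutions])
rrna_curve = fit_standard_curve(dilutions, [5.0 - 3.3 * d for d in dilutions])

# Hypothetical samples: (Ct for CTDP1, Ct for 18S rRNA).
nr_control = normalized_ratio(28.5, 15.0, ctdp1_curve, rrna_curve)
nr_patient = normalized_ratio(30.5, 15.0, ctdp1_curve, rrna_curve)

relative_nr = nr_patient / nr_control  # control NR taken as 1
print(f"patient NR relative to control: {relative_nr:.2f}")
```

Dividing by the 18S rRNA concentration corrects for differences in input RNA, and expressing the patient value relative to the control value gives the kind of ratio plotted in Fig. 4c.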

NQIBD analysis. We estimated the time to the most recent common ancestor (TMRCA) using the DMLE+ software package (ref. 27). The TMRCA estimates for the disease-associated chromosomes (T2) were based on 128 CCFDN haplotypes (one affected individual per nuclear family) and 120 normal haplotypes spanning 12 microsatellites and 4 indels over a distance of 5.34 cM. Carrier frequency was set at 5%, and population growth rate at 1.32, based on previous estimates (ref. 28). The TMRCA (T1) of all related chromosomes was estimated for the disease-associated chromosomes and the nontransmitted parental chromosomes whose haplotypes appeared identical to the conserved haplotype associated with the disease, Ht A. Taking an upper bound for T1 and a lower bound for T2 places the origin of the disease mutation in an interval of at most 16 generations. For a mutation rate of 10⁻⁸ per site per generation (ref. 29), the expected number of mutations in the 155-kb critical region over 16 generations is 0.0248. Modeling the number of mutations that would be discovered by complete sequencing of the 155-kb region as a Poisson random variable, the probability of observing more than one mutation is P = 0.000302. This approach ignores mutation hot spots, because, given the long stretch (155 kb) of DNA considered, the small size of mutation-prone sequence motifs (ref. 30) would not affect the above estimate. For example, assuming that the 155-kb region contains a hot spot spanning 5 kb, where the mutation rate is 100 times higher, the probability that the region will harbor more than one mutation is still low, at P = 0.005.
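The Poisson estimates above can be checked with a short calculation. This sketch rests on two assumptions that are not spelled out in the text: "more than one mutation" is read as two or more events, and the 5-kb hot spot lies inside the 155-kb region, so the remaining 150 kb mutates at the base rate. The variable and function names are illustrative.

```python
import math

def prob_more_than_one(lam):
    """P(X >= 2) for X ~ Poisson(lam), computed as 1 - P(X = 0) - P(X = 1)."""
    return 1.0 - math.exp(-lam) * (1.0 + lam)

MU = 1e-8            # mutation rate per site per generation (from the text)
GENERATIONS = 16     # upper limit on the interval for the mutation's origin
REGION_BP = 155_000  # size of the critical region in base pairs

# Uniform mutation rate across the region.
lam_uniform = MU * REGION_BP * GENERATIONS
p_uniform = prob_more_than_one(lam_uniform)

# Assumed hot-spot scenario: 5 kb of the region mutates 100 times faster.
HOTSPOT_BP = 5_000
HOTSPOT_FACTOR = 100
effective_sites = (REGION_BP - HOTSPOT_BP) + HOTSPOT_BP * HOTSPOT_FACTOR
lam_hotspot = MU * effective_sites * GENERATIONS
p_hotspot = prob_more_than_one(lam_hotspot)

print(f"expected mutations (uniform):  {lam_uniform:.4f}")
print(f"P(more than one), uniform:     {p_uniform:.6f}")
print(f"expected mutations (hot spot): {lam_hotspot:.4f}")
print(f"P(more than one), hot spot:    {p_hotspot:.3f}")
```

Under these assumptions the calculation gives an expected count of 0.0248 and probabilities of about 0.000302 and 0.005, matching the values quoted in the text.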

URLs. The National Center for Biotechnology Information is available at http://www.ncbi.nlm.nih.gov, and the University of California Santa Cruz Genome Bioinformatics is available at http://genome.ucsc.edu.

Note: Supplementary information is available on the Nature Genetics website.

ACKNOWLEDGMENTS
We thank affected individuals and their families for participating in this project; A. Corches, C. Lupu, M. Molnar, A. Kelemen, P. Seeman and V. Milic-Rasic for referring individuals with CCFDN; M. Delatycki, K. Jones, J. Colomer and T. Ishikawa for referring individuals with MSS; K. Sperling and J. Kunze for supporting this study; J. Reeve for help with the DMLE+ software; A. Usheva for discussions; N. Laing for critical comments on the manuscript; and D. Chandler and I. Martins for technical help. The study was funded by the National Health and Medical Research Council of Australia, The Wellcome Trust, the Australian Research Council, the Deutsche Forschungsgemeinschaft and partly by the German Ministry of Education and Research.

COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.

Received 10 June; accepted 2 September 2003
Published online at http://www.nature.com/naturegenetics/

1. Tournev, I. et al. Congenital Cataracts Facial Dysmorphism Neuropathy (CCFDN) syndrome, a novel complex genetic disease in Balkan Gypsies: clinical and electrophysiological observations. Ann. Neurol. 45, 742–750 (1999).

2. Tournev, I., King, R., Muddle, J., Kalaydjieva, L. & Thomas, P.K. Peripheral nerve abnormalities in the congenital cataract facial dysmorphism neuropathy (CCFDN) syndrome. Acta Neuropathol. (Berlin) 98, 165–170 (1999).

3. Merlini, L. et al. Genetic identity of Marinesco-Sjogren/myoglobinuria and CCFDN syndromes. Neurology 58, 231–236 (2002).

4. Angelicheva, D., Tournev, I., Dye, D., Chandler, D., Thomas, P.K. & Kalaydjieva, L. Congenital cataracts facial dysmorphism neuropathy (CCFDN) syndrome: a novel developmental disorder in Gypsies maps to 18qter. Eur. J. Hum. Genet. 7, 560–566 (1999).

5. Chambers, R.S. & Dahmus, M.E. Purification and characterization of a phosphatase from HeLa cells which dephosphorylates the C-terminal domain of RNA polymerase II. J. Biol. Chem. 269, 26243–26248 (1994).

6. Archambault, J. et al. An essential component of a C-terminal domain phosphatase that interacts with transcription factor TFIIF in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 94, 14300–14305 (1997).

7. Vermeulen, W. et al. Three unusual repair deficiencies associated with transcription factor BTF2 (TFIIH): evidence for the existence of a transcription syndrome. Cold Spring Harb. Symp. Quant. Biol. 59, 317–329 (1994).

8. Müller-Felber, W. et al. Marinesco-Sjögren syndrome with rhabdomyolysis. A new subtype of the disease. Neuropediatrics 29, 97–101 (1998).

9. Kalaydjieva, L., Gresham, D. & Calafell, F. Genetic studies of the Roma (Gypsies): a review. BMC Med. Genet. 2, 5 (2001).

10. Kalaydjieva, L. et al. Patterns of inter- and intra-group genetic diversity in the Vlax Roma as revealed by Y chromosome and mitochondrial DNA lineages. Eur. J. Hum. Genet. 9, 97–104 (2001).

11. Gresham, D. et al. Origins and divergence of the Roma (Gypsies). Am. J. Hum. Genet. 69, 1314–1331 (2001).

12. Mitchell, G.A. et al. Splice-mediated insertion of an Alu sequence inactivates ornithine δ-aminotransferase: A role for Alu elements in human mutation. Proc. Natl. Acad. Sci. USA 88, 815–819 (1991).

13. Lev-Maor, G., Sorek, R., Shomron, N. & Ast, G. The birth of an alternatively spliced exon: 3′ splice site selection in Alu exons. Science 300, 1288–1291 (2003).

14. Licciardo, P., Ruggiero, L., Lania, L. & Majello, B. Transcription activation by targeted recruitment of the RNA polymerase II CTD phosphatase FCP1. Nucleic Acids Res. 29, 3539–3545 (2001).

15. Maniatis, T. & Reed, R. An extensive network of coupling among gene expression machines. Nature 416, 499–506 (2002).

16. Chambers, R.S., Wang, B.Q., Burton, Z.F. & Dahmus, M.E. The activity of COOH-terminal domain phosphatase is regulated by a docking site on RNA polymerase II and by the general transcription factors IIF and IIB. J. Biol. Chem. 270, 14962–14969 (1995).

17. Cho, H. et al. A protein phosphatase functions to recycle RNA polymerase II. Genes Dev. 13, 1540–1552 (1999).

18. Cho, H., Kobor, M., Kim, M., Greenblatt, J. & Buratowski, S. Opposing effects of Ctk1 kinase and Fcp1 phosphatase at Ser 2 of the RNA polymerase II C-terminal domain. Genes Dev. 15, 3319–3329 (2001).

19. Mandal, S.S., Cho, H., Kim, S., Cabane, K. & Reinberg, D. FCP1, a phosphatase specific for the heptapeptide repeat of the largest subunit of RNA polymerase II, stimulates transcription elongation. Mol. Cell. Biol. 22, 7543–7552 (2002).

20. Hengartner, C.J. et al. Temporal regulation of RNA polymerase II by Srb10 and Kin28 cyclin-dependent kinases. Mol. Cell 2, 43–53 (1998).

21. Washington, K. et al. Protein phosphatase-1 dephosphorylates the C-terminal domain of RNA polymerase 2. J. Biol. Chem. 277, 40442–40448 (2002).

22. Yeo, M., Lin, P.S., Dahmus, M.S. & Gill, G.N. A novel RNA polymerase II C-terminal domain phosphatase that preferentially dephosphorylates serine 5. J. Biol. Chem. 278, 26078–26085 (2003).

23. Lagier-Tourenne, C. et al. Homozygosity mapping of Marinesco-Sjögren syndrome to 5q31. Eur. J. Hum. Genet. (in the press).

24. Cottingham, R.W., Idury, R.M. & Shaffer, A.A. Faster sequential genetic linkage computations. Am. J. Hum. Genet. 53, 252–263 (1993).

25. Lathrop, G.M. & Lalouel, J.-M. Easy calculations of lod scores and genetic risks on small computers. Am. J. Hum. Genet. 36, 460–465 (1984).

26. Kruglyak, L., Daly, M.J. & Lander, E.S. Rapid multipoint linkage analysis of recessive traits in nuclear families, including homozygosity mapping. Am. J. Hum. Genet. 56, 519–527 (1995).

27. Reeve, J.P. & Rannala, B. DMLE+: Bayesian linkage disequilibrium gene mapping. Bioinformatics 18, 894–895 (2002).

28. Hunter, M. et al. The P28T mutation in the GALK1 gene accounts for galactokinase deficiency in Roma (Gypsy) patients across Europe. Pediatr. Res. 51, 602–606 (2002).

29. Nachman, M.W. & Crowell, S. Estimate of the mutation rate per nucleotide in humans. Genetics 156, 297–304 (2000).

30. Rogozin, I.B. & Pavlov, Y.I. Theoretical analysis of mutation hotspots and their DNA sequence specificity. Mutat. Res. 544, 65–85 (2003).
