

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1994 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 269, No. 11, Issue of March 18, pp. 8355-8361, 1994 Printed in U.S.A.

Identification of A Pax Paired Domain Recognition Sequence and Evidence for DNA-dependent Conformational Changes*

(Received for publication, October 12, 1993, and in revised form, December 1, 1993)

Jonathan Epstein‡, Jiexing Cai, Tom Glaser, Lisa Jepeal, and Richard Maas§

From the Divisions of Genetics and ‡Cardiology, Department of Medicine and Howard Hughes Medical Institute, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115

Pax genes encode a family of developmentally regulated transcription factors that have been implicated in a number of human and murine congenital disorders, as well as in tumorigenesis (Gruss, P., and Walther, C. (1992) Cell 69, 719-722; Hill, R., and van Heyningen, V. (1992) Trends Genet. 8, 119-120; Chalepakis, G., Tremblay, P., and Gruss, P. (1992) J. Cell Sci. Suppl. 16, 61-67; Maulbecker, C. C., and Gruss, P. (1993) EMBO J. 12, 2361-2367; Walther, C., Guenet, J. L., Simon, D., Deutsch, U., Jostes, B., Goulding, M. D., Plachov, D., Balling, R., and Gruss, P. (1991) Genomics 11, 424-434; Barr, R. G., Galili, N., Holick, J., Biegel, J. A., Rovera, G., and Emanuel, B. S. (1993) Nature Genet. 3, 113-117). These genes are defined by the presence of an evolutionarily conserved DNA binding domain, termed the paired domain. The structure and the DNA binding characteristics of the paired domain remain largely unknown. We have utilized repetitive rounds of a polymerase chain reaction-based selection method to identify the optimal DNA binding sequences for the Pax-2 and Pax-6 paired domains. The results suggest that the paired domain family of peptides bind similar DNA sequences. Identification of this binding site has revealed an important structural clue regarding the mechanism of paired domain binding to DNA. CD and NMR structural analyses of the purified Pax-6 paired domain reveal it to be largely structureless in solution. Upon binding the recognition sequence, the complex becomes markedly less soluble and displays CD spectroscopic evidence of significant α-helical structure.

Paired box genes were first identified in Drosophila as a family of related genes (7) encoding a 128-amino acid DNA binding domain (8). At least nine murine and human paired box genes have been identified (9, 10), and those that have been studied are expressed in temporally and spatially restricted patterns during development. The paired box is highly conserved across millions of years of evolution; in the case of Pax-6, the paired domain is identical in axolotl and man.¹ Missense mutations within the paired domain of Pax genes have been associated with congenital disorders in both mouse and man (11-15). Also, Pax genes have been demonstrated to have oncogenic potential (4), and a translocation involving the paired box portion of PAX3 has been associated with a human tumor (6). The phenotypes associated with Pax gene mutations demonstrate that these gene products are critical during organogenesis. Nevertheless, with only a few exceptions, specific target genes for these transcription factors remain unknown. In fact, no optimal DNA recognition sequence for any paired domain has been identified.

* This work was supported in part by National Institutes of Health Grant 1RO1EY1012301. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§ An Assistant Investigator of the Howard Hughes Medical Inst. To whom correspondence should be addressed: Howard Hughes Medical Inst., Brigham and Women’s Hospital, Thorn Research Bldg., Rm. 910, 20 Shattuck St., Boston, MA 02115. Tel.: 617-732-5979 or -5119; Fax: 617-738-5575.

¹ T. Glaser and R. Maas, unpublished results.

The Drosophila paired gene (prd)² encodes a paired domain that can bind two sequences, termed e4 and e5, found in the upstream region of the Drosophila even skipped gene (8, 16). A single amino acid change in the Pax-1 paired domain correlates with decreased binding to modified e5 sequences and is responsible for the mouse mutation undulated (17). However, regulation of the even skipped gene by prd is unlikely, since prd-deficient mutants have shown no changes in even skipped regulation (18), and ectopic expression of prd also resulted in no change in even skipped expression (19). Also, binding of the Pax-1 paired domain, as well as other Pax gene paired domains, is enhanced when the e5 sequence is modified (17). Based on the relative affinities of several modified e5 sequences, it has been suggested that a GTTCC core sequence is necessary for binding of the Pax-1, Pax-2, and Pax-3 paired domains (17, 20, 21). Nevertheless, despite a high degree of conservation among different paired domains, examples of sequences that bind various paired domains have appeared unrelated to one another or to the modified e5 sequences (22, 23). These have not contained a GTTCC motif.

Here, we report the optimal DNA recognition sequences for the Pax-2 and Pax-6 paired domains. The unusually long recognition sequences unify other sequences that are bound by other paired domain proteins and that had previously appeared disparate. This suggests that many paired domain-containing proteins recognize a common sequence motif. Also, we demonstrate that the Pax-6 paired domain changes conformation upon binding its recognition sequence.

MATERIALS AND METHODS

Paired Domain Glutathione S-Transferase Fusion Protein Preparations-A full-length cDNA clone for the human PAX6 gene (24) was used as template in a PCR reaction with primers selected to amplify sequence corresponding to amino acids 1-130. The amino acid sequences of the murine and human paired domains are identical, and we have chosen to refer to this construct with the murine nomenclature Pax-6 to avoid confusion. The PCR product was cloned into the SmaI site of pGEX2T (Amrad, Melbourne, Australia) and the resulting clones verified by direct sequencing. BL21(DE3) Escherichia coli were transformed and the fusion protein purified according to standard protocol (25, 26) using glutathione-agarose beads (Sigma). The Pax-2 paired domain glutathione S-transferase fusion protein was made in a similar fashion using the murine cDNA (kindly provided by Dr. Greg Dressler, NIH) as template.

Polyhistidine-tagged Pax-6 Paired Domain Protein Preparation-Sequence corresponding to amino acids 1-130 of human PAX6 (24) was amplified from full-length PAX6 cDNA with PCR primers engineered to contain BamHI restriction sites. The amplified product was cloned into the BamHI site of pET16b (Novagen, Madison, WI), BL21(DE3) E. coli were transformed, and isolated clones were verified by direct sequencing. Bacterial cultures were induced with 1 mM isopropyl-1-thio-β-D-galactopyranoside for A h, collected, and washed with cold phosphate-buffered saline and resuspended in French press buffer (25 mM NaHPO4, pH 7.4, 150 mM NaCl, 4 mM β-mercaptoethanol, 10% sucrose) with 2 mM EDTA and lysed with a French press at 1100 p.s.i. twice. The lysate was spun at 16,000 x g for 10 min and the pellet washed with cold French press buffer. The desired protein was then eluted from the pellet with 0.5 M NaCl in French press buffer and loaded on a nickel resin column (Qiagen, Chatsworth, CA), washed as directed (Novagen pET system manual), and eluted in 0.5 M imidazole, 0.5 M NaCl, 20 mM Tris, pH 7.9. The eluate was then loaded on a NAP-10 column (Pharmacia LKB Biotechnology Inc.), eluted with 300 mM NaCl, 10 mM NaHPO4, pH 4.6 (for NMR studies) or 150 mM KCl, 10 mM NaHPO4, pH 7.4 (for CD studies), and concentrated in a Centriprep filter apparatus (Amicon, Danvers, MA). CD spectra of the Pax-6 paired domain were identical when the protein was prepared and analyzed in either buffer, and the Pax-6 paired domain was able to bind appropriate recognition sequences at both pH values as assessed by EMSA (data not shown).

² The abbreviations used are: prd, the Drosophila paired gene; PCR, polymerase chain reaction; EMSA, electrophoretic mobility shift assay.

Full-length Pax-6 Protein Preparation-The entire coding region of the human PAX6 gene (24) was amplified by PCR and cloned into the BamHI site of pVL941 (Pharmingen, San Diego, CA) and processed according to the supplier's protocol. One milliliter of amplified baculovirus stock was used to infect ~7 x 10" TN5B1-4 cells (JRH Biosciences, Woodland, CA) in 100-mm tissue culture dishes. After 72 h, cells from each plate were collected, resuspended in 200 µl phosphate-buffered saline, and vortexed to disrupt cells. This cell extract was used immediately and was diluted 1000-fold for EMSA reactions.

Optimal Binding Site Selection Procedure-An 83-base pair oligonucleotide was synthesized with the following sequence: 5'-GTCAACGTCGAGACGGAATTCGCGGCCGC(N)25CTCGAGGGATCCGTGCTCAGTCCCTATCC-3'. The length of the central randomer stretch was selected because our prior selection experiments with only 12 random positions resulted in selection of binding sites that overlapped the flanking sequences (data not shown) and because a binding site of 24 base pairs had been implicated by ethylation interference for Pax-1 (17). The pool of "randomers" was made double-stranded by annealing with primer R (5'-CGATAGGGACTGAGCACGGATCCCT-3') and extending with the Klenow fragment of DNA polymerase I under standard conditions (26) including [α-32P]CTP, prior to gel purification on a 10% polyacrylamide gel. Approximately 5 µg of the eluted double-stranded oligonucleotides were used for the first round of selection. Each round of selection was carried out in a 20-µl volume including 2 µl of fusion protein attached to glutathione-agarose beads in a 1:1 slurry with phosphate-buffered saline corresponding to about 50 ng of protein. Samples were gently agitated at room temperature for 40 min in 20 mM Tris, pH A, 150 mM KCl, 0.5 mM EDTA, 1 mM dithiothreitol, 10% glycerol, 0.5 mg/ml bovine serum albumin, 250 µg/ml poly(dI·dC) with approximately 100 ng of labeled DNA. The fusion protein beads were then washed with ice-cold buffer twice and resuspended in 25 µl of a solution containing 5 units of thrombin (Sigma, catalog No. T6634) and slowly vortexed for 15 min. Thrombin cleavage allowed only specifically bound probe to be released into the supernatant, while DNA bound nonspecifically to the agarose beads remained in the pellet. The supernatant was phenol- and chloroform-extracted, and 5 µl was used as template for a PCR reaction containing 500 ng of primer A (5'-GTCAACGTCGACACCGAATTCGCGG-3') and primer R (20 µM dATP, dTTP, dGTP, and 2 µM dCTP with 5 µl [α-32P]CTP (3000 Ci/mmol, DuPont NEN)). Amplification was performed for 20 cycles (94 °C for 30 s, 65 °C for 30 s, and 72 °C for 1 min) followed by 1 min at 94 °C, 1 min at 65 °C, and 10 min at 72 °C after additional primers and Taq polymerase were added. This step was to ensure homoduplex formation. The samples were then phenol- and chloroform-extracted, gel-purified on a 10% polyacrylamide gel, and used for the next round of selection.

EMSA-EMSA reactions were carried out at 4 °C in 10 mM HEPES, pH 7.4, 50 mM KCl, 1 mM β-mercaptoethanol, 2.5 mg/ml bovine serum albumin, 250 µg/ml poly(dI·dC), 20% glycerol, 10,000 cpm DNA probe (2-5 x 10" cpm/µg), 1 pr (unless otherwise indicated) paired domain protein (as determined by Bio-Rad protein assay with bovine serum albumin as standard) in a 20-µl volume for 30 min prior to loading on a 6% precooled polyacrylamide gel in 0.5 x TBE and electrophoresing for 90 min at 240 V at 4 °C. The gels were then dried and subjected to autoradiography.

DNase I Footprinting Assay-DNA probes were labeled on one strand by first phosphorylating either primer A or R with [γ-32P]ATP and polynucleotide kinase and then amplifying the sequence to be footprinted using unlabeled second primer with PCR for 20 cycles with cycle temperatures as described above. Probes were purified on a 10% polyacrylamide gel prior to use. DNase I footprint reactions (27) were carried out in 20 µl of EMSA buffer with 10 mM MgCl2, 2 mM CaCl2, and 0.05 units of RNase-free DNase I (Boehringer Mannheim) for 30 s at room temperature. Reactions were quenched with ice-cold 1 M ammonium acetate and 1 mM EDTA, and the DNA was ethanol-precipitated. Samples were resuspended, heated to 90 °C for 2 min, placed on ice, and electrophoresed through a 10% polyacrylamide gel containing A M urea, dried, and subjected to autoradiography.

[Figure 1: EMSA autoradiograph. Lanes are labeled ROUND 0 through 9, followed by an E5 PROBE lane; the position of the SHIFTED COMPLEX is marked.]

FIG. 1. Selection of specifically bound sequences with Pax-6 paired domain. EMSA demonstrating progressive enrichment of specifically bound sequences after successive rounds of selection is shown. An 83-base pair oligonucleotide containing a central region of 25 random nucleotides was used as probe in lane 1. After each round of selection (see "Materials and Methods"), the enriched pool of sequences was used as probe in successive lanes. For comparison, a 45-base pair probe containing the e5 sequence was included in the last lane with 1 µM Pax-6 paired domain protein, demonstrating that none of the e5 probe is shifted under these conditions.

Methylation Interference Assay-Interference assays were performed as described (26) with dimethyl sulfate and hydrazine-modified probes (28). DNA probes were end-labeled on one strand as described above or by end labeling a synthesized oligonucleotide prior to annealing with its reverse complement. Modified probes specifically bound during EMSA by the Pax-6 paired domain were eluted, cleaved with piperidine, and electrophoresed on 12% polyacrylamide gels containing urea in parallel with total and nonbound (free) probe.

CD Spectroscopy-The purified polyhistidine-tagged Pax-6 paired domain was used for CD spectroscopy at a concentration of 1.5 µM with or without 0.15 µM double-stranded DNA (5'-(XCCGCAGGATGCAA"'TCACCCATGAGTGCCTCGA-3') using an Aviv 62DS type spectrometer. Standard conditions (unless noted) were 150 mM KCl, 10 mM NaHPO4, pH 7.4, 25 °C, 1-cm path length cuvette. Thermal melt experiments were performed between 25 and 90 °C.

RESULTS AND DISCUSSION

In order to identify the optimal DNA sequence recognized by the Pax-2 and Pax-6 paired domains, we expressed these domains as fusion proteins with glutathione S-transferase and used these purified paired domain proteins to select specific sequences from among a pool of 83-base pair oligonucleotides containing a core of 25 random nucleotides. Sequences that bound specifically were amplified by polymerase chain reaction and subjected to further rounds of selection. After each round, the selected sequences were used as probes in an EMSA, and enrichment for specifically bound sequences was assessed (Fig. 1). Conditions of selection were stringent and were chosen to prevent nonspecific binding of the paired domains to unselected randomers. Fig. 1 shows the successive enrichment achieved through serial rounds of selection with the Pax-6 paired domain. A similar experiment using sequences selected with the Pax-2 paired domain also resulted in a clear progressive enrichment in the fraction of probe shifted (data not shown). The amount of DNA used for the first round of selection (5 µg) corresponds to only about 0.5% of all the possible sequences generated with 25 random positions (4^25 possibilities). It is therefore possible that the optimal sequence was not present in the original mixture. However, the consensus sequence reported here was obtained after analysis of many individual isolates and should therefore converge toward the optimal binding sequence.

TABLE I
Sequences selected with the Pax-6 and Pax-2 paired domains

Sequences obtained after successive rounds of selection are shown. After the final round of selection, bound sequences were amplified by PCR, cloned into pGEM3Z, and sequenced. Sequences corresponding to the 25 random positions were aligned with the aid of the Genetics Computer Group program “Pileup” (GCG, Madison, Wisconsin). The number of each nucleotide in each position was tabulated as shown in Table II. For each position, distributions that are nonrandom by chi-square analysis (p < 0.05) are included in the consensus shown at the bottom of each table. Nucleotides that agree with the consensus are shown in upper case letters. W = A or T. K = G or T. M = A or C. Y = T or C. R = A or G. S = G or C. A, sequences selected with the Pax-6 paired domain. B, sequences selected with the Pax-2 paired domain.

A
 1  .....tgtga AgtcaacCGt AaGAaTTAaa ......
 2  ......gtga AgtgTttCGC AcCgtTTCaC t.....
 3  ....catgtt AgaTTCACGg TcGgcTTgg. ......
 4  ....tgggat AaaTTCACGC gTtgacTga. ......
 5  ......acgt caaTgCACtt TctcgaTggC t.....
 6  .......... AatgTtACGC TTCAcTTAaC acgca.
 7  .......... AatgTtACGC TTCAcTTAaC acgca.
 8  .......... AttTTtcCGC TTCAaTTCaT ccgcg.
 9  .......... .taTTtcCGg TTnAaTGAaT gtttac
10  .......... .taTTtcCGg TTnAaTGAaT gtttac
11  .......... .taTTtcCGg TTnAaTGAaT gtttac
12  .......tca ctcTTtACGC TTGAacGAtT ct....
13  .......... .gccTtACGg TTCAcTGAgC gttagt
14  .......... ctgTTCACGC ATGAcTGgtT acgcg.
15  .......... cagTTgAaGC ATGAagGttT ttacg.
16  ........tg gtcTTCAaGn ATaAaTTtta cgc...
17  ........gt tcgTTCAaGC ATaAaTTtta cgc...
18  .......gac ctgaacgtaa ATtAaTTtca cg....
19  ...cgtaaaa AccTTCAtGC TTCAacTg.. ......
20  ...cgtaaaa AccTTCAtGg TTCAacTg.. ......
21  ...cggaaat cttgTtACGC CTCAtTTg.. ......
22  ..cgtgaaat taaTTtACGC TTCAggT... ......
23  ...aggatgc AgtTTCACGC ATGAgTGC.. ......
24  ...aggatgc AgtTTCACGC ATGAgTGC.. ......
25  ...aggatgc AgtTTCACGC ATGAgTGC.. ......
26  ...aggatgc AgtTTCACGC ATGAgTGC.. ......
27  ...aggatgc AatTTCACGC ATGAgTGC.. ......
28  cacaggatgc AatTTCACGC ATGAg..... ......
29  ...atgcacg AatTTtACGC ATGAtTGA.. ......
30  ....agtaat tcgaTCACGC ATGgtTTCa. ......
31  ....agtaat tcgaTCACGC ATGgtTTCa. ......
32  .....gtagg ccgTTCACGC TTGAcaTCag ......
33  .....ataaa ttcTTCACGC ATGAgTTCcC ......
34  .....ataaa ttcTTCACGC ATGAgTTCcC ......
35  .....ataaa ttcTTCACGC ATGAgTTCcC ......
36  .....ataaa ttcTTCACGC ATGAgTTCcC ......
37  .........a AgtTTCACGC ATGAgTaAcC atga..
38  ......agtg ctcTTtACGC ATGgtTTCaC a.....
39  .....acgtg gctTTCAtGC ATGAcTTCaC ......
40  ..atgctgaa AtgTTCACGC TcGAggG... ......
41  ...tagtgaa gtaTTCACGg TTCAgTGA.. ......
42  ......gcga AtgTTCACGC TTCAtTTgtT t.....
43  .....tggaa AatTTCACGC cTGAcTGggT ......
44  .....ttcat tccTTaACGC TTCAcTGAcg ......
45  .....ttcat tccTTaACGC TTCAcTGAcg ......
46  ...tccgcta cgcTTtAtGC TTGAcTGC.. ......
47  ..acatctcg AtcTTCACtC TTGAaTG... ......
Consensus      A--TTCACGC WTSA-TKM-Y

B
.......... ......cacG TCACGCATGA ctacgctcat
.......... ......gTta TCACGCATGA ctgcgcacac
.......... .......tag TCACGCATGA gtagtcccct
.......... ..ccaacatG TttCGttTGA ctgctcg...
.......... ......tTaa TgAgcCATGA cgttacggct
.....tgtca acgtcacTcG TCACGCATGA ..........
.....tgtca acttcncTcG TCACGCATGA ..........
.....tagga ctcatttTcG TCACGCATGA ..........
........ca tttctctccG TCACGCAccA ttg.......
.......... ggtttaacgG TCACtgGatc ccata.....
........aa cgcgtagagG TCACGgGccA gGG.......
.......... cggtcagTgG TggataAcaA cacag.....
.......ttt aggaaatcga TCATGgcctA gc........
......ggcc ctagacgcac cagTcgtTGt g.........
......cgtg atagcgcgta TtgTcgtTGA g.........
.......gtg tcatgcacca TCAgtgATGc tc........
.......gtg tcatgcacca TCAgtgATGc tc........
......cgtg agatgtgTta aCATGtGTGc a.........
......ccag ccacgcgTtG TCtaatGTaA t.........
.......aag cgaacatTaG TtcattGatA cg........
......gcat cgtatcaatG TttaatGaag t.........
........ag cgcatcaTaG attTaCGgtA tgc.......
.....ggaac agtttgtctG atAaGtGTtg ..........
.......act cgcttcgTtt atAaGCGTtA ag........
.......ctg tgctggtagt ctcTGCATtA tg........
.......... .catgtcTgt TCATtCAcac ttgatt....
.......... .tttgtcact TCtCGatgct tctatt....
...caagtca tgatgccTaG TCATGtGa.. ..........
aggtgtttta agctgccacG TCATG..... ..........
cactgactaa gtcacttTcG aCACG..... ..........
......taca catggttgaG TCcCGaGTGt c.........
.......... ..utcuccaa TuATGCAutt aucutcc...
Consensus         T-G TCAYGCRTGA

The selected oligonucleotides are nonrandomly distributed and share a common sequence motif (Tables I and II). The consensus binding sequences for Pax-6 and Pax-2 paired domains, though not identical, are very similar. The Pax-6 consensus spans 20 base pairs and shares a central 10-base pair region of homology with the Pax-2 consensus consisting of TCACGC-TGA (where the dash indicates a difference between the two sequences).³ This sequence is preceded by a guanine in the Pax-2 consensus and by a thymidine in the Pax-6 sequence (Fig. 2C, position 1). Of note, neither consensus sequence nor any individual selected sequence contains a GTTCC.

³ We do not believe that cross-contamination during the PCR steps of the selection is likely, since these selections were carried out at different times, and each was performed in parallel with other selections (not reported here) that converged to other appropriate sequences. The selection observed for peripheral residues in the Pax-6 selection versus the Pax-2 selection likely derives from slightly more stringent conditions in the former and the larger number of sequences analyzed.
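The chi-square criterion used to build the consensus in Tables I and II can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a uniform null distribution (each base expected in one quarter of the sequences), uses the standard critical value of 7.815 for 3 degrees of freedom at p = 0.05, and picks the consensus symbol with a hypothetical rule (a single base if it accounts for at least half of the sequences, otherwise the two-base code for the two most frequent bases). The function names and the example counts are invented for illustration.

```python
# Critical value of the chi-square distribution, 3 degrees of freedom, p = 0.05.
CHI2_CRITICAL_DF3 = 7.815

# Two-base degenerate symbols as defined in the legend to Table I.
TWO_BASE_CODES = {
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CT"): "Y", frozenset("AG"): "R", frozenset("CG"): "S",
}


def chi_square_uniform(counts):
    """Chi-square statistic of one alignment column against equal base use."""
    total = sum(counts.get(base, 0) for base in "ACGT")
    expected = total / 4.0
    return sum((counts.get(base, 0) - expected) ** 2 / expected for base in "ACGT")


def consensus_symbol(counts, single_base_fraction=0.5):
    """Return '-' for a column that looks random, else a consensus symbol.

    The rule for choosing the symbol is an assumption made for this sketch.
    """
    total = sum(counts.get(base, 0) for base in "ACGT")
    if total == 0 or chi_square_uniform(counts) <= CHI2_CRITICAL_DF3:
        return "-"
    ranked = sorted("ACGT", key=lambda base: counts.get(base, 0), reverse=True)
    if counts.get(ranked[0], 0) / total >= single_base_fraction:
        return ranked[0]
    return TWO_BASE_CODES[frozenset(ranked[:2])]


if __name__ == "__main__":
    # Hypothetical column counts, not taken from Table II.
    print(consensus_symbol({"A": 1, "C": 44, "G": 1, "T": 1}))
    print(consensus_symbol({"A": 12, "C": 11, "G": 12, "T": 12}))
    print(consensus_symbol({"A": 20, "C": 2, "G": 2, "T": 23}))
```

Applied column by column to an alignment such as the one in Table I, a function of this kind yields one symbol per position, with dashes at positions whose base distribution cannot be distinguished from random.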

The purified paired domain proteins exhibited high affinity for their respective consensus sequences. An apparent dissocia- tion constant was estimated as described (29) by performing binding reactions followed by EMSA with low concentrations of the Pax-6 consensus sequence probe relative to Pax-6 paired domain protein. These experiments yielded a Kd of 2.5 x M for the Pax-6 paired domain with its consensus. This value is in the range of that reported for other DNA-binding proteins, such as the Antennapedia homeodomain and its target sequence (30). Although we have not directly measured the stoichiometry of binding, experiments indicated the Pax-6 paired domain binds to this consensus sequence as a monomer. This was dem- onstrated by performing EMSA reactions with a mixture of a glutathione S-transferase-Pax-6 paired domain preparation (molecular mass, -42 kDa) and a polyhistidine-tagged Pax-6 paired domain preparation (molecular mass, -17.6 kDa) as described (31). Two shifted bands with differing mobilities were seen, corresponding to those seen when each preparation was

Page 4: Identification of a Pax paired domain recognition sequence and

8358 Identification of a Paired Domain Recognition Sequence

TABLE I1 Summary of sequences obtained after successive rounds of selection

nucleotide in each position was calculated. The consensus sequence is shown in the bottom row. For each position, distributions that are nonrandom Sequences corresponding to the 25 random positions were aligned so as to maximize homology as shown in Table I, and the number of each

by chi-square analysis ( p < 0.05) are included in the consensus. Positions outside those shown in the table appeared random.

A. Sequences selected with the Pax-6 paired domain

G A

6 3 1 0 9 4 1 1 1 0 4 4 7 1 0 26 6 1 5 3 20 1 4 2 1 1 0 8 3 2 3 4 0 3 1 1 23 1 2 4 0 1 4 2 1 13 14 4

9 4 3

T 10 10 18 16 38 44 14 1 5 2 2 21 42 3 0 7 37 25 C 7 9 9 1 4 2 0 2 9 5 39 0 3 6 2 4 13 1 1 1 4 0 17 8 12

4 6 9

Consensus: A T T C A C G C A T T G I C A T GIT A/C TIC

B. Sequences selected with the Pax-2 uaired domain

G 4 7 2 6 1 9 0 3 3 3 2 0 7 1 1 3 1 4 2 5 7 5 A

2 7 5 7 8 8 5 1 2 1 6 3 3 1 4 4 4 1 8 3 0 3 3

T 6 9 1 5 8 4 2 5 8 5 1 0 6 7 4 1 8 8 4 9 6 2 C

T Consensus: G T C A C/T G C G/A T G A

2 1 0 1 0 8 1 0 1 2 2 0 3 13 3 1 3 1 5 3 5 8 5 3 3

-
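The per-position analysis described in the legend to Table II can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the null hypothesis of equal frequencies for the four nucleotides, uses the standard chi-square critical value for 3 degrees of freedom at p = 0.05, and reports only the single most frequent base at each nonrandom position (the table itself also reports two-base positions such as G/C). The function names are hypothetical.

```python
from collections import Counter

# Chi-square critical value for 3 degrees of freedom at p = 0.05.
CHI2_CRIT_DF3_P05 = 7.815


def position_counts(aligned_seqs):
    """Count each nucleotide at each column of equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    return [Counter(seq[i] for seq in aligned_seqs) for i in range(length)]


def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts of A, C, G and T."""
    total = sum(counts[b] for b in "ACGT")
    expected = total / 4
    return sum((counts[b] - expected) ** 2 / expected for b in "ACGT")


def consensus(aligned_seqs):
    """Most frequent base at nonrandom positions, N elsewhere (simplified rule)."""
    out = []
    for counts in position_counts(aligned_seqs):
        if chi_square_uniform(counts) > CHI2_CRIT_DF3_P05:
            # Ties are broken alphabetically so the result is deterministic.
            out.append(max("ACGT", key=lambda b: counts[b]))
        else:
            out.append("N")
    return "".join(out)
```

With a toy alignment in which the first column is always A and the second column cycles through all four bases, `consensus` returns `"AN"`.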

used individually (data not shown). No band of intermediate mobility was observed. This type of analysis assumes that dimers, if they exist, are in rapid equilibrium with the monomer pool, and a stable or small amount of dimer formation would not be detected by this method. The affinity of the Pax-6 paired domain for its consensus was similar when expressed as a glutathione S-transferase fusion or a polyhistidine-tagged protein.
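The apparent dissociation constant estimate mentioned above rests on a simple relation that can be sketched in code. The sketch assumes a single-site binding model with probe at a much lower concentration than protein, so that free protein is approximately total protein and the fraction of probe bound is P / (P + Kd). The function names and the median-based estimator are illustrative assumptions, not the method of reference 29, and the numbers in the usage note are synthetic.

```python
from statistics import median


def fraction_bound(protein_conc, kd):
    """Fraction of probe bound for single-site binding with limiting probe."""
    return protein_conc / (protein_conc + kd)


def estimate_kd(protein_concs, fractions_bound):
    """Estimate Kd from an EMSA titration.

    Each informative point (0 < f < 1) gives Kd = P * (1 - f) / f;
    the median of these per-point values is returned.
    """
    estimates = [
        p * (1 - f) / f
        for p, f in zip(protein_concs, fractions_bound)
        if 0 < f < 1
    ]
    if not estimates:
        raise ValueError("no titration point with 0 < fraction bound < 1")
    return median(estimates)
```

For example, synthetic fractions generated with `fraction_bound(p, 2.0)` at protein concentrations 0.5, 1, 2, 4 and 8 (arbitrary units) give back an estimate of 2.0. Kd is returned in the same units as the protein concentrations.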

We assessed the ability of the Pax-2 and Pax-6 paired domains to recognize the consensus sequences compared with the e5 and modified e5 sequences. The modified e5 sequence denoted PRS4 was reported to bind more tightly to the Pax-1 and Pax-2 paired domains than other sequences tested (17, 20). It has been suggested that Pax-6 prefers another modified e5 sequence, PRS9 (4). Fig. 2A shows the results of gel shift experiments performed at high stringency compared with those performed to measure the binding constant. A high nonspecific competitor DNA concentration (250 µg/ml poly(dI·dC)) is present in order to better demonstrate the relative specific binding affinities. Under stringent conditions, the Pax-2 paired domain binds to its consensus sequence preferentially, less well to the Pax-6 consensus, and even less well to PRS4. The Pax-6 paired domain binds equally well to the Pax-6 and Pax-2 consensus sequences and less well to PRS4. No detectable binding to e5 or PRS9 under these conditions was noted in either case.

Support for the physiologic relevance of the consensus sequences we have identified comes from the identification of similar sequences within the promoter regions of candidate target genes. Zannini et al. (23) have reported sequences within the thyroperoxidase and thyroglobulin promoters that are recognized by Pax-8. In cotransfection experiments, Pax-8 was able to activate transcription from these promoters. Pax-8 is expressed in the developing thyroid gland and is an early marker of thyroid progenitor cells. Also, Adams et al. (22) have identified two sequences within the CD19 promoter recognized by the Pax-5 paired domain and showed that Pax-5 can activate transcription when one of these binding sites is placed upstream of a heterologous promoter. They also noted that a single nucleotide insertion within one of the CD19 promoter sequences improved binding significantly and that the Pax-1 product binds more tightly to this mutated CD19 sequence than it does to PRS4. Pax-5 is expressed in B and pre-B lymphocytes, and CD19 is a membrane protein found in these cells thought to be involved in signal transduction.

Optimal alignment of these sequences reveals striking homology between the consensus sequences we have determined for the Pax-2 and Pax-6 paired domains and the thyroid-specific and CD19 promoter sequences recognized by the Pax-8 and Pax-5 paired domains, respectively (Fig. 2C). Among the larger family of Pax genes, Pax-2, Pax-5, and Pax-8 have been classified as a subgroup based on similar amino acid sequence and intron-exon boundaries (5). Also, homology exists outside the paired domain including the presence of a homeodomain remnant consisting of only the first α-helix (3). The Pax-6 and Pax-4 paired domains, which are similar to each other, are closely related to this group. When the ability of the Pax-2 and Pax-6 paired domains to recognize these promoter sequences was analyzed, it was found that the consensus sequences were bound most tightly, followed by the mutated CD19.2 sequence (Fig. 2A). Interestingly, of the sequences tested, the mutated CD19.2 sequence is most similar to the consensus sequences (Fig. 2C). The CD19.1 sequence was also well recognized, followed by the thyroglobulin and thyroperoxidase sequences. We have also demonstrated that the full-length Pax-6 protein expressed in insect cells using a baculovirus expression vector binds efficiently to these various probes with the same rank order affinity as the purified paired domain (Fig. 2A, top panel). The suggestion that many paired box-containing proteins bind a similar core sequence is supported by the recent results of similar selection experiments with Drosophila prd and the Pax-8 paired domain that have identified a very similar sequence core.4

Interference experiments with dimethyl sulfate and hydrazine-modified probes also support this hypothesis. The thyroperoxidase sequence may be bound least well due to the lack of a guanine residue on the opposite strand corresponding to the fifth position of the sequences as listed in Fig. 2C. This guanine is implicated as a contact point in our experiments with Pax-6 (Fig. 3, B and D) and by those of Zannini et al. (23) with Pax-8. Our analysis also identifies several other contacted nucleotides on both sides of the double helix and concentrated within a central region of the consensus sequence. This region correlates precisely with the conserved sequence motif that was identified by inspection of the selected sequences and is conserved between the Pax-2 and Pax-6 paired domain consensus sequences. Comparison of our interference data with those reported for the thyroid-specific and CD19 promoter sequences suggests that several positions within the conserved motif are critical for binding. When a guanine is present in position 14 of Fig. 2C, it is always contacted. This is also true for the binding of the sea urchin protein TSAP, an invertebrate Pax-5 homologue, to four histone promoter sequences that all contain a contacted guanine at this position (32).5 Also, guanines on the

4 S. Jun and C. Desplan, personal communication.

5 We have chosen not to include a full analysis of the histone sequences, since TSAP is not a murine Pax gene product and since TSAP has not been established to be a paired domain-containing protein,


[Fig. 2, panels A and B: EMSA gel images. Panel A: Bac-Pax6, GST-Pax2 PD, and GST-Pax6 PD, with the complex and free probe indicated. Panel B: probes Pax6CON, Pax6CONΔ4, Pax6CONΔ7, and Pax6CONΔ13.]

C. Aligned probe sequences (positions 1-16)

1. PAX-6 CON     TTCACGCWTSANTKMN
2. PAX-2 CON     GTCAYGCRTGANNNNN
3. MUT CD19.2    GTCACGCCTCAGTGCC
4. CD19.1        GGCACTCAACCATGGG
5. TG            AGAACACTTGACTGGG
6. TPO           TCTAAGCTTGAGTGGG

7. PRS4          TTCCGCTCTAGATATC
8. E5            TTCCGCTCAGGCTCGC

FIG. 2. Relative binding of the Pax-6 paired domain to putative recognition sequences. A, EMSA reactions were performed with 0.1 µM Pax-6 or Pax-2 paired domain or 0.01% of a whole cell lysate obtained from a 100-mm dish of infected TN5B1-4 cells expressing full-length Pax-6 (Bac-Pax6). All DNA probes were made to identical specific activity by end labeling one strand in parallel reactions prior to annealing with the opposite strand and gel purification. The entire gel for the Pax-6 experiment is shown, with only the shifted species for the Pax-2 and full-length Pax-6 experiments shown in the top two panels. The PRS9 probe was not included in the Bac-Pax6 experiment. The sequences of the probes used are as follows: Pax-6 paired domain consensus (Pax6Con), 5'-T G G A A 1 T C A C G A ( ~ A ( ~ ( ; ( ' T T ( ; ( ' 7 T ( ; A G T T; Pax-2 paired domain consensus (Pax2Con), 5'-TGGAATTCAGGAGTCACGCATGAGTGA(~TC(;TTAC(~T~~CA(;TA-3'; thyroglobulin (TG) (23), 5'4'TCAAGAACAC?TGACTGGGCAGTGGAGCATGGAGTCCAT-3'; thyroperoxidase (TPO) (23), 5'-('(:TCT<'TM(;CTT(;AC:7'(;C,(;(:AT('A(;ACC:AT(;GAGTCCAT-3'; mutated CD19.2 (mutCD19.2) (22), 5'-AACGCGC.TG(:TCA(~CC~~TCAGTGCC~(~A1T(~T('(~A(;~~A(;-3'; CD19.1 (22), 5'4XXY'TGGAGCGCACTCAACCATGGCTGTCTCCGGGGATAG-3'; PRS4 (17), 5'-TCGCCTCACCGTTCCCCT~TAC.ATAT('T('C.A(.(;TC.~~-3'; PRS9 (17), 5'-CA7TAGCACCG7TACGCTCTAGATATCTCGACGTGC-3'; e5 (17), 5'-CCGCACGA1TAC.CACCCTTCC(;('T(:AC.(;('T('(;(;-3'. B, EMSA reactions were performed under identical stringent conditions with varying amounts of Pax-6 paired domain and either the wild type Pax-6 consensus probe (Pax6CON) or similar probes containing single nucleotide alterations. Pax6CONΔ4 contains an A to G change at the fourth position as numbered in part C. Pax6CONΔ7 contains a C to A change at position 7. Pax6CONΔ13 contains a T to C change at position 13. For each probe used, sequential lanes contain 0, 0.01, 0.1, and 0.25 µM glutathione S-transferase-Pax6 paired domain. C, optimal alignment of the probe sequences used in A is shown. Nucleotides identical to those found in the same position of the Pax-2 and Pax-6 consensus sequences are shaded. PRS4 and e5 sequences are included, but optimal alignment is not implied. Symbols proposed by the IUB nomenclature committee are used (39) as follows: W = A or T, S = C or G, K = G or T, M = A or C, R = A or G, and N = G, T, A, or C. TG, thyroglobulin; TPO, thyroperoxidase.

opposite strand corresponding to positions 5 and 7 of Fig. 2C are also contacted whenever present in the Pax gene binding sequences. Fig. 2C reveals three positions that are invariant corresponding to positions 4, 7, and 13, each of which is contacted in our experiments. Oligonucleotides synthesized with single nucleotide alterations in each of these positions were used in EMSA experiments with Pax-6 paired domain protein and the relative affinities compared with the Pax-6 consensus probe (Fig. 2B). In each case, binding to the mutated probes was reduced but only slightly. This suggests that no single nucleotide

although this is very likely. However, at least one of these histone sequences (H2A-2.2) matches our consensus sequence very closely (32), suggesting that paired domain proteins from distantly related species also may recognize the consensus sequence we have selected.

position is essential for binding of the paired domain.

The consensus sequences reported here show little homology to the e5 sequence bound by the Drosophila prd gene product. In this regard, it is interesting to note that the DNase I footprint reported for prd with e5 covered only 15 base pairs, and binding was unaffected by deletion of the carboxyl-terminal third α-helix of the paired domain or by mutation of the second helix (8). The amino-terminal portion of the paired domain, including the first helix, was therefore thought to be responsible for DNA binding. However, deletion of the carboxyl-terminal portion of the Pax-1 paired domain did reduce binding affinity to modified e5 sequences (17), and similar deletions of the Pax-5 product resulted in decreased binding to a histone promoter sequence (22). DNase I footprint analysis of several


FIG. 3. Characterization of the Pax-6 paired domain recognition sequence. The binding reaction of the Pax-6 paired domain with one of the selected sequences, which closely resembled the derived consensus sequence, was characterized. A, DNase I footprint. The first lane shows DNase-treated probe. The following lanes contained increasing amounts of Pax-6 paired domain. A G ladder is included in the final lane, marked G, derived from Maxam and Gilbert sequencing (28). B and C, methylation interference with DMS and hydrazine-modified probes. B = bound probe. F = free probe. The first two lanes in each panel represent the forward strand (For). The next two lanes represent the reverse strand (Rev). Corresponding bands in bound and free lanes were compared after quantifying with a PhosphorImager (Molecular Dynamics) and normalizing for the total number of counts in each lane. D, summary of DNase I footprint and interference results. Footprinted area of the sequence is shaded. Residues that, when modified, reproducibly interfered with binding are indicated by triangles. Numbers above the nucleotide sequence correspond to positions as numbered in Fig. 2C. Data from experiments not shown, including DNase I footprinting of the reverse strand, are also included.

not accounted for by the CD spectrum of the DNA alone, which was negligible at the concentration used, or by change in the conformation of DNA (33)7 and could not be reproduced when a nonspecific DNA such as the e5 sequence was added. It should be emphasized that the spectroscopic change shown in Fig. 4 resulted from the addition of only one-tenth molar ratio of DNA to protein. We were limited in our ability to add increasing amounts of DNA, because, under the conditions used, the protein-DNA complex was markedly less soluble than the protein or DNA alone, and we were unable to achieve greater than 0.2 µM concentrations of complex. The protein, however, did not become insoluble when the e5 oligonucleotide was added, even at 10-fold higher concentrations. Furthermore, CD and NMR analyses of an alternatively spliced form of the Pax-6 paired domain, which contains a 14-amino acid insertion and does not bind the consensus sequence in EMSA experiments, revealed similar base-line spectra, but no change in the CD spectra was observed with addition of DNA (data not shown). An altered conformation upon binding DNA has been reported for the basic DNA binding regions of GCN4 (34, 35), c-Myc (36), Fos and Jun (37), and other leucine zipper-containing peptides (38). Although the paired domain may not be structureless when expressed in the context of the full-length protein, these results provide the first experimental evidence for α-helical regions in the paired domain and support the importance of these α-helices for DNA binding.

7 It is very unlikely that changes in DNA conformation account for the altered spectrum we have observed, since extensive CD studies of DNA conformational changes have demonstrated difference spectra in the 222-nm range corresponding to only a fraction of that reported here (33).

sequences selected with the Pax-6 paired domain showed that the protein protected a 28-base pair segment of DNA from DNase I digestion on each strand (Fig. 3, A and D). We suggest that the e5 sequence offered appropriate contacts for only a portion of the paired domain and, in the presence of a more optimal binding sequence, other portions of the paired domain protein are important for the proper recognition of DNA.

Although secondary structure predictions indicate that the paired domain contains three regions of potential α-helicity, no direct structural analysis of the paired domain has been reported. In an attempt to define the tertiary structure of the paired domain, a highly purified and concentrated solution (0.6 mM) of the Pax-6 paired domain was investigated by one- and two-dimensional (nuclear Overhauser effect spectroscopy) NMR and found to be nearly structureless in solution.6 This stands in marked contrast to the well defined tertiary structure evident by NMR analysis of the isolated Antennapedia homeodomain.

When the pure Pax-6 paired domain was analyzed by CD spectroscopy, a spectrum with relative minima at 208 and 222 nm was observed (Fig. 4A). Although this could be consistent with a very small amount of α-helical structure, thermal denaturation CD studies (Fig. 4B) failed to show a cooperative transition, consistent with the protein being largely unstructured. However, when a %-base pair oligonucleotide containing a sequence selected by the Pax-6 paired domain was added to the solution, the CD spectrum of the protein-DNA complex revealed decreased minima at 208 and 222 nm consistent with a marked increase in α-helical content (Fig. 4A). This change was

6 J. Epstein, S. Hyberts, R. Maas, and G. Wagner, unpublished data.


[Fig. 4, panel A: CD spectra plotted against wavelength (nm), 190-250 nm; traces: 0.15 µM DNA alone, 1.5 µM Pax6 PD, Pax6 PD/DNA complex, difference spectrum. Panel B: CD signal plotted against temperature (°C).]

FIG. 4. Circular dichroism spectra of Pax-6 paired domain with and without specifically recognized DNA. A, the spectrum reveals decreased minima at 208 and 222 nm when DNA is added. Note that only one-tenth molar ratio of DNA is used. The data are displayed as millidegrees of ellipticity as a function of wavelength in order to avoid assumptions regarding the fraction of protein in the bound state when DNA is present. The protein has 157 amino acids, and the calculated molar ellipticity at 222 nm is -4830 mdeg·cm2/dmol. Assuming that all of the DNA is in the bound state occupying one-tenth of the protein molecules present, then the molar ellipticity at 222 nm of the bound protein decreases to -23,073 mdeg·cm2/dmol. B, thermal denaturation of the Pax-6 paired domain fails to demonstrate a cooperative transition. CD at 222 nm was monitored, while the temperature of the cuvette was increased from 25 to 90 °C. The results indicate a lack of ordered structure in the absence of DNA.
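The bound-state estimate in the legend to Fig. 4 follows from a two-state mixing argument, which can be sketched as below. This is an illustration under the legend's stated assumption (all DNA bound, occupying one-tenth of the protein) plus the further assumption that the observed signal is a population-weighted average of free and bound protein. The function names are hypothetical, and the observed mixture value is not given in the text, so it is derived here rather than taken from the paper.

```python
def mixture_ellipticity(theta_free, theta_bound, fraction_bound):
    """Population-weighted ellipticity of a free/bound protein mixture."""
    return (1 - fraction_bound) * theta_free + fraction_bound * theta_bound


def bound_state_ellipticity(theta_free, theta_observed, fraction_bound):
    """Solve the mixing equation for the ellipticity of the bound state."""
    if not 0 < fraction_bound <= 1:
        raise ValueError("fraction_bound must be in (0, 1]")
    return (theta_observed - (1 - fraction_bound) * theta_free) / fraction_bound


# Values at 222 nm quoted in the legend (same units as the legend).
THETA_FREE = -4830.0
THETA_BOUND = -23073.0
FRACTION_BOUND = 0.1  # assumption stated in the legend

# Mixture value implied by those numbers under the two-state assumption.
implied_mixture = mixture_ellipticity(THETA_FREE, THETA_BOUND, FRACTION_BOUND)
```

Under these assumptions the implied mixture value is about -6654 in the legend's units, and `bound_state_ellipticity(THETA_FREE, implied_mixture, FRACTION_BOUND)` recovers the quoted bound-state figure. The small bound fraction means that a modest change in the measured signal corresponds to a large change per bound molecule.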

In summary, we have identified consensus DNA binding sequences for the Pax-2 and Pax-6 paired domains that are similar to each other and to endogenous promoter sequences that have previously been thought unrelated. While subtle differences in DNA recognition sequences for specific Pax gene products are likely, these results indicate that the products of the Pax-1, Pax-2, Pax-5, Pax-6, and Pax-8 genes can all recognize a similar sequence. In addition, the paired domain, when binding to such a DNA sequence, adopts an α-helical conformation and contacts residues within a large span of the DNA molecule. The relatively long Pax recognition consensus sequence may make it possible to identify target genes by directly searching data bases of known promoter sequences for homology. An initial search of the GenBank data base has identified the thyroglobulin promoter sequence already investigated, among other candidates. As further genomic sequence becomes available, we hope that this will provide a simple and rapid tool for identifying genes potentially regulated by Pax gene products.
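The kind of data base search proposed above can be sketched as a degenerate-consensus scan over both strands of a sequence. This is a minimal illustration, not the search the authors ran: the consensus core `GTCAYGCRTGA` is the Pax-2 consensus as read from Table II (written with Y for C/T and R for A/G), the mismatch tolerance is an arbitrary assumption, and the function names are hypothetical.

```python
# Degenerate nucleotide codes; Y (C or T) is the standard code for the
# C/T position of the Pax-2 consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "R": "AG", "Y": "CT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PAX2_CORE = "GTCAYGCRTGA"  # as read from Table II; treat as an assumption


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(consensus, window):
    """Number of positions where the window violates the degenerate code."""
    return sum(base not in IUPAC[code] for code, base in zip(consensus, window))


def scan(sequence, consensus, max_mismatches=0):
    """Return (strand, index, site, mismatches) for every acceptable window.

    For the "-" strand, the index refers to the reverse-complemented sequence.
    """
    sequence = sequence.upper()
    width = len(consensus)
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            mismatches = count_mismatches(consensus, window)
            if mismatches <= max_mismatches:
                hits.append((strand, i, window, mismatches))
    return hits
```

As a usage example, `scan("AAGTCACGCATGATT", PAX2_CORE)` returns a single forward-strand hit at index 2. The first 11 positions of the mutated CD19.2 sequence in Fig. 2C (`GTCACGCCTCA`) differ from this core at two positions, so a search for promoter sites of that kind would need `max_mismatches=2` or a weighted scoring scheme.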

Acknowledgments-We thank Drs. Gerhard Wagner and Sven Hyberts for assistance with the NMR analyses; Drs. Peter Kim and Carl Pabo and members of Dr. William Chin's and Dr. Alan Michelson's laboratories for helpful comments on the manuscript; David Wilson, Susie Jun, and Dr. Claude Desplan for binding site selection protocol and for sharing unpublished results; and Dr. Greg Dressler for kindly providing the Pax-2 cDNA.

Note Added in Proof-After submission of this manuscript, an article appeared (Czerny, T., Schaffner, G., and Busslinger, M. (1993) Genes & Dev. 7, 2048-2061) that identifies, by mutational analysis of a histone BSAP binding sequence, a recognition sequence very similar to the consensus sequences we have reported here.

REFERENCES

1. Gruss, P., and Walther, C. (1992) Cell 69, 719-722
2. Hill, R., and van Heyningen, V. (1992) Trends Genet. 8, 119-120
3. Chalepakis, G., Tremblay, P., and Gruss, P. (1992) J. Cell Sci. Suppl. 16, 61-67
4. Maulbecker, C. C., and Gruss, P. (1993) EMBO J. 12, 2361-2367
5. Walther, C., Guenet, J. L., Simon, D., Deutsch, U., Jostes, B., Goulding, M. D., Plachov, D., Balling, R., and Gruss, P. (1991) Genomics 11, 424-434
6. Barr, F. G., Galili, N., Holick, J., Biegel, J. A., Rovera, G., and Emanuel, B. S. (1993) Nature Genet. 3, 113-117
7. Bopp, D., Burri, M., Baumgartner, S., Frigerio, G., and Noll, M. (1986) Cell 47, 1033-1040
8. Treisman, J., Harris, E., and Desplan, C. (1991) Genes & Dev. 5, 594-604
9. Stapleton, P., Weith, A., Urbanek, P., Kozmik, Z., and Busslinger, M. (1993) Nature Genet. 3, 292-298
10. Pilz, A. J., Povey, S., Gruss, P., and Abbott, C. M. (1993) Mamm. Genome 4, 78-82
11. Balling, R., Deutsch, U., and Gruss, P. (1988) Cell 65, 531-535
12. Baldwin, C. T., Hoth, C. F., Amos, J. A., da-Silva, E. O., and Milunsky, A. (1992) Nature 356, 637-638
13. Vogan, K. J., Epstein, D. J., Trasler, D. G., and Gros, P. (1993) Genomics 17, 364-369
14. Hoth, C. F., Milunsky, A., Lipsky, N., Sheffer, R., Clarren, S. K., and Baldwin, C. T. (1993) Am. J. Hum. Genet. 52, 455-462
15. Tassabehji, M., Read, A. P., Newton, V. E., Patton, M., Gruss, P., Harris, R., and Strachan, T. (1993) Nature Genet. 3, 26-30
16. Hoey, T., and Levine, M. (1988) Nature 332, 858-861
17. Chalepakis, G., Fritsch, R., Fickenscher, H., Deutsch, U., Goulding, M., and Gruss, P. (1991) Cell 66, 873-884
18. Frasch, M., and Levine, M. (1987) Genes & Dev. 1, 981-995
19. Morrissey, D., Askew, D., Raj, L., and Weir, M. (1991) Genes & Dev. 5, 1684-1696
20. Fickenscher, H. R., Chalepakis, G., and Gruss, P. (1993) DNA Cell Biol. 12, 381-391
21. Goulding, M. et al. (1991) EMBO J. 10, 1135-1147
22. Adams, B. et al. (1992) Genes & Dev. 6, 1589-1607
23. Zannini, M., Francis, L. H., Plachov, D., and DiLauro, R. (1992) Mol. Cell. Biol. 12, 4230-4241
24. Glaser, T., Walton, D. S., and Maas, R. L. (1992) Nature Genet. 2, 1-8
25. Smith, D. B., and Johnson, K. S. (1988) Gene 67, 31-40
26. Ausubel, F. (1989) Current Protocols in Molecular Biology, John Wiley & Sons, Inc., New York
27. Perbal, B. (1988) A Practical Guide to Molecular Cloning, pp. 723-725, John Wiley & Sons, Inc., New York
28. Maxam, A. M., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560
29. Fried, M. G. (1989) Electrophoresis 10, 366-376
30. Affolter, M., Percival-Smith, A., Muller, M., Leupin, W., and Gehring, W. J. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 4093-4097
31. Hope, I. A., and Struhl, K. (1987) EMBO J. 6, 2781-2784
32. Barberis, A., Superti-Furga, G., Vitelli, L., Kemler, I., and Busslinger, M. (1989) Genes & Dev. 3, 663-675
33. Gray, D. M., Ratliff, R. L., and Vaughan, M. R. (1992) Methods Enzymol. 211, 389-405
34. Weiss, M. A., Ellenberger, T., Wobbe, C. R., Lee, J. P., Harrison, S. C., and Struhl, K. (1990) Nature 347, 575-578
35. Talanian, R. V., McKnight, C. J., and Kim, P. S. (1990) Science 249, 769-771
36. Fisher, D. E., Parent, L. A., and Sharp, P. A. (1993) Cell 72, 467-476
37. Patel, L., Abate, C., and Curran, T. (1990) Nature 347, 572-575
38. O'Neil, K. T., Hoess, R. H., and DeGrado, W. F. (1990) Science 249, 774-778
39. Nomenclature Committee (1985) Eur. J. Biochem. 150, 1-5