
Genome-Wide Classification and Evolutionary Analysis of the bHLH Family of Transcription Factors in Arabidopsis, Poplar, Rice, Moss, and Algae1[W]

Lorenzo Carretero-Paulet*, Anahit Galstyan, Irma Roig-Villanova2, Jaime F. Martínez-García, Jose R. Bilbao-Castro, and David L. Robertson

Department of Applied Biology (Area of Genetics), University of Almería, 04120 Almería, Spain (L.C.-P., J.R.B.-C.); Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, United Kingdom (L.C.-P., D.L.R.); Department of Plant Molecular Genetics, Centre for Research in Agricultural Genomics, Consejo Superior de Investigaciones Científicas-Institut de Recerca i Tecnologia Agroalimentàries-Universitat Autònoma de Barcelona, 08028 Barcelona, Spain (A.G., I.R.-V., J.F.M.-G.); Institució Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Spain (J.F.M.-G.); and Biocomputing Unit, National Centre of Biotechnology, Universidad Autónoma de Madrid, 28049 Madrid, Spain (J.R.B.-C.)

Basic helix-loop-helix proteins (bHLHs) are found throughout the three eukaryotic kingdoms and constitute one of the largest families of transcription factors. A growing number of bHLH proteins have been functionally characterized in plants. However, some of these have not been previously classified. We present here an updated and comprehensive classification of the bHLHs encoded by the whole sequenced genomes of Arabidopsis (Arabidopsis thaliana), Populus trichocarpa, Oryza sativa, Physcomitrella patens, and five algae species. We define a plant bHLH consensus motif, which allowed the identification of novel, highly diverged atypical bHLHs. Using yeast two-hybrid assays, we confirm that (1) a highly diverged bHLH has retained protein interaction activity and (2) the two most conserved positions in the consensus play an essential role in dimerization. Phylogenetic analysis permitted classification of the 638 bHLH genes identified into 32 subfamilies. Evolutionary and functional relationships within subfamilies are supported by intron patterns, predicted DNA-binding motifs, and the architecture of conserved protein motifs. Our analyses reveal the origin and evolutionary diversification of plant bHLHs through differential expansions, domain shuffling, and extensive sequence divergence. At the functional level, this would translate into different subfamilies evolving specific DNA-binding and protein interaction activities as well as differential transcriptional regulatory roles. Our results suggest a role for bHLH proteins in generating plant phenotypic diversity and provide a solid framework for further investigations into their role in the transcriptional regulation of key growth and developmental processes.

Most biological processes in a eukaryotic cell or organism are finely controlled at the transcriptional level by transcription factors. Transcription factors usually contain two different functional domains involved in DNA binding and protein dimerization, activities that may be regulated by several mechanisms, including differential dimer formation (Riechmann et al., 2000; Amoutzias et al., 2007). In addition, transcription factors are usually encoded by multigene families, multiplying the number and complexity of possible transcriptional regulatory roles (Riechmann et al., 2000).

Basic helix-loop-helix proteins (bHLHs) are widely distributed in all three eukaryotic kingdoms and constitute one of the largest families of transcription factors (Riechmann et al., 2000; Ledent and Vervoort, 2001). bHLHs are key regulatory components in transcriptional networks controlling a number of biological processes. In unicellular eukaryotes such as yeast, bHLH proteins are involved in chromosome segregation, general transcriptional enhancement, and metabolic regulation (Robinson and Lopes, 2000). In animals, bHLHs have been implicated in sensing environmental signals, in regulating the cell cycle and circadian rhythms, and in diverse essential developmental processes, including neurogenesis, myogenesis, sex and cell lineage determination, proliferation, and differentiation (Atchley and Fitch, 1997; Ledent and Vervoort, 2001; Amoutzias et al., 2004; Stevens et al., 2008). The R gene product Lc was the first plant protein reported to possess a bHLH domain and is involved in the control of flavonoid/anthocyanin biosynthesis in maize (Zea mays; Ludwig et al., 1989). The R gene belongs to a small subfamily comprising three additional genes (R, B, and Sn) for which the corresponding orthologs have been reported in Arabidopsis (Arabidopsis thaliana; AtTT8) and rice (Oryza sativa; OsRa-c; Hu et al., 1996; Nesi et al., 2000).

1 This work was supported by the Generalitat de Catalunya (Xarxa de Referència en Biotecnologia and Grup de Recerca Consolidat) and the Spanish Ministry of Science and Innovation (MICINN)-Fondo Europeo de Desarrollo Regional (grant no. BIO2008–00169 to J.F.M.-G.), by the Spanish Ministry of Education and Science (MEC) and the European Social Fund (Juan de la Cierva program grant to L.C.-P. and J.R.B.-C.), and by the Spanish MEC (Formación Profesorado Universitario program) and MICINN (Formación Personal Investigador program; predoctoral fellowships to A.G. and I.R.-V., respectively).

2 Present address: Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, Via Celoria 26, 20133 Milan, Italy.

* Corresponding author; e-mail [email protected]. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Lorenzo Carretero-Paulet ([email protected]).

[W] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.110.153593

1398 Plant Physiology®, July 2010, Vol. 153, pp. 1398–1412, www.plantphysiol.org © 2010 American Society of Plant Biologists. Downloaded from www.plantphysiol.org on June 12, 2018.

Copyright © 2010 American Society of Plant Biologists. All rights reserved.

The number of characterized plant bHLHs has increased in recent years, revealing the wide and diverse array of biological processes in which they are involved. They have been reported to function in light signaling (Ni et al., 1998; Halliday et al., 1999; Fairchild et al., 2000; Huq and Quail, 2002; Khanna et al., 2004; Oh et al., 2004; Hyun and Lee, 2006; Roig-Villanova et al., 2007; Leivar et al., 2008), hormone signaling (Abe et al., 1997; Friedrichsen et al., 2002; Yin et al., 2005; Lee et al., 2006), wound and drought stress responses (de Pater et al., 1997; Smolen et al., 2002; Chinnusamy et al., 2003; Kiribuchi et al., 2004), symbiotic ammonium transport (Kaiser et al., 1998), shoot branching (Komatsu et al., 2001), fruit and flower development (Rajani and Sundaresan, 2001; Liljegren et al., 2004; Szecsi et al., 2006; Zhang et al., 2006; Gremski et al., 2007), and microspore (Sorensen et al., 2003), trichome (Payne et al., 2000; Morohashi et al., 2007), stomata (Pillitteri et al., 2007; Kanaoka et al., 2008), and root (Menand et al., 2007; Ohashi-Ito and Bergmann, 2007) development.

These proteins are defined by the bHLH signature domain (Ferré-D'Amaré et al., 1993), which is composed of approximately 60 amino acids arranged according to the typical bifunctional structure. The basic region, an N-terminal stretch of approximately 15 to 20 residues typically rich in basic amino acids, is involved in DNA binding. Certain conserved amino acids in the basic region determine recognition of the so-called core E-box hexanucleotide consensus sequence 5′-CANNTG-3′, whereas other residues provide specificity for a given type of E-box (e.g. the G-box [5′-CACGTG-3′]). In addition, nucleotides flanking the core have also been shown to play a role in binding specificity (Shimizu et al., 1997; Atchley et al., 1999; Martinez-Garcia et al., 2000; Massari and Murre, 2000). The HLH region is composed of two amphipathic α-helices, consisting mainly of hydrophobic residues, linked by a loop region that is more diverged in both length and primary sequence. The HLH domain promotes protein-protein interaction, allowing the formation of homodimeric or heterodimeric complexes (Massari and Murre, 2000). Cocrystal structural analysis has shown that the HLH regions of two bHLH proteins interact and that each partner binds to half of the DNA recognition sequence (Ma et al., 1994; Shimizu et al., 1997).
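As an illustration that is not part of the original study, the core E-box rule can be checked with a short scan. This minimal Python sketch reports every 5′-CANNTG-3′ hexamer in a DNA string and flags G-boxes (CACGTG); the function name `find_e_boxes` is hypothetical, and the flanking-nucleotide effects on specificity mentioned above are deliberately ignored. Because the reverse complement of any CANNTG hexamer is again of the form CANNTG, scanning one strand is enough to find sites on both strands.

```python
import re
from typing import List, Tuple

# Core E-box 5'-CANNTG-3' (N = any nucleotide). The zero-width lookahead
# lets overlapping hexamers be reported, e.g. CACATGTG contains two.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(seq: str) -> List[Tuple[int, str, bool]]:
    """Return (0-based start, hexamer, is_g_box) for each core E-box in seq.

    CANNTG is its own reverse-complement pattern, so one strand suffices.
    Flanking nucleotides outside the hexamer are not considered.
    """
    seq = seq.upper()
    hits = []
    for match in E_BOX.finditer(seq):
        site = match.group(1)
        # The G-box (CACGTG) is one specific E-box variant.
        hits.append((match.start(), site, site == "CACGTG"))
    return hits
```

For example, `find_e_boxes("ttCACGTGaaCAGCTG")` reports a G-box at position 2 and a non-G-box E-box (CAGCTG) at position 10.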

Outside the bHLH domain, bHLH proteins usually exhibit little, if any, sequence conservation. However, groups of evolutionarily and/or functionally related bHLH proteins may share additional motifs. In animals, some of these motifs have been characterized as determinants of specificity in DNA-sequence recognition and dimerization, as mediators of the activation or repression of target genes, or as sites for binding small molecules (e.g. dioxin; Ledent and Vervoort, 2001). One example is the highly conserved Leu zipper (ZIP) motif, characterized by heptad repeats of Leu residues adjacent to the second helix of the bHLH domain and predicted to adopt a coiled-coil structure that permits dimerization between proteins (Lupas, 1996). Other domains commonly found in animal bHLH proteins are the PAS domain, the Orange domain, the WRPW motif, and the COE domain (Ledent and Vervoort, 2001; Stevens et al., 2008).

Previous classifications of animal bHLHs have led to the definition of six major functional and evolutionary lineages (groups A–F; Atchley and Fitch, 1997; Ledent and Vervoort, 2001) that can be further subdivided into smaller orthologous subfamilies (Simionato et al., 2007). Most bHLH proteins are classified as group A or B and are expected to bind the core E-box consensus sequences. Group B includes members specifically displaying a G-box-binding motif configuration and proteins that share a ZIP domain at the COOH-terminal end of the protein or that contain the Orange domain. Group C bHLH proteins share a pair of PAS domains and bind non-E-box sequences. Group E includes bHLH proteins that contain a conserved Pro or Gly residue at a key position within the basic region, preferentially bind to sequences referred to as N-boxes, and share an additional WRPW motif. Groups D and F represent proteins particularly diverged in the basic region. Some group D proteins, described as unable to bind DNA, might form heterodimers that function as dominant-negative regulators of the DNA-binding activity of otherwise DNA-binding bHLHs (Fairman et al., 1993). Group F includes the so-called COE proteins, which share the COE domain. It has been suggested that the ancestral bHLH sequence was a group B protein present early in eukaryote evolution, from which bHLHs from different lineages evolved independently (Ledent and Vervoort, 2001; Heim et al., 2003).

Previous classifications of the bHLH protein families encoded by the Arabidopsis and rice genomes (Heim et al., 2003; Toledo-Ortiz et al., 2003; Li et al., 2006b) are essentially based on a bHLH consensus motif constructed from an alignment of 392 sequences, mostly from groups A and B of animal DNA-binding bHLHs (Atchley et al., 1999). The consensus was expected to identify bHLH domain-containing proteins with a high degree of accuracy. However, highly diverged bHLH proteins are poorly predicted by the consensus (Atchley et al., 1999), and recent studies have identified and characterized novel atypical bHLHs in Arabidopsis (Fairchild et al., 2000; Hyun and Lee, 2006; Lee et al., 2006; Roig-Villanova et al., 2007). These atypical bHLHs are particularly diverged in the basic region and usually lack sequence features characterized as critical for proper DNA binding (Massari and Murre, 2000).

Functional diversification in gene families encoding transcription factors is emerging as a major source of the morphological and physiological diversity underlying evolution (Doebley and Lukens, 1998; Riechmann et al., 2000; Tsiantis and Hay, 2003; Kellogg, 2004). We present here a comprehensive classification, together with a structural and evolutionary analysis, of the plant bHLH gene family. This analysis was performed at a genome-wide level across distantly related land plant evolutionary lineages, including three angiosperms, Arabidopsis (eudicot-eurosids II), Populus trichocarpa (poplar; eudicot-eurosids I), and rice (monocot), as well as Physcomitrella patens (moss; bryophyte; Arabidopsis Genome Initiative, 2000; International Rice Genome Sequencing Project, 2005; Tuskan et al., 2006; Rensing et al., 2008). In evolutionary terms, moss can be considered a basal species among land plants and might therefore enable inference of the ancestral state of the land plant bHLH family (Kenrick and Crane, 1997; Karol et al., 2001). Furthermore, to gain a broader perspective on the early evolutionary history of the plant bHLH family, we also searched for bHLH genes in five algal species: four green algae (Volvox carteri, Chlamydomonas reinhardtii, Ostreococcus tauri, and Ostreococcus lucimarinus), which diverged from the land plants over 1 billion years ago, and the primitive red alga Cyanidioschyzon merolae (Matsuzaki et al., 2004; Merchant et al., 2007; Palenik et al., 2007). This is a first step toward further investigations into the biological and molecular functions of novel bHLH transcription factors as well as into their role in plant evolutionary diversification.

RESULTS

Identification and Classification of Arabidopsis, Poplar, Rice, Moss, and Algae bHLH Gene Families

Previous surveys of the Arabidopsis and rice bHLH gene families had identified 162 and 167 members, respectively (Bailey et al., 2003; Li et al., 2006b). All but seven of these sequences encode proteins annotated as matching the INTERPRO 001092 domain, which corresponds to the dimerization region of the bHLH domain. To define the bHLH gene families of poplar, moss, V. carteri, C. reinhardtii, O. tauri, O. lucimarinus, and C. merolae, we searched the corresponding whole sequenced genomes for genes encoding proteins containing the INTERPRO 001092 domain. The resulting sequences were named following the generic system proposed for Arabidopsis (Heim et al., 2003), but dropping the "bHLH" label. Names are composed of a number, corresponding to the relative position resulting from searches for the bHLH domain, followed by the most common name as retrieved from the literature. Correspondences between sequence names and the gene and protein identifiers from the corresponding genome browsers are shown in Table I and Supplemental Table S2.
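The naming scheme can be made concrete with a small parser. This is a hypothetical helper, not something the authors provide: judging from the names in Table I (e.g. At165PAR1, Os168, Pt190), a name appears to be a two-letter species prefix, the number, and an optional common name. The prefix is an inference from those examples rather than a rule stated in the text, and algal names such as OlbHLH2 do not fit it, so the parser returns None for them.

```python
import re
from typing import NamedTuple, Optional


class BHLHName(NamedTuple):
    species: str       # two-letter prefix such as "At" (inferred from Table I)
    number: int        # relative position from the bHLH-domain search
    common_name: str   # most common literature name; empty if none


# Assumed pattern: species prefix, digits, optional alphanumeric common name.
_NAME = re.compile(r"^([A-Z][a-z])(\d+)([A-Za-z0-9]*)$")


def parse_bhlh_name(name: str) -> Optional[BHLHName]:
    """Split a name like 'At165PAR1'; return None if it does not fit."""
    match = _NAME.match(name)
    if match is None:
        return None
    prefix, number, common = match.groups()
    return BHLHName(prefix, int(number), common)
```

Here `parse_bhlh_name("At165PAR1")` yields species "At", number 165, and common name "PAR1".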

In recent years, novel atypical Arabidopsis bHLH proteins, most of them not identified as such in previous surveys, have been reported: At163KDR, At164PRE5, At165PAR1, and At166PAR2 (Hyun and Lee, 2006; Lee et al., 2006; Roig-Villanova et al., 2007). Another group of putative novel bHLH sequences was identified in microarray analyses as down-regulated in plants constitutively overexpressing At165PAR1; these are designated in this work as P1R1 (for PAR1-RESPONSIVE1), P1R2, and P1R3 (corresponding to At167P1R1, At159P1R2, and At168P1R3, respectively). Of these, only At159P1R2 had previously been classified as a member of the bHLH family. To identify additional putative homologs of these novel bHLH proteins in other plant species, we implemented a combined search strategy based on BLAST and hidden Markov models (HMMs).

BLAST searches were performed using the novel atypical Arabidopsis bHLHs as queries. In each case, a large number of hits were obtained, mostly corresponding to proteins annotated as containing the bHLH domain. However, among the best-scoring matches, 19 Arabidopsis, poplar, and rice sequences not previously annotated as bHLHs were also retrieved. These sequences were aligned, and the resulting alignments were used as seeds to generate HMM profiles. The HMM profiles were in turn used as queries in searches against selected plant proteome databases, identifying eight additional matches (Table I).

Table I. Summary of novel atypical sequences accepted and discarded as bHLHs

Protein sequences newly identified in this study putatively corresponding to bHLHs are in boldface. Sequences were accepted or discarded as bHLHs according to their fit to the animal and the plant bHLH consensus used as predictive motifs. TAIR, The Arabidopsis Information Resource; TIGR, The Institute for Genomic Research.

Sequence Name      TAIR/TIGR/JGI Gene Identifier

Novel atypical bHLH sequences
At163KDR           At1g26945
At164PRE5          At3g28857
At165PAR1          At2g42870
At166PAR2          At3g58850
At167P1R1          At5g57780
At168P1R3          At3g29370
At169              At5g39240
At170              At2g18969
Os168              LOC_Os02g54870
Os169              LOC_Os01g43950
Os170              LOC_Os02g51320
Os171              LOC_Os08g16030
Os172              LOC_Os06g12210
Os173              LOC_Os10g26460
Os174              LOC_Os10g26410
Os175              LOC_Os04g56500
Os176              LOC_Os03g19780
Os177              LOC_Os08g31950
Os178              LOC_Os07g48900
Pt183              Eugene3.00061353
Pt184              eugene3.00180893
Pt185              eugene3.00002147
Pt186              estExt_fgenesh4_pg.C_LG_XIV0893
Pt187              grail3.0003051401
Pt190              fgenesh4_pg.C_LG_XVIII000779

Sequences discarded
At111              At1g31050
At133              At2g20100
At152              At1g22380
Pt032              grail3.1832000301
Pt090              eugene3.00051537
Pt102              grail3.0033027401
Pt122              eugene3.00040401
Pt170              fgenesh4_pg.C_LG_IX000768
Pt188              grail3.0139003601
Pt189              eugene3.00170483
OlbHLH2            eugene. 1400010176
CrbHLH2            pasa_Sanger_mRNA29676|Chlre4

The 27 putative novel bHLH sequences were combined with the previous estimates of bHLH families, resulting in a primary data set of 650 amino acid sequences putatively corresponding to bHLH domains. On the basis of the corresponding alignment, a consensus motif composed of the 25 most conserved positions was obtained, 11 of which correspond to key functional residues also conserved in a consensus previously defined from animal bHLHs (Atchley et al., 1999; Fig. 1). Some positions specific to the plant bHLH consensus were occupied by highly conserved amino acids, including R16 in the basic region and P32 at the end of the helix 1 region. Furthermore, amino acid frequencies at some of the positions common to both the plant and the animal bHLH consensus were sharply different (Supplemental Table S1). These differences likely reflect the early divergence between animal and plant bHLHs.

To confirm our data set of amino acid sequences as bHLHs, we examined the fit of every sequence to both consensus motifs by counting the number of matches in each region of the predicted bHLH domain (Supplemental Table S2). In previous work, sequences with more than eight to 10 mismatches from the animal bHLH consensus motif were discarded (Buck and Atchley, 2003; Heim et al., 2003; Toledo-Ortiz et al., 2003; Li et al., 2006b). To ensure that atypical bHLH domains were not eliminated merely for lack of correspondence to the consensus, we used a low-stringency criterion, allowing up to 10 and 13 mismatches from the animal and plant bHLH consensus, respectively.
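The acceptance criterion can be sketched as a mismatch count against each consensus. This simplified Python illustration rests on stated assumptions: each consensus is modelled as a hypothetical mapping from alignment column to the set of accepted residues, mismatches are counted over the whole domain rather than region by region, and a sequence is kept if it fits at least one consensus within its tolerance (10 for animal, 13 for plant). Any consensus passed in is a stand-in; the real motifs of Fig. 1 are not reproduced here.

```python
from typing import Dict, Set

# Hypothetical representation: alignment column -> residues accepted there.
Consensus = Dict[int, Set[str]]


def count_mismatches(domain: str, consensus: Consensus) -> int:
    """Count consensus columns where the aligned domain lacks an accepted residue.

    Gaps ('-') are never accepted, and columns beyond the end of the domain
    also count as mismatches.
    """
    return sum(
        1
        for col, accepted in consensus.items()
        if col >= len(domain) or domain[col] not in accepted
    )


def is_bhlh(
    domain: str,
    animal: Consensus,
    plant: Consensus,
    max_animal: int = 10,
    max_plant: int = 13,
) -> bool:
    """Keep a sequence that fits at least one consensus within its tolerance."""
    return (
        count_mismatches(domain, animal) <= max_animal
        or count_mismatches(domain, plant) <= max_plant
    )
```

Accepting on either consensus mirrors the low-stringency intent: an atypical domain that diverges from the animal motif can still be rescued by the plant motif.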

From the whole data set of putative bHLH sequences, only 13 sequences did not match any bHLH consensus and were eliminated from further analysis (Table I). This criterion was relaxed for At168P1R3, identified in our phylogenetic analysis as a recent paralog of At169. The remaining 638 sequences, representing an updated classification of bHLH families in the species examined in this study, are shown in Supplemental Table S2.

Dimerization Activities of Atypical bHLH Proteins

As a way to evaluate the accuracy of our searches for atypical bHLHs, we tested the dimerization activity of AtPAR1 using yeast two-hybrid assays. As shown in Figure 2A, the GAL4 activation domain (AD) fused to AtPAR1 interacts strongly with the GAL4 DNA-binding domain (BD) fused to AtPAR1, revealing the ability of AtPAR1 to specifically interact with itself. We therefore conclude that AtPAR1 has retained protein interaction activity. Together with previous results demonstrating that nuclear localization is required for AtPAR1 function as a direct transcriptional repressor of specific targets (Roig-Villanova et al., 2007), this supports its inclusion in our analyses as a bona fide bHLH.

Conserved hydrophobic residues in the HLH region of the animal bHLH domain presumably define protein interaction activities (Massari and Murre, 2000). Leu-27 and Leu-73, in helices 1 and 2, respectively, have been identified as the most highly conserved residues across plant bHLHs (Fig. 1; Supplemental Table S1). Furthermore, most of the amino acid changes at these positions were conservative (Supplemental Fig. S1). To test whether these residues play a role in the dimerization activities of plant atypical bHLHs, two mutated versions of AtPAR1, PAR1-L1mut (Leu-27Glu) and PAR1-L2mut (Leu-73Lys), were generated. When PAR1-L1mut and PAR1-L2mut were fused to the AD and tested against wild-type BD-PAR1, yeast growth was clearly affected, indicating that the interaction was greatly reduced (PAR1-L2mut) or completely abolished (PAR1-L1mut; Fig. 2B).

Figure 1. Plant and animal bHLH consensus. Alignment of the plant and animal bHLH consensus used as predictive motifs. The plant consensus is based on an alignment of plant bHLHs and contains positions conserved in more than 50% of the sequences. In such positions, amino acids conserved in more than 10% of the sequences were also included. The animal consensus is based on Atchley et al. (1999). Shown at the bottom are the boundaries of the different regions of the bHLH domain.
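The rule given in the Figure 1 caption for building the plant consensus (positions conserved in more than 50% of the sequences, plus any residue present in more than 10% of them at those positions) can be sketched as a column-by-column frequency count. This Python version is an illustration under assumptions the paper does not spell out: "conserved in more than 50%" is read as the most frequent residue exceeding that fraction, and gaps count in the denominator but are never accepted. The function name `build_consensus` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def build_consensus(
    alignment: List[str], core: float = 0.5, minor: float = 0.1
) -> Dict[int, Set[str]]:
    """Return {column: accepted residues} for conserved alignment columns.

    A column is kept when its most frequent residue occurs in more than
    `core` of all sequences; every residue above `minor` is then accepted.
    Gaps ('-') are never accepted but still count in the denominator.
    """
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    n = len(alignment)
    consensus: Dict[int, Set[str]] = {}
    for col in range(len(alignment[0])):
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        if not counts:
            continue  # all-gap column
        _, top = counts.most_common(1)[0]
        if top / n > core:
            consensus[col] = {aa for aa, c in counts.items() if c / n > minor}
    return consensus
```

On the four-row toy alignment RLKL, RLKI, RIEL, KLAL, the third column is dropped because its most frequent residue (K) reaches exactly 50% rather than exceeding it.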

Phylogenetic Analysis of Plant bHLHs

To examine the evolutionary relationships among plant bHLH proteins, a maximum likelihood (ML) phylogenetic analysis based on the alignment of the corresponding bHLH domains (Supplemental Fig. S1) was carried out. The 638 plant bHLH proteins could be classified into 32 subfamilies, identified as clades with high support values (Fig. 3; Supplemental Fig. S2). A summary of the bHLH proteins grouped into their respective subfamilies is shown in Supplemental Table S3. Our analysis was robust to the alignment method employed, as almost every sequence clustered similarly in MUSCLE- and MAFFT-based analyses (data not shown). Furthermore, the tree topologies resulting from neighbor-joining (NJ) and maximum parsimony (MP) analyses were essentially the same, with most of the subfamilies recovered (Supplemental Fig. S2). Most plant bHLH subfamilies identified in a recent survey (Pires and Dolan, 2010) were also detected in our analysis. Newly identified atypical bHLHs either formed new subfamilies (subfamilies 18–22) or grouped within previously defined subfamilies (subfamily 16).
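The paper reports agreement between MUSCLE- and MAFFT-based trees only qualitatively. One common way to quantify such agreement, offered here as a swapped-in illustration and not as the authors' method, is the Rand index: the fraction of sequence pairs that both analyses either place in the same subfamily or place in different subfamilies. The function `rand_index` and its dictionary inputs (sequence name to subfamily label) are hypothetical.

```python
from itertools import combinations
from typing import Dict


def rand_index(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Fraction of sequence pairs on which two subfamily assignments agree.

    Only sequences assigned in both analyses are compared. Subfamily labels
    need not match between analyses; only the grouping matters.
    """
    shared = sorted(set(a) & set(b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 1.0  # nothing to disagree on
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)
```

A value of 1.0 means the two analyses partition the shared sequences identically, even if they name the subfamilies differently.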

We found 18 sequences that were not members of any of the identified subfamilies or that clustered ambiguously between different phylogenetic trees. In an attempt to resolve their evolutionary relationships with the defined plant bHLH subfamilies, a Bayesian analysis (BA) was performed on a restricted data set of the original alignment that also included representatives from the 32 subfamilies. From the resulting tree, three additional sequences were classified within existing subfamilies (Supplemental Fig. S3). The remaining 15 sequences were considered orphans, most likely representing either highly diverged lineage-specific bHLHs or sequences whose evolutionary relationships our phylogenetic analysis could not resolve. As typically observed in bHLH protein phylogenies, deep nodes (those determining interclade relationships) commonly showed low statistical support and varied between phylogenetic methods, likely reflecting the large number of sequences examined, the high divergence of the motif combined with its short length, and the occurrence of many ancient paralogs (Atchley and Fitch, 1997). Although beyond the scope of this work, the BA tree also provided some preliminary insights into the deep evolutionary history of the plant bHLH domain.

Our phylogenetic analysis permitted estimation of the number of ancestral bHLH genes in the most recent common ancestors (MRCA) of plants (Nam et al., 2004). For instance, assuming that shared clades composed of orthologous sequences from the four land plant species examined descend from an ancestral bHLH gene, we obtained a minimum estimate of 14 bHLH genes in the hypothetical MRCA of land plants (Fig. 4). However, this number could be an underestimate if the four additional subfamilies that include moss representatives, as well as the 13 orphan genes found in land plants, represent divergent members of additional ancestral families lost in specific plant lineages. Under this assumption, we obtained a maximum estimate of 31 bHLH genes in the MRCA of land plants (Fig. 4). The actual number of bHLH genes should lie between these two values and will depend on the prevalence of gene duplication or loss in specific evolutionary lineages. A similar approach was used to estimate the number of genes in the MRCA of eudicots and monocots as well as of eurosids I and eurosids II (Fig. 4). Interestingly, we found chlorophyte representatives in subfamilies 4 and 14, likely the descendants of ancestral green plant bHLH genes. Cr7 also tends to cluster at the base of subfamily 4 (Supplemental Figs. S2 and S3), although with lower support. The remaining chlorophyte bHLHs clustered in subfamily 32 at the base of the tree. The single representative from C. merolae did not group into any of the subfamilies, suggesting that plant bHLH subfamilies evolved after the divergence of red algae from other photosynthetic eukaryotes 1.5 billion years ago (Yoon et al., 2004).
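The minimum and maximum estimates follow from simple counting over subfamily membership. This Python sketch encodes one reading of the text, which is an interpretation rather than the authors' code: the minimum counts subfamilies containing all four land plants, and the maximum adds the other subfamilies with moss members plus the land-plant orphan genes. The species codes, including "Pp" for moss, are assumptions.

```python
from typing import Dict, Set, Tuple

# Assumed species codes: Arabidopsis, poplar, rice, and moss (Physcomitrella).
LAND_PLANTS: Set[str] = {"At", "Pt", "Os", "Pp"}
MOSS = "Pp"


def mrca_bounds(
    subfamilies: Dict[str, Set[str]], land_plant_orphans: int
) -> Tuple[int, int]:
    """Minimum/maximum ancestral bHLH gene counts for the land-plant MRCA.

    minimum: subfamilies containing all four land plants.
    maximum: minimum + other subfamilies with moss members + orphan genes.
    """
    shared = sum(1 for species in subfamilies.values() if LAND_PLANTS <= species)
    extra_moss = sum(
        1
        for species in subfamilies.values()
        if MOSS in species and not LAND_PLANTS <= species
    )
    return shared, shared + extra_moss + land_plant_orphans
```

With the counts reported above (14 shared subfamilies, four further moss-containing subfamilies, and 13 land-plant orphans), the bounds are 14 and 14 + 4 + 13 = 31.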

Figure 2. Yeast two-hybrid analysis of AtPAR1 protein interaction activities. A, Homodimerization activity of wild-type AtPAR1. B, Homodimerization activity of two mutated versions of AtPAR1, L1mut (Leu-27Glu) and L2mut (Leu-73Lys). SD-LT refers to the selective medium for transformed yeast cells, and SD-AHLT refers to the selective medium used to perform the growth assay indicative of protein-protein interaction. Numbers refer to the combinations of BD and AD yeast constructs used in each section, as indicated in the right panels. All transformations within a section were done simultaneously. Cotransformations were repeated at least twice with identical results.

Carretero-Paulet et al.

1402 Plant Physiol. Vol. 153, 2010 www.plantphysiol.orgon June 12, 2018 - Published by Downloaded from

Copyright © 2010 American Society of Plant Biologists. All rights reserved.


Sequence and Structural Analysis Provide Further Support to Plant bHLH Subfamilies Definition

Intron/Exon Structure within the bHLH Domains

We analyzed the intron pattern, including intron distribution, positions, and phases, over genomic regions encoding the bHLH domains. Approximately 20% of bHLH genes had no introns in the bHLH coding region (Fig. 5, pattern k). The rest of the genes had up to three introns that, according to relative positions and phases, could be arranged into 21 different splicing patterns. Patterns a to g, composed of one to three introns distributed at three highly conserved positions, accounted for approximately 72% of bHLH genes. As previously observed in Arabidopsis and rice bHLH genes (Toledo-Ortiz et al., 2003; Li et al., 2006b), patterns a and f were also the most common ones in poplar and moss bHLH genes but were not found in algae (Fig. 5). The remaining bHLH genes have introns at positions different from the rest of the family, forming patterns h to l as well as nine additional patterns exclusive to single bHLH genes. Interestingly, the intron pattern distribution was almost completely conserved within most subfamilies, providing an independent criterion for testing the reliability of our phylogenetic analysis (Fig. 3B). An interesting exception is the green plant ancestral subfamily 4, which clustered representatives of intron patterns i and j.

Figure 3. Phylogenetic relationships, intron pattern, DNA-binding motifs, and architecture of conserved protein motifs in 32 plant bHLH subfamilies. A, ML tree of 638 plant bHLH proteins (for the full representation of the tree, see Supplemental Fig. S2). The tree has been rooted using the single representative from C. merolae. Subfamilies are represented collapsed as triangles (except for subfamilies 5, 12, and 24), with depth and width proportional to sequence divergence and size, respectively. Subfamilies supported by bootstrap values greater than 50 in NJ or MP analysis are colored black. Subfamilies 5, 12, and 24, highlighted with gray shading, were ambiguously retrieved in NJ, MP, and BA trees. Orphan genes are represented as single lines. The tree is drawn to scale, with branch lengths proportional to evolutionary distances between nodes. The scale bar indicates the estimated number of amino acid replacements per site. B, Summary of information on the 32 plant bHLH subfamilies. Predicted DNA-binding motifs are as follows: I, E non G; II, G binder; III, non E binder; IV, E-box; V, G-box; VI, non DNA binder. For intron pattern designations, see Figure 5. C, Architecture of conserved protein motifs. Motifs are represented as white boxes drawn to scale for a representative plant bHLH protein of each subfamily. Motifs matching regions of the bHLH domain are colored gray.
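Sorting genes into splicing patterns amounts to keying each gene by the intron positions and phases it carries in the bHLH region. A minimal sketch, assuming a hypothetical input that maps gene IDs to (alignment position, phase) pairs; the gene IDs and coordinates are invented, and the pattern letters used in Figure 5 are not reproduced.

```python
from collections import defaultdict


def group_by_intron_pattern(introns_by_gene):
    """Group genes that share the same set of (alignment position, phase) introns.

    introns_by_gene: dict mapping a gene ID to a list of (position, phase) tuples;
    an empty list stands for a gene without introns in the bHLH region.
    """
    patterns = defaultdict(list)
    for gene, introns in introns_by_gene.items():
        signature = tuple(sorted(introns))  # canonical key, ordered by position
        patterns[signature].append(gene)
    return dict(patterns)


def intronless_fraction(introns_by_gene):
    """Share of genes with no intron in the bHLH coding region."""
    return sum(not introns for introns in introns_by_gene.values()) / len(introns_by_gene)


# Hypothetical genes and intron coordinates, for illustration only.
genes = {
    "g1": [(12, 0)],
    "g2": [(12, 0)],
    "g3": [],
    "g4": [(40, 0), (12, 0)],
    "g5": [],
}
patterns = group_by_intron_pattern(genes)
print(patterns[()])                # ['g3', 'g5']
print(patterns[((12, 0),)])        # ['g1', 'g2']
print(intronless_fraction(genes))  # 0.4
```

Comparing the resulting pattern keys against subfamily membership is one way to check the within-subfamily conservation described above.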

Figure 5 also shows, in each case, the position of splicing with respect to the codon (i.e. the intron phase). An intron was designated as occurring in one of three phases, phase 0, 1, or 2, depending on whether splicing occurred between codons, after the first nucleotide, or after the second nucleotide of the codon, respectively. Among the 887 introns analyzed here, a great majority (840) had phase 0, whereas only 15 and 32 had phases 1 and 2, respectively. Among phase 0 introns, we found all introns from patterns a to g.
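The phase definition above reduces to a modulo operation on the position of the intron within the coding sequence. A minimal sketch, assuming exon structure is given as a list of coding lengths per exon in gene order (a hypothetical representation, with invented lengths in the example); the last lines only re-add the intron counts quoted above.

```python
from collections import Counter


def intron_phases(exon_cds_lengths):
    """Phase of each intron from the coding lengths of the exons, in gene order.

    Phase = number of coding nucleotides upstream of the intron, modulo 3:
    0 between codons, 1 after the first codon position, 2 after the second.
    """
    phases = []
    offset = 0
    for length in exon_cds_lengths[:-1]:  # no intron after the last exon
        offset += length
        phases.append(offset % 3)
    return phases


print(intron_phases([90, 52, 41, 30]))  # [0, 1, 0]

# Counts reported in the text for the 887 introns.
reported = Counter({0: 840, 1: 15, 2: 32})
print(sum(reported.values()))  # 887
print(round(100 * reported[0] / sum(reported.values()), 1))  # 94.7
```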

Predicted DNA-Binding Properties

By examining the amino acid sequence of the basic region of the bHLH domain, plant bHLH proteins could be classified into different DNA-binding groups. Mapping the predicted DNA-binding categories onto the bHLH phylogenetic tree revealed that most subfamilies share predicted DNA-binding properties (Fig. 3B).

bHLH domains with at least five basic amino acids in the basic region are expected to bind DNA (Massari and Murre, 2000). The larger group, composed of 471 plant bHLH proteins, was found to fit this criterion (Table II). DNA-binder bHLHs can be further subdivided into additional DNA-binding categories. According to three-dimensional structural analysis of bHLH proteins, Glu-13 and Arg-16 are essential for E-box recognition (Fig. 1; Ferre-D’Amare et al., 1994; Shimizu et al., 1997). This E-box-binding recognition motif was identified in 359 plant bHLH proteins. Seven more sequences carry the conservative substitution Arg-16Lys, which has been shown not to interfere with E-box binding (Hua et al., 1993). Moreover, three residues in the basic region, His/Lys-9, Glu-13, and Arg-17, provide DNA-binding specificity for a particular type of E-box, the so-called G-box (Ferre-D’Amare et al., 1994; Shimizu et al., 1997). Eighty-six of the 366 E-box DNA-binder bHLHs lacked the G-box recognition motif, the rest (280) being classified as G-box DNA binders (Table II). The remaining 105 bHLHs, lacking the residues that define E-box recognition specificity but having more than five basic amino acids in the basic region, were classified as non E-box DNA binders.

A total of 167 out of 638 plant bHLH proteins lacked a basic region and were tentatively predicted to be non DNA binders. However, a subset of these sequences displayed the E-box-binding (seven) and G-box-binding (66) recognition motifs and, in some cases, grouped within subfamilies mostly composed of DNA-binder bHLHs (Fig. 3B). It remains to be determined whether these sequences have retained DNA-binding activity in spite of the low number of basic residues in their basic region.
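The decision rules in the preceding paragraphs can be summarized as a small classifier over consensus positions 1 to 17 of the basic region. This is a hypothetical sketch, not the authors' code: it assumes the basic region is already aligned to the consensus numbering of Figure 1, counts only Lys and Arg as basic residues (the text does not say whether His was counted), and applies the "at least five basic amino acids" threshold. The returned labels are the Table II column names, and the example sequences are invented.

```python
# Assumption: Lys and Arg are counted as basic residues; whether His was also
# counted is not stated in the text, so it is left out here.
BASIC = frozenset("KR")


def classify_basic_region(region, min_basic=5):
    """Predict a DNA-binding category (Table II column names) from a basic region.

    region: basic-region sequence in which region[0] is consensus position 1
    (numbering as in Fig. 1); at least 17 residues are required.
    """
    if len(region) < 17:
        raise ValueError("basic region must cover consensus positions 1-17")

    def at(position):
        return region[position - 1]  # 1-based access matching the consensus

    n_basic = sum(residue in BASIC for residue in region[:17])
    e_box = at(13) == "E" and at(16) in "RK"           # Glu-13 plus Arg/Lys-16
    g_box = e_box and at(9) in "HK" and at(17) == "R"  # plus His/Lys-9 and Arg-17

    if n_basic >= min_basic:
        if g_box:
            return "G binder"
        return "E non G" if e_box else "non E binder"
    if g_box:
        return "G-box"
    return "E-box" if e_box else "non DNA binder"


# Invented 17-residue sequences, not real plant bHLHs.
print(classify_basic_region("RRAANAAEHNLSEKNRR"))  # G binder
print(classify_basic_region("RRAANAAEANLSEKNRR"))  # E non G
print(classify_basic_region("SSAANAAEHNLSEANRR"))  # G-box
print(classify_basic_region("SSAANAAEANLSGANSS"))  # non DNA binder
```

Because the G-box rule requires the E-box residues, G binders form a subset of E-box binders, matching the 86 + 280 = 366 split reported above.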

Some bHLH sequences displayed a significantly higher frequency of specific amino acids. For instance, subfamily 23 grouped several non E-binder bHLH sequences displaying up to four Pro residues in the basic region (Supplemental Table S3). The presence of Pro residues in the basic region has been proposed to indicate a different positioning with respect to the DNA as a result of modified folding (Toledo-Ortiz et al., 2003). Moreover, in most non DNA-binder bHLHs, basic residues in the basic region have been replaced by specific amino acids such as Ser (e.g. subfamilies 16 and 17), Gly (e.g. subfamily 22), or even acidic amino acids (e.g. subfamily 21). The functional significance of such replacements in the basic region remains unknown.

Architecture of Conserved Protein Motifs

A search for conserved motifs in plant bHLH proteins identified 50 motifs of variable length (8–80 amino acids; Supplemental Table S4). In most cases, protein architecture is remarkably conserved within specific subfamilies, giving further support to the phylogenetic analysis based on bHLH domains (Fig. 3C).

Motifs 1 and 2 were identified as the helix 2 and helix 1 regions of the bHLH domain, respectively, in almost every bHLH protein sequence analyzed. The basic and loop regions of the bHLH domain appear to be less conserved; consequently, no single motif was detected matching these regions across plant bHLHs. By contrast, some specific motifs were identified as matching the basic and loop regions of specific subfamilies (Fig. 3C).

Outside the bHLH domain, some subfamily-specific motifs had been previously characterized as defining additional functional properties. For instance, motif 9, observed in all members of subfamily 4, was unambiguously identified as a ZIP dimerization domain. Motif 6, shared by members of subfamily 23, has been characterized in AtLHW as necessary for homodimerization (Ohashi-Ito and Bergmann, 2007). Motif 14, identified in subfamilies 2, 5, and 23, corresponds to a motif previously identified in AtMYC3/ATR2. A conserved Asp residue in this region has been reported to be functionally important for correct expression of several downstream genes acting in the Trp biosynthesis pathway (Smolen et al., 2002). Motif 44, conserved among phytochrome-interacting members of subfamily 24, has been characterized as providing a phytochrome B-specific recognition module (Khanna et al., 2004). Finally, motifs 7, 4, and 19 have been reported to form the highly conserved C-terminal domain of AtSPCH, AtMUTE, and AtFMA, grouped within subfamily 10. Although the biological role of this domain is still uncertain, its overexpression leads to a weak partial reversion of the fama mutant stomatal phenotype (Pillitteri et al., 2007).

Figure 4. Evolution of bHLH gene family size in plants. Estimates of bHLH gene family size in the MRCA of examined plant species are represented at the corresponding nodes of a tree depicting their evolutionary relationships. Numbers correspond to minimum and maximum estimates. Branch lengths are proportional to evolutionary divergence time, according to previous estimates (Chaw et al., 2004; Yoon et al., 2004; Tuskan et al., 2006; Merchant et al., 2007; Rensing et al., 2008). The scale bar represents millions of years ago. The number of bHLH genes (subfamilies) identified in extant species is indicated for Arabidopsis (At), poplar (Pt), rice (Os), moss (Pp), four chlorophyte species (Ch), and C. merolae (Cm).

To gain further insights into the origin and mode of evolution of bHLH motifs, we examined their distribution across species as well as their spatial locations across bHLH proteins. Most conserved motifs were already present in the ancestor of land plants, as all but motif 6 were identified in moss bHLH proteins. Apart from motifs 1 and 2, only four conserved motifs were also found in Cm1, while a total of 19 out of 50 motifs were detected in chlorophyte bHLH proteins. The ZIP motif (9) was the only one also found outside plants. However, no similarities were found between subfamily 30 of plant bHLH-ZIP proteins and animal bHLH-ZIP proteins, and previous work supports the independent acquisition of the motif multiple times during plant and animal evolution (Atchley and Fitch, 1997; Morgenstern and Atchley, 1999; Pires and Dolan, 2010). The bHLH domain itself provides an interesting example of variation in relative spatial location: in specific subfamilies, the bHLH domain is located at the NH2-terminal, middle, or COOH-terminal region of the protein (Fig. 3C). In addition, motifs 12, 22, 26, 35, 39, 45, and 48 also showed spatial variation relative to the bHLH domain.

Figure 5. Intron patterns within the bHLH domains. Alignment of bHLH domains representative of 11 intron patterns, named from a to l. The ? indicates nine additional gene-specific intron patterns. Locations of introns are indicated by triangles, and the number within the triangle corresponds to the intron phase. The number of bHLHs displaying each pattern in Arabidopsis (At), poplar (Pt), rice (Os), moss (Pp), and algae is given in the table to the right of the alignment.

Table II. Classification of plant bHLHs according to the presence of DNA-binding motifs in the basic region of the bHLH domain

Columns 1 to 3: >5 basic amino acids in the basic region; columns 4 to 6: <5 basic amino acids in the basic region.
DNA-binding motif by column: (1) E13, R/K16; (2) H/K9, E13, R/K16, R17; (3) not defined; (4) E13, R/K16; (5) H/K9, E13, R/K16, R17; (6) not defined.

Species        (1) E non G   (2) G binder   (3) Non E binder   (4) E-box   (5) G-box   (6) Non DNA binder
Arabidopsis    20            78             35                 3           11          20
Poplar         29            62             27                 2           19          44
Rice           31            75             32                 2           18          19
Moss           3             60             10                 0           16          9
Algae          3             5              1                  0           2           2
Total          86            280            105                7           66          94
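Because the text quotes several sums from Table II (471 predicted DNA binders, 167 proteins lacking a basic region, 366 E-box binders, 638 proteins in total), a short script is a convenient way to check that the transcribed table and the prose agree. A minimal sketch; the per-species row sums are derived from the table rather than quoted from the text.

```python
# Counts transcribed from Table II. Column order: E non G, G binder,
# non E binder (>5 basic amino acids); E-box, G-box, non DNA binder (<5).
TABLE_II = {
    "Arabidopsis": [20, 78, 35, 3, 11, 20],
    "Poplar": [29, 62, 27, 2, 19, 44],
    "Rice": [31, 75, 32, 2, 18, 19],
    "Moss": [3, 60, 10, 0, 16, 9],
    "Algae": [3, 5, 1, 0, 2, 2],
}

column_totals = [sum(col) for col in zip(*TABLE_II.values())]
per_species = {species: sum(row) for species, row in TABLE_II.items()}

print(column_totals)  # [86, 280, 105, 7, 66, 94]
print(per_species)    # {'Arabidopsis': 167, 'Poplar': 183, 'Rice': 177, 'Moss': 98, 'Algae': 13}
print(sum(column_totals[:3]), sum(column_totals[3:]), sum(column_totals))  # 471 167 638
```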

DISCUSSION

Comparative studies of bHLH gene numbers in different plant species show a gradual increase in the number of bHLHs from algae to flowering plants (Fig. 4), which correlates with increasing organism complexity (Richardt et al., 2007). Although the loss of ancestral bHLH genes in specific lineages cannot be ruled out, it is unlikely that gene loss explains the observed pattern. Rather, our results support evolutionary diversification of the bHLH family through extensive expansion at key milestones during plant evolution, a pattern similar to that observed in animal bHLHs (Amoutzias et al., 2004; Simionato et al., 2007).

According to our analysis, two subfamilies (4 and 14) might constitute the set of bHLH transcriptional regulatory networks ancestral to the green plant lineage. However, the most important expansion in the bHLH family occurred after the split between green algae and land plant species. This led to the establishment of most of the diversity of DNA-binding motifs, intron patterns, and protein motifs of plant bHLH proteins and probably reflects the transition from aquatic to terrestrial habitats. A similar evolutionary scenario has also been postulated in a recent analysis (Pires and Dolan, 2010). Other studies conclude that a first evolutionary expansion of the bHLH complement in metazoans and plants might have been related to the acquisition of multicellularity (Ledent and Vervoort, 2001) or even earlier (Simionato et al., 2007). At least in certain green algae lineages, evolutionary expansion may have preceded multicellularity, as revealed by the seven bHLH genes found in the single-celled C. reinhardtii.

Table III. Summary of functionally characterized bHLHs from plant species examined in this study classified by bHLH subfamilies

Single, double, and triple asterisks in subfamily numbers indicate land plant, green plant, and angiosperm shared subfamilies, respectively.

Subfamily 1*. Reported members: At021AMS, At022DYT1, At029FRU, At033SCRM, At116ICE, Os005TD, Os006RERJ. Biological function: response to freezing and chilling, guard mother cell differentiation, flower development, response to iron ion, response to cytokinin and jasmonic acid stimulus, microspore development, tapetal layer and anther development.
Subfamily 2*. Reported members: At004MYC4, At005MYC3, At006MYC, At013MYC7, At017AI, Os009MY. Biological function: wound, insect, drought, and oxidative stress responses, jasmonic acid and abscisic acid signaling, regulation of anthocyanin metabolism, response to chitin.
Subfamily 3*. Reported members: At020NAI1. Biological function: endoplasmic reticulum body development, response to fungus.
Subfamily 4**. Reported members: At105ILR3, Os062bHLH. Biological function: metal homeostasis regulation, response to auxin stimulus, stress response, seed development.
Subfamily 5***. Reported members: At001GI3, At002EGL, At012MYC, At042TT8, Os013OSB1, Os016OSB2. Biological function: regulation of flavonoid/anthocyanin metabolism, trichome initiation, (epidermal) cell fate specification.
Subfamily 10*. Reported members: At045MUTE, At097FM, At098SPC, Os051FM, Os053SPC1, Os054SPC, Os055MUTE. Biological function: stomatal complex development.
Subfamily 11*. Reported members: At095ZOU. Biological function: embryonic development.
Subfamily 12***. Reported members: At038ORG2, At039ORG3. Biological function: response to iron ion, response to salicylic acid stimulus.
Subfamily 14**. Reported members: At046BIM1, At102BIM, At141BIM3. Biological function: brassinosteroid signaling.
Subfamily 15***. Reported members: At154ERP. Biological function: response to ethylene stimulus.
Subfamily 16***. Reported members: At134PRE2, At135PRE3, At136PRE1, At161PRE4, At163KDR, At164PRE5. Biological function: gibberellic acid-light and gibberellic acid signaling.
Subfamily 17***. Reported members: At142SAC5. Biological function: unidimensional cell growth.
Subfamily 20***. Reported members: At159P1R2, At167P1R1. Biological function: light and auxin signaling, shade avoidance.
Subfamily 21***. Reported members: At165PAR1, At166PAR2. Biological function: light and auxin signaling, shade avoidance.
Subfamily 22. Reported members: At168P1R3. Biological function: light and auxin signaling, shade avoidance.
Subfamily 23***. Reported members: At155CPu, At156LHW. Biological function: root development.
Subfamily 24*. Reported members: At008PIF3, At009PIF, At015PIL, At016UN10, At024SPT, At026HFR1, At065PIL6, At072PIF7, At073ALC, At124PIL1, At132PIL2, Os102BP5. Biological function: shade avoidance, light signaling, deetiolation, female gametophyte development, double fertilization forming zygote and endosperm, fruit dehiscence, gibberellic acid signaling, regulation of anthocyanin metabolism, regulation of chlorophyll metabolism, negative gravitropism, regulation of seed germination, regulation of photomorphogenesis.
Subfamily 25*. Reported members: At031ZCW, At044BEE1, At050BEE3, At058BEE2, At063CIB1. Biological function: brassinosteroid and abscisic acid signaling, floral transition, petal morphogenesis.
Subfamily 26*. Reported members: At059UN12, Os096PTF. Biological function: female gametophyte development, double fertilization forming zygote and endosperm, response to phosphate deficiency stress.
Subfamily 28*. Reported members: At083RHD6, At086RSL1, Pp94RSL1, Pp96RSL2. Biological function: root hair, rhizoid, and caulonemata development.
Subfamily 31*. Reported members: At037HEC2, At040IND, At043HEC3, At088HEC1, Os123LAX. Biological function: flower and fruit development, initiation/maintenance of axillary meristems.
Orphans. Reported members: At108MEE8. Biological function: embryonic development ending in seed dormancy.

A second significant expansion was observed after the split between moss and vascular plants, as reflected in the 12 angiosperm-specific subfamilies and the greater size of 12 land plant ancestral bHLH subfamilies in angiosperms (Fig. 3B). This expansion might reflect the more complex body plan and specialization of vascular and flowering plants (Richardt et al., 2007).

Our results support birth-and-death evolution, through repeated gene duplication and eventual loss, as the driver of plant bHLH evolutionary expansion and diversification (Nei and Rooney, 2005; Zhang et al., 2008). Signatures of birth-and-death evolution are observed at both the sequence and genomic levels. At the sequence level, this would translate into bHLH sequences showing within-species divergence similar to or higher than between-species divergence. To examine whether this was the case for the plant bHLH family, we estimated sequence divergence at the amino acid level for the four land plant species and C. reinhardtii. As expected, differences in sequence divergence appear not to be significant in any comparison at the within-species level and are slightly increased in between-species comparisons with C. reinhardtii (Supplemental Table S5). Previous studies on the genome distribution of Arabidopsis and rice bHLH genes supported a prominent role for segmental and tandem duplication in the expansion of this gene family (Heim et al., 2003; Toledo-Ortiz et al., 2003; Li et al., 2006b). Similarly, recurrent events of single-gene duplication have been inferred to drive animal bHLH diversification (Amoutzias et al., 2004). Some duplicated genes accumulate mutations as pseudogenes and gradually lose their function. We have identified several truncated and apparently nonexpressed bHLH genes in the poplar and moss genomes, likely corresponding to pseudogenes (data not shown); such genes had also been identified in a previous survey in Arabidopsis and rice (Li et al., 2006b). More interestingly, other duplicated genes remain in the genomes as differentiated, functionally specialized genes, providing a source of evolutionary novelty in the form of new regulatory functions (Nam et al., 2004; Nei and Rooney, 2005).

Regulatory roles of bHLHs are essentially based on the recognition of a specific hexanucleotide core sequence in the promoters of target genes (Martinez-Garcia et al., 2000; Massari and Murre, 2000). A prominent role has been attributed to key residues in the basic region in discriminating between variants of this hexanucleotide core motif, allowing the classification of plant bHLHs into DNA-binding categories. None of these DNA-binding categories formed monophyletic groups, supporting the independent acquisition of specific DNA-binding properties at different times during plant bHLH gene family evolution (Fig. 3). Moreover, a role for specific amino acids outside the basic region in conferring additional DNA-binding specificity, through elements lying outside the hexanucleotide core recognition motif, cannot be ruled out. Studies of the Drosophila melanogaster bHLH transcription factor Deadpan have led to the identification of a single Lys residue in the loop region whose replacement severely reduces DNA-binding affinity (Nair and Burley, 2000; Winston et al., 2000). A similar role might be inferred in plant bHLHs, as this residue is highly conserved, being present in 77.4% of the sequences (position 46; Fig. 1). A second loop position (position 56; Fig. 1) is also particularly conserved, being occupied by an Asp residue in approximately 65.5% of plant bHLHs.
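Two alignment summaries underlie statements in this part of the discussion: the share of sequences carrying a given residue in one column (for example, Lys at position 46 in 77.4% of plant bHLHs) and the mean amino acid divergence within and between species. The sketch below computes both on a toy gapped alignment. It assumes an uncorrected p-distance that ignores gapped sites; the text does not specify which divergence measure was used for Supplemental Table S5, and all sequences are invented.

```python
from itertools import combinations
from statistics import mean


def column_frequency(alignment, column, residue):
    """Fraction of aligned sequences carrying `residue` at a 1-based column."""
    return sum(seq[column - 1] == residue for seq in alignment) / len(alignment)


def p_distance(a, b):
    """Uncorrected amino acid p-distance over columns where neither sequence has a gap."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def mean_divergence(group_a, group_b=None):
    """Mean pairwise p-distance within one group, or between two groups."""
    if group_b is None:
        pairs = combinations(group_a, 2)
    else:
        pairs = ((a, b) for a in group_a for b in group_b)
    return mean(p_distance(a, b) for a, b in pairs)


# Toy gapped alignment with invented sequences.
species_1 = ["MKRLE", "MKRLD", "MKR-E"]
species_2 = ["MQRLE"]
print(column_frequency(species_1 + species_2, 2, "K"))  # 0.75
print(p_distance("MKRLE", "MKRLD"))  # 0.2
print(p_distance("MKRLE", "MKR-E"))  # 0.0
```

Under a birth-and-death model, `mean_divergence(species_1)` would be expected to approach or exceed `mean_divergence(species_1, species_2)` for old paralog sets; the toy data are not meant to illustrate either outcome.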

Most of the novel bHLH genes and subfamilies identified and classified by our analysis correspond to atypical bHLHs, in which basic residues in the basic region are commonly replaced by nonbasic amino acids and which are consequently predicted to lack DNA-binding activity (Supplemental Table S3). AtKDR (subfamily 16) and AtPAR1 and AtPAR2 (subfamily 21) constitute the first characterized plant bHLH proteins predicted to be non DNA binders. AtKDR has been reported to negatively regulate AtHFR1 (Hyun and Lee, 2006), a bHLH protein previously reported to function as a branching point of phytochrome-dependent signaling responses (Fairchild et al., 2000). Later molecular and overexpression studies suggested that AtKDR, together with a set of five additional, closely related homologs (AtPRE1–AtPRE5), could play a role in GA-dependent responses (Lee et al., 2006). AtPAR1 and AtPAR2 act as direct transcriptional repressors of specific targets during shade-avoidance responses, including atypical bHLHs (AtP1R1–AtP1R3) and specific auxin-responsive genes (Roig-Villanova et al., 2007). Plant atypical bHLHs would act as negative regulators of DNA-binding bHLHs by forming heterodimers, as reported for ID proteins from group D of animal bHLHs. Consistently, AtKDR has been shown to heterodimerize with AtHFR1 (Hyun and Lee, 2006), and the AtPAR1 HLH domain has retained protein interaction function (Fig. 2A). However, no sequence similarity has been found between this group of plant bHLHs and group D of animal bHLHs. Plant atypical bHLHs emerge as a group of transcriptional regulators playing regulatory roles in plant-specific biological processes, notably by integrating phytochrome- and hormone-dependent signaling pathways.

Plant bHLH proteins have been reported to dimerize with a wide and diverse range of transcriptional regulators, including members of the bHLH family (Toledo-Ortiz et al., 2003); other transcription factors, such as R2R3-MYBs (Goff et al., 1992; Dubos et al., 2008), BZR1-BES1 (Yin et al., 2005), or AP2s (Chandler et al., 2009); signal transduction proteins, such as WD40 repeat proteins (Ramsay and Glover, 2005); and epigenetic regulators of gene expression (Thorstensen et al., 2008). Dimerization allows bHLH proteins to expand their regulatory roles by defining additional protein interaction and DNA-binding specificities (Massari and Murre, 2000; Toledo-Ortiz et al., 2003). The HLH region of the bHLH domain is responsible for the dimerization activities of bHLH proteins. However, little is known about how the specificity of this interaction is defined. Three-dimensional structural analysis of the mammalian Max protein, together with site-directed mutagenesis experiments on human E47 and E12, characterized two conserved Leu residues in the helix 1 and helix 2 regions as essential for dimerization (Voronova and Baltimore, 1990; Ferre-D’Amare et al., 1993). Both Leu residues were identified as the most conserved residues across plant bHLHs (positions 27 and 73; Fig. 1). Their essential role in dimerization also appears to be conserved in plant bHLHs, as revealed by yeast two-hybrid protein interaction assays using two versions of the highly diverged AtPAR1 protein mutated at these positions (Fig. 2B).

We observed an excess of phase 0 introns and of symmetric exons within the bHLH domain (Fig. 5). This provides an interesting mechanism to explain the exchange of protein motifs, facilitating exon shuffling by avoiding interruptions of the open reading frame. Introns would be inserted into (or eventually excised from) the bHLH coding region in a subfamily-specific manner, in accordance with previous results showing that numerous introns have been specifically inserted in plants and retained in the genome (Rogozin et al., 2003). The scattered distribution of the intronless pattern k across the bHLH phylogeny, together with its occurrence in bHLH sequences from algae species, might indicate its ancestral nature, consistent with this model.
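The link between phase 0 introns, symmetric exons, and exon shuffling can be made concrete: an internal exon is symmetric when the two introns flanking it have the same phase, which is equivalent to its coding length being a multiple of 3. A minimal sketch, assuming introns are listed by phase in gene order (a hypothetical input; the values in the example are invented).

```python
def exon_symmetry(intron_phases):
    """Label each internal exon by the phases of its two flanking introns.

    An exon is symmetric when both flanking introns share a phase (0-0, 1-1, 2-2);
    deleting or duplicating such an exon leaves the downstream reading frame intact.
    """
    labels = []
    for upstream, downstream in zip(intron_phases, intron_phases[1:]):
        kind = "symmetric" if upstream == downstream else "asymmetric"
        labels.append(f"{upstream}-{downstream} {kind}")
    return labels


def is_frame_neutral(exon_cds_length):
    """An internal exon is symmetric exactly when its coding length is a multiple of 3."""
    return exon_cds_length % 3 == 0


print(exon_symmetry([0, 0, 1]))  # ['0-0 symmetric', '0-1 asymmetric']
print(is_frame_neutral(51), is_frame_neutral(52))  # True False
```

With phase 0 introns dominating (840 of 887), most internal exons in the bHLH region would fall into the 0-0 symmetric class.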

Most plant bHLH proteins are multidomain proteins composed of a set of conserved motifs already present in the MRCA of land plants. Many motifs consist of short conserved sequences arranged in a mosaic pattern (Fig. 3C). This arrangement might be mostly explained by modular evolution with domain shuffling, as suggested for animal bHLHs (Morgenstern and Atchley, 1999). Shuffling of functional domains among bHLH proteins, including specific regions of the bHLH domain, would promote further functional diversification in specific lineages.

One might anticipate that ortholog bHLH proteins closely clustering in a subfamily and sharing similar intron/exon organization, protein motif architecture, predicted DNA-binding motifs, and additional sequence features should have recent common evolutionary origins and, consequently, related molecular and biological functions. However, the extent of functional diversification within specific subfamilies is variable, ranging from functional redundancy to members displaying highly diverged specialized functions (Table III).

Such apparent functional redundancy is observed in subfamilies 14 and 25, clustering AtBIM and AtBEE genes, respectively, involved in brassinosteroid signaling (Friedrichsen et al., 2002; Yin et al., 2005). Functional specialization may be observed in other plant bHLH subfamilies. AtSPCH, AtMUTE, and AtFMA, members of subfamily 10, have been shown to control stomatal development at three consecutive steps: initiation, meristemoid differentiation, and guard cell morphogenesis, respectively (Pillitteri et al., 2007). The corresponding rice orthologs of subfamily 10 also provide an interesting example of functional conservation (Liu et al., 2009). An outstanding example of functional diversification is encountered in subfamily 1, which clusters nine plant bHLH genes involved in very diverse biological roles (Table III; Chinnusamy et al., 2003; Sorensen et al., 2003; Jakoby et al., 2004; Kiribuchi et al., 2004; Li et al., 2006a; Zhang et al., 2006; Kanaoka et al., 2008).

We found moss orthologs of bHLHs related to biological processes specific to vascular and flowering plants. The above-mentioned AtBIM and AtBEE genes provide a first example. Brassinosteroids play a key role in the differentiation of vascular tissues (xylem and phloem; Cano-Delgado et al., 2004). Consistently, nonvascular moss is devoid of brassinosteroid biosynthetic and signaling pathway genes (Rensing et al., 2008). Interestingly, subfamily 14 also consistently grouped the chlorophyte representatives Cr1 and Vc3. Several moss bHLH orthologs also clustered in subfamily 24, whose members have been reported for their role in phytochrome-dependent photomorphogenic responses that appeared later in the evolutionary lineage of vascular plants, such as shade avoidance and seed germination (Ni et al., 1998; Huq and Quail, 2002; Yamashino et al., 2003; Oh et al., 2004). Arabidopsis AtHEC genes (subfamily 31; Table III) have been shown to work in concert with AtSPT, a member of subfamily 24, to coordinately regulate development of the female reproductive tract, probably in an auxin-dependent manner (Gremski et al., 2007). An interesting question for future research will be whether moss (and algae) bHLH orthologs within these subfamilies have retained the ancestral function or have evolved new functions. Studies on AtRHD6 and AtRSL1 (subfamily 28), which control root hair development, provide a first insight into this question. Interestingly, the corresponding moss orthologs PpRSL1 and PpRSL2 also control the development of nonhomologous organs with a rooting function (Menand et al., 2007).

Some other functionally characterized bHLHs belong to angiosperm-specific subfamilies subjected to lineage-specific expansions, which may reflect species-specific adaptations. Subfamilies 12 and 15 provide examples of dicot- and monocot-specific expansion, respectively. Subfamily 12 clusters AtORG2 and AtORG3, which are regulated by iron deficiency-mediated stress and the phytohormones salicylic acid and jasmonic acid (Kang et al., 2003). Subfamily 15 includes AtERP, which is involved in GA signaling acting downstream of DELLA proteins (Zentella et al., 2007), conserved growth repressors that modulate GA responses. Furthermore, subfamily 23 clusters together six poplar sequences but only three Arabidopsis and rice homologs. Subfamily 23 is represented by AtLHW, which is involved in the regulation of the Arabidopsis root vascular initial population (Ohashi-Ito and Bergmann, 2007). A similar poplar-specific expansion has been found in the MADS box subfamily clustering AtANR1, which is also known to be involved in root development (Zhang and Forde, 1998; Leseberg et al., 2006), and in the R2R3-MYB C1 subfamily, whose members showed particularly abundant expression in roots (Wilkins et al., 2009).

Twelve out of the 32 bHLH subfamilies defined here lack any functionally characterized member. Some of these subfamilies might regulate biological roles essential for land plant development, as they form large subfamilies including representatives from the four land plant species (e.g. subfamilies 9 and 27) or are specific to angiosperms (e.g. subfamilies 7, 18, 19, and 20). We expect the comprehensive classification and evolutionary analysis of plant bHLHs presented here to provide a useful framework for ortholog identification. This is a first step toward inferring the role of newly identified plant bHLH proteins in the transcriptional regulation of growth and development processes, as well as toward further investigations into the role of the bHLH family in plant phenotypic diversification.

MATERIALS AND METHODS

Plant bHLH Sequence Identification and Analysis

Putative novel bHLH sequences were identified using BLAST (Altschul et al., 1997) and profile HMMs (Durbin et al., 1988), generated and calibrated with HMMER software version 2.3.3. Local searches were performed against the proteomes of Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), poplar (Populus trichocarpa), and moss (Physcomitrella patens), downloaded from The Arabidopsis Information Resource, The Institute for Genomic Research Rice Genome Annotation, Joint Genome Institute (JGI) Ptri version 1.1, and JGI Ppatens version 2.0 browsers, respectively. Similar searches were performed on the whole sequenced genomes of Volvox carteri, Chlamydomonas reinhardtii, Ostreococcus tauri, and Ostreococcus lucimarinus (JGI Volca version 1.0, JGI Chlre version 4.0, JGI Ostta version 2.0, and JGI Ostlu version 2.0, respectively), as well as Cyanidioschyzon merolae (http://merolae.biol.s.u-tokyo.ac.jp/). Only hits with E-values below 0.001 were considered for further analysis. Redundant sequences were identified with BLASTCLUST from the BLAST stand-alone package and subsequently discarded.
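The hit-filtering step can be illustrated with a minimal Python sketch. It assumes BLAST results in the standard 12-column tabular layout, with the E-value in the eleventh column; the function names are hypothetical, and the identical-sequence check is a much cruder stand-in for BLASTCLUST, which clusters sequences by pairwise similarity rather than exact identity.

```python
# Minimal sketch (assumed input format): filter BLAST tabular hits by E-value
# and drop exact-duplicate sequences. Not the authors' code; names are hypothetical.

E_VALUE_CUTOFF = 1e-3  # "E-values below 0.001", as stated in the Methods


def significant_subjects(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return subject IDs (in first-seen order) with at least one hit below cutoff.

    Assumes the 12-column tabular layout: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    kept, seen = [], set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue < cutoff and subject not in seen:
            seen.add(subject)
            kept.append(subject)
    return kept


def drop_identical(sequences):
    """Keep one ID per distinct (case-insensitive) sequence; the first ID wins.

    A crude stand-in for BLASTCLUST, which clusters by similarity, not identity.
    """
    first_id = {}
    for seq_id, seq in sequences.items():
        first_id.setdefault(seq.upper(), seq_id)
    kept_ids = set(first_id.values())
    return {seq_id: seq for seq_id, seq in sequences.items() if seq_id in kept_ids}
```

Keeping the first identifier seen for each distinct sequence makes the result deterministic for a given input order, which matters when the retained identifiers are later used to label tree tips.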

The bHLH sequences were aligned using the ClustalW, MUSCLE version 5.0, and MAFFT 6.0 (FFT-NS-2 algorithm) programs (Thompson et al., 1997; Katoh et al., 2002; Edgar, 2004), and the resulting alignments were subsequently edited manually using GENEDOC 2.6.002. Limits of the bHLH domains were set according to the proposed predictive consensus motif (Atchley et al., 1999), which was constructed with reference to the structure of the human MAX bHLH protein (Ferre-D’Amare et al., 1993), and further corrected for predicted plant-specific bHLH domain boundaries (Toledo-Ortiz et al., 2003; Roig-Villanova et al., 2007).

The MEME version 3.5.7 tool was used to identify conserved motifs shared among bHLH proteins (Bailey and Elkan, 1994; Bailey et al., 2006), with the following parameter settings: maximum number of different motifs to find, 50; range searched for the optimum motif width, 8 to 100. The MAST program was then used to search protein databases for the detected motifs (Bailey and Gribskov, 1998). The motifs were further scanned against several domain databases, including the National Center for Biotechnology Information’s Conserved Domain Database, INTERPRO, and PROSITE (Apweiler et al., 2001).

Exon/intron locations, distribution, and phases in the genomic sequences encoding the bHLH domain were examined by comparing these sequences with the predicted encoded proteins using GENEWISE (Birney et al., 2004).
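Intron phase follows directly from the cumulative coding length upstream of each intron. The sketch below assumes the coding length of each exon (in nucleotides, 5′ to 3′) has already been extracted, for instance from a GENEWISE alignment; the function name is hypothetical and no GENEWISE output format is parsed.

```python
def intron_phases(exon_cds_lengths):
    """Return (phase, residue) for each intron, given exon coding lengths in nt.

    phase 0: the intron lies between codons; 1: after the first codon base;
    2: after the second codon base. residue is the 1-based amino acid that
    ends just before (phase 0) or is split by (phase 1 or 2) the intron.
    Assumes the first exon starts at the first base of the start codon.
    """
    results = []
    coding_nt = 0
    for length in exon_cds_lengths[:-1]:  # every exon but the last is followed by an intron
        coding_nt += length
        phase = coding_nt % 3
        residue = -(-coding_nt // 3)  # ceiling division
        results.append((phase, residue))
    return results


if __name__ == "__main__":
    # Hypothetical gene: three exons contributing 100, 50, and 200 coding nt.
    print(intron_phases([100, 50, 200]))
```

For exons of 100, 50, and 200 coding nucleotides, the first intron is in phase 1 within residue 34 and the second in phase 0 after residue 50. Comparing such residue positions with the bHLH domain limits would then indicate which introns interrupt the domain.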

Phylogenetic Analysis

Reconstruction of evolutionary relationships was performed on the basis of amino acid sequences of bHLH proteins. Only the bHLH domain was used, because the flanking sequences of bHLH proteins from independent subfamilies are either nonhomologous or too divergent to be reliably aligned. bHLH sequences from the different species examined were added sequentially to the analysis, and the resulting trees were compared with previous classifications (Bailey et al., 2003; Buck and Atchley, 2003; Heim et al., 2003; Toledo-Ortiz et al., 2003; Li et al., 2006b; Pires and Dolan, 2010).

The Jones, Taylor, and Thornton (JTT) model with an estimated proportion of invariable sites (I) and an estimated γ-distribution shape parameter (G) was selected as the best-fitting amino acid substitution model by the Akaike information criterion as implemented in ProtTest version 1.4 (Jones et al., 1992; Abascal et al., 2005). The ML analyses were performed using PHYML version 2.4.5 (Guindon and Gascuel, 2003) with the JTT+I+G model. Heterogeneity of amino acid substitution rates among sites was accounted for using a γ-distribution with eight categories. Tree topology searching was optimized using the subtree pruning and regrafting option. The statistical support of the retrieved topology was assessed using the Shimodaira-Hasegawa-like approximate likelihood ratio test and a bootstrap analysis with 100 replicates. NJ and MP analyses were implemented with MEGA 4.0 (Tamura et al., 2007). In NJ, distances were calculated using the JTT amino acid substitution model with γ-distributed rates among sites, the γ-parameter being set to the value retrieved in the ProtTest analysis. To deal with short insertions/deletions (commonly occurring throughout the loop region), the “pairwise deletion” and “all sites” settings were used in the NJ and MP analyses, respectively. A bootstrap analysis with 1,000 replicates was performed in each case.
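ProtTest ranks candidate models by the Akaike information criterion, AIC = 2k − 2 ln L, where k is the number of free parameters and L the maximized likelihood. A minimal Python sketch of that ranking follows; the class and function names are hypothetical, the fitted log-likelihoods are assumed to come from a separate optimization, and the numbers in the example are invented, not values from this study.

```python
# Minimal sketch of AIC-based model ranking in the spirit of ProtTest.
# Names are hypothetical; fitted log-likelihoods must come from elsewhere.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFit:
    name: str              # e.g. "JTT+I+G"
    log_likelihood: float  # maximized ln L of the model on the alignment
    n_params: int          # free parameters (assumed to include branch lengths)


def aic(fit: ModelFit) -> float:
    """Akaike information criterion, AIC = 2k - 2 ln L; lower is better."""
    return 2 * fit.n_params - 2 * fit.log_likelihood


def best_by_aic(fits):
    """Return the lowest-AIC fit and every model's AIC difference from it."""
    scores = {fit.name: aic(fit) for fit in fits}
    best = min(fits, key=aic)
    deltas = {name: score - scores[best.name] for name, score in scores.items()}
    return best, deltas


if __name__ == "__main__":
    # Invented numbers for illustration only.
    candidates = [
        ModelFit("JTT", -5000.0, 10),
        ModelFit("JTT+G", -4900.0, 11),
        ModelFit("JTT+I+G", -4890.0, 12),
    ]
    best, deltas = best_by_aic(candidates)
    print(best.name, deltas)
```

With these invented fits the AIC values are 10,020, 9,822, and 9,804, so JTT+I+G is preferred: the extra invariable-sites parameter costs 2 AIC units but lowers −2 ln L by 20.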

BA was implemented in MrBayes 3.1.2 (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003). Searches were run with four Markov chains for 1 million generations, sampling every 100th tree. After the chains had reached stationarity (determined by independent runs sampling similar likelihood values when plotted against the number of generations), the first 100,000 generations (1,000 sampled trees) were discarded as burn-in, and a consensus tree was then constructed to evaluate clades with Bayesian posterior probabilities greater than 50%. The JTT model was used, with rate heterogeneity across sites modeled as γ-distributed with eight categories plus a proportion of invariant sites.
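The sample bookkeeping and the 50% majority-rule summary can be sketched in Python. The sketch assumes each sampled tree has already been reduced to a set of clades (frozensets of taxon names); parsing MrBayes tree files is out of scope, and all names are hypothetical. With 1 million generations sampled every 100th generation, a run yields about 10,000 trees, so a 100,000-generation burn-in removes about 1,000 of them.

```python
# Minimal sketch of MCMC post-processing: sample bookkeeping and 50%
# majority-rule clade support. Trees are pre-reduced to sets of clades
# (frozensets of taxon names); parsing MrBayes output is out of scope.
from collections import Counter

GENERATIONS = 1_000_000
SAMPLE_FREQ = 100            # one tree saved every 100th generation
BURNIN_GENERATIONS = 100_000


def tree_counts(generations=GENERATIONS, freq=SAMPLE_FREQ, burnin=BURNIN_GENERATIONS):
    """Approximate (trees sampled, trees discarded as burn-in) for one run.

    Approximate because the starting tree may also be written as a sample.
    """
    return generations // freq, burnin // freq


def clade_posteriors(sampled_trees, burnin_trees):
    """Fraction of post-burn-in trees that contain each clade."""
    kept = sampled_trees[burnin_trees:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule_clades(sampled_trees, burnin_trees, threshold=0.5):
    """Clades whose posterior probability exceeds threshold (here, 50%)."""
    posteriors = clade_posteriors(sampled_trees, burnin_trees)
    return {clade: p for clade, p in posteriors.items() if p > threshold}
```

Representing each tree as a set guarantees that a clade is counted at most once per tree, so the counts divided by the number of retained trees are posterior clade frequencies.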

Yeast Two-Hybrid Interaction Assays

The Matchmaker two-hybrid system (Clontech) was used to perform yeast two-hybrid assays. The full-length open reading frame of AtPAR1 was fused in frame to the DNA-binding domain (BD) and to the transcriptional activation domain (AD) using the pGBKT7 and pGADT7 vectors, respectively. The NcoI-BamHI fragment of pACV9 (containing the entire coding sequence of AtPAR1; Roig-Villanova et al., 2007) was subcloned into the same sites of pGBKT7 and pGADT7, resulting in pCL3 (BD-PAR1) and pCL1 (AD-PAR1), respectively. L1mut was generated by PCR-based site-directed mutagenesis using the primers RO47 (5′-GATTGAGGCGGAGCAGAGGATTATCCCCGGAGGAG-3′) and RO48 (5′-GATAATCCTCTGCTCCGCCTCAATCTTTTCCTTGAC-3′). L2mut was similarly generated using the primers RO49 (5′-CATTCTGTCTAAACAATGTCAGATCAAAACCATTA-3′) and RO50 (5′-GATCTGACATTGTTTAGACAGAATGTAACCAGCTG-3′). In both cases, AtPAR1 was amplified from the binary vector pBF1 (P35S:AtPAR1-GFP, a pCAMBIA-1302-based binary vector containing full-length AtPAR1 flanked by NcoI and SpeI sites) using specific primers from the P35S and GFP coding sequences. The mutated L1mut and L2mut sequences were subcloned into pCRII-TOPO (Invitrogen) to generate pIR44 and pMR5, respectively. Site-directed mutations in these inserts were verified by sequencing. The NcoI-SpeI fragments of pIR44 and pMR5 were subcloned into the same sites of pGADT7, resulting in pCM7 (AD-PAR1-L1mut) and pCM8 (AD-PAR1-L2mut), respectively.
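Because each mutagenic primer pair (RO47/RO48 and RO49/RO50) must anneal to each other over its mutated region, a quick consistency check is to measure how far the 5′ end of one primer matches the 3′ end of the reverse complement of its partner. The Python sketch below does this with plain string operations; the function names are hypothetical, and the check tests only end-to-end overlap, not melting temperature or the position of the mutated bases.

```python
# Minimal sketch: check the end overlap of paired mutagenic primers.
# Primer sequences are copied from the Methods; function names are hypothetical.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def end_overlap(forward: str, reverse: str) -> int:
    """Longest k such that the last k nt of revcomp(reverse) equal the first k nt of forward."""
    rc = reverse_complement(reverse)
    for k in range(min(len(forward), len(rc)), 0, -1):
        if rc[-k:] == forward[:k]:
            return k
    return 0


PRIMERS = {
    "RO47": "GATTGAGGCGGAGCAGAGGATTATCCCCGGAGGAG",
    "RO48": "GATAATCCTCTGCTCCGCCTCAATCTTTTCCTTGAC",
    "RO49": "CATTCTGTCTAAACAATGTCAGATCAAAACCATTA",
    "RO50": "GATCTGACATTGTTTAGACAGAATGTAACCAGCTG",
}

if __name__ == "__main__":
    for fwd, rev in [("RO47", "RO48"), ("RO49", "RO50")]:
        print(fwd, rev, end_overlap(PRIMERS[fwd], PRIMERS[rev]))
```

Run on the primer sequences given above, both pairs report a 25-nt overlap, consistent with their use as complementary overlapping primers.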


Yeast (strain AH109) transformation was performed according to the manufacturer’s instructions. Yeast cells were cotransformed with the different pairs of BD and AD constructs. Independent transformants were selected on minimal synthetic dropout medium (SD) lacking Leu and Trp (SD-LT). At least 10 independent colonies were transferred to SD lacking Ade, His, Leu, and Trp (SD-AHLT) to test for positive protein-protein interactions.

Supplemental Data

The following materials are available in the online version of this article.

Supplemental Figure S1. ClustalW amino acid sequence alignment of 638 Arabidopsis, poplar, rice, moss, and algae bHLH domains.

Supplemental Figure S2. ML phylogenetic tree of 638 plant bHLH proteins.

Supplemental Figure S3. BA phylogenetic tree of 50 plant bHLH proteins.

Supplemental Table S1. Plant and animal bHLH predictive consensus motifs.

Supplemental Table S2. Species classification of 638 plant bHLH sequences examined in this study.

Supplemental Table S3. Subfamily classification of 638 Arabidopsis, poplar, rice, moss, and algae bHLH sequences examined in this study and additional information.

Supplemental Table S4. Summary of conserved motifs identified by MEME in plant bHLHs.

Supplemental Table S5. Rates of sequence divergence at the amino acid level in Arabidopsis, poplar, rice, moss, and C. reinhardtii bHLH sequences.

ACKNOWLEDGMENTS

We thank F. Paulet-Dubois for critical reading of the manuscript and all our laboratory members for stimulating discussions and suggestions. We also thank two anonymous referees for their insightful comments. This work has been carried out within the University of Almería, the University of Manchester, and the Centre CONSOLIDER for Research in Agricultural Genomics. Thanks also to the Apple Research and Technology Support scheme for support.

Received January 18, 2010; accepted May 13, 2010; published May 14, 2010.

LITERATURE CITED

Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21: 2104–2105

Abe H, Yamaguchi-Shinozaki K, Urao T, Iwasaki T, Hosokawa D, Shinozaki K (1997) Role of Arabidopsis MYC and MYB homologs in drought- and abscisic acid-regulated gene expression. Plant Cell 9: 1859–1868

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402

Amoutzias GD, Robertson DL, Oliver SG, Bornberg-Bauer E (2004) Convergent evolution of gene networks by single-gene duplications in higher eukaryotes. EMBO Rep 5: 274–279

Amoutzias GD, Veron AS, Weiner J III, Robinson-Rechavi M, Bornberg-Bauer E, Oliver SG, Robertson DL (2007) One billion years of bZIP transcription factor evolution: conservation and change in dimerization and DNA-binding site specificity. Mol Biol Evol 24: 827–835

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res 29: 37–40

Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815

Atchley WR, Fitch WM (1997) A natural classification of the basic helix-loop-helix class of transcription factors. Proc Natl Acad Sci USA 94: 5172–5176

Atchley WR, Terhalle W, Dress A (1999) Positional dependence, cliques, and predictive motifs in the bHLH protein domain. J Mol Evol 48: 501–516

Bailey PC, Martin C, Toledo-Ortiz G, Quail PH, Huq E, Heim MA, Jakoby M, Werber M, Weisshaar B (2003) Update on the basic helix-loop-helix transcription factor gene family in Arabidopsis thaliana. Plant Cell 15: 2497–2502

Bailey TL, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp 28–36

Bailey TL, Gribskov M (1998) Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14: 48–54

Bailey TL, Williams N, Misleh C, Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34: W369–W373

Birney E, Clamp M, Durbin R (2004) GeneWise and Genomewise. Genome Res 14: 988–995

Buck MJ, Atchley WR (2003) Phylogenetic analysis of plant basic helix-loop-helix proteins. J Mol Evol 56: 742–750

Cano-Delgado A, Yin Y, Yu C, Vafeados D, Mora-Garcia S, Cheng JC, Nam KH, Li J, Chory J (2004) BRL1 and BRL3 are novel brassinosteroid receptors that function in vascular differentiation in Arabidopsis. Development 131: 5341–5351

Chandler JW, Cole M, Flier A, Werr W (2009) BIM1, a bHLH protein involved in brassinosteroid signalling, controls Arabidopsis embryonic patterning via interaction with DORNROSCHEN and DORNROSCHEN-LIKE. Plant Mol Biol 69: 57–68

Chaw SM, Chang CC, Chen HL, Li WH (2004) Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol 58: 424–441

Chinnusamy V, Ohta M, Kanrar S, Lee BH, Hong X, Agarwal M, Zhu JK (2003) ICE1: a regulator of cold-induced transcriptome and freezing tolerance in Arabidopsis. Genes Dev 17: 1043–1054

de Pater S, Pham K, Memelink J, Kijne J (1997) RAP-1 is an Arabidopsis MYC-like R protein homologue, that binds to G-box sequence motifs. Plant Mol Biol 34: 169–174

Doebley J, Lukens L (1998) Transcriptional regulators and the evolution of plant form. Plant Cell 10: 1075–1082

Dubos C, Le Gourrierec J, Baudry A, Huep G, Lanet E, Debeaujon I, Routaboul JM, Alboresi A, Weisshaar B, Lepiniec L (2008) MYBL2 is a new regulator of flavonoid biosynthesis in Arabidopsis thaliana. Plant J 55: 940–953

Durbin R, Eddy SR, Krogh A, Mitchison G (1988) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792–1797

Fairchild CD, Schumaker MA, Quail PH (2000) HFR1 encodes an atypical bHLH protein that acts in phytochrome A signal transduction. Genes Dev 14: 2377–2391

Fairman R, Beran-Steed RK, Anthony-Cahill SJ, Lear JD, Stafford WF III, DeGrado WF, Benfield PA, Brenner SL (1993) Multiple oligomeric states regulate the DNA binding of helix-loop-helix peptides. Proc Natl Acad Sci USA 90: 10429–10433

Ferre-D’Amare AR, Pognonec P, Roeder RG, Burley SK (1994) Structure and function of the b/HLH/Z domain of USF. EMBO J 13: 180–189

Ferre-D’Amare AR, Prendergast GC, Ziff EB, Burley SK (1993) Recognition by Max of its cognate DNA through a dimeric b/HLH/Z domain. Nature 363: 38–45

Friedrichsen DM, Nemhauser J, Muramitsu T, Maloof JN, Alonso J, Ecker JR, Furuya M, Chory J (2002) Three redundant brassinosteroid early response genes encode putative bHLH transcription factors required for normal growth. Genetics 162: 1445–1456

Goff SA, Cone KC, Chandler VL (1992) Functional analysis of the transcriptional activator encoded by the maize B gene: evidence for a direct functional interaction between two classes of regulatory proteins. Genes Dev 6: 864–875

Gremski K, Ditta G, Yanofsky MF (2007) The HECATE genes regulate female reproductive tract development in Arabidopsis thaliana. Development 134: 3593–3601


Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52: 696–704

Halliday KJ, Hudson M, Ni M, Qin M, Quail PH (1999) poc1: an Arabidopsis mutant perturbed in phytochrome signaling because of a T DNA insertion in the promoter of PIF3, a gene encoding a phytochrome-interacting bHLH protein. Proc Natl Acad Sci USA 96: 5832–5837

Heim MA, Jakoby M, Werber M, Martin C, Weisshaar B, Bailey PC (2003) The basic helix-loop-helix transcription factor family in plants: a genome-wide study of protein structure and functional diversity. Mol Biol Evol 20: 735–747

Hu J, Anderson B, Wessler SR (1996) Isolation and characterization of rice R genes: evidence for distinct evolutionary paths in rice and maize. Genetics 142: 1021–1031

Hua X, Yokoyama C, Wu J, Briggs MR, Brown MS, Goldstein JL, Wang X (1993) SREBP-2, a second basic-helix-loop-helix-leucine zipper protein that stimulates transcription by binding to a sterol regulatory element. Proc Natl Acad Sci USA 90: 11603–11607

Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754–755

Huq E, Quail PH (2002) PIF4, a phytochrome-interacting bHLH factor, functions as a negative regulator of phytochrome B signaling in Arabidopsis. EMBO J 21: 2441–2450

Hyun Y, Lee I (2006) KIDARI, encoding a non-DNA binding bHLH protein, represses light signal transduction in Arabidopsis thaliana. Plant Mol Biol 61: 283–296

International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436: 793–800

Jakoby M, Wang HY, Reidt W, Weisshaar B, Bauer P (2004) FRU (BHLH029) is required for induction of iron mobilization genes in Arabidopsis thaliana. FEBS Lett 577: 528–534

Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275–282

Kaiser BN, Finnegan PM, Tyerman SD, Whitehead LF, Bergersen FJ, Day DA, Udvardi MK (1998) Characterization of an ammonium transport protein from the peribacteroid membrane of soybean nodules. Science 281: 1202–1206

Kanaoka MM, Pillitteri LJ, Fujii H, Yoshida Y, Bogenschutz NL, Takabayashi J, Zhu JK, Torii KU (2008) SCREAM/ICE1 and SCREAM2 specify three cell-state transitional steps leading to Arabidopsis stomatal differentiation. Plant Cell 20: 1775–1785

Kang HG, Foley RC, Onate-Sanchez L, Lin C, Singh KB (2003) Target genes for OBP3, a Dof transcription factor, include novel basic helix-loop-helix domain proteins inducible by salicylic acid. Plant J 35: 362–372

Karol KG, McCourt RM, Cimino MT, Delwiche CF (2001) The closest living relatives of land plants. Science 294: 2351–2353

Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30: 3059–3066

Kellogg EA (2004) Evolution of developmental traits. Curr Opin Plant Biol 7: 92–98

Kenrick P, Crane PR (1997) The origin and early evolution of plants on land. Nature 389: 33–39

Khanna R, Huq E, Kikis EA, Al-Sady B, Lanzatella C, Quail PH (2004) A novel molecular recognition motif necessary for targeting photoactivated phytochrome signaling to specific basic helix-loop-helix transcription factors. Plant Cell 16: 3033–3044

Kiribuchi K, Sugimori M, Takeda M, Otani T, Okada K, Onodera H, Ugaki M, Tanaka Y, Tomiyama-Akimoto C, Yamaguchi T, et al (2004) RERJ1, a jasmonic acid-responsive gene from rice, encodes a basic helix-loop-helix protein. Biochem Biophys Res Commun 325: 857–863

Komatsu M, Maekawa M, Shimamoto K, Kyozuka J (2001) The LAX1 and FRIZZY PANICLE 2 genes determine the inflorescence architecture of rice by controlling rachis-branch and spikelet development. Dev Biol 231: 364–373

Ledent V, Vervoort M (2001) The basic helix-loop-helix protein family: comparative genomics and phylogenetic analysis. Genome Res 11: 754–770

Lee S, Lee S, Yang KY, Kim YM, Park SY, Kim SY, Soh MS (2006) Overexpression of PRE1 and its homologous genes activates gibberellin-dependent responses in Arabidopsis thaliana. Plant Cell Physiol 47: 591–600

Leivar P, Monte E, Al-Sady B, Carle C, Storer A, Alonso JM, Ecker JR, Quail PH (2008) The Arabidopsis phytochrome-interacting factor PIF7, together with PIF3 and PIF4, regulates responses to prolonged red light by modulating phyB levels. Plant Cell 20: 337–352

Leseberg CH, Li A, Kang H, Duvall M, Mao L (2006) Genome-wide analysis of the MADS-box gene family in Populus trichocarpa. Gene 378: 84–94

Li N, Zhang DS, Liu HS, Yin CS, Li XX, Liang WQ, Yuan Z, Xu B, Chu HW, Wang J, et al (2006a) The rice tapetum degeneration retardation gene is required for tapetum degradation and anther development. Plant Cell 18: 2999–3014

Li X, Duan X, Jiang H, Sun Y, Tang Y, Yuan Z, Guo J, Liang W, Chen L, Yin J, et al (2006b) Genome-wide analysis of basic/helix-loop-helix transcription factor family in rice and Arabidopsis. Plant Physiol 141: 1167–1184

Liljegren SJ, Roeder AH, Kempin SA, Gremski K, Ostergaard L, Guimil S, Reyes DK, Yanofsky MF (2004) Control of fruit patterning in Arabidopsis by INDEHISCENT. Cell 116: 843–853

Liu T, Ohashi-Ito K, Bergmann DC (2009) Orthologs of Arabidopsis thaliana stomatal bHLH genes and regulation of stomatal development in grasses. Development 136: 2265–2276

Ludwig SR, Habera LF, Dellaporta SL, Wessler SR (1989) Lc, a member of the maize R gene family responsible for tissue-specific anthocyanin production, encodes a protein similar to transcriptional activators and contains the myc-homology region. Proc Natl Acad Sci USA 86: 7092–7096

Lupas A (1996) Coiled coils: new structures and new functions. Trends Biochem Sci 21: 375–382

Ma PC, Rould MA, Weintraub H, Pabo CO (1994) Crystal structure of MyoD bHLH domain-DNA complex: perspectives on DNA recognition and implications for transcriptional activation. Cell 77: 451–459

Martinez-Garcia JF, Huq E, Quail PH (2000) Direct targeting of light signals to a promoter element-bound transcription factor. Science 288: 859–863

Massari ME, Murre C (2000) Helix-loop-helix proteins: regulators of transcription in eucaryotic organisms. Mol Cell Biol 20: 429–440

Matsuzaki M, Misumi O, Shin IT, Maruyama S, Takahara M, Miyagishima SY, Mori T, Nishida K, Yagisawa F, Yoshida Y, et al (2004) Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428: 653–657

Menand B, Yi K, Jouannic S, Hoffmann L, Ryan E, Linstead P, Schaefer DG, Dolan L (2007) An ancient mechanism controls the development of cells with a rooting function in land plants. Science 316: 1477–1480

Merchant SS, Prochnik SE, Vallon O, Harris EH, Karpowicz SJ, Witman GB, Terry A, Salamov A, Fritz-Laylin LK, Marechal-Drouard L, et al (2007) The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318: 245–250

Morgenstern B, Atchley WR (1999) Evolution of bHLH transcription factors: modular evolution by domain shuffling? Mol Biol Evol 16: 1654–1663

Morohashi K, Zhao M, Yang M, Read B, Lloyd A, Lamb R, Grotewold E (2007) Participation of the Arabidopsis bHLH factor GL3 in trichome initiation regulatory events. Plant Physiol 145: 736–746

Nair SK, Burley SK (2000) Recognizing DNA in the library. Nature 404: 715, 717–718

Nam J, Kim J, Lee S, An G, Ma H, Nei M (2004) Type I MADS-box genes have experienced faster birth-and-death evolution than type II MADS-box genes in angiosperms. Proc Natl Acad Sci USA 101: 1910–1915

Nei M, Rooney AP (2005) Concerted and birth-and-death evolution of multigene families. Annu Rev Genet 39: 121–152

Nesi N, Debeaujon I, Jond C, Pelletier G, Caboche M, Lepiniec L (2000) The TT8 gene encodes a basic helix-loop-helix domain protein required for expression of DFR and BAN genes in Arabidopsis siliques. Plant Cell 12: 1863–1878

Ni M, Tepperman JM, Quail PH (1998) PIF3, a phytochrome-interacting factor necessary for normal photoinduced signal transduction, is a novel basic helix-loop-helix protein. Cell 95: 657–667

Oh E, Kim J, Park E, Kim JI, Kang C, Choi G (2004) PIL5, a phytochrome-interacting basic helix-loop-helix protein, is a key negative regulator of seed germination in Arabidopsis thaliana. Plant Cell 16: 3045–3058

Ohashi-Ito K, Bergmann DC (2007) Regulation of the Arabidopsis root vascular initial population by LONESOME HIGHWAY. Development 134: 2959–2968

Palenik B, Grimwood J, Aerts A, Rouze P, Salamov A, Putnam N, Dupont C, Jorgensen R, Derelle E, Rombauts S, et al (2007) The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci USA 104: 7705–7710

Payne CT, Zhang F, Lloyd AM (2000) GL3 encodes a bHLH protein that regulates trichome development in Arabidopsis through interaction with GL1 and TTG1. Genetics 156: 1349–1362

Pillitteri LJ, Sloan DB, Bogenschutz NL, Torii KU (2007) Termination of asymmetric cell division and differentiation of stomata. Nature 445: 501–505

Pires N, Dolan L (2010) Origin and diversification of basic-helix-loop-helix proteins in plants. Mol Biol Evol 27: 862–874

Rajani S, Sundaresan V (2001) The Arabidopsis myc/bHLH gene ALCATRAZ enables cell separation in fruit dehiscence. Curr Biol 11: 1914–1922

Ramsay NA, Glover BJ (2005) MYB-bHLH-WD40 protein complex and the evolution of cellular diversity. Trends Plant Sci 10: 63–70

Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y, et al (2008) The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319: 64–69

Richardt S, Lang D, Reski R, Frank W, Rensing SA (2007) PlanTAPDB, a phylogeny-based resource of plant transcription-associated proteins. Plant Physiol 143: 1452–1466

Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR, et al (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290: 2105–2110

Robinson KA, Lopes JM (2000) Survey and summary: Saccharomyces cerevisiae basic helix-loop-helix proteins regulate diverse biological processes. Nucleic Acids Res 28: 1499–1505

Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV (2003) Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol 13: 1512–1517

Roig-Villanova I, Bou-Torrent J, Galstyan A, Carretero-Paulet L, Portoles S, Rodriguez-Concepcion M, Martinez-Garcia JF (2007) Interaction of shade avoidance and auxin responses: a role for two novel atypical bHLH proteins. EMBO J 26: 4756–4767

Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574

Shimizu T, Toumoto A, Ihara K, Shimizu M, Kyogoku Y, Ogawa N, Oshima Y, Hakoshima T (1997) Crystal structure of PHO4 bHLH domain-DNA complex: flanking base recognition. EMBO J 16: 4689–4697

Simionato E, Ledent V, Richards G, Thomas-Chollier M, Kerner P, Coornaert D, Degnan BM, Vervoort M (2007) Origin and diversification of the basic helix-loop-helix gene family in metazoans: insights from comparative genomics. BMC Evol Biol 7: 33

Smolen GA, Pawlowski L, Wilensky SE, Bender J (2002) Dominant alleles of the basic helix-loop-helix transcription factor ATR2 activate stress-responsive genes in Arabidopsis. Genetics 161: 1235–1246

Sorensen AM, Krober S, Unte US, Huijser P, Dekker K, Saedler H (2003) The Arabidopsis ABORTED MICROSPORES (AMS) gene encodes a MYC class transcription factor. Plant J 33: 413–423

Stevens JD, Roalson EH, Skinner MK (2008) Phylogenetic and expression analysis of the basic helix-loop-helix transcription factor gene family: genomic approach to cellular differentiation. Differentiation 76: 1006–1022

Szecsi J, Joly C, Bordji K, Varaud E, Cock JM, Dumas C, Bendahmane M (2006) BIGPETALp, a bHLH transcription factor is involved in the control of Arabidopsis petal size. EMBO J 25: 3912–3920

Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24: 1596–1599

Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876–4882

Thorstensen T, Grini PE, Mercy IS, Alm V, Erdal S, Aasland R, Aalen RB (2008) The Arabidopsis SET-domain protein ASHR3 is involved in stamen development and interacts with the bHLH transcription factor ABORTED MICROSPORES (AMS). Plant Mol Biol 66: 47–59

Toledo-Ortiz G, Huq E, Quail PH (2003) The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell 15: 1749–1770

Tsiantis M, Hay A (2003) Comparative plant development: the time of the leaf? Nat Rev Genet 4: 169–180

Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596–1604

Voronova A, Baltimore D (1990) Mutations that disrupt DNA binding and dimer formation in the E47 helix-loop-helix protein map to distinct domains. Proc Natl Acad Sci USA 87: 4722–4726

Wilkins O, Nahal H, Foong J, Provart NJ, Campbell MM (2009) Expansion and diversification of the Populus R2R3-MYB family of transcription factors. Plant Physiol 149: 981–993

Winston RL, Ehley JA, Baird EE, Dervan PB, Gottesfeld JM (2000) Asymmetric DNA binding by a homodimeric bHLH protein. Biochemistry 39: 9092–9098

Yamashino T, Matsushika A, Fujimori T, Sato S, Kato T, Tabata S, Mizuno T (2003) A link between circadian-controlled bHLH factors and the APRR1/TOC1 quintet in Arabidopsis thaliana. Plant Cell Physiol 44: 619–629

Yin Y, Vafeados D, Tao Y, Yoshida S, Asami T, Chory J (2005) A new class of transcription factors mediates brassinosteroid-regulated gene expression in Arabidopsis. Cell 120: 249–259

Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D (2004) A molecular timeline for the origin of photosynthetic eukaryotes. Mol Biol Evol 21: 809–818

Zentella R, Zhang ZL, Park M, Thomas SG, Endo A, Murase K, Fleet CM, Jikumaru Y, Nambara E, Kamiya Y, et al (2007) Global analysis of della direct targets in early gibberellin signaling in Arabidopsis. Plant Cell 19: 3037–3057

Zhang H, Forde BG (1998) An Arabidopsis MADS box gene that controls nutrient-induced changes in root architecture. Science 279: 407–409

Zhang R, Wang YQ, Su B (2008) Molecular evolution of a primate-specific microRNA family. Mol Biol Evol 25: 1493–1502

Zhang W, Sun Y, Timofejeva L, Chen C, Grossniklaus U, Ma H (2006) Regulation of Arabidopsis tapetum development and function by DYSFUNCTIONAL TAPETUM1 (DYT1) encoding a putative bHLH transcription factor. Development 133: 3085–3095
