Polyploidization events shaped the transcription factor repertoires in legumes (Fabaceae)

Kanhu C. Moharana and Thiago M. Venancio*

Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro; Campos dos Goytacazes, Brazil.

* Corresponding author
Av. Alberto Lamego 2000 / P5 / 217; Parque Califórnia
Campos dos Goytacazes, RJ
Brazil
CEP: 28013-602
[email protected]

bioRxiv preprint, doi: https://doi.org/10.1101/849778; this version posted November 21, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Abstract

Transcription factors (TFs) are essential for proper plant growth and development. Several legumes, particularly soybean, are rich sources of protein and oil, with great impact on the economy of several countries. Here we report a phylogenomic analysis of major TF families in legumes and their potential association with important traits such as nitrogen fixation and seed development. We used TF DNA-binding domains to systematically screen the genomes of 15 legume and 5 non-legume species. The percentage of TFs ranged from 3 to 8% of the gene complements. TF orthologous groups (OGs) in extant species were used to estimate OG sizes in ancestor nodes using a gene birth-death model, which allowed us to identify lineage-specific expansions. Together, OG analysis and the rate of synonymous substitutions (Ks) between gene pairs show that major TF expansions are strongly associated with known whole-genome duplication (WGD) events in the legume (~58 mya) and Glycine (~13 mya) lineages, which account for a large fraction of the Ph. vulgaris and Gl. max TF repertoires. Out of the 3407 Gl. max TFs, 1808 and 676 can be traced back to a single homeolog in Ph. vulgaris and Vi. vinifera, respectively. We found a trend for TFs expanded in legumes to be preferentially transcribed in roots and nodules, suggesting their recruitment early in the evolution of nodulation in the legume clade. We also found TF expansions in the Glycine WGD that were followed by gene loss in the wild soybean Gl. soja, including genes located within important quantitative trait loci. Together, our findings strongly support the roles of two WGDs in shaping the TF repertoires in the legume and Glycine lineages, which are likely related to important aspects of legume and soybean biology.

Keywords: whole genome duplication, phylogenomics, segmental duplication, nodulation, soybean.

Abbreviations:

TF: transcription factors; DBD: DNA binding domains; AP2: APETALA2; ERF: Ethylene Responsive Factor; RAV: Related to ABI3/VP1; ARF: Auxin response factor; BBR-BPC: Barley B Recombinant (BBR)-BASIC PENTACYSTEINE1 (BPC1); BES1: BRI1-EMS-SUPPRESSOR; bHLH: Basic helix loop helix; bZIP: Basic leucine zipper; Dof: DNA binding with one finger; CO-like: CONSTANS-like; LSD: LESION SIMULATING DISEASE1 (LSD1); C2H2: CCHH (Zn); C3H: CCCH (Zn); CAMTA: Calmodulin binding transcription factors; CPP: Cysteine-rich polycomb-like protein; DBB: Double B-box zinc finger; E2F/DP: E2 factor protein and DP protein; EIL: Ethylene-Insensitive3 (EIN3)-like protein 3 (EIL3); FAR1: FAR-RED IMPAIRED RESPONSE1; LFY: LEAFY; G2-like: Golden2 (G2)-like; ARR-B: Type-B phospho-accepting response regulator; GeBP: GLABROUS1 enhancer-binding protein; GRAS: GAI, RGA, and SCR; GRF: GROWTH-REGULATING FACTOR; TALE: Three Amino acid Loop Extension; WOX: WUS homeobox-containing protein family; HB: Homeobox; HB-PHD: HB-PHD finger; HB-other: HB-other; HRT-like: Hairy-Related transcription-factor-like; HSF: Heat shock factor; LBD: ASYMMETRIC LEAVES2/LATERAL ORGAN BOUNDARIES; M-type: MADS-type I; MIKC: MADS-type II; MYB: Myb proto-oncogene protein; NAC: NAM, ATAF1,2 and CUC2; NF-X1: Nuclear factor, X-box binding 1; NF-YA: Nuclear factor Y subunit A; NF-YB: Nuclear factor Y subunit B; NF-YC: Nuclear factor Y subunit C; Nin-like: NODULE INCEPTION; NZZ/SPL: SPOROCYTELESS/NOZZLE; S1Fa-like: S1Fa-like; TCP: TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1; ZF-HD: Zinc finger homeodomain protein; SBP: SQUAMOSA promoter binding protein; SRS: SHI RELATED SEQUENCE; SAP: STERILE APETALA; STAT: Signal Transducers and Activators of Transcription; VOZ: Vascular plant One-Zinc finger.

Introduction

Legumes (Fabaceae) are the third largest Angiosperm family, comprising nearly 20,000 species with tremendous morphological and ecological variation (Lewis, 2005). Legumes are well known for their symbiotic interactions with specific diazotrophic bacteria, which is a feature of major ecological and agronomic relevance. Out of the six Fabaceae subfamilies, Papilionoideae alone accounts for two-thirds of all legume species, including economically important crops, such as Glycine max (soybean), Phaseolus vulgaris (common bean), Arachis hypogaea (peanut), and Cicer arietinum (chickpea) (Graham and Vance, 2003; Cardoso et al., 2012; Azani et al., 2017). Several legume grains and pulses are rich sources of dietary proteins, cooking oils, and biofuels. Currently, at least 15 legume genomes are publicly available (Table 1), including those from wild and cultivated soybean (Glycine soja and Gl. max). Nevertheless, Fabaceae subfamilies other than Papilionoideae are largely underrepresented among sequenced genomes, despite the recently published Chamaecrista fasciculata and Mimosa pudica genomes (Griesmann et al., 2018).

Virtually all major biological processes are at least partially regulated at the transcriptional level by specific DNA-binding transcription factors (TFs), which bind to cis-regulatory elements of target genes by means of DNA binding domains (DBDs). Because of their key regulatory roles, TFs have been extensively demonstrated to be critical for plant evolution and adaptation to multiple environments (Doebley and Lukens, 1998; Lehti-Shiu et al., 2017). A typical TF family comprises proteins sharing a common DBD. Over 50 TF families have been identified in plants (Jin et al., 2017), many of which regulate biological processes such as growth, development, stress signaling, and defense against pathogens (Lata et al., 2011; Tripathi et al., 2016; He et al., 2018). Although many TF families are present throughout eukaryotes, their sizes can vary considerably (Lespinet et al., 2002; Nagata et al., 2016; Lehti-Shiu et al., 2017). Plant TF families are usually larger than their animal counterparts (Shiu, 2005) and the exceptional increase in TF family sizes in higher plants is often related to whole genome duplication (WGD) and triplication (WGT) events, also known as polyploidization (Lehti-Shiu et al., 2017).

Polyploidization is widely accepted as an important factor in angiosperm evolution (Pickett and Meeks-Wagner, 1995; Soltis et al., 2015). All core eudicots share at least one polyploidization event (a hexaploidy event, i.e. the γ polyploidy) (Jaillon et al., 2007; Aköz and Nordborg, 2019). With the increasing availability of plant genomes, it became clear that several other WGD events occurred after the γ polyploidization event. For example, Arabidopsis has two additional lineage-specific WGDs (α and β polyploidy) (Blanc et al., 2000; Vision et al., 2000), whereas legumes share a tetraploid ancestor that originated ~58-60 million years ago (mya) (Cannon et al., 2006). Further analysis of the soybean and narrowleaf lupin (Lupinus angustifolius) genomes uncovered additional lineage-specific WGDs in these lineages (Schmutz et al., 2010; Kroc et al., 2014; Hane et al., 2017). Importantly, domestication of polyploid species is more likely than that of their wild relatives (Salman-Minkov et al., 2016), supporting the importance of such events in agriculture. In addition to large scale duplication events, small scale duplications (SSDs) or local (e.g. tandem) duplications also contributed to the expansion of multiple gene families (Cannon et al., 2004).

The relative contribution of duplication modes in plant genomes is a subject of intense research (Panchy et al., 2016). Upon WGD, gene loss is the most common fate of one of the duplicates, in a process called fractionation (Freeling et al., 2015; Panchy et al., 2016; Cheng et al., 2018). Nevertheless, some gene families (e.g. TFs and signal transduction genes) are apparently more prone to retain duplicated copies than others (Blanc and Wolfe, 2004; Lehti-Shiu et al., 2017). Different mechanistic explanations have been proposed for this phenomenon, out of which the gene balance hypothesis is the most accepted one. According to this hypothesis, upon a WGD, genes with many interaction partners have a higher probability of being retained in duplicate, since alterations in the stoichiometry of their protein products tend to be deleterious (Birchler and Veitia, 2007; Freeling, 2009; Birchler and Veitia, 2011). Retained copies then typically evolve via subfunctionalization (i.e. duplicates acquire complementary functions) or neofunctionalization (i.e. one of the copies evolves a new function) (Freeling et al., 2015).

It is currently accepted that the transition from wild to domesticated soybean took place in Central China between 5,000 and 9,000 years ago through a gradual process that involved an intermediary species, Gl. gracilis (Han et al., 2016; Sedivy et al., 2017). Other lines of evidence support independent domestication events in East Asia (Korea and Japan) (Zhou et al., 2015; Sedivy et al., 2017). Artificial selection during domestication involved several distinct traits, such as pod shattering, seed hardness, adaptation to different photoperiods, flowering time, and stress resistance (Zhou et al., 2015; Sedivy et al., 2017). Several important TFs are involved in these traits. For example, SHAT1-5 (a NAC family TF) promotes shattering resistance by increasing lignification of fiber cap cells (Dong et al., 2014).

The progress in sequencing technologies and automation over the past 12 years unleashed the power of comparative and population genomics in pinpointing key genes involved in domestication and improvement of soybean and other crops, as illustrated by the discovery of many Quantitative Trait Loci (QTL) involved in commercially important traits (e.g. seed weight and oil content) by the resequencing of 302 soybean accessions (Zhou et al., 2015). In the present work we systematically investigate the evolution of TF repertoires in legumes (Fabaceae). Briefly, we performed large-scale comparative analysis of TFs from 15 legume and five non-legume species. Our results unveil a profound impact of polyploidization events on the expansion of TF families throughout legumes. In particular, some TF families that expanded in the legume WGD event (~58 mya) are preferentially expressed in roots and nodules, supporting their importance in the evolution of nodulation. Further, TF expansions that happened at the Glycine WGD (~13 mya) include genes that were subsequently lost in Gl. soja, including TF genes that are within Gl. max QTLs associated with leaf shape, area and width, proportion of FA18 in seeds, and branch density. Together, our results strongly support that WGD events deeply shaped the evolution of TF repertoires in legumes and likely generated TFs that regulate nodulation and other traits of key agronomic importance.

Results and Discussion

Systematic identification and characterization of transcription factors

We used a set of diagnostic DNA binding domains and forbidden domains (Supplementary Table S1) to identify TFs in the genomes of 20 plant species (Table 1). We predicted a total of 37,008 TFs (Supplementary Table S2), which were classified in 58 broad families (Table 2). A total of 31,111 TFs were predicted in the 15 legume genomes (Supplementary Table S2). We benchmarked our pipeline by comparing the detected TFs with those previously predicted in Ar. thaliana. Out of 1713 Ar. thaliana TFs available in PlantTFDB, 1673 (98%) were correctly predicted (Supplementary Figure S1). Further, 59 TFs were exclusively predicted by our pipeline, out of which 40 were annotated as TFs in the TAIR database (https://www.arabidopsis.org) (Supplementary Table S3). The percentage of TFs across genomes ranged from 5 to 8%, which is in line with a previous estimation from 95 eudicot species (Jin et al., 2017). Gl. soja and Se. moellendorffii showed the highest and lowest number of TFs, respectively (Figure 1). Legumes typically showed a greater number of TFs than non-legumes (Figure 1), although the variation in these fractions indicates that some TF expansions play specific roles in particular lineages.
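The domain-based screen described above can be illustrated with a minimal Python sketch. It assumes that each family is defined by a set of required (diagnostic) domains and a set of forbidden domains. The family names, domain names, and the `RULES` table below are illustrative placeholders and do not reproduce Supplementary Table S1; only the benchmark counts (1673 of 1713) come from the text.

```python
# Illustrative rules only: placeholder families and domains,
# NOT the contents of Supplementary Table S1.
RULES = {
    "ARF": {"required": {"B3", "Auxin_resp"}, "forbidden": set()},
    "B3": {"required": {"B3"}, "forbidden": {"Auxin_resp", "AP2"}},
    "WRKY": {"required": {"WRKY"}, "forbidden": set()},
}


def classify_tf(domains, rules=RULES):
    """Return the families whose required domains are all present and
    whose forbidden domains are all absent in `domains`."""
    found = set(domains)
    return sorted(
        family
        for family, rule in rules.items()
        if rule["required"] <= found and not (rule["forbidden"] & found)
    )


def recall(n_recovered, n_reference):
    """Fraction of reference TFs that the screen recovered."""
    return n_recovered / n_reference


# A protein with B3 and Auxin_resp domains matches ARF but not B3,
# because Auxin_resp is forbidden for the (hypothetical) B3 rule.
print(classify_tf(["B3", "Auxin_resp"]))
# Benchmark counts quoted in the text for Ar. thaliana.
print(f"{recall(1673, 1713):.1%}")
```

Keeping the forbidden set explicit is what prevents a multi-domain protein from being counted in two families at once.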

To better understand the different proportions of TFs across genomes, we compared TF family sizes between pairs of species. All except six TF families (i.e. AP2, GRAS, B3, Nin-like, HRT-like, and Trihelix) expanded in the basal angiosperm Am. trichopoda in comparison to the lycophyte Se. moellendorffii, supporting the contribution of ancient WGD events to the TF repertoires of seed plants (Albert et al., 2013). Although MADS TFs tightly regulate flower development, their diversification has been proposed to predate the origin of angiosperms (Albert et al., 2013). We found about twice as many MADS genes in Am. trichopoda (n=34) as in Se. moellendorffii (n=15). In particular, the MIKC-type (type II) MADS subfamily alone has increased five-fold, in spite of the higher rate of gene birth/death of the M-type (type I) MADS subfamily (Nam et al., 2004; Kumpeangkeaw et al., 2019). By analyzing TF clusters (described later) we observed that genes from two M-type MADS clusters are exclusively present in Am. trichopoda, probably as a result of a lineage-specific expansion. We also analyzed the expansion of the GRAS family in Am. trichopoda (n=44) as compared to the basal dicot Aq. coerulea (n=36), which happened via lineage-specific tandem duplications (10 genes) in the former (Supplementary Figure S3). These 10 genes belong to a single orthologous group (OG, see below) that does not have orthologs from other dicots except one from the basal eudicot Aq. coerulea. In addition, we found one more Am. trichopoda specific OG consisting of two GRAS genes (scaffold00007.332 and scaffold00007.335). There are also a few remarkable expansions in Se. moellendorffii in comparison to Am. trichopoda (e.g. HD-Zip, NAC, TCP, GATA, expanded by more than twofold) (Table 2). Together, these results support a growth of the TF repertoire early in the diversification of angiosperms.

Aq. coerulea is an ancient tetraploid and this tetraploidy was likely an important first step towards the gamma hexaploidy (4n+2n) that is shared by all core eudicots (Aköz and Nordborg, 2019). Nevertheless, some TF families are remarkably larger in Aq. coerulea than in Vi. vinifera, such as FAR1 (Aco: 92, Vvi: 19), B3 (Aco: 104, Vvi: 29), GeBP (Aco: 13, Vvi: 1), and M-type MADS (Aco: 50, Vvi: 17) (Figure 2; Table 2). After the gamma hexaploidization event, Vi. vinifera has not undergone any large scale duplication event, making it a suitable reference for comparative analysis with other core eudicots (Jaillon et al., 2007; Severin et al., 2011; Wang et al., 2017). Most large families are expanded in Ar. thaliana and legumes in comparison to Vi. vinifera (Figure 2). There are also some notable species-specific expansions in legumes, such as that of FAR1, B3, and M-type MADS in Me. truncatula (Figure 2). FAR1 has also expanded 10-fold in Ar. ipaensis and Ar. duranensis. FAR1 has been linked with skotomorphogenesis and photomorphogenesis in higher plants and its expansion might be related to the fructification process in peanuts (Chen et al., 2016; Lu et al., 2018). Unlike the above-mentioned expansions of MADS TFs in Am. trichopoda, only M-type (type I) MADS had large expansions in all legumes in comparison to Vi. vinifera, as previously discussed (Nam et al., 2004; Feil et al., 2013; Kumpeangkeaw et al., 2019).

The Glycine genus has a more recent WGD that is not shared with Phaseolus. Accordingly, we found an approximate ratio of 1:2 between Ph. vulgaris and Gl. max in 90% (52/58) of the TF families, implying that the Glycine WGD has strongly contributed to the soybean TF repertoire. There are also a few deviations from this trend, such as the NAC (Gl. max: 142 and Ph. vulgaris: 90) and NOZZLE/SPL (Gl. max: 2 and Ph. vulgaris: 3) families. Of the 42 NAC OGs with genes from Ph. vulgaris and Gl. max, 9 have an identical number of genes, indicating that there are subfamilies that rapidly reverted to their configuration before the Glycine WGD, probably due to gene dosage sensitivity. We also noticed that Gl. soja has 38 more TFs than Gl. max. While sixteen TF families had an identical number of genes in both species, others such as ERF (Gl. max: 290, Gl. soja: 279) and FAR1 (Gl. max: 68, Gl. soja: 79) showed significant variation between cultivated and wild soybeans.

Origin of TF paralogs

We also investigated the modes of duplication shaping TF family sizes. Paralogous pairs had their modes of duplication predicted using a previously described scheme (Proulx et al., 2011; Qiao et al., 2018) that assigns duplicate pairs to the following categories: segmental duplicates (SD); tandem duplicates (TD); proximal duplicates (PD), i.e. pairs separated by one to five intervening genes; and pairs originated via retrotransposon (rTE) or DNA transposon (dTE) activity. The remaining pairs were classified as dispersed duplicates (DD). We used the priority order SD>TD>PD>rTE>dTE>DD to assign a single duplication mode to each pair. In legumes, more than 70% of the TFs have at least one paralog (Table 3). Further, it is clear that SD is the main duplication mode, supporting their origin through large scale duplication, as previously reported (Lehti-Shiu et al., 2017). Gl. max and Gl. soja are the species with the greatest number of SD TFs, which comprise 77.7% of the TF repertoire in the former. In addition, local duplications (i.e. TD and PD) also contributed to TF repertoires, particularly in Me. truncatula, which has 11% (241/1752) of the duplicate TFs classified as TD and PD, especially in the ERF, WRKY, and B3 families (Supplementary Table S4). The prevalence of TD pairs in Me. truncatula has also been reported in genes related to other regulatory roles, such as in the F-box family (Bellieny-Rabelo et al., 2013). Further, in Ar. duranensis, Ar. ipaensis, and Vi. radiata, nearly 7% of the duplicated TFs are derived from TD, whereas dTE duplications account for 29.8% of the duplicated TFs in Ca. fasciculata, especially in the MYB, NAC, and bHLH families (Table 3; Supplementary Table S4). There are also some notable differences in the prevalence of modes of duplication between closely related species. For example, local TF duplications are more frequent in Ar. ipaensis than in Ar. duranensis (Table 3).

Between 5.5% (188/3407, in Gl. max) and 60.7% (560/923, in Am. trichopoda) of the TFs were classified as singletons (Table 3). While in Ar. thaliana 22.58% (392/1736) of the TFs were singletons, in legumes this number ranges between 5.5 and 29% (Table 3). Importantly, a large fraction of these singletons remain syntenic to a reference outgroup species (Table 4). In Ph. vulgaris and Me. truncatula, syntenic singleton TFs were significantly more expressed than their non-syntenic counterparts (Figure 3A), suggesting that their greater functional conservation is associated with their genomic context. Most SD TFs were also found to be syntenic in their closest outgroup species (Table 4). We also estimated non-synonymous/synonymous mutation ratios (Ka/Ks) between singleton and SD TFs with preserved synteny in an outgroup species. Orthologs from SD pairs had significantly lower Ka/Ks than the singleton orthologs (Figure 3B), leading us to hypothesize that these genes are under strong purifying selection due to their involvement in intricate regulatory systems emerging from the WGD events. Similar observations on the strong negative selection of duplicated genes have also been reported in other species (Davis and Petrov, 2004; Jordan et al., 2004).
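The priority rule used above to assign a single duplication mode to each paralogous pair (SD>TD>PD>rTE>dTE>DD) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the classification software itself. The function names are hypothetical, and treating a pair with zero intervening genes as tandem is our assumption; the text only defines proximal duplicates as pairs separated by one to five genes.

```python
# Priority order stated in the text: SD > TD > PD > rTE > dTE > DD.
PRIORITY = ["SD", "TD", "PD", "rTE", "dTE", "DD"]
RANK = {mode: position for position, mode in enumerate(PRIORITY)}


def proximity_mode(intervening_genes):
    """Label a same-chromosome pair by the number of genes between its members.

    0 genes      -> "TD" (assumption: tandem means adjacent genes)
    1 to 5 genes -> "PD" (as defined in the text)
    otherwise    -> None (not a local duplication)
    """
    if intervening_genes == 0:
        return "TD"
    if 1 <= intervening_genes <= 5:
        return "PD"
    return None


def assign_mode(candidate_modes):
    """Return the highest-priority mode among the candidates.

    Pairs without any supporting evidence fall back to "DD",
    mirroring the 'remaining pairs' rule in the text.
    """
    known = [mode for mode in candidate_modes if mode in RANK]
    if not known:
        return "DD"
    return min(known, key=RANK.__getitem__)


# A pair supported by both collinearity (SD) and adjacency (TD) is called SD.
print(assign_mode({"TD", "SD"}))
```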

The large number of syntenic SD TFs derives mostly from gene retention after successive WGD events (Table 3). Gene duplicability, the ability of a duplicate pair to remain duplicated, is non-random and biased towards specific gene families, including TFs (Lynch and Conery, 2000; Davis and Petrov, 2004; Li et al., 2016). We analyzed TF duplicability using Gl. max TFs from syntenic blocks that survived the 58 mya and the 13 mya WGD events. We used intra-species collinear blocks to identify Gl. max SD TFs that correspond to single syntenic regions in Ph. vulgaris. We used a maximum Ks threshold of 0.4 to filter the Gl. max SD pairs that likely emerged in the 13 mya WGD (Schmutz et al., 2010). Nearly 81% (1808/2230) of the Gl. max SD TFs within that Ks range had a syntenic gene in Ph. vulgaris. Further, 75% (676/904) of such Ph. vulgaris orthologs had a single Vi. vinifera syntenic ortholog. In both cases we found that the bHLH family had the highest number of syntenic gene pairs (Gl. max-Ph. vulgaris: 99 pairs and Ph. vulgaris-Vi. vinifera: 43 pairs). Conversely, only 16% (15/93) of the Gl. max syntenic singleton TFs correspond to single genomic regions in Ph. vulgaris and Vi. vinifera. We hypothesize that these genes not only depend on the conservation of a local genomic context, but are also sensitive to gene dosage. Our results clearly illustrate the high duplicability of most TF families in soybean and further support the impact of two WGD events that account for a prominent fraction of the TF repertoire of this species.
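The Ks-based filtering step described above can be sketched as follows. Only the 0.4 threshold comes from the text; the record layout, the gene names, the Ks values of the toy pairs, and the `syntenic_in_outgroup` flag are hypothetical, and treating the threshold as inclusive is our assumption.

```python
from typing import NamedTuple

KS_MAX = 0.4  # maximum Ks quoted in the text for the ~13 mya Glycine WGD


class SDPair(NamedTuple):
    gene_a: str
    gene_b: str
    ks: float
    # Hypothetical flag: True if the pair maps to a single syntenic
    # region in the outgroup (e.g. Ph. vulgaris).
    syntenic_in_outgroup: bool


def recent_wgd_pairs(pairs, ks_max=KS_MAX):
    """Keep segmental pairs whose Ks does not exceed the threshold."""
    return [pair for pair in pairs if pair.ks <= ks_max]


def syntenic_fraction(pairs):
    """Fraction of pairs with a syntenic counterpart in the outgroup."""
    if not pairs:
        return 0.0
    return sum(pair.syntenic_in_outgroup for pair in pairs) / len(pairs)


# Toy records with made-up identifiers and Ks values.
pairs = [
    SDPair("geneA1", "geneA2", 0.12, True),
    SDPair("geneB1", "geneB2", 0.35, False),
    SDPair("geneC1", "geneC2", 0.62, True),  # older pair, filtered out
]
recent = recent_wgd_pairs(pairs)
print(len(recent), syntenic_fraction(recent))
```

Applied to the counts reported in the text, the same ratio is 1808/2230, i.e. roughly 81%.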

Large scale duplication events correlate with increase in TF copy number

Many TF families explored here are broad and diversified, often comprising multiple sub-groups, such as bHLH (Pires and Dolan, 2010), MYB (Du et al., 2012), and ERF (Nakano, 2006). To obtain an overview of the diversification of plant TF families, we assigned them to OGs by using all-vs-all reciprocal BLASTP search, followed by Markov clustering (see methods for details). For example, AP2 had 28 clusters (Supplementary Table S5), labeled as AP:1 to AP:28. We found 1557 TF OGs from the 58 TF families reported above. Nearly 9% (144/1557) of these OGs had no members from legume species, whereas 29% (452/1557) were legume-specific, and 43% (672/1557) had genes from at least 10 species. As expected, larger families had more OGs, such as bHLH, C2H2, and MYB, with more than 100 OGs each. Conversely, a few families deviated from this trend, such as SAP (65 genes and 7 OGs) and EIL (143 members and 13 OGs) (Supplementary Figure S4).
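To make the grouping idea concrete, the sketch below keeps only reciprocal hits from an all-vs-all search and then groups genes into connected components. This is a deliberately simplified stand-in and not Markov clustering: MCL additionally uses edge weights to split weakly connected groups, which plain connected components cannot do. The input format (query, subject tuples) and all names are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_edges(hits):
    """Keep an undirected edge only when both directions were reported.

    `hits` is an iterable of (query, subject) tuples, e.g. parsed from
    a tabular all-vs-all similarity search (format assumed here).
    """
    seen = set(hits)
    return {
        frozenset((query, subject))
        for query, subject in seen
        if query != subject and (subject, query) in seen
    }


def connected_components(nodes, edges):
    """Group nodes linked by edges (single linkage, NOT Markov clustering)."""
    neighbours = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups, visited = [], set()
    for start in sorted(nodes):
        if start in visited:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            group.add(node)
            stack.extend(neighbours[node] - visited)
        groups.append(sorted(group))
    return groups


# Toy hits: d -> a is not reciprocated, so d stays alone.
hits = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("d", "a")]
groups = connected_components({"a", "b", "c", "d"}, reciprocal_edges(hits))
print(groups)
```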

To investigate the evolution of TF families in more detail, we analyzed the number of genes per OG in each species using CAFE (v. 4.2) (Bie et al., 2006; Han et al., 2013), an elegant method that uses gene birth (λ) and death (μ) rates to model gain and loss events in different lineages of a given ultrametric species tree (see methods for details). We searched for optimal λ and μ based on the maximum likelihood score, using the option to take potentially fragmented genomes into account (see methods for details). We used 672 TF OGs with sufficient variation in number of genes per species (statistical variance ≥ 0.5) and containing genes from at least 10 species. For example, 10 out of 31 AP2 clusters were used for rate estimation (Supplementary Table S5). We repeated the rate parameter search 50 times and the parameters resulting in the best maximum likelihood score were used for further analysis. The results obtained with CAFE largely confirm the general trend for TF gain upon WGD (Figure 4), which is in line with the above-mentioned correlation between SD, the retention of paralogous TF pairs, and intraspecies synteny. This trend can be exemplified by the nodes representing the legume and Glycine ancestors, which have a high number of expanded TF families (Figure 4).
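The two selection steps described above (filtering OGs before rate estimation and keeping the best of the repeated parameter searches) can be sketched as follows. Several details are assumptions, since the text does not specify them: the variance is taken as the sample variance over all species including zero counts, and the score is treated as a log-likelihood where higher is better. The function names and the tuple layout of `runs` are hypothetical and do not correspond to CAFE's own interface.

```python
from statistics import variance

MIN_SPECIES = 10    # OG must contain genes from at least 10 species
MIN_VARIANCE = 0.5  # variance threshold quoted in the text


def usable_for_rate_estimation(counts):
    """Decide whether one OG enters the birth-death rate estimation.

    `counts` holds the number of genes per species for a single OG,
    with 0 for species lacking the OG (assumed layout).
    """
    species_present = sum(1 for count in counts if count > 0)
    return species_present >= MIN_SPECIES and variance(counts) >= MIN_VARIANCE


def best_run(runs):
    """Pick the run with the highest log-likelihood.

    `runs` is a list of (log_likelihood, lambda_rate, mu_rate) tuples,
    one per repeated parameter search.
    """
    return max(runs, key=lambda run: run[0])


# An OG with one gene in every species carries no signal for rate estimation.
print(usable_for_rate_estimation([1] * 20))
# An OG that doubled in half of the species passes both filters.
print(usable_for_rate_estimation([1] * 10 + [3] * 10))
```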

We also investigated the impact of the legume and Glycine WGDs on the TF repertoires of Gl. max and Gl. soja. Firstly, we analyzed the 138 OGs (from 34 TF families) that expanded in legumes in comparison to non-legumes (Figure 4). If all 1557 OGs are considered, an average of 0.21 genes were gained per OG in legumes, in contrast to 1.09 genes in the 138 expanded OGs. Secondly, we identified rapidly evolving OGs (10 of 138; 7.25%) (Table 5), which are those with significant gene gain or loss rate (p-value < 0.05) (Table 6). In these 10 OGs, the average rate of gene gain was 2.0, nearly two and 10 times that observed in the 138 expanded OGs and in the complete set of 1557 OGs, respectively.

We analyzed the most prevalent modes of duplication in the 138 OGs that expanded in legumes and found that a major fraction of them emerged via SD. While SDs comprise more than 80% of the TFs in species with more recent WGDs (i.e. Gl. max, Gl. soja, and Lu. angustifolius) (Figure 5), several SD pairs might have lost collinearity after the legume WGD and were classified as DDs. When inspecting the Ks distributions of the paralogous pairs from the 138 OGs that expanded in legumes, we found ranges corresponding to both the legume and Glycine WGDs (Figure 5) (Schmutz et al., 2010; Cannon, 2013), suggesting that a fraction of the TFs that expanded in the legume WGD subsequently duplicated for a second time in the Glycine WGD. Deviations from this range were observed for Me. truncatula, Arachis spp., Cicer spp., and Ch. fasciculata, as previously reported (Cannon et al., 2010; Varshney et al., 2013; Tang et al., 2014; Chen et al., 2016). DD paralogs have a more dispersed Ks distribution than SD paralogs, although their Ks distributions also indicate that several DD pairs were likely generated by SD with subsequent loss of collinearity (Figure 5). Collectively, these results support the association between legume-specific TF expansions and the WGD event that took place 58 mya.

To further explore the functional relevance of legume TF expansions, we analyzed gene expression patterns (see methods for details) across multiple tissues from Me. truncatula, Ph. vulgaris, and Gl. max (Figure 6; Supplementary Figure S5). Strikingly, the 138 OGs that expanded in legumes are enriched in genes with preferential expression in nodules (Fisher's Exact Test, p-values = 1.7×10⁻⁴ and 1.4×10⁻⁵ for Me. truncatula and Gl. max, respectively) and roots (Fisher's Exact Test, p-values = 1.4×10⁻⁹ and 5.9×10⁻³ for Me. truncatula and Ph. vulgaris, respectively). These results indicate that the recruitment of these genes predates the emergence of nodulation in legumes and might have played roles in the root physiology associated with this process.
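For readers unfamiliar with the enrichment test used above, the following sketch computes a one-sided Fisher's exact test from a 2x2 table using only the Python standard library. The text does not state whether the reported tests were one- or two-sided, so the one-sided form is an assumption, as is the layout of the table; the counts in the example are arbitrary and are not the data behind the reported p-values.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment on the table [[a, b], [c, d]].

    Assumed layout:
      a = genes in expanded OGs with preferential expression in the tissue
      b = genes in expanded OGs without it
      c = remaining TFs with preferential expression in the tissue
      d = remaining TFs without it
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    denominator = comb(total, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


# Arbitrary toy table, used only to show the call.
print(fisher_exact_greater(3, 1, 1, 3))
```

The function requires Python 3.8 or later because of `math.comb`.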

Next, we integrated phylogenetic reconstructions of the 10 rapidly expanded OGs with gene expression data and found three interesting groups (i.e. bHLH:12, M-type:1, ERF:10) (Table 6). The bHLH:12 OG showed significantly higher expression during nodule development in Me. truncatula, Ph. vulgaris, and Gl. max (Figure 6B and Supplementary Figure S5). This OG includes two SD pairs of Me. truncatula bHLHs, Medtr4g087920-Medtr2g015890 and Medtr4g079760-Medtr2g091190, with Ks values of 1.0523 and 0.8292, respectively. Ph. vulgaris and Gl. max orthologs of these genes were also more expressed in roots and nodules than in other tissues (Figure 6).

Another interesting OG comprises TFs from the M-type MADS family (M-type:1). This OG has independently expanded in different species (Supplementary Figure S6), including a major expansion in Lu. angustifolius. Nine of 14 Gl. max genes in this cluster emerged within the Glycine genus, including one gene (Glyma.03G083700.1) with preferential expression in seeds and flowers (Supplementary Figure S7A). The two Ph. vulgaris orthologs (Phvul.006G077700 and Phvul.006G077800.1) showed seed-specific expression, suggesting their importance in seed development (Supplementary Figure S7B). Interestingly, these Ph. vulgaris genes originated from an ancestral tandem duplication, as this organization is also found in other legumes (Supplementary Figure S6). Although this OG lacked an Ar. thaliana member, the closest Ar. thaliana homologs include AT5G27810, AT1G22590 (AGAMOUS-LIKE 87, AGL87), and AT5G48670 (AGAMOUS-LIKE 80, AGL80). Importantly, AGL80 has been shown to be responsible for central cell and endosperm development in Arabidopsis (Portereiko et al., 2006).

We also analyzed two ERF (ethylene response factor) OGs (ERF:10 and ERF:18) containing genes playing critical roles in nodulation. Of these two, only ERF:10 was among the 10 rapidly expanded OGs. Manual curation revealed that ERF:10 and ERF:18 comprise ERF required for nodule differentiation (EFD) and ERF required for nodulation (ERN) genes, respectively. ERN and EFD genes regulate nodulation in Me. truncatula (Vernie et al., 2008; Young et al., 2011; Cerri et al., 2012). Three of the four MtERNs (i.e. Medtr7g085810.1, Medtr6g029180.1, and Medtr8g085960.1) had relatively higher expression after inoculation than in roots or nodules, supporting their critical role in nodule development (Supplementary Figure S9A). The biased expression towards nodules and root tissues is also observed in Ph. vulgaris and Gl. max orthologs (Supplementary Figure S8). Of the two MtEFDs, Medtr4g008860 and Medtr3g106290 were more expressed in nodules and inoculated root hairs, respectively. Similarly, Ph. vulgaris and Gl. max ERNs are also more expressed in roots and nodules than in aerial tissues (Supplementary Figure S8).


Association between quantitative traits and Glycine max specific TFs

The Glycine node had the largest number of expanded OGs, with 36% (563/1557) (Figure 4) of them expanding by an average rate of 1.67 genes per OG. Out of these, 57 had rapidly expanded (p-value < 0.05) with an average rate of 2 genes per OG. In comparison to the Glycine node, 76 and 82 OGs expanded in Gl. soja and Gl. max, respectively. Of the OGs that expanded in Gl. max, 79% (65/82) showed significant expansions (p-value < 0.05), with an average rate of 1.44. Among these families, ERF (6 OGs), MYB (5 OGs), MYB_related (8 OGs), bHLH (7 OGs), and C2H2 (7 OGs) TFs gained more than 5 genes per OG. Interestingly, 59% (202/341) of the genes from expanded OGs lie within SD regions, supporting the importance of the Glycine WGD in shaping these families in Gl. max.

We explored whether some of these OGs could be related to important soybean agronomic traits. We searched the Gl. soja syntenic regions corresponding to the 341 Gl. max TFs from the 65 rapidly expanded OGs. We identified 50 TFs without a homeolog in Gl. soja (Supplementary Table S6), out of which only five had a syntenic ortholog in Ph. vulgaris. Interestingly, two ERF (Glyma.20G115300, Glyma.14G161900) and one SBP (Glyma.06G205700) TFs are within previously reported chromosomal regions associated with important quantitative traits (Supplementary Figure S9) (Fang et al., 2017). The ERF Glyma.20G115300 was located within a region associated with overall leaf size and average number of seeds per pod. The second ERF, Glyma.14G161900, is within a region associated with FA18 content and ratio in mature seeds. Finally, the SBP TF Glyma.06G205700 is within a region regulating branch density (i.e. ratio of branch number and plant height) and beginning bloom date (Supplementary Table S6). We envisage that many more of these 50 TFs will be associated with important traits, which could be revealed by more comprehensive work integrating QTL information from other genotypes and studies with our phylogenomic results.


Methods

Genomic data

Genome sequences and annotations for 20 plant species were obtained from public repositories (Table 1). We used coding sequences and predicted protein sequences from the longest splicing isoform (when more than one was available). Predicted proteins with fewer than 50 amino acids, containing premature stop codons, or with more than 20% ambiguous amino acids were excluded.
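The three exclusion criteria above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the set of residue codes treated as ambiguous (X, B, Z, J), and the handling of a trailing "*" as a regular stop are assumptions made for illustration only.

```python
# Hypothetical helper illustrating the protein filters described in the text.
STOP = "*"
AMBIGUOUS = set("XBZJ")  # assumption: residue codes counted as ambiguous


def keep_protein(seq, min_len=50, max_ambiguous_frac=0.20):
    """Return True if a predicted protein passes all three filters."""
    seq = seq.strip().upper()
    # Assumption: a single trailing '*' is the normal stop, not a premature one.
    if seq.endswith(STOP):
        seq = seq[:-1]
    # Filter 1: fewer than 50 amino acids.
    if len(seq) < min_len:
        return False
    # Filter 2: premature (internal) stop codon.
    if STOP in seq:
        return False
    # Filter 3: more than 20% ambiguous amino acids.
    n_ambiguous = sum(1 for aa in seq if aa in AMBIGUOUS)
    return n_ambiguous / len(seq) <= max_ambiguous_frac
```

Whether the 20% threshold is inclusive is not stated in the text; the sketch keeps sequences at exactly 20%.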

Prediction and classification of transcription factors (TFs)

To remove redundancy due to splicing isoforms and incomplete gene predictions, we removed nearly identical sequences using BLASTCLUST (Altschul et al., 1997) as previously described (parameters: -S 1.89 -L 0.9 -b F) (Gossani et al., 2014; Vidal et al., 2016).

We adopted the TF family classification scheme of PlantTFDB (Zhang et al., 2011; Jin et al., 2017). We created a local database of protein domains by combining all HMM profiles from PFAM-A (Release 31.0) (Finn et al., 2016) and 13 plant-specific TF HMM profiles downloaded from PlantTFDB (Supplementary Table S1). Protein sequences were searched for conserved domains using HMMER 3.0 (http://hmmer.org) (domain e-value cutoff < 0.01). TFs were classified in 58 families according to their DBD.
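As an illustration of the domain e-value cutoff, the sketch below collects, for each protein, the domains reported in an HMMER3 per-domain table (the `--domtblout` format of `hmmscan`, where the first column is the profile name and the fourth is the query protein). It assumes that the "domain e-value" corresponds to the independent e-value column (i-Evalue); the text does not say which column was used, and the function name is hypothetical. The family assignment rules of PlantTFDB are not reproduced here.

```python
# Hypothetical parser for hmmscan --domtblout lines (whitespace-delimited).
def significant_domains(domtblout_lines, max_evalue=0.01):
    """Map each protein to the set of domains with i-Evalue below the cutoff."""
    hits = {}
    for line in domtblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        domain = fields[0]            # target name (HMM profile)
        protein = fields[3]           # query name (protein sequence)
        i_evalue = float(fields[12])  # independent domain e-value (assumption)
        if i_evalue < max_evalue:
            hits.setdefault(protein, set()).add(domain)
    return hits
```

The resulting protein-to-domain sets would then be the input to a rule-based family classifier.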

Species phylogeny

A species phylogeny was reconstructed using low copy orthologs present in all 20 species. We clustered the predicted proteins on the basis of the pairwise sequence similarity of their longest protein products, which was computed with BLAST (e-value ≤ 1e-5) (Altschul et al., 1997). Sequence pairs with percentage identity of at least 35% and query coverage of at least 50% were used for Markov clustering using mclblastline (v. 12-068; inflation parameter: 1.5) (Enright et al., 2002). Clusters containing up to 22 genes with at least one gene from each species were used. If a species had paralogous genes, the paralog with greater identity to orthologs from other species was used. Amino acid sequence alignment was performed using DECIPHER (Wright, 2015) and cDNA alignment was performed with PAL2NAL (Suyama et al., 2006). We concatenated the codon alignments of these genes to create a super-alignment. Next, phangorn (Schliep, 2011) was used to estimate the best substitution model for the phylogenetic reconstruction, which was performed using RAxML (v8.2.11; model: GTRGAMMAIG4, bootstrap: 1000) (Stamatakis, 2014). The phylogram and sequence alignment were used in RelTime-ML (implemented in MEGA-X, v. 10.0.1) (Tamura et al., 2018) to generate an ultrametric tree. We used the TimeTree database (Kumar et al., 2017) to retrieve the divergence times of Fabaceae and Vi. vinifera (110 mya) and of Ph. vulgaris and Gl. max (24 mya), which were used as references.
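The cluster-selection rule used for the species phylogeny (at most 22 genes, with every species represented) can be sketched as follows. This is only an illustration under assumed data structures: clusters as lists of gene identifiers and a gene-to-species dictionary; the function and argument names are hypothetical.

```python
# Hypothetical selection of low-copy ortholog clusters.
def select_low_copy_clusters(clusters, gene_to_species, all_species, max_genes=22):
    """Keep clusters with at most max_genes genes and one or more genes per species."""
    required = set(all_species)
    selected = []
    for genes in clusters:
        if len(genes) > max_genes:
            continue  # too many copies to be treated as low copy
        present = {gene_to_species[gene] for gene in genes}
        if required <= present:  # every species has at least one gene
            selected.append(genes)
    return selected
```

With 20 species, the 22-gene ceiling leaves room for at most two extra paralogs per cluster, which is then resolved by the identity-based paralog choice described above.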

Synteny and synonymous mutation rate (Ks)

We identified segmental duplications using DAGCHAINER (r02062008) (Haas et al., 2004) by using bidirectional best BLASTP hits (e-value ≤ 1e-10, 35% minimum identity, 50% minimum query coverage). A minimum of four collinear genes was required to identify a syntenic block (DAGCHAINER, parameter -A 4), as previously used in soybean (Severin et al., 2011). Tandem duplicates were also identified using DAGCHAINER (parameters -T -A 2). Tandem or segmental gene pairs had their non-synonymous (Ka) and synonymous (Ks) mutation rates estimated using the bp_pairwise_kaks script, distributed with BioPerl (v5.22.1) (Stajich et al., 2002).
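The notion of bidirectional best BLASTP hits under the stated thresholds can be sketched in Python as below. The dictionary keys (`query`, `subject`, `evalue`, `pident`, `qcov`, `bitscore`) are hypothetical field names chosen for this example, and ranking hits by bit score is an assumption; the text does not state how the best hit was chosen.

```python
# Hypothetical reconstruction of a bidirectional-best-hit filter.
def passes_filter(hit, max_evalue=1e-10, min_identity=35.0, min_qcov=50.0):
    """Apply the e-value, identity, and query-coverage thresholds."""
    return (hit["evalue"] <= max_evalue
            and hit["pident"] >= min_identity
            and hit["qcov"] >= min_qcov)


def bidirectional_best_hits(hits):
    """Return the set of gene pairs that are each other's best filtered hit."""
    best = {}
    for hit in hits:
        if not passes_filter(hit) or hit["query"] == hit["subject"]:
            continue  # drop weak hits and self-hits
        query = hit["query"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    pairs = set()
    for query, hit in best.items():
        subject = hit["subject"]
        if subject in best and best[subject]["subject"] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs
```

The resulting pairs would serve as the anchor list passed to a collinearity tool such as DAGCHAINER.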

Orthologous cluster and TF paralog identification

TF OGs were inspected for different modes of gene duplication. Multiple modes of duplication can also co-occur in a group of genes. In these cases, only one type of duplication is reported, following the order SD > TD > PD > rTE > dTE > DD (Proulx et al., 2011; Qiao et al., 2018). Although this strategy can slightly underestimate some duplication levels, it helped us to assess the main forces shaping the expansion of TF families.
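The priority rule for co-occurring duplication modes can be expressed as a short Python sketch. The function name and the input format (an iterable of mode labels observed for one gene or gene group) are assumptions for illustration; only the ordering SD > TD > PD > rTE > dTE > DD comes from the text.

```python
# Priority order taken from the text; everything else is illustrative.
PRIORITY = ["SD", "TD", "PD", "rTE", "dTE", "DD"]
RANK = {mode: rank for rank, mode in enumerate(PRIORITY)}


def reported_mode(observed_modes):
    """Return the single highest-priority duplication mode, or None if empty."""
    modes = set(observed_modes)
    unknown = modes - set(RANK)
    if unknown:
        raise ValueError(f"unknown duplication mode(s): {sorted(unknown)}")
    if not modes:
        return None
    return min(modes, key=RANK.__getitem__)
```

For example, a gene flagged as both TD and SD would be reported only as SD, which is why the lower-priority categories may be slightly underestimated.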

Estimation of expansions and contractions in TF families

We used CAFE (v4.2) (Han et al., 2013) to assess the evolution of TF family sizes using the time-calibrated species tree and TF OG compositions as inputs. We used the cafeerror.py script, available in the CAFE package, to model error rates that might have been introduced in gene family sizes, particularly by species with more fragmented genome assemblies (e.g. Lu. angustifolius) (Han et al., 2013). This error model was used to adjust family sizes. We estimated λ and μ by running CAFE 50 times and selected the parameters that gave the best maximum likelihood estimate. These parameters were used to estimate OG sizes at ancestor nodes and to predict rapidly evolving OGs (p-value < 0.05), which are those that significantly gained or lost genes. The average net change at each node on the species tree was expressed as:

$$A_m = \frac{\sum_{i=1}^{n} (M_i - X_i)}{n}$$

where n is the total number of OGs and (M_i − X_i) is the difference in OG size between node M and its parent node X for a given OG i. A negative or positive A_m value stands for contraction or expansion, respectively. Some remarkably expanded or contracted TF OGs had their phylogenies reconstructed with RAxML (v8.2.11; model: GTRGAMMAAUTO, bootstrap: 1000) and visualized using FigTree (v.1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/).
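The average net change defined above is simply the mean, over all n OGs, of the size difference between a node and its parent. A minimal Python sketch follows; the function name and the use of two parallel lists of OG sizes are assumptions made for illustration.

```python
# Hypothetical helper computing the average net change A_m for one node.
def average_net_change(node_sizes, parent_sizes):
    """Mean of (M_i - X_i) over all OGs, given sizes at node M and parent X."""
    if len(node_sizes) != len(parent_sizes):
        raise ValueError("both lists must cover the same OGs in the same order")
    n = len(node_sizes)
    if n == 0:
        raise ValueError("at least one OG is required")
    return sum(m - x for m, x in zip(node_sizes, parent_sizes)) / n
```

For three OGs with sizes 3, 5, and 2 at the node and 2, 3, and 3 at its parent, the differences are +1, +2, and −1, giving an average of 2/3 (a net expansion).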

Gene expression data and tissue specificity

Ar. thaliana and Ph. vulgaris normalized gene expression data were obtained from the public databases ArrayExpress (Liu et al., 2012) and PvGEA (O'Rourke et al., 2014), respectively. Four additional RNAseq datasets were downloaded from the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra/). The first two datasets comprise two soybean transcriptome studies (BioProject PRJNA208048, PRJNA79597) (Libault et al., 2010; Severin et al., 2010). The third dataset includes Me. truncatula transcriptomes (Boscari et al., 2013) in the following conditions and tissues: nitrogen-starving roots, roots inoculated with Sinorhizobium meliloti, and root nodules (BioProject PRJNA79233). We also downloaded an additional Me. truncatula RNAseq dataset covering 7 different tissues (BioProject PRJNA80163). RNAseq reads were mapped to each species' genome using STAR v2.5.3a (Dobin et al., 2013) and normalized gene expression values were estimated with StringTie v1.3.4d (Pertea et al., 2015), both with default parameters.

Expression values lower than 1 were converted to 0 and considered not expressed. We added 1 to all values, which were then log2 transformed. To determine tissue-preferential expression, we converted the gene expression values into a transformed z-score index (Kryuchkova-Mostacci and Robinson-Rechavi, 2016). Genes with a transformed z-score index > 0.9 were labeled as preferentially expressed in the tissue where they showed the highest expression.
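The expression preprocessing and tissue-preference call can be sketched as follows. The thresholding and log2(x + 1) steps follow the text. The exact form of the transformed z-score is an assumption: here each z-score (computed with the sample standard deviation) is divided by its theoretical maximum for n tissues, (n − 1)/√n, so that a gene expressed in a single tissue scores 1. Function names are hypothetical.

```python
import math
import statistics


def log2_expression(values, min_expr=1.0):
    """Set values below min_expr to 0, then apply log2(x + 1)."""
    return [math.log2((v if v >= min_expr else 0.0) + 1.0) for v in values]


def transformed_zscores(values):
    """Z-scores scaled by their maximum possible value, (n - 1) / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two tissues are required")
    if max(values) == min(values):
        return [0.0] * n  # no variation across tissues, so no preference
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    z_max = (n - 1) / math.sqrt(n)
    return [((v - mean) / sd) / z_max for v in values]


def preferential_tissue(raw_by_tissue, cutoff=0.9):
    """Return the tissue of preferential expression, or None if below cutoff."""
    tissues = list(raw_by_tissue)
    logged = log2_expression([raw_by_tissue[t] for t in tissues])
    scores = transformed_zscores(logged)
    best = max(range(len(tissues)), key=scores.__getitem__)
    return tissues[best] if scores[best] > cutoff else None
```

Under this scaling, the 0.9 cutoff is only reached when expression is strongly concentrated in one tissue relative to all others.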

Micro-syntenic regions between Gl. max, Gl. soja, and Ph. vulgaris

We used DAGCHAINER output files to identify the microsyntenic regions in Gl. max, Gl. soja, and Ph. vulgaris. In particular, we queried the genes from OGs with significantly larger (as predicted by CAFE) sizes in Gl. max in comparison to the Glycine node. For each Gl. max gene, we considered only one collinear region from Gl. soja and Ph. vulgaris. When more than one collinear region was detected, we selected that with the highest DAGCHAINER alignment score. We visualized the microsynteny regions using Genome Context Viewer, available on the Legume Information System (Cleary et al., 2017).
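The rule of keeping a single collinear region per Gl. max gene can be sketched as a simple maximum-score selection. The input layout, (gene, region identifier, alignment score) tuples already parsed from DAGCHAINER output for one target species, and the function name are assumptions for illustration.

```python
# Hypothetical selection of the best-scoring collinear region per gene.
def best_collinear_region(regions):
    """Map each gene to the region with the highest alignment score."""
    best = {}
    for gene, region_id, score in regions:
        if gene not in best or score > best[gene][1]:
            best[gene] = (region_id, score)
    return {gene: region_id for gene, (region_id, _score) in best.items()}
```

The function would be applied separately to the Gl. soja and Ph. vulgaris comparisons, so that each Gl. max gene retains at most one region per species.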


QTL intervals

We obtained the chromosomal coordinates of 150 QTLs significantly associated with 57 soybean traits (Fang et al., 2017). Chromosomal coordinates of soybean genes were mapped to these QTL regions using bedtools v2.26.0 (Quinlan and Hall, 2010).

Acknowledgements

This work was supported by funding from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES, Finance code 001), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ).


References

Aköz, G., and Nordborg, M. (2019). The Aquilegia genome reveals a hybrid origin of core eudicots. bioRxiv. doi: 10.1101/407973

Albert, V.A., Barbazuk, W.B., Depamphilis, C.W., Der, J.P., Leebens-Mack, J., Ma, H., Palmer, J.D., Rounsley, S., Sankoff, D., Schuster, S.C., Soltis, D.E., Soltis, P.S., Wessler, S.R., Wing, R.A., Albert, V.A., Ammiraju, J.S.S., Barbazuk, W.B., Chamala, S., Chanderbali, A.S., Depamphilis, C.W., Der, J.P., Determann, R., Leebens-Mack, J., Ma, H., Ralph, P., Rounsley, S., Schuster, S.C., Soltis, D.E., Soltis, P.S., Talag, J., Tomsho, L., Walts, B., Wanke, S., Wing, R.A., Albert, V.A., Barbazuk, W.B., Chamala, S., Chanderbali, A.S., Chang, T.-H., Determann, R., Lan, T., Soltis, D.E., Soltis, P.S., Arikit, S., Axtell, M.J., Ayyampalayam, S., Barbazuk, W.B., Burnette, J.M., Chamala, S., De Paoli, E., Depamphilis, C.W., Der, J.P., Estill, J.C., Farrell, N.P., Harkess, A., Jiao, Y., Leebens-Mack, J., Liu, K., Mei, W., Meyers, B.C., Shahid, S., Wafula, E., Walts, B., Wessler, S.R., Zhai, J., Zhang, X., Albert, V.A., Carretero-Paulet, L., Depamphilis, C.W., Der, J.P., Jiao, Y., Leebens-Mack, J., Lyons, E., Sankoff, D., Tang, H., Wafula, E., Zheng, C., Albert, V.A., Altman, N.S., Barbazuk, W.B., Carretero-Paulet, L., Depamphilis, C.W., Der, J.P., Estill, J.C., Jiao, Y., Leebens-Mack, J., Liu, K., Mei, W., Wafula, E., Altman, N.S., Arikit, S., Axtell, M.J., Chamala, S., Chanderbali, A.S., Chen, F., Chen, J.-Q., Chiang, V., De Paoli, E., Depamphilis, C.W., Der, J.P., et al. (2013). The Amborella Genome and the Evolution of Flowering Plants. Science 342, 1241089-1241089. doi: 10.1126/science.1241089

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.

Arabidopsis-Genome-Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815. doi: 10.1038/35048692

Azani, N., Babineau, M., Bailey, C.D., Banks, H., Barbosa, A., Pinto, R.B., Boatwright, J., Borges, L., Brown, G., Bruneau, A., Candido, E., Cardoso, D., Chung, K.-F., Clark, R., Conceição, A.D., Crisp, M., Cubas, P., Delgado-Salinas, A., Dexter, K., Doyle, J., Duminil, J., Egan, A., De La Estrella, M., Falcão, M., Filatov, D., Fortuna-Perez, A.P., Fortunato, R., Gagnon, E., Gasson, P., Rando, J.G., Azevedo Tozzi, A.M.G.D., Gunn, B., Harris, D., Haston, E., Hawkins, J., Herendeen, P., Hughes, C., Iganci, J.V., Javadi, F., Kanu, S.A., Kazempour-Osaloo, S., Kite, G., Klitgaard, B., Kochanovski, F., Koenen, E.M., Kovar, L., Lavin, M., Roux, M.L., Lewis, G., De Lima, H., López-Roberts, M.C., Mackinder, B., Maia, V.H., Malécot, V., Mansano, V., Marazzi, B., Mattapha, S., Miller, J., Mitsuyuki, C., Moura, T., Murphy, D., Nageswara-Rao, M., Nevado, B., Neves, D., Ojeda, D., Pennington, R.T., Prado, D., Prenner, G., De Queiroz, L.P., Ramos, G., Ranzato Filardi, F., Ribeiro, P., Rico-Arce, M.D.L., Sanderson, M., Santos-Silva, J., São-Mateus, W.B., Silva, M.S., Simon, M., Sinou, C., Snak, C., De Souza, É., Sprent, J., Steele, K., Steier, J., Steeves, R., Stirton, C., Tagane, S., Torke, B., Toyama, H., Cruz, D.T.D., Vatanparast, M., Wieringa, J., Wink, M., Wojciechowski, M., Yahara, T., Yi, T., and Zimmerman, E. (2017). A new subfamily classification of the Leguminosae based on a taxonomically comprehensive phylogeny – The Legume Phylogeny Working Group (LPWG). Taxon 66, 44-77. doi: 10.12705/661.3

Banks, J.A., Nishiyama, T., Hasebe, M., Bowman, J.L., Gribskov, M., Depamphilis, C., Albert, V.A., Aono, N., Aoyama, T., Ambrose, B.A., Ashton, N.W., Axtell, M.J., Barker, E., Barker, M.S., Bennetzen, J.L., Bonawitz, N.D., Chapple, C., Cheng, C., Correa, L.G., Dacre, M., Debarry, J., Dreyer, I., Elias, M., Engstrom, E.M., Estelle, M., Feng, L., Finet, C., Floyd, S.K., Frommer, W.B., Fujita, T., Gramzow, L., Gutensohn, M., Harholt, J., Hattori, M., Heyl, A., Hirai, T., Hiwatashi, Y., Ishikawa, M., Iwata, M., Karol, K.G., Koehler, B., Kolukisaoglu, U., Kubo, M., Kurata, T., Lalonde, S., Li, K., Li, Y., Litt, A., Lyons, E., Manning, G., Maruyama, T., Michael, T.P., Mikami, K., Miyazaki, S., Morinaga, S., Murata, T., Mueller-Roeber, B., Nelson, D.R., Obara, M., Oguri, Y., Olmstead, R.G., Onodera, N., Petersen, B.L., Pils, B., Prigge, M., Rensing, S.A., Riano-Pachon, D.M., Roberts, A.W., Sato, Y., Scheller, H.V., Schulz, B., Schulz, C., Shakirov, E.V., Shibagaki, N., Shinohara, N., Shippen, D.E., Sorensen, I., Sotooka, R., Sugimoto, N., Sugita, M., Sumikawa, N., Tanurdzic, M., Theissen, G., Ulvskov, P., Wakazuki, S., Weng, J.K., Willats, W.W., Wipf, D., Wolf, P.G., Yang, L., Zimmer, A.D., Zhu, Q., Mitros, T., Hellsten, U., Loque, D., Otillar, R., Salamov, A., Schmutz, J., Shapiro, H., Lindquist, E., et al. (2011). The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science 332, 960-963. doi: 10.1126/science.1203810

Bellieny-Rabelo, D., Oliveira, A.E., and Venancio, T.M. (2013). Impact of whole-genome and tandem duplications in the expansion and functional diversification of the F-box family in legumes (Fabaceae). PLoS One 8, e55127. doi: 10.1371/journal.pone.0055127

Bertioli, D.J., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D., Cannon, E.K., Liu, X., Gao, D., Clevenger, J., Dash, S., Ren, L., Moretzsohn, M.C., Shirasawa, K., Huang, W., Vidigal, B., Abernathy, B., Chu, Y., Niederhuth, C.E., Umale, P., Araujo, A.C., Kozik, A., Kim, K.D., Burow, M.D., Varshney, R.K., Wang, X., Zhang, X., Barkley, N., Guimaraes, P.M., Isobe, S., Guo, B., Liao, B., Stalker, H.T., Schmitz, R.J., Scheffler, B.E., Leal-Bertioli, S.C., Xun, X., Jackson, S.A., Michelmore, R., and Ozias-Akins, P. (2016). The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat Genet 48, 438-446. doi: 10.1038/ng.3517

Bie, T.D., Cristianini, N., Demuth, J.P., and Hahn, M.W. (2006). CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269-1271. doi: 10.1093/bioinformatics/btl097

Birchler, J.A., and Veitia, R.A. (2007). The Gene Balance Hypothesis: From Classical Genetics to Modern Genomics. The Plant Cell Online 19, 395-402. doi: 10.1105/tpc.106.049338

Birchler, J.A., and Veitia, R.A. (2011). Protein-protein and protein-DNA dosage balance and differential paralog transcription factor retention in polyploids. Frontiers in Plant Genetics and Genomics 2, 64. doi: 10.3389/fpls.2011.00064

Blanc, G., Barakat, A., Guyot, R., Cooke, R., and Delseny, M. (2000). Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12, 1093-1101.

Blanc, G., and Wolfe, K.H. (2004). Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. The Plant Cell 16, 1679-1691. doi: 10.1105/tpc.021410

Boscari, A., Del Giudice, J., Ferrarini, A., Venturini, L., Zaffini, A.L., Delledonne, M., and Puppo, A. (2013). Expression dynamics of the Medicago truncatula transcriptome during the symbiotic interaction with Sinorhizobium meliloti: which role for nitric oxide? Plant Physiol 161, 425-439. doi: 10.1104/pp.112.208538


Cannon, S.B. (2013). The model legume genomes. Methods Mol Biol 1069, 1-14. doi: 10.1007/978-1-62703-613-9_1

Cannon, S.B., Ilut, D., Farmer, A.D., Maki, S.L., May, G.D., Singer, S.R., and Doyle, J.J. (2010). Polyploidy did not predate the evolution of nodulation in all legumes. PLoS One 5, e11630. doi: 10.1371/journal.pone.0011630

Cannon, S.B., Mitra, A., Baumgarten, A., Young, N.D., and May, G. (2004). The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol 4, 10. doi: 10.1186/1471-2229-4-10

Cannon, S.B., Sterck, L., Rombauts, S., Sato, S., Cheung, F., Gouzy, J., Wang, X., Mudge, J., Vasdewani, J., Schiex, T., Spannagl, M., Monaghan, E., Nicholson, C., Humphray, S.J., Schoof, H., Mayer, K.F., Rogers, J., Quetier, F., Oldroyd, G.E., Debelle, F., Cook, D.R., Retzel, E.F., Roe, B.A., Town, C.D., Tabata, S., Van De Peer, Y., and Young, N.D. (2006). Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc Natl Acad Sci U S A 103, 14959-14964. doi: 10.1073/pnas.0603228103

Cardoso, D., De Queiroz, L.P., Pennington, R.T., De Lima, H.C., Fonty, E., Wojciechowski, M.F., and Lavin, M. (2012). Revisiting the phylogeny of papilionoid legumes: New insights from comprehensively sampled early-branching lineages. American Journal of Botany 99, 1991-2013. doi: 10.3732/ajb.1200380

Cerri, M.R., Frances, L., Laloum, T., Auriac, M.C., Niebel, A., Oldroyd, G.E., Barker, D.G., Fournier, J., and De Carvalho-Niebel, F. (2012). Medicago truncatula ERN transcription factors: regulatory interplay with NSP1/NSP2 GRAS factors and expression dynamics throughout rhizobial infection. Plant Physiol 160, 2155-2172. doi: 10.1104/pp.112.203190

Chen, X., Li, H., Pandey, M.K., Yang, Q., Wang, X., Garg, V., Chi, X., Doddamani, D., Hong, Y., Upadhyaya, H., Guo, H., Khan, A.W., Zhu, F., Zhang, X., Pan, L., Pierce, G.J., Zhou, G., Krishnamohan, K.A., Chen, M., Zhong, N., Agarwal, G., Li, S., Chitikineni, A., Zhang, G.Q., Sharma, S., Chen, N., Liu, H., Janila, P., Wang, M., Wang, T., Sun, J., Li, X., Li, C., Yu, L., Wen, S., Singh, S., Yang, Z., Zhao, J., Zhang, C., Yu, Y., Bi, J., Liu, Z.J., Paterson, A.H., Wang, S., Liang, X., Varshney, R.K., and Yu, S. (2016). Draft genome of the peanut A-genome progenitor (Arachis duranensis) provides insights into geocarpy, oil biosynthesis, and allergens. Proc Natl Acad Sci U S A 113, 6785-6790. doi: 10.1073/pnas.1600899113

Cheng, F., Wu, J., Cai, X., Liang, J., Freeling, M., and Wang, X. (2018). Gene retention, fractionation and subgenome differences in polyploid plants. Nat Plants 4, 258-268. doi: 10.1038/s41477-018-0136-7

Cleary, A., Farmer, A., and Hancock, J. (2017). Genome Context Viewer: visual exploration of multiple annotated genomes using microsynteny. Bioinformatics 34, 1562-1564. doi: 10.1093/bioinformatics/btx757

Davis, J.C., and Petrov, D.A. (2004). Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol 2, E55. doi: 10.1371/journal.pbio.0020055

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21. doi: 10.1093/bioinformatics/bts635

Doebley, J., and Lukens, L. (1998). Transcriptional regulators and the evolution of plant form. The Plant Cell 10, 1075-1082. doi: 10.1105/tpc.10.7.1075

Dong, Y., Yang, X., Liu, J., Wang, B.H., Liu, B.L., and Wang, Y.Z. (2014). Pod shattering resistance associated with domestication is mediated by a NAC gene in soybean. Nat Commun 5, 3352. doi: 10.1038/ncomms4352

Du, H., Yang, S.-S., Liang, Z., Feng, B.-R., Liu, L., Huang, Y.-B., and Tang, Y.-X. (2012). Genome-wide analysis of the MYB transcription factor superfamily in soybean. BMC Plant Biol 12, 106. doi: 10.1186/1471-2229-12-106


Enright, A.J., Van Dongen, S., and Ouzounis, C.A. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30, 1575-1584.

Fang, C., Ma, Y., Wu, S., Liu, Z., Wang, Z., Yang, R., Hu, G., Zhou, Z., Yu, H., Zhang, M., Pan, Y., Zhou, G., Ren, H., Du, W., Yan, H., Wang, Y., Han, D., Shen, Y., Liu, S., Liu, T., Zhang, J., Qin, H., Yuan, J., Yuan, X., Kong, F., Liu, B., Li, J., Zhang, Z., Wang, G., Zhu, B., and Tian, Z. (2017). Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biology 18. doi: 10.1186/s13059-017-1289-9

Feil, R., Yoshida, T., and Kawabe, A. (2013). Importance of Gene Duplication in the Evolution of Genomic Imprinting Revealed by Molecular Evolutionary Analysis of the Type I MADS-Box Gene Family in Arabidopsis Species. PLoS One 8, e73588. doi: 10.1371/journal.pone.0073588

Filiault, D.L., Ballerini, E.S., Mandakova, T., Akoz, G., Derieg, N.J., Schmutz, J., Jenkins, J., Grimwood, J., Shu, S., Hayes, R.D., Hellsten, U., Barry, K., Yan, J., Mihaltcheva, S., Karafiatova, M., Nizhynska, V., Kramer, E.M., Lysak, M.A., Hodges, S.A., and Nordborg, M. (2018). The Aquilegia genome provides insight into adaptive radiation and reveals an extraordinarily polymorphic chromosome with a unique history. Elife 7. doi: 10.7554/eLife.36426

Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell, A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A., Salazar, G.A., Tate, J., and Bateman, A. (2016). The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44, D279-285. doi: 10.1093/nar/gkv1344

Freeling, M. (2009). Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu Rev Plant Biol 60, 433-453. doi: 10.1146/annurev.arplant.043008.092122

Freeling, M., Scanlon, M.J., and Fowler, J.E. (2015). Fractionation and subfunctionalization following genome duplications: mechanisms that drive gene content and their consequences. Curr Opin Genet Dev 35, 110-118. doi: 10.1016/j.gde.2015.11.002

Gossani, C., Bellieny-Rabelo, D., and Venancio, T.M. (2014). Evolutionary analysis of multidrug resistance genes in fungi - impact of gene duplication and family conservation. FEBS J 281, 4967-4977. doi: 10.1111/febs.13046

Graham, P.H., and Vance, C.P. (2003). Legumes: importance and constraints to greater use. Plant Physiology 131, 872-877. doi: 10.1104/pp.017004

Griesmann, M., Chang, Y., Liu, X., Song, Y., Haberer, G., Crook, M.B., Billault-Penneteau, B., Lauressergues, D., Keller, J., Imanishi, L., Roswanjaya, Y.P., Kohlen, W., Pujic, P., Battenberg, K., Alloisio, N., Liang, Y., Hilhorst, H., Salgado, M.G., Hocher, V., Gherbi, H., Svistoonoff, S., Doyle, J.J., He, S., Xu, Y., Xu, S., Qu, J., Gao, Q., Fang, X., Fu, Y., Normand, P., Berry, A.M., Wall, L.G., Ane, J.M., Pawlowski, K., Xu, X., Yang, H., Spannagl, M., Mayer, K.F.X., Wong, G.K., Parniske, M., Delaux, P.M., and Cheng, S. (2018). Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis. Science 361. doi: 10.1126/science.aat1743

Gupta, S., Nawaz, K., Parween, S., Roy, R., Sahu, K., Kumar Pole, A., Khandal, H., Srivastava, R., Kumar Parida, S., and Chattopadhyay, D. (2017). Draft genome sequence of Cicer reticulatum L., the wild progenitor of chickpea provides a resource for agronomic trait improvement. DNA Res 24, 1-10. doi: 10.1093/dnares/dsw042

Haas, B.J., Delcher, A.L., Wortman, J.R., and Salzberg, S.L. (2004). DAGchainer: A tool for mining segmental genome duplications and synteny. Bioinformatics 20, 3643-3646. doi: 10.1093/bioinformatics/bth397

Han, M.V., Thomas, G.W., Lugo-Martinez, J., and Hahn, M.W. (2013). Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol 30, 1987-1997. doi: 10.1093/molbev/mst100


Han, Y., Zhao, X., Liu, D., Li, Y., Lightfoot, D.A., Yang, Z., Zhao, L., Zhou, G., Wang, Z., Huang, L., Zhang, Z., Qiu, L., Zheng, H., and Li, W. (2016). Domestication footprints anchor genomic regions of agronomic importance in soybeans. New Phytol 209, 871-884. doi: 10.1111/nph.13626

Hane, J.K., Ming, Y., Kamphuis, L.G., Nelson, M.N., Garg, G., Atkins, C.A., Bayer, P.E., Bravo, A., Bringans, S., Cannon, S., Edwards, D., Foley, R., Gao, L.L., Harrison, M.J., Huang, W., Hurgobin, B., Li, S., Liu, C.W., Mcgrath, A., Morahan, G., Murray, J., Weller, J., Jian, J., and Singh, K.B. (2017). A comprehensive draft genome sequence for lupin (Lupinus angustifolius), an emerging health food: insights into plant-microbe interactions and legume evolution. Plant Biotechnol J 15, 318-330. doi: 10.1111/pbi.12615

He, M., He, C.-Q., and Ding, N.-Z. (2018). Abiotic Stresses: General Defenses of Land Plants and Chances for Engineering Multistress Tolerance. Frontiers in Plant Science 9. doi: 10.3389/fpls.2018.01771

Jaillon, O., Aury, J.-M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., Vezzi, A., Legeai, F., Hugueney, P., Dasilva, C., Horner, D., Mica, E., Jublot, D., Poulain, J., Bruyère, C., Billault, A., Segurens, B., Gouyvenoux, M., Ugarte, E., Cattonaro, F., Anthouard, V., Vico, V., Del Fabbro, C., Alaux, M., Di Gaspero, G., Dumas, V., Felice, N., Paillard, S., Juman, I., Moroldo, M., Scalabrin, S., Canaguier, A., Le Clainche, I., Malacrida, G., Durand, E., Pesole, G., Laucou, V., Chatelet, P., Merdinoglu, D., Delledonne, M., Pezzotti, M., Lecharny, A., Scarpelli, C., Artiguenave, F., Pè, M.E., Valle, G., Morgante, M., Caboche, M., Adam-Blondon, A.-F., Weissenbach, J., Quétier, F., Wincker, P., and Characterization, F.-I.P.C.F.G.G. (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463-467. doi: 10.1038/nature06148

Jain, M., Misra, G., Patel, R.K., Priya, P., Jhanwar, S., Khan, A.W., Shah, N., Singh, V.K., Garg, R., Jeena, G., Yadav, M., Kant, C., Sharma, P., Yadav, G., Bhatia, S., Tyagi, A.K., and Chattopadhyay, D. (2013). A draft genome sequence of the pulse crop chickpea (Cicer arietinum L.). Plant J 74, 715-729. doi: 10.1111/tpj.12173

Jin, J., Tian, F., Yang, D.C., Meng, Y.Q., Kong, L., Luo, J., and Gao, G. (2017). PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res 45, D1040-D1045. doi: 10.1093/nar/gkw982

Jordan, I.K., Wolf, Y.I., and Koonin, E.V. (2004). Duplicated genes evolve slower than singletons despite the initial rate increase. BMC Evol Biol 4, 22. doi: 10.1186/1471-2148-4-22

Kang, Y.J., Kim, S.K., Kim, M.Y., Lestari, P., Kim, K.H., Ha, B.K., Jun, T.H., Hwang, W.J., Lee, T., Lee, J., Shim, S., Yoon, M.Y., Jang, Y.E., Han, K.S., Taeprayoon, P., Yoon, N., Somta, P., Tanya, P., Kim, K.S., Gwag, J.G., Moon, J.K., Lee, Y.H., Park, B.S., Bombarely, A., Doyle, J.J., Jackson, S.A., Schafleitner, R., Srinives, P., Varshney, R.K., and Lee, S.H. (2014). Genome sequence of mungbean and insights into evolution within Vigna species. Nat Commun 5, 5443. doi: 10.1038/ncomms6443

Kroc, M., Koczyk, G., Swiecicki, W., Kilian, A., and Nelson, M.N. (2014). New evidence of ancestral polyploidy in the Genistoid legume Lupinus angustifolius L. (narrow-leafed lupin). Theor Appl Genet 127, 1237-1249. doi: 10.1007/s00122-014-2294-y

Kryuchkova-Mostacci, N., and Robinson-Rechavi, M. (2016). A benchmark of gene expression tissue-specificity metrics. Brief Bioinform, bbw008. doi: 10.1093/bib/bbw008

Kumar, S., Stecher, G., Suleski, M., and Hedges, S.B. (2017). TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol Biol Evol 34, 1812-1819. doi: 10.1093/molbev/msx116

Kumpeangkeaw, A., Tan, D., Fu, L., Han, B., Sun, X., Hu, X., Ding, Z., and Zhang, J. (2019). Asymmetric birth and death of type I and type II MADS-box gene subfamilies in the rubber tree facilitating laticifer development. PLoS One 14, e0214335. doi: 10.1371/journal.pone.0214335

Lata, C., Yadav, A., and Pras, M. (2011). "Role of Plant Transcription Factors in Abiotic Stress Tolerance," in Abiotic Stress Response in Plants - Physiological, Biochemical and Genetic Perspectives, eds. A. Shanker & B. Venkateswarlu. (InTech), 269-296.


Lehti-Shiu, M.D., Panchy, N., Wang, P., Uygun, S., and Shiu, S.H. (2017). Diversity, expansion, and evolutionary novelty of plant DNA-binding transcription factor families. Biochim Biophys Acta Gene Regul Mech 1860, 3-20. 10.1016/j.bbagrm.2016.08.005

Lespinet, O., Wolf, Y.I., Koonin, E.V., and Aravind, L. (2002). The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res 12, 1048-1059. 10.1101/gr.174302

Lewis, G.P. (2005). Legumes of the World. Royal Botanic Gardens, Kew.

Li, Z., Defoort, J., Tasdighian, S., Maere, S., Van De Peer, Y., and De Smet, R. (2016). Gene Duplicability of Core Genes Is Highly Consistent across All Angiosperms. Plant Cell 28, 326-344. 10.1105/tpc.15.00877

Libault, M., Farmer, A., Joshi, T., Takahashi, K., Langley, R.J., Franklin, L.D., He, J., Xu, D., May, G., and Stacey, G. (2010). An integrated transcriptome atlas of the crop model Glycine max, and its use in comparative analyses in plants. Plant J 63, 86-99. 10.1111/j.1365-313X.2010.04222.x

Liu, J., Jung, C., Xu, J., Wang, H., Deng, S., Bernad, L., Arenas-Huertero, C., and Chua, N.H. (2012). Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis. Plant Cell 24, 4333-4345. 10.1105/tpc.112.102855

Lu, Q., Li, H., Hong, Y., Zhang, G., Wen, S., Li, X., Zhou, G., Li, S., Liu, H., Liu, H., Liu, Z., Varshney, R.K., Chen, X., and Liang, X. (2018). Genome Sequencing and Analysis of the Peanut B-Genome Progenitor (Arachis ipaensis). Frontiers in Plant Science 9. 10.3389/fpls.2018.00604

Lynch, M., and Conery, J.S. (2000). The evolutionary fate and consequences of duplicate genes. Science 290, 1151-1155.

Mochida, K., Sakurai, T., Seki, H., Yoshida, T., Takahagi, K., Sawai, S., Uchiyama, H., Muranaka, T., and Saito, K. (2017). Draft genome assembly and annotation of Glycyrrhiza uralensis, a medicinal legume. Plant J 89, 181-194. 10.1111/tpj.13385

Nagata, T., Hosaka-Sasaki, A., and Kikuchi, S. (2016). The Evolutionary Diversification of Genes that Encode Transcription Factor Proteins in Plants. 73-97. 10.1016/b978-0-12-800854-6.00005-1

Nakano, T. (2006). Genome-Wide Analysis of the ERF Gene Family in Arabidopsis and Rice. PLANT PHYSIOLOGY 140, 411-432. 10.1104/pp.105.073783

Nam, J., Kim, J., Lee, S., An, G., Ma, H., and Nei, M. (2004). Type I MADS-box genes have experienced faster birth-and-death evolution than type II MADS-box genes in angiosperms. Proceedings of the National Academy of Sciences 101, 1910-1915. 10.1073/pnas.0308430100

O’rourke, J.A., Iniguez, L.P., Fu, F., Bucciarelli, B., Miller, S.S., Jackson, S.A., Mcclean, P.E., Li, J., Dai, X., Zhao, P.X., Hernandez, G., and Vance, C.P. (2014). An RNA-Seq based gene expression atlas of the common bean. BMC genomics 15, 866. 10.1186/1471-2164-15-866

Panchy, N., Lehti-Shiu, M., and Shiu, S.H. (2016). Evolution of Gene Duplication in Plants. Plant Physiol 171, 2294-2316. 10.1104/pp.16.00523

Parween, S., Nawaz, K., Roy, R., Pole, A.K., Venkata Suresh, B., Misra, G., Jain, M., Yadav, G., Parida, S.K., Tyagi, A.K., Bhatia, S., and Chattopadhyay, D. (2015). An advanced draft genome assembly of a desi type chickpea (Cicer arietinum L.). Sci Rep 5, 12806. 10.1038/srep12806

Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.C., Mendell, J.T., and Salzberg, S.L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290-295. 10.1038/nbt.3122

Pickett, F.B., and Meeks-Wagner, D.R. (1995). Seeing double: appreciating genetic redundancy. Plant Cell 7, 1347-1356. 10.1105/tpc.7.9.1347

Pires, N., and Dolan, L. (2010). Early evolution of bHLH proteins in plants. Plant Signal Behav 5, 911-912. 10.4161/psb.5.7.12100

Portereiko, M.F., Lloyd, A., Steffen, J.G., Punwani, J.A., Otsuga, D., and Drews, G.N. (2006). AGL80 is required for central cell and endosperm development in Arabidopsis. Plant Cell 18, 1862-1872. 10.1105/tpc.106.040824


Proulx, S.R., Wang, Y., Wang, X., Tang, H., Tan, X., Ficklin, S.P., Feltus, F.A., and Paterson, A.H. (2011). Modes of Gene Duplication Contribute Differently to Genetic Novelty and Redundancy, but Show Parallels across Divergent Angiosperms. PLoS One 6, e28150. 10.1371/journal.pone.0028150

Qiao, X., Yin, H., Li, L., Wang, R., Wu, J., and Zhang, S. (2018). Different Modes of Gene Duplication Show Divergent Evolutionary Patterns and Contribute Differently to the Expansion of Gene Families Involved in Important Fruit Traits in Pear (Pyrus bretschneideri). Front Plant Sci 9, 161. 10.3389/fpls.2018.00161

Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842. 10.1093/bioinformatics/btq033

Salman-Minkov, A., Sabath, N., and Mayrose, I. (2016). Whole-genome duplication as a key factor in crop domestication. Nat Plants 2. 10.1038/nplants.2016.115

Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Kato, T., Nakao, M., Sasamoto, S., Watanabe, A., Ono, A., Kawashima, K., Fujishiro, T., Katoh, M., Kohara, M., Kishida, Y., Minami, C., Nakayama, S., Nakazaki, N., Shimizu, Y., Shinpo, S., Takahashi, C., Wada, T., Yamada, M., Ohmido, N., Hayashi, M., Fukui, K., Baba, T., Nakamichi, T., Mori, H., and Tabata, S. (2008). Genome structure of the legume, Lotus japonicus. DNA Res 15, 227-239. 10.1093/dnares/dsn008

Schliep, K.P. (2011). phangorn: phylogenetic analysis in R. Bioinformatics 27, 592-593. 10.1093/bioinformatics/btq706

Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L., Song, Q., Thelen, J.J., Cheng, J., Xu, D., Hellsten, U., May, G.D., Yu, Y., Sakurai, T., Umezawa, T., Bhattacharyya, M.K., Sandhu, D., Valliyodan, B., Lindquist, E., Peto, M., Grant, D., Shu, S., Goodstein, D., Barry, K., Futrell-Griggs, M., Abernathy, B., Du, J., Tian, Z., Zhu, L., Gill, N., Joshi, T., Libault, M., Sethuraman, A., Zhang, X.C., Shinozaki, K., Nguyen, H.T., Wing, R.A., Cregan, P., Specht, J., Grimwood, J., Rokhsar, D., Stacey, G., Shoemaker, R.C., and Jackson, S.A. (2010). Genome sequence of the palaeopolyploid soybean. Nature 463, 178-183. 10.1038/nature08670

Schmutz, J., Mcclean, P.E., Mamidi, S., Wu, G.A., Cannon, S.B., Grimwood, J., Jenkins, J., Shu, S., Song, Q., Chavarro, C., Torres-Torres, M., Geffroy, V., Moghaddam, S.M., Gao, D., Abernathy, B., Barry, K., Blair, M., Brick, M.A., Chovatia, M., Gepts, P., Goodstein, D.M., Gonzales, M., Hellsten, U., Hyten, D.L., Jia, G., Kelly, J.D., Kudrna, D., Lee, R., Richard, M.M., Miklas, P.N., Osorno, J.M., Rodrigues, J., Thareau, V., Urrea, C.A., Wang, M., Yu, Y., Zhang, M., Wing, R.A., Cregan, P.B., Rokhsar, D.S., and Jackson, S.A. (2014). A reference genome for common bean and genome-wide analysis of dual domestications. Nat Genet 46, 707-713. 10.1038/ng.3008

Sedivy, E.J., Wu, F., and Hanzawa, Y. (2017). Soybean domestication: the origin, genetic architecture and molecular bases. New Phytologist 214, 539-553. 10.1111/nph.14418

Severin, A.J., Cannon, S.B., Graham, M.M., Grant, D., and Shoemaker, R.C. (2011). Changes in twelve homoeologous genomic regions in soybean following three rounds of polyploidy. Plant Cell 23, 3129-3136. 10.1105/tpc.111.089573

Severin, A.J., Woody, J.L., Bolon, Y.T., Joseph, B., Diers, B.W., Farmer, A.D., Muehlbauer, G.J., Nelson, R.T., Grant, D., Specht, J.E., Graham, M.A., Cannon, S.B., May, G.D., Vance, C.P., and Shoemaker, R.C. (2010). RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome. BMC Plant Biol 10, 160. 10.1186/1471-2229-10-160

Shiu, S.-H. (2005). Transcription Factor Families Have Much Higher Expansion Rates in Plants than in Animals. PLANT PHYSIOLOGY 139, 18-26. 10.1104/pp.105.065110

Soltis, P.S., Marchant, D.B., Van De Peer, Y., and Soltis, D.E. (2015). Polyploidy and genome evolution in plants. Curr Opin Genet Dev 35, 119-125. 10.1016/j.gde.2015.11.003

Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., Fuellen, G., Gilbert, J.G., Korf, I., Lapp, H., Lehvaslaiho, H., Matsalla, C., Mungall, C.J., Osborne, B.I., Pocock, M.R., Schattner, P., Senger, M., Stein, L.D., Stupka, E., Wilkinson, M.D., and Birney, E. (2002). The Bioperl toolkit: Perl modules for the life sciences. Genome Res 12, 1611-1618. 10.1101/gr.361602


Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313. 10.1093/bioinformatics/btu033

Suyama, M., Torrents, D., and Bork, P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 34, W609-612. 10.1093/nar/gkl315

Tamura, K., Tao, Q., Kumar, S., and Russo, C. (2018). Theoretical Foundation of the RelTime Method for Estimating Divergence Times from Variable Evolutionary Rates. Mol Biol Evol 35, 1770-1782. 10.1093/molbev/msy044

Tang, H., Krishnakumar, V., Bidwell, S., Rosen, B., Chan, A., Zhou, S., Gentzbittel, L., Childs, K.L., Yandell, M., Gundlach, H., Mayer, K.F., Schwartz, D.C., and Town, C.D. (2014). An improved genome release (version Mt4.0) for the model legume Medicago truncatula. BMC genomics 15, 312. 10.1186/1471-2164-15-312

Tripathi, P., Galla, A., Rabara, R.C., and Rushton, P.J. (2016). "Transcription Factors that Regulate Defence Responses and Their Use in Increasing Disease Resistance," in Plant Pathogen Resistance Biotechnology, ed. D.B. Collinge, 109-129.

Varshney, R.K., Chen, W., Li, Y., Bharti, A.K., Saxena, R.K., Schlueter, J.A., Donoghue, M.T., Azam, S., Fan, G., Whaley, A.M., Farmer, A.D., Sheridan, J., Iwata, A., Tuteja, R., Penmetsa, R.V., Wu, W., Upadhyaya, H.D., Yang, S.P., Shah, T., Saxena, K.B., Michael, T., Mccombie, W.R., Yang, B., Zhang, G., Yang, H., Wang, J., Spillane, C., Cook, D.R., May, G.D., Xu, X., and Jackson, S.A. (2011). Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 30, 83-89. 10.1038/nbt.2022

Varshney, R.K., Song, C., Saxena, R.K., Azam, S., Yu, S., Sharpe, A.G., Cannon, S., Baek, J., Rosen, B.D., Tar'an, B., Millan, T., Zhang, X., Ramsay, L.D., Iwata, A., Wang, Y., Nelson, W., Farmer, A.D., Gaur, P.M., Soderlund, C., Penmetsa, R.V., Xu, C., Bharti, A.K., He, W., Winter, P., Zhao, S., Hane, J.K., Carrasquilla-Garcia, N., Condie, J.A., Upadhyaya, H.D., Luo, M.C., Thudi, M., Gowda, C.L., Singh, N.P., Lichtenzveig, J., Gali, K.K., Rubio, J., Nadarajan, N., Dolezel, J., Bansal, K.C., Xu, X., Edwards, D., Zhang, G., Kahl, G., Gil, J., Singh, K.B., Datta, S.K., Jackson, S.A., Wang, J., and Cook, D.R. (2013). Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nat Biotechnol 31, 240-246. 10.1038/nbt.2491

Vernie, T., Moreau, S., De Billy, F., Plet, J., Combier, J.P., Rogers, C., Oldroyd, G., Frugier, F., Niebel, A., and Gamas, P. (2008). EFD Is an ERF transcription factor involved in the control of nodule number and differentiation in Medicago truncatula. Plant Cell 20, 2696-2713. 10.1105/tpc.108.059857

Vidal, N.M., Grazziotin, A.L., Iyer, L.M., Aravind, L., and Venancio, T.M. (2016). Transcription factors, chromatin proteins and the diversification of Hemiptera. Insect Biochem Mol Biol 69, 1-13. 10.1016/j.ibmb.2015.07.001

Vision, T.J., Brown, D.G., and Tanksley, S.D. (2000). The origins of genomic duplications in Arabidopsis. Science 290, 2114-2117.

Wang, J., Sun, P., Li, Y., Liu, Y., Yu, J., Ma, X., Sun, S., Yang, N., Xia, R., Lei, T., Liu, X., Jiao, B., Xing, Y., Ge, W., Wang, L., Wang, Z., Song, X., Yuan, M., Guo, D., Zhang, L., Zhang, J., Jin, D., Chen, W., Pan, Y., Liu, T., Jin, L., Sun, J., Cheng, R., Duan, X., Shen, S., Qin, J., Zhang, M.C., Paterson, A.H., and Wang, X. (2017). Hierarchically Aligning 10 Legume Genomes Establishes a Family-Level Genomics Platform. Plant Physiol 174, 284-300. 10.1104/pp.16.01981

Wright, E.S. (2015). DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics 16, 322. 10.1186/s12859-015-0749-z

Xie, M., Chung, C.Y.-L., Li, M.-W., Wong, F.-L., Wang, X., Liu, A., Wang, Z., Leung, A.K.-Y., Wong, T.-H., Tong, S.-W., Xiao, Z., Fan, K., Ng, M.-S., Qi, X., Yang, L., Deng, T., He, L., Chen, L., Fu, A., Ding, Q., He, J., Chung, G., Isobe, S., Tanabata, T., Valliyodan, B., Nguyen, H.T., Cannon, S.B., Foyer, C.H., Chan, T.-F., and Lam, H.-M. (2019). A reference-grade wild soybean genome. Nat Commun 10. 10.1038/s41467-019-09142-9


Yang, K., Tian, Z., Chen, C., Luo, L., Zhao, B., Wang, Z., Yu, L., Li, Y., Sun, Y., Li, W., Chen, Y., Zhang, Y., Ai, D., Zhao, J., Shang, C., Ma, Y., Wu, B., Wang, M., Gao, L., Sun, D., Zhang, P., Guo, F., Wang, W., Wang, J., Varshney, R.K., Ling, H.Q., and Wan, P. (2015). Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication. Proc Natl Acad Sci U S A 112, 13213-13218. 10.1073/pnas.1420949112

Young, N.D., Debelle, F., Oldroyd, G.E., Geurts, R., Cannon, S.B., Udvardi, M.K., Benedito, V.A., Mayer, K.F., Gouzy, J., Schoof, H., Van De Peer, Y., Proost, S., Cook, D.R., Meyers, B.C., Spannagl, M., Cheung, F., De Mita, S., Krishnakumar, V., Gundlach, H., Zhou, S., Mudge, J., Bharti, A.K., Murray, J.D., Naoumkina, M.A., Rosen, B., Silverstein, K.A., Tang, H., Rombauts, S., Zhao, P.X., Zhou, P., Barbe, V., Bardou, P., Bechner, M., Bellec, A., Berger, A., Berges, H., Bidwell, S., Bisseling, T., Choisne, N., Couloux, A., Denny, R., Deshpande, S., Dai, X., Doyle, J.J., Dudez, A.M., Farmer, A.D., Fouteau, S., Franken, C., Gibelin, C., Gish, J., Goldstein, S., Gonzalez, A.J., Green, P.J., Hallab, A., Hartog, M., Hua, A., Humphray, S.J., Jeong, D.H., Jing, Y., Jocker, A., Kenton, S.M., Kim, D.J., Klee, K., Lai, H., Lang, C., Lin, S., Macmil, S.L., Magdelenat, G., Matthews, L., Mccorrison, J., Monaghan, E.L., Mun, J.H., Najar, F.Z., Nicholson, C., Noirot, C., O'bleness, M., Paule, C.R., Poulain, J., Prion, F., Qin, B., Qu, C., Retzel, E.F., Riddle, C., Sallet, E., Samain, S., Samson, N., Sanders, I., Saurat, O., Scarpelli, C., Schiex, T., Segurens, B., Severin, A.J., Sherrier, D.J., Shi, R., Sims, S., Singer, S.R., Sinharoy, S., Sterck, L., Viollet, A., Wang, B.B., et al. (2011). The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480, 520-524. 10.1038/nature10625

Zhang, H., Jin, J., Tang, L., Zhao, Y., Gu, X., Gao, G., and Luo, J. (2011). PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database. Nucleic Acids Res 39, D1114-D1117. 10.1093/nar/gkq1141

Zhou, Z., Jiang, Y., Wang, Z., Gou, Z., Lyu, J., Li, W., Yu, Y., Shu, L., Zhao, Y., Ma, Y., Fang, C., Shen, Y., Liu, T., Li, C., Li, Q., Wu, M., Wang, M., Wu, Y., Dong, Y., Wan, W., Wang, X., Ding, Z., Gao, Y., Xiang, H., Zhu, B., Lee, S.H., Wang, W., and Tian, Z. (2015). Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat Biotechnol 33, 408-414. 10.1038/nbt.3096


Tables

Table 1: Plant species used in this study and their corresponding genome assembly versions. Non-legume species are marked with asterisks.

| Scientific name | Genome assembly version | Genes | Chromosome number | Reference |
|---|---|---|---|---|
| Cajanus cajan | GCA000340665.1 | 48,331 | 11 | (Varshney et al., 2011) |
| Phaseolus vulgaris | Pvulgaris218v1.0 | 27,197 | 11 | (Schmutz et al., 2014) |
| Vigna radiata | Vradiataver6 | 35,143 | 11 | (Kang et al., 2014) |
| Vigna angularis | Vigan1.1 | 34,172 | 11 | (Yang et al., 2015) |
| Glycine max | Gmax275Wm82.a2.v1 | 56,044 | 20 | (Schmutz et al., 2010) |
| Glycine soja | W05v1.0 | 55,539 | 20 | (Xie et al., 2019) |
| Cicer reticulatum | WCGAPv1.0 | 25,680 | 8 | (Gupta et al., 2017) |
| Cicer arietinum | ASM33114v1 | 33,107 | 8 | (Jain et al., 2013; Varshney et al., 2013; Parween et al., 2015) |
| Medicago truncatula | Mtruncatula285Mt4.0v1 | 50,894 | 8 | (Young et al., 2011; Tang et al., 2014) |
| Glycyrrhiza uralensis | Gur.draft-genome.20151208 | 34,445 | 8 | (Mochida et al., 2017) |
| Lotus japonicus | Genome assembly build 3.0 | 39,734 | 6 | (Sato et al., 2008) |
| Lupinus angustifolius | v1.0 | 33,076 | 20 | (Hane et al., 2017) |
| Arachis ipaensis | Araip1.0 | 46,410 | 10 | (Bertioli et al., 2016) |
| Arachis duranensis | Aradu1.0 | 42,562 | 10 | (Bertioli et al., 2016; Chen et al., 2016) |
| Chamaecrista fasciculata | version.1 | 21,781 | 8 | (Griesmann et al., 2018) |
| Arabidopsis thaliana* | Athaliana167TAIR10 | 27,416 | 5 | (Arabidopsis-Genome-Initiative, 2000) |
| Vitis vinifera* | Vvinifera145Genoscope.12X | 26,346 | 19 | (Jaillon et al., 2007) |
| Amborella trichopoda* | AmTrv1.1 | 26,846 | 13 | (Albert et al., 2013) |
| Aquilegia coerulea* | Aquilegia coerulea v3.1 | 30,023 | 7 | (Filiault et al., 2018) |
| Selaginella moellendorffii* | Selaginella moellendorffii v1.0 | 22,285 | 10 | (Banks et al., 2011) |


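Table 2: Number of transcription factors identified in each family across species.

| Superfamily | TF family | Ca. cajan | Ph. vulgaris | Vi. radiata | Vi. angularis | Gl. max | Gl. soja | Ci. reticulatum | Ci. arietinum | Me. truncatula | Gl. uralensis | Lo. japonicus | Lu. angustifolius | Ar. ipaensis | Ar. duranensis | Ch. fasciculata | Ar. thaliana | Vi. vinifera | Aq. coerulea | Am. trichopoda | Se. moellendorffii |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AP2/ERF | AP2 | 26 | 31 | 24 | 26 | 49 | 52 | 24 | 25 | 31 | 24 | 28 | 38 | 30 | 24 | 22 | 18 | 20 | 16 | 12 | 16 |
| AP2/ERF | ERF | 147 | 149 | 156 | 161 | 290 | 279 | 115 | 129 | 186 | 112 | 107 | 189 | 143 | 130 | 131 | 122 | 72 | 78 | 64 | 35 |
| AP2/ERF | RAV | 2 | 3 | 3 | 3 | 5 | 5 | 2 | 3 | 3 | 3 | 3 | 6 | 3 | 3 | 4 | 6 | 3 | 4 | 1 | 2 |
| | BBR-BPC | 5 | 4 | 4 | 6 | 4 | 5 | 2 | 3 | 2 | 4 | 5 | 10 | 5 | 6 | 4 | 7 | 5 | 4 | 4 | 1 |
| | BES1 | 6 | 6 | 6 | 8 | 10 | 8 | 7 | 6 | 7 | 6 | 5 | 12 | 9 | 8 | 7 | 8 | 6 | 4 | 5 | 5 |
| B3 | ARF | 23 | 27 | 28 | 27 | 42 | 45 | 25 | 25 | 38 | 11 | 25 | 43 | 31 | 30 | 22 | 22 | 18 | 13 | 13 | 7 |
| B3 | B3 | 49 | 48 | 36 | 33 | 77 | 79 | 23 | 35 | 132 | 49 | 61 | 52 | 69 | 68 | 29 | 73 | 29 | 104 | 18 | 19 |
| | C3H | 44 | 44 | 42 | 43 | 73 | 80 | 45 | 50 | 58 | 40 | 54 | 65 | 48 | 45 | 37 | 50 | 47 | 40 | 32 | 29 |
| C2C2 Zn-finger | CO-like | 10 | 13 | 12 | 10 | 22 | 22 | 10 | 10 | 11 | 11 | 8 | 20 | 11 | 11 | 10 | 17 | 6 | 8 | 5 | 4 |
| C2C2 Zn-finger | Dof | 37 | 42 | 42 | 40 | 74 | 73 | 34 | 37 | 40 | 41 | 30 | 67 | 39 | 38 | 39 | 36 | 22 | 29 | 18 | 11 |
| C2C2 Zn-finger | GATA | 33 | 32 | 29 | 27 | 61 | 63 | 26 | 27 | 42 | 30 | 20 | 45 | 25 | 25 | 26 | 30 | 20 | 29 | 21 | 6 |
| C2C2 Zn-finger | LSD | 7 | 4 | 4 | 4 | 5 | 4 | 4 | 4 | 4 | 2 | 5 | 7 | 4 | 4 | 3 | 3 | 3 | 3 | 2 | 2 |
| C2C2 Zn-finger | YABBY | 9 | 8 | 10 | 9 | 12 | 13 | 7 | 8 | 7 | 8 | 7 | 14 | 7 | 8 | 9 | 6 | 7 | 5 | 6 | 0 |
| C2C2 Zn-finger | C2H2 | 223 | 134 | 128 | 128 | 219 | 218 | 78 | 108 | 111 | 108 | 93 | 183 | 136 | 127 | 157 | 104 | 64 | 87 | 85 | 35 |
| | CAMTA | 10 | 8 | 11 | 6 | 14 | 15 | 7 | 7 | 8 | 8 | 7 | 9 | 8 | 7 | 6 | 6 | 5 | 5 | 3 | 5 |
| | CPP | 8 | 9 | 10 | 7 | 16 | 15 | 7 | 8 | 12 | 9 | 10 | 13 | 14 | 13 | 9 | 10 | 8 | 7 | 6 | 5 |
| | DBB | 11 | 12 | 14 | 13 | 16 | 16 | 6 | 7 | 8 | 8 | 8 | 13 | 7 | 8 | 10 | 8 | 8 | 5 | 4 | 4 |
| | E2F/DP | 7 | 7 | 7 | 9 | 12 | 13 | 6 | 6 | 6 | 10 | 7 | 12 | 10 | 9 | 6 | 8 | 7 | 7 | 5 | 4 |
| MADS | M-type | 48 | 44 | 23 | 31 | 77 | 80 | 34 | 43 | 101 | 29 | 34 | 38 | 28 | 23 | 32 | 65 | 17 | 50 | 19 | 12 |
| MADS | MIKC | 23 | 34 | 47 | 27 | 75 | 71 | 16 | 51 | 38 | 14 | 21 | 27 | 44 | 48 | 28 | 42 | 35 | 24 | 15 | 3 |
| | EIL | 6 | 7 | 4 | 5 | 10 | 12 | 6 | 7 | 13 | 9 | 7 | 9 | 7 | 6 | 17 | 6 | 2 | 2 | 2 | 6 |
| | FAR1 | 49 | 25 | 67 | 20 | 68 | 79 | 17 | 37 | 76 | 79 | 29 | 10 | 298 | 198 | 41 | 17 | 19 | 92 | 10 | 0 |
| | LFY | 1 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 2 | 0 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| GARP | ARR-B | 15 | 15 | 16 | 18 | 31 | 34 | 13 | 14 | 28 | 14 | 10 | 19 | 12 | 11 | 9 | 15 | 12 | 13 | 7 | 6 |
| GARP | G2-like | 44 | 50 | 52 | 46 | 96 | 100 | 36 | 41 | 44 | 49 | 39 | 78 | 49 | 46 | 47 | 42 | 39 | 34 | 27 | 20 |
| | GRAS | 57 | 55 | 58 | 58 | 108 | 111 | 47 | 46 | 66 | 53 | 62 | 54 | 49 | 48 | 52 | 34 | 43 | 36 | 44 | 47 |
| | GRF | 10 | 10 | 9 | 8 | 20 | 20 | 8 | 8 | 8 | 10 | 9 | 15 | 11 | 11 | 10 | 9 | 8 | 7 | 6 | 4 |
| | GeBP | 7 | 5 | 8 | 5 | 9 | 11 | 7 | 8 | 7 | 5 | 4 | 12 | 4 | 7 | 4 | 23 | 1 | 13 | 6 | 1 |
| Homeobox | HB-PHD | 2 | 3 | 3 | 3 | 6 | 6 | 2 | 2 | 2 | 3 | 3 | 2 | 4 | 3 | 3 | 2 | 2 | 2 | 2 | 1 |
| Homeobox | HD-ZIP | 51 | 55 | 54 | 54 | 89 | 92 | 45 | 49 | 57 | 54 | 40 | 82 | 45 | 43 | 45 | 48 | 33 | 30 | 22 | 9 |
| Homeobox | TALE | 34 | 32 | 31 | 28 | 57 | 61 | 22 | 24 | 23 | 32 | 25 | 45 | 28 | 30 | 25 | 21 | 22 | 17 | 12 | 7 |
| Homeobox | WOX | 18 | 18 | 20 | 61 | 32 | 33 | 14 | 18 | 19 | 17 | 14 | 31 | 15 | 15 | 13 | 16 | 10 | 10 | 9 | 8 |
| Homeobox | Other | 6 | 7 | 8 | 7 | 15 | 13 | 8 | 8 | 7 | 8 | 9 | 12 | 9 | 7 | 6 | 6 | 7 | 6 | 5 | 4 |
| | HRT-like | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 2 |
| | HSF | 27 | 29 | 32 | 32 | 46 | 45 | 20 | 22 | 28 | 32 | 10 | 33 | 22 | 22 | 32 | 24 | 18 | 15 | 12 | 6 |
| | LBD | 52 | 48 | 47 | 49 | 75 | 75 | 35 | 47 | 59 | 53 | 41 | 70 | 52 | 49 | 52 | 41 | 43 | 29 | 23 | 13 |
| MYB | MYB | 173 | 170 | 172 | 169 | 294 | 288 | 80 | 132 | 162 | 146 | 100 | 210 | 136 | 135 | 152 | 146 | 138 | 97 | 61 | 21 |
| MYB | MYB related | 84 | 82 | 73 | 75 | 162 | 165 | 53 | 62 | 93 | 95 | 62 | 107 | 76 | 70 | 62 | 72 | 57 | 55 | 36 | 34 |
| | NAC | 91 | 90 | 92 | 90 | 142 | 139 | 61 | 77 | 97 | 72 | 81 | 117 | 85 | 84 | 88 | 112 | 70 | 76 | 45 | 21 |
| | NF-X1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 2 | 2 |
| NF-Y | NF-YA | 11 | 9 | 9 | 9 | 16 | 18 | 7 | 8 | 8 | 9 | 8 | 13 | 12 | 11 | 7 | 10 | 7 | 5 | 5 | 1 |
| NF-Y | NF-YB | 23 | 19 | 23 | 18 | 36 | 36 | 19 | 22 | 24 | 23 | 18 | 25 | 16 | 14 | 20 | 13 | 17 | 14 | 9 | 7 |
| NF-Y | NF-YC | 14 | 15 | 14 | 14 | 22 | 22 | 11 | 11 | 15 | 14 | 9 | 16 | 13 | 12 | 10 | 14 | 8 | 10 | 8 | 5 |
| | NZZ/SPL | 1 | 3 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 2 | 3 | 4 | 3 | 2 | 1 | 2 | 2 | 1 | 3 |
| | Nin-like | 13 | 12 | 11 | 12 | 27 | 26 | 10 | 9 | 13 | 12 | 9 | 15 | 16 | 12 | 11 | 14 | 8 | 9 | 7 | 7 |
| | S1Fa-like | 2 | 4 | 3 | 3 | 4 | 4 | 3 | 4 | 5 | 3 | 3 | 6 | 2 | 2 | 5 | 3 | 3 | 1 | 1 | 0 |
| | SAP | 3 | 5 | 3 | 1 | 5 | 5 | 2 | 2 | 3 | 3 | 4 | 5 | 5 | 5 | 1 | 4 | 3 | 3 | 2 | 1 |
| | SBP | 24 | 23 | 22 | 23 | 42 | 40 | 17 | 19 | 24 | 25 | 16 | 35 | 19 | 18 | 19 | 16 | 18 | 13 | 12 | 9 |
| | SRS | 11 | 10 | 11 | 10 | 22 | 22 | 8 | 8 | 11 | 10 | 11 | 12 | 11 | 10 | 10 | 11 | 6 | 4 | 6 | 4 |
| | STAT | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 0 | 2 | 1 | 1 | 1 | 2 |
| | TCP | 25 | 27 | 27 | 27 | 55 | 56 | 19 | 24 | 21 | 24 | 24 | 42 | 30 | 23 | 23 | 24 | 15 | 16 | 14 | 4 |
| | Trihelix | 38 | 41 | 41 | 42 | 70 | 74 | 28 | 38 | 36 | 36 | 32 | 64 | 40 | 43 | 39 | 30 | 26 | 33 | 30 | 34 |
| | VOZ | 2 | 3 | 3 | 3 | 4 | 5 | 2 | 2 | 2 | 3 | 2 | 4 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 0 |
| | WRKY | 96 | 90 | 95 | 94 | 171 | 170 | 59 | 81 | 104 | 79 | 72 | 112 | 85 | 84 | 71 | 73 | 59 | 38 | 31 | 12 |
| | Whirly | 3 | 3 | 3 | 5 | 7 | 7 | 3 | 3 | 3 | 2 | 3 | 4 | 2 | 3 | 3 | 3 | 2 | 3 | 2 | 1 |
| | ZF-HD | 18 | 19 | 18 | 18 | 41 | 45 | 14 | 15 | 17 | 15 | 20 | 26 | 13 | 15 | 31 | 17 | 10 | 16 | 9 | 7 |
| | bHLH | 156 | 164 | 161 | 159 | 321 | 320 | 110 | 141 | 162 | 141 | 129 | 207 | 150 | 145 | 137 | 142 | 106 | 99 | 72 | 45 |
| | bZIP | 72 | 79 | 85 | 85 | 141 | 146 | 61 | 70 | 91 | 73 | 64 | 128 | 73 | 74 | 65 | 77 | 50 | 46 | 41 | 28 |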


Table 3: Prevalence of different modes of duplication among transcription factors.

| Species | Total TFs | Singletons (%) | Duplicates (%) | SD | TD | PD | rTE | dTE | DD |
|---|---|---|---|---|---|---|---|---|---|
| Ca. cajan | 1970 | 314 (15.94) | 1656 (84.06) | 548 | 90 | 21 | 40 | 397 | 560 |
| Ph. vulgaris | 1891 | 257 (13.59) | 1634 (86.41) | 914 | 99 | 8 | 16 | 244 | 353 |
| Vi. radiata | 1918 | 253 (13.19) | 1665 (86.81) | 743 | 115 | 9 | 24 | 319 | 455 |
| Vi. angularis | 1877 | 307 (16.36) | 1570 (83.64) | 842 | 74 | 9 | 14 | 220 | 411 |
| Gl. max | 3407 | 188 (5.52) | 3219 (94.48) | 2645 | 79 | 14 | 28 | 242 | 211 |
| Gl. soja | 3445 | 224 (6.5) | 3221 (93.5) | 2648 | 85 | 13 | 24 | 231 | 220 |
| Ci. reticulatum | 1332 | 365 (27.4) | 967 (72.6) | 350 | 45 | 3 | 24 | 207 | 338 |
| Ci. arietinum | 1660 | 339 (20.42) | 1321 (79.58) | 521 | 101 | 7 | 17 | 267 | 408 |
| Me. truncatula | 2182 | 430 (19.71) | 1752 (80.29) | 648 | 209 | 32 | 27 | 235 | 601 |
| Gl. uralensis | 1738 | 403 (23.19) | 1335 (76.81) | 482 | 36 | 3 | 19 | 281 | 514 |
| Lo. japonicus | 1514 | 428 (28.27) | 1086 (71.73) | 195 | 39 | 6 | 25 | 345 | 476 |
| Lu. angustifolius | 2493 | 223 (8.95) | 2270 (91.05) | 1654 | 51 | 0 | 0 | 0 | 565 |
| Ar. ipaensis | 2073 | 365 (17.61) | 1708 (82.39) | 336 | 113 | 21 | 32 | 434 | 772 |
| Ar. duranensis | 1902 | 363 (19.09) | 1539 (80.91) | 410 | 95 | 5 | 27 | 373 | 629 |
| Ch. fasciculata | 1709 | 426 (24.93) | 1283 (75.07) | 186 | 36 | 2 | 36 | 383 | 640 |
| Ar. thaliana | 1736 | 392 (22.58) | 1344 (77.42) | 707 | 81 | 13 | 15 | 152 | 376 |
| Vi. vinifera | 1274 | 478 (37.52) | 796 (62.48) | 303 | 82 | 13 | 5 | 193 | 200 |
| Aq. coerulea | 1376 | 552 (40.12) | 824 (59.88) | 130 | 86 | 14 | 0 | 0 | 594 |
| Am. trichopoda | 923 | 560 (60.67) | 363 (39.33) | 14 | 33 | 4 | 0 | 0 | 312 |
| Se. moellendorffii | 588 | 278 (47.28) | 310 (52.72) | 79 | 10 | 4 | 0 | 0 | 217 |

Abbreviations: Segmental duplicates (SD); Tandem duplicates (TD); Proximal duplicates (PD); Retrotransposon mediated duplicates (rTE); Transposon mediated duplicates (dTE); Dispersed duplicates (DD).
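The percentages in Table 3 follow directly from the counts: the six duplication-mode columns add up to the number of duplicates, and duplicates plus singletons add up to the total. The sketch below recomputes these values for three rows copied from the table. It is only an illustration of the arithmetic, not the pipeline used in the study; the names `TABLE3_ROWS`, `MODES`, and `summarize` are hypothetical.

```python
# Rows copied from Table 3:
# (species, total TFs, singletons, SD, TD, PD, rTE, dTE, DD)
TABLE3_ROWS = [
    ("Gl. max", 3407, 188, 2645, 79, 14, 28, 242, 211),
    ("Vi. vinifera", 1274, 478, 303, 82, 13, 5, 193, 200),
    ("Am. trichopoda", 923, 560, 14, 33, 4, 0, 0, 312),
]
MODES = ("SD", "TD", "PD", "rTE", "dTE", "DD")


def summarize(row):
    """Return duplicate count, duplicate percentage and per-mode shares."""
    species, total, singletons, *mode_counts = row
    duplicates = sum(mode_counts)
    # Consistency check: singletons plus duplicates must equal the total.
    if singletons + duplicates != total:
        raise ValueError(f"counts do not add up for {species}")
    # Share of each duplication mode among all duplicated TFs (hypothetical
    # derived quantity, not a column of the original table).
    shares = {
        mode: round(100 * count / duplicates, 2)
        for mode, count in zip(MODES, mode_counts)
    }
    return {
        "duplicates": duplicates,
        "duplicate_pct": round(100 * duplicates / total, 2),
        "mode_share_pct": shares,
    }


SUMMARY = {row[0]: summarize(row) for row in TABLE3_ROWS}

if __name__ == "__main__":
    for species, stats in SUMMARY.items():
        print(species, stats["duplicates"], stats["duplicate_pct"],
              stats["mode_share_pct"]["SD"])
```

The per-mode share is an extra derived quantity that makes the contrast between species with recent polyploidization (SD-dominated, as in Gl. max) and the outgroups easier to see.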


Table 4: Percentage of singletons within collinear regions in a reference species.

| Species | Reference outgroup species | Singletons | Singletons in collinear regions | SD | SD in collinear regions in the reference outgroup |
|---|---|---|---|---|---|
| Ca. cajan | Vi. vinifera | 314 | 132 (42%) | 548 | 430 (78%) |
| Ph. vulgaris | Vi. vinifera | 257 | 142 (55%) | 914 | 750 (82%) |
| Vi. radiata | Vi. vinifera | 253 | 120 (47%) | 743 | 595 (80%) |
| Vi. angularis | Vi. vinifera | 307 | 161 (52%) | 842 | 659 (78%) |
| Gl. max | Ph. vulgaris | 188 | 93 (49%) | 2645 | 2546 (96%) |
| Gl. soja | Ph. vulgaris | 224 | 105 (47%) | 2648 | 2546 (96%) |
| Ci. reticulatum | Vi. vinifera | 365 | 199 (55%) | 350 | 286 (81%) |
| Ci. arietinum | Vi. vinifera | 339 | 201 (59%) | 521 | 414 (79%) |
| Me. truncatula | Vi. vinifera | 430 | 184 (43%) | 648 | 520 (80%) |
| Gl. uralensis | Vi. vinifera | 403 | 170 (42%) | 482 | 376 (78%) |
| Lo. japonicus | Vi. vinifera | 428 | 177 (41%) | 195 | 154 (78%) |
| Lu. angustifolius | Vi. vinifera | 223 | 73 (32%) | 1654 | 1114 (67%) |
| Ar. ipaensis | Vi. vinifera | 365 | 171 (47%) | 336 | 275 (81%) |
| Ar. duranensis | Vi. vinifera | 363 | 152 (42%) | 410 | 327 (79%) |
| Ch. fasciculata | Vi. vinifera | 426 | 144 (34%) | 186 | 95 (51%) |
| Ar. thaliana | Vi. vinifera | 392 | 226 (58%) | 707 | 512 (72%) |
| Vi. vinifera | Aq. coerulea | 478 | 295 (62%) | 303 | 285 (94%) |
| Aq. coerulea | Am. trichopoda | 552 | 177 (32%) | 130 | 65 (50%) |
| Am. trichopoda | Se. moellendorffii | 560 | 0 (0%) | 14 | 0 (0%) |

Abbreviations: Segmental duplicates (SD).


Table 5: Number of genes and prevalence of modes of duplication in TF orthologous groups that expanded in legumes.

| Species | Total | SD | TD | PD | rTE | dTE | DD |
|---|---|---|---|---|---|---|---|
| Ca. cajan | 38 | 14 | 3 | 0 | 0 | 7 | 14 |
| Ph. vulgaris | 36 | 14 | 6 | 1 | 0 | 7 | 8 |
| Vi. radiata | 36 | 15 | 4 | 1 | 0 | 12 | 4 |
| Vi. angularis | 38 | 14 | 9 | 0 | 0 | 9 | 6 |
| Gl. max | 67 | 48 | 5 | 3 | 0 | 10 | 1 |
| Gl. soja | 69 | 50 | 2 | 3 | 0 | 13 | 1 |
| Ci. reticulatum | 22 | 6 | 0 | 0 | 0 | 7 | 9 |
| Ci. arietinum | 24 | 11 | 0 | 2 | 0 | 1 | 10 |
| Me. truncatula | 39 | 11 | 4 | 2 | 0 | 12 | 10 |
| Gl. uralensis | 31 | 12 | 0 | 0 | 0 | 4 | 15 |
| Lo. japonicus | 28 | 6 | 2 | 0 | 0 | 13 | 7 |
| Lu. angustifolius | 60 | 42 | 6 | 0 | 0 | 0 | 12 |
| Ar. ipaensis | 31 | 11 | 4 | 1 | 0 | 9 | 6 |
| Ar. duranensis | 32 | 9 | 5 | 1 | 0 | 10 | 7 |
| Ch. fasciculata | 34 | 2 | 6 | 0 | 0 | 13 | 13 |

Abbreviations: Segmental duplicates (SD); Tandem duplicates (TD); Proximal duplicates (PD); Retrotransposon mediated duplicates (rTE); Transposon mediated duplicates (dTE); Dispersed duplicates (DD).


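Table 6: Orthologous groups with significantly (p-value < 0.05) rapid expansion in legumes.

| Description | ARF:1 | ARF:2 | bHLH:5 | bHLH:12 | ERF:10 | LBD:8 | M-type:1 | MYB:1 | MYB:25 | NAC:3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ca. cajan | 4 | 2 | 3 | 3 | 4 | 3 | 7 | 5 | 3 | 4 |
| Ph. vulgaris | 3 | 3 | 4 | 2 | 4 | 3 | 5 | 6 | 2 | 4 |
| Vi. radiata | 4 | 3 | 4 | 2 | 3 | 4 | 4 | 5 | 3 | 4 |
| Vi. angularis | 3 | 3 | 4 | 2 | 4 | 3 | 5 | 8 | 3 | 3 |
| Gl. max | 5 | 6 | 8 | 6 | 5 | 4 | 14 | 9 | 5 | 5 |
| Gl. soja | 6 | 7 | 8 | 5 | 6 | 4 | 14 | 9 | 5 | 5 |
| Ci. reticulatum | 3 | 3 | 2 | 4 | 3 | 1 | 4 | 1 | 1 | 3 |
| Ci. arietinum | 3 | 3 | 2 | 4 | 3 | 2 | 4 | 1 | 1 | 3 |
| Me. truncatula | 3 | 3 | 3 | 4 | 2 | 2 | 7 | 5 | 7 | 3 |
| Gl. uralensis | 0 | 3 | 3 | 2 | 4 | 4 | 6 | 2 | 3 | 4 |
| Lo. japonicus | 3 | 3 | 3 | 3 | 3 | 2 | 4 | 4 | 1 | 3 |
| Lu. angustifolius | 5 | 5 | 6 | 5 | 5 | 6 | 18 | 2 | 3 | 5 |
| Ar. ipaensis | 4 | 3 | 4 | 2 | 3 | 3 | 4 | 3 | 1 | 5 |
| Ar. duranensis | 4 | 3 | 4 | 2 | 2 | 2 | 5 | 3 | 2 | 5 |
| Ch. fasciculata | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 4 | 5 | 3 |
| Ar. thaliana | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| Vi. vinifera | 1 | 1 | 1 | 1 | 0 | 2 | 1 | 1 | 0 | 2 |
| Aq. coerulea | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| Am. trichopoda | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Se. moellendorffii | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |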


Figures

Figure 1: Absolute and relative number of transcription factors in each species. Grey bars and the orange line represent the absolute number and percentage of transcription factors in each species, respectively. Legumes and non-legumes are separated by a dotted vertical line.


Figure 2: Ratio (in log2 scale) of the sizes of each transcription factor family in each species in relation to Vi. vinifera. Values greater or smaller than zero represent transcription factor families that are relatively larger or smaller in a given species (in columns) in comparison to Vi. vinifera, respectively. The numbers in parentheses stand for the absolute size of that particular family in Vi. vinifera.
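The quantity plotted in Figure 2 can be illustrated with a few family sizes copied from Table 2. The sketch below computes log2(size in species / size in Vi. vinifera). It makes one assumption that the caption does not state: when either count is zero the ratio is reported as undefined (`None`), since the handling of empty families in the original figure is not described. The names `FAMILY_SIZES` and `log2_size_ratio` are hypothetical.

```python
import math

# Family sizes copied from Table 2 for a few species and families.
FAMILY_SIZES = {
    "ERF": {"Gl. max": 290, "Lo. japonicus": 107, "Vi. vinifera": 72},
    "EIL": {"Gl. max": 10, "Lo. japonicus": 7, "Vi. vinifera": 2},
    "HSF": {"Gl. max": 46, "Lo. japonicus": 10, "Vi. vinifera": 18},
    "STAT": {"Ch. fasciculata": 0, "Vi. vinifera": 1},
}


def log2_size_ratio(family, species, reference="Vi. vinifera"):
    """log2 of family size in `species` relative to `reference`.

    Returns None when either count is zero (assumption: the source does not
    say how empty families were treated).
    """
    size = FAMILY_SIZES[family][species]
    ref_size = FAMILY_SIZES[family][reference]
    if size == 0 or ref_size == 0:
        return None
    return math.log2(size / ref_size)


if __name__ == "__main__":
    for family, species in [("ERF", "Gl. max"), ("HSF", "Lo. japonicus"),
                            ("STAT", "Ch. fasciculata")]:
        print(family, species, log2_size_ratio(family, species))
```

Positive values mark families that are larger than in Vi. vinifera (ERF in Gl. max is roughly four times larger), and negative values mark smaller families (HSF in Lo. japonicus).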


Figure 3: Expression levels and Ka/Ks ratio of singleton transcription factors. A. Expression (in FPKM) of syntenic and non-syntenic singletons in three legume species and in Arabidopsis thaliana. B. Ka/Ks distribution of syntenic singletons and syntenic segmental duplicates. Syntenic singletons are transcription factor genes, without a close paralog, that are located in a syntenic region in a reference outgroup. Segmental duplicates are paralogous transcription factors with preserved synteny in the same genome, as well as in the genome of a reference outgroup. Ph. vulgaris was used as reference for Gl. max and Vi. vinifera was used as reference for the other three species. Statistical significance was assessed with the Mann-Whitney U test; an asterisk (*) indicates p-value < 0.05.


Figure 4: Species tree showing number of transcription factor orthologous groups that gained or lost genes. We used different rates of evolution in different lineages, which are represented as branch styles/colors. Known polyploidization events are marked with stars. Green and red triangles refer to nodes with more expansions and contractions, respectively. Numbers of expanded and contracted orthologous groups are shown in green and red, respectively.


Figure 5: Distribution of synonymous substitution rates (Ks) of 138 orthologous groups with gene gain in legumes. Genome-wide Ks distributions are shown as density plots on the top panel. The bottom panel shows Ks distributions of segmental and dispersed duplicate gene pairs.


Figure 6: A. Phylogenetic reconstruction of the orthologous group bHLH:12, which is expanded in legumes. B. Gene expression patterns of bHLH:12 genes in Me. truncatula (BioProject: PRJNA80163), Gl. max (Libault et al., 2010) and Ph. vulgaris (O’Rourke et al., 2014), showing a trend for greater expression in roots and nodules.
