Polyploidization events shaped the transcription factor repertoires in legumes (Fabaceae)
Kanhu C. Moharana and Thiago M. Venancio*
Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro; Campos dos Goytacazes, Brazil.
*Corresponding author
Av. Alberto Lamego 2000 / P5 / 217; Parque Califórnia
Campos dos Goytacazes, RJ
Brazil
CEP: [email protected]
bioRxiv preprint doi: https://doi.org/10.1101/849778; this version posted November 21, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Abstract
Transcription factors (TFs) are essential for proper plant growth and development. Several legumes, particularly soybean, are rich sources of protein and oil, with great impact on the economy of several countries. Here we report a phylogenomic analysis of major TF families in legumes and their potential association with important traits such as nitrogen fixation and seed development. We used TF DNA-binding domains to systematically screen the genomes of 15 legume and 5 non-legume species. The percentage of TFs ranged from 3-8% of the gene complements. TF orthologous groups (OGs) in extant species were used to estimate OG sizes in ancestral nodes using a gene birth-death model, which allowed us to identify lineage-specific expansions. Together, OG analysis and the rate of synonymous substitutions (Ks) between gene pairs show that major TF expansions are strongly associated with known whole-genome duplication (WGD) events in the legume (~58 million years ago, mya) and Glycine (~13 mya) lineages, which account for a large fraction of the Ph. vulgaris and Gl. max TF repertoires. Out of the 3407 Gl. max TFs, 1808 and 676 can be traced back to a single homeolog in Ph. vulgaris and Vi. vinifera, respectively. We found a trend for TFs expanded in legumes to be preferentially transcribed in roots and nodules, suggesting their recruitment early in the evolution of nodulation in the legume clade. We also found TF expansions in the Glycine WGD that were followed by gene loss in the wild soybean Gl. soja, including genes located within important quantitative trait loci. Together, our findings strongly support the roles of two WGDs in shaping the TF repertoires in the legume and Glycine lineages, which are likely related to important aspects of legume and soybean biology.
Keywords: whole genome duplication, phylogenomics, segmental duplication, nodulation, soybean.
Abbreviations:
TF: transcription factors; DBD: DNA binding domains; AP2: APETALA2; ERF: Ethylene Responsive Factor; RAV: Related to ABI3/VP1; ARF: Auxin response factor; BBR-BPC: Barley B Recombinant (BBR)-BASIC PENTACYSTEINE1 (BPC1); BES1: BRI1-EMS-SUPPRESSOR; bHLH: Basic helix loop helix; bZIP: Basic leucine zipper; Dof: DNA binding with one finger; CO-like: CONSTANS-like; LSD: LESION SIMULATING DISEASE1 (LSD1); C2H2: CCHH (Zn); C3H: CCCH (Zn); CAMTA: Calmodulin binding transcription factors; CPP: Cysteine-rich polycomb-like protein; DBB: Double B-box zinc finger; E2F/DP: E2 factor protein and DP protein; EIL: Ethylene-Insensitive 3 (EIN3)-like protein 3 (EIL3); FAR1: FAR-RED IMPAIRED RESPONSE1; LFY: LEAFY; G2-like: Golden2 (G2)-like; ARR-B: Type-B phospho-accepting response regulator; GeBP: GLABROUS1 enhancer-binding protein;
GRAS: GAI, RGA, and SCR; GRF: GROWTH-REGULATING FACTOR; TALE: Three Amino acid Loop Extension; WOX: WUS homeobox-containing protein family; HB: Homeobox; HB-PHD: HB-PHD finger; HB-other: HB-other; HRT-like: Hairy-Related transcription-factor-like; HSF: Heat shock factor; LBD: ASYMMETRIC LEAVES2/LATERAL ORGAN BOUNDARIES; M-type: MADS-type I; MIKC: MADS-type II; MYB: Myb proto-oncogene protein; NAC: NAM, ATAF1,2 and CUC2; NF-X1: Nuclear factor, X-box binding 1; NF-YA: Nuclear factor Y subunit A; NF-YB: Nuclear factor Y subunit B; NF-YC: Nuclear factor Y subunit C; Nin-like: NODULE INCEPTION; NZZ/SPL: SPOROCYTELESS/NOZZLE; S1Fa-like: S1Fa-like; TCP: TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1; ZF-HD: Zinc finger homeodomain protein; SBP: SQUAMOSA promoter binding protein; SRS: SHI RELATED SEQUENCE; SAP: STERILE APETALA; STAT: Signal Transducers and Activators of Transcription; VOZ: Vascular plant One-Zinc finger.
Introduction
Legumes (Fabaceae) are the third largest Angiosperm family, comprising nearly 20,000 species with tremendous morphological and ecological variation (Lewis, 2005). Legumes are well known for their symbiotic interactions with specific diazotrophic bacteria, a feature of major ecological and agronomic relevance. Out of the six Fabaceae subfamilies, Papilionoideae alone accounts for two-thirds of all legume species, including economically important crops, such as Glycine max (soybean), Phaseolus vulgaris (common bean), Arachis hypogaea (peanut), and Cicer arietinum (chickpea) (Graham and Vance, 2003; Cardoso et al., 2012; Azani et al., 2017). Several legume grains and pulses are rich sources of dietary proteins, cooking oils, and biofuels. Currently, at least 15 legume genomes are publicly available (Table 1), including those from wild and cultivated soybean (Glycine soja and Gl. max). Nevertheless, Fabaceae subfamilies other than Papilionoideae are largely underrepresented among sequenced genomes, despite the recently published Chamaecrista fasciculata and Mimosa pudica genomes (Griesmann et al., 2018).
Virtually all major biological processes are at least partially regulated at the transcriptional level by specific DNA-binding transcription factors (TFs), which bind to cis-regulatory elements of target genes by means of DNA binding domains (DBDs). Because of their key regulatory roles, TFs have been extensively demonstrated to be critical for plant evolution and adaptation to multiple environments (Doebley and Lukens, 1998; Lehti-Shiu et al., 2017). A typical TF family encodes proteins sharing a common DBD. Over 50 TF families have been identified in plants (Jin et al., 2017), many of which regulate biological processes such as growth, development, stress signaling, and defense against pathogens (Lata et al., 2011; Tripathi et al., 2016; He et al., 2018).
Although many TF families are present throughout eukaryotes, their sizes can vary considerably (Lespinet et al., 2002; Nagata et al., 2016; Lehti-Shiu et al., 2017). Plant TF families are usually larger than their animal counterparts (Shiu, 2005), and the exceptional increase in TF family sizes in higher plants is often related to whole genome duplication (WGD) and triplication (WGT) events, also known as polyploidization (Lehti-Shiu et al., 2017).
Polyploidization is widely accepted as an important factor in angiosperm evolution (Pickett and Meeks-Wagner, 1995; Soltis et al., 2015). All core eudicots share at least one polyploidization event (a hexaploidy event, i.e. the γ polyploidy) (Jaillon et al., 2007; Aköz and Nordborg, 2019). With the increasing availability of plant genomes, it became clear that several other WGD events occurred after the γ polyploidization event. For example, Arabidopsis has two additional lineage-specific WGDs (α and β polyploidy) (Blanc et al., 2000; Vision et al., 2000), whereas legumes share a tetraploid ancestor that originated ~58-60 million years ago (mya) (Cannon et al., 2006). Further analysis of the soybean and narrowleaf lupin (Lupinus angustifolius) genomes uncovered additional lineage-specific WGDs in these lineages (Schmutz et al., 2010; Kroc et al., 2014; Hane et al., 2017). Importantly, domestication of polyploid species is more likely than that of their wild relatives (Salman-Minkov et al., 2016), supporting the importance of such events in agriculture. In addition to large scale duplication events, small scale duplications (SSDs) or local (e.g. tandem) duplications also contributed to the expansion of multiple gene families (Cannon et al., 2004).
The relative contribution of duplication modes in plant genomes is a subject of intense research (Panchy et al., 2016). Upon WGD, gene loss is the most common fate of one of the duplicates, in a process called fractionation (Freeling et al., 2015; Panchy et al., 2016; Cheng et al., 2018). Nevertheless, some gene families (e.g. TFs and signal transduction genes) are apparently more prone to retain duplicated copies than others (Blanc and Wolfe, 2004; Lehti-Shiu et al., 2017). Different mechanistic explanations have been proposed for this phenomenon, of which the gene balance hypothesis is the most widely accepted. According to this hypothesis, upon a WGD, genes with many interaction partners have a higher probability of being retained in duplicate, since alterations in the stoichiometry of their protein products tend to be deleterious (Birchler and Veitia, 2007; Freeling, 2009; Birchler and Veitia, 2011). Retained copies then typically evolve via subfunctionalization (i.e. duplicates acquire complementary functions) or neofunctionalization (i.e. one of the copies evolves a new function) (Freeling et al., 2015).
It is currently accepted that the transition from wild to domesticated soybean took place in Central China between 5,000 and 9,000 years ago through a gradual process that involved an intermediary species, Gl. gracilis (Han et al., 2016; Sedivy et al.,
2017). Other lines of evidence support independent domestication events in East Asia (Korea and Japan) (Zhou et al., 2015; Sedivy et al., 2017). Artificial selection during domestication involved several distinct traits, such as pod shattering, seed hardness, adaptation to different photoperiods, flowering time, and stress resistance (Zhou et al., 2015; Sedivy et al., 2017). Several important TFs are involved in these traits. For example, SHAT1-5 (a NAC family TF) promotes shattering resistance by increasing lignification of fiber cap cells (Dong et al., 2014).
The progress in sequencing technologies and automation over the past 12 years unleashed the power of comparative and population genomics in pinpointing key genes involved in domestication and improvement of soybean and other crops, as illustrated by the discovery of many Quantitative Trait Loci (QTL) involved in commercially important traits (e.g. seed weight and oil content) by the resequencing of 302 soybean accessions (Zhou et al., 2015). In the present work we systematically investigate the evolution of TF repertoires in legumes (Fabaceae). Briefly, we performed a large-scale comparative analysis of TFs from 15 legume and five non-legume species. Our results unveil a profound impact of polyploidization events on the expansion of TF families throughout legumes. In particular, some TF families that expanded in the legume WGD event (~58 mya) are preferentially expressed in roots and nodules, supporting their importance in the evolution of nodulation. Further, TF expansions that happened at the Glycine WGD (~13 mya) include genes that were subsequently lost in Gl. soja, including TF genes that are within Gl. max QTLs associated with leaf shape, area and width, proportion of FA18 in seeds, and branch density. Together, our results strongly support the view that WGD events deeply shaped the evolution of TF repertoires in legumes and likely generated TFs that regulate nodulation and other traits of key agronomic importance.
Results and Discussion
Systematic identification and characterization of transcription factors
We used a set of diagnostic DNA binding domains and forbidden domains (Supplementary Table S1) to identify TFs in the genomes of 20 plant species (Table 1). We predicted a total of 37,008 TFs (Supplementary Table S2), which were classified in 58 broad families (Table 2). A total of 31,111 TFs were predicted in the 15 legume genomes (Supplementary Table S2). We benchmarked our pipeline by comparing the detected TFs with those previously predicted in Ar. thaliana. Out of 1713 Ar. thaliana TFs available in PlantTFDB, 1673 (98%) were correctly predicted (Supplementary Figure S1). Further, 59 TFs were exclusively predicted by our pipeline, out of which 40 were annotated as TFs in the TAIR database (https://www.arabidopsis.org) (Supplementary
Table S3). The percentage of TFs across genomes ranged from 5 to 8%, which is in line with a previous estimation from 95 eudicot species (Jin et al., 2017). Gl. soja and Se. moellendorffii showed the highest and lowest number of TFs, respectively (Figure 1). Legumes typically showed a greater number of TFs than non-legumes (Figure 1), although the variation in these fractions indicates that some TF expansions play specific roles in particular lineages.
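The benchmark against PlantTFDB reduces to two simple proportions. A minimal sketch using only the counts reported in the text; the function and variable names are illustrative and not part of the original pipeline:

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole

# Counts reported in the text for the Ar. thaliana benchmark.
planttfdb_total = 1713        # TFs available in PlantTFDB
recovered = 1673              # PlantTFDB TFs also predicted by the pipeline
pipeline_only = 59            # TFs predicted only by the pipeline
pipeline_only_in_tair = 40    # of those, annotated as TFs in TAIR

recall = percent(recovered, planttfdb_total)
tair_support = percent(pipeline_only_in_tair, pipeline_only)

print(f"PlantTFDB TFs recovered: {recall:.1f}%")
print(f"Pipeline-only TFs annotated in TAIR: {tair_support:.1f}%")
```

The first value rounds to the 98% quoted above; the second shows that roughly two-thirds of the pipeline-only predictions have independent support in TAIR.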
To better understand the different proportions of TFs across genomes, we compared TF family sizes between pairs of species. All except six TF families (i.e. AP2, GRAS, B3, Nin-like, HRT-like, and Trihelix) expanded in the basal angiosperm Am. trichopoda in comparison to the lycophyte Se. moellendorffii, supporting the contribution of ancient WGD events to the TF repertoires of seed plants (Albert et al., 2013). Although MADS TFs tightly regulate flower development, their diversification has been proposed to predate the origin of angiosperms (Albert et al., 2013). We found about twice as many MADS genes in Am. trichopoda (n=34) as in Se. moellendorffii (n=15). In particular, the MIKC-type (type II) MADS subfamily alone has increased five-fold, in spite of the higher rate of gene birth/death of the M-type (type I) MADS subfamily (Nam et al., 2004; Kumpeangkeaw et al., 2019). By analyzing TF clusters (described later) we observed that genes from two M-type MADS clusters are exclusively present in Am. trichopoda, probably as a result of a lineage-specific expansion. We also analyzed the expansion of the GRAS family in Am. trichopoda (n=44) as compared to the basal dicot Aq. coerulea (n=36), which happened via lineage-specific tandem duplications (10 genes) in the former (Supplementary Figure S3). These 10 genes belong to a single orthologous group (OG, see below) that does not have orthologs from other dicots except one from the basal eudicot Aq. coerulea. In addition, we found one more Am. trichopoda specific OG consisting of two GRAS genes (scaffold00007.332 and scaffold00007.335). There are also a few remarkable expansions in Se. moellendorffii in comparison to Am. trichopoda (e.g. HD-Zip, NAC, TCP, GATA, expanded by more than twofold) (Table 2). Together, these results support a growth of the TF repertoire early in the diversification of angiosperms.
Aq. coerulea is an ancient tetraploid and this tetraploidy was likely an important first step towards the gamma hexaploidy (4n+2n) that is shared by all core eudicots (Aköz and Nordborg, 2019). Nevertheless, some TF families are remarkably larger in Aq. coerulea than in Vi. vinifera, such as FAR1 (Aco: 92, Vvi: 19), B3 (Aco: 104, Vvi: 29), GeBP (Aco: 13, Vvi: 1), and M-type MADS (Aco: 50, Vvi: 17) (Figure 2; Table 2). After the gamma hexaploidization event, Vi. vinifera has not undergone any large scale duplication event, making it a suitable reference for comparative analysis with other core eudicots (Jaillon et al., 2007; Severin et al., 2011; Wang et al., 2017). Most large families are expanded in Ar. thaliana and legumes in comparison to Vi. vinifera (Figure
2). There are also some notable species-specific expansions in legumes, such as that of FAR1, B3, and M-type MADS in Me. truncatula (Figure 2). FAR1 has also expanded 10 times in Ar. ipaensis and Ar. duranensis. FAR1 has been linked with skotomorphogenesis and photomorphogenesis in higher plants and its expansion might be related to the fructification process in peanuts (Chen et al., 2016; Lu et al., 2018). Unlike the above-mentioned expansions of MADS TFs in Am. trichopoda, only M-type (type I) MADS had large expansions in all legumes in comparison to Vi. vinifera, as previously discussed (Nam et al., 2004; Feil et al., 2013; Kumpeangkeaw et al., 2019).
The Glycine genus has a more recent WGD that is not shared with Phaseolus. Accordingly, we found an approximate ratio of 1:2 between Ph. vulgaris and Gl. max in 90% (52/58) of the TF families, implying that the Glycine WGD has strongly contributed to the soybean TF repertoire. There are also a few deviations from this trend, such as the NAC (Gl. max: 142 and Ph. vulgaris: 90) and NOZZLE/SPL (Gl. max: 2 and Ph. vulgaris: 3) families. Of the 42 NAC OGs with genes from Ph. vulgaris and Gl. max, 9 have an identical number of genes, indicating that there are subfamilies that rapidly reverted to their configuration before the Glycine WGD, probably due to gene dosage sensitivity. We also noticed that Gl. soja has 38 more TFs than Gl. max. While sixteen TF families had an identical number of genes in both species, others such as ERF (Gl. max: 290, Gl. soja: 279) and FAR1 (Gl. max: 68, Gl. soja: 79) showed significant variation between cultivated and wild soybeans.
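The 1:2 comparison between Ph. vulgaris and Gl. max can be illustrated with a small check on per-family counts. This is only a sketch: the text does not define how "approximate" was judged, so the tolerance below is an assumption, and only the two deviating families whose counts are quoted above are used as data.

```python
def is_about_one_to_two(pvu_count: int, gma_count: int, tolerance: float = 0.3) -> bool:
    """True if the Gl. max / Ph. vulgaris ratio lies within 2.0 +/- tolerance.

    The tolerance is an assumed value; the source gives no explicit cutoff.
    """
    if pvu_count <= 0:
        return False
    return abs(gma_count / pvu_count - 2.0) <= tolerance

# (Ph. vulgaris, Gl. max) family sizes quoted in the text.
reported = {"NAC": (90, 142), "NZZ/SPL": (3, 2)}

for family, (pvu, gma) in reported.items():
    ratio = gma / pvu
    print(f"{family}: ratio {ratio:.2f}, about 1:2? {is_about_one_to_two(pvu, gma)}")
```

Both families fall outside the assumed window, consistent with their description as deviations from the general trend.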
Origin of TF paralogs
We also investigated the modes of duplication shaping TF family sizes. Paralogous pairs had their modes of duplication predicted using a previously described scheme (Proulx et al., 2011; Qiao et al., 2018) that assigns duplicate pairs to the following categories: segmental duplicates (SD); tandem duplicates (TD); proximal duplicates (PD), i.e. pairs separated by one to five intervening genes; and pairs originated via retrotransposon (rTE) or DNA transposon (dTE) activity. The remaining pairs were classified as dispersed duplicates (DD). We used the priority order SD>TD>PD>rTE>dTE>DD to assign a single duplication mode to each pair. In legumes, more than 70% of the TFs have at least one paralog (Table 3). Further, it is clear that SD is the main duplication mode, supporting their origin through large scale duplication, as previously reported (Lehti-Shiu et al., 2017). Gl. max and Gl. soja are the species with the greatest number of SD TFs, which comprise 77.7% of the TF repertoire in the former. In addition, local duplications (i.e. TD and PD) also contributed to TF repertoires, particularly in Me. truncatula, which has 11% (241/1752) of the duplicate TFs classified as TD and PD, especially in the ERF, WRKY, and B3 families (Supplementary Table S4).
The prevalence of TD pairs in Me. truncatula has also been reported in genes related to other regulatory roles, such as in the F-box family (Bellieny-Rabelo et al., 2013). Further, in Ar. duranensis, Ar. ipaensis, and Vi. radiata, nearly 7% of the duplicated TFs are derived from TD, whereas dTE duplications account for 29.8% of the duplicated TFs in Ca. fasciculata, especially in the MYB, NAC, and bHLH families (Table 3; Supplementary Table S4). There are also some notable differences in the prevalence of modes of duplication between closely related species. For example, local TF duplications are more frequent in Ar. ipaensis than in Ar. duranensis (Table 3).
Between 5.5% (188/3407, in Gl. max) and 60.7% (560/923, in Am. trichopoda) of the TFs were classified as singletons (Table 3). While in Ar. thaliana 22.58% (392/1736) of the TFs were singletons, in legumes this number ranges between 5.5 and 29% (Table 3). Importantly, a large fraction of these singletons remain syntenic to a reference outgroup species (Table 4). In Ph. vulgaris and Me. truncatula, syntenic singleton TFs were significantly more expressed than their non-syntenic counterparts (Figure 3A), suggesting that their greater functional conservation is associated with their genomic context. Most SD TFs were also found to be syntenic in their closest outgroup species (Table 4). We also estimated non-synonymous/synonymous mutation ratios (Ka/Ks) between singleton and SD TFs with preserved synteny in an outgroup species. Orthologs from SD pairs had significantly lower Ka/Ks than the singleton orthologs (Figure 3B), leading us to hypothesize that these genes are under strong purifying selection due to their involvement in intricate regulatory systems emerging from the WGD events. Similar observations on the strong negative selection of duplicated genes have also been reported in other species (Davis and Petrov, 2004; Jordan et al., 2004).
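The priority order used in this section to assign a single duplication mode per pair (SD>TD>PD>rTE>dTE>DD) can be sketched as follows. This is an illustrative reading of the scheme, not the authors' code: the function names are hypothetical, and treating tandem duplicates as directly adjacent genes (zero intervening genes) is an assumption, since the text only defines the one-to-five gene window for proximal duplicates.

```python
# Priority order stated in the text, from highest to lowest.
PRIORITY = ["SD", "TD", "PD", "rTE", "dTE", "DD"]
RANK = {mode: position for position, mode in enumerate(PRIORITY)}

def local_mode(intervening_genes: int):
    """Classify a same-chromosome pair by the number of genes between its members."""
    if intervening_genes == 0:
        return "TD"   # assumption: tandem means directly adjacent
    if 1 <= intervening_genes <= 5:
        return "PD"   # from the text: one to five intervening genes
    return None

def assign_mode(candidate_modes) -> str:
    """Return the highest-priority mode among those supported for a pair.

    Pairs without evidence for any specific mode fall back to dispersed (DD).
    """
    supported = [mode for mode in candidate_modes if mode in RANK]
    if not supported:
        return "DD"
    return min(supported, key=RANK.__getitem__)

# A pair that is both collinear (SD) and adjacent (TD) is reported as SD.
print(assign_mode({"TD", "SD"}))
print(assign_mode(["dTE", local_mode(3)]))
```

Resolving ties with a fixed ranking keeps every pair in exactly one category, which is what allows the per-mode percentages in Table 3 to be compared directly.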
The significant number of syntenic SD TFs derives mostly from gene retention after successive WGD events (Table 3). Gene duplicability, the ability of a duplicate pair to remain duplicated, is non-random and biased towards specific gene families, including TFs (Lynch and Conery, 2000; Davis and Petrov, 2004; Li et al., 2016). We analyzed TF duplicability using Gl. max TFs from syntenic blocks that survived the 58 mya and the 13 mya WGD events. We used intra-species collinear blocks to identify Gl. max SD TFs that correspond to single syntenic regions in Ph. vulgaris. We used a maximum Ks threshold of 0.4 to filter the Gl. max SD pairs that likely emerged in the 13 mya WGD (Schmutz et al., 2010). Nearly 81% (1808/2230) of the Gl. max SD TFs within that Ks range had a syntenic gene in Ph. vulgaris. Further, 75% (676/904) of such Ph. vulgaris orthologs had a single Vi. vinifera syntenic ortholog. In both cases we found that the bHLH family had the highest number of syntenic gene pairs (Gl. max-Ph. vulgaris: 99 pairs and Ph. vulgaris-Vi. vinifera: 43 pairs). Conversely, only 16% (15/93) of the Gl. max syntenic singleton TFs correspond to single genomic regions in Ph. vulgaris and Vi. vinifera. We hypothesize that these genes not only depend on the conservation of a local genomic context, but
are also sensitive to gene dosage. Our results clearly illustrate the high duplicability of most TF families in soybean and further support the impact of two WGD events that account for a prominent fraction of the TF repertoire of this species.
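The Ks filter and the retention fractions described in this analysis can be sketched in a few lines. The threshold of 0.4 and the counts come from the text; the gene identifiers and Ks values in the example list are hypothetical, and treating the threshold as inclusive is an assumption.

```python
KS_MAX = 0.4  # maximum Ks used in the text for pairs attributed to the ~13 mya Glycine WGD

def recent_sd_pairs(pairs, ks_max: float = KS_MAX):
    """Keep SD pairs (gene_a, gene_b, ks) whose Ks does not exceed ks_max."""
    return [(a, b, ks) for a, b, ks in pairs if ks is not None and ks <= ks_max]

def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

# Hypothetical SD pairs, for illustration only.
example_pairs = [("gene_1", "gene_2", 0.12), ("gene_3", "gene_4", 0.62), ("gene_5", "gene_6", 0.40)]
print(recent_sd_pairs(example_pairs))

# Fractions reported in the text.
print(f"Gl. max SD TFs with a syntenic Ph. vulgaris gene: {share(1808, 2230):.1f}%")
print(f"Ph. vulgaris orthologs with a single Vi. vinifera ortholog: {share(676, 904):.1f}%")
print(f"Syntenic singletons matching single regions: {share(15, 93):.1f}%")
```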
Large scale duplication events correlate with increase in TF copy number
Many TF families explored here are broad and diversified, often comprising multiple sub-groups, such as bHLH (Pires and Dolan, 2010), MYB (Du et al., 2012), and ERF (Nakano, 2006). To obtain an overview of the diversification of plant TF families, we assigned them to OGs by using an all-vs-all reciprocal BLASTP search, followed by Markov clustering (see methods for details). For example, AP2 had 28 clusters (Supplementary Table S5), labeled as AP:1 to AP:28. We found 1557 TF OGs from the 58 TF families reported above. Nearly 9% (144/1557) of these OGs had no members from legume species, whereas 29% (452/1557) were legume-specific, and 43% (672/1557) had genes from at least 10 species. As expected, larger families had more OGs, such as bHLH, C2H2, and MYB, with more than 100 OGs each. Conversely, a few families deviated from this trend, such as SAP (65 genes and 7 OGs) and EIL (143 members and 13 OGs) (Supplementary Figure S4).
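To make the clustering step concrete, the following is a toy, pure-Python version of Markov clustering (alternating expansion and inflation of a column-stochastic matrix). It is a sketch for intuition only and does not reproduce the software or parameters used in the study; the gene names, edge weights, inflation value, and self-loop weighting are all assumptions.

```python
def markov_cluster(edges, inflation=2.0, iterations=30, eps=1e-6):
    """Toy Markov clustering on an undirected weighted similarity graph.

    edges: iterable of (node_a, node_b, weight) with weight > 0.
    Returns a sorted list of clusters, each a sorted list of node names.
    """
    edges = list(edges)
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for a, b, weight in edges:
        matrix[index[a]][index[b]] = float(weight)
        matrix[index[b]][index[a]] = float(weight)
    for i in range(size):
        # Self-loops stabilise the walk; using the row maximum is an assumption.
        matrix[i][i] = max(matrix[i])

    def normalise_columns(m):
        for j in range(size):
            total = sum(m[i][j] for i in range(size))
            for i in range(size):
                m[i][j] /= total
        return m

    matrix = normalise_columns(matrix)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(matrix[i][k] * matrix[k][j] for k in range(size))
                     for j in range(size)] for i in range(size)]
        # Inflation: element-wise power followed by column re-normalisation.
        matrix = normalise_columns([[v ** inflation for v in row] for row in expanded])

    # Rows that keep non-zero mass act as attractors; their columns form a cluster.
    clusters = set()
    for i in range(size):
        members = frozenset(nodes[j] for j in range(size) if matrix[i][j] > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)

# Hypothetical similarity edges between five genes.
toy_edges = [("geneA", "geneB", 1.0), ("geneB", "geneC", 1.0),
             ("geneA", "geneC", 1.0), ("geneD", "geneE", 1.0)]
print(markov_cluster(toy_edges))
```

In practice the edge weights would be derived from the BLASTP hits, and a higher inflation value would split the graph into finer groups.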
To investigate the evolution of TF families in more detail, we analyzed the number of genes per OG in each species using CAFE (v. 4.2) (Bie et al., 2006; Han et al., 2013), a method that uses gene birth (λ) and death (μ) rates to model gain and loss events in different lineages of a given ultrametric species tree (see methods for details). We searched for optimal λ and μ based on the maximum likelihood score, using the option to take potentially fragmented genomes into account (see methods for details). We used 672 TF OGs with sufficient variation in number of genes per species (statistical variance ≥ 0.5) and containing genes from at least 10 species. For example, 10 out of 31 AP2 clusters were used for rate estimation (Supplementary Table S5). We repeated the rate parameter search 50 times and the parameters resulting in the best maximum likelihood score were used for further analysis. The results obtained with CAFE largely confirm the general trend for TF gain upon WGD (Figure 4), which is in line with the above-mentioned correlation between SD, the retention of paralogous TF pairs, and intraspecies synteny. This trend can be exemplified by the nodes representing the legume and Glycine ancestors, which have a high number of expanded TF families (Figure 4).
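The two criteria used to select OGs for rate estimation (variance of at least 0.5 and genes from at least 10 species) can be expressed as a simple filter. This sketch assumes the population variance computed over all species, including those with zero genes; the text does not specify which variance estimator was used, and the OG names and counts below are hypothetical.

```python
from statistics import pvariance

def select_ogs(og_counts, min_species=10, min_variance=0.5):
    """Return OG identifiers that pass both selection criteria.

    og_counts: {og_id: {species: gene_count}}, with zero for absent species.
    """
    selected = []
    for og_id, counts in og_counts.items():
        values = list(counts.values())
        species_present = sum(1 for v in values if v > 0)
        if species_present < min_species:
            continue
        # Assumption: population variance across all species in the table.
        if pvariance(values) >= min_variance:
            selected.append(og_id)
    return selected

# Hypothetical gene counts across 12 species.
species = [f"sp{i}" for i in range(1, 13)]
example = {
    "OG_flat": {s: 1 for s in species},                                    # no variation
    "OG_variable": {s: (4 if i < 2 else 1) for i, s in enumerate(species)},
    "OG_sparse": {s: (3 if i < 5 else 0) for i, s in enumerate(species)},  # only 5 species
}
print(select_ogs(example))
```

Only the OG that is both widely distributed and variable in size is kept, which is the kind of group for which gain and loss rates are informative.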
We also investigated the impact of the legume and Glycine WGDs on the TF repertoires of Gl. max and Gl. soja. Firstly, we analyzed the 138 OGs (from 34 TF families) that expanded in legumes in comparison to non-legumes (Figure 4). If all 1557 OGs are considered, an average of 0.21 genes were gained per OG in legumes, in
contrast to 1.09 genes in the 138 expanded OGs. Secondly, we identified rapidly evolving OGs (10 of 138; 7.25%) (Table 5), which are those with a significant gene gain or loss rate (p-value < 0.05) (Table 6). In these 10 OGs, the average rate of gene gain was 2.0, nearly two and 10 times that observed in the 138 expanded OGs and in the complete set of 1557 OGs, respectively.
We analyzed the most prevalent modes of duplication in the 138 OGs that expanded in legumes and found that a major fraction of them emerged via SD. While SDs comprise more than 80% of the TFs in species with more recent WGDs (i.e. Gl. max, Gl. soja, and Lu. angustifolius) (Figure 5), several SD pairs might have lost collinearity after the legume WGD and were classified as DDs. When inspecting the Ks distributions of the paralogous pairs from the 138 OGs that expanded in legumes, we found ranges corresponding to both the legume and Glycine WGDs (Figure 5) (Schmutz et al., 2010; Cannon, 2013), suggesting that a fraction of the TFs that expanded in the legume WGD subsequently duplicated for a second time in the Glycine WGD. Deviations from this range were observed for Me. truncatula, Arachis spp., Cicer spp., and Ch. fasciculata, as previously reported (Cannon et al., 2010; Varshney et al., 2013; Tang et al., 2014; Chen et al., 2016). DD paralogs have a more dispersed Ks distribution than SD paralogs, although their Ks distributions also indicate that several DD pairs were likely generated by SD with subsequent loss of collinearity (Figure 5). Collectively, these results support the association between legume-specific TF expansions and the WGD event that took place 58 mya.
To further explore the functional relevance of legume TF expansions, we analyzed gene expression patterns (see methods for details) across multiple tissues from Me. truncatula, Ph. vulgaris, and Gl. max (Figure 6; Supplementary Figure S5). Strikingly, the 138 OGs that expanded in legumes are enriched in genes with preferential expression in nodules (Fisher's Exact Test, p-values = 1.7 × 10⁻⁴ and 1.4 × 10⁻⁵ for Me. truncatula and Gl. max, respectively) and roots (Fisher's Exact Test, p-values = 1.4 × 10⁻⁹ and 5.9 × 10⁻³ for Me. truncatula and Ph. vulgaris, respectively). These results indicate that the recruitment of these genes predates the emergence of nodulation in legumes and might have played roles in the root physiology associated with this process.
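The enrichment tests reported above are Fisher's exact tests on 2x2 contingency tables. A minimal sketch of the one-sided (enrichment) version is given below, using only the standard library. The text does not state whether the tests were one- or two-sided, nor the underlying counts, so the sidedness is an assumption and the example table is a generic textbook case rather than data from the study.

```python
from math import comb

def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

                         tissue-preferential   other
    expanded OGs                  a              b
    remaining TFs                 c              d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(total - row1, col1 - k)
                    for k in range(a, upper + 1))
    return numerator / comb(total, col1)

# Generic example table (not from the study): [[3, 1], [1, 3]].
print(fisher_exact_greater(3, 1, 1, 3))
```

With real data, `a` would be the number of genes from expanded OGs that are preferentially expressed in a tissue, and a small p-value would indicate that such genes are over-represented relative to the remaining TFs.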
Next, we integrated phylogenetic reconstructions of the 10 rapidly expanded OGs with gene expression data and found three interesting groups (i.e. bHLH:12, M-type:1, ERF:10) (Table 6). The bHLH:12 OG showed significantly higher expression during nodule development in Me. truncatula, Ph. vulgaris, and Gl. max (Figure 6B and Supplementary Figure S5). This OG includes two SD pairs of Me. truncatula bHLHs, Medtr4g087920-Medtr2g015890 and Medtr4g079760-Medtr2g091190, with Ks values of 1.0523 and 0.8292, respectively. Ph. vulgaris and Gl. max orthologs of these genes were also more expressed in roots and nodules than in other tissues (Figure 6).
Another interesting OG encodes TFs from the M-type MADS family (M-type:1). This OG has independently expanded in different species (Supplementary Figure S6), including a major expansion in Lu. angustifolius. Nine of 14 Gl. max genes in this cluster emerged within the Glycine genus, including one gene (Glyma.03G083700.1) with preferential expression in seeds and flowers (Supplementary Figure S7A). The two Ph. vulgaris orthologs (Phvul.006G077700 and Phvul.006G077800.1) showed seed-specific expression, suggesting their importance in seed development (Supplementary Figure S7B). Interestingly, these Ph. vulgaris genes originated from an ancestral tandem duplication, as this organization is also found in other legumes (Supplementary Figure S6). Although this OG lacked an Ar. thaliana member, the closest Ar. thaliana homologs include AT5G27810, AT1G22590 (AGAMOUS-LIKE 87, AGL87), and AT5G48670 (AGAMOUS-LIKE 80, AGL80). Importantly, AGL80 has been shown to be responsible for central cell and endosperm development in Arabidopsis (Portereiko et al., 2006).
We also analyzed two ERF (ethylene response factor) OGs (ERF:10 and ERF:18) containing genes playing critical roles in nodulation. Of these two, only ERF:10 was among the 10 rapidly expanded OGs. Manual curation revealed that ERF:10 and ERF:18 comprise ERF required for nodule differentiation (EFD) and ERF required for nodulation (ERN) genes, respectively. ERN and EFD genes regulate nodulation in Me. truncatula (Vernie et al., 2008; Young et al., 2011; Cerri et al., 2012). Three of the four MtERNs (i.e. Medtr7g085810.1, Medtr6g029180.1, and Medtr8g085960.1) had relatively higher expression after inoculation than in roots or nodules, supporting their critical role in nodule development (Supplementary Figure S9A). The biased expression towards nodules and root tissues is also observed in Ph. vulgaris and Gl. max orthologs (Supplementary Figure S8). Of the two MtEFDs, Medtr4g008860 and Medtr3g106290 were more expressed in nodules and inoculated root hairs, respectively. Similarly, Ph. vulgaris and Gl. max ERNs are also more expressed in roots and nodules than in aerial tissues (Supplementary Figure S8).
Association between quantitative traits and Glycine max specific TFs
The Glycine node had the largest number of expanded OGs, with 36% (563/1557) (Figure 4) of them expanding by an average rate of 1.67 genes per OG. Out of these, 57 had rapidly expanded (p-value < 0.05) with an average rate of 2 genes per OG. In comparison to the Glycine node, 76 and 82 OGs expanded in Gl. soja and Gl. max, respectively. Of the OGs that expanded in Gl. max, 79% (65/82) showed significant expansions (p-value < 0.05), with an average rate of 1.44. Among these families, ERF (6 OGs), MYB (5 OGs), MYB_related (8 OGs), bHLH (7 OGs), and C2H2 (7 OGs) TFs gained more than 5 genes per OG. Interestingly, 59% (202/341) of the genes from expanded
OGs lie within SD regions, supporting the importance of the Glycine WGD in shaping these families in Gl. max.
We explored whether some of these OGs could be related to important soybean agronomic traits. We searched the Gl. soja syntenic regions corresponding to the 341 Gl. max TFs from the 65 rapidly expanded OGs. We identified 50 TFs without a homeolog in Gl. soja (Supplementary Table S6), out of which only five had a syntenic ortholog in Ph. vulgaris. Interestingly, two ERF (Glyma.20G115300, Glyma.14G161900) and one SBP (Glyma.06G205700) TFs are within previously reported chromosomal regions associated with important quantitative traits (Supplementary Figure S9) (Fang et al., 2017). The ERF Glyma.20G115300 is located within a region associated with overall leaf size and average number of seeds per pod. The second ERF, Glyma.14G161900, is within a region associated with FA18 content and ratio in mature seeds. Finally, the SBP TF Glyma.06G205700 is within a region regulating branch density (i.e. ratio of branch number and plant height) and beginning bloom date (Supplementary Table S6). We envisage that many more of these 50 TFs will be associated with important traits, which could be revealed by a more comprehensive work integrating QTL information from other genotypes and studies with our phylogenomic results.
Methods
Genomic data
Genome sequences and annotations for 20 plant species were obtained from public repositories (Table 1). We used coding sequences and predicted protein sequences from the longest splicing isoform (when more than one was available). Predicted proteins with less than 50 amino acids, containing premature stop codons, or with more than 20% ambiguous amino acids were excluded.
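The three exclusion criteria above can be sketched as a small filter. This is a minimal illustration, not the authors' code: the function name `keep_protein`, the set of residues counted as ambiguous (`X`, `B`, `Z`, `J`), and the treatment of a single trailing `*` as a normal terminator are all assumptions.

```python
# Hypothetical helper; the ambiguous-residue alphabet is an assumption.
AMBIGUOUS = set("XBZJ")


def keep_protein(seq, min_len=50, max_ambiguous=0.20):
    """Return True if a predicted protein passes the three filters."""
    seq = seq.upper()
    # Assumption: one trailing '*' marks the regular stop, not a premature one.
    if seq.endswith("*"):
        seq = seq[:-1]
    # Exclude proteins with less than 50 amino acids.
    if len(seq) < min_len:
        return False
    # Any remaining '*' is an internal (premature) stop codon.
    if "*" in seq:
        return False
    # Exclude proteins with more than 20% ambiguous residues.
    ambiguous = sum(1 for aa in seq if aa in AMBIGUOUS)
    return ambiguous / len(seq) <= max_ambiguous
```

In this sketch the length is measured after removing the terminal stop symbol, and a protein with exactly 20% ambiguous residues is kept, following the "more than 20%" wording.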
Prediction and classification of transcription factors (TFs)
To remove redundancy due to splicing isoforms and incomplete gene predictions, we removed nearly identical sequences using BLASTCLUST (Altschul et al., 1997) as previously described (parameters: -S 1.89 -L 0.9 -b F) (Gossani et al., 2014; Vidal et al., 2016).
We adopted the TF family classification scheme of PlantTFDB (Zhang et al., 2011; Jin et al., 2017). We created a local database of protein domains by combining all HMM profiles from PFAM-A (Release 31.0) (Finn et al., 2016) and 13 plant-specific TF HMM profiles downloaded from PlantTFDB (Supplementary Table S1). Protein sequences were searched for conserved domains using HMMER 3.0 (http://hmmer.org) (domain e-value cutoff < 0.01). TFs were classified in 58 families according to their DBD.
Species phylogeny
A species phylogeny was reconstructed using low-copy orthologs present in all 20 species. We clustered the predicted proteins on the basis of the pairwise sequence similarity of their longest protein products, which was computed with BLAST (e-value ≤ 1e-5) (Altschul et al., 1997). Sequence pairs with percentage identity of at least 35% and query coverage of at least 50% were used for Markov clustering using mclblastline (v. 12-068; inflation parameter: 1.5) (Enright et al., 2002). Clusters containing up to 22 genes with at least one gene from each species were used. If a species had paralogous genes, the paralog with greater identity to orthologs from other species was used. Amino acid sequence alignment was performed using DECIPHER (Wright, 2015) and cDNA alignment was performed with PAL2NAL (Suyama et al., 2006). We concatenated the codon alignments of these genes to create a super-alignment. Next, phangorn (Schliep, 2011) was used to estimate the best substitution model for the phylogenetic reconstruction, which was performed using RAxML (v8.2.11; model: GTRGAMMAIG4, bootstrap: 1000) (Stamatakis, 2014). The phylogram and sequence alignment were used in RelTime-ML (implemented in MEGA-X, v. 10.0.1) (Tamura et al., 2018) to generate an ultrametric tree. We used the TimeTree database (Kumar et al., 2017) to retrieve the divergence times of Fabaceae and Vi. vinifera (110 mya) and of Ph. vulgaris and Gl. max (24 mya), which were used as references.
Synteny and synonymous mutation rate (Ks)
We identified segmental duplications using DAGCHAINER (r02062008) (Haas et al., 2004) with bidirectional best BLASTP hits as input (e-value ≤ 1e-10, 35% minimum identity, 50% minimum query coverage). A minimum of four collinear genes was required to identify a syntenic block (DAGCHAINER, parameter -A 4), as previously used in soybean (Severin et al., 2011). Tandem duplicates were also identified using DAGCHAINER (parameters -T -A 2). Tandem or segmental gene pairs had their non-synonymous (Ka) and synonymous (Ks) mutation rates estimated using the bp_pairwise_kaks script, distributed with BioPerl (v5.22.1) (Stajich et al., 2002).
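The bidirectional-best-hit filter that feeds the synteny search can be sketched as follows. This is only an illustration under stated assumptions: hits are represented as dictionaries with hypothetical keys (`query`, `subject`, `evalue`, `identity`, `coverage`, `bitscore`), and "best" is taken to mean the highest bit score, which the text does not specify.

```python
def passes(hit, max_evalue=1e-10, min_identity=35.0, min_coverage=50.0):
    """Apply the e-value, identity, and query-coverage thresholds."""
    return (hit["evalue"] <= max_evalue
            and hit["identity"] >= min_identity
            and hit["coverage"] >= min_coverage)


def bidirectional_best_hits(hits):
    """Return sorted gene pairs that are each other's best passing hit."""
    best = {}
    for h in hits:
        # Skip self-hits and hits that fail the thresholds.
        if h["query"] == h["subject"] or not passes(h):
            continue
        q = h["query"]
        # Assumption: the best hit is the one with the highest bit score.
        if q not in best or h["bitscore"] > best[q]["bitscore"]:
            best[q] = h
    pairs = set()
    for q, h in best.items():
        s = h["subject"]
        # Keep the pair only if the relationship holds in both directions.
        if s in best and best[s]["subject"] == q:
            pairs.add(tuple(sorted((q, s))))
    return sorted(pairs)
```

The resulting pairs would then be passed to a collinearity tool; the chaining step itself is not reproduced here.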
Orthologous cluster and TF paralog identification
TF OGs were inspected for different modes of gene duplication. Multiple modes of duplication can also co-occur in a group of genes. In these cases, only one type of duplication is reported, following the order SD > TD > PD > rTE > dTE > DD (Proulx et al., 2011; Qiao et al., 2018). Although this strategy can slightly underestimate some duplication levels, it helped us to assess the main forces shaping the expansion of TF families.
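The priority rule above amounts to picking the highest-ranked mode among those detected for a gene. A minimal sketch, assuming each gene comes with a collection of mode labels (the function name `reported_mode` is hypothetical):

```python
# Priority order as stated in the text: SD > TD > PD > rTE > dTE > DD.
PRIORITY = ["SD", "TD", "PD", "rTE", "dTE", "DD"]
RANK = {mode: i for i, mode in enumerate(PRIORITY)}


def reported_mode(modes):
    """Return the single duplication mode to report, or None if none is known."""
    known = [m for m in modes if m in RANK]
    if not known:
        return None
    # The lowest rank index corresponds to the highest priority.
    return min(known, key=RANK.__getitem__)
```

For example, a gene flagged as both TD and SD would be counted once, as SD, which is why lower-priority modes can be slightly underestimated.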
Estimation of expansions and contractions in TF families
We used CAFE (v4.2) (Han et al., 2013) to assess the evolution of TF family sizes using the time-calibrated species tree and TF OG compositions as inputs. We used the cafeerror.py script, available in the CAFE package, to model error rates that might have been introduced in gene family sizes, particularly by species with more fragmented genome assemblies (e.g. Lu. angustifolius) (Han et al., 2013). This error model was used to adjust family sizes. We estimated λ and μ by running CAFE 50 times and selected the parameters that gave the best maximum likelihood estimate. These parameters were used to estimate OG sizes at ancestor nodes and to predict rapidly evolving OGs (p-value < 0.05), which are those that significantly gained or lost genes. The average net change at each node on the species tree was expressed as:
The average change on node $M$:

$$A_M = \frac{\sum_{i=1}^{n} (M_i - X_i)}{n}$$
where $n$ is the total number of OGs and $(M_i - X_i)$ is the difference in OG size between node $M$ and its parent node $X$ for a given OG $i$. A negative or positive $A_M$ value stands for contraction or expansion, respectively. Some remarkably expanded or contracted TF OGs had their phylogenies reconstructed with RAxML (v8.2.11; model: GTRGAMMAAUTO, bootstrap: 1000) and visualized using FigTree (v. 1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/).
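The average net change per node defined above is the mean of the per-OG size differences between a node and its parent. A minimal sketch, assuming the OG sizes at both nodes are available as equally ordered lists (the function name `average_change` is hypothetical):

```python
def average_change(node_sizes, parent_sizes):
    """Mean difference in OG size between a node M and its parent X."""
    if not node_sizes or len(node_sizes) != len(parent_sizes):
        raise ValueError("need two non-empty lists of equal length")
    n = len(node_sizes)
    # Sum of (M_i - X_i) over all OGs, divided by the number of OGs.
    return sum(m - x for m, x in zip(node_sizes, parent_sizes)) / n
```

A positive return value indicates net expansion at the node and a negative value indicates net contraction.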
Gene expression data and tissue specificity
Ar. thaliana and Ph. vulgaris normalized gene expression data were obtained from the public databases ArrayExpress (Liu et al., 2012) and PvGEA (O’Rourke et al., 2014), respectively. Four additional RNAseq datasets were downloaded from the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra/). The first two datasets comprise two soybean transcriptome studies (BioProjects PRJNA208048 and PRJNA79597) (Libault et al., 2010; Severin et al., 2010). The third dataset includes Me. truncatula transcriptomes (Boscari et al., 2013) in the following conditions and tissues: nitrogen-starved roots, roots inoculated with Sinorhizobium meliloti, and root nodules (BioProject PRJNA79233). We also downloaded an additional Me. truncatula RNAseq dataset covering 7 different tissues (BioProject PRJNA80163). RNAseq reads were mapped to each species' genome using STAR v2.5.3a (Dobin et al., 2013) and normalized gene expression values were estimated with StringTie v1.3.4d (Pertea et al., 2015), both with default parameters.
Expression values lower than 1 were converted to 0 and considered not expressed. We added 1 to all values, which were then log2 transformed. To determine tissue-preferential expression, we converted the gene expression values into a transformed z-score index (Kryuchkova-Mostacci and Robinson-Rechavi, 2016). Genes with a transformed z-score index > 0.9 were labeled as preferentially expressed in the tissue where their expression was highest.
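The preprocessing and labeling steps above can be sketched as follows. The thresholding and log2 transformation follow the text directly. The rescaling of the z-score is an assumption: here the z-score of the most expressed tissue is computed with the sample standard deviation and divided by its maximum possible value, (n - 1)/sqrt(n), so that the index lies between 0 and 1. The exact transformation used by the authors may differ, and both function names are hypothetical.

```python
import math


def log_transform(values, floor=1.0):
    """Set values below `floor` to 0, add 1, and take log2."""
    return [math.log2((v if v >= floor else 0.0) + 1.0) for v in values]


def tissue_preference(values, cutoff=0.9):
    """Return (index, tissue position or None) for one gene.

    Assumption: index = z-score of the top tissue, rescaled to [0, 1]
    by dividing by (n - 1) / sqrt(n).
    """
    n = len(values)
    if n < 2:
        return 0.0, None
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        # Identical expression in all tissues: no preference.
        return 0.0, None
    top = max(range(n), key=values.__getitem__)
    index = (values[top] - mean) / sd * math.sqrt(n) / (n - 1)
    return index, (top if index > cutoff else None)
```

A gene expressed in a single tissue reaches the maximum index and is labeled with the position of that tissue; a gene with identical values everywhere receives an index of 0 and no label.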
Micro-syntenic regions between Gl. max, Gl. soja, and Ph. vulgaris
We used DAGCHAINER output files to identify the microsyntenic regions in Gl. max, Gl. soja, and Ph. vulgaris. In particular, we queried the genes from OGs with significantly larger (as predicted by CAFE) sizes in Gl. max in comparison to the Glycine node. For each Gl. max gene, we considered only one collinear region from Gl. soja and Ph. vulgaris. When more than one collinear region was detected, we selected that with the highest DAGCHAINER alignment score. We visualized the microsynteny regions using Genome Context Viewer, available on the Legume Information System (Cleary et al., 2017).
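The selection of a single collinear region per gene and target species can be sketched as below. This is an illustration only: the input is assumed to be a list of tuples `(gene, species, region, score)` already parsed from the collinearity output, and the function name `best_region_per_gene` is hypothetical.

```python
def best_region_per_gene(blocks):
    """Keep, for each (gene, species) pair, the region with the highest score."""
    best = {}
    for gene, species, region, score in blocks:
        key = (gene, species)
        # Replace the stored region only when a strictly higher score is seen.
        if key not in best or score > best[key][1]:
            best[key] = (region, score)
    return {key: region for key, (region, _) in best.items()}
```

With ties, this sketch keeps the first region encountered; the text does not state how ties were resolved.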
QTL intervals
We obtained the chromosomal coordinates of 150 QTLs significantly associated with 57 soybean traits (Fang et al., 2017). Chromosomal coordinates of soybean genes were mapped to these QTL regions using bedtools v2.26.0 (Quinlan and Hall, 2010).

Acknowledgements
This work was supported by funding from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES, Finance code 001), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ).

References
Aköz, G., and Nordborg, M. (2019). The Aquilegia genome reveals a hybrid origin of core eudicots. bioRxiv. 10.1101/407973
Albert, V.A., Barbazuk, W.B., Depamphilis, C.W., Der, J.P., Leebens-Mack, J., Ma, H., Palmer, J.D., Rounsley, S., Sankoff, D., Schuster, S.C., Soltis, D.E., Soltis, P.S., Wessler, S.R., Wing, R.A., Albert, V.A., Ammiraju, J.S.S., Barbazuk, W.B., Chamala, S., Chanderbali, A.S., Depamphilis, C.W., Der, J.P., Determann, R., Leebens-Mack, J., Ma, H., Ralph, P., Rounsley, S., Schuster, S.C., Soltis, D.E., Soltis, P.S., Talag, J., Tomsho, L., Walts, B., Wanke, S., Wing, R.A., Albert, V.A., Barbazuk, W.B., Chamala, S., Chanderbali, A.S., Chang, T.-H., Determann, R., Lan, T., Soltis, D.E., Soltis, P.S., Arikit, S., Axtell, M.J., Ayyampalayam, S., Barbazuk, W.B., Burnette, J.M., Chamala, S., De Paoli, E., Depamphilis, C.W., Der, J.P., Estill, J.C., Farrell, N.P., Harkess, A., Jiao, Y., Leebens-Mack, J., Liu, K., Mei, W., Meyers, B.C., Shahid, S., Wafula, E., Walts, B., Wessler, S.R., Zhai, J., Zhang, X., Albert, V.A., Carretero-Paulet, L., Depamphilis, C.W., Der, J.P., Jiao, Y., Leebens-Mack, J., Lyons, E., Sankoff, D., Tang, H., Wafula, E., Zheng, C., Albert, V.A., Altman, N.S., Barbazuk, W.B., Carretero-Paulet, L., Depamphilis, C.W., Der, J.P., Estill, J.C., Jiao, Y., Leebens-Mack, J., Liu, K., Mei, W., Wafula, E., Altman, N.S., Arikit, S., Axtell, M.J., Chamala, S., Chanderbali, A.S., Chen, F., Chen, J.-Q., Chiang, V., De Paoli, E., Depamphilis, C.W., Der, J.P., et al. (2013). The Amborella Genome and the Evolution of Flowering Plants. Science 342, 1241089-1241089. 10.1126/science.1241089
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402
Arabidopsis-Genome-Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815. 10.1038/35048692
Azani, N., Babineau, M., Bailey, C.D., Banks, H., Barbosa, A., Pinto, R.B., Boatwright, J., Borges, L., Brown, G., Bruneau, A., Candido, E., Cardoso, D., Chung, K.-F., Clark, R., Conceição, A.D., Crisp, M., Cubas, P., Delgado-Salinas, A., Dexter, K., Doyle, J., Duminil, J., Egan, A., De La Estrella, M., Falcão, M., Filatov, D., Fortuna-Perez, A.P., Fortunato, R., Gagnon, E., Gasson, P., Rando, J.G., Azevedo Tozzi, A.M.G.D., Gunn, B., Harris, D., Haston, E., Hawkins, J., Herendeen, P., Hughes, C., Iganci, J.V., Javadi, F., Kanu, S.A., Kazempour-Osaloo, S., Kite, G., Klitgaard, B., Kochanovski, F., Koenen, E.M., Kovar, L., Lavin, M., Roux, M.L., Lewis, G., De Lima, H., López-Roberts, M.C., Mackinder, B., Maia, V.H., Malécot, V., Mansano, V., Marazzi, B., Mattapha, S., Miller, J., Mitsuyuki, C., Moura, T., Murphy, D., Nageswara-Rao, M., Nevado, B., Neves, D., Ojeda, D., Pennington, R.T., Prado, D., Prenner, G., De Queiroz, L.P., Ramos, G., Ranzato Filardi, F., Ribeiro, P., Rico-Arce, M.D.L., Sanderson, M., Santos-Silva, J., São-Mateus, W.B., Silva, M.S., Simon, M., Sinou, C., Snak, C., De Souza, É., Sprent, J., Steele, K., Steier, J., Steeves, R., Stirton, C., Tagane, S., Torke, B., Toyama, H., Cruz, D.T.D., Vatanparast, M., Wieringa, J., Wink, M., Wojciechowski, M., Yahara, T., Yi, T., and Zimmerman, E. (2017). A new subfamily classification of the Leguminosae based on a taxonomically comprehensive phylogeny – The Legume Phylogeny Working Group (LPWG). Taxon 66, 44-77. 10.12705/661.3
Banks, J.A., Nishiyama, T., Hasebe, M., Bowman, J.L., Gribskov, M., Depamphilis, C., Albert, V.A., Aono, N., Aoyama, T., Ambrose, B.A., Ashton, N.W., Axtell, M.J., Barker, E., Barker, M.S., Bennetzen, J.L., Bonawitz, N.D., Chapple, C., Cheng, C., Correa, L.G., Dacre, M., Debarry, J., Dreyer, I., Elias, M., Engstrom, E.M., Estelle, M., Feng, L., Finet, C., Floyd, S.K., Frommer, W.B., Fujita, T., Gramzow, L., Gutensohn, M., Harholt, J., Hattori, M., Heyl, A., Hirai, T., Hiwatashi, Y., Ishikawa, M., Iwata, M., Karol, K.G., Koehler, B., Kolukisaoglu, U., Kubo, M., Kurata, T., Lalonde, S., Li, K., Li, Y., Litt, A., Lyons, E., Manning, G., Maruyama, T., Michael, T.P., Mikami, K., Miyazaki, S., Morinaga, S., Murata, T., Mueller-Roeber, B., Nelson, D.R., Obara, M., Oguri, Y., Olmstead, R.G., Onodera, N., Petersen, B.L., Pils, B., Prigge, M., Rensing, S.A., Riano-Pachon, D.M., Roberts, A.W., Sato, Y., Scheller, H.V., Schulz, B., Schulz, C., Shakirov, E.V., Shibagaki, N., Shinohara, N., Shippen, D.E., Sorensen, I., Sotooka, R., Sugimoto, N., Sugita, M., Sumikawa, N., Tanurdzic, M., Theissen, G., Ulvskov, P., Wakazuki, S., Weng, J.K., Willats, W.W., Wipf, D., Wolf, P.G., Yang, L., Zimmer, A.D., Zhu, Q., Mitros, T., Hellsten, U., Loque, D., Otillar, R., Salamov, A., Schmutz, J., Shapiro, H., Lindquist, E., et al. (2011). The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science 332, 960-963. 10.1126/science.1203810
Bellieny-Rabelo, D., Oliveira, A.E., and Venancio, T.M. (2013). Impact of whole-genome and tandem duplications in the expansion and functional diversification of the F-box family in legumes (Fabaceae). PLoS One 8, e55127. 10.1371/journal.pone.0055127
Bertioli, D.J., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D., Cannon, E.K., Liu, X., Gao, D., Clevenger, J., Dash, S., Ren, L., Moretzsohn, M.C., Shirasawa, K., Huang, W., Vidigal, B., Abernathy, B., Chu, Y., Niederhuth, C.E., Umale, P., Araujo, A.C., Kozik, A., Kim, K.D., Burow, M.D., Varshney, R.K., Wang, X., Zhang, X., Barkley, N., Guimaraes, P.M., Isobe, S., Guo, B., Liao, B., Stalker, H.T., Schmitz, R.J., Scheffler, B.E., Leal-Bertioli, S.C., Xun, X., Jackson, S.A., Michelmore, R., and Ozias-Akins, P. (2016). The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat Genet 48, 438-446. 10.1038/ng.3517
Bie, T.D., Cristianini, N., Demuth, J.P., and Hahn, M.W. (2006). CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269-1271. 10.1093/bioinformatics/btl097
Birchler, J.A., and Veitia, R.A. (2007). The Gene Balance Hypothesis: From Classical Genetics to Modern Genomics. The Plant Cell Online 19, 395-402. 10.1105/tpc.106.049338
Birchler, J.A., and Veitia, R.A. (2011). Protein–protein and protein–DNA dosage balance and differential paralog transcription factor retention in polyploids. Frontiers in Plant Genetics and Genomics 2, 64. 10.3389/fpls.2011.00064
Blanc, G., Barakat, A., Guyot, R., Cooke, R., and Delseny, M. (2000). Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12, 1093-1101
Blanc, G., and Wolfe, K.H. (2004). Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. The Plant Cell 16, 1679-1691. 10.1105/tpc.021410
Boscari, A., Del Giudice, J., Ferrarini, A., Venturini, L., Zaffini, A.L., Delledonne, M., and Puppo, A. (2013). Expression dynamics of the Medicago truncatula transcriptome during the symbiotic interaction with Sinorhizobium meliloti: which role for nitric oxide? Plant Physiol 161, 425-439. 10.1104/pp.112.208538
Cannon, S.B. (2013). The model legume genomes. Methods Mol Biol 1069, 1-14. 10.1007/978-1-62703-613-9_1
Cannon, S.B., Ilut, D., Farmer, A.D., Maki, S.L., May, G.D., Singer, S.R., and Doyle, J.J. (2010). Polyploidy did not predate the evolution of nodulation in all legumes. PLoS One 5, e11630. 10.1371/journal.pone.0011630
Cannon, S.B., Mitra, A., Baumgarten, A., Young, N.D., and May, G. (2004). The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol 4, 10. 10.1186/1471-2229-4-10
Cannon, S.B., Sterck, L., Rombauts, S., Sato, S., Cheung, F., Gouzy, J., Wang, X., Mudge, J., Vasdewani, J., Schiex, T., Spannagl, M., Monaghan, E., Nicholson, C., Humphray, S.J., Schoof, H., Mayer, K.F., Rogers, J., Quetier, F., Oldroyd, G.E., Debelle, F., Cook, D.R., Retzel, E.F., Roe, B.A., Town, C.D., Tabata, S., Van De Peer, Y., and Young, N.D. (2006). Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc Natl Acad Sci U S A 103, 14959-14964. 10.1073/pnas.0603228103
Cardoso, D., De Queiroz, L.P., Pennington, R.T., De Lima, H.C., Fonty, E., Wojciechowski, M.F., and Lavin, M. (2012). Revisiting the phylogeny of papilionoid legumes: New insights from comprehensively sampled early-branching lineages. American Journal of Botany 99, 1991-2013. 10.3732/ajb.1200380
Cerri, M.R., Frances, L., Laloum, T., Auriac, M.C., Niebel, A., Oldroyd, G.E., Barker, D.G., Fournier, J., and De Carvalho-Niebel, F. (2012). Medicago truncatula ERN transcription factors: regulatory interplay with NSP1/NSP2 GRAS factors and expression dynamics throughout rhizobial infection. Plant Physiol 160, 2155-2172. 10.1104/pp.112.203190
Chen, X., Li, H., Pandey, M.K., Yang, Q., Wang, X., Garg, V., Chi, X., Doddamani, D., Hong, Y., Upadhyaya, H., Guo, H., Khan, A.W., Zhu, F., Zhang, X., Pan, L., Pierce, G.J., Zhou, G., Krishnamohan, K.A., Chen, M., Zhong, N., Agarwal, G., Li, S., Chitikineni, A., Zhang, G.Q., Sharma, S., Chen, N., Liu, H., Janila, P., Wang, M., Wang, T., Sun, J., Li, X., Li, C., Yu, L., Wen, S., Singh, S., Yang, Z., Zhao, J., Zhang, C., Yu, Y., Bi, J., Liu, Z.J., Paterson, A.H., Wang, S., Liang, X., Varshney, R.K., and Yu, S. (2016). Draft genome of the peanut A-genome progenitor (Arachis duranensis) provides insights into geocarpy, oil biosynthesis, and allergens. Proc Natl Acad Sci U S A 113, 6785-6790. 10.1073/pnas.1600899113
Cheng, F., Wu, J., Cai, X., Liang, J., Freeling, M., and Wang, X. (2018). Gene retention, fractionation and subgenome differences in polyploid plants. Nat Plants 4, 258-268. 10.1038/s41477-018-0136-7
Cleary, A., Farmer, A., and Hancock, J. (2017). Genome Context Viewer: visual exploration of multiple annotated genomes using microsynteny. Bioinformatics 34, 1562-1564. 10.1093/bioinformatics/btx757
Davis, J.C., and Petrov, D.A. (2004). Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol 2, E55. 10.1371/journal.pbio.0020055
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21. 10.1093/bioinformatics/bts635
Doebley, J., and Lukens, L. (1998). Transcriptional regulators and the evolution of plant form. The Plant Cell 10, 1075-1082. 10.1105/tpc.10.7.1075
Dong, Y., Yang, X., Liu, J., Wang, B.H., Liu, B.L., and Wang, Y.Z. (2014). Pod shattering resistance associated with domestication is mediated by a NAC gene in soybean. Nat Commun 5, 3352. 10.1038/ncomms4352
Du, H., Yang, S.-S., Liang, Z., Feng, B.-R., Liu, L., Huang, Y.-B., and Tang, Y.-X. (2012). Genome-wide analysis of the MYB transcription factor superfamily in soybean. BMC Plant Biol 12, 106. 10.1186/1471-2229-12-106
Enright, A.J., Van Dongen, S., and Ouzounis, C.A. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30, 1575-1584
Fang, C., Ma, Y., Wu, S., Liu, Z., Wang, Z., Yang, R., Hu, G., Zhou, Z., Yu, H., Zhang, M., Pan, Y., Zhou, G., Ren, H., Du, W., Yan, H., Wang, Y., Han, D., Shen, Y., Liu, S., Liu, T., Zhang, J., Qin, H., Yuan, J., Yuan, X., Kong, F., Liu, B., Li, J., Zhang, Z., Wang, G., Zhu, B., and Tian, Z. (2017). Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biology 18. 10.1186/s13059-017-1289-9
Feil, R., Yoshida, T., and Kawabe, A. (2013). Importance of Gene Duplication in the Evolution of Genomic Imprinting Revealed by Molecular Evolutionary Analysis of the Type I MADS-Box Gene Family in Arabidopsis Species. PLoS One 8, e73588. 10.1371/journal.pone.0073588
Filiault, D.L., Ballerini, E.S., Mandakova, T., Akoz, G., Derieg, N.J., Schmutz, J., Jenkins, J., Grimwood, J., Shu, S., Hayes, R.D., Hellsten, U., Barry, K., Yan, J., Mihaltcheva, S., Karafiatova, M., Nizhynska, V., Kramer, E.M., Lysak, M.A., Hodges, S.A., and Nordborg, M. (2018). The Aquilegia genome provides insight into adaptive radiation and reveals an extraordinarily polymorphic chromosome with a unique history. Elife 7. 10.7554/eLife.36426
Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell, A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A., Salazar, G.A., Tate, J., and Bateman, A. (2016). The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44, D279-285. 10.1093/nar/gkv1344
Freeling, M. (2009). Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu Rev Plant Biol 60, 433-453. 10.1146/annurev.arplant.043008.092122
Freeling, M., Scanlon, M.J., and Fowler, J.E. (2015). Fractionation and subfunctionalization following genome duplications: mechanisms that drive gene content and their consequences. Curr Opin Genet Dev 35, 110-118. 10.1016/j.gde.2015.11.002
Gossani, C., Bellieny-Rabelo, D., and Venancio, T.M. (2014). Evolutionary analysis of multidrug resistance genes in fungi - impact of gene duplication and family conservation. FEBS J 281, 4967-4977. 10.1111/febs.13046
Graham, P.H., and Vance, C.P. (2003). Legumes: importance and constraints to greater use. Plant Physiology 131, 872-877. 10.1104/pp.017004
Griesmann, M., Chang, Y., Liu, X., Song, Y., Haberer, G., Crook, M.B., Billault-Penneteau, B., Lauressergues, D., Keller, J., Imanishi, L., Roswanjaya, Y.P., Kohlen, W., Pujic, P., Battenberg, K., Alloisio, N., Liang, Y., Hilhorst, H., Salgado, M.G., Hocher, V., Gherbi, H., Svistoonoff, S., Doyle, J.J., He, S., Xu, Y., Xu, S., Qu, J., Gao, Q., Fang, X., Fu, Y., Normand, P., Berry, A.M., Wall, L.G., Ane, J.M., Pawlowski, K., Xu, X., Yang, H., Spannagl, M., Mayer, K.F.X., Wong, G.K., Parniske, M., Delaux, P.M., and Cheng, S. (2018). Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis. Science 361. 10.1126/science.aat1743
Gupta, S., Nawaz, K., Parween, S., Roy, R., Sahu, K., Kumar Pole, A., Khandal, H., Srivastava, R., Kumar Parida, S., and Chattopadhyay, D. (2017). Draft genome sequence of Cicer reticulatum L., the wild progenitor of chickpea provides a resource for agronomic trait improvement. DNA Res 24, 1-10. 10.1093/dnares/dsw042
Haas, B.J., Delcher, A.L., Wortman, J.R., and Salzberg, S.L. (2004). DAGchainer: A tool for mining segmental genome duplications and synteny. Bioinformatics 20, 3643-3646. 10.1093/bioinformatics/bth397
Han, M.V., Thomas, G.W., Lugo-Martinez, J., and Hahn, M.W. (2013). Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol 30, 1987-1997. 10.1093/molbev/mst100
Han, Y., Zhao, X., Liu, D., Li, Y., Lightfoot, D.A., Yang, Z., Zhao, L., Zhou, G., Wang, Z., Huang, L., Zhang, Z., Qiu, L., Zheng, H., and Li, W. (2016). Domestication footprints anchor genomic regions of agronomic importance in soybeans. New Phytol 209, 871-884. 10.1111/nph.13626
Hane, J.K., Ming, Y., Kamphuis, L.G., Nelson, M.N., Garg, G., Atkins, C.A., Bayer, P.E., Bravo, A., Bringans, S., Cannon, S., Edwards, D., Foley, R., Gao, L.L., Harrison, M.J., Huang, W., Hurgobin, B., Li, S., Liu, C.W., Mcgrath, A., Morahan, G., Murray, J., Weller, J., Jian, J., and Singh, K.B. (2017). A comprehensive draft genome sequence for lupin (Lupinus angustifolius), an emerging health food: insights into plant-microbe interactions and legume evolution. Plant Biotechnol J 15, 318-330. 10.1111/pbi.12615
He, M., He, C.-Q., and Ding, N.-Z. (2018). Abiotic Stresses: General Defenses of Land Plants and Chances for Engineering Multistress Tolerance. Frontiers in Plant Science 9. 10.3389/fpls.2018.01771
Jaillon, O., Aury, J.-M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., Vezzi, A., Legeai, F., Hugueney, P., Dasilva, C., Horner, D., Mica, E., Jublot, D., Poulain, J., Bruyère, C., Billault, A., Segurens, B., Gouyvenoux, M., Ugarte, E., Cattonaro, F., Anthouard, V., Vico, V., Del Fabbro, C., Alaux, M., Di Gaspero, G., Dumas, V., Felice, N., Paillard, S., Juman, I., Moroldo, M., Scalabrin, S., Canaguier, A., Le Clainche, I., Malacrida, G., Durand, E., Pesole, G., Laucou, V., Chatelet, P., Merdinoglu, D., Delledonne, M., Pezzotti, M., Lecharny, A., Scarpelli, C., Artiguenave, F., Pè, M.E., Valle, G., Morgante, M., Caboche, M., Adam-Blondon, A.-F., Weissenbach, J., Quétier, F., Wincker, P., and French-Italian Public Consortium for Grapevine Genome Characterization (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463-467. 10.1038/nature06148
Jain, M., Misra, G., Patel, R.K., Priya, P., Jhanwar, S., Khan, A.W., Shah, N., Singh, V.K., Garg, R., Jeena, G., Yadav, M., Kant, C., Sharma, P., Yadav, G., Bhatia, S., Tyagi, A.K., and Chattopadhyay, D. (2013). A draft genome sequence of the pulse crop chickpea (Cicer arietinum L.). Plant J 74, 715-729. 10.1111/tpj.12173
Jin, J., Tian, F., Yang, D.C., Meng, Y.Q., Kong, L., Luo, J., and Gao, G. (2017). PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res 45, D1040-D1045. 10.1093/nar/gkw982
Jordan, I.K., Wolf, Y.I., and Koonin, E.V. (2004). Duplicated genes evolve slower than singletons despite the initial rate increase. BMC Evol Biol 4, 22. 10.1186/1471-2148-4-22
Kang, Y.J., Kim, S.K., Kim, M.Y., Lestari, P., Kim, K.H., Ha, B.K., Jun, T.H., Hwang, W.J., Lee, T., Lee, J., Shim, S., Yoon, M.Y., Jang, Y.E., Han, K.S., Taeprayoon, P., Yoon, N., Somta, P., Tanya, P., Kim, K.S., Gwag, J.G., Moon, J.K., Lee, Y.H., Park, B.S., Bombarely, A., Doyle, J.J., Jackson, S.A., Schafleitner, R., Srinives, P., Varshney, R.K., and Lee, S.H. (2014). Genome sequence of mungbean and insights into evolution within Vigna species. Nat Commun 5, 5443. 10.1038/ncomms6443
Kroc, M., Koczyk, G., Swiecicki, W., Kilian, A., and Nelson, M.N. (2014). New evidence of ancestral polyploidy in the Genistoid legume Lupinus angustifolius L. (narrow-leafed lupin). Theor Appl Genet 127, 1237-1249. 10.1007/s00122-014-2294-y
Kryuchkova-Mostacci, N., and Robinson-Rechavi, M. (2016). A benchmark of gene expression tissue-specificity metrics. Brief Bioinform, bbw008. 10.1093/bib/bbw008
Kumar, S., Stecher, G., Suleski, M., and Hedges, S.B. (2017). TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol Biol Evol 34, 1812-1819. 10.1093/molbev/msx116
Kumpeangkeaw, A., Tan, D., Fu, L., Han, B., Sun, X., Hu, X., Ding, Z., and Zhang, J. (2019). Asymmetric birth and death of type I and type II MADS-box gene subfamilies in the rubber tree facilitating laticifer development. PLoS One 14, e0214335. 10.1371/journal.pone.0214335
Lata, C., Yadav, A., and Pras, M. (2011). "Role of Plant Transcription Factors in Abiotic Stress Tolerance," in Abiotic Stress Response in Plants - Physiological, Biochemical and Genetic Perspectives, eds. A. Shanker & B. Venkateswarlu. (InTech), 269-296.
Lehti-Shiu, M.D., Panchy, N., Wang, P., Uygun, S., and Shiu, S.H. (2017). Diversity, expansion, and evolutionary novelty of plant DNA-binding transcription factor families. Biochim Biophys Acta Gene Regul Mech 1860, 3-20. 10.1016/j.bbagrm.2016.08.005
Lespinet, O., Wolf, Y.I., Koonin, E.V., and Aravind, L. (2002). The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res 12, 1048-1059. 10.1101/gr.174302
Lewis, G.P. (2005). Legumes of the World. Royal Botanic Gardens, Kew.
Li, Z., Defoort, J., Tasdighian, S., Maere, S., Van De Peer, Y., and De Smet, R. (2016). Gene Duplicability of Core Genes Is Highly Consistent across All Angiosperms. Plant Cell 28, 326-344. 10.1105/tpc.15.00877
Libault, M., Farmer, A., Joshi, T., Takahashi, K., Langley, R.J., Franklin, L.D., He, J., Xu, D., May, G., and Stacey, G. (2010). An integrated transcriptome atlas of the crop model Glycine max, and its use in comparative analyses in plants. Plant J 63, 86-99. 10.1111/j.1365-313X.2010.04222.x
Liu, J., Jung, C., Xu, J., Wang, H., Deng, S., Bernad, L., Arenas-Huertero, C., and Chua, N.H. (2012). Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis. Plant Cell 24, 4333-4345. 10.1105/tpc.112.102855
Lu, Q., Li, H., Hong, Y., Zhang, G., Wen, S., Li, X., Zhou, G., Li, S., Liu, H., Liu, H., Liu, Z., Varshney, R.K., Chen, X., and Liang, X. (2018). Genome Sequencing and Analysis of the Peanut B-Genome Progenitor (Arachis ipaensis). Frontiers in Plant Science 9. 10.3389/fpls.2018.00604
Lynch, M., and Conery, J.S. (2000). The evolutionary fate and consequences of duplicate genes. Science 290, 1151-1155
Mochida, K., Sakurai, T., Seki, H., Yoshida, T., Takahagi, K., Sawai, S., Uchiyama, H., Muranaka, T., and Saito, K. (2017). Draft genome assembly and annotation of Glycyrrhiza uralensis, a medicinal legume. Plant J 89, 181-194. 10.1111/tpj.13385
Nagata, T., Hosaka-Sasaki, A., and Kikuchi, S. (2016). The Evolutionary Diversification of Genes that Encode Transcription Factor Proteins in Plants. 73-97. 10.1016/b978-0-12-800854-6.00005-1
Nakano, T. (2006). Genome-Wide Analysis of the ERF Gene Family in Arabidopsis and Rice. Plant Physiology 140, 411-432. 10.1104/pp.105.073783
Nam, J., Kim, J., Lee, S., An, G., Ma, H., and Nei, M. (2004). Type I MADS-box genes have experienced faster birth-and-death evolution than type II MADS-box genes in angiosperms. Proceedings of the National Academy of Sciences 101, 1910-1915. 10.1073/pnas.0308430100
O’Rourke, J.A., Iniguez, L.P., Fu, F., Bucciarelli, B., Miller, S.S., Jackson, S.A., Mcclean, P.E., Li, J., Dai, X., Zhao, P.X., Hernandez, G., and Vance, C.P. (2014). An RNA-Seq based gene expression atlas of the common bean. BMC Genomics 15, 866. 10.1186/1471-2164-15-866
Panchy, N., Lehti-Shiu, M., and Shiu, S.H. (2016). Evolution of Gene Duplication in Plants. Plant Physiol 171, 2294-2316. 10.1104/pp.16.00523
Parween, S., Nawaz, K., Roy, R., Pole, A.K., Venkata Suresh, B., Misra, G., Jain, M., Yadav, G., Parida, S.K., Tyagi, A.K., Bhatia, S., and Chattopadhyay, D. (2015). An advanced draft genome assembly of a desi type chickpea (Cicer arietinum L.). Sci Rep 5, 12806. 10.1038/srep12806
Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.C., Mendell, J.T., and Salzberg, S.L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290-295. 10.1038/nbt.3122
Pickett, F.B., and Meeks-Wagner, D.R. (1995). Seeing double: appreciating genetic redundancy. Plant Cell 7, 1347-1356. 10.1105/tpc.7.9.1347
Pires, N., and Dolan, L. (2010). Early evolution of bHLH proteins in plants. Plant Signal Behav 5, 911-912. 10.4161/psb.5.7.12100
Portereiko, M.F., Lloyd, A., Steffen, J.G., Punwani, J.A., Otsuga, D., and Drews, G.N. (2006). AGL80 is required for central cell and endosperm development in Arabidopsis. Plant Cell 18, 1862-1872. 10.1105/tpc.106.040824
Proulx, S.R., Wang, Y., Wang, X., Tang, H., Tan, X., Ficklin, S.P., Feltus, F.A., and Paterson, A.H. (2011). Modes of Gene Duplication Contribute Differently to Genetic Novelty and Redundancy, but Show Parallels across Divergent Angiosperms. PLoS One 6, e28150. 10.1371/journal.pone.0028150
Qiao, X., Yin, H., Li, L., Wang, R., Wu, J., and Zhang, S. (2018). Different Modes of Gene Duplication Show Divergent Evolutionary Patterns and Contribute Differently to the Expansion of Gene Families Involved in Important Fruit Traits in Pear (Pyrus bretschneideri). Front Plant Sci 9, 161. 10.3389/fpls.2018.00161
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842. 10.1093/bioinformatics/btq033
Salman-Minkov, A., Sabath, N., and Mayrose, I. (2016). Whole-genome duplication as a key factor in crop domestication. Nat Plants 2. 10.1038/nplants.2016.115
Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Kato, T., Nakao, M., Sasamoto, S., Watanabe, A., Ono, A., Kawashima, K., Fujishiro, T., Katoh, M., Kohara, M., Kishida, Y., Minami, C., Nakayama, S., Nakazaki, N., Shimizu, Y., Shinpo, S., Takahashi, C., Wada, T., Yamada, M., Ohmido, N., Hayashi, M., Fukui, K., Baba, T., Nakamichi, T., Mori, H., and Tabata, S. (2008). Genome structure of the legume, Lotus japonicus. DNA Res 15, 227-239. 10.1093/dnares/dsn008
Schliep, K.P. (2011). phangorn: phylogenetic analysis in R. Bioinformatics 27, 592-593. 10.1093/bioinformatics/btq706
Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L., Song, Q., Thelen, J.J., Cheng, J., Xu, D., Hellsten, U., May, G.D., Yu, Y., Sakurai, T., Umezawa, T., Bhattacharyya, M.K., Sandhu, D., Valliyodan, B., Lindquist, E., Peto, M., Grant, D., Shu, S., Goodstein, D., Barry, K., Futrell-Griggs, M., Abernathy, B., Du, J., Tian, Z., Zhu, L., Gill, N., Joshi, T., Libault, M., Sethuraman, A., Zhang, X.C., Shinozaki, K., Nguyen, H.T., Wing, R.A., Cregan, P., Specht, J., Grimwood, J., Rokhsar, D., Stacey, G., Shoemaker, R.C., and Jackson, S.A. (2010). Genome sequence of the palaeopolyploid soybean. Nature 463, 178-183. 10.1038/nature08670
Schmutz, J., Mcclean, P.E., Mamidi, S., Wu, G.A., Cannon, S.B., Grimwood, J., Jenkins, J., Shu, S., Song, Q., Chavarro, C., Torres-Torres, M., Geffroy, V., Moghaddam, S.M., Gao, D., Abernathy, B., Barry, K., Blair, M., Brick, M.A., Chovatia, M., Gepts, P., Goodstein, D.M., Gonzales, M., Hellsten, U., Hyten, D.L., Jia, G., Kelly, J.D., Kudrna, D., Lee, R., Richard, M.M., Miklas, P.N., Osorno, J.M., Rodrigues, J., Thareau, V., Urrea, C.A., Wang, M., Yu, Y., Zhang, M., Wing, R.A., Cregan, P.B., Rokhsar, D.S., and Jackson, S.A. (2014). A reference genome for common bean and genome-wide analysis of dual domestications. Nat Genet 46, 707-713. 10.1038/ng.3008
Sedivy, E.J., Wu, F., and Hanzawa, Y. (2017). Soybean domestication: the origin, genetic architecture and molecular bases. New Phytologist 214, 539-553. 10.1111/nph.14418
Severin, A.J., Cannon, S.B., Graham, M.M., Grant, D., and Shoemaker, R.C. (2011). Changes in twelve homoeologous genomic regions in soybean following three rounds of polyploidy. Plant Cell 23, 3129-3136. 10.1105/tpc.111.089573
Severin, A.J., Woody, J.L., Bolon, Y.T., Joseph, B., Diers, B.W., Farmer, A.D., Muehlbauer, G.J., Nelson, R.T., Grant, D., Specht, J.E., Graham, M.A., Cannon, S.B., May, G.D., Vance, C.P., and Shoemaker, R.C. (2010). RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome. BMC Plant Biol 10, 160. 10.1186/1471-2229-10-160
Shiu, S.-H. (2005). Transcription Factor Families Have Much Higher Expansion Rates in Plants than in Animals. PLANT PHYSIOLOGY 139, 18-26. 10.1104/pp.105.065110
Soltis, P.S., Marchant, D.B., Van De Peer, Y., and Soltis, D.E. (2015). Polyploidy and genome evolution in plants. Curr Opin Genet Dev 35, 119-125. 10.1016/j.gde.2015.11.003
Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., Fuellen, G., Gilbert, J.G., Korf, I., Lapp, H., Lehvaslaiho, H., Matsalla, C., Mungall, C.J., Osborne, B.I., Pocock, M.R., Schattner, P., Senger, M., Stein, L.D., Stupka, E., Wilkinson, M.D., and Birney, E. (2002). The Bioperl toolkit: Perl modules for the life sciences. Genome Res 12, 1611-1618. 10.1101/gr.361602
Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313. 10.1093/bioinformatics/btu033
Suyama, M., Torrents, D., and Bork, P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 34, W609-612. 10.1093/nar/gkl315
Tamura, K., Tao, Q., Kumar, S., and Russo, C. (2018). Theoretical Foundation of the RelTime Method for Estimating Divergence Times from Variable Evolutionary Rates. Mol Biol Evol 35, 1770-1782. 10.1093/molbev/msy044
Tang, H., Krishnakumar, V., Bidwell, S., Rosen, B., Chan, A., Zhou, S., Gentzbittel, L., Childs, K.L., Yandell, M., Gundlach, H., Mayer, K.F., Schwartz, D.C., and Town, C.D. (2014). An improved genome release (version Mt4.0) for the model legume Medicago truncatula. BMC Genomics 15, 312. 10.1186/1471-2164-15-312
Tripathi, P., Galla, A., Rabara, R.C., and Rushton, P.J. (2016). "Transcription Factors that Regulate Defence Responses and Their Use in Increasing Disease Resistance," in Plant Pathogen Resistance Biotechnology, ed. D.B. Collinge, 109-129.
Varshney, R.K., Chen, W., Li, Y., Bharti, A.K., Saxena, R.K., Schlueter, J.A., Donoghue, M.T., Azam, S., Fan, G., Whaley, A.M., Farmer, A.D., Sheridan, J., Iwata, A., Tuteja, R., Penmetsa, R.V., Wu, W., Upadhyaya, H.D., Yang, S.P., Shah, T., Saxena, K.B., Michael, T., Mccombie, W.R., Yang, B., Zhang, G., Yang, H., Wang, J., Spillane, C., Cook, D.R., May, G.D., Xu, X., and Jackson, S.A. (2011). Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 30, 83-89. 10.1038/nbt.2022
Varshney, R.K., Song, C., Saxena, R.K., Azam, S., Yu, S., Sharpe, A.G., Cannon, S., Baek, J., Rosen, B.D., Tar'an, B., Millan, T., Zhang, X., Ramsay, L.D., Iwata, A., Wang, Y., Nelson, W., Farmer, A.D., Gaur, P.M., Soderlund, C., Penmetsa, R.V., Xu, C., Bharti, A.K., He, W., Winter, P., Zhao, S., Hane, J.K., Carrasquilla-Garcia, N., Condie, J.A., Upadhyaya, H.D., Luo, M.C., Thudi, M., Gowda, C.L., Singh, N.P., Lichtenzveig, J., Gali, K.K., Rubio, J., Nadarajan, N., Dolezel, J., Bansal, K.C., Xu, X., Edwards, D., Zhang, G., Kahl, G., Gil, J., Singh, K.B., Datta, S.K., Jackson, S.A., Wang, J., and Cook, D.R. (2013). Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nat Biotechnol 31, 240-246. 10.1038/nbt.2491
Vernie, T., Moreau, S., De Billy, F., Plet, J., Combier, J.P., Rogers, C., Oldroyd, G., Frugier, F., Niebel, A., and Gamas, P. (2008). EFD Is an ERF transcription factor involved in the control of nodule number and differentiation in Medicago truncatula. Plant Cell 20, 2696-2713. 10.1105/tpc.108.059857
Vidal, N.M., Grazziotin, A.L., Iyer, L.M., Aravind, L., and Venancio, T.M. (2016). Transcription factors, chromatin proteins and the diversification of Hemiptera. Insect Biochem Mol Biol 69, 1-13. 10.1016/j.ibmb.2015.07.001
Vision, T.J., Brown, D.G., and Tanksley, S.D. (2000). The origins of genomic duplications in Arabidopsis. Science 290, 2114-2117.
Wang, J., Sun, P., Li, Y., Liu, Y., Yu, J., Ma, X., Sun, S., Yang, N., Xia, R., Lei, T., Liu, X., Jiao, B., Xing, Y., Ge, W., Wang, L., Wang, Z., Song, X., Yuan, M., Guo, D., Zhang, L., Zhang, J., Jin, D., Chen, W., Pan, Y., Liu, T., Jin, L., Sun, J., Cheng, R., Duan, X., Shen, S., Qin, J., Zhang, M.C., Paterson, A.H., and Wang, X. (2017). Hierarchically Aligning 10 Legume Genomes Establishes a Family-Level Genomics Platform. Plant Physiol 174, 284-300. 10.1104/pp.16.01981
Wright, E.S. (2015). DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics 16, 322. 10.1186/s12859-015-0749-z
Xie, M., Chung, C.Y.-L., Li, M.-W., Wong, F.-L., Wang, X., Liu, A., Wang, Z., Leung, A.K.-Y., Wong, T.-H., Tong, S.-W., Xiao, Z., Fan, K., Ng, M.-S., Qi, X., Yang, L., Deng, T., He, L., Chen, L., Fu, A., Ding, Q., He, J., Chung, G., Isobe, S., Tanabata, T., Valliyodan, B., Nguyen, H.T., Cannon, S.B., Foyer, C.H., Chan, T.-F., and Lam, H.-M. (2019). A reference-grade wild soybean genome. Nat Commun 10. 10.1038/s41467-019-09142-9
Yang, K., Tian, Z., Chen, C., Luo, L., Zhao, B., Wang, Z., Yu, L., Li, Y., Sun, Y., Li, W., Chen, Y., Zhang, Y., Ai, D., Zhao, J., Shang, C., Ma, Y., Wu, B., Wang, M., Gao, L., Sun, D., Zhang, P., Guo, F., Wang, W., Wang, J., Varshney, R.K., Ling, H.Q., and Wan, P. (2015). Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication. Proc Natl Acad Sci U S A 112, 13213-13218. 10.1073/pnas.1420949112
Young, N.D., Debelle, F., Oldroyd, G.E., Geurts, R., Cannon, S.B., Udvardi, M.K., Benedito, V.A., Mayer, K.F., Gouzy, J., Schoof, H., Van De Peer, Y., Proost, S., Cook, D.R., Meyers, B.C., Spannagl, M., Cheung, F., De Mita, S., Krishnakumar, V., Gundlach, H., Zhou, S., Mudge, J., Bharti, A.K., Murray, J.D., Naoumkina, M.A., Rosen, B., Silverstein, K.A., Tang, H., Rombauts, S., Zhao, P.X., Zhou, P., Barbe, V., Bardou, P., Bechner, M., Bellec, A., Berger, A., Berges, H., Bidwell, S., Bisseling, T., Choisne, N., Couloux, A., Denny, R., Deshpande, S., Dai, X., Doyle, J.J., Dudez, A.M., Farmer, A.D., Fouteau, S., Franken, C., Gibelin, C., Gish, J., Goldstein, S., Gonzalez, A.J., Green, P.J., Hallab, A., Hartog, M., Hua, A., Humphray, S.J., Jeong, D.H., Jing, Y., Jocker, A., Kenton, S.M., Kim, D.J., Klee, K., Lai, H., Lang, C., Lin, S., Macmil, S.L., Magdelenat, G., Matthews, L., Mccorrison, J., Monaghan, E.L., Mun, J.H., Najar, F.Z., Nicholson, C., Noirot, C., O'bleness, M., Paule, C.R., Poulain, J., Prion, F., Qin, B., Qu, C., Retzel, E.F., Riddle, C., Sallet, E., Samain, S., Samson, N., Sanders, I., Saurat, O., Scarpelli, C., Schiex, T., Segurens, B., Severin, A.J., Sherrier, D.J., Shi, R., Sims, S., Singer, S.R., Sinharoy, S., Sterck, L., Viollet, A., Wang, B.B., et al. (2011). The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480, 520-524. 10.1038/nature10625
Zhang, H., Jin, J., Tang, L., Zhao, Y., Gu, X., Gao, G., and Luo, J. (2011). PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database. Nucleic Acids Res 39, D1114-D1117. 10.1093/nar/gkq1141
Zhou, Z., Jiang, Y., Wang, Z., Gou, Z., Lyu, J., Li, W., Yu, Y., Shu, L., Zhao, Y., Ma, Y., Fang, C., Shen, Y., Liu, T., Li, C., Li, Q., Wu, M., Wang, M., Wu, Y., Dong, Y., Wan, W., Wang, X., Ding, Z., Gao, Y., Xiang, H., Zhu, B., Lee, S.H., Wang, W., and Tian, Z. (2015). Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat Biotechnol 33, 408-414. 10.1038/nbt.3096
Tables
Table 1: Plant species used in this study and their corresponding genome assembly versions. Non-legume species are marked with asterisks.
| Scientific name | Genome assembly version | Genes | Chromosome number | Reference |
|---|---|---|---|---|
| Cajanus cajan | GCA000340665.1 | 48,331 | 11 | (Varshney et al., 2011) |
| Phaseolus vulgaris | Pvulgaris218v1.0 | 27,197 | 11 | (Schmutz et al., 2014) |
| Vigna radiata | Vradiataver6 | 35,143 | 11 | (Kang et al., 2014) |
| Vigna angularis | Vigan1.1 | 34,172 | 11 | (Yang et al., 2015) |
| Glycine max | Gmax275Wm82.a2.v1 | 56,044 | 20 | (Schmutz et al., 2010) |
| Glycine soja | W05v1.0 | 55,539 | 20 | (Xie et al., 2019) |
| Cicer reticulatum | WCGAPv1.0 | 25,680 | 8 | (Gupta et al., 2017) |
| Cicer arietinum | ASM33114v1 | 33,107 | 8 | (Jain et al., 2013; Varshney et al., 2013; Parween et al., 2015) |
| Medicago truncatula | Mtruncatula285Mt4.0v1 | 50,894 | 8 | (Young et al., 2011; Tang et al., 2014) |
| Glycyrrhiza uralensis | Gur.draft-genome.20151208 | 34,445 | 8 | (Mochida et al., 2017) |
| Lotus japonicus | Genome assembly build 3.0 | 39,734 | 6 | (Sato et al., 2008) |
| Lupinus angustifolius | v1.0 | 33,076 | 20 | (Hane et al., 2017) |
| Arachis ipaensis | Araip1.0 | 46,410 | 10 | (Bertioli et al., 2016) |
| Arachis duranensis | Aradu1.0 | 42,562 | 10 | (Bertioli et al., 2016; Chen et al., 2016) |
| Chamaecrista fasciculata | version.1 | 21,781 | 8 | (Griesmann et al., 2018) |
| Arabidopsis thaliana* | Athaliana167TAIR10 | 27,416 | 5 | (Arabidopsis-Genome-Initiative, 2000) |
| Vitis vinifera* | Vvinifera145Genoscope.12X | 26,346 | 19 | (Jaillon et al., 2007) |
| Amborella trichopoda* | AmTrv1.1 | 26,846 | 13 | (Albert et al., 2013) |
| Aquilegia coerulea* | Aquilegia coerulea v3.1 | 30,023 | 7 | (Filiault et al., 2018) |
| Selaginella moellendorffii* | Selaginella moellendorffii v1.0 | 22,285 | 10 | (Banks et al., 2011) |
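Table 2: Number of transcription factors identified in each family across species.

| Superfamily | TF family | Ca. cajan | Ph. vulgaris | Vi. radiata | Vi. angularis | Gl. max | Gl. soja | Ci. reticulatum | Ci. arietinum | Me. truncatula | Gl. uralensis | Lo. japonicus | Lu. angustifolius | Ar. ipaensis | Ar. duranensis | Ch. fasciculata | Ar. thaliana | Vi. vinifera | Aq. coerulea | Am. trichopoda | Se. moellendorffii |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AP2/ERF | AP2 | 26 | 31 | 24 | 26 | 49 | 52 | 24 | 25 | 31 | 24 | 28 | 38 | 30 | 24 | 22 | 18 | 20 | 16 | 12 | 16 |
| AP2/ERF | ERF | 147 | 149 | 156 | 161 | 290 | 279 | 115 | 129 | 186 | 112 | 107 | 189 | 143 | 130 | 131 | 122 | 72 | 78 | 64 | 35 |
| AP2/ERF | RAV | 2 | 3 | 3 | 3 | 5 | 5 | 2 | 3 | 3 | 3 | 3 | 6 | 3 | 3 | 4 | 6 | 3 | 4 | 1 | 2 |
|  | BBR-BPC | 5 | 4 | 4 | 6 | 4 | 5 | 2 | 3 | 2 | 4 | 5 | 10 | 5 | 6 | 4 | 7 | 5 | 4 | 4 | 1 |
|  | BES1 | 6 | 6 | 6 | 8 | 10 | 8 | 7 | 6 | 7 | 6 | 5 | 12 | 9 | 8 | 7 | 8 | 6 | 4 | 5 | 5 |
| B3 | ARF | 23 | 27 | 28 | 27 | 42 | 45 | 25 | 25 | 38 | 11 | 25 | 43 | 31 | 30 | 22 | 22 | 18 | 13 | 13 | 7 |
| B3 | B3 | 49 | 48 | 36 | 33 | 77 | 79 | 23 | 35 | 132 | 49 | 61 | 52 | 69 | 68 | 29 | 73 | 29 | 104 | 18 | 19 |
|  | C3H | 44 | 44 | 42 | 43 | 73 | 80 | 45 | 50 | 58 | 40 | 54 | 65 | 48 | 45 | 37 | 50 | 47 | 40 | 32 | 29 |
| C2C2 Zn-finger | CO-like | 10 | 13 | 12 | 10 | 22 | 22 | 10 | 10 | 11 | 11 | 8 | 20 | 11 | 11 | 10 | 17 | 6 | 8 | 5 | 4 |
| C2C2 Zn-finger | Dof | 37 | 42 | 42 | 40 | 74 | 73 | 34 | 37 | 40 | 41 | 30 | 67 | 39 | 38 | 39 | 36 | 22 | 29 | 18 | 11 |
| C2C2 Zn-finger | GATA | 33 | 32 | 29 | 27 | 61 | 63 | 26 | 27 | 42 | 30 | 20 | 45 | 25 | 25 | 26 | 30 | 20 | 29 | 21 | 6 |
| C2C2 Zn-finger | LSD | 7 | 4 | 4 | 4 | 5 | 4 | 4 | 4 | 4 | 2 | 5 | 7 | 4 | 4 | 3 | 3 | 3 | 3 | 2 | 2 |
| C2C2 Zn-finger | YABBY | 9 | 8 | 10 | 9 | 12 | 13 | 7 | 8 | 7 | 8 | 7 | 14 | 7 | 8 | 9 | 6 | 7 | 5 | 6 | 0 |
| C2C2 Zn-finger | C2H2 | 223 | 134 | 128 | 128 | 219 | 218 | 78 | 108 | 111 | 108 | 93 | 183 | 136 | 127 | 157 | 104 | 64 | 87 | 85 | 35 |
|  | CAMTA | 10 | 8 | 11 | 6 | 14 | 15 | 7 | 7 | 8 | 8 | 7 | 9 | 8 | 7 | 6 | 6 | 5 | 5 | 3 | 5 |
|  | CPP | 8 | 9 | 10 | 7 | 16 | 15 | 7 | 8 | 12 | 9 | 10 | 13 | 14 | 13 | 9 | 10 | 8 | 7 | 6 | 5 |
|  | DBB | 11 | 12 | 14 | 13 | 16 | 16 | 6 | 7 | 8 | 8 | 8 | 13 | 7 | 8 | 10 | 8 | 8 | 5 | 4 | 4 |
|  | E2F/DP | 7 | 7 | 7 | 9 | 12 | 13 | 6 | 6 | 6 | 10 | 7 | 12 | 10 | 9 | 6 | 8 | 7 | 7 | 5 | 4 |
| MADS | M-type | 48 | 44 | 23 | 31 | 77 | 80 | 34 | 43 | 101 | 29 | 34 | 38 | 28 | 23 | 32 | 65 | 17 | 50 | 19 | 12 |
| MADS | MIKC | 23 | 34 | 47 | 27 | 75 | 71 | 16 | 51 | 38 | 14 | 21 | 27 | 44 | 48 | 28 | 42 | 35 | 24 | 15 | 3 |
|  | EIL | 6 | 7 | 4 | 5 | 10 | 12 | 6 | 7 | 13 | 9 | 7 | 9 | 7 | 6 | 17 | 6 | 2 | 2 | 2 | 6 |
|  | FAR1 | 49 | 25 | 67 | 20 | 68 | 79 | 17 | 37 | 76 | 79 | 29 | 10 | 298 | 198 | 41 | 17 | 19 | 92 | 10 | 0 |
|  | LFY | 1 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 2 | 0 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| GARP | ARR-B | 15 | 15 | 16 | 18 | 31 | 34 | 13 | 14 | 28 | 14 | 10 | 19 | 12 | 11 | 9 | 15 | 12 | 13 | 7 | 6 |
| GARP | G2-like | 44 | 50 | 52 | 46 | 96 | 100 | 36 | 41 | 44 | 49 | 39 | 78 | 49 | 46 | 47 | 42 | 39 | 34 | 27 | 20 |
|  | GRAS | 57 | 55 | 58 | 58 | 108 | 111 | 47 | 46 | 66 | 53 | 62 | 54 | 49 | 48 | 52 | 34 | 43 | 36 | 44 | 47 |
|  | GRF | 10 | 10 | 9 | 8 | 20 | 20 | 8 | 8 | 8 | 10 | 9 | 15 | 11 | 11 | 10 | 9 | 8 | 7 | 6 | 4 |
|  | GeBP | 7 | 5 | 8 | 5 | 9 | 11 | 7 | 8 | 7 | 5 | 4 | 12 | 4 | 7 | 4 | 23 | 1 | 13 | 6 | 1 |
| Homeobox | HB-PHD | 2 | 3 | 3 | 3 | 6 | 6 | 2 | 2 | 2 | 3 | 3 | 2 | 4 | 3 | 3 | 2 | 2 | 2 | 2 | 1 |
| Homeobox | HD-ZIP | 51 | 55 | 54 | 54 | 89 | 92 | 45 | 49 | 57 | 54 | 40 | 82 | 45 | 43 | 45 | 48 | 33 | 30 | 22 | 9 |
| Homeobox | TALE | 34 | 32 | 31 | 28 | 57 | 61 | 22 | 24 | 23 | 32 | 25 | 45 | 28 | 30 | 25 | 21 | 22 | 17 | 12 | 7 |
| Homeobox | WOX | 18 | 18 | 20 | 61 | 32 | 33 | 14 | 18 | 19 | 17 | 14 | 31 | 15 | 15 | 13 | 16 | 10 | 10 | 9 | 8 |
| Homeobox | Other | 6 | 7 | 8 | 7 | 15 | 13 | 8 | 8 | 7 | 8 | 9 | 12 | 9 | 7 | 6 | 6 | 7 | 6 | 5 | 4 |
|  | HRT-like | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 2 |
|  | HSF | 27 | 29 | 32 | 32 | 46 | 45 | 20 | 22 | 28 | 32 | 10 | 33 | 22 | 22 | 32 | 24 | 18 | 15 | 12 | 6 |
|  | LBD | 52 | 48 | 47 | 49 | 75 | 75 | 35 | 47 | 59 | 53 | 41 | 70 | 52 | 49 | 52 | 41 | 43 | 29 | 23 | 13 |
| MYB | MYB | 173 | 170 | 172 | 169 | 294 | 288 | 80 | 132 | 162 | 146 | 100 | 210 | 136 | 135 | 152 | 146 | 138 | 97 | 61 | 21 |
| MYB | MYB related | 84 | 82 | 73 | 75 | 162 | 165 | 53 | 62 | 93 | 95 | 62 | 107 | 76 | 70 | 62 | 72 | 57 | 55 | 36 | 34 |
|  | NAC | 91 | 90 | 92 | 90 | 142 | 139 | 61 | 77 | 97 | 72 | 81 | 117 | 85 | 84 | 88 | 112 | 70 | 76 | 45 | 21 |
|  | NF-X1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 2 | 2 |
| NF-Y | NF-YA | 11 | 9 | 9 | 9 | 16 | 18 | 7 | 8 | 8 | 9 | 8 | 13 | 12 | 11 | 7 | 10 | 7 | 5 | 5 | 1 |
| NF-Y | NF-YB | 23 | 19 | 23 | 18 | 36 | 36 | 19 | 22 | 24 | 23 | 18 | 25 | 16 | 14 | 20 | 13 | 17 | 14 | 9 | 7 |
| NF-Y | NF-YC | 14 | 15 | 14 | 14 | 22 | 22 | 11 | 11 | 15 | 14 | 9 | 16 | 13 | 12 | 10 | 14 | 8 | 10 | 8 | 5 |
|  | NZZ/SPL | 1 | 3 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 2 | 3 | 4 | 3 | 2 | 1 | 2 | 2 | 1 | 3 |
|  | Nin-like | 13 | 12 | 11 | 12 | 27 | 26 | 10 | 9 | 13 | 12 | 9 | 15 | 16 | 12 | 11 | 14 | 8 | 9 | 7 | 7 |
|  | S1Fa-like | 2 | 4 | 3 | 3 | 4 | 4 | 3 | 4 | 5 | 3 | 3 | 6 | 2 | 2 | 5 | 3 | 3 | 1 | 1 | 0 |
|  | SAP | 3 | 5 | 3 | 1 | 5 | 5 | 2 | 2 | 3 | 3 | 4 | 5 | 5 | 5 | 1 | 4 | 3 | 3 | 2 | 1 |
|  | SBP | 24 | 23 | 22 | 23 | 42 | 40 | 17 | 19 | 24 | 25 | 16 | 35 | 19 | 18 | 19 | 16 | 18 | 13 | 12 | 9 |
|  | SRS | 11 | 10 | 11 | 10 | 22 | 22 | 8 | 8 | 11 | 10 | 11 | 12 | 11 | 10 | 10 | 11 | 6 | 4 | 6 | 4 |
|  | STAT | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 0 | 2 | 1 | 1 | 1 | 2 |
|  | TCP | 25 | 27 | 27 | 27 | 55 | 56 | 19 | 24 | 21 | 24 | 24 | 42 | 30 | 23 | 23 | 24 | 15 | 16 | 14 | 4 |
|  | Trihelix | 38 | 41 | 41 | 42 | 70 | 74 | 28 | 38 | 36 | 36 | 32 | 64 | 40 | 43 | 39 | 30 | 26 | 33 | 30 | 34 |
|  | VOZ | 2 | 3 | 3 | 3 | 4 | 5 | 2 | 2 | 2 | 3 | 2 | 4 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 0 |
|  | WRKY | 96 | 90 | 95 | 94 | 171 | 170 | 59 | 81 | 104 | 79 | 72 | 112 | 85 | 84 | 71 | 73 | 59 | 38 | 31 | 12 |
|  | Whirly | 3 | 3 | 3 | 5 | 7 | 7 | 3 | 3 | 3 | 2 | 3 | 4 | 2 | 3 | 3 | 3 | 2 | 3 | 2 | 1 |
|  | ZF-HD | 18 | 19 | 18 | 18 | 41 | 45 | 14 | 15 | 17 | 15 | 20 | 26 | 13 | 15 | 31 | 17 | 10 | 16 | 9 | 7 |
|  | bHLH | 156 | 164 | 161 | 159 | 321 | 320 | 110 | 141 | 162 | 141 | 129 | 207 | 150 | 145 | 137 | 142 | 106 | 99 | 72 | 45 |
|  | bZIP | 72 | 79 | 85 | 85 | 141 | 146 | 61 | 70 | 91 | 73 | 64 | 128 | 73 | 74 | 65 | 77 | 50 | 46 | 41 | 28 |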
Table 3: Prevalence of different modes of duplication among transcription factors.

| Species | Total TFs | Singletons (%) | Duplicates (%) | SD | TD | PD | rTE | dTE | DD |
|---|---|---|---|---|---|---|---|---|---|
| Ca. cajan | 1970 | 314 (15.94) | 1656 (84.06) | 548 | 90 | 21 | 40 | 397 | 560 |
| Ph. vulgaris | 1891 | 257 (13.59) | 1634 (86.41) | 914 | 99 | 8 | 16 | 244 | 353 |
| Vi. radiata | 1918 | 253 (13.19) | 1665 (86.81) | 743 | 115 | 9 | 24 | 319 | 455 |
| Vi. angularis | 1877 | 307 (16.36) | 1570 (83.64) | 842 | 74 | 9 | 14 | 220 | 411 |
| Gl. max | 3407 | 188 (5.52) | 3219 (94.48) | 2645 | 79 | 14 | 28 | 242 | 211 |
| Gl. soja | 3445 | 224 (6.5) | 3221 (93.5) | 2648 | 85 | 13 | 24 | 231 | 220 |
| Ci. reticulatum | 1332 | 365 (27.4) | 967 (72.6) | 350 | 45 | 3 | 24 | 207 | 338 |
| Ci. arietinum | 1660 | 339 (20.42) | 1321 (79.58) | 521 | 101 | 7 | 17 | 267 | 408 |
| Me. truncatula | 2182 | 430 (19.71) | 1752 (80.29) | 648 | 209 | 32 | 27 | 235 | 601 |
| Gl. uralensis | 1738 | 403 (23.19) | 1335 (76.81) | 482 | 36 | 3 | 19 | 281 | 514 |
| Lo. japonicus | 1514 | 428 (28.27) | 1086 (71.73) | 195 | 39 | 6 | 25 | 345 | 476 |
| Lu. angustifolius | 2493 | 223 (8.95) | 2270 (91.05) | 1654 | 51 | 0 | 0 | 0 | 565 |
| Ar. ipaensis | 2073 | 365 (17.61) | 1708 (82.39) | 336 | 113 | 21 | 32 | 434 | 772 |
| Ar. duranensis | 1902 | 363 (19.09) | 1539 (80.91) | 410 | 95 | 5 | 27 | 373 | 629 |
| Ch. fasciculata | 1709 | 426 (24.93) | 1283 (75.07) | 186 | 36 | 2 | 36 | 383 | 640 |
| Ar. thaliana | 1736 | 392 (22.58) | 1344 (77.42) | 707 | 81 | 13 | 15 | 152 | 376 |
| Vi. vinifera | 1274 | 478 (37.52) | 796 (62.48) | 303 | 82 | 13 | 5 | 193 | 200 |
| Aq. coerulea | 1376 | 552 (40.12) | 824 (59.88) | 130 | 86 | 14 | 0 | 0 | 594 |
| Am. trichopoda | 923 | 560 (60.67) | 363 (39.33) | 14 | 33 | 4 | 0 | 0 | 312 |
| Se. moellendorffii | 588 | 278 (47.28) | 310 (52.72) | 79 | 10 | 4 | 0 | 0 | 217 |

Abbreviations: Segmental duplicates (SD); Tandem duplicates (TD); Proximal duplicates (PD); Retrotransposon mediated duplicates (rTE); Transposon mediated duplicates (dTE); Dispersed duplicates (DD).
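The columns of Table 3 are tied together by simple arithmetic: singletons plus duplicates should equal the total number of TFs, and the six duplication modes should add up to the duplicate count. The sketch below checks this for three rows copied from the table. The names `TABLE3`, `check_row` and `percent` are illustrative only and are not part of the authors' pipeline.

```python
# Rows copied from Table 3:
# species -> (total TFs, singletons, duplicates, (SD, TD, PD, rTE, dTE, DD))
TABLE3 = {
    "Gl. max": (3407, 188, 3219, (2645, 79, 14, 28, 242, 211)),
    "Ph. vulgaris": (1891, 257, 1634, (914, 99, 8, 16, 244, 353)),
    "Vi. vinifera": (1274, 478, 796, (303, 82, 13, 5, 193, 200)),
}


def check_row(total, singletons, duplicates, modes):
    """True when singletons + duplicates == total and the modes sum to duplicates."""
    return singletons + duplicates == total and sum(modes) == duplicates


def percent(part, total):
    """Percentage of `total` represented by `part`, rounded to two decimals."""
    return round(100 * part / total, 2)


for species, (total, single, dup, modes) in TABLE3.items():
    sd_share = percent(modes[0], dup)  # share of duplicates that are segmental
    print(species, check_row(total, single, dup, modes), percent(single, total), sd_share)
```

For these rows the singleton percentages recomputed this way agree with the values printed in the table (for example 5.52 for Gl. max).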
Table 4: Percentage of singletons within collinear regions in a reference species.

| Species | Reference outgroup species | Singletons | Singletons in collinear regions | SD | SD in collinear regions in the reference outgroup |
|---|---|---|---|---|---|
| Ca. cajan | Vi. vinifera | 314 | 132 (42%) | 548 | 430 (78%) |
| Ph. vulgaris | Vi. vinifera | 257 | 142 (55%) | 914 | 750 (82%) |
| Vi. radiata | Vi. vinifera | 253 | 120 (47%) | 743 | 595 (80%) |
| Vi. angularis | Vi. vinifera | 307 | 161 (52%) | 842 | 659 (78%) |
| Gl. max | Ph. vulgaris | 188 | 93 (49%) | 2645 | 2546 (96%) |
| Gl. soja | Ph. vulgaris | 224 | 105 (47%) | 2648 | 2546 (96%) |
| Ci. reticulatum | Vi. vinifera | 365 | 199 (55%) | 350 | 286 (81%) |
| Ci. arietinum | Vi. vinifera | 339 | 201 (59%) | 521 | 414 (79%) |
| Me. truncatula | Vi. vinifera | 430 | 184 (43%) | 648 | 520 (80%) |
| Gl. uralensis | Vi. vinifera | 403 | 170 (42%) | 482 | 376 (78%) |
| Lo. japonicus | Vi. vinifera | 428 | 177 (41%) | 195 | 154 (78%) |
| Lu. angustifolius | Vi. vinifera | 223 | 73 (32%) | 1654 | 1114 (67%) |
| Ar. ipaensis | Vi. vinifera | 365 | 171 (47%) | 336 | 275 (81%) |
| Ar. duranensis | Vi. vinifera | 363 | 152 (42%) | 410 | 327 (79%) |
| Ch. fasciculata | Vi. vinifera | 426 | 144 (34%) | 186 | 95 (51%) |
| Ar. thaliana | Vi. vinifera | 392 | 226 (58%) | 707 | 512 (72%) |
| Vi. vinifera | Aq. coerulea | 478 | 295 (62%) | 303 | 285 (94%) |
| Aq. coerulea | Am. trichopoda | 552 | 177 (32%) | 130 | 65 (50%) |
| Am. trichopoda | Se. moellendorffii | 560 | 0 (0%) | 14 | 0 (0%) |

Abbreviations: Segmental duplicates (SD).
Table 5: Number of genes and prevalence of modes of duplication in TF orthologous groups that expanded in legumes.

| Species | Total | SD | TD | PD | rTE | dTE | DD |
|---|---|---|---|---|---|---|---|
| Ca. cajan | 38 | 14 | 3 | 0 | 0 | 7 | 14 |
| Ph. vulgaris | 36 | 14 | 6 | 1 | 0 | 7 | 8 |
| Vi. radiata | 36 | 15 | 4 | 1 | 0 | 12 | 4 |
| Vi. angularis | 38 | 14 | 9 | 0 | 0 | 9 | 6 |
| Gl. max | 67 | 48 | 5 | 3 | 0 | 10 | 1 |
| Gl. soja | 69 | 50 | 2 | 3 | 0 | 13 | 1 |
| Ci. reticulatum | 22 | 6 | 0 | 0 | 0 | 7 | 9 |
| Ci. arietinum | 24 | 11 | 0 | 2 | 0 | 1 | 10 |
| Me. truncatula | 39 | 11 | 4 | 2 | 0 | 12 | 10 |
| Gl. uralensis | 31 | 12 | 0 | 0 | 0 | 4 | 15 |
| Lo. japonicus | 28 | 6 | 2 | 0 | 0 | 13 | 7 |
| Lu. angustifolius | 60 | 42 | 6 | 0 | 0 | 0 | 12 |
| Ar. ipaensis | 31 | 11 | 4 | 1 | 0 | 9 | 6 |
| Ar. duranensis | 32 | 9 | 5 | 1 | 0 | 10 | 7 |
| Ch. fasciculata | 34 | 2 | 6 | 0 | 0 | 13 | 13 |

Abbreviations: Segmental duplicates (SD); Tandem duplicates (TD); Proximal duplicates (PD); Retrotransposon mediated duplicates (rTE); Transposon mediated duplicates (dTE); Dispersed duplicates (DD).
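Table 6: Orthologous groups with significantly (p-value < 0.05) rapid expansion in legumes.

| Species | ARF:1 | ARF:2 | bHLH:5 | bHLH:12 | ERF:10 | LBD:8 | M-type:1 | MYB:1 | MYB:25 | NAC:3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ca. cajan | 4 | 2 | 3 | 3 | 4 | 3 | 7 | 5 | 3 | 4 |
| Ph. vulgaris | 3 | 3 | 4 | 2 | 4 | 3 | 5 | 6 | 2 | 4 |
| Vi. radiata | 4 | 3 | 4 | 2 | 3 | 4 | 4 | 5 | 3 | 4 |
| Vi. angularis | 3 | 3 | 4 | 2 | 4 | 3 | 5 | 8 | 3 | 3 |
| Gl. max | 5 | 6 | 8 | 6 | 5 | 4 | 14 | 9 | 5 | 5 |
| Gl. soja | 6 | 7 | 8 | 5 | 6 | 4 | 14 | 9 | 5 | 5 |
| Ci. reticulatum | 3 | 3 | 2 | 4 | 3 | 1 | 4 | 1 | 1 | 3 |
| Ci. arietinum | 3 | 3 | 2 | 4 | 3 | 2 | 4 | 1 | 1 | 3 |
| Me. truncatula | 3 | 3 | 3 | 4 | 2 | 2 | 7 | 5 | 7 | 3 |
| Gl. uralensis | 0 | 3 | 3 | 2 | 4 | 4 | 6 | 2 | 3 | 4 |
| Lo. japonicus | 3 | 3 | 3 | 3 | 3 | 2 | 4 | 4 | 1 | 3 |
| Lu. angustifolius | 5 | 5 | 6 | 5 | 5 | 6 | 18 | 2 | 3 | 5 |
| Ar. ipaensis | 4 | 3 | 4 | 2 | 3 | 3 | 4 | 3 | 1 | 5 |
| Ar. duranensis | 4 | 3 | 4 | 2 | 2 | 2 | 5 | 3 | 2 | 5 |
| Ch. fasciculata | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 4 | 5 | 3 |
| Ar. thaliana | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| Vi. vinifera | 1 | 1 | 1 | 1 | 0 | 2 | 1 | 1 | 0 | 2 |
| Aq. coerulea | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| Am. trichopoda | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Se. moellendorffii | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |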
Figures
Figure 1: Absolute and relative number of transcription factors in each species. Grey bars and the orange line represent the absolute number and percentage of transcription factors in each species, respectively. Legumes and non-legumes are separated by a dotted vertical line.
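As a rough illustration of the relative values plotted in Figure 1, the sketch below divides the total TF counts from Table 3 by the gene counts from Table 1. It assumes the plotted percentage is simply TFs over annotated genes, which may differ from the exact gene set the authors used. The names `GENES`, `TOTAL_TFS` and `tf_percentage` are hypothetical.

```python
# Gene counts (Table 1) and total TF counts (Table 3) for three species.
GENES = {"Gl. max": 56044, "Ph. vulgaris": 27197, "Vi. vinifera": 26346}
TOTAL_TFS = {"Gl. max": 3407, "Ph. vulgaris": 1891, "Vi. vinifera": 1274}


def tf_percentage(n_tfs, n_genes):
    """Percentage of annotated genes that encode transcription factors.

    Assumption: the denominator is the gene count listed in Table 1.
    """
    return 100.0 * n_tfs / n_genes


for species, n_genes in GENES.items():
    n_tfs = TOTAL_TFS[species]
    print(f"{species}: {n_tfs} TFs, {tf_percentage(n_tfs, n_genes):.2f}% of genes")
```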
Figure 2: Ratio (in log2 scale) of the size of each transcription factor family in each species relative to Vi. vinifera. Values greater or smaller than zero represent transcription factor families that are relatively larger or smaller, respectively, in a given species (in columns) than in Vi. vinifera. The numbers in parentheses give the absolute size of that particular family in Vi. vinifera.
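The ratio described in the Figure 2 legend can be sketched as follows, using a few family sizes copied from Table 2. The function name `log2_ratio` and the choice to return `None` when a count is zero are assumptions made for illustration; the legend does not say how empty families were handled.

```python
import math

REFERENCE = "Vi. vinifera"

# Family sizes copied from Table 2 for two families and four species.
FAMILY_SIZES = {
    "ERF": {"Gl. max": 290, "Me. truncatula": 186, "Vi. vinifera": 72, "Se. moellendorffii": 35},
    "YABBY": {"Gl. max": 12, "Me. truncatula": 7, "Vi. vinifera": 7, "Se. moellendorffii": 0},
}


def log2_ratio(size, reference_size):
    """log2(size / reference_size); None when either count is zero (assumed handling)."""
    if size == 0 or reference_size == 0:
        return None
    return math.log2(size / reference_size)


for family, sizes in FAMILY_SIZES.items():
    ref = sizes[REFERENCE]
    for species, size in sizes.items():
        if species != REFERENCE:
            # Positive: family larger than in Vi. vinifera; negative: smaller.
            print(family, species, log2_ratio(size, ref))
```

A value of 1 corresponds to a family twice as large as in Vi. vinifera, and a value of 0 to a family of the same size.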
Figure 3: Expression levels and Ka/Ks ratio of singleton transcription factors. A. Expression (in FPKM) of syntenic and non-syntenic singletons in three legume species and in Arabidopsis thaliana. B. Ka/Ks distribution of syntenic singletons and syntenic segmental duplicates. Syntenic singletons are transcription factor genes, without a close paralog, that are located in a syntenic region in a reference outgroup. Segmental duplicates are paralogous transcription factors with preserved synteny in the same genome, as well as in the genome of a reference outgroup. Ph. vulgaris was used as reference for Gl. max and Vi. vinifera was used as reference for the other three species. Statistical significance was assessed using the Mann-Whitney U test; an asterisk (*) indicates p-value < 0.05.
Figure 4: Species tree showing the number of transcription factor orthologous groups that gained or lost genes. We used different rates of evolution in different lineages, which are represented as branch styles/colors. Known polyploidization events are marked with stars. Green and red triangles refer to nodes with more expansions and contractions, respectively. Numbers of expanded and contracted orthologous groups are shown in green and red, respectively.
Figure 5: Distribution of synonymous substitution rates (Ks) of 138 orthologous groups with gene gain in legumes. Genome-wide Ks distributions are shown as density plots on the top panel. The bottom panel shows Ks distributions of segmental and dispersed duplicate gene pairs.
Figure 6: A. Phylogenetic reconstruction of the orthologous group bHLH:12, which is expanded in legumes. B. Gene expression patterns of bHLH:12 genes in Me. truncatula (BioProject: PRJNA80163), Gl. max (Libault et al., 2010) and Ph. vulgaris (O’Rourke et al., 2014), showing a trend for greater expression in roots and nodules.