agbiodata pan-genome discussion · topics: 1. what is a "pan-genome” 2. examples of...
TRANSCRIPT
AgBioData pan-genome discussion
Wednesday, May 6th, 2020
We’ve been talking about pan-genomes for years now, but what exactly is a pan-genome, what is it good for, and how can data be presented to help researchers?
Topics:
1. What is a "pan-genome"
2. Examples of pan-genomes, tools, analyses.
3. Discussion: how can a pan-genome be useful, who is it useful for, how to make data accessible, what is the role of data portals?
Lightning talks:
Eloi Durant - PanacheFake pan-genome viewer
Alan Cleary - Genome Context Viewer
Marcela Tello-Ruiz - pan-genome browsers for rice, maize, and grape
What is a pan-genome?
• Genome or gene focused.
• Could be reference-based or all-by-all.
• Capture large or small structural variation.
• Within a species or clade.
• Is it a graph, alignment, or set of syntenic relationships?
• When is it a pan-genome, when is it variation data? Is diversity data a form of pan-genome?
Some pan-genome portal examples
https://phytozome-next.jgi.doe.gov/cowpeapan
https://phytozome-next.jgi.doe.gov/brachypan
http://www.10wheatgenomes.com
http://animal.nwsuaf.edu.cn/code/index.php/panPig
http://animal.nwsuaf.edu.cn/code/index.php/panGoat
Some pan-genome visualizations
• Pan-tetris - standalone Java app for bacteria pan-genomes (download) - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547177/
• Rice pan-genome viewer: http://www.rmbreeding.cn/pan3k - https://www.ncbi.nlm.nih.gov/pubmed/27940610
• PPanGGOLiN: https://github.com/labgem/PPanGGOLiN - http://dx.doi.org/10.1371/journal.pcbi.1007732
Some pan-genome codefests
• https://github.com/NCBI-Hackathons/TheHumanPangenome - workshop to discuss tools in relation to pangenome analysis. Strategies and tools were presented. (Possible follow-up at Baylor College of Medicine October 11-13.) - https://f1000research.com/articles/8-1751
• https://graph-genome.github.io/ - PantoGraph for SARS-CoV-2
How can a pan-genome be useful, who is it useful for, how to make data accessible, what is the role of data portals?
Presentations of pan-genomes, tools, analyses.
Eloi Durant - PanacheFake pan-genome viewer
https://meerketeer.ird.fr/PanacheFake
Alan Cleary - Genome Context Viewer
https://legumeinfo.org/lis_context_viewer/
Marcela Tello-Ruiz - pan-genome browsers for rice, maize, and grape
Rice: http://oge.gramene.org/ (maize, grapevine, and sorghum are under development)
How to think Pangenome Visualization? Introducing Panache, a pangenome explorer prototype
Éloi DURANT AgBioData Conference call – Pan-genomes
Introduction
[email protected] PhD student
MONTPELLIER
“Development of a tool for the visualization of plant pangenomes”
Thinking visualization
Does it scale to big genomes?
What representations can be done?
Seeing everything at once is a fantasy.
Overly complicated data are unreadable as such, cf. the ‘Hairball effect’
Detail is reached by focusing on specific parts
Information = visualization x data processing
Panache
Panache: PANgenome Analyzer with CHromosomal Exploration
Summarization of information + exploration
Summarization: access to inner properties
Exploration: manipulation of multiple representations
Panache
https://meerketeer.ird.fr/PanacheFake
https://meerketeer.ird.fr/PanacheNapus
Panache
What is missing?
Exploration through different zoom scales
More representations
“Pangenomes, why not, but I don’t want to lose all my previous analyses.”
Data
Genome Context Viewer
Alan Cleary
Micro-Synteny
[Figure labels: query gene, neighbors, genomic interval, chromosome, physical distance]
Query Track
Functional Annotations
Biology
● Intra-track homology
  ○ Copy number
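The query track idea can be sketched in a few lines: take the query gene plus its k neighbors on each side, then look for gene families that occur more than once within that track (intra-track homology, i.e. local copy number). This is only an illustration under assumed data: the `(gene_id, family_id)` tuples and the function names are hypothetical and are not taken from the Genome Context Viewer code.

```python
from collections import Counter

def build_track(chromosome, query_index, k):
    """Return the query gene and up to k neighbors on each side, in order."""
    start = max(0, query_index - k)
    end = min(len(chromosome), query_index + k + 1)
    return chromosome[start:end]

def intra_track_copy_number(track):
    """Return families that occur more than once in the track, with their counts."""
    counts = Counter(family for _gene, family in track)
    # A family seen several times in one track suggests local duplication.
    return {fam: n for fam, n in counts.items() if n > 1}

# Hypothetical chromosome: ordered (gene_id, family_id) pairs.
chromosome = [
    ("g1", "famA"), ("g2", "famB"), ("g3", "famC"),
    ("g4", "famB"), ("g5", "famD"), ("g6", "famE"),
]
track = build_track(chromosome, query_index=2, k=2)
duplicated = intra_track_copy_number(track)
```

Here the track around `g3` covers `g1` to `g5`, and `famB` is reported with two copies.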
Micro-Synteny Track Search
[Figure label: aligned similar tracks]
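A rough sketch of what a track search has to do: given a query track expressed as gene families, rank other tracks by how much family content they share. The scoring below (count of shared families, ignoring gene order) is a deliberately simple stand-in and is not the alignment method the Genome Context Viewer uses; the track names and the `min_shared` threshold are hypothetical.

```python
def shared_family_score(query, candidate):
    """Number of distinct gene families present in both tracks."""
    return len(set(query) & set(candidate))

def search_tracks(query, candidates, min_shared=2):
    """Return (name, score) for tracks sharing at least min_shared families, best first."""
    hits = [(name, shared_family_score(query, fams)) for name, fams in candidates.items()]
    hits = [(name, score) for name, score in hits if score >= min_shared]
    # Sort by descending score, then by name for a stable order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))

query = ["famA", "famB", "famC", "famD"]
candidates = {
    "genome2_chr1": ["famA", "famB", "famC", "famX"],
    "genome3_chr5": ["famD", "famC", "famY"],
    "genome4_chr2": ["famZ", "famQ"],
}
hits = search_tracks(query, candidates)
```

An order-aware alignment would be needed to recover the "aligned" part of the figure; this sketch only covers the "similar" part.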
Dot Plots
Biology
● Inter-track homology
  ○ Copy number variation
  ○ Gene presence/absence variation
  ○ Inversions
[Figure label: genomic interval]
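To make the dot-plot reading concrete, here is a minimal sketch that computes the dots for two tracks given as ordered lists of gene families. A dot is placed wherever the two tracks carry the same family. The family names and helper functions are hypothetical, and gene index is used in place of physical position.

```python
def dot_plot_points(track_x, track_y):
    """Return (i, j) index pairs where the two tracks carry the same gene family."""
    return [
        (i, j)
        for i, fam_x in enumerate(track_x)
        for j, fam_y in enumerate(track_y)
        if fam_x == fam_y
    ]

def absent_families(track_x, track_y):
    """Families in track_x with no homolog in track_y (candidate presence/absence variation)."""
    present = set(track_y)
    return [fam for fam in track_x if fam not in present]

track_x = ["famA", "famB", "famC", "famD", "famE"]
track_y = ["famA", "famC", "famB", "famD", "famD"]
points = dot_plot_points(track_x, track_y)
missing = absent_families(track_x, track_y)
```

In this toy case the swapped `famB`/`famC` pair gives an anti-diagonal run of dots (an inversion), `famD` gives two dots in one column (copy number variation), and `famE` has no dot at all (presence/absence variation).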
Macro-Synteny Block Search
[Figure labels: similar blocks, query track, species]
Biology
● Preservation of large structures
All Together
Pangenomics - Arabidopsis thaliana
Pangenomics - Vigna unguiculata
Pangenomics - Vigna unguiculata pan gene sets
Traditional gene families
Pan gene sets
Resources
Cleary, Alan, and Andrew Farmer. "Genome Context Viewer: visual exploration of multiple annotated genomes using microsynteny." Bioinformatics 34.9 (2017): 1562-1564.
https://legumeinfo.org/lis_context_viewer
https://github.com/legumeinfo/lis_context_viewer
Glycine example
Arabidopsis example
Plant Pan Genome Browsers - Utilizing the Gramene & Ensembl Infrastructure
Marcela Karey Tello-Ruiz, PhD
May 6, 2020
Types of Pan Genomes
1. Homolog-based strategy (all-by-all) - The genomes of individuals are independently assembled, and the presence/absence in a gene family is determined by clustering protein sequences into homologs.
2. “Map-to-pan” strategy (reference-based) - Pangenome sequences are constructed by combining a well-annotated reference genome with newly identified non-reference representative sequences, from which the presence/absence of a gene is then determined based on read coverage after individual reads are mapped to the pangenome. Highly recommended for eukaryotic pangenome analysis.
Hu et al (2020)
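The coverage-based presence/absence call in the map-to-pan strategy can be illustrated with a small sketch. It assumes per-base read depth is already available for one accession; the 0.8 covered-fraction cutoff, the depth values, and the gene coordinates are all hypothetical choices for illustration, not values from Hu et al.

```python
def covered_fraction(gene_start, gene_end, depth, min_depth=1):
    """Fraction of positions in [gene_start, gene_end) with read depth >= min_depth."""
    length = gene_end - gene_start
    if length <= 0:
        raise ValueError("gene_end must be greater than gene_start")
    covered = sum(1 for d in depth[gene_start:gene_end] if d >= min_depth)
    return covered / length

def call_presence(genes, depth, min_fraction=0.8):
    """Map each gene to True (present) or False (absent) from its covered fraction."""
    return {
        name: covered_fraction(start, end, depth) >= min_fraction
        for name, (start, end) in genes.items()
    }

# Hypothetical per-base depth for one accession over a 20 bp pangenome contig.
depth = [5, 6, 4, 5, 7, 0, 0, 0, 0, 1, 3, 4, 5, 5, 6, 0, 0, 0, 0, 0]
genes = {"geneA": (0, 5), "geneB": (5, 10), "geneC": (10, 20)}
calls = call_presence(genes, depth)
```

With these numbers `geneA` is fully covered and called present, while `geneB` (20% covered) and `geneC` (50% covered) fall below the cutoff and are called absent. Real pipelines typically also consider coding-region coverage and depth normalization.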
Phylogenetic Gene Trees
● Cluster homologous gene families
● Consensus of 5 tree-building methods
  ○ NJ-dN, NJ-dS, NJ-mm, Phyml-aa, Phyml-nt
● Infers orthologs and paralogs
● Taxonomic dating
● Interactive tree-browser for Cross-species
http://useast.ensembl.org/info/docs/compara/homology_method.html
Vilella et al (2008); Schwartz et al (2003); Kent et al (2003)
Using Comparative Phylogenomics to Support a Pan-Gene Space Browser
Pan-Genome (gene space) Browsers
Oryza Genome Evolution (oge.gramene.org)
Maize NAM Founders (maize-pangenome.gramene.org)
● PacBio/Bionano assembly of diverse maize inbreds
● Kelly Dawe (U Georgia), Matt Hufford (Iowa State U), Candice Hirsch, MaizeGDB: Carson Andorf, Maggie Woodhouse, Corteva: Kevin Fengler
Wild & cultivated Grapevine (vitis.gramene.org)
● Multiple PacBio & 10X genomes
● USDA-ARS VitisGen2 Project: Lance Cadle-Davidson (USDA-ARS, Geneva, NY), Dario Cantu (UC Davis), Rachel Naegele (USDA-ARS, Parlier, CA)
USDA-ARS portal for Sorghum genomics/breeding resources (sorghumbase.org)
● Multiple PacBio & 10X genomes
● Chad Hayes (USDA-ARS, Lubbock TX), Corteva, community data sets JGI, Terra Ref
Subsites hold collections of closely related reference genomes
● Within species, genus, or crop group
● Outgroup species
● Sourced by collaborators and funded projects
● 4 subsites in progress for rice, maize, sorghum, & grapevine
Uniform gene annotation protocol (in progress)
● Species-customized repeat library & evidence sets
● RNA-seq assemblies, PacBio Iso-seq, EST, prior annotation
● Evidence-based + ab initio prediction
Gramene/Ensembl databases, Search, Views & Pipelines
Compara Gene Trees & whole genome alignment
● Gene family assignment
● Phylogenetic tree build
● Ortholog & paralog calling
● Taxonomic dating
● Pairwise WGA (BLASTZ-CHAIN-NET)
● Genetic variations (SVs & SNPs)
Gene-centered pairwise synteny maps
● Maps collinear & near-collinear orthologs
● Neighborhood view
Pangenome index
● Cluster syntelogs by transitive closure
● Presence absence variation (PAV)
● Copy number variation (CNV)
● Core & dispensable genome
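The pangenome index steps can be sketched as follows: pairwise syntelog links are merged by transitive closure (here with a union-find structure), each cluster is tabulated per genome to expose PAV (count of 0) and CNV (count above 1), and clusters present in every genome are labelled core. This is an assumption-laden illustration, not the Gramene pipeline: the `genome|gene` identifier format, the function names, and the strict "present in all genomes" definition of core are hypothetical choices.

```python
from collections import defaultdict

def cluster_by_transitive_closure(pairs):
    """Group genes so that any chain of pairwise syntelog links ends up in one cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    return sorted(sorted(members) for members in clusters.values())

def copy_number_table(clusters, genomes):
    """Count member genes per genome for each cluster; gene ids look like 'genome|gene'."""
    table = []
    for members in clusters:
        counts = {g: 0 for g in genomes}
        for gene in members:
            counts[gene.split("|")[0]] += 1
        table.append(counts)
    return table

def classify(counts):
    """'core' if every genome has at least one copy, otherwise 'dispensable'."""
    return "core" if all(n > 0 for n in counts.values()) else "dispensable"

genomes = ["g1", "g2", "g3"]
pairs = [
    ("g1|a", "g2|a"), ("g2|a", "g3|a"),   # a chain linking all three genomes
    ("g1|b", "g2|b"), ("g1|b", "g2|b2"),  # absent from g3, two copies in g2
]
clusters = cluster_by_transitive_closure(pairs)
table = copy_number_table(clusters, genomes)
labels = [classify(c) for c in table]
```

In this toy input the first cluster is core with one copy per genome, while the second is dispensable, showing both a PAV (no copy in `g3`) and a CNV (two copies in `g2`).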
Pan-Genome (gene space) Browsers
Gramene Search & Enhanced Tree Views
Pangenomic search summary
Alternate gene-tree views
● Gene-neighborhood
● Multiple sequence alignment
● Protein-domain highlighting
Maize Pan-Genome: Gene tree alignment view
Prototype site: http://maize-pangenome-ensembl.gramene.org
Maize Pan-Genome: Multiple-sequence alignment view
Prototype site: http://maize-pangenome-ensembl.gramene.org
Maize Pan-Genome: Gene neighborhood conservation view
Prototype site: http://maize-pangenome-ensembl.gramene.org
Pan-Genome (gene space) Browsers
Use case: Origin of Domestication genes for PAVs
Gene tree and whole-genome alignment confirm presence of Sh1 in O. barthii progenitor, but absence in African rice, as previously observed (Wang et al. 2014).
Pan-Genome (gene space) Browsers
Future targets:
● Whole genome alignments complement the protein gene trees and characterization of non-coding transcribed regions.
● Regulatory nontranscribed regions
Pan-Genome (gene space) Browsers
Future targets:
● Support for analyses workflow to extract PAV & CNVs; Core & dispensable genomes
● Novel views to improve access and interpretation
● Improved Search
Thanks!
We gratefully acknowledge support from grants NSF #1744001, NSF #1127112, and USDA-ARS #58-8062-7-008.
● Sharon Wei - Analyses, web & Ensembl software
● Andrew Olson - API development, search, views
● Marcela K. Tello-Ruiz - Species-specific collaborations & outreach
● Ware Lab members
Collaborators
● Ensembl - Infrastructure
● OGE project
● NAM project
● VG2 project
Some Introductory Papers
Bayer et al., 2017. Assembly and comparison of two closely related Brassica napus genomes.
Gao et al., 2019. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor.
Golicz et al., 2020. Pangenomics comes of age: From bacteria to plant and animal applications.
Montenegro et al., 2017. The pangenome of modern hexaploid bread wheat.
Sherman and Salzberg, 2020. Pan-genomics in the human genome era.
Bayer PE, Hurgobin B, Golicz A, Chan K, Yuan Y, Lee HT, Renton M, Meng J, Li R, Long Y, Zou J, Bancroft I, Chalhoub B, King G, Batley J, Edwards D. (2017) Assembly and comparison of two closely related Brassica napus genomes. Plant Biotechnology Journal. 15(12):1602-1610
Danilevicz MF, Tay Fernandez CG, Marsh JI, Bayer PE, Edwards D. (2020) Plant Pangenomics: Approaches, Applications and Advancements. Current Opinion in Plant Biology. 54:15-25
Dolatabadian A, Bayer P, Tirnaz S, Hurgobin B, Edwards D, Batley J. (2020) Characterisation of disease resistance genes in the Brassica napus pangenome reveals significant structural variation. Plant Biotechnology Journal. 18(4):969-982
Dolatabadian A, Patel DA, Edwards D and Batley J. (2017) Copy number variation and disease resistance in plants. Theoretical and Applied Genetics. 130(12):2479-2490
Golicz A, Bayer PE, Bhalla PL, Batley J, Edwards D. (2020) Pangenomics comes of age: From bacteria to plant and animal applications. Trends in Genetics 36(2):132-145
Golicz AA, Bayer PE, Barker G, Edger PP, Kim HR, Martinez PA, Chan CKK, Severn-Ellis A, McCombie R, Parkin IAP, Paterson AH, Pires JC, Sharpe AG, Tang H, Teakle GR, Town CD, Batley J, Edwards D. (2016) The pangenome of an agronomically important crop Brassica oleracea. Nature Communications 7:13390
Hirsch CN, Foerster JM, Johnson JM, Sekhon RS, Muttoni G, Vaillancourt B, Peñagaricano F, Lindquist E, Pedraza MA, Barry K, de Leon N, Kaeppler SM, Buell CR. Insights into the maize pan-genome and pan-transcriptome. Plant Cell. 2014 Jan;26(1):121-35. doi:10.1105/tpc.113.119982.
More references
Hübner et al. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nat Plants. 2019 Jan;5(1):54-62. doi:10.1038/s41477-018-0329-0.
Hurgobin B, Golicz A, Bayer P, Chan K, Tirnaz S, Dolatabadian A, Schiessl S, Samans B, Montenegro J, Parkin I, Pires C, Chalhoub B, King G, Snowdon R, Batley J and Edwards D. (2018) Homoeologous exchange is a major cause of gene presence/absence variation in the amphidiploid Brassica napus. Plant Biotechnology Journal. 16(7):1265-1274
Hurgobin B and Edwards D. (2017) SNP discovery using a pangenome: has the single reference approach become obsolete? Biology 6(1):E21
Montenegro JDM, Golicz AA, Bayer PE, Hurgobin B, Lee HT, Chan CKK, Visendi P, Lai K, Doležel J, Batley J, Edwards D. (2017) The pangenome of modern hexaploid bread wheat. Plant Journal. 90(5):1007-1013
Ou et al., 2018. Pan-genome of cultivated pepper (Capsicum) and its use in gene presence–absence variation analyses. New Phytologist Vol 220(2):360-363
Pinosio et al., 2016. Characterization of the Poplar Pan-Genome by Genome-Wide Identification of Structural Variation. Molecular Biology and Evolution, Volume 33, Issue 10, October 2016, Pages 2706–2719.
Read et al., 2013. Pan genome of the phytoplankton Emiliania underpins its global distribution. Nature volume 499, pages 209–213.
Sun C, Hu Z, Zheng T, Lu K, Zhao Y, Wang W, Shi J, Wang C, Lu J, Zhang D, Li Z, Wei C. RPAN: rice pan-genome browser for ∼3000 rice genomes. Nucleic Acids Res. 2017 Jan 25;45(2):597-605. doi:10.1093/nar/gkw958.
More references - continued
Valliyodan et al., 2019. Construction and comparison of three new reference-quality genome assemblies for soybean. The Plant Journal. 100(5):1066-1082
Varshney et al., 2019. Resequencing of 429 chickpea accessions from 45 countries provides insights into genome diversity, domestication and agronomic traits. Nature Genetics 51, 857-864.
Wang et al., 2018. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature volume 557, pg 43–49.
Yu J, Golicz A, Lu K, Dossa K, Zhang Y, Chen J, Wang L, You J, Fan D, Edwards D, Zhang X. (2019) Insight into the evolution and functional characteristics of the pan-genome assembly from sesame landraces and modern cultivars. Plant Biotechnology Journal. 17(5):881-892
Bayer PE, Golicz A, Tirnaz S, Chan KCC, Edwards D, Batley J. (2019) Variation in abundance of predicted resistance genes in the Brassica oleracea pangenome. Plant Biotechnology Journal. 17(4):789-800
Zhao J, Bayer PE, Ruperao P, Saxena RK, Khan AW, Golicz AA, Nguyen HT, Batley J, Edwards D, Varshney RK. (2020) Trait associations in the pangenome of pigeon pea (Cajanus cajan). Plant Biotechnology Journal.