Teaching Bioinformatics data analysis using Medicago truncatula as a model
Vivek Krishnakumar
Session: Teaching Genetics, Genomics, Bioinformatics and Biotechnology
Plant & Animal Genome XXIV
Saturday, Jan 9th, 2016
Outline
• Background
  ◦ Medicago genome project
  ◦ Outreach mandate
  ◦ Our Vision
• JCVI Plant Bioinformatics Workshop
• Community access to workshop resources
• Related Initiatives
• Summary
Medicago genome project
• Medicago truncatula, a close relative of alfalfa, is the preeminent model for legume genomics
• Sequencing initiated in 2003, renewed in 2006, moved to curation phase in 2009
• Funded by NSF Plant Genome awards #0321460, #0604966 and #0821966, respectively
Medicago genome project activities
• Sequencing
  ◦ Sanger-based BAC sequencing
  ◦ Sequence finishing/gap closure
  ◦ NextGen sequencing (NGS) using Illumina/454
• Assembly
  ◦ Tiling-path & genetic map based genome assembly
  ◦ Whole Genome Shotgun (WGS) assembly
  ◦ Optical Map based genome assembly improvement
• Annotation
  ◦ de novo gene finding, transposon classification
  ◦ Transcriptome based gene structural annotation
  ◦ Transcriptome based Alternative Splicing (AS) detection
  ◦ Gene functional annotation
• Online Databases
  ◦ Medicago truncatula Genome Database
  ◦ Medicago Community Annotation Portal
Outreach Mandate
NSF Award #0821966: At the educational level, participating institutions will host visiting students in their laboratories for summer internships. In addition, annual workshops will be held to provide education in genome annotation and analysis to graduate students, postdoctoral fellows and interested faculty in the legume community.
http://www.nsf.gov/awardsearch/showAward?AWD_ID=0821966
Our Vision
• Genome and transcriptome sequencing is now commonplace, sequencing tech constantly evolving
• New methodologies and tools to analyze/visualize data continue to be developed and released
• Pressing need for researchers to keep abreast of new bioinformatics analysis techniques
• Goal:
  ◦ Develop a comprehensive curriculum capable of covering theoretical and practical nuances of genomic data analysis, targeted towards researchers looking to hone their bioinformatics skills
JCVI Plant Bioinformatics Workshop: Background
• Annual week-long workshop
• Started in 2010 and concluded in 2014
• Open to participants within/outside the USA
• Open to university and industry participants
• Open to remotely located participants
• Fully paid for by the NSF Award (except for international travel)
• Focused on various aspects of Genomics and Bioinformatics data analysis
JCVI Plant Bioinformatics Workshop: Presentations
• Internal instructors (from the Plant Genomics groups) present talks on topics deriving from their domain knowledge
  ◦ Linux: Getting familiar with command line interface (CLI)
    1. learning to use command line toolkits
    2. understanding common file formats (GFF3, BED, SAM)
  ◦ Assembly:
    1. genome sequencing technologies (454, Illumina, PacBio)
    2. genome assembly methods and tools (SOAPdenovo, Velvet)
    3. assembly comparison tools (nucmer)
  ◦ Annotation:
    1. gene finding methodologies
    2. functional annotation tools
    3. transcriptome assembly and analysis
    4. differential expression analysis
  ◦ Variation:
    1. Single Nucleotide Variations (SNV) and their effects
    2. Variant analysis tools
• Guest instructors present domain specific talks: small RNA analysis (Blake Meyers, DBI), Repeat analysis (Heidrun Gundlach, MIPS), Comparative genomics (Eric Lyons, UofA/iPlant), Quantifying transcript abundance (Andrew Farmer, NCGR), Synthetic Biology (Other JCVI Researchers)
• Hands-on data analysis sessions are interspersed between presentations
• Exercises are designed against real data, either generated by the Medicago project, or other published datasets
• Attendees perform all the data analysis on the command-line interface, directly on JCVI hosted computational resources
• Computational needs for remote attendees managed via cloud compute technology powered by Amazon Web Services
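As an illustration of the file formats named in the Linux topic, the sketch below converts GFF3 gene features into BED intervals. It is not taken from the workshop material: the function name `gff3_to_bed` and the sample records are hypothetical. What it relies on is the standard coordinate convention (GFF3 is 1-based and inclusive, BED is 0-based and half-open), so only the start coordinate shifts by one.

```python
# Sketch only: the records below are invented for illustration and are
# not Medicago data or workshop exercise data.

def gff3_to_bed(gff3_lines, feature_type="gene"):
    """Yield BED tuples (chrom, start, end, name, score, strand)."""
    for line in gff3_lines:
        line = line.rstrip("\n")
        # Skip blank lines and GFF3 directives/comments
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A GFF3 feature line has exactly 9 tab-separated columns
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, _source, _type, start, end, _score, strand, _phase, attrs = cols
        # Column 9 holds semicolon-separated key=value attributes
        attributes = dict(
            field.split("=", 1) for field in attrs.split(";") if "=" in field
        )
        name = attributes.get("ID", ".")
        # GFF3: 1-based inclusive. BED: 0-based half-open.
        yield (seqid, int(start) - 1, int(end), name, 0, strand)


example_gff3 = [
    "##gff-version 3",
    "chr1\texample\tgene\t1001\t2000\t.\t+\t.\tID=geneA;Name=hypothetical",
    "chr1\texample\tmRNA\t1001\t2000\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "chr2\texample\tgene\t500\t750\t.\t-\t.\tID=geneB",
]

bed_records = list(gff3_to_bed(example_gff3))
for record in bed_records:
    print("\t".join(str(field) for field in record))
```

Only the two `gene` lines are emitted; the `mRNA` line is filtered out by `feature_type`.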
JCVI Plant Bioinformatics Workshop: Hands-on Sessions
JCVI Plant Bioinformatics Workshop: Cloud-based collaboration technologies
• Cloud-based document sharing
  ◦ Google Drive platform
  ◦ Presentation and hands-on material hosted as live documents
  ◦ Content organized into logical folders
  ◦ Content accessible after workshop completion
• Cloud-based teleconferencing
  ◦ Cisco WebEx platform
  ◦ Facilitates instantaneous voice and video calling
  ◦ Share content with remote participants
  ◦ Selective recording of talks
JCVI Plant Bioinformatics Workshop: Cloud-based compute technologies
• Setting up and testing compute, data and analysis tools within JCVI enabled estimation of resource requirements in terms of CPU, RAM and storage
• Resources replicated onto the Amazon Elastic Compute Cloud (EC2) infrastructure to build a Virtual Machine (VM) image
• VM image used to spawn on-demand instances as per requirements of remote attendees
Resource Allocation (per machine)
Processing Cores    20 CPU
Memory (RAM)        40 GB
Storage             150 GB
For a total of 20 users, 4 machines were allocated
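The per-machine figures imply a per-user share if the 20 users are assumed to be spread evenly across the 4 machines. The slides do not say how users were actually distributed, so the even split, the function names and the 23-user scenario below are assumptions made for illustration.

```python
import math

# Per-machine allocation and user/machine counts as stated on the slide
MACHINE = {"cpu_cores": 20, "ram_gb": 40, "storage_gb": 150}
TOTAL_USERS = 20
MACHINES = 4


def per_user_share(machine, users, machines):
    """Return (users per machine, resources per user), assuming an even split."""
    users_per_machine = math.ceil(users / machines)
    share = {name: amount / users_per_machine for name, amount in machine.items()}
    return users_per_machine, share


def machines_needed(users, users_per_machine):
    """Number of identical instances to spawn for a given number of users."""
    return math.ceil(users / users_per_machine)


users_per_machine, share = per_user_share(MACHINE, TOTAL_USERS, MACHINES)
print(users_per_machine)  # 5
print(share)              # {'cpu_cores': 4.0, 'ram_gb': 8.0, 'storage_gb': 30.0}

# Hypothetical scenario: 23 remote attendees at the same density
print(machines_needed(23, users_per_machine))  # 5
```

Under that assumption each user gets roughly 4 cores, 8 GB of RAM and 30 GB of storage, and spawning instances from one VM image makes it simple to add a machine when attendance grows.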
JCVI Plant Bioinformatics Workshop: Participation
[Images labeled 2012, 2013 and 2014]
                                 Workshop 2014   Workshop 2013   Totals
Undergrad & Graduate Students                7               8       15
Postdocs/Scientist                          11               5       16
Faculty                                      4               4        8
Women                                       10               7       17
Universities                                14              15       29
Intl. Universities                           2               2        4
Industries                                   2               3        5
Govt. Agencies                               2               1        3
Community access to workshop resources
• For posterity, the complete set of workshop resources has been posted as a free-to-use Virtual Machine (VM) image available on the open-access cloud computing infrastructure, Atmosphere, developed and made available by CyVerse (formerly iPlant Collaborative)
• VM image: https://atmo.iplantcollaborative.org/application/images/899
• Presentations & Hands-on exercise material: http://j.mp/jcvi-bioinfo-workshop
Requirements to access these resources:
• Create an iPlant account: https://user.iplantcollaborative.org
• Request access to Atmosphere: https://pods.iplantcollaborative.org/wiki/x/mIly
• Create new instance from Workshop VM image: https://pods.iplantcollaborative.org/wiki/x/Blm
• Once instance is running, follow the SSH instructions from the “Connecting to iPlant Instance” document in the Google Docs repository: http://j.mp/jcvi-bioinfo-workshop
Community access to workshop resources
Layout of data and tools:
Component specific layout:
Similar Initiatives: OSU Summer Bioinformatics Workshop
• Annual summer workshop started in 2012
• Targeted toward students and faculty with limited background in bioinformatics
• Similar in scope to the JCVI workshop: Instructors present background information, attendees form groups and work together to analyze data and present their findings
• Part of OSU Bioinformatics Graduate Certification program
• Participants learn to use High Performance Computing systems (via OSU HPCC)
• Exposes researchers to iPlant community resources: Atmosphere (cloud), Discovery Environment (workflows)
Peter Hoyt, Dana Brunson
Similar Initiatives: OSU Summer Bioinformatics Workshop
                                   2015   2014   Total
Undergrads                            1      0       1
Graduate Students                    12     20      32
Postdocs                              2      6       8
Faculty/staff                         6      7      13
Women or Underrepresented groups     13     27      40
Colleges Represented                  4      4       8
Universities represented              4      2       6
International Universities            1      0       1
Industries                            1      1       2
Govt. Agencies                        -      2       2
Conclusion
• Developed curriculum consisting of diverse topics, maintaining relevance to current advances
• Implemented curriculum as part of training workshops over a 4-year period
• Cloud computing technology utilized to expand the reach of the workshop
• Workshop materials made available to the broader community via iPlant
• Teaching material adapted and utilized by similar initiatives
Acknowledgements
JCVI Instructors
• Haibao Tang
• Shelby Bidwell
• Benjamin Rosen
• Maria Kim
• Yongwook Choi
• Agnes Chan
• Christopher Town
JCVI Guest Instructors
• Suman Pakala
• Barbara Methé
• Chuck Merryman
Guest Instructors (US)
• Eric Lyons (Arizona/iPlant)
• Nevin Young (UMN)
• Kevin Silverstein (UMN)
• Andrew Farmer (NCGR)
• Patrick Zhao (Noble Foundation)
• Steven Cannon (USDA-ARS)
• Blake Meyers (DBI)
Guest Instructors (Intl.)
• Heidrun Gundlach (MIPS)
• Jerome Gouzy (INRA)
THANK YOU!