Using Apollo at the i5k Workspace@NAL
NAL, USDA-ARS
https://i5k.nal.usda.gov
August 29th, 2017



Agenda

• Manual annotation general overview
• i5k Workspace tools for manual annotation
  – BLAST, Clustal, HMMER
  – Apollo
• Manual annotation example: preparation
• Manual annotation live example

Other resources

• Monica Munoz-Torres from the Apollo group has a number of comprehensive tutorials:
  – https://www.slideshare.net/MonicaMunozTorres/presentations
• I recommend these slides if you need more background:
  – https://www.slideshare.net/MonicaMunozTorres/apollo-workshop-at-ksu-2015
• Note: there are two versions of Apollo. The i5k Workspace still uses the older version, which has a slightly different interface
  – If you are new to Apollo, or need a refresher, we highly recommend that you review one of her presentations
• The official Apollo annotation guide:
  – http://genomearchitect.org/users-guide/
• Other manual curation tutorials:
  – https://i5k.nal.usda.gov/manual-curation-example
  – http://genomecuration.github.io/genometrain/d-feature-curation-crossing/

Manual annotation general overview

What is manual annotation?

• Manual review and improvement of an existing gene prediction
• Often, but not always: drawing on external evidence (e.g. RNA-Seq, cDNA, genes from other species) to improve a computationally predicted gene model
  – Structural annotation – defining the gene structure (e.g. exon boundaries)
  – Functional annotation – describing the gene function (e.g. its name)

Why manually annotate?

• “Incorrect annotations poison every experiment that makes use of them”
• “Worse still, the poison spreads because incorrect annotations from one organism are often unknowingly used by other projects to help annotate their own genomes.”
  – Yandell and Ence 2012, doi:10.1038/nrg3174

General process of manual annotation

1. Select a chromosomal region of interest (e.g. scaffold)
   1. E.g. find sequence of interest from one or several other species, and align against proteins or genome sequence from your species
2. Select appropriate evidence (tracks in Apollo, or your own files)
3. Determine whether a feature in your evidence provides a reasonable starting gene model
   1. If yes: select and drag the feature to the ‘user-created annotations’ area, creating an initial gene model. If necessary use editing functions to adjust the model.
   2. If not – get in touch with us!
4. Edit model if necessary
5. Check your edited gene model for integrity and accuracy by comparing it with available homologs
   1. Verify that the gene model is the best representation of the underlying biology
6. Repeat steps 1 through 5 as needed to refine model
7. Add annotation details in the “Information Editor”
   1. Replaced model, name, symbol, other comments

Adapted from https://www.slideshare.net/MonicaMunozTorres/apollo-workshop-at-ksu-2015

i5k Workspace ‘Etiquette’

1. Use Apollo to improve a gene model in an i5k Workspace assembly.
   1. If you just want to practice – use one of our training instances.
      1. https://i5k.nal.usda.gov/jbrowseapollo-training
   2. If you just want to view the data – you probably can get what you want without using Apollo. All of the data that we host is public.
2. Your annotation work is a community effort.
   1. If you notice that someone else is working on your model of choice, get in touch with them (or us) and collaborate – don’t make a 2nd model or delete the other model.
   2. Keep in mind that your work will be used by the scientific community once you’re done.
3. If you publish any of your work generated in the i5k Workspace:
   1. Get in touch with the genome contact first (you can find the contact info on the organism page; https://i5k.nal.usda.gov/species);
   2. Please cite the i5k Workspace paper! This helps us continue to exist.
      1. https://doi.org/10.1093/nar/gku983

Manual annotation: i5k Workspace tools

First, some conventions

• HSP – High-scoring segment pair in BLAST/BLAT alignments
  – The ‘Hits’ in an alignment result set
  – A subsection of a pair of sequences with sufficient score
  – HSPs can change based on the alignment parameters
• Five prime end and three prime end
  – Based on direction of transcription
  – Initiation site is at the five prime end
  – Stop codon is at the three prime end
• In the genome browser, arrowheads indicate direction

[Figure: two features with arrowheads, labeled 3’ 5’ and 5’ 3’]
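
The strand conventions above can be made concrete with a short sketch. It assumes the genome sequence is held in a plain string and that feature coordinates are 1-based and inclusive on the forward strand, as a genome browser usually displays them. The function names are illustrative and are not part of Apollo or JBrowse.

```python
# Complement table for unambiguous DNA bases plus N (upper and lower case)
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def feature_5p_to_3p(genome_seq: str, start: int, end: int, strand: str) -> str:
    """Extract a feature in the direction of transcription.

    start and end are 1-based inclusive forward-strand coordinates.
    For a reverse-strand feature ("-"), the 5' end is at the higher
    coordinate, so the extracted region is reverse-complemented.
    """
    region = genome_seq[start - 1:end]
    return region if strand == "+" else reverse_complement(region)
```

For a reverse-strand model such as the one in the live example, the start codon therefore appears at the right-hand end of the feature in the browser.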

JBrowse and Apollo

[Screenshot callouts: Track selector; Bookmark/share URL; User-created annotations track; Login/out; File: Add your own files; View: Change coloring scheme; Tools: Search using BLAT; Find information about tracks; Locate where you are on the scaffold; Search for a gene or location; Zoom in/out; Turn tracks on/off]

JBrowse is a web-based genome browser
• Visualize features that are mapped to a genome
• These features are displayed as tracks
• Many different types of data may be displayed

Apollo adds editing functions to JBrowse
• Manual gene curation
• Changes automatically saved back to server
• Edits are visible to other annotators in real time
• Editing history is tracked

i5k Workspace BLAST: one way to access Apollo

URL: https://i5k.nal.usda.gov/webapp/blast/

BLAST against the genome assembly to view HSPs in JBrowse

[Screenshot callouts: Select organism; Select organism-specific database; Paste or upload query sequence(s); Program is automatically selected]

i5k Workspace BLAST: one way to access Apollo

BLAST result page with 4 panels

Click on blue blastdb icon next to your favorite HSP

BLAST results are displayed in Apollo

HMMER and Clustal

• Use HMMER to detect remote protein homologs
• https://i5k.nal.usda.gov/webapp/hmmer/
• Use Clustal to perform multiple sequence alignments
• https://i5k.nal.usda.gov/webapp/clustal/

Tips and Tricks

• The i5k Workspace BLAST results persist for one week
  – You can bookmark and share searches
  – BLAST HSPs are ‘draggable’ and can be used in annotations
• JBrowse/Apollo URLs can be shared
  – This lets you share the exact view (including active tracks) with others
  – Great for troubleshooting with collaborators
• In Apollo, you can “walk” feature boundaries
  – Square brackets walk exon boundaries: [ and ]
  – Curly brackets walk gene boundaries: { and }
• In Apollo, you can pin tracks to the top
• If you know the name or ID of the gene that you’d like to annotate, you can paste it into the search box in Apollo to navigate to it
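
As an illustration of the shareable-URL tip, here is a minimal sketch that builds a link to a specific view. It assumes the browser accepts JBrowse 1 style query parameters (`loc` for the region, `tracks` for a comma-separated list of track labels). The scaffold name and track labels are hypothetical; copy the real ones from the URL of a view you have open.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Apollo instance for E. affinis, as given in these slides
APOLLO_BASE = "https://apollo.nal.usda.gov/euraff/jbrowse/"


def shareable_view_url(base_url, seq_name, start, end, tracks):
    """Build a link that reopens the browser at one region with chosen tracks.

    Assumes JBrowse 1 style parameters: loc=<seq>:<start>..<end> and
    tracks=<comma-separated track labels>.
    """
    query = urlencode({
        "loc": f"{seq_name}:{start}..{end}",
        "tracks": ",".join(tracks),
    })
    return f"{base_url}?{query}"


def parse_view_url(url):
    """Recover the region and track list from a shared link."""
    params = parse_qs(urlsplit(url).query)
    return params["loc"][0], params["tracks"][0].split(",")


# Hypothetical scaffold name and track labels, for illustration only
url = shareable_view_url(APOLLO_BASE, "Scaffold1", 1000, 5000, ["DNA", "Annotations"])
print(url)
```

In practice it is usually simpler to copy the URL straight from the browser; a helper like this is mainly useful when generating many links, for example one per gene in a curation list.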

Manual annotation example: preparation

Annotation Example

• Phosphoenolpyruvate carboxykinase (pepck) in the copepod Eurytemora affinis
• Pepck catalyzes the conversion of oxaloacetate (OAA) to phosphoenolpyruvate (PEP).
• More information about the copepod: https://i5k.nal.usda.gov/Eurytemora_affinis
• Apollo URL: https://apollo.nal.usda.gov/euraff/jbrowse/
  – Note: There are no demo accounts for this species

Notes on E. affinis genome/browser

• Big advantage for annotation: lots of RNA-Seq and transcriptome data are available to use as contributing evidence for your gene models
  – Includes strand-specific RNA-Seq
• Disadvantage: No close reference genomes, so it may be harder to find homologs for your genes of interest to inform your annotations.

Available tracks for E. affinis

• Baylor Maker annotations:
  – Primary Gene Set:
    • EAFF_v0.5.3-Models
  – Other tracks that were used to generate the primary gene set
• Transcriptome/RNA-Seq
  – Transcriptome assemblies
  – Coverage plots, Mapped RNA-Seq data, Splice junctions
  – Some of the RNA-Seq libraries are stranded

Choosing reference proteins: D. melanogaster pepck in UniProt

Catalyzes the conversion of oxaloacetate (OAA) to phosphoenolpyruvate (PEP). Source: http://www.uniprot.org/uniprot/P20007

Annotation score is a heuristic for annotation quality

FlyBase is another great resource

Feature viewer gives graphical view of domains and sites

Choosing reference proteins: Daphnia pulex Pepck

• GenBank record: https://www.ncbi.nlm.nih.gov/protein/EFX80236.1

Treat with caution!!!

Manual annotation live example

BLAST dmel, dpul proteins against E. affinis proteins

https://i5k.nal.usda.gov/webapp/blast/

Result URL: https://i5k.nal.usda.gov/webapp/blast/c577723ffdb04de7921d768d2a1080b6

Copy the protein ‘basename’ EAFF006514 for searching in Apollo

Results are filtered by e-value; only one protein in the E. affinis dataset has a significant match
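
The e-value filtering described above can be sketched offline as follows. This is a sketch under assumptions, not the i5k web application's own code: it assumes hits are available as tab-separated text in the 12-column layout produced by NCBI BLAST+ with `-outfmt 6`. The threshold, the sequence IDs and all numbers in the example rows are invented for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def significant_hits(tabular_text, max_evalue=1e-5):
    """Return HSPs with e-value <= max_evalue, best (lowest) e-value first."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


# Invented example rows: IDs and numbers are hypothetical
example = "\n".join([
    "\t".join(["query1", "subjectA", "60.0", "600", "240", "5",
               "20", "640", "1", "590", "1e-200", "700"]),
    "\t".join(["query1", "subjectB", "25.0", "80", "60", "2",
               "100", "180", "30", "110", "2.5", "30"]),
])

for hit in significant_hits(example):
    print(hit["sseqid"], hit["evalue"])
```

Because HSPs change with the alignment parameters, the cutoff is a judgment call; a weak hit that is filtered out here may still be worth inspecting in the browser.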

Modify E. affinis model sequence in Apollo

• Go to Apollo URL: https://apollo.nal.usda.gov/euraff/jbrowse/
  – Find mRNA of EAFF006514-PA in genome browser by pasting EAFF006514 into search box, selecting EAFF006514-RA
• Log in to Apollo
• Drag EAFF006514-RA into the yellow annotation track
• Check available evidence for model

Another approach: BLAST against the genome

https://i5k.nal.usda.gov/webapp/blast/

Click on blue blastdb button next to your favorite HSP to view it in JBrowse

Another approach: BLAST against the genome

BLAST results are displayed as glyphs in browser; can be used as annotation starting points if the alignment is high quality

Create annotation in user-created annotations track

Drag model EaffTmpM006514-RA to User-created Annotations track

Log in with your Apollo credentials

Modify E. affinis model sequence in Apollo

• Questions:
  – What evidence do you choose to check the integrity of the model?
  – Do you need additional evidence?
  – How do you evaluate whether the protein sequence is as complete as it can be?
  – Should you add/modify UTRs?

View available evidence

RNA-Seq and transcriptome tracks suggest that one exon is missing

Model is on the reverse strand, so we can take advantage of the stranded RNA-Seq available for this species

Add an exon to the model

Drag exon from transcriptome track into new gene model

Adjust exon boundary

CDS sequence is now UTR – zoom in to investigate

CDS frame has changed from purple to green – we need to fix this

RNA-Seq suggests we need to adjust exon boundary

Adjust exon boundary

Drag exon boundary to match RNA-Seq and transcriptome tracks

Fixed both reading frame and exon boundary
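
Why moving one exon boundary changes the reading frame of everything downstream can be shown with a little arithmetic. The sketch below assumes coding segments are given as 1-based inclusive coordinate pairs listed in transcription order; all coordinates are hypothetical and do not come from the E. affinis model.

```python
def cds_offsets(cds_segments):
    """Report the codon offset at the start of each coding segment.

    Returns (offsets, in_frame): offsets[i] is the number of coding bases
    before segment i, modulo 3, and in_frame is True when the total coding
    length is a multiple of 3.
    """
    offsets = []
    consumed = 0
    for start, end in cds_segments:
        offsets.append(consumed % 3)
        consumed += end - start + 1
    return offsets, consumed % 3 == 0


# Hypothetical model: three coding segments of 90 bases each
original = [(1, 90), (201, 290), (401, 490)]
# Same model with the end of the second segment moved by a single base
edited = [(1, 90), (201, 291), (401, 490)]

print(cds_offsets(original))
print(cds_offsets(edited))
```

A one-base change in the second segment shifts the offset of the third segment, which is the kind of frame change the browser signals with a different CDS color. Matching the boundary to the splice junction supported by RNA-Seq restores a total length divisible by three.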

Evaluate new protein sequence

• Blast modified EAFF006514-PA sequence to NCBI’s nr database
  – Make sure it doesn’t match a potential contaminant
  – Get an idea whether you have the right sequence
  – Blastp home:
    • https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome
  – Result URL:
    • https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=U8EJ44A701R (expires end of day 8/29)
• Once contamination is ruled out, it’s better to align your sequence against a smaller set of high-quality proteins
• If you notice that parts of the protein are missing, check the ‘Gaps in assembly’ track in the browser

Evaluate new protein sequence

• Get E. affinis pepck protein sequence from old model and new model
• Align new and old sequence to dmel and dmag protein sequences
  – Clustal (https://i5k.nal.usda.gov/webapp/clustal/)
  – Can also use NCBI Blast
• Check alignment extent, %ID
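
To make "alignment extent" and "%ID" concrete, here is a small sketch that computes both from two rows of an alignment, such as two sequences copied out of a Clustal result. It assumes gaps are written as "-" and that both rows have the same length. Percent identity has several definitions in use; this one divides identical positions by the number of columns where neither sequence has a gap. The toy sequences are invented.

```python
def alignment_stats(aligned_a, aligned_b):
    """Compute percent identity and coverage for two aligned rows.

    percent_identity: identical residues / columns without a gap in either row.
    coverage_of_a: gap-free columns / ungapped length of sequence a.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned_cols = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # column with a gap in either row
        aligned_cols += 1
        if a == b:
            identical += 1
    len_a = len(aligned_a.replace("-", ""))
    return {
        "percent_identity": 100.0 * identical / aligned_cols if aligned_cols else 0.0,
        "coverage_of_a": 100.0 * aligned_cols / len_a if len_a else 0.0,
    }


# Toy alignment rows, invented for illustration
print(alignment_stats("MKV-LLA", "MRVQL--"))
```

Comparing these numbers for the old and new model against the same reference protein gives a quick indication of whether an edit, such as an added exon, improved the model.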

Clustal Results

New exon added

Another exon might be missing (we’re not going to handle this today)

- Clustal result URL: https://i5k.nal.usda.gov/webapp/clustal/105850a3594e4234a21b07d93cbbed71
- Scroll to bottom of page and click ‘colorful’ to see color-coded alignment

Using the Information Editor

• Select the model in Apollo, then right-click, and select ‘Edit Information’ from the drop-down menu
  – Use the ‘mRNA’ section
  – Name: We recommend the INSDC naming guidelines:
    • http://www.uniprot.org/docs/nameprot
    • If a naming convention exists, use it (e.g. for gene families)
    • Name should be unique and attributed to all orthologs (as far as possible)
    • Use name from an orthologous protein if you are sure that your gene model is an ortholog.
    • Document your justification for the name in the Comments field (e.g. “88% sequence similarity via blastp to D. melanogaster pepck P20007”)
  – Comments – Document what changes you performed, and your justification for the name. These notes will be visible in the OGS, so make sure that others understand them
  – Replaced Models Field – the Maker model (EAFF_v0.5.3) that your new model will replace in the OGS


The Replaced Models field

• We use the information in this field to generate a merged, non-redundant gene set from the manually curated models and the official or primary gene set
• Your official or primary gene set is listed in the category field of the track selector
• If you don’t know what your project’s gene set is, contact us!

https://i5k.nal.usda.gov/apollo-replaced-models-field-explanations-and-examples

[Screenshot callout: Replaced Models field]
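
The idea behind a merged, non-redundant gene set can be sketched in a few lines. This is only an illustration of the principle and not the i5k Workspace pipeline: it assumes the primary gene set is a list of model IDs and that each curated model carries the list of primary models it replaces. Only the ID EAFF006514-RA comes from these slides; the other IDs and the curated model name are hypothetical.

```python
def merge_gene_sets(primary_ids, curated):
    """Combine curated models with the primary models they did not replace.

    primary_ids: list of model IDs in the primary gene set.
    curated: dict mapping curated model name -> list of replaced primary IDs.
    Returns (merged_ids, unknown_ids), where unknown_ids are replaced-model
    entries that do not match any primary model, e.g. because of a typo.
    """
    replaced = {model for models in curated.values() for model in models}
    unknown = sorted(replaced - set(primary_ids))
    kept = [model for model in primary_ids if model not in replaced]
    return kept + sorted(curated), unknown


# EAFF006513-RA, EAFF006515-RA and "pepck_curated" are hypothetical
primary = ["EAFF006513-RA", "EAFF006514-RA", "EAFF006515-RA"]
curated = {"pepck_curated": ["EAFF006514-RA"]}

merged, unknown = merge_gene_sets(primary, curated)
print(merged)
print(unknown)
```

The sketch also shows why the field matters: if it is left empty or misspelled, both the old and the new model would end up in the merged set.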

Checklist for accuracy and integrity

• Check start, stop and exon boundaries (splice sites)
  – Try to fix non-canonical splice sites if possible
• Check if you can annotate UTRs (e.g. using RNA-Seq data)
• Check for gaps in the genome
• If you change the genome sequence, add a justification comment to the corresponding gene model
• Use BLAST or a multiple sequence aligner
  – To look at completeness of model
  – To verify the appropriateness of the gene name
• In the Information editor mRNA field
  – Fill in the Replaced Model for the Maker gene (EAFF_v0.5.3-Models)
  – Update the Name if appropriate
  – Add comments that describe
    • Your evidence for the annotation
    • Modifications that you made to the gene model

cf. https://www.slideshare.net/MonicaMunozTorres/editing-functionality-apollo-workshop
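
Some of the structural checks in the list above can be expressed as a short script. The sketch below is a simplified illustration and not a replacement for inspecting the model in Apollo: it handles forward-strand models only, treats GT...AG as the only canonical intron boundary, and takes coding segments as 1-based inclusive coordinates. The toy genome is invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_model(genome, cds_segments):
    """Return a list of problems found in a forward-strand coding model."""
    cds = "".join(genome[s - 1:e] for s, e in cds_segments).upper()
    problems = []
    if not cds.startswith("ATG"):
        problems.append("CDS does not start with ATG")
    if cds[-3:] not in STOP_CODONS:
        problems.append("CDS does not end with a stop codon")
    if len(cds) % 3 != 0:
        problems.append("CDS length is not a multiple of 3")
    # Look for stop codons before the final codon
    for i in range(0, len(cds) - 3, 3):
        if cds[i:i + 3] in STOP_CODONS:
            problems.append(f"internal stop codon at CDS position {i + 1}")
    # Each intron lies between the end of one segment and the start of the next
    for (_, prev_end), (next_start, _) in zip(cds_segments, cds_segments[1:]):
        intron = genome[prev_end:next_start - 1].upper()
        if not (intron.startswith("GT") and intron.endswith("AG")):
            problems.append(
                f"non-canonical splice site in intron {prev_end + 1}..{next_start - 1}"
            )
    return problems


# Toy genome: exon ATGGCC, intron GTAAGTTTTCAG, exon GATTAA
toy_genome = "ATGGCC" + "GTAAGTTTTCAG" + "GATTAA"

print(check_model(toy_genome, [(1, 6), (19, 24)]))   # correct boundaries
print(check_model(toy_genome, [(1, 6), (20, 24)]))   # second exon starts 1 base late
```

Not every non-canonical splice site is an error, so a flagged intron is a prompt to look at the RNA-Seq evidence rather than an automatic rejection.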

What happens to my annotation when I’m done?

• This depends on the genome project that you’re working on.
• If the genome coordinator has asked us to generate an OGS (Official Gene Set), we will do so
  – We are still working on this process, so if you ask us to do this, 1) it will take some time, and 2) we will probably ask you for co-authorship if you publish a paper on the OGS.
  – We are working on a pipeline to submit Official Gene Sets to GenBank, where they will be archived/accessioned
• Otherwise, don’t assume that your annotation will be archived.
  – If you need it to be, get in touch with us and we’ll figure out what to do.
• Get in touch with us and the genome project coordinator if you’re not sure about the status of a genome project.
• https://i5k.nal.usda.gov/data-management-policy

Upcoming webinars (tentative schedule)

• October: Apollo manual annotation Q&A
• December: Manual annotation with Apollo
• February: i5k Workspace roadmap and Q&A
• April: Orientation and resources for project coordinators
• June: Overview of i5k Workspace resources
• We will post slides; recordings will be available on request

Thank you!

The NAL Team
• Yu-yu Lin
• Chaitanya Gutta
• Li-Mei Chiang
• Yi Hsiao
• Gary Moore
• Susan McCarthy

i5k Workspace alumni
• Chien-Yueh Lee
• Han Lin
• Jun-Wei Lin
• Vijaya Tsavatapalli
• Mei-Ju Chen

i5k Workspace@NAL advisory committee

• i5k Coordinating Committee
• i5k Pilot Project
• Apollo & JBrowse Development Teams
  o Monica Munoz-Torres, Nathan Dunn
• GMOD/Tripal community
• All of our users and contributors!