apollo workshop ags2017 editing functionality

64
Apollo Collaborative genome annotation editing A workshop for the Arthropod Genomics Community Monica Munoz-Torres, PhD | @monimunozto Berkeley Bioinformatics Open-Source Projects (BBOP) Environmental Genomics & Systems Biology Division Lawrence Berkeley National Laboratory University of Notre Dame, South Bend, IN. 08 June, 2017 http://GenomeArchitect.org

Upload: monica-munoz-torres

Post on 28-Jan-2018

311 views

Category:

Science


5 download

TRANSCRIPT

Page 1: Apollo Workshop AGS2017 Editing functionality

ApolloCollaborative genome annotation editing A workshop for the Arthropod Genomics Community

Monica Munoz-Torres, PhD | @monimunoztoBerkeley Bioinformatics Open-Source Projects (BBOP)Environmental Genomics & Systems Biology DivisionLawrence Berkeley National Laboratory

University of Notre Dame, South Bend, IN. 08 June, 2017

http://GenomeArchitect.org

Page 2: Apollo Workshop AGS2017 Editing functionality
Page 3: Apollo Workshop AGS2017 Editing functionality

editingfunctionality

Page 4: Apollo Workshop AGS2017 Editing functionality

beginwithanewgenemodel

Page 5: Apollo Workshop AGS2017 Editing functionality

Annotatorpanel.

• Chooseappropriateevidencefromlistof“Tracks”onannotatorpanel.

• Select&dragelementsfromevidencetrackintothe‘User-createdAnnotations’area.

• Hoveringoverannotationinprogressbringsupaninformationpop-up.

Creating a new annotation

Page 6: Apollo Workshop AGS2017 Editing functionality

Adding a gene model

Page 7: Apollo Workshop AGS2017 Editing functionality

Adding a gene model

Page 8: Apollo Workshop AGS2017 Editing functionality

Adding a gene model

Page 9: Apollo Workshop AGS2017 Editing functionality

thesequencetrack

Page 10: Apollo Workshop AGS2017 Editing functionality

•‘Zoom to base level’ reveals the sequence track.

Page 11: Apollo Workshop AGS2017 Editing functionality

Color exons by CDS from the ‘View’ menu.

Page 12: Apollo Workshop AGS2017 Editing functionality

Zoomin/outwithkeyboard:shift+arrowkeysup/down

Toggle reference DNA sequence and translation frames in forward strand.

Also, toggle models in either direction.

Page 13: Apollo Workshop AGS2017 Editing functionality

curatingsimplecases

Page 14: Apollo Workshop AGS2017 Editing functionality

• “Simple case”: • the predicted gene model is correct or nearly correct, and • this model is supported by evidence that completely or mostly agrees

with the prediction. • evidence that extends beyond the predicted model is assumed to be

non-coding sequence.

The following are simple modifications.

SIMPLECASES

Page 15: Apollo Workshop AGS2017 Editing functionality

Editing functionality

SIMPLECASES

Page 16: Apollo Workshop AGS2017 Editing functionality

• Aconfirmationboxwillwarnyouifthereceivingtranscriptisnotonthesamestrandastheelementfromwherethe‘new’exonoriginated.

• Check‘Start’and‘Stop’ signalsaftereachedit.

ADDING EXONS

SIMPLECASES

Page 17: Apollo Workshop AGS2017 Editing functionality

Editing functionalityExample: Adding an exon supported by experimental data

• RNAseq reads show evidence in support of a transcribed product that was not predicted.• Add exon by dragging up one of the RNAseq reads.

SIMPLECASES

Page 18: Apollo Workshop AGS2017 Editing functionality

• Iftranscriptalignmentdataareavailable&extendbeyondyouroriginalannotation,youmayextendoraddUTRs.

1. Rightclickattheexonedgeand‘Zoomtobaselevel’.

2. PlacethecursorovertheedgeoftheexonuntilitbecomesablackarrowthenclickanddragtheedgeoftheexontothenewcoordinatepositionthatincludestheUTR.

ADDING UTRs

ToaddanewsplicedUTRtoanexistingannotationalsofollowtheprocedureforaddinganexon,orto‘SetasX’end’.

SIMPLECASES

Page 19: Apollo Workshop AGS2017 Editing functionality

MATCHING EXON BOUNDARY TO EVIDENCE

SIMPLECASES

To modify an exon boundary and match data in the evidence tracks: select both the offending exon and the element with the correct boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.

Page 20: Apollo Workshop AGS2017 Editing functionality

1. Twoexonsfromdifferenttrackssharingthesamestart/endcoordinatesdisplayaredbartoindicatematchingedges.

2. Selectingthewholeannotationoroneexonatatime,usethis edge-matchingfunctionandscrollalongthelengthoftheannotation,verifyingexonboundariesagainstavailabledata.Usesquare[]bracketstoscrollfromexontoexon.Usercurly{}bracketstoscrollfromannotationtoannotation.

3. CheckifcDNA/RNAseqreadslackoneormoreoftheannotatedexonsorincludeadditionalexons.

CHECK FOR EXON INTEGRITY

SIMPLECASES

Page 21: Apollo Workshop AGS2017 Editing functionality

Doubleclickselectstheentiremodel

EvidenceTracksArea

‘User-createdAnnotations’Track

Edge-matching

Apollo’seditinglogic(brain):§ selectslongestORFasCDS§ recalculatesORFaftereachedit,unlessset

ORFs - setting & recalculating

SIMPLECASES

Redlinesaroundexons:‘edge-matching’allowsannotatorstoconfirmwhethertheevidenceisinagreement,withoutexaminingeachexonatthebaselevel.

Page 22: Apollo Workshop AGS2017 Editing functionality

Non-canonical splices are indicated with orange circles with a white exclamation point inside, placed over the edge of the offending exon.

Canonicalsplicesites:

3’-…exon]GA/TG[exon…-5’

5’-…exon]GT/AG[exon…-3’reversestrand,notreverse-complemented:

forwardstrand

SPLICE SITES

Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. GC donor), they should be flagged with a comment.

Exon/intron splice site error warning

Curatedmodel

SIMPLECASES

Page 23: Apollo Workshop AGS2017 Editing functionality

Editing functionalityExample: Adjusting exon boundaries supported by experimental data

SIMPLECASES

Page 24: Apollo Workshop AGS2017 Editing functionality

• Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.

• If ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further up or downstream, depending on evidence (e.g. proteins, RNAseq).

It may be present outside the predicted gene model, within a region supported by another evidence track.

In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).

‘Start’ AND ‘Stop’ SITES

SIMPLECASES

Page 25: Apollo Workshop AGS2017 Editing functionality

curatingcomplexcases

Page 26: Apollo Workshop AGS2017 Editing functionality

Evidencemaysupportjoiningtwoormoredifferentgenemodels.Warning: proteinalignmentsmayhaveincorrectsplicesitesandlacknon-conservedregions!

1. In‘User-createdAnnotations’areashift-clicktoselectanintronfromeachgenemodelandrightclicktoselectthe‘Merge’ optionfromthemenu.

2. Dragsupportingevidencetracksoverthecandidatemodelstocorroborateoverlap,orreviewedgematchingandcoverageacrossmodels.

3. Checktheresultingtranslationbyqueryingaproteindatabasee.g.UniProt,NCBInr.Addcommentstorecordthatthisannotationistheresultofamerge.

MERGE TWO GENE PREDICTIONS ON THE SAME SCAFFOLD

COMPLEXCASES

Redlinesaroundexons:‘edge-matching’allowsannotatorstoconfirmwhethertheevidenceisinagreement,withoutexaminingeachexonatthebaselevel.

Page 27: Apollo Workshop AGS2017 Editing functionality

• Oneormoresplitsmayberecommendedwhen:- differentsegmentsofthepredictedproteinaligntotwoormoredifferentgenefamilies- predictedproteindoesn’taligntoknownproteinsoveritsentirelength- Transcriptdatamaysupportasplit;BUT- first,verifywhethertheyarealternativetranscripts.

SPLIT A GENE PREDICTION

COMPLEXCASES

Page 28: Apollo Workshop AGS2017 Editing functionality

DNATrack

‘User-createdAnnotations’Track

ANNOTATE FRAMESHIFTS AND CORRECT SINGLE-BASE ERRORS

Alwaysremember:whenannotatinggenemodelsusingApollo,youarelookingata‘frozen’versionofthegenomeassemblyandyouwillnotbeabletomodifytheassemblyitself.

COMPLEXCASES

Page 29: Apollo Workshop AGS2017 Editing functionality

CORRECTING SELENOCYSTEINE CONTAINING PROTEINS

COMPLEXCASES

Page 30: Apollo Workshop AGS2017 Editing functionality

COMPLEXCASES

CORRECTING SELENOCYSTEINE CONTAINING PROTEINS

Page 31: Apollo Workshop AGS2017 Editing functionality

1. Apolloallowsannotatorstomakesinglebasemodificationsorframeshiftsthatarereflectedinthesequenceandstructureofanytranscriptsoverlappingthemodification.ThesemanipulationsdoNOTchangetheunderlyinggenomicsequence.Ifyoudeterminethatyouneedtomakeoneofthesechanges,zoomintothenucleotidelevelandrightclickoverasinglenucleotideonthegenomicsequencetoaccessamenuthatprovidesoptionsforcreatinginsertions,deletionsorsubstitutions.

2. The‘CreateGenomicInsertion’featurewillrequireyoutoenterthenecessarystringofnucleotideresiduesthatwillbeinsertedtotherightofthecursor’scurrentlocation.The‘CreateGenomicDeletion’ optionwillrequireyoutoenterthelengthofthedeletion,startingwiththenucleotidewherethecursorispositioned.The‘CreateGenomicSubstitution’featureasksforthestringofnucleotideresiduesthatwillreplacetheonesontheDNAtrack.

3. Onceyouhaveenteredthemodifications,Apollowillrecalculatethecorrectedtranscriptandproteinsequences,whichwillappearwhenyouusetheright-clickmenu‘GetSequence’option.Sincetheunderlyinggenomicsequenceisreflectedinallannotationsthatincludethemodifiedregionyoushouldalertthecuratorsofyourorganismsdatabaseusingthe‘Comments’sectiontoreporttheCDSedits.

4. Inspecialcasessuchasselenocysteinecontainingproteins(read-throughs),right-clickovertheoffending/premature‘Stop’signalandchoosethe‘Setreadthroughstopcodon’optionfromthemenu.

ANNOTATING FRAMESHIFTS, CORRECTING SINGLE-BASE ERRORS & SELENOCYSTEINES

COMPLEXCASES

Page 32: Apollo Workshop AGS2017 Editing functionality

addingmetadata

Page 33: Apollo Workshop AGS2017 Editing functionality

Information Editor

Page 34: Apollo Workshop AGS2017 Editing functionality

BECOMINGACQUAINTEDWITHAPOLLO

Information Editor

Page 35: Apollo Workshop AGS2017 Editing functionality

Information Editor

Page 36: Apollo Workshop AGS2017 Editing functionality

history

Page 37: Apollo Workshop AGS2017 Editing functionality

Keeping track of each edit

Page 38: Apollo Workshop AGS2017 Editing functionality

Annotations, annotation edits, and History:are stored in a centralized database.

Page 39: Apollo Workshop AGS2017 Editing functionality

checklist

Page 40: Apollo Workshop AGS2017 Editing functionality

• Followthischecklistuntilyouaresatisfiedtheannotationisthebestrepresentationoftheunderlyingbiology.

• Andrememberto…• commenttovalidateyourannotation,evenifyoumadenochangestoanexistingmodel.Thinkofcommentsasyour‘voteofconfidence’.

• addacommenttoinformthecommunityofunresolvedissuesyouthinkthismodelmayhave.

AlwaysRemember:Apollocurationisacommunityeffortsopleaseusecommentstocommunicatethereasonsforyour

annotation.Yourcommentswillbevisibletoeveryone.

COMPLETING THE ANNOTATION

Page 41: Apollo Workshop AGS2017 Editing functionality

• Check‘Start’ and‘Stop’sites.• Checksplicesites:mostsplicesitesdisplaythese

residues…]5’-GT/AG-3’[…• CheckifyoucanannotateUTRs,forexampleusing

RNA-Seq data:• alignitagainstrelevantgenes/genefamily• blastp againstNCBI’sRefSeq ornr

• Check&commentgaps inthegenome.• Additionalfunctionalitymaybenecessary:

• merge 2genepredictions- samescaffold• ‘merge’ 2genepredictions- differentscaffolds

• split ageneprediction• annotate frameshifts• annotateselenocysteines,correctingsingle-baseandotherassemblyerrors,etc.

• Add:– Importantprojectinformationintheformofcomments.

– IDsforthisgenemodelinpublicorprivatedatabasesviaDBXRefs,e.g.GenBank ID,genesymbol(s),commonname(s),synonyms.

– Commentsaboutthechangesyoumadetoeachgenemodel,ifany.

– Anyappropriatefunctionalassignments,e.g.viaBLAST+HMM(e.g.InterProScan),RNA-Seq orotherdataofyourown,literaturesearches,etc.

CHECKLISTfor accuracy and integrity

Page 42: Apollo Workshop AGS2017 Editing functionality

example

Page 43: Apollo Workshop AGS2017 Editing functionality

Apis mellifera genome data in Apollo

GenomeArchitect.org

1. Evidence in support of protein coding gene models.

1.1 Consensus Gene Sets:Official Gene Set v3.2Official Gene Set v1.0

1.2 Consensus Gene Sets comparison:OGSv3.2 genes that merge OGSv1.0 andRefSeq genesOGSv3.2 genes that split OGSv1.0 and RefSeq genes

1.3 Protein Coding Gene Predictions Supported by Biological Evidence:NCBI GnomonFgenesh++ with RNASeq training dataFgenesh++ without RNASeq training dataNCBI RefSeq Protein Coding Genes and Low Quality Protein Coding Genes

1.4 Ab initio protein coding gene predictions:Augustus Set 12, Augustus Set 9, Fgenesh, GeneID, N-SCAN, SGP2

1.5 Transcript Sequence Alignment:NCBI ESTs, Apis cerana RNA-Seq, Forager Bee Brain Illumina Contigs, Nurse Bee Brain Illumina Contigs, Forager RNA-Seq reads, Nurse RNA-Seq reads, Abdomen 454 Contigs, Brain and Ovary 454 Contigs, Embryo 454 Contigs, Larvae 454 Contigs, Mixed Antennae 454 Contigs, Ovary 454 Contigs, Testes 454 Contigs, Forager RNA-Seq HeatMap, Forager RNA-Seq XY Plot, Nurse RNA-Seq HeatMap, Nurse RNA-Seq XY Plot

Page 44: Apollo Workshop AGS2017 Editing functionality

Apis mellifera genome data in Apollo

GenomeArchitect.org

1. Evidence in support of protein coding gene models (Continued).

1.6 Protein homolog alignment:Acep_OGSv1.2Aech_OGSv3.8Cflo_OGSv3.3Dmel_r5.42Hsal_OGSv3.3Lhum_OGSv1.2Nvit_OGSv1.2Nvit_OGSv2.0Pbar_OGSv1.2Sinv_OGSv2.2.3Znev_OGSv2.1Metazoa_Swissprot

2. Evidence in support of non protein coding gene models

2.1 Non-protein coding gene predictions:NCBI RefSeq Noncoding RNANCBI RefSeq miRNA

2.2 Pseudogene predictions:NCBI RefSeq Pseudogene

Page 45: Apollo Workshop AGS2017 Editing functionality

Ceramidase

Ceramidase is an enzyme, which cleaves fatty acids from ceramide, producing sphingosine (SPH), which in turn is phosphorylated by a sphingosine kinase to form sphingosine-1-phosphate (S1P). Ceramide, SPH, and S1P are bioactive lipids that mediate cell proliferation, differentiation, apoptosis, adhesion, and migration.

It has come to our attention that the honey bee Apis mellifera ortholog of Ceramidase is fragmented into 2 or more genes in the current gene set (Official Gene Set v3.2).

Page 46: Apollo Workshop AGS2017 Editing functionality

Interrogate the genome using Blat

Search all genomic sequences

Page 47: Apollo Workshop AGS2017 Editing functionality

Blat results

Click on a high-scoring segment pair (hsp) to navigate and highlight the region.

Page 48: Apollo Workshop AGS2017 Editing functionality

48

BIPAA resources - blast

Page 49: Apollo Workshop AGS2017 Editing functionality

49i5KWorkspace@NAL

BIPAA resources - Apollo

You may find candidate genes from blast results using the ‘Search’ box with coordinates in main window.

Page 50: Apollo Workshop AGS2017 Editing functionality

Create a new annotation

Drag and drop ‘GB40335-RA’

Page 51: Apollo Workshop AGS2017 Editing functionality

Transcriptomic data support a longer gene

51

RNA-Seq reads support a large intron and additional exons located about 20k bp downstream (3’) of the last predicted exon for GB40335-RA.

Page 52: Apollo Workshop AGS2017 Editing functionality

Transcriptomic data support a longer gene

Drag and drop ‘GB40336-RA’

Page 53: Apollo Workshop AGS2017 Editing functionality

Merge transcripts

Select one exon from each gene model, holding down the ‘Shift’ key. Then, select ‘Merge’ from right-click menu to bring gene models together.

Note non-canonical splice sites.

Page 54: Apollo Workshop AGS2017 Editing functionality

Exon not supported by RNA-Seq data

At the end of GB40335-RA, select last exon and right-click to choose the ‘Delete’ option.

Page 55: Apollo Workshop AGS2017 Editing functionality

Fix remaining non-canonical splice siteNow, on the other offending exon (was first exon of GB40336-RA), use RNA-seq reads - or use ‘Set Downstream Splice Acceptor’, or drag the intron/exon boundary manually - to use a canonical splice site.

Page 56: Apollo Workshop AGS2017 Editing functionality

Retrieve resulting peptide, compare to public databases

Page 57: Apollo Workshop AGS2017 Editing functionality

Results from NCBI blastp vs nr

Page 58: Apollo Workshop AGS2017 Editing functionality

Add metadata in ‘Information Editor’

Don’t forget!

Nice to have

Page 59: Apollo Workshop AGS2017 Editing functionality

Add metadata in ‘Information Editor’

PubMed Identifiers

Gene Ontology terms

Comments

Page 60: Apollo Workshop AGS2017 Editing functionality

Publicdemoinstances

Page 61: Apollo Workshop AGS2017 Editing functionality

APOLLO ON THE WEBinstructions

•Public Honey bee demo available at:

genomearchitect.org/demo/

•Username:[email protected]

•Password:demo

Page 62: Apollo Workshop AGS2017 Editing functionality

APOLLOdemonstration

Demonstrationvideoavailableathttp://bit.ly/apollo-video1

Page 63: Apollo Workshop AGS2017 Editing functionality

Apollo Development

Nathan DunnTechnical Lead Eric Yao

Christine Elsik’s Lab, University of Missouri

Suzi LewisPrincipal Investigator

BBOP

Moni Munoz-TorresProject Manager Deepak Unni

JBrowse. Ian Holmes’ Lab University of California, Berkeley

Page 64: Apollo Workshop AGS2017 Editing functionality

Thank You.Berkeley Bioinformatics Open-Source Projects, Environmental Genomics & Systems Biology, Lawrence Berkeley National Laboratory

Suzanna Lewis & Chris MungallSeth Carbon (GO - Noctua / AmiGO)

Eric Douglas (GO / Monarch Initiative)

Nathan Dunn (Apollo)

Monica Munoz-Torres (Apollo / GO)

Funding

• Work for GOC is supported by NIH grant 5U41HG002273-14 from NHGRI.

• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI.

• BBOP is also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231

berkeleybop.org

Collaborators• Ian Holmes, Eric Yao, UC Berkeley (JBrowse)• Chris Elsik, Deepak Unni, U of Missouri (Apollo)• Paul Thomas, USC (Noctua)• Monica Poelchau, USDA/NAL (Apollo)• Gene Ontology Consortium (GOC)• i5k Community

UNIVERSITY OF CALIFORNIA