apollo collaborative genome annotation editing

Post on 09-Jan-2017

59 Views

Category:

Science

6 Downloads

Preview:

Click to see full reader

TRANSCRIPT

ApolloCollaborative genome annotation editing

A workshop for the Stowers Institute Research Community

Monica Munoz-Torres, PhD | @monimunozto

Berkeley Bioinformatics Open-Source Projects (BBOP)Environmental Genomics & Systems Biology DivisionLawrence Berkeley National Laboratory

Kansas City, MO | 12 December, 2016

http://GenomeArchitect.org

Outline

• Today we will discusseffective ways to extract valuable information about a genome through curation efforts.

After this talk you will...• Better understand curation in the context of genome annotation:

assembled genome à automated annotation à manual annotation

• Become familiar with Apollo’s environment and functionality.

• Learn to identify homologs of known genes of interest in your newly sequenced genome.

• Learn how to corroborate and modify automatically annotated gene models using all available evidence in Apollo.

Experimental design, sampling.

Comparative analyses

Merged Gene Set

Manual Annotation

Automated Annotation

SequencingAssembly

Synthesis & dissemination.

Genome sequencing projects

Unlocking genomes

Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild

First,arefresher

A few things to remember during curation of gene models

7BIO-REFRESHER

• KEEPAGLOSSARY HANDYfromcontig tosplicesite

• WHATISAGENE?definingyourgoal

• TRANSCRIPTIONmRNAindetail

• TRANSLATIONreadingframes,etc.

• GENOMECURATIONstepsinvolved

The gene: a moving target

“The gene is a union of genomic

sequences encoding a coherent set of

potentially overlapping

functional products.”

Gerstein et al., 2007. Genome Res

9

"Gene structure" by Daycd- Wikimedia Commons

BIO-REFRESHER

mRNA

• Although of brief existence, understanding mRNAs is crucial,as they will become the center of your work.

10BIO-REFRESHER

Reading frames

v In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF)• ORF = Start signal + coding sequence (divisible by 3) + Stop signal

11BIO-REFRESHER

Splice sites

v The spliceosome catalyzes the removal of introns and the ligation of flanking exons.

v Splicing signals (from the point of view of an intron): • One splice signal (site) on the 5’ end: usually GT (less common: GC)• And a 3’ end splice site: usually AG• Canonical splice sites look like this: …]5’-GT/AG-3’[…

12BIO-REFRESHER

Exons and Introns

v Introns can interrupt the reading frame of a gene by inserting a sequence between two consecutive codons

v Between the first and second nucleotide of a codon

v Or between the second and third nucleotide of a codon

"Exon and Intron classes”. Licensed under Fair use via Wikipedia

13BIO-REFRESHER

Obstaclesto transcription and translation

v The presence of premature Stop codons in the message is possible. A process called non-sense mediated decay checks for them and corrects them to avoid: incomplete splicing, DNA mutations, transcription errors, and leaky scanning of ribosome – causing changes in the reading frame (frame shifts).

v Insertions and deletions (indels) can cause frame shifts when indel is not divisible by three. As a result, the peptide can be abnormally long, or abnormally short – depending when the first in-frame Stop signal is located.

Prediction&Annotation

15GENE PREDICTION & ANNOTATION

PREDICTION & ANNOTATION

v Identificationandannotationofgenomefeatures:

• primarilyfocusesonprotein-codinggenes.• alsoidentifiesRNAs(tRNA,rRNA,longandsmallnon-coding

RNAs(ncRNA)),regulatorymotifs,repetitiveelements,etc.

• happensin2phases:1. Computationphase2. Annotationphase

16GENE PREDICTION & ANNOTATION

COMPUTATION PHASE

a. Experimentaldataarealignedtothegenome:expressedsequencetags,RNA-sequencingreads,proteins(alsofromotherspecies).

b. Genepredictionsaregenerated:- ab initio:basedonnucleotidesequenceandcompositione.g.Augustus,GENSCAN,geneid,fgenesh,etc.

- evidence-driven:identifyingalsodomainsandmotifse.g.SGP2,JAMg,fgenesh++,etc.

Result:thesinglemostlikelycodingsequence,noUTRs,noisoforms.Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174

17GENE PREDICTION & ANNOTATION

ANNOTATION PHASE

Experimentaldata(evidence)and predictionsaresynthetizedintogeneannotations.

Result: genemodelsthatgenerallyincludeUTRs,isoforms,evidencetrails.

Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174

5’UTR 3’UTR

18

Insomecasesalgorithmsandmetricsusedtogenerateconsensussetsmayactuallyreducetheaccuracyofthegene’srepresentation.

CONSENSUS GENE SETS

Genemodelsmaybeorganizedintosetsusing:v combinersforautomaticintegrationofpredictedsets

e.g:GLEAN,EvidenceModeler

orv toolspackagedintopipelines

e.g:MAKER,PASA,Gnomon,Ensembl,etc.

GENE PREDICTION & ANNOTATION

19BIO-REFRESHER

Good genes are required!

1. Generate gene modelsv A few rounds of gene prediction.

2. Annotate gene modelsv Function, expression patterns,

metabolic network memberships.

3. Manually review themv Structure & Function.

Best representation of biology & removal of elements reflecting errors in automated analyses.

Functional assignments through comparative analysis using literature, databases, and experimental data.

Apollo

Gene Ontology

Curation improves quality

21BIO-REFRESHER

Curation is inherently collaborative

• It is impossible for a single individual to curate an entire genome with precise biological fidelity.

• Curators need second opinions and insights from colleagues with domain and gene family expertise.

CollaborativecurationwithApollo

Apollo: genome annotation editing• Collaborative, instantaneous, web-based, built on top of JBrowse.

• Supports real time collaboration & generates analysis-ready data

USER-CREATED ANNOTATIONS

EVIDENCE TRACKS

ANNOTATOR PANEL

GenomeArchitect.org

BECOMING ACQUAINTED WITH APOLLO

General process of curation

1. Select or find a region of interest (e.g. scaffold).

2. Select appropriate evidence tracks to review the genome element to annotate (e.g. gene model).

3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.

4. If necessary, adjust the gene model.

5. Check your edited gene model for integrity and accuracy by comparing it with available homologs.

6. Comment and finish.

ColorbyCDSframe,togglestrands,setcolorschemeandhighlights.

Uploadevidencefiles(GFF3,BAM,BigWig),addcombinationandsequencesearchtracks.

QuerythegenomeusingBLAT.

Navigationandzoom.Searchforagenemodelorascaffold.

User-createdannotations. Annotator

panel.

EvidenceTracks.

Stageandcell-typespecifictranscriptiondata.

GenomeArchitect.org

Apollo Genome Annotation Editor

Protein coding, pseudogenes, ncRNAs, regulatory elements, variants, etc.

Admin.

Apollo Architecture

GenomeArchitect.org

Web-based client + annotation-editing engine + server-side data service

Generates ready-made computable data

GenomeArchitect.org

Ref Sequence

Supports real time collaboration

GenomeArchitect.org

Let’sPlay!

Access Apollo

Annotations Organism Users Groups AdminTracks Reference Sequence

Removable side dock

1

Annotations

gene

mRNA

Annotation details & exon boundaries

1

2

2

CuratingwithApollo

34 | BECOMING ACQUAINTED WITH APOLLO

USER NAVIGATION

Annotatorpanel.

• Chooseappropriateevidencefromlistof“Tracks”onannotatorpanel.

• Select&dragelementsfromevidencetrackintothe‘User-createdAnnotations’area.

• Hoveringoverannotationinprogressbringsupaninformationpop-up.

• Creatinganewannotation

Adding a gene model

Adding a gene model

Adding a gene model

Editing functionality

Editing functionalityExample: Adding an exon supported by experimental data

• RNAseq reads show evidence in support of a transcribed product that was not predicted.• Add exon by dragging up one of the RNAseq reads.

Editing functionalityExample: Adjusting exon boundaries supported by experimental data

41 |

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• ‘Zoomtobaselevel’ revealstheDNATrack.

42 |

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• ColorexonsbyCDSfromthe‘View’menu.

43 |

Zoomin/outwithkeyboard:shift+arrowkeysup/down

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• TogglereferenceDNAsequenceand translationframesinforwardstrand.Togglemodelsineitherdirection.

annotatingsimplecases

“Simplecase”:- thepredictedgenemodeliscorrectornearlycorrect,and- thismodelissupportedbyevidencethatcompletely ormostlyagreeswiththeprediction.- evidencethatextendsbeyondthepredictedmodelisassumedtobenon-codingsequence.

Thefollowingaresimplemodifications.

ANNOTATING SIMPLE CASES

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

• A confirmation box will warn you if the receiving transcript is not on thesame strand as the feature where the new exon originated.

• Check ‘Start’ and ‘Stop’ signals after each edit.

ADDING EXONS

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Iftranscriptalignmentdataareavailable&extendbeyondyouroriginalannotation,youmayextendoraddUTRs.

1. Rightclickattheexonedgeand‘Zoomtobaselevel’.

2. PlacethecursorovertheedgeoftheexonuntilitbecomesablackarrowthenclickanddragtheedgeoftheexontothenewcoordinatepositionthatincludestheUTR.

ADDING UTRs

ToaddanewsplicedUTRtoanexistingannotationalsofollowtheprocedureforaddinganexon.

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

To modify an exon boundary and matchdata in the evidence tracks: selectboth the [offending] exon and thefeature with the expected boundary,then right click on the annotation toselect ‘Set 3’ end’ or ‘Set 5’ end’ asappropriate.

Insomecasesallthedatamaydisagreewiththeannotation,inothercasessomedatasupporttheannotationandsomeofthe

datasupportoneormorealternativetranscripts.Trytoannotateasmanyalternativetranscriptsasarewellsupportedbythedata.

MATCHING EXON BOUNDARY TO EVIDENCE

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

1. Twoexonsfromdifferenttrackssharingthesamestart/endcoordinatesdisplayaredbartoindicatematchingedges.

2. Selectingthewholeannotationoroneexonatatime,usethis edge-matching functionandscrollalongthelengthoftheannotation,verifyingexonboundariesagainstavailabledata.Usesquare[]bracketstoscrollfromexontoexon.Usercurly{}bracketstoscrollfromannotationtoannotation.

3. CheckifcDNA/RNAseqreadslackoneormoreoftheannotatedexonsorincludeadditionalexons.

CHECKING EXON INTEGRITY

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Non-canonicalsplicesitesflags. Doubleclick:selectionoffeatureandsub-features

EvidenceTracksArea

‘User-createdAnnotations’Track

Edge-matching

Apollo’seditinglogic(brain):§ selectslongestORFasCDS§ flagsnon-canonicalsplicesites

ORFs AND SPLICE SITES

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Non-canonicalsplices areindicatedbyanorangecirclewithawhiteexclamationpointinside,placedovertheedgeoftheoffendingexon.

Canonicalsplicesites:

3’-…exon]GA/TG[exon…-5’

5’-…exon]GT/AG[exon…-3’reversestrand,notreverse-complemented:

forwardstrand

SPLICE SITES

Zoom toreviewnon-canonicalsplicesitewarnings.Althoughthesemaynotalwayshavetobecorrected(e.g GCdonor),theyshouldbeflaggedwithacomment.

Exon/intronsplicesiteerrorwarning

Curatedmodel

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

Apollocalculatesthelongestpossibleopenreadingframe(ORF)thatincludescanonical‘Start’and‘Stop’signalswithinthepredictedexons.

If‘Start’appearstobeincorrect,modifyitbyselectinganin-frame‘Start’codonfurtherupordownstream,dependingonevidence(proteins,RNAseq).

Itmaybepresentoutsidethepredictedgenemodel,withinaregionsupportedbyanotherevidencetrack.

Inveryrarecases,theactual‘Start’ codonmaybenon-canonical(non-ATG).

‘Start’ AND ‘Stop’ SITES

BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES

annotatingcomplexcases

Evidencemaysupportjoiningtwoormoredifferentgenemodels.Warning: proteinalignmentsmayhaveincorrectsplicesitesandlacknon-conservedregions!

1. In‘User-createdAnnotations’area shift-clicktoselectanintronfromeachgenemodelandrightclicktoselectthe‘Merge’ optionfromthemenu.

2. Dragsupportingevidencetracksoverthecandidatemodelstocorroborateoverlap,orreviewedgematchingandcoverageacrossmodels.

3. Checktheresultingtranslationbyqueryingaproteindatabase e.g.UniProt,NCBInr.Addcommentstorecordthatthisannotationistheresultofamerge.

Redlinesaroundexons:‘edge-matching’allowsannotatorstoconfirmwhethertheevidenceisinagreementwithoutexaminingeachexonatthebaselevel.

COMPLEX CASESmerge two gene predictions on the same scaffold

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

Oneormoresplitsmayberecommendedwhen:- differentsegmentsofthepredictedproteinaligntotwoormoredifferentgenefamilies- predictedproteindoesn’taligntoknownproteinsoveritsentirelength- Transcriptdatamaysupportasplit,butfirstverifywhethertheyarealternativetranscripts.

COMPLEX CASESsplit a gene prediction

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

DNATrack

‘User-createdAnnotations’Track

COMPLEX CASESannotate frameshifts and correct single-base errors

Alwaysremember:whenannotatinggenemodelsusingApollo,youarelookingata‘frozen’versionofthegenomeassemblyandyouwillnotbeabletomodifytheassemblyitself.

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

COMPLEX CASEScorrecting selenocysteine containing proteins

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

COMPLEX CASEScorrecting selenocysteine containing proteins

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

1. Apolloallowsannotatorstomakesinglebasemodificationsorframeshiftsthatarereflectedinthesequenceandstructureofanytranscriptsoverlappingthemodification.ThesemanipulationsdoNOTchangetheunderlyinggenomicsequence.

2. Ifyoudeterminethatyouneedtomakeoneofthesechanges,zoomintothenucleotidelevelandrightclickoverasinglenucleotideonthegenomicsequencetoaccessamenuthatprovidesoptionsforcreatinginsertions,deletionsorsubstitutions.

3. The‘CreateGenomicInsertion’featurewillrequireyoutoenterthenecessarystringofnucleotideresiduesthatwillbeinsertedtotherightofthecursor’scurrentlocation.The‘CreateGenomicDeletion’ optionwillrequireyoutoenterthelengthofthedeletion,startingwiththenucleotidewherethecursorispositioned.The‘CreateGenomicSubstitution’featureasksforthestringofnucleotideresiduesthatwillreplacetheonesontheDNAtrack.

4. Onceyouhaveenteredthemodifications,Apollowillrecalculatethecorrectedtranscriptandproteinsequences,whichwillappearwhenyouusetheright-clickmenu‘GetSequence’option.Sincetheunderlyinggenomicsequenceisreflectedinallannotationsthatincludethemodifiedregionyoushouldalertthecuratorsofyourorganismsdatabaseusingthe‘Comments’sectiontoreporttheCDSedits.

5. Inspecialcasessuchasselenocysteinecontainingproteins(read-throughs),right-clickovertheoffending/premature‘Stop’signalandchoosethe‘Setreadthroughstopcodon’optionfromthemenu.

COMPLEX CASESannotating frameshifts and correcting single-base errors & selenocysteines

BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES

60 |

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• Information Editor

TheAnnotationInformationEditorUSER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

TheAnnotationInformationEditor

• AddPubMedIDs• IncludeGO termsasappropriate

fromanyofthethreeontologies• Writecomments statinghowyou

havevalidatedeachmodel.

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

63 |

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

• Keeping track of each edit

Annotations,annotationedits,andHistory: storedinacentralizeddatabase.

USER NAVIGATION

BECOMING ACQUAINTED WITH APOLLO

Followthechecklistuntilyouarehappywiththeannotation!

Andrememberto…– commenttovalidateyourannotation,evenifyoumadenochangestoanexistingmodel.Thinkofcommentsasyourvoteofconfidence.

– oraddacommenttoinformthecommunityofunresolvedissuesyouthinkthismodelmayhave.

65 |

AlwaysRemember:Apollocurationisacommunityeffortsopleaseusecommentstocommunicatethereasonsforyour

annotation.Yourcommentswillbevisibletoeveryone.

COMPLETING THE ANNOTATION

BECOMING ACQUAINTED WITH APOLLO

Checklist

• Check‘Start’ and‘Stop’sites.

• Checksplicesites:mostsplicesitesdisplaytheseresidues…]5’-GT/AG-3’[…

• CheckifyoucanannotateUTRs,forexampleusingRNA-Seq data:– alignitagainstrelevantgenes/genefamily– blastp againstNCBI’sRefSeq ornr

• Checkforgaps inthegenome.

• Additionalfunctionalitymaybenecessary:–merging 2genepredictions- samescaffold– ‘merging’ 2genepredictions- differentscaffolds

– splitting ageneprediction– annotating frameshifts– annotatingselenocysteines,correctingsingle-baseandotherassemblyerrors,etc.

67 |

• Add:– Importantprojectinformationintheformof

comments– IDsfrompublicdatabasese.g.GenBank (via

DBXRef),genesymbol(s),commonname(s),synonyms,topBLASThits,orthologswithspeciesnames,andeverythingelseyoucanthinkof,becauseyouaretheexpert.

– Commentsaboutthekindsofchangesyoumadetothegenemodelofinterest,ifany.

– Anyappropriatefunctionalassignments,e.g.viaBLAST,RNA-Seq data,literaturesearches,etc.

CHECKLISTfor accuracy and integrity

MANUAL ANNOTATION CHECKLIST

Example

What’s new?... finding inspiration in the literature.

Example 69

“Molecular analysis of bed bug populations from across the USA and Europe found that >80% and >95% of the respective populations contained V419L and/or L925I mutations in the voltage-gated sodium channel gene, indicating widespread distribution of target-site-based pyrethroid resistance.”

Example

Example 70

CurationexampleusingtheHyalella aztecagenome(amphipodcrustacean).

What we know about this genome

• DataatNCBI:• >37,000 nucleotideseqsà scaffolds,mitochondrialgenes• 344 aminoacidseqsà mitochondrion• 47 ESTs• 0 conserveddomainsidentified• 0 “gene”entriessubmitted

• Dataati5KWorkspace@NAL(annotationhostedatUSDA)- 10,832scaffolds:23,288transcripts:12,906proteins

Example 71

PubMed Search: what’s new?

Example 72

PubMed Search: what’s new?

Example 73

“Tenpopulationsdifferedbyatleast550-foldinsensitivity topyrethroids.”

“Sequencingtheprimarypyrethroid targetsite,thevoltage-gatedsodiumchannel(vgsc),showsthatpointmutationsandtheirspreadinnaturalpopulationswereresponsiblefordifferencesinpyrethroid sensitivity.”

“Thefindingthatanon-targetaquaticspecieshasacquiredresistancetopesticidesusedonlyonterrestrialpestsistroublingevidenceoftheimpactofchronicpesticidetransportfromland-basedapplicationsintoaquaticsystems.”

How many sequences are there, publicly available, for our gene of interest?

Example 74

• Para,(voltage-gatedsodiumchannelalphasubunit;Nasonia vitripennis).

• NaCP60E (Sodiumchannelprotein60E;D.melanogaster).– MF:voltage-gatedcation channelactivity(IDA,GO:0022843).

– BP:olfactorybehavior(IMP,GO:0042048),sodiumiontransmembrane transport(ISS,GO:0035725).

– CC:voltage-gatedsodiumchannelcomplex(IEA,GO:0001518).

Andwhatdoweknowaboutthem?

Retrieving sequences for a sequence similarity search.

Example 75

>vgsc-Segment3-DomainIIRVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR

BLAT searchinput

Example 76

>vgsc-Segment3-DomainIIRVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR

BLAT searchresults

Example 77

• High-scoringsegmentpairs(hsp)arelistedintabulatedformat.

• Clickingononelineofresultssendsyoutothosecoordinates.

Creating a new gene model: drag and drop

Example 78

• ApolloautomaticallycalculateslongestORF.

• Inthiscase,ORFincludesthehigh-scoringsegmentpairs(hsp),markedhereinblue.

• Notethatgeneistranscribedfromreversestrand.

Available evidence tracks

Example 79

Get Sequence

Example 80

http://blast.ncbi.nlm.nih.gov/Blast.cgi

Also, flanking sequences (other gene models) vs. NCBI nr

Example 81

Inthiscase,twogenemodelsupstream,at5’end.

BLASThsps

Review alignments

Example 82

HaztTmpM006234

HaztTmpM006233

HaztTmpM006232

Hypothesis for vgsc gene model

Example 83

Editing: merge the three models

Example 84

Mergebydroppinganexonorgenemodelontoanother.

Mergebyselectingtwoexons(holdingdown“Shift”)andusingtherightclickmenu.

or…

Result of merging the gene models:

Example 85

Editing: correct offending splice site

Example 86

Modifyexon/intronboundary:- Dragtheendofthe

exontothenearestcanonicalsplicesite.

or

- Useright-clickmenu.

Editing: set translation start

Example 87

Editing: delete exon not supported by evidence

Example 88

DeletefirstexonfromHaztTmpM006233

Editing: add an exon supported by RNAseq

Example 89

• RNAseqreadsshowevidenceinsupportoftranscribedproduct,whichwasnotpredicted.• Addexonatcoordinates97946-98012bydragginguponeoftheRNAseqreads.

Editing: adjust offending splice site using evidence

Example 90

Editing: adjust other boundaries supported by evidence

Example 91

Finished model

Example 92

Corroborateintegrityandaccuracyofthemodel:- Start andStop- Exonstructureandsplicesites…]5’-GT/AG-3’[…- Checkthepredictedproteinproductvs.NCBInr,UniProt,etc.

Information Editor

• DBXRefs:e.g.NP_001128389.1,N.vitripennis,RefSeq

• PubMedidentifier:PMID:24065824

• GeneOntologyIDs:GO:0022843,GO:0042048,GO:0035725,GO:0001518.

• Comments

• Name,Symbol

• Approve/Deleteradiobutton

Example 93

Comments(ifapplicable)

Exercises

Example dataset: Apis mellifera

GenomeArchitect.org

Practice using the Apis mellifera genome

GenomeArchitect.org

1. Evidence in support of protein coding gene models.

1.1 Consensus Gene Sets:Official Gene Set v3.2Official Gene Set v1.0

1.2 Consensus Gene Sets comparison:OGSv3.2 genes that merge OGSv1.0 andRefSeq genesOGSv3.2 genes that split OGSv1.0 and RefSeq genes

1.3 Protein Coding Gene Predictions Supported by Biological Evidence:NCBI GnomonFgenesh++ with RNASeq training dataFgenesh++ without RNASeq training dataNCBI RefSeq Protein Coding Genes and Low Quality Protein Coding Genes

1.4 Ab initio protein coding gene predictions:Augustus Set 12, Augustus Set 9, Fgenesh, GeneID, N-SCAN, SGP2

1.5 Transcript Sequence Alignment:NCBI ESTs, Apis cerana RNA-Seq, Forager Bee Brain Illumina Contigs, Nurse Bee Brain Illumina Contigs, Forager RNA-Seq reads, Nurse RNA-Seq reads, Abdomen 454 Contigs, Brain and Ovary 454 Contigs, Embryo 454 Contigs, Larvae 454 Contigs, Mixed Antennae 454 Contigs, Ovary 454 Contigs, Testes 454 Contigs, Forager RNA-Seq HeatMap, Forager RNA-Seq XY Plot, Nurse RNA-Seq HeatMap, Nurse RNA-Seq XY Plot

Practice using the Apis mellifera genome

GenomeArchitect.org

1. Evidence in support of protein coding gene models (Continued).

1.6 Protein homolog alignment:Acep_OGSv1.2Aech_OGSv3.8Cflo_OGSv3.3Dmel_r5.42Hsal_OGSv3.3Lhum_OGSv1.2Nvit_OGSv1.2Nvit_OGSv2.0Pbar_OGSv1.2Sinv_OGSv2.2.3Znev_OGSv2.1Metazoa_Swissprot

2. Evidence in support of non protein coding gene models

2.1 Non-protein coding gene predictions:NCBI RefSeq Noncoding RNANCBI RefSeq miRNA

2.2 Pseudogene predictions:NCBI RefSeq Pseudogene

Apollo Development

Nathan DunnTechnical Lead Eric Yao

Christine Elsik’s Lab, University of Missouri

Suzi LewisPrincipal Investigator

BBOP

Moni Munoz-TorresProject Manager

Deepak Unni

JBrowse. Ian Holmes’ Lab University of California, Berkeley

Thank you!Berkeley Bioinformatics Open-Source Projects, Environmental Genomics & Systems Biology, Lawrence Berkeley National Laboratory

Funding• Apollo is supported by NIH grants 5R01GM080203 from NIGMS,

and 5R01HG004483 from NHGRI.

• BBOP is also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231

Collaborators• Ian Holmes, Eric Yao, UC Berkeley (JBrowse)• Chris Elsik, Deepak Unni, U of Missouri (Apollo)• Monica Poelchau, USDA/NAL (Apollo)• i5k Community

berkeleybop.org

UNIVERSITY OF CALIFORNIA

Suzanna Lewis & Chris Mungall

Seth Carbon (Noctua / AmiGO)Nathan Dunn (Apollo)Monica Munoz-Torres (Apollo / GO)Jeremy Nguyen Xuan (Monarch Init.)

PUBLIC DEMO100 |

APOLLO ON THE WEBinstructions

Public Honey bee demo available at:

genomearchitect.org/demo/

Username:demo@demo.com

Password:demo

APOLLOdemonstration

PUBLIC DEMO 101

Demonstrationvideoavailableathttps://youtu.be/VgPtAP_fvxY

Access Apollo

top related