![Page 1: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/1.jpg)
Editing annotations with Apollo: a workshop for the scientific community gathered at BIOS
Monica Munoz-Torres, PhD | @monimunozto
Berkeley Bioinformatics Open-Source Projects (BBOP)
Lawrence Berkeley National Laboratory | University of California Berkeley | U.S. Department of Energy
BIOS, Manizales, Colombia | September 21, 2015
![Page 2: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/2.jpg)
APOLLO DEVELOPMENT
APOLLO DEVELOPERS
http://GenomeArchitect.org/
Nathan Dunn
Eric Yao (JBrowse, UC Berkeley)
Christine Elsik’s Lab, University of Missouri
Suzi Lewis, Principal Investigator
BBOP
Moni Munoz-Torres
Stephen Ficklin (GenSAS, Washington State University)
Colin Diesh, Deepak Unni
![Page 3: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/3.jpg)
OUTLINE
Web Apollo: Collaborative Curation and Interactive Analysis of Genomes
• Today we will discover how to get around obstacles to extract the most valuable information from a genome sequencing & annotation project.
![Page 4: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/4.jpg)
BY THE END OF THIS TALK: you will
• Better understand genome curation in the context of annotation: assembled genome → automated annotation → manual annotation
• Become familiar with the environment and functionality of the Apollo genome annotation editing tool.
• Learn to identify homologs of known genes of interest in a newly sequenced genome.
• Learn about corroborating and modifying automatically annotated gene models using available evidence in Apollo.
Introduction
![Page 5: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/5.jpg)
How is the map of a genome drawn?
![Page 6: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/6.jpg)
The genome map
Steps shown in the figure:
• Design & sampling
• Sequencing
• Assembly
• Automated annotation
• Manual annotation
• Consensus gene set
• Comparative analyses
• Synthesis & publication
![Page 7: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/7.jpg)
The genome map (continued)
[Figure: the same steps, from design & sampling through synthesis & publication, now with quality control (QC) checks marked throughout.]
![Page 8: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/8.jpg)
CURATING GENOMES: steps involved
1. Generation of gene models: calling ORFs, one or more rounds of gene prediction, etc.
2. Annotation of gene models: describing function, expression patterns, metabolic network memberships.
3. Manual annotation
![Page 9: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/9.jpg)
GENOME ANNOTATION: requires precision and depth
Each organism's gene set informs a variety of analyses:
• Number of genes, % GC, TE composition, repetitive regions
• Assigning function
• Molecular evolution, sequence conservation
• Gene families
• Metabolic pathways
• What makes each organism unique?
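One of the statistics listed above, % GC, can be computed directly from sequence. The following is a minimal Python sketch, not part of the original workshop; the function name `gc_percent` is hypothetical, and it assumes that ambiguous bases such as N should be left out of the denominator.

```python
def gc_percent(seq):
    """Return the percentage of G and C among the unambiguous bases of seq."""
    # Assumption: only A, C, G, T are counted; N and other symbols are skipped.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    print(gc_percent("ATGCGCNNAT"))
```

Skipping ambiguous bases keeps assembly gaps (runs of N) from diluting the estimate; a project may prefer a different convention.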
What makes a bee a “bee”?
Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild
![Page 10: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/10.jpg)
Let's refresh our memory.
![Page 11: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/11.jpg)
REVIEW ON YOUR OWN: for manual annotation
To remember: biological concepts to better understand manual annotation
FOOD FOR THOUGHT
• GLOSSARY: from contig to splice site
• CENTRAL DOGMA: in molecular biology
• WHAT IS A GENE?: defining your goal
• TRANSCRIPTION: mRNA in detail
• TRANSLATION: and other definitions
• GENOME CURATION: steps involved
![Page 12: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/12.jpg)
BIO-REFRESHER
What is a gene?
• The definition of a gene paints a very complex picture of molecular activity, and it is a continuously evolving concept.
• From the Sequence Ontology (SO): “A gene is a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions”. “Evolving Concept” at http://goo.gl/LpsajQ
![Page 13: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/13.jpg)
What is a gene? (continued)
• In our lifetime, the Encyclopedia of DNA Elements (ENCODE) project updated this concept yet again. Long transcripts & dispersed regulation!
“A gene is a DNA segment that contributes to phenotype/function. In the absence of demonstrated function, a gene may be characterized by sequence, transcription or homology.”
https://www.encodeproject.org/
![Page 14: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/14.jpg)
What is a gene? Let’s think computationally!
• Think of the genome as an operating system for a living being.
  - Considering that the nucleotides of the genome are put together into a code that is executed through the process of transcription and translation…
  - … think of genes as subroutines that are repetitively called in the process of transcription.
Gerstein et al., 2007. Genome Res.
![Page 15: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/15.jpg)
What is a gene? Considerations
• Also consider:
  - A gene is a genomic sequence (DNA or RNA) directly encoding functional product molecules, either RNA or protein.
  - If several functional products share overlapping regions, we take the union of all overlapping genomic sequences coding for them.
  - This union must be coherent (i.e., processed separately for final protein and RNA products) but does not require that all products necessarily share a common subsequence.
Gerstein et al., 2007. Genome Res.
![Page 16: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/16.jpg)
WHAT IS A GENE?
“The gene is a union of genomic sequences encoding a coherent set of functional products that may or may not overlap.”
Gerstein et al., 2007. Genome Res.
The gene: a moving target.
![Page 17: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/17.jpg)
TRANSLATION: reading frame
• A reading frame is a way of dividing the sequence of nucleotides in mRNA (or DNA) into a set of consecutive, non-overlapping triplets (codons).
• Three frames can be read in the 5’ → 3’ direction. Because DNA has two anti-parallel strands, three more frames can be read on the anti-sense strand, for a total of six possible reading frames.
• In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF).
  - ORF = Start signal + coding sequence (divisible by 3) + Stop signal
• The sections of the mature mRNA that are transcribed with the coding sequence but not translated are called UnTranslated Regions (UTRs); there is one at each end.
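The six reading frames and the ORF definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not how Apollo or any gene predictor works internally: the function names are hypothetical, the start signal is assumed to be ATG, and the stop signals are assumed to be TAA, TAG, and TGA.

```python
# Assumptions: standard start codon ATG and stop codons TAA/TAG/TGA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the anti-sense strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into consecutive, non-overlapping triplets starting at offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Map frame names (+1, +2, +3, -1, -2, -3) to their codon lists."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(seq, offset)
        frames[f"-{offset + 1}"] = codons(rc, offset)
    return frames


def find_orfs(codon_list):
    """Return ORFs as strings: start codon + coding codons + stop codon."""
    orfs = []
    start = None
    for i, codon in enumerate(codon_list):
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            orfs.append("".join(codon_list[start:i + 1]))
            start = None
    return orfs


if __name__ == "__main__":
    for name, frame in six_frames("CCATGAAATAGGG").items():
        print(name, find_orfs(frame))
```

Because each ORF is built from whole codons, its length is always divisible by 3, matching the definition on the slide.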
![Page 18: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/18.jpg)
TRANSLATION: reading frame
[Figure: "Reading Frame" by Hornung Ákos, Wikimedia Commons]
![Page 19: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/19.jpg)
TRANSLATION: reading frame
[Figure: "ORF" by Thatsonginc, Wikimedia Commons]
![Page 20: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/20.jpg)
TRANSLATION: reading frame, splice sites
• The spliceosome catalyzes the removal of introns and the ligation of flanking exons.
  - introns: spaces inside the gene, not part of the coding sequence
  - exons: expression units (of the coding sequence)
• Splicing “signals” (from the point of view of an intron):
  - There is a 5’ end splice “signal” (site): usually GT (less common: GC)
  - And a 3’ end splice site: usually AG
  - …]5’-GT/AG-3’[…
• It is possible to produce more than one protein (polypeptide) sequence from the same genic region by bringing exons together in alternative ways: alternative splicing. For example, the gene Dscam (Drosophila) can give rise to about 38,000 alternatively spliced mRNAs (isoforms).
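The splice signals above lend themselves to a simple check on the first and last two bases of an intron. The sketch below is a hedged illustration in Python: the function name `classify_intron` and its coordinate convention (0-based, end-exclusive, forward strand) are assumptions made for this example, not part of any tool mentioned in the talk.

```python
def classify_intron(genome, start, end):
    """Classify an intron by its splice signals.

    start/end are hypothetical 0-based, end-exclusive coordinates of the
    intron on the forward strand of genome.
    """
    intron = genome[start:end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to carry both splice signals")
    donor, acceptor = intron[:2], intron[-2:]  # 5' and 3' splice sites
    if acceptor != "AG":
        return "non-canonical"
    if donor == "GT":
        return "canonical"
    if donor == "GC":
        return "less common (GC donor)"
    return "non-canonical"


if __name__ == "__main__":
    # Toy sequence: exon + intron + exon.
    genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
    print(classify_intron(genome, 6, 18))
```

For a gene on the reverse strand, the intron would first need to be reverse-complemented before applying the same test.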
![Page 21: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/21.jpg)
TRANSLATION: now in your mind
[Figure: "Gene structure" by Daycd, Wikimedia Commons]
• Although mRNAs are short-lived, understanding them is crucial, as they will become the center of your work.
![Page 22: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/22.jpg)
TRANSLATION: reading frame, phase
• Introns can interrupt the reading frame of a gene by inserting a sequence between two consecutive codons (phase 0),
• between the first and second nucleotide of a codon (phase 1),
• or between the second and third nucleotide of a codon (phase 2).
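Intron phase follows from simple arithmetic: count the coding bases that precede the intron and take the remainder modulo 3. A minimal Python sketch, assuming the standard convention that phase 0 falls between codons, phase 1 after the first nucleotide of a codon, and phase 2 after the second; the function name `intron_phases` is hypothetical.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase of each intron given coding-exon lengths in 5'->3' order.

    Only the coding portion of each exon should be counted (no UTR bases).
    """
    phases = []
    coding_bases = 0
    # There is one intron after every exon except the last.
    for length in cds_exon_lengths[:-1]:
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


if __name__ == "__main__":
    # Three coding exons of 9, 4, and 5 bases: two introns.
    print(intron_phases([9, 4, 5]))
```

In this toy case the first intron sits after 9 coding bases (between codons) and the second after 13 (one base into a codon).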
"Exon and Intron classes”. Licensed under Fair use via Wikipedia
![Page 23: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/23.jpg)
TRANSLATION: in detail
[Figure: "Protein synthesis" by Kelvinsong, Wikimedia Commons]
![Page 24: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/24.jpg)
HICCUPS: in transcription and translation
• Premature Stop codons can appear in the message. They can arise from incomplete splicing, DNA mutations, transcription errors, or leaky scanning by the ribosome, which changes the reading frame (frame shift). A process called nonsense-mediated decay detects transcripts that carry them and degrades them.
• Insertions and deletions (indels) cause frame shifts when the length of the indel is not divisible by three (3). As a result, the peptide can be abnormally long or abnormally short, depending on where the first in-frame Stop signal is located.
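The effect of an indel on the reading frame can be demonstrated on a toy coding sequence. The sketch below is illustrative Python under stated assumptions: stop codons are taken to be TAA, TAG, and TGA, the sequences are invented for the example, and both function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons


def causes_frameshift(indel_length):
    """An indel shifts the frame when its length is not divisible by 3."""
    return indel_length % 3 != 0


def cds_length_to_first_stop(cds):
    """Bases up to and including the first in-frame stop, or None if absent."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


if __name__ == "__main__":
    original = "ATGCCTGAGGCATAA"                  # ATG CCT GAG GCA TAA
    inserted = original[:3] + "A" + original[3:]  # 1-base insertion
    deleted = original[:3] + original[4:]         # 1-base deletion
    for label, seq in [("original", original),
                       ("insertion", inserted),
                       ("deletion", deleted)]:
        print(label, cds_length_to_first_stop(seq))
```

In this example the insertion creates an early in-frame stop (a shorter peptide), while the deletion removes the stop from the frame entirely, so translation would run on past the annotated end.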
![Page 25: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/25.jpg)
Prediction & Annotation
![Page 26: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/26.jpg)
GENE PREDICTION
• The identification of structural features of the genome:
  - Primarily focused on protein-coding genes.
  - Also predicts transfer RNAs (tRNA), ribosomal RNAs (rRNA), regulatory motifs, long and small non-coding RNAs (ncRNA), repetitive elements (masked), etc.
  - Two methods for identification.
  - Some predictors are self-trained and some must be trained.
![Page 27: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/27.jpg)
GENE PREDICTION: methods for discovery
1) Ab initio:
  - based on DNA composition,
  - deals strictly with genomic sequences,
  - makes use of statistical approaches to search for coding regions and typical gene signals.
  • E.g. Augustus, GENSCAN, geneid, fgenesh, etc.
Nat Rev Genet. 2015 Jun;16(6):321-32. doi: 10.1038/nrg3920
![Page 28: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/28.jpg)
GENE PREDICTION: methods for discovery (ctd)
2) Homology-based:
  - evidence-based,
  - finds genes using either similarity searches in the main databases or experimental data including RNAseq, expressed sequence tags (ESTs), full-length complementary DNAs (cDNAs), etc.
  • E.g. fgenesh++, Just Annotate My genome (JAMg), SGP2
Nucleic Acids Res. 2003, vol. 31, no. 13, 3738-3741
![Page 29: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/29.jpg)
GENE ANNOTATION
Integration of data from computational & experimental evidence with data from prediction tools, to generate a reliable set of structural annotations. Involves:
1) ab initio predictions
2) assessment of biological evidence to drive the gene prediction process
3) synthesis of these results to produce a set of consensus gene models
![Page 30: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/30.jpg)
GENE ANNOTATION
Gene models may be organized into “sets” using:
• automatic integration of predicted sets (combiners); e.g. GLEAN, EvidenceModeler, or
• tools packaged into pipelines; e.g. MAKER, PASA, Gnomon, Ensembl, etc.
In some cases, the algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation.
![Page 31: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/31.jpg)
ANNOTATION: an imperfect art
No one is perfect, least of all automated annotation.
New technologies bring new challenges:
• Assembly errors can cause fragmented annotations
• Limited coverage makes confident identification difficult
Image: www.BroadInstitute.org
![Page 32: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/32.jpg)
MANUAL ANNOTATION: improving predictions
Manual annotation to the rescue.
• Automated predictions
• Experimental evidence: cDNAs, HMM searches, RNAseq, genes from other species.
It is therefore necessary to refine the predictions of the biological elements encoded in the genome, which requires careful review.
Schiex et al. Nucleic Acids Res. 2003, 31(13): 3738-3741
![Page 33: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/33.jpg)
BIOCURATION: structural and functional adjustments
1. Identify the elements of the genome that best represent the underlying biology, and eliminate the elements that reflect systemic errors of the automated analyses.
2. Assign functions through comparative analyses of similar genomic elements from closely related organisms, using literature, databases, and experimental data.
http://GeneOntology.org
![Page 34: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/34.jpg)
BUT, IN CURATION: it was not always possible to scale up these efforts
MANUAL ANNOTATION
Too many sequences and not enough hands.
1. Museum: a small group of highly trained experts (e.g. GO).
2. Jamboree: a few very good biologists and a few very good bioinformaticians camping together for intense but short periods of time.
3. Cottage: researchers on their own; they may or may not publicize results; this may be a dead end, with very few people ever aware of these results.
Elsik et al. 2006. Genome Res. 16(11):1329-33.
![Page 35: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/35.jpg)
ANNOTATION: an exercise in collaboration
COLLABORATING
Researchers usually seek the opinions and insights of colleagues with expertise in specific areas of knowledge, for example conserved domains or gene families.
![Page 36: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/36.jpg)
Apollo: a tool for editing annotations
INTRODUCING APOLLO
• Web-based, integrated with JBrowse.
• Supports real-time collaboration!
• Automatically generates data in common formats for analysis.
• Manual annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repetitive fragments.
• Intuitive functions and drop-down menus create and edit transcript and exon structures, and insert comments (CV, free text), GO terms, etc.
http://GenomeArchitect.org/
![Page 37: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/37.jpg)
ARCHITECTURE: simple, flexible
Web client + annotation editing engine + server-side data service
• Web client (curators): Google Web Toolkit (GWT) / Bootstrap; JBrowse (Dojo / jQuery)
• Communication: REST / JSON, WebSockets
• Annotation engine (server), with security via Shiro, LDAP, OAuth
• Single data store (PostgreSQL, MySQL, MongoDB, ElasticSearch): annotations, security, preferences, organisms, tracks
• JBrowse data for Organism 1 and Organism 2: genomic evidence data loaded for each organism (BAM, BED, VCF, GFF3, BigWig)
Apollo v2.0
![Page 38: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/38.jpg)
WEB CLIENT: curator panel
Uses GWT/Bootstrap on the front end to give the application versatile behavior.
Curator panel: NEW!
[Diagram elements: curators; web client (Google Web Toolkit (GWT) / Bootstrap; JBrowse, Dojo / jQuery); REST / JSON and WebSockets; annotation engine (server); evidence formats BAM, BED, VCF, GFF3, BigWig. Apollo v2.0]
![Page 39: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/39.jpg)
ANNOTATION ENGINE: editing logic
Grails controllers (J2EE servlet) route requests to the appropriate JBrowse data directory for each organism. NEW!
[Diagram elements: web client; REST / JSON and WebSockets; annotation engine (server) with Shiro, LDAP, OAuth; single data store; JBrowse data for Organism 1 and Organism 2, each loaded with genomic evidence for that organism. Apollo v2.0]
![Page 40: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/40.jpg)
SERVER-SIDE DATA SERVICE: single data store
A single, queryable data store holds the annotations. NEW!
[Diagram elements: annotation engine (server); single data store (PostgreSQL, MySQL, MongoDB, ElasticSearch) holding annotations, security, preferences, organisms, and tracks. Apollo v2.0]
![Page 41: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/41.jpg)
LET'S COLLABORATE!: Apollo is open source and extensible
HIGHLIGHTED IMPROVEMENTS
Users can add programs to support their own workflows.
Examples:
• GenSAS (The Genome Sequence Annotation Server): a platform for structural genome annotation.
• i5K:
  - A space at NAL (National Agricultural Library) for sharing assemblies and gene sets, and for manual annotation.
  - Pilot project with more than 40 genomes: 47 talks and 9 posters at the Arthropod Genomics Symposium.
![Page 42: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/42.jpg)
DISPERSED COMMUNITIES: collaborative manual annotation efforts
We train and support hundreds of geographically dispersed scientists from diverse research communities to conduct manual annotations with Apollo, recovering coding sequences in agreement with all available biological evidence.
• Gatekeeping and monitoring.
• Tutorials, training workshops, and “geneborees”.
APOLLO
![Page 43: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/43.jpg)
LESSONS LEARNED
What we have learned:
• Collaborative work distills invaluable knowledge
• We must enforce strict rules and formats
• We must evolve with the data
• A little training goes a long way
• NGS poses additional challenges
![Page 44: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/44.jpg)
What is the curator's task?
![Page 45: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/45.jpg)
Becoming Acquainted with Web Apollo
GENERAL PROCESS OF CURATION: main steps to remember
1. Select or find a region of interest, e.g. a scaffold.
2. Select appropriate evidence tracks to review the gene model.
3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.
4. If necessary, adjust the gene model.
5. Check your edited gene model for integrity and accuracy by comparing it with available homologs.
6. Comment and finish.
![Page 46: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/46.jpg)
WHAT ANNOTATORS SHOULD LOOK FOR: annotators, that’s you!
• Annotating a simple case: WHEN “the official prediction is correct, or nearly correct, assuming that no aligned data extends beyond the gene model (or, if it does, it is not likely to be coding sequence), and/or the gene prediction matches what you know about the gene”:
  a. Can you add UTRs?
  b. Check exon structures.
  c. Check splice sites: …]5’-GT/AG-3’[…
  d. Check ‘start’ and ‘stop’ sites.
  e. Check the predicted protein product(s).
  f. If the protein product still does not look correct, go on to “Annotating more complex cases”.
![Page 47: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/47.jpg)
WHAT ANNOTATORS SHOULD LOOK FOR: continued
• Additional functionality. You may also need to learn how to:
  a. Get genomic sequence
  b. Merge exons
  c. Add/Delete an exon
  d. Create an exon de novo (within an intron or outside existing annotations)
  e. Right/apple-click on a feature to get the feature ID and additional information
  f. Look up homolog descriptions by going to the accession web page at UniProt/Swissprot
![Page 48: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/48.jpg)
WHAT ANNOTATORS SHOULD LOOK FOR: continued
• Annotating more complex cases:
  a. Incomplete annotation: protein integrity checks, indicate gaps, missing 5’ sequences or missing 3’ sequences.
  b. Merge of 2 gene predictions on the same scaffold
  c. Merge of 2 gene predictions on different scaffolds (uh-oh!).
  d. Split of a gene prediction
  e. Frameshifts, selenocysteine, single-base errors, and other inconvenient phenomena
![Page 49: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/49.jpg)
WHAT ANNOTATORS SHOULD LOOK FOR: continued
• Adding important project information in the form of Canned and/or Customized Comments:
  a. NCBI ID, RefSeq ID, gene symbol(s), common name(s), synonyms, top BLAST hits (GenBank IDs), orthologs with species names, and anything else you can think of, because you are the expert.
  b. Type of annotation (e.g. whether or not the gene model was changed)
  c. Data source (for example, if the Fgeneshpp predicted gene was the starting point for your annotation)
  d. The kinds of changes you made to the gene model, e.g. split, merge
  e. Functional description
  f. Whether you would like your MOD curator to check the annotation
  g. Whether part of your gene is on a different scaffold.
![Page 50: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/50.jpg)
TRAINING CURATORS: a little training goes a long way!
Provided with adequate tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.
![Page 51: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/51.jpg)
Let's get to know Apollo: http://genomearchitect.org/web_apollo_user_guide
![Page 52: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/52.jpg)
Apollo: annotation editing environment
BECOMING ACQUAINTED WITH APOLLO
• Color by reading frame, toggle strand, change color scheme, highlighter.
• Load experimental evidence (GFF3, BAM, BigWig), combination and search tracks.
• Query the genome using BLAT.
• Navigation and zoom.
• Search for a gene or a group.
• Get coordinates, and zoom with “rubber-band selection”.
• Login.
• User-created annotations.
• Curator panel.
• Evidence tracks.
• Stage- and cell type-specific transcriptomic data.
![Page 53: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/53.jpg)
Now let's play!
![Page 54: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/54.jpg)
APOLLO ON THE WEB: instructions
Email: [email protected]
Password: nombreapellido (first name + last name)

Email | Password | Server | Start at
[email protected] | userone | 1 | 1
[email protected] | usertwo | 2 | 1
[email protected] | userthree | 3 | 1
[email protected] | userfour | 4 | 1
[email protected] | userfive | 5 | 1
[email protected] | usersix | 1 | 7
[email protected] | userseven | 2 | 7
[email protected] | usereight | 3 | 7
[email protected] | usernine | 4 | 7
[email protected] | userten | 5 | 7
[email protected] | usereleven | 1 | 1
[email protected] | usertwelve | 2 | 1
[email protected] | userthirteen | 3 | 1
[email protected] | userfourteen | 4 | 1
[email protected] | userfifteen | 5 | 1
[email protected] | usersixteen | 1 | 7
[email protected] | userseventeen | 2 | 7
[email protected] | usereighteen | 3 | 7
[email protected] | usernineteen | 4 | 7
[email protected] | usertwenty | 5 | 7
[email protected] | usertwentyone | 1 | 1
[email protected] | usertwentytwo | 2 | 1
[email protected] | usertwentythree | 3 | 1
[email protected] | usertwentyfour | 4 | 1
[email protected] | usertwentyfive | 5 | 1
[email protected] | usertwentysix | 1 | 7
[email protected] | usertwentyseven | 2 | 7
[email protected] | usertwentyeight | 3 | 7
[email protected] | usertwentynine | 4 | 7

Server | URL
1 | http://54.94.132.228:8080/apollo/annotator/index
2 | http://54.207.71.112:8080/apollo/annotator/index
3 | http://54.207.106.136:8080/apollo/annotator/index
4 | http://54.207.113.253:8080/apollo/annotator/index
5 | http://54.232.217.84:8080/apollo/annotator/index
![Page 55: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/55.jpg)
Functionality and navigation.
![Page 56: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/56.jpg)
Apollo: functionality
• Add:
  - IDs from public databases (e.g. GenBank, using DBXRef), gene symbol(s), common name(s), synonyms, the top BLAST hit, orthologs with species names.
  - Appropriate functional assignments (e.g. via RNA-Seq data, literature searches, HMM searches, etc.).
  - Comments about the modifications that were made, or noting that none were needed.
  - Any other notes that occur to the biocurator.
• Correct ‘Start’ and ‘Stop’ sites
• Fix non-canonical splice sites
• Annotate UTRs (e.g. using RNA-Seq)
• Obtain & correct predicted protein products
  - Align them with relevant genes or gene families.
  - Use blastp against NCBI's RefSeq or nr.
• Check for missing data in the assembly
• Merge 2 gene predictions on the same group
• Split a gene prediction
• Correct frameshifts and other assembly errors
• Annotate selenocysteines, single-base errors, etc.
![Page 57: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/57.jpg)
REMOVABLE SIDE DOCK: with customizable tabs
Tabs: Annotations, Organism, Users, Groups, Admin, Tracks, Reference Sequence
![Page 58: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/58.jpg)
EDITS & EXPORTS: annotation details, exon boundaries, data export
[Screenshot: Annotations tab, with callouts 1 and 2]
![Page 59: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/59.jpg)
EDITS & EXPORTS: annotation details, exon boundaries, data export (continued)
[Screenshot: Reference Sequences tab, with callout 3; export formats: FASTA, GFF3]
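Exported GFF3 can be inspected with a few lines of code, since each feature line carries nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The following Python sketch parses one such line; the function name and the sample line (`scaffold1`, `example`, `exon1`, `mRNA1`) are invented for illustration and are not taken from an actual Apollo export.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; attributes become a sub-dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # GFF3 coordinates are 1-based and inclusive.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = unquote(value)  # values are URL-escaped
    record["attributes"] = attributes
    return record


if __name__ == "__main__":
    # Hypothetical feature line, for illustration only.
    line = "scaffold1\texample\texon\t100\t250\t.\t+\t.\tID=exon1;Parent=mRNA1"
    exon = parse_gff3_line(line)
    print(exon["type"], exon["end"] - exon["start"] + 1, exon["attributes"])
```

A real file also contains comment and directive lines starting with `#`, which a caller would need to skip before passing lines to this function.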
![Page 60: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/60.jpg)
USER NAVIGATION
Annotator panel.
• Choose appropriate evidence tracks from the list on the annotator panel.
• Select & drag elements from an evidence track into the ‘User-created Annotations’ area.
• Hovering over an annotation in progress brings up an information pop-up.
![Page 61: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/61.jpg)
USER NAVIGATION
• Annotation right-click menu
![Page 62: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/62.jpg)
USER NAVIGATION
Annotations, annotation edits, and History: stored in a centralized database.
![Page 63: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/63.jpg)
USER NAVIGATION
The Annotation Information Editor
DBXRefs are database cross-references: if you have reason to believe that this gene is linked to a gene in a public database (including your own), then add it here.
![Page 64: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/64.jpg)
USER NAVIGATION
The Annotation Information Editor
• Add PubMed IDs
• Include GO terms as appropriate from any of the three ontologies
• Write comments stating how you have validated each model.
![Page 65: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/65.jpg)
USER NAVIGATION
• The ‘Zoom to base level’ option reveals the DNA Track.
![Page 66: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/66.jpg)
USER NAVIGATION
• Color exons by CDS from the ‘View’ menu.
![Page 67: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/67.jpg)
USER NAVIGATION
• Zoom in/out with the keyboard: shift + arrow keys up/down
• Toggle the reference DNA sequence and translation frames in the forward strand. Toggle models in either direction.
![Page 68: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/68.jpg)
Annotation
![Page 69: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/69.jpg)
simple cases
![Page 70: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/70.jpg)
ANNOTATING SIMPLE CASES
A “simple case” is one in which:
- the predicted gene model is correct or nearly correct,
- the model is supported by evidence that completely or mostly agrees with the prediction, and
- evidence that extends beyond the predicted model is assumed to be non-coding sequence.
The following are simple modifications.
![Page 71: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/71.jpg)
ADDING EXONS
• A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
• Check the ‘Start’ and ‘Stop’ signals after each edit.
![Page 72: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/72.jpg)
ADDING UTRs
If transcript alignment data are available and extend beyond your original annotation, you may extend or add UTRs.
1. Right-click at the exon edge and select ‘Zoom to base level’.
2. Place the cursor over the edge of the exon until it becomes a black arrow, then click and drag the edge of the exon to the new coordinate position that includes the UTR.
To add a new spliced UTR to an existing annotation, follow the procedure for adding an exon.
![Page 73: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/73.jpg)
CHECK EXON INTEGRITY
1. Zoom in to clearly resolve each exon as a distinct rectangle.
2. Two exons from different tracks sharing the same start and/or end coordinates will display a red bar to indicate matching edges.
3. Selecting the whole annotation or one exon at a time, use this ‘edge-matching’ function and scroll along the length of the annotation, verifying exon boundaries against available data. Use the square bracket keys [ ] to scroll from exon to exon.
4. Check whether cDNA / RNA-Seq reads lack one or more of the annotated exons or include additional exons.
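The edge-matching idea in step 2 can be sketched in a few lines of Python. This is only an illustration, not Apollo's implementation: the function name `matching_edges` and the `(start, end)` tuple representation of exons are assumptions made for this example.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an exon is (start, end) in genomic coordinates.
Exon = Tuple[int, int]


def matching_edges(annotation: List[Exon], evidence: List[Exon]) -> List[Tuple[Exon, bool, bool]]:
    """For each annotated exon, report whether its start and its end
    coincide with the start or end of any exon in the evidence track."""
    starts: Set[int] = {s for s, _ in evidence}
    ends: Set[int] = {e for _, e in evidence}
    return [((s, e), s in starts, e in ends) for s, e in annotation]


annotation = [(100, 200), (300, 420), (500, 600)]
evidence = [(100, 200), (300, 400), (500, 600)]
report = matching_edges(annotation, evidence)
for exon, start_ok, end_ok in report:
    print(exon,
          "start matches" if start_ok else "start differs",
          "end matches" if end_ok else "end differs")
```

In this toy data the second exon's end (420) has no support in the evidence track, which is the kind of boundary a curator would inspect at base level.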
![Page 74: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/74.jpg)
EXON STRUCTURE INTEGRITY
To modify an exon boundary so that it matches data in the evidence tracks: select both the offending exon and the feature with the expected boundary, then right-click on the annotation and select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.
In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
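Which genomic coordinate a ‘Set 5’ end’ or ‘Set 3’ end’ operation moves depends on the strand. The sketch below illustrates that logic only; `set_exon_end` and the `(start, end)` tuples are hypothetical names for this example and are not part of Apollo.

```python
from typing import Tuple

Exon = Tuple[int, int]  # (start, end) in genomic coordinates, start < end


def set_exon_end(exon: Exon, evidence: Exon, strand: str, which: str) -> Exon:
    """Return a copy of `exon` whose 5' or 3' boundary is copied from `evidence`."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if which not in ("5'", "3'"):
        raise ValueError("which must be \"5'\" or \"3'\"")
    # On the forward strand the 5' end is the lower coordinate (start);
    # on the reverse strand the 5' end is the higher coordinate (end).
    use_start = (which == "5'") == (strand == "+")
    if use_start:
        return (evidence[0], exon[1])
    return (exon[0], evidence[1])


# Forward strand: the 3' end is the exon's genomic end.
print(set_exon_end((300, 420), (300, 400), "+", "3'"))
# Reverse strand: the 3' end is the exon's genomic start.
print(set_exon_end((300, 420), (280, 420), "-", "3'"))
```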
![Page 75: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/75.jpg)
ORFs AND SPLICE SITES
Figure labels: ‘User-created Annotations’ Track; Evidence Tracks Area; edge-matching; selection of features and sub-features.
Apollo’s editing logic (brain):
- selects the longest ORF as the CDS
- flags non-canonical splice sites
![Page 76: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/76.jpg)
SPLICE SITES
Figure labels: original model; curated model; possible exon/intron junction error.
Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.
Canonical splice sites:
- forward strand: 5’-…exon]GT / AG[exon…-3’
- reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’
Zoom in to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g., a GC donor), they should be flagged with the appropriate comment.
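A minimal sketch of the splice-site check for a forward-strand intron follows. It is not Apollo's code: the function name `splice_site_status` and the 0-based, end-exclusive coordinates are assumptions chosen for this example.

```python
def splice_site_status(genome: str, intron_start: int, intron_end: int) -> str:
    """Classify a forward-strand intron by its first two and last two bases.

    Coordinates are 0-based and end-exclusive (a convention of this sketch).
    """
    donor = genome[intron_start:intron_start + 2].upper()
    acceptor = genome[intron_end - 2:intron_end].upper()
    if donor == "GT" and acceptor == "AG":
        return "canonical"
    if donor == "GC" and acceptor == "AG":
        return "non-canonical (GC donor): keep, but add a comment"
    return f"non-canonical ({donor}/{acceptor}): review"


# Toy sequence: exon, intron (GT...AG), exon.
genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(splice_site_status(genome, 6, 18))
```

For a gene on the reverse strand, the intron would first be reverse-complemented and then checked the same way.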
![Page 77: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/77.jpg)
‘START’ AND ‘STOP’ SITES
Web Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.
If the ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further upstream or downstream, depending on the evidence (protein database, additional evidence tracks).
The correct ‘Start’ may lie outside the predicted gene model, within a region supported by another evidence track.
In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
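The longest-ORF rule can be illustrated with a short scan over the three reading frames of a spliced transcript. This is a sketch under stated assumptions, not Apollo's algorithm: `longest_orf` is a hypothetical helper that only considers ATG starts and the three standard stop codons on the given strand.

```python
from typing import Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(mrna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG...stop ORF, 0-based and
    end-exclusive with the stop codon included, or None if there is none."""
    mrna = mrna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


mrna = "GGATGAAACCCTAGTT"
orf = longest_orf(mrna)
print(orf, mrna[orf[0]:orf[1]])
```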
![Page 78: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/78.jpg)
complex cases
![Page 79: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/79.jpg)
COMPLEX CASES: merge two gene predictions on the same scaffold
Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!
1. In the ‘User-created Annotations’ area, shift-click to select an intron from each gene model, then right-click and select the ‘Merge’ option from the menu.
2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.
3. Check the resulting translation by querying a protein database, e.g. UniProt. Add comments to record that this annotation is the result of a merge.
Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
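Conceptually, a merge combines the exons of two models on the same strand into one transcript. The sketch below shows that idea only; `merge_models` is a hypothetical helper, and collapsing overlapping exons into one is an assumption of this example rather than documented Apollo behavior.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) in genomic coordinates


def merge_models(exons_a: List[Exon], strand_a: str,
                 exons_b: List[Exon], strand_b: str) -> List[Exon]:
    """Combine the exons of two gene models into one sorted exon list."""
    if strand_a != strand_b:
        raise ValueError("models are on different strands")
    merged: List[Exon] = []
    for start, end in sorted(exons_a + exons_b):
        if merged and start <= merged[-1][1]:
            # Overlapping exons are collapsed into a single exon.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


model_a = [(100, 200), (300, 400)]
model_b = [(380, 450), (600, 700)]
print(merge_models(model_a, "+", model_b, "+"))
```

After a merge like this, the ORF and splice sites of the combined model still need to be checked as described in the steps above.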
![Page 80: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/80.jpg)
COMPLEX CASES: split a gene prediction
One or more splits may be recommended when:
- different segments of the predicted protein align to two or more different gene families
- the predicted protein doesn’t align to known proteins over its entire length
Transcript data may support a split, but first verify whether they are alternative transcripts.
![Page 81: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/81.jpg)
COMPLEX CASES: correcting frameshifts and single-base errors
Figure labels: DNA Track; ‘User-created Annotations’ Track.
Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
![Page 82: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/82.jpg)
COMPLEX CASES: correcting selenocysteine-containing proteins
![Page 83: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/83.jpg)
![Page 84: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/84.jpg)
COMPLEX CASES: correcting frameshifts, single-base errors, and selenocysteines
1. Apollo allows annotators to make single-base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.
2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right-click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.
3. The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.
4. Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database, using the ‘Comments’ section to report the CDS edits.
5. In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
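The three kinds of modification in step 3 can be pictured as an overlay applied to a copy of the sequence, leaving the assembly itself untouched. The sketch below is an illustration only: `apply_alterations` and the `(kind, position, value)` tuples are assumptions of this example, positions are 0-based, and overlapping alterations are not handled.

```python
from typing import List, Tuple, Union

Alteration = Tuple[str, int, Union[str, int]]


def apply_alterations(genome: str, alterations: List[Alteration]) -> str:
    """Return an edited copy of `genome`; the input string is not modified.

    ("insertion", pos, "ACG")    inserts residues to the right of `pos`
    ("deletion", pos, 3)         deletes 3 bases starting at `pos`
    ("substitution", pos, "T")   replaces bases starting at `pos`
    """
    seq = list(genome)
    # Apply from right to left so that earlier coordinates stay valid.
    for kind, pos, value in sorted(alterations, key=lambda a: a[1], reverse=True):
        if kind == "insertion":
            seq[pos + 1:pos + 1] = value
        elif kind == "deletion":
            del seq[pos:pos + value]
        elif kind == "substitution":
            seq[pos:pos + len(value)] = value
        else:
            raise ValueError(f"unknown alteration: {kind}")
    return "".join(seq)


genome = "ATGAAATAG"
edited = apply_alterations(genome, [("insertion", 2, "C"), ("substitution", 6, "C")])
print(genome, "->", edited)
```

Transcript and protein sequences would then be recomputed from the edited copy, while `genome` stays exactly as it was.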
![Page 85: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/85.jpg)
COMPLETING THE ANNOTATION
Follow the checklist until you are happy with the annotation!
And remember to:
– comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence.
– or add a comment to inform the community of unresolved issues you think this model may have.
Always remember: Web Apollo curation is a community effort, so please use comments to communicate the reasons for your annotation (your comments will be visible to everyone).
![Page 86: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/86.jpg)
Checklist
![Page 87: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/87.jpg)
MANUAL ANNOTATION CHECKLIST
THE CHECKLIST for accuracy and integrity
1. Can you add UTRs (e.g., via RNA-Seq)?
2. Check exon structures.
3. Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…
4. Check ‘Start’ and ‘Stop’ sites.
5. Check the predicted protein product(s):
– Align it against relevant genes/gene family.
– blastp against NCBI’s RefSeq or nr.
6. If the protein product still does not look correct, then check:
– Are there gaps in the genome?
– Merge of 2 gene predictions on the same scaffold
– Merge of 2 gene predictions from different scaffolds
– Split a gene prediction
– Frameshifts
– Error in the genome assembly?
– Selenocysteines, single-base errors, etc.
7. Finalize the annotation by adding:
– Important project information in the form of comments.
– IDs from public databases, e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert.
– Whether your model replaces one or more models from the official gene set (so it can be deleted).
– The kinds of changes you made to the gene model of interest, if any.
– Any appropriate functional assignments of interest to the community (e.g., via BLAST, RNA-Seq data, literature searches, etc.).
![Page 88: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/88.jpg)
Example
![Page 89: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/89.jpg)
Example
A public Apollo demo using the honey bee genome is available at http://genomearchitect.org/WebApolloDemo
- Demonstration using the Hyalella azteca genome (amphipod crustacean).
![Page 90: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/90.jpg)
What do we know about this genome?
• Currently publicly available data at NCBI:
  • >37,000 nucleotide seqs → scaffolds, mitochondrial genes
  • 300 amino acid seqs → mitochondrion
  • 53 ESTs
  • 0 conserved domains identified
  • 0 “gene” entries submitted
• Data at i5K Workspace@NAL (annotation hosted at USDA):
  - 10,832 scaffolds: 23,288 transcripts: 12,906 proteins
![Page 91: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/91.jpg)
PubMed Search: what’s new?
![Page 92: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/92.jpg)
PubMed Search: what’s new?
“Ten populations (3 cultures, 7 from California water bodies) differed by at least 550-fold in sensitivity to pyrethroids.”
“By sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), we show that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.”
“The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”
![Page 93: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/93.jpg)
How many sequences for our gene of interest?
• Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).
• NaCP60E (sodium channel protein 60 E; D. melanogaster).
  – MF: voltage-gated cation channel activity (IDA, GO:0022843).
  – BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
  – CC: voltage-gated sodium channel complex (IEA, GO:0001518).
And what do we know about them?
![Page 94: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/94.jpg)
Retrieving sequences for sequence similarity searches.
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
![Page 95: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/95.jpg)
BLAT search
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
![Page 96: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/96.jpg)
BLAT search
![Page 97: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/97.jpg)
Customizations: high-scoring segment pairs (hsp) in “BLAST+ Results” track
![Page 98: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/98.jpg)
Creating a new gene model: drag and drop
• Apollo automatically calculates ORF. In this case, ORF includes the high-scoring segment pairs (hsp).
![Page 99: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/99.jpg)
Available Tracks
![Page 100: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/100.jpg)
Get Sequence
http://blast.ncbi.nlm.nih.gov/Blast.cgi
![Page 101: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/101.jpg)
Also, flanking sequences (other gene models) vs. NCBI nr
In this case, two gene models upstream, at 5’ end.
BLAST hsps
![Page 102: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/102.jpg)
Review alignments
HaztTmpM006234
HaztTmpM006233
HaztTmpM006232
![Page 103: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/103.jpg)
Hypothesis for vgsc gene model
![Page 104: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/104.jpg)
Editing: merge the three models
Merge by dropping an exon or gene model onto another,
or…
merge by selecting two exons (holding down “Shift”) and using the right-click menu.
![Page 105: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/105.jpg)
Editing: correct boundaries, delete exons
Modify exon/intron boundary:
- Drag the end of the exon to the nearest canonical splice site.
- Use the right-click menu.
Delete the first exon from HaztTmpM006233.
![Page 106: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/106.jpg)
Editing: set translation start
![Page 107: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/107.jpg)
Editing: modify boundaries
Also modify the intron/exon boundary at coordinate 78,999.
![Page 108: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/108.jpg)
Finished model
Corroborate the integrity and accuracy of the model:
- Start and Stop
- Exon structure and splice sites …]5’-GT/AG-3’[…
- Check the predicted protein product vs. NCBI nr
![Page 109: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/109.jpg)
Information Editor
• DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq
• PubMed identifier: PMID: 24065824
• Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518.
• Comments (if applicable).
• Name, Symbol.
• Approve / Delete radio button.
![Page 110: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/110.jpg)
Video demonstration
![Page 111: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/111.jpg)
APOLLO demonstration
Apollo demo video available at: https://youtu.be/VgPtAP_fvxY
![Page 112: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/112.jpg)
CONTENTS
Web Apollo Collaborative Curation and Interactive Analysis of Genomes
• BIO-REFRESHER: concepts we need
• ANNOTATION: automated predictions
• MANUAL ANNOTATION: necessary, in collaboration
• APOLLO: advancing curation through collaboration
• EXAMPLE: demonstrations
• EXERCISES
![Page 113: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/113.jpg)
Exercises
![Page 114: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/114.jpg)
![Page 115: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/115.jpg)
Exercises
Live demonstration using the Apis mellifera genome.
1. Evidence in support of protein coding gene models.
1.1 Consensus Gene Sets: Official Gene Set v3.2, Official Gene Set v1.0
1.2 Consensus Gene Sets comparison: OGSv3.2 genes that merge OGSv1.0 and RefSeq genes, OGSv3.2 genes that split OGSv1.0 and RefSeq genes
1.3 Protein Coding Gene Predictions Supported by Biological Evidence: NCBI Gnomon, Fgenesh++ with RNASeq training data, Fgenesh++ without RNASeq training data, NCBI RefSeq Protein Coding Genes and Low Quality Protein Coding Genes
1.4 Ab initio protein coding gene predictions: Augustus Set 12, Augustus Set 9, Fgenesh, GeneID, N-SCAN, SGP2
1.5 Transcript Sequence Alignment: NCBI ESTs, Apis cerana RNA-Seq, Forager Bee Brain Illumina Contigs, Nurse Bee Brain Illumina Contigs, Forager RNA-Seq reads, Nurse RNA-Seq reads, Abdomen 454 Contigs, Brain and Ovary 454 Contigs, Embryo 454 Contigs, Larvae 454 Contigs, Mixed Antennae 454 Contigs, Ovary 454 Contigs, Testes 454 Contigs, Forager RNA-Seq HeatMap, Forager RNA-Seq XY Plot, Nurse RNA-Seq HeatMap, Nurse RNA-Seq XY Plot
![Page 116: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/116.jpg)
Exercises
Live demonstration using the Apis mellifera genome.
1. Evidence in support of protein coding gene models (continued).
1.6 Protein homolog alignment: Acep_OGSv1.2, Aech_OGSv3.8, Cflo_OGSv3.3, Dmel_r5.42, Hsal_OGSv3.3, Lhum_OGSv1.2, Nvit_OGSv1.2, Nvit_OGSv2.0, Pbar_OGSv1.2, Sinv_OGSv2.2.3, Znev_OGSv2.1, Metazoa_Swissprot
2. Evidence in support of non-protein coding gene models.
2.1 Non-protein coding gene predictions: NCBI RefSeq Noncoding RNA, NCBI RefSeq miRNA
2.2 Pseudogene predictions: NCBI RefSeq Pseudogene
![Page 117: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/117.jpg)
Instructions
APOLLO ON THE WEB: instructions

| Server | URL |
|---|---|
| 1 | http://54.94.132.228:8080/apollo/annotator/index |
| 2 | http://54.94.132.228:8080/apollo/annotator/index |
| 3 | http://54.94.132.228:8080/apollo/annotator/index |
| 4 | http://54.94.132.228:8080/apollo/annotator/index |
| 5 | http://54.94.132.228:8080/apollo/annotator/index |

Email: [email protected]
Password: nombreapellido (first name followed by last name)

| Email | Password | Server | Start at |
|---|---|---|---|
| [email protected] | userone | 1 | 1 |
| [email protected] | usertwo | 2 | 1 |
| [email protected] | userthree | 3 | 1 |
| [email protected] | userfour | 4 | 1 |
| [email protected] | userfive | 5 | 1 |
| [email protected] | usersix | 1 | 7 |
| [email protected] | userseven | 2 | 7 |
| [email protected] | usereight | 3 | 7 |
| [email protected] | usernine | 4 | 7 |
| [email protected] | userten | 5 | 7 |
| [email protected] | usereleven | 1 | 1 |
| [email protected] | usertwelve | 2 | 1 |
| [email protected] | userthirteen | 3 | 1 |
| [email protected] | userfourteen | 4 | 1 |
| [email protected] | userfifteen | 5 | 1 |
| [email protected] | usersixteen | 1 | 7 |
| [email protected] | userseventeen | 2 | 7 |
| [email protected] | usereighteen | 3 | 7 |
| [email protected] | usernineteen | 4 | 7 |
| [email protected] | usertwenty | 5 | 7 |
| [email protected] | usertwentyone | 1 | 1 |
| [email protected] | usertwentytwo | 2 | 1 |
| [email protected] | usertwentythree | 3 | 1 |
| [email protected] | usertwentyfour | 4 | 1 |
| [email protected] | usertwentyfive | 5 | 1 |
| [email protected] | usertwentysix | 1 | 7 |
| [email protected] | usertwentyseven | 2 | 7 |
| [email protected] | usertwentyeight | 3 | 7 |
| [email protected] | usertwentynine | 4 | 7 |
![Page 118: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/118.jpg)
Thank you.
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), and the Honey Bee Genome Sequencing Consortium.
• Stephen Ficklin, GenSAS, Washington State University.
• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Both projects are also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Thank you for your attention!

BBOP
Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
JBrowse: Eric Yao *
NAL at USDA: Monica Poelchau, Christopher Childers, Gary Moore
HGSC at BCM: Stephen Richards, Kim Worley

Apollo: http://GenomeArchitect.org
GO: http://GeneOntology.org
i5K: http://arthropodgenomes.org/wiki/i5K
![Page 119: Apolo Taller en BIOS](https://reader031.vdocument.in/reader031/viewer/2022020410/58ef784e1a28abe41e8b45f1/html5/thumbnails/119.jpg)