apollo: scalable & collaborative curation of genomes - biocuration 2015

Post on 17-Jul-2015


APOLLO: Scalable and collaborative genome curation

Monica Munoz-Torres, PhD | @monimunozto
Nathan Dunn, Colin Diesh*, Deepak Unni*, Seth Carbon, Heiko Dietze, Christopher Mungall, Nicole Washington, Ian Holmes*, Christine Elsik*, and Suzanna E. Lewis
Berkeley Bioinformatics Open-Source Projects
Genomics Division, Lawrence Berkeley National Laboratory
8th International Biocuration Conference. Beijing, China. 24 April, 2015

OUTLINE

• LAST TIME: where we left off last year
• IMPROVEMENTS: architecture, scalability, features
• COLLABORATIONS: JBrowse & GenSAS
• FUTURE PLANS: what lies on the horizon

APOLLO: genome annotation editing tool

• Web-based, integrated with JBrowse.
• Supports real-time collaboration!
• Automatic generation of ready-made computable data.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
• Intuitive annotation, gestures, and pull-down menus to create and edit transcript and exon structures, insert comments (CV, freeform text), GO terms, etc.

DETAILS FROM OUR LAST UPDATE

• ~100 institutions worldwide
• >60 genomes across the tree of life: from plants to arthropods, to fungi, to fish and other vertebrates including human, bovine cattle, and dog

Image credits: ©BroadInstitute.org; Nature Rev Gen 2009; ©alexanderwild.com; ©outdooralabama.com; National Agricultural Library

LESSONS WE HAVE LEARNED

What we have learned:
• Collaborative work distills invaluable knowledge
• We must enforce strict rules and formats
• We must evolve with the data
• A little training goes a long way
• NGS poses additional challenges

HIGHLIGHTED IMPROVEMENTS: scalability

• Easier deployment, more detailed documentation
• Supports multiple organisms per server, improved comparative tools
• Easier to query the data and build extensions
• More flexible user interface via removable side-dock with customizable tabs; better search functionality, validation checks, and editing capability
• Allows larger set of sequence annotations based on the Sequence Ontology
• Offers fine-grained user and group level permissions
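The fine-grained user and group level permissions listed above could be modeled roughly as follows. This is a minimal sketch, not Apollo's implementation: the permission levels, the grant tables, and all user, group, and organism names are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class Permission(IntEnum):
    # Hypothetical, ordered permission levels (higher value includes lower).
    NONE = 0
    READ = 1
    WRITE = 2
    ADMIN = 3


# Hypothetical grants, keyed by (principal, organism).
USER_GRANTS = {("alice", "organism1"): Permission.WRITE}
GROUP_GRANTS = {
    ("curators", "organism1"): Permission.READ,
    ("curators", "organism2"): Permission.WRITE,
}
GROUP_MEMBERS = {"curators": {"alice", "bob"}}


def effective_permission(user, organism):
    """Return the highest level granted to the user directly or via a group."""
    levels = [USER_GRANTS.get((user, organism), Permission.NONE)]
    for group, members in GROUP_MEMBERS.items():
        if user in members:
            levels.append(GROUP_GRANTS.get((group, organism), Permission.NONE))
    return max(levels)


def can(user, organism, needed):
    """True if the user's effective permission is at least `needed`."""
    return effective_permission(user, organism) >= needed
```

In this sketch a direct user grant and a group grant are combined by taking the higher of the two, which is one possible policy among several.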

NEW APOLLO ARCHITECTURE: simpler, more flexible

Web-based client + annotation-editing engine + server-side data service

[Diagram: Apollo v2.0 architecture. Labels shown in the figure:]
• Annotators
• Client: Google Web Toolkit (GWT) / Bootstrap; JBrowse (DOJO / jQuery)
• Communication: REST / JSON, Websockets
• Annotation Engine (Server)
• Shiro, LDAP, OAuth
• Annotations, Security, Preferences, Organisms, Tracks
• Single Data Store: PostgreSQL, MySQL, MongoDB, ElasticSearch
• JBrowse Data for Organism 1 and Organism 2 (BAM, BED, VCF, GFF3, BigWig)
• Load genomic evidence for selected organism

NEW! Grails controllers (J2EE servlet) route requests to the appropriate JBrowse data directory for a given organism.
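To make the routing idea concrete, here is a minimal sketch in Python (Apollo's controllers are written for Grails, so this is only an analogy). It maps a request for a file to the data directory of the selected organism. The organism identifiers, directory paths, and function name are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical mapping of organism identifiers to JBrowse data directories.
ORGANISM_DATA_DIRS = {
    "organism1": "/data/jbrowse/organism1",
    "organism2": "/data/jbrowse/organism2",
}


def resolve_data_path(organism, relative_path):
    """Map a requested relative path to the selected organism's data directory."""
    if organism not in ORGANISM_DATA_DIRS:
        raise KeyError(f"unknown organism: {organism}")
    rel = PurePosixPath(relative_path)
    # Reject absolute paths and parent-directory components so that a
    # request cannot escape the organism's data directory.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"invalid path: {relative_path}")
    return str(PurePosixPath(ORGANISM_DATA_DIRS[organism]) / rel)
```

With this kind of lookup, one server can serve evidence tracks for several organisms while the client only names the organism and the file it needs.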

NEW! A single, queryable datastore houses annotations.
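The benefit of a single, queryable datastore can be sketched with a small example. The snippet below uses an in-memory SQLite database purely so that it is self-contained (the slide lists PostgreSQL, MySQL, MongoDB, and ElasticSearch as backends). The schema, table names, and sample rows are made up for illustration and are not Apollo's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE annotation (
        id INTEGER PRIMARY KEY,
        organism_id INTEGER NOT NULL REFERENCES organism(id),
        name TEXT NOT NULL,
        type TEXT NOT NULL,   -- e.g. a Sequence Ontology term such as 'gene'
        owner TEXT NOT NULL
    );
""")
# Made-up sample data for two organisms.
conn.executemany("INSERT INTO organism VALUES (?, ?)",
                 [(1, "organism1"), (2, "organism2")])
conn.executemany(
    "INSERT INTO annotation (organism_id, name, type, owner) VALUES (?, ?, ?, ?)",
    [(1, "geneA", "gene", "alice"),
     (1, "geneB", "gene", "bob"),
     (1, "trnaA", "tRNA", "alice"),
     (2, "pseudoA", "pseudogene", "bob")],
)


def count_by_type(conn, organism_name):
    """Count an organism's annotations per feature type with one query."""
    rows = conn.execute(
        """SELECT a.type, COUNT(*) FROM annotation a
           JOIN organism o ON o.id = a.organism_id
           WHERE o.name = ? GROUP BY a.type ORDER BY a.type""",
        (organism_name,),
    ).fetchall()
    return dict(rows)
```

Because all organisms share one store, the same query works for any of them by changing only the organism name.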


HIGHLIGHTED IMPROVEMENTS: scalability

• Improvements to architecture: easier deployment, better documentation
• Supports multiple organisms per server, improved comparative tools
• Easier to query the data and build extensions
• More flexible user interface via removable side-dock with customizable tabs; better search functionality, validation checks, and editing capability
• Allows larger set of sequence annotations based on the Sequence Ontology
• Offers fine-grained user and group level permissions

HIGHLIGHTED IMPROVEMENTS: removable side dock with customizable tabs

Tabs: Tracks, Organism, Users, Groups, Preferences, Annotations, Reference Sequence

HIGHLIGHTED IMPROVEMENTS: annotation details, exon boundaries, data export

[Screenshot: Annotations and Reference Sequences tabs, with numbered callouts 1, 2, 3]

HIGHLIGHTED IMPROVEMENTS: visible in the Apollo window

Automatically calculates upstream and downstream acceptor and donor sites.
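As a rough illustration of what locating nearby splice sites can involve (a minimal sketch, not Apollo's implementation), the snippet below scans a forward-strand DNA string for the nearest canonical donor (GT) and acceptor (AG) dinucleotides on either side of a position. The function name, the 0-based coordinates, and the restriction to canonical GT/AG sites on the forward strand are assumptions made for the example.

```python
DONOR = "GT"     # canonical splice donor dinucleotide (assumed)
ACCEPTOR = "AG"  # canonical splice acceptor dinucleotide (assumed)


def nearest_sites(seq, pos, motif):
    """Return (upstream, downstream) 0-based start indices of the nearest
    occurrences of `motif` starting before and after `pos`; None if absent."""
    seq = seq.upper()
    # Limit the search window so that an upstream match must start before pos.
    upstream = seq.rfind(motif, 0, pos + len(motif) - 1)
    # A downstream match must start strictly after pos.
    downstream = seq.find(motif, pos + 1)
    return (upstream if upstream != -1 else None,
            downstream if downstream != -1 else None)


example = "AAGTCCAGGTAAAG"
donors = nearest_sites(example, 5, DONOR)
acceptors = nearest_sites(example, 5, ACCEPTOR)
```

A real annotation editor would also have to handle the reverse strand and non-canonical splice sites, which this sketch leaves out.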

OTHER IMPROVEMENTS: behind the scenes

https://github.com/GMOD/Apollo

APOLLO: demonstration

See Apollo Demonstration Video at: https://youtu.be/VgPtAP_fvxY

COLLABORATIONS: Apollo is open-source and extensible

Apollo users can add software to support their own workflow.

Examples:
• GenSAS (The Genome Sequence Annotation Server): whole-genome structural annotation pipeline.
• i5K Workspace@NAL: space to display and share genome assemblies & gene models, and conduct manual annotation efforts.

FUTURE PLANS: currently working on

JOIN US

http://GenomeArchitect.org/

Please bring your suggestions, requests, and contributions to:
• Nathan Dunn, Apollo Technical Lead
• Suzi Lewis, Principal Investigator, BBOP

Special thanks to:
• Stephen Ficklin, GenSAS, Washington State University
• Deepak Unni and Colin Diesh, Apollo Developers, University of Missouri
• Eric Yao, JBrowse, UC Berkeley

• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), BGI, Oliver Niehuis (1KITE http://www.1kite.org/), and the Honey Bee Genome Sequencing Consortium.
• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI; by Contract No. 60-8260-4-005 from the National Agricultural Library (NAL) at the United States Department of Agriculture (USDA); and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission: http://AlexanderWild.com
• For your attention, thank you!

BBOP
Web Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze

Thanks!
NAL at USDA: Monica Poelchau, Christopher Childers, Gary Moore
HGSC at BCM: fringy Richards, Dan Hughes, Kim Worley
JBrowse: Eric Yao *

Web Apollo: http://GenomeArchitect.org
i5K: http://arthropodgenomes.org/wiki/i5K
GO: http://GeneOntology.org
