ensembl and ena · kethi reddy, stephane rivière, marc rosello, alexander senf, dimitriy smirnov,...

24
Denise Carvalho-Silva Ensembl Outreach Team On behalf of Ensembl and ENA teams European Molecular Biology Laboratories Euroepan Bioinformatics Institute SME Bioinformatics Forum Barcelona 8-9 October 2012 Ensembl and ENA High level overview and use cases

Upload: others

Post on 18-Jan-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Denise Carvalho-Silva Ensembl Outreach Team

On behalf of Ensembl and ENA teams

European Molecular Biology Laboratories

Euroepan Bioinformatics Institute

SME Bioinformatics Forum

Barcelona 8-9 October 2012

Ensembl and ENA High level overview and use cases

Page 2: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Outline

Ensembl project: background and goals

Data available

Data access and Ensembl tools

Use cases: Ensembl and ENA

Ensembl Outreach and Support

Acknowledgements

Page 3: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Ensembl project

Launched in 1999: before the release of the

draft of the human genome

Joint project between the EBI and WTSI

launched in March 2000 www.ensembl.org

Page 4: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Goals Provide comprehensive annotation of genomes

Integrate the annotation with other biological data

Make them all publicly available

+ many more

Page 5: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Ensembl: an integration point

66 vertebrate genomes Release 68 July 2012

Page 6: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Extends the use of Ensembl to other species

Wider taxonomic range (v15, 354 genomes)

6

Annotation of non-vertebrate genomes

launched in 2009 www.ensemblgenomes.org

Page 7: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Data available in Ensembl 68

• Gene annotation for 66 vetebrate species

• Variation data for 19 species

• Comparative Genomics data for 69 species

• Regulation data for 16 species

Page 8: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Data access: browser sites

www.ensembl.org

pre.ensembl.org

archive.ensembl.org

Page 9: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Data access: BioMart

• web interface to export Ensembl data

• no programming skills required

DATASET

FILTER ATTRIBUTES

RESULTS

www.ensembl.org/biomart/martview

Page 10: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

BioMart results

Tables/sequences

Export/email

Page 11: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Data access: APIs and FTP

• Ensembl Database (open source): Perl-API, MySQL

http://www.ensembl.org/info/data/ftp/index.html

• FTP download site

http://www.ensembl.org/info/docs/api/index.html

Page 12: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Ensembl Tools h

ttp

://

ww

w.e

nsem

bl.

org

/to

ols

.htm

l

Assembly converter

ID history converter

Virtual Machine

Region Report

Variant Effect Predictor

Page 13: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Gene annotation

• Automatic pipeline

Genome-wide determination

• Manual curation

Gene determination on a case-by-case by an

annotator

+ 63 species

+ gene lists 5 species

Page 14: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Ensembl (20_)

Havana (00_)

Merged (“gold”)

Havana (00_)

Gene annotation on the browser

• Merged (“gold”) gene set: identical annotation from

Ensembl and Havana for human, mouse, zebrafish

• high confidence and quality

Exons are drawn as boxes. Filled boxes are translated (coding) exons, empty boxes are untranslated regions (UTRs).

Page 15: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Biological Evidence

• International Nucleotide Sequence databases

• Protein sequence databases

• NCBI RefSeq

• RNAseq (transcriptomic) data

Page 16: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

ENA provides a comprehensive, accessible and publicly available repository for nucleotide sequence data

Data submission

Data search/download

European Nucleotide Archive

http://www.ebi.ac.uk/ena/

Page 17: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Use case 1 - ENA

© Mo Hassan

Retrieve and browse the mitochondrial genome of the cave bear (Ursus spelaeus).

Page 18: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

I have submitted a DNA sequence to ENA and got

the ID AF489725. Can I view this ID in Ensembl?

• Which gene is associated with?

• Which chromosome is the gene found on?

• What are the neighbouring genes?

• Is there a homologue to this gene in dog?

• Find the cDNA alignment between the two genes

• Can I jump to ENA from Ensembl?

Use case 2 - Ensembl and ENA

Page 19: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Our sequencing results identified a known SNP (rs4988235) in

one of our samples in individuals from Barcelona (Spain).

• What is the major allele for this SNP? Is it the same in all

1000 Genomes super-populations?

• What is the ancestral allele? Is it conserved in vertebrates?

• Are there any phenotypes associated with this SNP?

• How many variants are associated with this phenotype?

• Which gene is associated to this phenotype?

Use case 3 - Ensembl

Page 20: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

• Course online www.ensembl.info/ecourse

• Tutorials www.ensembl.org/info/website/tutorials

• YouTube channel www.youtube.com/user/EnsemblHelpdesk

• Mailing lists [email protected], [email protected]

• Comments and questions? [email protected]

Ensembl Outreach and Support

Page 21: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Acknowledgements

Funded by the Wellcome Trust, NIH-NHGRI, EU and EMBL

Page 22: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Ensembl Team Retreat 2012 Norwich, United Kingdom

Page 23: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Clara Amid, Ewan Birney, Lawrence Bower, Ana Cerdeño-Tárraga, Ying Cheng, Iain Cleland, Nadeem Faruque, Richard Gibson, Neil Goodgame, Christopher Hunter, Mikyung Jang, Rasko Leinonen, Xin Liu, Arnaud Oisel, Nima Pakseresht, Sheila Plaister, Rajesh Radhakrishnan, Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

Acknowledgements

[email protected] [email protected]

Funded by EMBL, EU, Wellcome Trust, BBSRC

Page 24: Ensembl and ENA · Kethi Reddy, Stephane Rivière, Marc Rosello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane