
Upload: brenda-kelly

Post on 14-Dec-2015


Page 1: 1 of 42 Browsing Genes and Genomes with Ensembl Maria Wilbe Department of Animal Breeding and Genetics, SLU, Sweden Maria.Wilbe@hgen.slu.se

1 of 42

Browsing Genes and Genomes with Ensembl

Maria Wilbe
Department of Animal Breeding and Genetics, SLU, Sweden

Maria.Wilbe@hgen.slu.se

2 of 42

Several lecture notes taken from:

• Bert Overduin
Ensembl User Support
EMBL Outstation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, UK

• Alvaro Martinez Barrio
Linnaeus Centre for Bioinformatics, Uppsala University, Sweden

3 of 42

What is Ensembl

• A software system which produces and maintains automatic annotation on selected eukaryotic genomes.

• Performs automatic analysis of new genome data
• Maintains analysis and annotation on the current data
• Presents the analysis to all via the web
• Ensembl concentrates on vertebrate genomes, but other groups have adapted the system for use with plant and fungal genomes

• "Powered by Ensembl" shows a list of projects that use Ensembl technology

4 of 42

Ensembl - Organisation

• Joint project between the European Bioinformatics Institute (EMBL-EBI) and the Wellcome Trust Sanger Institute

• Started in 1999 for the Human Genome Project
• Funded primarily by the Wellcome Trust, with additional funding from EMBL, EU, NIH-NIAID, BBSRC and MRC

• Team of ca. 40 people, led by Ewan Birney (EBI) and Tim Hubbard (Sanger)

• Uses the largest dedicated computer system in biology in Europe

5 of 42

A Bit of History

Sequenced genomes

• 1995 Haemophilus influenzae 1.8 Mb
• 1996 Yeast 12 Mb
• 1998 C. elegans 100 Mb
• 1999 Fruit fly 125 Mb
• 2000 Arabidopsis 115 Mb
• 2001 Human (draft)
• 2002 Mouse 2.6 Gb
• 2004 Human ("finished") 3 Gb

6 of 42

Sequencing genomes

DNA sequencing is a method for determining the order of the nucleotide bases (A, T, C, G).

7 of 42

Ensembl genomes (Ensembl release 49 - March 2008)

8 of 42

Species in Ensembl

[Figure: phylogenetic tree of vertebrate groups on a geological timescale. Periods: Cambrian, Ordovician, Silurian, Devonian, Carboniferous, Permian, Triassic, Jurassic, Cretaceous, Tertiary. Axis ticks (MY BP): 570, 505, 438, 408, 360, 286, 245, 208, 144, 65. Groups: fishes, birds, reptiles, mammals (placentals, monotremes, marsupials), passerines, paleognaths, other birds, crocodiles, turtles, lizards, amphibians, teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes, agnathans, non-vertebrates.]

9 of 42

Ensembl - Goals

• Provide automatic annotation of genomic sequence

• Integrate other biological data

• Make data available to all via the web

10 of 42

Annotation

Wikipedia: Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:

1. identifying elements on the genome, a process called Gene Finding:
- ORFs and their localisation
- gene structure
- coding regions
- location of regulatory motifs

2. attaching biological information to these elements:
- biochemical function
- biological function
- involved regulation and interactions
- expression
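To make the first step concrete, here is a minimal sketch of locating ORFs (open reading frames) in a DNA string. It is a toy under stated assumptions: forward strand only, ATG as the only start codon, and the three standard stop codons. The function name `find_orfs` and the example sequence are made up for illustration; real gene finding, including the Ensembl genebuild, relies on experimental evidence and far more elaborate models.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each forward-strand ORF.

    start is the 0-based index of the ATG, end is exclusive and
    includes the stop codon. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Hypothetical example sequence containing two short ORFs in frame 2.
example = "CCATGGCCTGATTTATGAAATAG"
for start, end, frame in find_orfs(example):
    print(frame, start, end, example[start:end])
```

Scanning each of the three frames separately keeps the logic simple; handling the reverse strand would mean running the same scan on the reverse complement.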

11 of 42

The big Genome Browsers

• Ensembl Genome browser
http://www.ensembl.org

• NCBI Map Viewer
http://www.ncbi.nlm.nih.gov/mapview/

• UCSC Genome Browser
http://genome.ucsc.edu

12 of 42

Ensembl / NCBI Map Viewer / UCSC

• All provide access to multiple organisms

• All are based on the same data

• Annotations are different

• Assembly versions may differ

• Some organisms are available in only one of the browsers

13 of 42

NCBI Map Viewer - Opening page

14 of 42

NCBI Map Viewer - Result page

15 of 42

UCSC Genome Browser - Opening page

16 of 42

UCSC Genome Browser - Search page

17 of 42

UCSC Genome Browser - Default view

18 of 42

UCSC Genome Browser - Options

19 of 42

UCSC Genome Browser - BLAT search

20 of 42

Ensembl Genome Browser - Opening page

21 of 42

Ensembl Genome Browser - Search view

Choose human gene

22 of 42

Ensembl Genome Browser - Gene view

23 of 42

Ensembl Genome Browser - BLAST

24 of 42

What Distinguishes Ensembl from the UCSC and NCBI Browsers?

• Automatic annotation for those species for which no manually curated gene set exists

• Direct database access and programmatic access via the Perl API

• Not only the data, but also the software source code is open source

25 of 42

Which Data Are Available?

• Genomic sequence
• Transcript and peptide models
• External references
• Variation data: SNPs
• Mapped cDNAs, peptides, micro array probes, BAC clones etc.
• Other features of the genome: cytogenetic bands, markers, repeats etc.
• Comparative data: orthologues and paralogues, protein families, whole genome alignments, syntenic regions
• Regulatory data: "best guess" set of regulatory elements
• Data from external sources (DAS)

26 of 42

Genomic sequence

Gene location

27 of 42

Genomic sequence

Export

28 of 42

Transcript and peptide info

Click to view

29 of 42

External references

Click to view

30 of 42

Single nucleotide polymorphisms (SNPs)

• Two human genomes differ by ~0.1%

• Polymorphism: a DNA variation in which each possible sequence is present in at least 1% of people

• Most polymorphisms (~90%) take the form of SNPs: variations that involve just one nucleotide
• ~1 out of every 300 bases in the human genome
• ~10 million in the human genome
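The figures on this slide can be checked against each other with a little arithmetic. The sketch below assumes a genome size of about 3 Gb, the value given for the finished human genome earlier in the deck; the variable names are made up for illustration.

```python
# Assumed genome size: ~3 Gb (from the "A Bit of History" slide).
GENOME_SIZE_BP = 3_000_000_000

# "~1 out of every 300 bases" is a known SNP position.
SNP_SPACING_BP = 300
expected_snps = GENOME_SIZE_BP // SNP_SPACING_BP

# "Two human genomes differ by ~0.1%", i.e. about 1 base in 1000.
DIFFERENCE_ONE_IN = 1000
pairwise_differences = GENOME_SIZE_BP // DIFFERENCE_ONE_IN

print(f"Known SNP positions in the population: ~{expected_snps:,}")
print(f"Differences between two individual genomes: ~{pairwise_differences:,}")
```

The first number reproduces the ~10 million on the slide. The second is smaller because any two individuals differ at only a fraction of the positions that are polymorphic in the whole population.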

31 of 42

Practical Applications

• Disease diagnosis

• Association studies

• Forensic testing

• Population genetics and evolutionary studies

• Marker-assisted selection

32 of 42

SNPs in Ensembl - Types

Non-synonymous          In coding sequence, resulting in an amino acid change
Synonymous              In coding sequence, not resulting in an amino acid change
Frameshift              In coding sequence, resulting in a frameshift
Stop lost               In coding sequence, resulting in the loss of a stop codon
Stop gained             In coding sequence, resulting in the gain of a stop codon

Essential splice site   In the first 2 or the last 2 base pairs of an intron
Splice site             1-3 bp into an exon or 3-8 bp into an intron

Upstream                Within 5 kb upstream of the 5'-end of a transcript
Regulatory region       In regulatory region annotated by Ensembl
5' UTR                  In 5' UTR
Intronic                In intron
3' UTR                  In 3' UTR
Downstream              Within 5 kb downstream of the 3'-end of a transcript
Intergenic              More than 5 kb away from a transcript
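Two parts of the table above can be sketched in a few lines: the coding-sequence types (compare the reference and variant codon) and the flanking types (compare the SNP position with the transcript ends). This is an illustrative sketch, not Ensembl's implementation. The function names, the coordinate convention (transcript start <= end, strand given as +1 or -1) and the "Within transcript" label are assumptions made here. The standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

FLANK_BP = 5000  # the 5 kb window from the table


def coding_consequence(ref_codon, alt_codon):
    """Classify a single-codon substitution in coding sequence."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "Synonymous"
    if alt_aa == "*":
        return "Stop gained"
    if ref_aa == "*":
        return "Stop lost"
    return "Non-synonymous"


def flank_consequence(pos, tx_start, tx_end, strand):
    """Classify a position relative to one transcript (tx_start <= tx_end)."""
    if tx_start <= pos <= tx_end:
        return "Within transcript"  # exon/intron/UTR detail not modelled here
    if pos < tx_start:
        distance = tx_start - pos
        on_5prime_side = strand == 1
    else:
        distance = pos - tx_end
        on_5prime_side = strand == -1
    if distance > FLANK_BP:
        return "Intergenic"
    return "Upstream" if on_5prime_side else "Downstream"


print(coding_consequence("GAA", "GAG"))        # Glu -> Glu
print(coding_consequence("TGG", "TGA"))        # Trp -> stop
print(flank_consequence(900, 1000, 2000, -1))  # left of a reverse-strand transcript
```

Note that "upstream" depends on the strand: a position to the left of a reverse-strand transcript lies beyond its 3'-end and is therefore downstream.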

33 of 42

SNPs in Ensembl

ContigView: SNPs in genomic context

34 of 42

SNPs in Ensembl

35 of 42

Biological Evidence

All Ensembl gene predictions are based on experimental evidence:

• UniProt/Swiss-Prot
A manually curated database and therefore of highest accuracy

• NCBI RefSeq
A partially manually curated database

• UniProt/TrEMBL
Automatically annotated translations of EMBL coding sequence (CDS) features

• EMBL / GenBank / DDBJ
Primary nucleotide sequence repository

36 of 42

The Ensembl Genebuild

[Diagram: Genome assembly + Experimental evidence + Computer programs → Ensembl Genes]

37 of 42

Ensembl Identifiers

• ENSG### Ensembl Gene ID
• ENST### Ensembl Transcript ID
• ENSP### Ensembl Peptide ID
• ENSE### Ensembl Exon ID
• ENSF### Ensembl Family ID
• ENSR### Ensembl Regulatory Feature ID

• For species other than human, a species code is inserted after the ENS prefix: MUS for mouse (Mus musculus): ENSMUSG###, DAR for zebrafish (Danio rerio): ENSDARG###, etc.

• For imported genes Ensembl uses the original identifiers
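The naming scheme above is regular enough to parse with a short pattern. The sketch below assumes a three-letter species code, as in the MUS and DAR examples on this slide, and treats an absent code as human. The function name `parse_ensembl_id` and the identifier numbers in the usage lines are made up for illustration.

```python
import re

# Feature letters as listed on the slide.
FEATURE_TYPES = {
    "G": "Gene",
    "T": "Transcript",
    "P": "Peptide",
    "E": "Exon",
    "F": "Family",
    "R": "Regulatory Feature",
}

# ENS + optional three-letter species code + feature letter + digits.
# The three-letter length is an assumption based on the slide's examples.
ID_PATTERN = re.compile(r"^ENS([A-Z]{3})?([GTPEFR])(\d+)$")


def parse_ensembl_id(identifier):
    """Split an Ensembl-style ID into its parts, or return None."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        return None  # e.g. an imported gene keeping its original identifier
    species_code, feature_letter, number = match.groups()
    return {
        "species_code": species_code,  # None means human
        "feature": FEATURE_TYPES[feature_letter],
        "number": number,
    }


# Illustrative identifiers with made-up numbers.
print(parse_ensembl_id("ENSG00000000001"))
print(parse_ensembl_id("ENSMUSG00000000001"))
print(parse_ensembl_id("not-an-ensembl-id"))
```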

38 of 42

Pre! and Archive! Sites

39 of 42

Powered by Ensembl

40 of 42

Ensembl - Open Source

• Data and software freely available

• More than 50 installs worldwide

• Academia and industry

• Local or available via the web
• Mirrors with Ensembl data, e.g. http://ensembl.genome.tugraz.at/index.html, or user projects with own data

41 of 42

Ensembl Accounts

• Personalise Ensembl by saving bookmarks, view configurations and homepage preferences in a user account

• Share bookmarks and configurations by setting up groups

Please note that all Ensembl data remains freely accessible. It is not necessary to register in order to gain access to Ensembl data!

42 of 42

Website Statistics

On average 1,000,000 page impressions / week

Top 3 species:

Top 3 countries:

43 of 42

What If I Need Help?

• Helpdesk:

[email protected]

• Mailing lists:

[email protected] [email protected]

• Animated tutorials

http://www.ensembl.org/common/Workshops_Online

44 of 42

Today

Ensembl: www.ensembl.org

1. WORKED EXAMPLE: A walk through the main pages of the Ensembl browser, using the EPO (Erythropoietin precursor) gene as an example (Course Homepage).

2. Ensembl Exercise: Answering questions by using Ensembl (Course Homepage).

3. If time, find information about your favorite gene by using Ensembl.