Ensembl. Going beyond A,T, G and C Ewan Birney

Post on 18-Jan-2016

Page 1: Ensembl. Going beyond A,T, G and C Ewan Birney. There is more to life than proteins (but not much) Ensembl ENCODE Reactome

Ensembl. Going beyond A,T, G and C

Ewan Birney

There is more to life than proteins (but not much)

Ensembl

ENCODE

Reactome

Human/Mouse

Reconcile with Genome

Project orthologous proteins onto genome

Human

Mouse/Other Mammals

Increase in quality

[Figure: stacked bar chart, vertical axis 0% to 100%. Bars: Human-UniSw-33, Human-UniSw-34, Human-UniSw-35, Human-RefSeq-33, Human-RefSeq-34, Human-RefSeq-35, Mouse-UniSw-30, Mouse-UniSw-32, Mouse-UniSw-33, Mouse-UniSw-34, Mouse-RefSeq-30, Mouse-RefSeq-32, Mouse-RefSeq-33, Mouse-RefSeq-34. Legend: Missing, Matching, Edge perfect, Identical.]

Chicken

Reconcile with Genome

Project orthologous proteins onto genome

Chicken

Human

Mouse

Chicken

• Extant dinosaur lineage
• Split from mammals 300 Mya
• Neutral rate of 1.5 substitutions per base
• No pseudogenes
• Good synteny to human
• Tested Ensembl Gene Build:
– 90% perfect exon boundary prediction
– 4% within 10 base pairs
– 85% sensitivity
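
The exon-level figures on this slide (perfect boundary, within 10 base pairs) imply a comparison of predicted exons against a reference set. A minimal sketch of such a comparison follows, assuming exons are given as (start, end) coordinate pairs on a single strand. The function name `classify_exons` and the toy coordinates are hypothetical and are not taken from the Ensembl gene build.

```python
def classify_exons(reference, predicted, tolerance=10):
    """Count reference exons by how well the best prediction matches.

    Each exon is a (start, end) tuple. A reference exon is "perfect" if a
    predicted exon has identical boundaries, "near" if both boundaries of
    some predicted exon lie within `tolerance` bases, and "missed" otherwise.
    """
    counts = {"perfect": 0, "near": 0, "missed": 0}
    predicted_set = set(predicted)
    for start, end in reference:
        if (start, end) in predicted_set:
            counts["perfect"] += 1
        elif any(
            abs(start - p_start) <= tolerance and abs(end - p_end) <= tolerance
            for p_start, p_end in predicted
        ):
            counts["near"] += 1
        else:
            counts["missed"] += 1
    return counts


# Toy data, invented for illustration only.
reference = [(100, 200), (300, 400), (500, 600), (700, 800)]
predicted = [(100, 200), (303, 400), (500, 650)]
counts = classify_exons(reference, predicted)
```

Dividing each count by the number of reference exons gives percentages of the kind quoted on the slide.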

Stickleback

• “close” to Fugu/Tetraodon
• 21,135 Genes
• 97% Gene Loci sensitivity (held out cDNAs)
• 87% exact exon prediction, 6% overlapping
• 63% of cDNAs had a perfect prediction without cDNA evidence

[Figure: species tree of the genomes in Ensembl, with divergence times marked on a "Million years" axis (ticks at 100, 200, 300, 400, 500, 1000). The per-node divergence times cannot be matched to branches in this transcript.]

Species shown: Human, Mouse, Rat, Fugu (IMCB), Tetraodon (GENOSCOPE), Zebrafish, C. savignyi *, Fruitfly (FLYBASE), Malaria mosquito (VECTORBASE), C. elegans (WORMBASE), Medaka, Rhesus macaque, Chimpanzee, Dog, Cow, Chicken, Xenopus, C. intestinalis, Fever mosquito * (VECTORBASE), Honey bee, Yeast (SGD), Opossum, Stickleback, Armadillo *, Elephant *, Tenrec *, Rabbit *

Clade labels: Chordata, Vertebrata, Amniota, Tetrapoda, Mammalia, Eutheria, Teleostei, Urochordata, Arthropoda, Nematoda, Fungi, Aves, Amphibia, Metatheria

19 species currently in Ensembl

8 to be added by the end of the year

* already in pre-site

Example of the Insulin cluster and data flattening

[Figure: gene tree of the Insulin cluster. Legend: Duplication node; Speciation node or leaf. Pairwise relations labelled on the tree: one2one, one2many, many2many, apparent one2one.]
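
The "data flattening" named on this slide, turning a gene tree with duplication and speciation nodes into pairwise relation types, can be sketched roughly as below. This is an illustration under stated assumptions, not the Ensembl implementation: a pair of genes is treated as orthologous when their last common ancestor is a speciation node, the one2one/one2many/many2many label comes from counting partners, and "apparent one2one" is assumed to mean a pair joined at a duplication node where each species has only one surviving gene beneath it. The `Node` class, the `flatten` function and the toy tree are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    kind: str                      # "leaf", "speciation" or "duplication"
    name: str = ""                 # gene name (leaves only, must be unique)
    species: str = ""              # species name (leaves only)
    children: list = field(default_factory=list)


def leaves(node):
    """Return all leaf nodes below (or equal to) `node`."""
    if node.kind == "leaf":
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def lca(node, a, b):
    """Return the deepest node whose subtree contains both leaves a and b."""
    for child in node.children:
        names = {leaf.name for leaf in leaves(child)}
        if a.name in names and b.name in names:
            return lca(child, a, b)
    return node


def flatten(root, species_a, species_b):
    """Map each (gene_a, gene_b) pair to a relation type (assumed rules)."""
    genes_a = [leaf for leaf in leaves(root) if leaf.species == species_a]
    genes_b = [leaf for leaf in leaves(root) if leaf.species == species_b]
    orthologs, relations = [], {}
    for a, b in product(genes_a, genes_b):
        ancestor = lca(root, a, b)
        if ancestor.kind == "speciation":
            orthologs.append((a.name, b.name))
        else:
            below = leaves(ancestor)
            single_a = sum(leaf.species == species_a for leaf in below) == 1
            single_b = sum(leaf.species == species_b for leaf in below) == 1
            if single_a and single_b:
                relations[(a.name, b.name)] = "apparent one2one"
    for a, b in orthologs:
        partners_of_a = sum(1 for x, _ in orthologs if x == a)
        partners_of_b = sum(1 for _, y in orthologs if y == b)
        if partners_of_a == 1 and partners_of_b == 1:
            relations[(a, b)] = "one2one"
        elif partners_of_a > 1 and partners_of_b > 1:
            relations[(a, b)] = "many2many"
        else:
            relations[(a, b)] = "one2many"
    return relations


# Toy tree: one human gene, and a duplication on the mouse lineage.
toy = Node("speciation", children=[
    Node("leaf", "INS", "human"),
    Node("duplication", children=[
        Node("leaf", "Ins1", "mouse"),
        Node("leaf", "Ins2", "mouse"),
    ]),
])
relations = flatten(toy, "human", "mouse")
```

In the toy tree the single human gene pairs with both mouse copies, so both pairs come out as one2many.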

Gene tree : 1st data assessment

Good concordance with the classical BRH/RHS paired species approach (RHS are based on gene order conservation)

Find more complex one-to-many and many-to-many relations

To do : compare with ~1000 curated trees from TreeFam

Human/Mouse (19,001)

                     RHS     BRH     NEW
many2many            177     113   1,439
one2many             725   1,309   2,815
one2one              205  10,736     109
apparent one2one      78   1,571     104
lost               2,027   2,060

Human/Drosophila (5,580)

                     BRH     NEW
many2many            170   1,599
one2many           1,870   4,563
one2one              880      80
apparent one2one   2,040     241
lost                 620

[Two rotated labels on the slide, 11,443 and 19,381, cannot be placed in the tables from this transcript.]
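
For readers unfamiliar with the BRH (best reciprocal hit) approach that the gene trees are compared against on this slide, a minimal sketch follows. It assumes similarity scores are already available as dictionaries keyed by (query, target) gene pairs. The function names, gene names and scores are hypothetical, and the sketch ignores ties and the gene-order information used by RHS.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def best_reciprocal_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented scores, for illustration only.
human_vs_mouse = {("hA", "mA"): 90, ("hA", "mB"): 40, ("hB", "mB"): 70, ("hC", "mB"): 80}
mouse_vs_human = {("mA", "hA"): 88, ("mB", "hC"): 79, ("mB", "hB"): 65}
pairs = best_reciprocal_hits(human_vs_mouse, mouse_vs_human)
```

Because each gene keeps only its single best partner, this approach can only report one-to-one pairs. In the toy data hB has no reciprocal partner, which is the kind of relation a gene tree can recover as one-to-many or many-to-many.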

Example of AlignSliceView between Human/Mouse/Rat/Dog with MLAGAN

Transcript SNP View

Ensembl Outreach

How do you get it?

• www.ensembl.org
– Pretty pictures for genomes and genes
– Web based data mining
• Open MySQL server - ensembldb
– Script across the internet in Perl, Java or Python
– 100% consistent semantics between genomes
• Extend via DAS
– At genome, protein or “gene” levels
• Full download
– Extend in house, run in-house DAS servers
• Send someone to us (geek for a week)
• Bring over Xose to run a course (only travel costs need to be covered)
• Email [email protected] for more info.
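
As an illustration of scripting against the open MySQL server from Python, the sketch below lists the core databases for one species. Several details are assumptions not stated on the slide: the host name `ensembldb.ensembl.org`, the password-less `anonymous` account, port 3306, the `<species>_core_...` database naming pattern, and the use of the third-party `pymysql` package. Check the current Ensembl documentation before relying on any of them.

```python
# Connection settings: all three values are assumptions, not from the slide.
ENSEMBL_DB = {
    "host": "ensembldb.ensembl.org",
    "user": "anonymous",
    "port": 3306,
}


def core_database_query(species="homo_sapiens"):
    """Build a parameterised query listing core databases for one species.

    Assumes databases are named like '<species>_core_<release>_<assembly>'.
    """
    return "SHOW DATABASES LIKE %s", (f"{species}_core_%",)


def list_core_databases(species="homo_sapiens"):
    """Connect to the public server and return matching database names."""
    import pymysql  # third-party driver, imported lazily so the module loads without it

    sql, params = core_database_query(species)
    connection = pymysql.connect(**ENSEMBL_DB)
    try:
        with connection.cursor() as cursor:
            cursor.execute(sql, params)
            return [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()
```

Calling `list_core_databases("mus_musculus")` needs network access and an installed `pymysql`; the query builder can be used on its own with any other MySQL client.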

The ENCODE project

1% of the human genome

The Kitchen Sink of experimental methods

Protein coding loci are far more complex than we think

• On average 5 transcripts per locus

• Many do not encode proteins (as far as we can see)

• Even among the ones which do encode proteins, many of the proteins look “weird”

The Clade B Serpins

Potential Missing fragments

[Figure panels: (a) inactive, "stressed"; (b) active (beta inserted); (c), (d), (e), (f) unlabelled]

Parsing the regulatory code

PolII

Myc

E2F2

H3K4Me3

Chromatin marks, Polymerase

In vivo Transcription Factors

[Diagram: functional genomics data flow. Box and arrow labels: Nimblegen Data; Tab2MAGE; MAGE-ML; Import API; FuncGen DB (Archive?); Analysis Pipeline; Processed Data; FuncGen DB (& Results?); FuncGen Results DB?; Export API; Web API?; DAS; Mirror; Client; Local API; Browsers: Wiggle Plot, Histone Glyphs, Raw Data?]

Reactome

Pathways…

Insulin binds the insulin receptor, causing it to dimerise. The dimerised form then autophosphorylates on 6 cytoplasmic tyrosines. This phosphorylated form recruits the IRS adaptor....

Reactome data model

[Diagram: Insulin Receptor Signalling pathway drawn as entities linked by reactions. Labels: Insulin Peptide; Insulin Receptor; x2; Reaction; Insulin Receptor dimer complex; Reaction; Insulin Receptor dimer complex, P-Tyr on 67...; CatalystActivity; GO:phosphorylation; PubMed:1543; PubMed:5623]
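
The data model on this slide, entities connected by reactions that are grouped into a pathway, can be sketched with a few Python dataclasses. This is an illustration only, not the Reactome schema: the class names, field names and the `is_chained` helper are hypothetical, the "x2" label is assumed to refer to the receptor, and the slide's PubMed identifiers are left unattached because the transcript does not show which reaction each one supports.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass
class Reaction:
    name: str
    inputs: dict                      # Entity -> stoichiometry
    outputs: dict                     # Entity -> stoichiometry
    catalyst_activity: str = ""       # e.g. a GO term
    references: list = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    reactions: list

    def is_chained(self):
        """True if every reaction consumes an output of the one before it."""
        return all(
            set(first.outputs) & set(second.inputs)
            for first, second in zip(self.reactions, self.reactions[1:])
        )


insulin = Entity("Insulin Peptide")
receptor = Entity("Insulin Receptor")
dimer = Entity("Insulin Receptor dimer complex")
phospho_dimer = Entity("Insulin Receptor dimer complex, P-Tyr")

binding = Reaction(
    name="insulin binding and receptor dimerisation",
    inputs={insulin: 1, receptor: 2},   # stoichiometry is an assumption
    outputs={dimer: 1},
)
autophosphorylation = Reaction(
    name="autophosphorylation",
    inputs={dimer: 1},
    outputs={phospho_dimer: 1},
    catalyst_activity="GO:phosphorylation",
)
pathway = Pathway("Insulin Receptor Signalling", [binding, autophosphorylation])
```

Keeping each reaction's inputs and outputs as explicit entities is what lets one step be linked to the next, as the chain of boxes on the slide suggests.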

Lineage Deletion rates

Trp Catabolism

Head or Tail

DNA Repair

Redundant Paths

Insulin Signalling

Pathway modules

Back to Proteins

Proteins are the natural Hub

Variation

Pathways

Regulation

Structures

Literature

Genome

Proteins

Thanks: Ensembl

Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)

Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Jan-Hinnerk Vogel, Kevin Howe, Felix Kokocinski, Simon White

Database Schema and Core API: Glenn Proctor, Ian Longden, Craig Melsopp, Patrick Meidl

BioMart: Arek Kasprzyk, Syed Haider, Richard Holland, Damian Smedley

Distributed Annotation System (DAS): Andreas Kähäri, Eugene Kulesha

Outreach: Xosé M Fernández, Bert Overduin, Michael Schuster, Giulietta Spudlich

Web Team: James Smith, Fiona Cunningham, Anne Parker, Stephen Rice, Steve Trevanion, Matt Wood

Comparative Genomics: Abel Ureta-Vidal, Benoit Ballester, Kathryn Beal, Stephen Fitzgerald, Javier Herrero, Albert Vilella

Functional Genomics + Variation: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios

Zebrafish Annotation: Kerstin Jekosch, Mario Caccamo

Systems & Support: Guy Coates, Tim Cutts

Thanks: Reactome and Consortia

Reactome: EBI and CSHL

Reactome @ EBI: Ewan Birney, Imre Vastrik, Esther Schmidt, Bernard de Bono, Bijay Jassal

Reactome @ CSHL: Lincoln Stein, Peter D’Eustachio, Gopal Gopinathrao, Guanming Wu, Lisa Matthews, Marc Gillespie

ENCODE: 40 groups worldwide

Leaders: Zhiping Weng (BU), Mike Snyder (Yale), John Stam. (U. Wash), Roderic Guigo (Barcelona), Tom Gingeras (Affy), Elliott Margulies (NIH), Anindya Dutta (Duke), Manolis Dermitzakis (Sanger)

BioSapiens: 20 groups across Europe

Structural work of ENCODE: Alfonso Valencia (Madrid), Michael Tress (Madrid), Janet Thornton (EBI), Gabby Logan (EBI)