TRANSCRIPT
Ensembl: Going beyond A, T, G and C
Ewan Birney
There is more to life than proteins (but not much)
Ensembl
ENCODE
Reactome
Human/Mouse
Reconcile with genome
Project orthologous proteins onto genome
Human
Mouse / other mammals
Increase in quality
[Figure: chart on a 0% to 100% scale. Series: Human-UniSw-33, -34, -35; Human-RefSeq-33, -34, -35; Mouse-UniSw-30, -32, -33, -34; Mouse-RefSeq-30, -32, -33, -34. Categories: Missing, Matching, Edge perfect, Identical.]
Chicken
Reconcile with genome
Project orthologous proteins onto genome
Chicken
Human
Mouse
Chicken
• Extant dinosaur lineage
• Split from mammals 300 Mya
• Neutral rate of 1.5 substitutions per base
• No pseudogenes
• Good synteny to human
• Tested Ensembl Gene Build:
– 90% perfect exon boundary prediction
– 4% within 10 base pairs
– 85% sensitivity
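Exon-level figures such as "perfect boundary" and "within 10 base pairs" come from comparing predicted exon coordinates with a reference set. The sketch below shows one way such a tally could be made. It is not the Ensembl evaluation code: the function name, the (start, end) tuple representation, the category names and the rule of scoring each reference exon against its closest overlapping prediction are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive, same sequence and strand assumed


def classify_exons(reference: List[Exon], predicted: List[Exon],
                   tolerance: int = 10) -> Dict[str, int]:
    """Count reference exons by how well a predicted exon reproduces them."""
    counts = {"perfect": 0, "within_tolerance": 0, "overlapping": 0, "missing": 0}
    for r_start, r_end in reference:
        # Predicted exons that overlap this reference exon at all.
        overlaps = [(p_start, p_end) for p_start, p_end in predicted
                    if p_start <= r_end and p_end >= r_start]
        if not overlaps:
            counts["missing"] += 1
            continue
        # For the best prediction, the larger of its two boundary offsets.
        best = min(max(abs(p_start - r_start), abs(p_end - r_end))
                   for p_start, p_end in overlaps)
        if best == 0:
            counts["perfect"] += 1
        elif best <= tolerance:
            counts["within_tolerance"] += 1
        else:
            counts["overlapping"] += 1
    return counts


# Hypothetical coordinates, for illustration only.
reference_exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
predicted_exons = [(100, 200), (305, 400), (450, 600)]
result = classify_exons(reference_exons, predicted_exons)
```

Dividing each count by the number of reference exons gives percentages of the kind quoted on the slide.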
Stickleback
• “Close” to Fugu/Tetraodon
• 21,135 genes
• 97% gene loci sensitivity (held-out cDNAs)
• 87% exact exon prediction, 6% overlapping
• 63% of cDNAs had a perfect prediction without cDNA evidence
[Figure: phylogenetic tree of the species in Ensembl, drawn on a time axis in million years (ticks at 100, 200, 300, 400, 500 and 1000), with numeric node labels, several marked "?".]
Species: Human; Mouse; Rat; Rhesus macaque; Chimpanzee; Dog; Cow; Rabbit *; Armadillo *; Elephant *; Tenrec *; Opossum; Chicken; Xenopus; Zebrafish; Medaka; Stickleback; Fugu (IMCB); Tetraodon (GENOSCOPE); C. intestinalis; C. savignyi *; Fruitfly (FLYBASE); Malaria mosquito (VECTORBASE); Fever mosquito * (VECTORBASE); Honey bee; C. elegans (WORMBASE); Yeast (SGD)
Clades: Chordata, Vertebrata, Tetrapoda, Amniota, Mammalia, Eutheria, Metatheria, Aves, Amphibia, Teleostei, Urochordata, Arthropoda, Nematoda, Fungi
19 species currently in Ensembl; 8 to be added by the end of the year (* already in pre-site)
Example of the insulin cluster and data flattening
Legend: duplication node; speciation node or leaf
Relationship types shown: one2one, one2many, many2many, apparent one2one
Gene tree: 1st data assessment
Good concordance with the classical BRH/RHS paired species approach (RHS are based on gene order conservation)
Find more complex one-to-many and many-to-many relations
To do: compare with ~1000 curated trees from TreeFam

Human/Mouse
                      RHS      BRH      NEW
many2many             177      113    1,439
one2many              725    1,309    2,815
one2one               205   10,736      109
apparent one2one       78    1,571      104
lost                2,027    2,060
19,001 (sum of the RHS and BRH columns, including lost)
19,381 (sum of the RHS, BRH and NEW columns, excluding lost)

Human/Drosophila
                      BRH      NEW
many2many             170    1,599
one2many            1,870    4,563
one2one               880       80
apparent one2one    2,040      241
lost                  620
5,580 (sum of the BRH column, including lost)
11,443 (sum of the BRH and NEW columns, excluding lost)
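The relationship types counted on this slide reflect how many genes each species contributes to an orthologous group. The sketch below names a pair from partner counts alone. It is an illustration, not the Ensembl Compara method: the function name and input format are assumptions, each group is assumed to be fully connected, and because the gene tree is not inspected it cannot detect "apparent one2one" cases, which need duplication and loss information.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (gene in species A, gene in species B)


def orthology_types(pairs: Iterable[Pair]) -> Dict[Pair, str]:
    """Label each orthologue pair as one2one, one2many or many2many."""
    pairs = list(pairs)
    partners_of_a = defaultdict(set)  # species-A gene -> its species-B partners
    partners_of_b = defaultdict(set)  # species-B gene -> its species-A partners
    for gene_a, gene_b in pairs:
        partners_of_a[gene_a].add(gene_b)
        partners_of_b[gene_b].add(gene_a)

    labels = {}
    for gene_a, gene_b in pairs:
        n_b = len(partners_of_a[gene_a])  # B-side genes linked to gene_a
        n_a = len(partners_of_b[gene_b])  # A-side genes linked to gene_b
        if n_a == 1 and n_b == 1:
            labels[(gene_a, gene_b)] = "one2one"
        elif n_a == 1 or n_b == 1:
            labels[(gene_a, gene_b)] = "one2many"
        else:
            labels[(gene_a, gene_b)] = "many2many"
    return labels


# Human INS against mouse Ins1 and Ins2, plus hypothetical identifiers.
example = orthology_types([
    ("INS", "Ins1"), ("INS", "Ins2"),
    ("geneX", "geneY"),
    ("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"),
])
```

Counting the labels over all pairs for a species pair would give a table with the same row names as above, minus the "apparent one2one" and "lost" rows.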
Example of AlignSliceView between Human/Mouse/Rat/Dog with MLAGAN
Transcript SNP View
Ensembl Outreach
How do you get it?
• www.ensembl.org
– Pretty pictures for genomes and genes
– Web-based data mining
• Open MySQL server - ensembldb
– Script across the internet in Perl, Java or Python
– 100% consistent semantics between genomes
• Extend via DAS
– At genome, protein or “gene” levels
• Full download
– Extend in-house, run in-house DAS servers
• Send someone to us (geek for a week)
• Bring over Xose to run a course (only travel costs need to be covered)
• Email [email protected] for more info.
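As an illustration of scripting against the open MySQL server, the sketch below lists the core databases available for one species. It rests on assumptions not stated on the slide: the host name ensembldb.ensembl.org, the anonymous user, port 3306, the `<species>_core_...` database naming, and the third-party PyMySQL package. The helper names are made up for this example.

```python
def core_db_pattern(species: str) -> str:
    """Build a SQL LIKE pattern for a species' core databases."""
    # "Homo sapiens" -> "homo_sapiens_core_%"
    return species.strip().lower().replace(" ", "_") + "_core_%"


def list_core_databases(species: str,
                        host: str = "ensembldb.ensembl.org",  # assumed public host
                        user: str = "anonymous",              # assumed read-only user
                        port: int = 3306):                    # assumed port
    """Return the names of matching databases on the server (needs network)."""
    import pymysql  # third-party driver: pip install pymysql

    connection = pymysql.connect(host=host, user=user, port=port)
    try:
        with connection.cursor() as cursor:
            cursor.execute("SHOW DATABASES LIKE %s", (core_db_pattern(species),))
            return [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()


if __name__ == "__main__":
    for name in list_core_databases("Homo sapiens"):
        print(name)
```

Because every species follows the same naming and schema conventions, the same call with a different species name should work unchanged, which is the point of the "consistent semantics between genomes" bullet.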
The ENCODE project
1% of the human genome
The Kitchen Sink of experimental methods
Protein coding loci are far more complex than we think
• On average 5 transcripts per locus
• Many do not encode proteins (as far as we can see)
• Even among those that do encode proteins, many of the proteins look “weird”
The Clade B Serpins: potential missing fragments
[Figure: panels (a) to (f). (a) inactive, "stressed"; (b) active (beta inserted); panels (c) to (f) carry no caption in this transcript.]
Parsing the regulatory code
PolII
Myc
E2F2
H3K4Me3
Chromatin marks, Polymerase
In vivo Transcription Factors
[Diagram: functional genomics data flow. Labels: Nimblegen data; Client; Import API; Export API; FuncGen DB (Archive?); Mirror; Tab2MAGE; MAGE-ML; Analysis pipeline; Processed data; FuncGen DB (& Results?); FuncGen Results DB?; Web API?; Local API; DAS. Browsers: wiggle plot, histone glyphs, raw data?]
Reactome
Pathways…
Insulin binds the insulin receptor, causing it to dimerise. The dimerised form then autophosphorylates on 6 cytoplasmic tyrosines. This phosphorylated form recruits the IRS adaptor....
Reactome data model
[Diagram labels: Insulin Peptide; Insulin Receptor; x2; Reaction; Insulin Receptor dimer complex; Reaction; GO:phosphorylation; CatalystActivity; Insulin Receptor dimer complex, P-Tyr on 67...; PubMed:1543; PubMed:5623; Insulin Receptor Signalling]
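The diagram describes pathways as reactions that turn input entities into output entities, optionally with a catalyst activity annotated by a GO term and with literature references attached. The sketch below restates that idea with plain data classes. It is not the Reactome schema: the class and field names are illustrative assumptions, as are the stoichiometry of two insulins and two receptors, the assignment of each PubMed label to a particular reaction, and the simplified name of the phosphorylated complex.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass
class CatalystActivity:
    catalyst: Entity
    go_term: str  # label as shown on the slide, not a GO accession


@dataclass
class Reaction:
    name: str
    inputs: List[Entity]
    outputs: List[Entity]
    catalyst_activity: Optional[CatalystActivity] = None
    literature: List[str] = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    reactions: List[Reaction] = field(default_factory=list)


insulin = Entity("Insulin Peptide")
receptor = Entity("Insulin Receptor")
dimer = Entity("Insulin Receptor dimer complex")
dimer_p = Entity("Insulin Receptor dimer complex (phosphorylated)")

# Binding step: stoichiometry is modelled by repeating entities (assumed x2 each).
binding = Reaction("Insulin binding and receptor dimerisation",
                   inputs=[insulin, insulin, receptor, receptor],
                   outputs=[dimer],
                   literature=["PubMed:1543"])  # assignment is illustrative

# Autophosphorylation: the dimer is both substrate and catalyst.
autophosphorylation = Reaction("Receptor autophosphorylation",
                               inputs=[dimer],
                               outputs=[dimer_p],
                               catalyst_activity=CatalystActivity(dimer, "GO:phosphorylation"),
                               literature=["PubMed:5623"])  # assignment is illustrative

pathway = Pathway("Insulin Receptor Signalling", [binding, autophosphorylation])
```

Chaining reactions through shared entities, as the dimer does here, is what lets a set of reactions be read as a pathway.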
Lineage Deletion rates
Trp Catabolism
Head or Tail
DNA Repair
Redundant Paths
Insulin Signalling
Pathway modules
Back to Proteins
Proteins are the natural Hub
Variation
Pathways
Regulation
Structures
Literature
Genome
Proteins
Thanks: Ensembl
Leaders
Ewan Birney (EBI), Tim Hubbard (Sanger Institute)
Analysis and Annotation Pipeline
Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Jan-Hinnerk Vogel, Kevin Howe, Felix Kokocinski, Simon White
Database Schema and Core API
Glenn Proctor, Ian Longden, Craig Melsopp, Patrick Meidl
BioMart
Arek Kasprzyk, Syed Haider, Richard Holland, Damian Smedley
Distributed Annotation System (DAS)
Andreas Kähäri, Eugene Kulesha
Outreach
Xosé M Fernández, Bert Overduin, Michael Schuster, Giulietta Spudlich
Web Team
James Smith, Fiona Cunningham, Anne Parker, Stephen Rice, Steve Trevanion, Matt Wood
Comparative Genomics
Abel Ureta-Vidal, Benoit Ballester, Kathryn Beal, Stephen Fitzgerald, Javier Herrero, Albert Vilella
Functional Genomics + Variation
Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios
Zebrafish Annotation
Kerstin Jekosch, Mario Caccamo
Systems & Support
Guy Coates, Tim Cutts
Thanks: Reactome and Consortia
Reactome (EBI and CSHL)
Reactome @ EBI
Ewan Birney, Imre Vastrik, Esther Schmidt, Bernard de Bono, Bijay Jassal
Reactome @ CSHL
Lincoln Stein, Peter D’Eustachio, Gopal Gopinathrao, Guanming Wu, Lisa Matthews, Marc Gillespie
ENCODE (40 groups worldwide)
Leaders:
Zhiping Weng (BU), Mike Snyder (Yale), John Stam. (U. Wash), Roderic Guigo (Barcelona), Tom Gingeras (Affy), Elliott Margulies (NIH), Anindya Dutta (Duke), Manolis Dermitzakis (Sanger)
BioSapiens (20 groups across Europe)
Structural work of ENCODE
Alfonso Valencia (Madrid), Michael Tress (Madrid), Janet Thornton (EBI), Gabby Logan (EBI)