Post on 30-Dec-2015
EBI as a research infrastructure
Graham Cameron, EBI
Heidelberg
Hinxton
Monterotondo
Hamburg
Grenoble
Service Research Training Industry
EMBL
EBI
Member States of EMBL
• Austria
• Belgium
• Denmark
• Finland
• France
• Portugal
• Spain
• Sweden
• Switzerland
• United Kingdom
• Germany
• Greece
• Israel
• Italy
• The Netherlands
• Norway
Hinxton
Service Research Training Industry
EBI
Wellcome Trust
Economic & Social Research Council
Council for the Central Laboratory of the Research Councils
Natural Environment Research Council
Engineering & Physical Sciences Research Council
Particle Physics & Astronomy Research Council
Biotechnology & Biological Sciences Research Council
Medical Research Council
Arts & Humanities Research Council
~ €3.8 Billion
We have amassed a wealth of knowledge about the molecular processes of living systems
• Biomacromolecules
• Biologically active molecules
• The behaviour and interactions of these molecules
• The phenotypic effects of molecular changes
  • Mutations
  • Drugs
  • Nutrients
• The molecular adjuncts of phenotypic changes
  • Disease
  • Aging

• Databases
• Web access
• Tools to explore the information
• Systems to capture the information
• Service centres
DNA
Protein Sequences
Expression
Structures
PDB code 1DIF: HIV-1 Protease/Inhibitor Complex A79285 (Difluoroketone)
molecules interact
Reactome: Pathways
EnsEMBL: Genome Annotation
EMBL-Bank: DNA sequences
UniProt: Protein Sequences
Array-Express: Microarray Expression Data
EMSD: Macromolecular Structure Data
IntAct: Protein Interactions
Usage
• Basic research
• Industry
  • Pharma
  • Diagnostics
  • Medical device research
  • Personal care
  • Nutrition
  • Agriculture
  • Forestries
  • Fishery
• Patent searching and provenance
Using the information
Not Salt Tolerant / Salt Tolerant
Disease prone / Disease Resistant
Low Yield / High Yield
Diseased / Healthy
Suppose a gene’s variation seems important
Look in databases for similar genes and for their products, functions, structures, interactions and expression patterns, and for the processes in which they are involved.
Can we influence the processes in which they are involved?
• Working out in the lab what a gene does could easily be a year’s work
• Searching databases can do it in half an hour
Nucleotide Sequence Database Growth
(Chart: megabases, 0 to 120,000, by date from Jun-82 to Jun-05.)
A new sequence once a second
Average Web Hits per Day, Including Ensembl
(Chart: average hits per day, 0 to 2,500,000, by quarter year from 1st quarter 1999 to 3rd quarter 2005.)
Note: Ensembl is a joint project with The Wellcome Trust Sanger Institute. Equivalent usage data have only been available since 2004.
A few hundred thousand unique users per month
A million unique users per year
European Context
• BioSapiens
• EMBRACE
• ENFIN
• (and many others)
Biosapiens
• European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, Cambridge, UK.
• European Molecular Biology Laboratory, Heidelberg, Germany.
• German National Centre for Environment and Health, Neuherberg, Munich, Germany
• Université Libre de Bruxelles, Brussels, Belgium
• Consejo Superior de Investigaciones Cientificas, Madrid, Spain
• Institut Municipal d'Assistència Sanitària, Barcelona, Spain
• Genome Research Ltd, Hinxton, Cambridge, UK.
• Max-Planck Institute for Informatics, Saarbrücken, Germany
• The Hebrew University of Jerusalem, Givat Ram, Israel
• Department of Biochemical Sciences University of Rome "La Sapienza", Rome, Italy
• University of Stockholm, Stockholm, Sweden
• University of Oxford, Oxford, UK.
• University College London, London, UK.
• Radboud University Nijmegen, Nijmegen, The Netherlands
• Swiss Institute of Bioinformatics, Geneva, Switzerland
• Technical University of Denmark, Lyngby, Denmark
• University of Helsinki, Helsinki, Finland
• University of Geneva, Geneva, Switzerland
• Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary
• University of Cologne, Cologne, Germany
• Institut Pasteur, Paris, France
• BioInfo Bank Institute, Poznan, Poland
• Max Planck Institute for Molecular Genetics, Berlin, Germany
• Genoscope, Evry, France
• University of Bologna, Bologna, Italy
EMBRACE
• European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, Cambridge, UK.
• European Molecular Biology Laboratory, Heidelberg, Germany.
• Institute of Biomedical Technologies, Section Bari, CNR, Bari, Italy
• University of Manchester, UK
• Swiss Institute of Bioinformatics, Geneva, Switzerland
• Swedish University of Agricultural Sciences, The Linnaeus Centre for Bioinformatics, Sweden
• Centre National de la Recherche Scientifique, Clermont-Ferrand and Lyon, France
• Centre for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
• Centro Nacional de Biotecnologia/Consejo Superior de Investigaciones Cientificas, Madrid, Spain
• University of Stockholm, Stockholm Bioinformatics Centre, Sweden
• Institut National de la Recherche Agronomique, Toulouse, France
• Max Planck Institute for Molecular Genetics, Berlin, Germany
• CSC, the Finnish IT Center for Science, Espoo, Finland
• University College London, London, UK.
• The Weizmann Institute, Rehovot, Israel
• Centre for Molecular and Biomolecular Informatics, University of Nijmegen, The Netherlands
• Carretera de Ajalvir, km. 4, 28850 Torrejon de Ardoz, Madrid
ENFIN
• The European Bioinformatics Institute / The European Molecular Biology Laboratory, Europe
• The University of Dundee, UK
• Technical University of Denmark
• University of Rome Tor Vergata, Italy
• Medical Research Council Mammalian Genetics Unit (MRCMGU), UK
• Ludwig Institute for Cancer Research, Uppsala (LICR-UPP), Sweden
• The Max Planck Institute, Germany
• University of Helsinki (UH), Finland
• University College London (UCL), UK
• National Center for Research and Technology, Hellas (CERTH), Greece
• Universitaet zu Koeln (UNIK), Germany
• Weizmann Institute (Weizmann), Israel
• Egeen (EGEEN), Estonia
• Serono Pharmaceutical Research Institute (SPRI), Switzerland
• Consejo Superior de Investigaciones Científicas (CSIC), Spain
• Centre for Integrative Bioinformatics VU (IBIVU), Netherlands
Global Picture
• DNA – tripartite international collaboration (including patent data acquisition)
• Protein sequences – UniProt collaboration
• Macromolecular structures – tripartite international collaboration
• IntAct international agreements
• Reactome – USA-Europe collaboration
• Etc.
Flybase
MGD
SGD
BRENDA
Chemical data resources
Medical data resources
Biodiversity data resources
IMGT
Pasteur DBs
Eumorphia/Phenotypes
Core biomolecular resources
Specialist biomolecular data resource examples
Mutants
Large resources in related disciplines
Model organism resource examples
Mouse Atlas
Web Hits
(Chart segments: USA, UK, Germany, France, Japan, Italy, Spain, Canada, Sweden, Norway, Netherlands, Switzerland, Belgium, Israel, Australia, Taiwan, Denmark, Austria, Finland, Other.)
EBI Total Running Budget 2005 = €26 million
• EMBL 50%
• EU 22%
• USA 8%
• Wellcome Trust 7%
• UK Research Councils 7%
• Industry 3%
• Other 3%
Projected budget 2011 = €43 million
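As a rough illustration of what these figures mean in absolute terms, the sketch below converts each 2005 funding share into millions of euros and works out the growth implied by the 2011 projection. The shares and totals are read straight off the slide; the variable names and the rounding to two decimals are assumptions of this sketch.

```python
# EBI running budget figures as quoted on the slide (millions of euros).
BUDGET_2005_MEUR = 26
BUDGET_2011_MEUR = 43  # projected

# Funding shares for 2005, in percent, as read off the pie chart.
SHARES_PERCENT = {
    "EMBL": 50,
    "EU": 22,
    "USA": 8,
    "Wellcome Trust": 7,
    "UK Research Councils": 7,
    "Industry": 3,
    "Other": 3,
}


def amounts_meur(total_meur, shares_percent):
    """Convert percentage shares of a total into millions of euros."""
    return {
        name: round(total_meur * pct / 100, 2)
        for name, pct in shares_percent.items()
    }


amounts = amounts_meur(BUDGET_2005_MEUR, SHARES_PERCENT)
for name, value in amounts.items():
    print(f"{name}: about €{value} million")

# Relative growth from the 2005 budget to the 2011 projection.
growth = (BUDGET_2011_MEUR - BUDGET_2005_MEUR) / BUDGET_2005_MEUR
print(f"Projected growth 2005 to 2011: {growth:.0%}")
```

On these numbers the EMBL share comes to €13 million, and the projection implies growth of roughly two thirds over six years.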
(Chart: budgets in millions of euros, scale €0 to €60, for NCBI 2004/5 + PDB, EBI 2005 and EBI 2011.)
(Chart: the same three budgets shown next to the cost of the data, scale €0 to €3,000 million.)
Read-only or dynamic
• There’s nothing particularly difficult about archiving unchanging data
  • But most aren’t
• Today’s best bet
  • E.g., Ensembl
• Provenance
  • E.g., patent searching
  • N.B. Versioning (complex!)
• Citation
How much data
• Canonical vs. episodic
  • Genomes, expression profiles
• Raw vs. processed
  • Sequence traces
  • Structure factors
Custodianship, acquisition and ownership
• Widely accepted obligation to deposit data
• Depend on the goodwill of the community
• Add “organisation”
• Add “services”
• Add “value”
Annotation as added value
• First/second/third party annotation
• Computational vs. experimental
• Bundled vs. distributed
• (DAS)
Openness
• We approve of it
• Data must be made available as soon as they are discussed in a publication
• Data from “community” projects should be made available immediately
• Confidentiality issues must be addressed
Federation
• Monolithic solutions fail
• Centralisation yields more than the sum of the parts
• Aggregation of institutional repositories is essential
Slice it vertically or horizontally?
• E.g., the EBI and AstroGrid are domain specific
• Would it be better if they were jointly managed by data experts?
• Standardisation
  • Mixed success
Supporting the electronic record of science
• This is more like libraries than research projects
• Needs long-term commitment
• With accountability
• Current funding structures are not well adapted to the task
• Pitching the information providers in competition with their research community is damaging.
Bioinformatics Infrastructure
• Has captured the data from several billion euros’ worth of science
• Serves a community of perhaps a million users
• Supports science on which the UK alone spends €3-4 billion a year
• Cuts years of lab work down to hours of computer work
• Is crucial to human well-being, from medicine to agriculture
• Sees data volume and usage growing exponentially
• Might cost a few tens of millions (at most a couple of percent of the cost of the science it supports).
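The "couple of percent" estimate can be sanity-checked against figures quoted elsewhere in the talk. This is a minimal sketch, assuming the EBI budgets (€26 million in 2005, €43 million projected for 2011) stand in for the infrastructure cost and the UK's €3-4 billion annual spend stands in for the science supported. The talk does not say these are the exact figures behind the estimate, and the function name is illustrative.

```python
def cost_share_percent(infrastructure_meur, science_meur):
    """Infrastructure cost as a percentage of the science spend it supports."""
    return 100 * infrastructure_meur / science_meur


# Figures quoted in the talk, in millions of euros.
EBI_BUDGETS_MEUR = {"EBI 2005": 26, "EBI 2011 (projected)": 43}
UK_SCIENCE_SPEND_MEUR = (3000, 4000)  # the "€3-4 billion a year" range

shares = {}
for label, budget in EBI_BUDGETS_MEUR.items():
    for spend in UK_SCIENCE_SPEND_MEUR:
        share = cost_share_percent(budget, spend)
        shares[(label, spend)] = share
        print(f"{label} against €{spend} million: {share:.2f}%")
```

Every combination comes out below 2%, which is consistent with the slide's claim even when only the UK's spend is counted.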