ebi as a research infrastructure graham cameron, ebi

Post on 30-Dec-2015

EBI as a research infrastructure

Graham Cameron, EBI

EMBL sites: Heidelberg, Hinxton (EBI), Monterotondo, Hamburg, Grenoble

• Service
• Research
• Training
• Industry

Member States of EMBL

• Austria

• Belgium

• Denmark

• Finland

• France

• Portugal

• Spain

• Sweden

• Switzerland

• United Kingdom

• Germany

• Greece

• Israel

• Italy

• The Netherlands

• Norway

EBI, Hinxton

• Service
• Research
• Training
• Industry

• Wellcome Trust
• Economic & Social Research Council
• Council for the Central Laboratory of the Research Councils
• Natural Environment Research Council
• Engineering & Physical Sciences Research Council
• Particle Physics & Astronomy Research Council
• Biotechnology & Biological Sciences Research Council
• Medical Research Council
• Arts & Humanities Research Council

~ €3.8 billion

We have amassed a wealth of knowledge about the molecular processes of living systems

• Biomacromolecules
• Biologically active molecules
• The behaviour and interactions of these molecules
• The phenotypic effects of molecular changes
    • Mutations
    • Drugs
    • Nutrients
• The molecular adjuncts of phenotypic changes
    • Disease
    • Aging

• Databases
• Web access
• Tools to explore the information
• Systems to capture the information
• Service centres

• DNA
• Protein sequences
• Expression
• Structures
• Molecular interactions
• Pathways

[Figure: PDB code 1DIF, HIV-1 Protease/Inhibitor Complex A79285 (Difluoroketone)]

• Ensembl: genome annotation
• EMBL-Bank: DNA sequences
• UniProt: protein sequences
• ArrayExpress: microarray expression data
• EMSD: macromolecular structure data
• IntAct: protein interactions
• Reactome: pathways

Usage

• Basic research
• Industry
    • Pharma
    • Diagnostics
    • Medical device research
    • Personal care
    • Nutrition
    • Agriculture
    • Forestry
    • Fisheries
• Patent searching and provenance

Using the information

Not Salt Tolerant / Salt Tolerant

Disease Prone / Disease Resistant

Low Yield / High Yield

Diseased / Healthy

Suppose a gene’s variation seems important

Look in databases for similar genes and their products, functions, structures, interactions and expression patterns, and for the processes in which they are involved.

Can we influence the processes in which they are involved?

• Working out in the lab what a gene does could easily be a year’s work

• Searching databases can do it in half an hour

[Figure: Nucleotide Sequence Database Growth. Megabases (0 to 120,000) against date (Jun-82 to Jun-05). Annotation: a new sequence once a second.]

[Figure: Average Web Hits per Day, Including Ensembl. Average hits per day (0 to 2,500,000) against quarter year (1st quarter 1999 to 3rd quarter 2005).]

Note: Ensembl is a joint project with The Wellcome Trust Sanger Institute. Equivalent usage data have only been available since 2004.

• A few hundred thousand unique users per month
• A million unique users per year

European Context

• BioSapiens
• EMBRACE
• ENFIN
• (and many others)

BioSapiens

• European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, Cambridge, UK.

• European Molecular Biology Laboratory, Heidelberg, Germany.

• German National Centre for Environment and Health, Neuherberg, Munich, Germany

• Université Libre de Bruxelles, Brussels, Belgium

• Consejo Superior de Investigaciones Cientificas, Madrid, Spain

• Institut Municipal d'Assistència Sanitària, Barcelona, Spain

• Genome Research Ltd, Hinxton, Cambridge, UK.

• Max-Planck Institute for Informatics, Saarbrücken, Germany

• The Hebrew University of Jerusalem, Givat Ram, Israel

• Department of Biochemical Sciences University of Rome "La Sapienza", Rome, Italy

• University of Stockholm, Stockholm, Sweden

• University of Oxford, Oxford, UK.

• University College London, London, UK.

• Radboud University Nijmegen, Nijmegen, The Netherlands

• Swiss Institute of Bioinformatics, Geneva, Switzerland

• Technical University of Denmark, Lyngby, Denmark

• University of Helsinki, Helsinki, Finland

• University of Geneva, Geneva, Switzerland

• Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary

• University of Cologne, Cologne, Germany

• Institut Pasteur, Paris, France

• BioInfo Bank Institute, Poznan, Poland

• Max Planck Institute for Molecular Genetics, Berlin, Germany

• Genoscope, Evry, France

• University of Bologna, Bologna, Italy

EMBRACE

• European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, Cambridge, UK
• European Molecular Biology Laboratory, Heidelberg, Germany
• Institute of Biomedical Technologies, Section Bari, CNR, Bari, Italy
• University of Manchester, UK
• Swiss Institute of Bioinformatics, Geneva, Switzerland
• Swedish University of Agricultural Sciences, The Linnaeus Centre for Bioinformatics, Sweden
• Centre National de la Recherche Scientifique, Clermont-Ferrand and Lyon, France
• Centre for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
• Centro Nacional de Biotecnologia/Consejo Superior de Investigaciones Cientificas, Madrid, Spain
• University of Stockholm, Stockholm Bioinformatics Centre, Sweden
• Institut National de la Recherche Agronomique, Toulouse, France
• Max Planck Institute for Molecular Genetics, Berlin, Germany
• CSC, the Finnish IT Center for Science, Espoo, Finland
• University College London, London, UK
• The Weizmann Institute, Rehovot, Israel
• Centre for Molecular and Biomolecular Informatics, University of Nijmegen, The Netherlands
• Carretera de Ajalvir, km. 4, 28850 Torrejon de Ardoz, Madrid

ENFIN

• The European Bioinformatics Institute / The European Molecular Biology Laboratory, Europe
• The University of Dundee, UK
• Technical University of Denmark, Denmark
• University of Rome Tor Vergata, Italy
• Medical Research Council Mammalian Genetics Unit (MRCMGU), UK
• Ludwig Institute for Cancer Research, Uppsala (LICR-UPP), Sweden
• The Max Planck Institute, Germany
• University of Helsinki (UH), Finland
• University College London (UCL), UK
• National Center for Research and Technology, Hellas (CERTH), Greece
• Universitaet zu Koeln (UNIK), Germany
• Weizmann Institute (Weizmann), Israel
• Egeen (EGEEN), Estonia
• Serono Pharmaceutical Research Institute (SPRI), Switzerland
• Consejo Superior de Investigaciones Científicas (CSIC), Spain
• Centre for Integrative Bioinformatics VU (IBIVU), Netherlands

Global Picture

• DNA – tripartite international collaboration (including patent data acquisition)
• Protein sequences – UniProt collaboration
• Macromolecular structures – tripartite international collaboration
• IntAct international agreements
• Reactome – USA Europe collaboration
• Etc.

[Figure: the data resource landscape, shown in three builds. Group labels: Core biomolecular resources; Specialist biomolecular data resource examples; Model organism resource examples; Large resources in related disciplines. Resource labels: Flybase, MGD, SGD, BRENDA, IMGT, Pasteur DBs, Eumorphia/Phenotypes, Mutants, Mouse Atlas, Chemical data resources, Medical data resources, Biodiversity data resources.]

[Figure: Web Hits by country. Labels: USA, UK, Germany, France, Japan, Italy, Spain, Canada, Sweden, Norway, Netherlands, Switzerland, Belgium, Israel, Australia, Taiwan, Denmark, Austria, Finland, Other.]

EBI Total Running Budget 2005 = €26 million

• EMBL: 50%
• EU: 22%
• USA: 8%
• Other: 3%
• Industry: 3%
• Wellcome Trust: 7%
• UK Research Councils: 7%
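To make the 2005 shares concrete, the percentages above can be turned into approximate euro amounts. This is only a rough sketch: the figures are read off the slide, the names `SHARES_PERCENT` and `amounts_by_source` are illustrative, and the slide's percentages are presumably rounded.

```python
# Total 2005 running budget and funding shares, as read from the slide.
BUDGET_2005_MILLION_EUR = 26
SHARES_PERCENT = {
    "EMBL": 50,
    "EU": 22,
    "USA": 8,
    "Other": 3,
    "Industry": 3,
    "Wellcome Trust": 7,
    "UK Research Councils": 7,
}


def amounts_by_source(total_million_eur, shares_percent):
    """Convert percentage shares into amounts in millions of euros."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must add up to 100 percent")
    return {
        source: total_million_eur * percent / 100
        for source, percent in shares_percent.items()
    }


amounts = amounts_by_source(BUDGET_2005_MILLION_EUR, SHARES_PERCENT)
for source, amount in amounts.items():
    print(f"{source:22s} about {amount:5.2f} million euros")
```

On these figures EMBL contributes about €13 million and the EU a little under €6 million.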

Projected budget 2011 = €43 million

[Figure: bar chart in € millions (axis € 0 to € 60) comparing NCBI 2004/5 + PDB, EBI 2005 and EBI 2011.]

[Figure: bar chart in € millions (axis € 0 to € 3,000) comparing the cost of the data with NCBI 2004/5 + PDB, EBI 2005 and EBI 2011.]

Read-only or dynamic

• There’s nothing particularly difficult about archiving unchanging data
• But most aren’t
• Today’s best bet
    • E.g., Ensembl
• Provenance
    • E.g., patent searching
    • N.B. Versioning (complex!)
• Citation
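One way to picture why versioning matters for provenance and citation: a dynamic entry needs both a stable identifier for each version (so a citation keeps pointing at the same content) and an "as of this date" lookup (so a patent search can see what was public on a given day). The sketch below is a hypothetical toy model, not how any EBI database is implemented. The class `VersionedEntry`, its methods, and the accession `X00001` are all invented for illustration; the `accession.version` identifier style follows the common sequence-database convention.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VersionedEntry:
    """A database entry whose content changes over time (toy model)."""

    accession: str
    # Parallel lists, kept in release-date order.
    dates: list = field(default_factory=list)
    contents: list = field(default_factory=list)

    def add_version(self, released: date, content: str) -> str:
        """Store a new version and return its citable identifier."""
        if self.dates and released <= self.dates[-1]:
            raise ValueError("versions must be added in date order")
        self.dates.append(released)
        self.contents.append(content)
        return f"{self.accession}.{len(self.dates)}"

    def get(self, version: int) -> str:
        """Return one exact version (1-based), as a citation would need."""
        if not 1 <= version <= len(self.contents):
            raise KeyError(f"{self.accession}.{version} does not exist")
        return self.contents[version - 1]

    def as_of(self, when: date):
        """Return the content that was current on a given day, or None."""
        index = bisect_right(self.dates, when)
        return self.contents[index - 1] if index else None


entry = VersionedEntry("X00001")  # made-up accession
first_id = entry.add_version(date(1999, 1, 1), "ACGT")
second_id = entry.add_version(date(2003, 6, 1), "ACGTT")
```

Here `entry.as_of(date(2001, 1, 1))` gives the first version's content, while `entry.get(2)` always gives the second, however many versions follow. Real versioning is harder than this, as the slide says: entries merge, split and cross-reference one another.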

How much data

• Canonical vs. episodic
    • Genomes, expression profiles
• Raw vs. processed
    • Sequence traces
    • Structure factors

Custodianship, acquisition and ownership

• Widely accepted obligation to deposit data
• Depend on the goodwill of the community
• Add “organisation”
• Add “services”
• Add “value”

Annotation as added value

• First/second/third party annotation
• Computational vs. experimental
• Bundled vs. distributed
    • (DAS)

Openness

• We approve of it
• Data must be made available as soon as they are discussed in a publication
• Data from “community” projects should be made available immediately
• Confidentiality issues must be addressed

Federation

• Monolithic solutions fail
• Centralisation yields more than the sum of the parts
• Aggregation of institutional repositories is essential

Slice it vertically or horizontally?

• E.g., the EBI and AstroGrid are domain specific
• Would it be better if they were jointly managed by data experts?
• Standardisation
    • Mixed success

Supporting the electronic record of science

• This is more like libraries than research projects
• Needs long term commitment
• With accountability

• Current funding structures are not well adapted to the task

• Putting the information providers in competition with their research community is damaging.

Bioinformatics Infrastructure

• Has captured the data from several billion euros’ worth of science
• Serves a community of perhaps a million users
• Supports science on which the UK alone spends €3-4 billion a year
• Cuts years of lab work down to hours of computer work
• Is crucial to human well-being, from medicine to agriculture
• Sees data volume and usage growing exponentially
• Might cost a few tens of millions (at most a couple of percent of the cost of the science it supports).
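The "couple of percent" claim can be checked roughly against figures from earlier slides. As an assumption for illustration, the sketch below uses the EBI budgets (€26 million in 2005, €43 million projected for 2011) as a stand-in for the cost of the infrastructure, and the UK's annual spend of €3-4 billion as the cost of the science supported. The names are illustrative only.

```python
# Figures taken from the slides, in millions of euros.
EBI_BUDGET_MILLION_EUR = {"2005": 26, "2011 (projected)": 43}
UK_SPEND_MILLION_EUR = (3000, 4000)  # "EUR 3-4 billion a year"


def percent_of(cost, supported):
    """Return cost as a percentage of the science it supports."""
    return 100 * cost / supported


def cost_share_range(budget, spend_range):
    """Return (lowest, highest) percentage across the spend range."""
    low_spend, high_spend = spend_range
    return percent_of(budget, high_spend), percent_of(budget, low_spend)


ranges = {
    label: cost_share_range(budget, UK_SPEND_MILLION_EUR)
    for label, budget in EBI_BUDGET_MILLION_EUR.items()
}
for label, (lowest, highest) in ranges.items():
    print(f"EBI {label}: {lowest:.2f}% to {highest:.2f}% of annual UK spend")
```

Even measured against UK spending alone, both budgets come out below 1.5 percent. The science actually supported is worldwide, so the true share would be smaller still.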
