Page 1

EBI is an Outstation of the European Molecular Biology Laboratory.

ChEBI: The story so far

Paula de Matos

Page 2

Private Data Public Data

Page 3

The state of affairs of bioinformatics in 2002

• Bioinformatics is booming

• Human genome rough draft sequence announced June 2000

• Free resources and free data

Page 4

A different story for chemoinformatics

• Private data and private software

Page 5

Too hard to solve… let's put our heads in the sand

Page 6

Bioinformatics data is growing too fast to keep track of chemical compounds

• 100,000 protein entries in Swiss-Prot (2002)

• 20 million entries in the EMBL database (2002)

• Small databases unable to keep track

• ENZYME resource: ~3,500 enzymatic reactions

Page 7

New initiatives start up

• PubChem
  • Chemical repository, millions of entries, focus on screening assays

• ChEBI
  • Manually annotated database, nomenclature reference and compound database, tens of thousands of entries

Page 8

Founding principles

• December 2002: email exchanges within the EBI on how to address the issue of chemistry

• Three principles outlined


Page 9

“Nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution/availability to anyone.”

Page 10

“Every data item in the database should be fully traceable and explicitly referenced to the original source/version.”

Page 11

“Although the EBI will provide a web interface, the entirety of the data should be available to all without constraint as, for example, SQL table dumps, ASCII tables, and XML (e.g. DAML+OIL)”

Page 12

We make a start using existing resources

• Integrate three resources
  • KEGG Compound
  • IntEnz
  • Chemical Ontology

• Annotation starts summer 2003
  • Focus on nomenclature


Page 13

Our first release was modest but it was a start

• 21 July 2004

• 2783 annotated entities

• Data:
  • ChEBI Name, ChEBI Id
  • IUPAC Names, Synonyms
  • Formula
  • Cross-references
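The first-release fields above map naturally onto a record type. The Python sketch below is illustrative only. `ChebiEntry` and `normalize_chebi_id` are hypothetical names and are not part of any ChEBI software. The only rule the sketch enforces is the `CHEBI:<digits>` identifier form used throughout these slides.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical pattern: optional "CHEBI:" prefix (any case) followed by digits.
_CHEBI_ID = re.compile(r"^(?:CHEBI:)?(\d+)$", re.IGNORECASE)


def normalize_chebi_id(raw: str) -> str:
    """Return the identifier in canonical 'CHEBI:<number>' form."""
    match = _CHEBI_ID.match(raw.strip())
    if match is None:
        raise ValueError(f"not a ChEBI identifier: {raw!r}")
    # int() drops any leading zeros before the prefix is re-attached.
    return f"CHEBI:{int(match.group(1))}"


@dataclass
class ChebiEntry:
    """One annotated entity, holding the fields listed for the first release."""
    chebi_id: str
    name: str
    iupac_names: list[str] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)
    formula: str | None = None
    # Cross-references keyed by external database name (assumed layout).
    xrefs: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Store every identifier in the same canonical form.
        self.chebi_id = normalize_chebi_id(self.chebi_id)


entry = ChebiEntry(chebi_id="chebi:15741", name="succinic acid", formula="C4H6O4")
print(entry.chebi_id)  # CHEBI:15741
```

Normalizing the identifier at construction time means lookups and cross-reference joins never have to handle several spellings of the same ID.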

Page 14

We introduce structures – Sep 2005

• Molfiles

• InChI (IUPAC International Chemical Identifier)

• SMILES (Simplified Molecular Input Line Entry System)

• Image (PNG)
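Of the structure formats listed, InChI is a layered string. It consists of a version prefix, then the molecular formula, then further layers, each introduced by a single-letter prefix after a `/`. The sketch below separates those layers; `split_inchi` is a hypothetical helper, and it neither validates the string nor interprets sublayer nesting. It uses the standard InChI for ethanol as sample input.

```python
def split_inchi(inchi: str) -> dict:
    """Split a standard InChI string into version, formula and later layers.

    Simplified sketch: each later layer is returned as a (prefix, content)
    pair in order, so repeated prefixes are kept rather than overwritten,
    but their nesting is not interpreted.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    version, *segments = inchi[len(prefix):].split("/")
    # The first segment after the version is the formula layer.
    formula = segments[0] if segments else ""
    # Every further non-empty segment starts with its one-letter layer prefix.
    layers = [(seg[0], seg[1:]) for seg in segments[1:] if seg]
    return {"version": version, "formula": formula, "layers": layers}


ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["formula"])  # C2H6O
print(ethanol["layers"])   # [('c', '1-2-3'), ('h', '3H,2H2,1H3')]
```

Because the formula is its own layer, a stored InChI can be cross-checked against the separately annotated formula field.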


Page 15

Marvin in ChEBI

Page 16

We start editing the chemical ontology – Dec 2005


Page 17

Internationalisation of web pages – March 2006

Page 18

Internationalisation of data – Feb 2008


Page 19

Web Services - Oct 2006

• Programmatic access to a ChEBI entry
  • SOAP-based Java implementation

• Clients currently available in Java and Perl

• Four methods with which to access data
  • getLiteEntity
  • getCompleteEntity
  • getOntologyParents
  • getOntologyChildren
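The two ontology methods are inverse views of the same `is_a` links, so a store can record each link once and answer queries in both directions. The sketch below is a hypothetical in-memory stand-in, not the SOAP service or its Java and Perl clients. The class `OntologyStore`, its snake_case method names and the sample class names are assumptions made for illustration. Real calls would go over SOAP and use ChEBI identifiers.

```python
from collections import defaultdict


class OntologyStore:
    """Hypothetical in-memory stand-in; NOT the real ChEBI SOAP service."""

    def __init__(self, is_a_edges):
        # Each (child, parent) link is stored once per direction.
        self._parents = defaultdict(set)
        self._children = defaultdict(set)
        for child, parent in is_a_edges:
            self._parents[child].add(parent)
            self._children[parent].add(child)

    def get_ontology_parents(self, entity):
        """Direct parents, mirroring what getOntologyParents is described to do."""
        return sorted(self._parents.get(entity, ()))

    def get_ontology_children(self, entity):
        """Direct children, mirroring what getOntologyChildren is described to do."""
        return sorted(self._children.get(entity, ()))


store = OntologyStore([
    ("organic molecular entity", "molecular entity"),
    ("inorganic molecular entity", "molecular entity"),
    ("organic ion", "organic molecular entity"),
])
print(store.get_ontology_children("molecular entity"))
# ['inorganic molecular entity', 'organic molecular entity']
print(store.get_ontology_parents("organic ion"))  # ['organic molecular entity']
```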


Page 20

Automated Cross References – Aug 2007

Current databases: UniProtKB, Reactome, BioModels, IntAct, SABIO-RK, PubChem and ArrayExpress


Page 21

Chemical Structure Searching – May 2008

Page 22

After all this, where are we?

Page 23

Page 24

Page 25

Annotation is linear

[Chart: ChEBI Data Growth. Number of annotated entities (axis 0 to 16,000) per release, by month and year, from 07/2004 (Rel 1) to 03/2008 (Rel 43)]

Page 26

Number of web hits grows

• Total pure entry hits in April: 42,612 / 273,219

• Total web services hits in April: 88,226

• Web hits for 2007: [chart]

Page 27

Diversity of users

Constant challenge of balancing our users' varied interests.

Page 28

Our positives

• Nomenclature database

• Manually annotated data

• Attention to detail

• Free and accessible

• Loyal users

Page 29

Our not so positives

• Size (too small for some users)

• Not well integrated into other bioinformatics resources

• Community interaction

• No software publicly available to manipulate the database

Page 30

Involve the community

• Create a web-based submission tool
  • Users can easily submit their entities on a one-to-one basis
  • Also allow bulk submission from other resources

Page 31

Improvements to data depth

• Addition of more cross-references: PDB, MACIE?

• Addition of more chemical attributes? Which chemical attributes?

• Text mining projects to extract relevant chemical information from patents and journals
  • European Patent Office

Page 32

Going Open Source

• Commercial software packages will be replaced with open-source alternatives

• Long term goal: allow people to create a free local instance of ChEBI

• Distribution of data in useful formats: CML, SDF
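An SDF file stores each compound as a molfile connection table followed by named data items. Each item is a `> <name>` header, then value lines, then a blank line, and every record ends with `$$$$`. The Python sketch below shows one way to read those data items. `parse_sdf_fields` is a hypothetical helper, and the field names `ChEBI Name` and `Formulae` are assumed for illustration. The connection table is skipped rather than parsed.

```python
import re

# Matches SD data headers such as "> <ChEBI Name>" or ">  <NAME> (1)".
_HEADER = re.compile(r"^>.*?<([^>]+)>")


def parse_sdf_fields(text: str) -> list[dict[str, str]]:
    """Return the data items of each SD record as a dict (molfile block ignored)."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # Skip the empty tail after the last delimiter.
        fields: dict[str, str] = {}
        name = None
        values: list[str] = []
        for line in block.splitlines():
            header = _HEADER.match(line)
            if header:
                name, values = header.group(1), []
            elif name is not None:
                if line.strip():
                    values.append(line)
                else:
                    # A blank line ends the current data item.
                    fields[name] = "\n".join(values)
                    name = None
        if name is not None:
            fields[name] = "\n".join(values)  # Item not closed by a blank line.
        records.append(fields)
    return records


SAMPLE = """methane
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ChEBI Name>
methane

> <Formulae>
CH4

$$$$
"""
print(parse_sdf_fields(SAMPLE))  # [{'ChEBI Name': 'methane', 'Formulae': 'CH4'}]
```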

Page 33

Proposed changes to the ontology

• New relationships
  • “Is disjoint from”

[Diagram: molecular entities; organic molecular entities; inorganic molecular entities; organic ions]
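A declared disjointness is useful mainly because it can be checked: no entity may inherit, directly or indirectly, from both members of a disjoint pair. The Python sketch below assumes a simple `is_a` adjacency map. Its small hierarchy is illustrative, not actual ChEBI data, and it includes a deliberately misfiled, hypothetical entity.

```python
def ancestors(entity: str, is_a: dict[str, list[str]]) -> set[str]:
    """All classes reachable from `entity` by following is_a edges upward."""
    seen: set[str] = set()
    stack = list(is_a.get(entity, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def disjointness_violations(is_a, disjoint_pairs):
    """List (entity, a, b) for entities that inherit from both a and b."""
    violations = []
    for entity in is_a:
        lineage = ancestors(entity, is_a) | {entity}
        for a, b in disjoint_pairs:
            if a in lineage and b in lineage:
                violations.append((entity, a, b))
    return violations


# Illustrative hierarchy; "misfiled entity" is a deliberately bad example.
IS_A = {
    "organic molecular entity": ["molecular entity"],
    "inorganic molecular entity": ["molecular entity"],
    "organic ion": ["organic molecular entity"],
    "misfiled entity": ["organic molecular entity", "inorganic molecular entity"],
}
DISJOINT = [("organic molecular entity", "inorganic molecular entity")]

print(disjointness_violations(IS_A, DISJOINT))
# [('misfiled entity', 'organic molecular entity', 'inorganic molecular entity')]
```

The check follows the whole lineage rather than direct parents only, because a conflict can be introduced several levels up the hierarchy.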

Page 34

Is alloprote of

[Diagram: succinate(2−) (CHEBI:30031) and succinic acid (CHEBI:15741), linked by “is alloprote of”]

Page 35

Has biological role and Has application

Has biological role

Page 36

Encourage use of ChEBI nomenclature

• Currently working with the Swiss Institute of Bioinformatics to build Rhea, a database of biochemical reactions

• All reactions mapped to ChEBI

EC 2.7.4.3: “ATP + AMP = 2 ADP”

• ATP: CHEBI:15422 (C10H16N5O13P3)

• AMP: CHEBI:16027 (C10H14N5O7P)

• ADP: CHEBI:16761 (C10H15N5O10P2)
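Once every reaction participant is mapped to a ChEBI entry with a formula, simple consistency checks become possible. The Python below is a sketch, not Rhea's actual validation. It checks the atom balance for EC 2.7.4.3 using the formulae listed on the slide. The helper names are hypothetical, and the parser handles only flat formulae with no parentheses or charges.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C10H16N5O13P3'.

    Sketch only: no parentheses, hydrates, isotopes or charges.
    """
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_atoms(side) -> Counter:
    """Sum atoms over a list of (stoichiometric coefficient, formula) pairs."""
    total: Counter = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(left, right) -> bool:
    """True when both sides contain the same number of each element."""
    return side_atoms(left) == side_atoms(right)


# Formulae as listed on the slide for EC 2.7.4.3, ATP + AMP = 2 ADP.
ATP, AMP, ADP = "C10H16N5O13P3", "C10H14N5O7P", "C10H15N5O10P2"
print(is_balanced([(1, ATP), (1, AMP)], [(2, ADP)]))  # True
```

Both sides come to C20H30N10O20P4, so the check passes. A charge-aware check would also need the net charge of each species.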

Page 37

Acknowledgements

• ChEBI Team
  • Paula de Matos, Kirill Degtyarenko, Marcus Ennis, Janna Hastings, Christoph Steinbeck

• Alumni
  • Michael Darsow, Mickael Guedj, Alan McNaught, Martin Zbinden

• ChEBI supporters
  • Rolf Apweiler, Michael Ashburner, Henning Hermjakob, Janet Thornton

• IntEnz Team
  • Rafael Alcantara, Volker Ast, Kristian Axelsen, Anne Morgat

• EPO Collaborators
  • Helene Courrier, Stephane Nauche, Jeremy Parsons

• Database supporters
  • ArrayExpress, IntAct, Reactome, SABIO-RK, RSC, GO, RESID etc.

Page 38

Discussion Points

• Data depth

• New relationships

• Encourage nomenclature

• Community