EBI is an Outstation of the European Molecular Biology Laboratory. MSDchem and the chemistry of the wwPDB EMBO 22nd-26th September 2008 EMBL-EBI Hinxton UK



The PDB Chemical components

The PDB holds more than the 3-D folds of standard polymers

It gives insight into interesting special chemistry

Bound ligands

Modified amino acids

Non-standard chemical components are often the most interesting

The PDB ligand dictionary has served for many years as the reference dictionary for the chemical definition of 3-letter codes in the PDB data

The ligand dictionary has been maintained by the curators at all wwPDB sites

Problems accumulated

Duplicate entries

Impossible chemistry

The definition of what a 3-letter code represents was neither clear nor consistent

Stereochemistry was ignored

The MSDchem database

The database that supported the chemical component dictionary in the MSD.

From the start, the curation team had an explicit, clear definition of a ligand

A distinct stereoisomer, defined by its

connectivity,

bond orders,

absolute stereo-descriptors of atoms and bonds

This was reflected in the design and the implementation of the MSDChem database

The ligand identity

Atoms, elements, bonds and bond orders

Atom and bond absolute stereo-descriptors (Cahn-Ingold-Prelog)

Equivalent to a canonical stereo-SMILES or InChI string

MSDchem ligand definition

DCF: C4' R, C3' S, C1' R

DCM: C4' S, C3' R, C1' S

Other properties

Atom names, and atom/bond ordering

Representative coordinates

Derived properties

Aromatic bonds

SMILES and InChI strings

Systematic names

Idealised coordinates

Rings and planes

Atom energy types
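
The ligand definition above can be sketched as a small data structure. This is only an illustration, not the MSDChem schema: the class name, the helper functions and the five-atom ring fragment are assumptions, and only the three stereo-descriptors per ligand come from the slide. The comparison relies on both ligands sharing atom labels; a real identity check would need a canonical form such as a stereo-SMILES or InChI string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigandIdentity:
    """Fundamental properties only: atoms with elements, bonds with orders, stereo."""
    atoms: frozenset   # {(atom_label, element)}
    bonds: frozenset   # {(atom_label_1, atom_label_2, bond_order)}, labels sorted
    stereo: frozenset  # {(atom_label, "R" or "S")}


def make_identity(atoms, bonds, stereo):
    """Build an order-independent identity from plain dicts and lists."""
    norm_bonds = frozenset(
        (min(a, b), max(a, b), order) for a, b, order in bonds
    )
    return LigandIdentity(
        frozenset(atoms.items()), norm_bonds, frozenset(stereo.items())
    )


def same_connectivity(x, y):
    """True when two identities differ at most in their stereo-descriptors."""
    return x.atoms == y.atoms and x.bonds == y.bonds


# Toy ring fragment: an assumption for illustration, not the real component.
ring_atoms = {"C1'": "C", "C2'": "C", "C3'": "C", "C4'": "C", "O4'": "O"}
ring_bonds = [("C1'", "C2'", 1), ("C2'", "C3'", 1), ("C3'", "C4'", 1),
              ("C4'", "O4'", 1), ("O4'", "C1'", 1)]

# Stereo-descriptors as listed on the slide for DCF and DCM.
dcf = make_identity(ring_atoms, ring_bonds, {"C4'": "R", "C3'": "S", "C1'": "R"})
dcm = make_identity(ring_atoms, ring_bonds, {"C4'": "S", "C3'": "R", "C1'": "S"})
```

With these definitions, `same_connectivity(dcf, dcm)` holds while `dcf == dcm` does not, which is the sense in which each stereoisomer is a distinct ligand.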

For known ligands, coordinates are checked against the ligand definition (program DOHLC)

Atom labeling is checked

A new ligand may have to be defined

For a new ligand

Fundamental properties are checked

Derived properties are generated

Is it identical to an existing ligand with another code? (DOHLC)

Ligand curation

3TH

Not possible

New ligand

Actually it is 6CP
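
The check "is it identical to an existing ligand with another code?" can be sketched as grouping codes by a canonical identity string. This is a minimal sketch and not how DOHLC works internally; the function name is hypothetical and the identity strings are placeholders, not real SMILES or InChI values. The two codes reuse the 3TH and 6CP example as it reads on the slide.

```python
from collections import defaultdict


def find_duplicates(canonical_by_code):
    """Group 3-letter codes that share one canonical identity string."""
    codes_by_identity = defaultdict(list)
    for code, identity in canonical_by_code.items():
        codes_by_identity[identity].append(code)
    return sorted(
        sorted(codes) for codes in codes_by_identity.values() if len(codes) > 1
    )


# Placeholder identity strings (assumed); real keys would be canonical
# stereo-SMILES or InChI strings derived from the ligand definition.
dictionary = {
    "6CP": "identity-A",
    "3TH": "identity-A",
    "XYZ": "identity-B",  # hypothetical unrelated entry
}
duplicates = find_duplicates(dictionary)
```

Here `duplicates` contains a single group holding `3TH` and `6CP`; the entry with a unique identity is not reported.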

Improvement of the chemical dictionary

A core task of the wwPDB remediation project

Remaining issues and data errors were fixed

Duplicate identical ligands

No representative coordinates

Wrong valences

The definition of the ligand identity and the deviations were agreed among the wwPDB sites

The wwPDB invested significantly in this area with a new software toolkit (ChemComp)

This replaced most of the MSDChem backend

Ligands in the wwPDB

Additional investment in chemical software

Use of chemical software packages

CACTVS

OpenEye

CORINA

LexiChem

MSDChem is not a separate data resource

It is just a load of the wwPDB ligand dictionary into Oracle

IUPAC atom names, deoxy-bases, better chemical names

Molecules too big to be a single chemical component

Special chemistry (like metal complexes)

Limitations of chemical software

Legacy chemical components that are hard to deal with (like ions)

Components that have never been fully observed

Modified components

Difficult Issues

Public pages for the wwPDB ligand dictionary

Based on an Oracle database load

Various search options

Visualisation and navigation

Exporting in other formats

Has been running for almost 6 years

Is used and referred to by

Ligand Depot (RCSB equivalent)

ChEBI at EBI

PubChem at NCBI

HIC-Up and others

The MSDChem web application

Statistics

[Chart: number of ligands per year, 2000 to 2007; vertical axis from 0 to 8000]

Daily average load of MSDChem

~ 400 queries

~ 100 distinct IP addresses

Hits per location

[Chart: hits per location; categories: edu, uk, ebi, other, eu, com, net]

Most common case: search for a 3 letter code seen in a PDB file

Search for a chemical name or part of it found in the literature

All known names are searched

Common, PDB

Systematic

A synonym

Search following references

3 letter code

Chemical name

Common, PDB

Systematic

A synonym

MSDChem search
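
Searching a name fragment across all known names can be sketched as a case-insensitive substring match over every kind of name. This is an assumed, simplified model and not the MSDChem query itself; the record layout, the function name and the toy records (whose names are illustrative, not copied from the dictionary) are all hypothetical.

```python
def search_names(records, text):
    """Return sorted (code, name_kind) pairs whose names contain the text."""
    needle = text.lower()
    hits = []
    for code, names in records.items():
        for kind, values in names.items():
            if any(needle in value.lower() for value in values):
                hits.append((code, kind))
    return sorted(hits)


# Toy records keyed by hypothetical 3-letter codes.
records = {
    "AAA": {
        "common": ["Example acid"],
        "systematic": ["2-example-propanoic acid"],
        "synonym": ["exampleate"],
    },
    "BBB": {
        "common": ["Sample base"],
        "systematic": ["1-sample-amine"],
        "synonym": [],
    },
}
hits = search_names(records, "ACID")
```

Reporting which kind of name matched lets a result list say whether the hit came from the common name, the systematic name or a synonym.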

Ligand details

For every kind of search there is a result list

Summary information

Preview icon of the molecule

Links to pages for every chemical component

With detailed images

Links for more information about atoms, bonds etc.

Various options for 3-D visualization

Download options for common chemical formats

Results overview

Ligand details

Ligand overview

Ligand details

Visualisation - Export

Coordinates

Ideal

Representative

Chemical formats

PDB

Molfile (SDF)

Searching for chemical composition

Often aspects of composition are known but not the exact structure

Like particular elements (metals etc.)

Or particular chemical fragments

User-friendly expression-building pages based on formula or fragments

Visually browse through the results

Formula range

Expression can be built with web form

Example: O1-4 N3-100 F0

1 to 4 oxygens

More than 3 nitrogens

No fluorine

Anything else
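
A formula-range expression like the example above can be sketched as a parser plus a matcher. This is a minimal sketch under stated assumptions, not the MSDChem implementation: the token grammar (element symbol, a count, an optional upper bound) is inferred from the single example, the function names are hypothetical, and bounds are treated as inclusive even though the slide glosses N3-100 as "more than 3 nitrogens". Elements not named in the expression are unconstrained ("anything else").

```python
import re

# One token: element symbol, lower bound, optional "-upper bound".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+)(?:-(\d+))?")


def parse_formula_range(expression):
    """Turn 'O1-4 N3-100 F0' into {'O': (1, 4), 'N': (3, 100), 'F': (0, 0)}."""
    ranges = {}
    for token in expression.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse token: {token!r}")
        element, low, high = match.groups()
        ranges[element] = (int(low), int(high) if high is not None else int(low))
    return ranges


def matches(composition, ranges):
    """Check element counts against inclusive ranges; other elements are free."""
    return all(
        low <= composition.get(element, 0) <= high
        for element, (low, high) in ranges.items()
    )


ranges = parse_formula_range("O1-4 N3-100 F0")
```

For instance, a composition of `{"C": 10, "H": 13, "N": 5, "O": 4}` satisfies these ranges, while adding a single fluorine or dropping to two nitrogens does not.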

Fragment search

Web form

Significant fragments

Example:

More than 2 benzimidazoles

No piperazine

Anything else

Searching for parts of structure

An outline of the structure or of some characteristic part is known

Looking for variants of molecules

Load the known target and remove the unimportant parts

Perform a subgraph search

Looking for chemical components with similar fragments and localized chemistry

Load the known target and perform a fingerprint search
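
A fingerprint search can be sketched as ranking dictionary entries by the overlap between sets of features. The slides do not say which similarity measure MSDChem uses; the Tanimoto coefficient is a common choice and is assumed here. The function names, the threshold, the codes and the fragment sets are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def fingerprint_search(query, fingerprints, threshold=0.5):
    """Return (score, code) pairs at or above the threshold, best first."""
    scored = ((tanimoto(query, fp), code) for code, fp in fingerprints.items())
    return sorted((hit for hit in scored if hit[0] >= threshold), reverse=True)


# Hypothetical fingerprints: each entry lists the fragments found in a ligand.
library = {
    "AAA": {"benzene", "amide", "hydroxyl"},
    "BBB": {"benzene", "amide"},
    "CCC": {"phosphate"},
}
results = fingerprint_search({"benzene", "amide"}, library)
```

Because only set overlap is compared, this kind of search is cheap compared with matching the full structure, at the cost of ignoring how the fragments are connected.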

Substructure search

Applet to draw diagram

Load and modify existing ligand

May take a couple of minutes
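
A substructure (subgraph) search can be sketched as a backtracking match of query atoms onto target atoms. This is a simplified illustration, not the algorithm behind the MSDChem pages: it matches elements and connectivity only, ignores bond orders and stereo-descriptors, and all names are hypothetical. The worst-case cost grows quickly with molecule size, which is consistent with such searches being slow.

```python
def adjacency(bonds):
    """Build a neighbour map from a list of (atom, atom) bonds."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def has_substructure(q_elems, q_bonds, t_elems, t_bonds):
    """True if the query graph can be embedded in the target graph."""
    q_adj = adjacency(q_bonds)
    t_adj = adjacency(t_bonds)
    q_atoms = list(q_elems)

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return True
        q = q_atoms[len(mapping)]
        used = set(mapping.values())
        for t in t_elems:
            if t in used or t_elems[t] != q_elems[q]:
                continue
            # Every already-mapped neighbour of q must be bonded to t.
            if all(mapping[n] in t_adj.get(t, set())
                   for n in q_adj.get(q, set()) if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]
        return False

    return extend({})


# Toy target: heavy atoms of ethanol, C1-C2-O.
target_elems = {"C1": "C", "C2": "C", "O": "O"}
target_bonds = [("C1", "C2"), ("C2", "O")]
found = has_substructure({"a": "C", "b": "O"}, [("a", "b")],
                         target_elems, target_bonds)
```

In this toy case the C-O query is found in the target, whereas a query containing a nitrogen or three connected carbons would not be.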

Links to the PDB

MSDchem searches only the reference dictionary

But provides links to the PDB entries that include a ligand or a set of ligands

From ligand details pages

And from any query results page

Links to the summary pages for the entries (MSD Atlas pages)

Or instances of the ligands in entries along with their environment and interactions (MSDmotif)

Link to PDB

From any result page

Like a fragment search

Link to PDB entries with such ligands

Link to Binding sites

Details - interactions of these ligands in entries

Statistics - search within results

Ligand index – download

Download of the complete archive

Compressed tar of

Molfiles (SDF)

CML (ChEBI style)

MSDChem XML

Relational database

Just listings

SMILES strings - name
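
Reading a downloaded compressed tar of Molfiles can be sketched with the Python standard library. The archive layout is an assumption: the slides do not give file names or the directory structure, so the member names below are hypothetical and the demo archive is built in memory with toy text instead of real Molfiles.

```python
import io
import tarfile


def read_molfiles(archive_bytes):
    """Return {member name: text} for regular files in a gzip-compressed tar."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                result[member.name] = data.decode("utf-8")
    return result


def make_demo_archive(files):
    """Build a small gzip-compressed tar in memory, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical member names and placeholder contents.
demo_files = {"AAA.mol": "toy molfile text\n", "BBB.mol": "another toy\n"}
loaded = read_molfiles(make_demo_archive(demo_files))
```

For a real download, the same `read_molfiles` function could be given the bytes of the archive file; each returned text would then be passed to whatever Molfile parser is in use.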

Summary

The wwPDB ligand dictionary provides the chemistry of the PDB

The MSDChem backend has been merged in the remediation project

The state of the dictionary has improved

The MSDChem web application provides searching of the dictionary

Name

Formula

Substructure

Fragments - similarity