AMBIT Software for Data Management and (Q)SAR Applications
Nina Jeliazkova
Bulgarian Academy of Sciences, Institute for Parallel Processing, Sofia, Bulgaria. E-mail: [email protected]
Joanna Jaworska
Central Product Safety, Procter and Gamble, Belgium
QSAR2006, 8-12 May, Lyon. AMBIT is available online at http://ambit.acad.bg
Introduction – why AMBIT?
• The limited availability of free, publicly accessible, methodologically transparent software was identified as one of the roadblocks to broader use of in-silico methods (ICCA Workshop in Setubal 2002, OECD)
• Realization that efficient use of existing information on chemicals requires better ways for:
  • Storage: standardized formats, computer-automated verification of structures, capability to store large amounts of data
  • Taking advantage of the rapidly evolving field of data mining and extraction of relevant information
Content
• Overview of AMBIT functional modules
• Technology choice and software capabilities
• Demonstration of the current state
  • Web application
    • Online similarity search
  • Standalone applications
    • Ambit Database Tools: descriptor search, experimental data search, similarity search, Verhaar classification scheme
    • AmbitDiscovery: applicability domain, grouping by different methods
Software overview
• Database
• Search engine: searches by CAS, SMILES, name; substructure search; similarity search (EM9-1a,b, 2,3)
• Data import and export, format conversions (EM9-1,2,3)
• Applicability domain (EM9-1a)
• Similarity assessment (EM9-1b)
AMBIT Database Today
Not restricted to these datasets! Any dataset can be imported! (e.g. DSSTox, AQUIRE, LLNA dataset …)
AMBIT: more about the internals…
• Open source, relying on open standards
• Modular approach
• Standalone and web versions
• Implemented in Java, i.e.
  • Platform independent (same application runs on Windows, Unix, Mac …)
  • Suitable for web applications
• The cheminformatics functionality relies on the open source Java library, the Chemistry Development Kit (http://cdk.sourceforge.net/)
• The software is based on a Relational Database Management System
  • Allows much faster and more convenient access to the data than flat text files.
  • Our choice is the MySQL database (www.mysql.com), which is the most popular open source relational database.
• Chemical Markup Language (CML)
  • Acknowledged method of encoding chemical data in XML
  • Being adopted by a large number of chemical organisations, from government, through commercial to academia.
  • The choice of CML for the internal format makes the database independent of the software which is able to access it, in contrast to some proprietary solutions.
AMBIT: information stored
• Structures, internally stored in (compressed) CML format, allowing transparent and easy storage of 1D, 2D or 3D representations (including mixtures)
• Multiple 3D structures per compound
• Identifiers (SMILES, InChI, CAS or other registry numbers; unlimited number of arbitrary identifiers and synonyms)
• Inventory indicator
• Descriptors (unlimited number of arbitrary descriptors)
• Experimental data (flexible templates for experimental data)
• QSAR models
• Literature references
• Fingerprints and atom environments for fast substructure and similarity search
• Other information generated in order to accelerate specific queries
The complete documentation of the AMBIT Database is available at http://ambit.acad.bg/docs
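The idea of keeping structures as compressed CML inside a relational database, with identifiers in a separate table, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the table and column names are hypothetical and not the AMBIT schema, SQLite (from the Python standard library) stands in for MySQL, zlib stands in for whatever compression AMBIT uses, and the CML fragment is a hand-written toy document for methane.

```python
import sqlite3
import zlib

# Hypothetical, hand-written CML document for methane (illustration only).
CML = (
    '<molecule xmlns="http://www.xml-cml.org/schema" id="m1">'
    '<atomArray><atom id="a1" elementType="C" hydrogenCount="4"/></atomArray>'
    '</molecule>'
)


def open_store():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE structure (id INTEGER PRIMARY KEY, cml BLOB NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE identifier ("
        "structure_id INTEGER REFERENCES structure(id), kind TEXT, value TEXT)"
    )
    return conn


def add_structure(conn, cml, identifiers):
    """Store the CML text compressed, plus any number of identifiers."""
    blob = zlib.compress(cml.encode("utf-8"))
    cur = conn.execute("INSERT INTO structure (cml) VALUES (?)", (blob,))
    sid = cur.lastrowid
    conn.executemany(
        "INSERT INTO identifier VALUES (?, ?, ?)",
        [(sid, kind, value) for kind, value in identifiers.items()],
    )
    return sid


def find_by_identifier(conn, kind, value):
    """Return the decompressed CML for a matching identifier, or None."""
    row = conn.execute(
        "SELECT s.cml FROM structure s "
        "JOIN identifier i ON i.structure_id = s.id "
        "WHERE i.kind = ? AND i.value = ?",
        (kind, value),
    ).fetchone()
    return None if row is None else zlib.decompress(row[0]).decode("utf-8")


conn = open_store()
add_structure(conn, CML, {"CAS": "74-82-8", "SMILES": "C"})
print(find_by_identifier(conn, "CAS", "74-82-8"))
```

Because the stored payload is plain CML, any CML-aware tool could read the decompressed text, which is the independence argument made on the internals slide.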
AMBIT Database schema
• Descriptors Repository
• Compounds Repository
• QSAR models Repository
• Experimental Results Repository
• Users Repository
• Literature References Repository
• Queries
AMBIT: selected functionalities
• Input/output of chemical compounds, descriptors, experimental data and QSAR models (many file formats)
• Search
  • Simple search (CAS, SMILES, chemical name)
  • Descriptor search
  • Experimental data search
  • Substructure and similarity search
• Grouping
  • Verhaar classification scheme
  • Similarity (see J. Jaworska presentation tomorrow)
• QSAR
  • Applicability domain assessment
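Applicability domain assessment in descriptor space can be sketched with two of the approaches named in this talk, descriptor ranges and Euclidean distance to a center. This is only an illustration under stated assumptions, not AMBIT's implementation: the function names are hypothetical, the center is taken to be the training-set centroid, the threshold is taken to be the largest training-set distance to that center, and the descriptor values are made up.

```python
import math


def descriptor_ranges(training):
    """Per-descriptor (min, max) pairs over the training set rows."""
    return [(min(col), max(col)) for col in zip(*training)]


def in_ranges(x, ranges):
    """True if every descriptor of x lies inside its training range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, ranges))


def centroid(training):
    """Mean of each descriptor over the training set (assumed center)."""
    n = len(training)
    return [sum(col) / n for col in zip(*training)]


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_threshold(training, center):
    """Assumed threshold: the largest training-set distance to the center."""
    return max(euclidean(row, center) for row in training)


def in_distance_domain(x, center, threshold):
    """True if x is no farther from the center than the threshold."""
    return euclidean(x, center) <= threshold


# Made-up training set with two descriptors per compound.
training = [(1.0, 100.0), (2.0, 150.0), (3.0, 200.0)]
ranges = descriptor_ranges(training)
center = centroid(training)
threshold = distance_threshold(training, center)

query = (2.5, 120.0)
print(in_ranges(query, ranges), in_distance_domain(query, center, threshold))
```

In this toy data the second descriptor dominates the distance because the descriptors are not scaled; preprocessing such as scaling or PCA would normally come first.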
AMBIT Online – Similarity search
AMBIT Online - Query result
Links to other databases - KEGG
Information about QSAR models
AMBIT Database Tools: standalone application
AMBIT User Interface. Example: search by descriptor ranges
AMBIT Discovery: software for applicability domain and grouping
Methods:
• Descriptor space
  • Ranges
  • Euclidean distance
  • City-block distance
  • Probability density
  • Options: threshold, preprocessing (e.g. PCA), center, more…
• Structural similarity
  • Fingerprints
    • Consensus fingerprint + Tanimoto distance
    • Consensus fingerprint + missing fragments
  • Atom environments
    • Consensus atom environments + Hellinger distance
    • kNN + Tanimoto distance
    • Ranking
Results from several methods can be combined.
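The fingerprint-based options can be made concrete with a small sketch. The Tanimoto coefficient is the standard one, the number of shared "on" bits divided by the number of bits set in either fingerprint. The rest rests on assumptions labeled here and in the comments: fingerprints are represented as Python sets of bit positions, the consensus fingerprint is read as "bits set in at least a given fraction of the training compounds" (the slides do not define it), and the kNN score is read as the mean distance to the k nearest training compounds. The function names are hypothetical and do not come from AMBIT.

```python
def tanimoto_similarity(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def tanimoto_distance(a, b):
    """Distance form used in this sketch: 1 - similarity."""
    return 1.0 - tanimoto_similarity(a, b)


def consensus_fingerprint(fingerprints, min_fraction=0.5):
    """Assumed definition: bits set in at least min_fraction of the set."""
    counts = {}
    for fp in fingerprints:
        for bit in fp:
            counts[bit] = counts.get(bit, 0) + 1
    needed = min_fraction * len(fingerprints)
    return {bit for bit, n in counts.items() if n >= needed}


def knn_distance(query, fingerprints, k=2):
    """Assumed kNN score: mean Tanimoto distance to the k nearest items."""
    nearest = sorted(tanimoto_distance(query, fp) for fp in fingerprints)[:k]
    return sum(nearest) / len(nearest)


# Made-up training fingerprints and query.
training = [{1, 2, 3}, {2, 3, 4}, {2, 3, 5}]
query = {2, 3, 7}

consensus = consensus_fingerprint(training)
print(sorted(consensus))
print(tanimoto_distance(query, consensus))
print(knn_distance(query, training, k=2))
```

A query would then be judged inside or outside the domain by comparing either distance with a threshold, in the same way as for the descriptor-space methods.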
AMBIT Discovery: data visualisation
AMBIT Discovery: results (exported to MS Excel file)
Similarity based on mechanistic understanding
Verhaar H.J.M., Van Leeuwen C., Hermens J.L.M., Classifying Environmental Pollutants. 1: Structure-Activity Relationships for Prediction of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4, pp. 471-491, 1992
Verhaar scheme: 34 rules, 5 classes
• Class 1: Narcosis or baseline toxicity
• Class 2: Less inert compounds
• Class 3: Unspecific reactivity
• Class 4: Compounds and groups of compounds acting by a specific mechanism
• Class 5: Not possible to classify according to these rules
Verhaar scheme implementation
Modular approach
Can be used within:
• AMBIT Database Tools
• As an extension to ToxTree (http://ecb.jrc.it/qsar/toxtree)
Summary
• Many tools have been developed, and we are working on their seamless integration
• Both the standalone and web applications are in beta stage and are being extensively tested
• Synergies with other projects
  • LRI Cefic gold standard BCF database will be stored in AMBIT
  • LRI Cefic biotransformation database will be able to communicate with AMBIT BCF
  • ECB Cramer rules software for TTC (human health): ToxTree
  • Fraunhofer Institute subchronic toxicity database (human health)
• Approaches to similarity assessment will be further extended and tested in the context of category development / read-across (ECB funded project)
• Open source software lowers the user barrier, facilitates dissemination activities and enables the reproducibility of models and results
Acknowledgment
This work is funded by CEFIC LRI EEM-9: Building blocks for a future (Q)SAR decision support system: databases, applicability domain and structure conversions
The Chemistry Development Kit (http://cdk.sourceforge.net)
• CDK is a freely available open source Java library for structural chemo- and bioinformatics.
• Originated in, and is hosted by, the Research Group for Molecular Informatics at Cologne University’s Bioinformatics Center.
• Maintained and enhanced by more than 20 developers from both academic and industrial institutions all over the world.
• Used in more than 10 different academic and industrial projects worldwide.
• Provides methods for many common tasks in molecular informatics:
  • SMILES parsing and generation
  • Substructure searching
  • 2D and 3D rendering of chemical structures
  • I/O routines (format conversions)
  • 3D builder
  • QSAR module, etc.
Thank you!
Questions?