CEFIC Long-range Research Initiative, CEFIC LRI Project EEM9.4-IC
Workshop on CEFIC LRI Project EEM9.4
LRI AMBIT with IUCLID6 support and extended search capabilities
IUCLID Substance Data
Nikolay Kochev
Ideaconsult Ltd.
Sofia, Bulgaria
Chemical structure vs. Substance
A chemical structure describes a well-defined molecule.
Chemicals synthesized in practice are not pure substances; they are mixtures of several components. A real substance therefore cannot be associated with a single unique structure. In contrast, each component (i.e. constituent, impurity or additive) can be clearly characterized by a defined structure.
Under REACH, the concept of a substance is clearly defined. This definition is implemented in the IUCLID database.
Public, IUCLID Substance Data2
1,2-dimethoxyethane
Substances under REACH
Under REACH, a chemical substance is composed of:
Constituents (n >= 1)
Impurities (n >= 0)
Additives (n >= 0)
Under REACH, a chemical substance can have several compositions, e.g. crude, distilled, etc.
Under REACH, the type of a chemical substance can be:
Either mono-constituent (a substance, defined by its composition, in which one main constituent is present to at least 80% (w/w)),
or multi-constituent (a substance, defined by its composition, in which more than one main constituent is present in a concentration >= 10% (w/w) and < 80% (w/w)),
or UVCB (substance of Unknown or Variable composition, Complex reaction products or Biological materials).
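The type rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted on the slide: the `Component` class, the `classify` function and the `uvcb` flag are hypothetical names, and in practice the substance type is declared by the registrant rather than computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    role: str          # "constituent", "impurity" or "additive"
    percent_ww: float  # concentration in % (w/w)


def classify(components: List[Component], uvcb: bool = False) -> str:
    """Apply the 80% / 10% thresholds to the constituents of one composition."""
    if uvcb:
        # UVCB status cannot be derived from concentrations; it is given.
        return "UVCB"
    constituents = [c for c in components if c.role == "constituent"]
    if not constituents:
        raise ValueError("a substance needs at least one constituent (n >= 1)")
    if any(c.percent_ww >= 80.0 for c in constituents):
        return "mono-constituent"
    main = [c for c in constituents if 10.0 <= c.percent_ww < 80.0]
    if len(main) > 1:
        return "multi-constituent"
    return "undetermined"


# Placeholder components "A", "B", "C" (not real data).
mixture = [
    Component("A", "constituent", 50.0),
    Component("B", "constituent", 45.0),
    Component("C", "impurity", 5.0),
]
print(classify(mixture))
```

Impurities and additives are ignored by the type decision; only constituents count towards the thresholds.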
REACH substance definition implemented in IUCLID
Example: mono-constituent substance
Three different compositions
REACH substance definition implemented in IUCLID
Example: UVCB, N,N-dimethyl-C12-14-(even numbered)-alkyl-1-amines
REACH substance definition implemented in IUCLID
Example: multi-constituent substance
The substance has 3 constituents and 3 impurities, characterized by different structures
IUCLID6 support in AMBIT
• Given: a completely new XML schema for all objects
  • 372 schema files, 111 endpoint study record files
  • Different approach to linking between objects (compared to IUCLID5)
• Implementation
  • Java classes generated from the XML schema (via JAXB)
  • AMBIT code converts the generated classes to the internal data model so that they can be stored in the database
  • Existing code is used for writing into the database
  • The existing UI is used to show the data
• Transparent from the user's point of view: select .i6z or .i5z
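The "transparent" file handling above can be pictured as a dispatch on the file extension. The sketch below is an assumption about the general idea, not AMBIT's actual code (which is Java); `read_i5z`, `read_i6z` and `import_file` are hypothetical placeholder names.

```python
from pathlib import Path


def read_i5z(path: Path) -> str:
    # Placeholder for the IUCLID5 import path (hypothetical).
    return f"IUCLID5 import of {path.name}"


def read_i6z(path: Path) -> str:
    # Placeholder for the IUCLID6 import path (hypothetical).
    return f"IUCLID6 import of {path.name}"


# One entry per supported archive type; the user only picks a file.
READERS = {".i5z": read_i5z, ".i6z": read_i6z}


def import_file(filename: str) -> str:
    path = Path(filename)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    return reader(path)


print(import_file("dossier.i6z"))
```

Both readers would feed the same internal data model, which is why the existing database and UI code can be reused unchanged.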
IUCLID6 support in AMBIT
• Files (both IUCLID5 and IUCLID6)
  • Transparent from the user's point of view: select .i6z or .i5z
• Web services
  • IUCLID5 (SOAP) and IUCLID6 (REST)
• All endpoint study records supported previously (and more)
• Potential to support all endpoint study records
• The “Test material” is no longer a checkbox
  • Each study record links to a test material (a substance, identified by UUID)
• Substance and compositions
• Reference substances
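The link from a study record to its test material can be sketched as a lookup by UUID. The class and field names below are hypothetical and do not reflect the IUCLID6 schema or the AMBIT data model; the UUID is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Substance:
    uuid: str
    name: str


@dataclass
class StudyRecord:
    endpoint: str
    test_material_uuid: Optional[str]  # link instead of a checkbox


def resolve_test_material(
    record: StudyRecord, substances: Dict[str, Substance]
) -> Optional[Substance]:
    """Return the linked substance, or None if the link is absent or unknown."""
    if record.test_material_uuid is None:
        return None
    return substances.get(record.test_material_uuid)


# Placeholder UUID, for illustration only.
known = {"uuid-0001": Substance("uuid-0001", "1,2-dimethoxyethane")}
record = StudyRecord("melting point", "uuid-0001")
print(resolve_test_material(record, known).name)
```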
IUCLID6 new composition types
• Legal entity composition of the substance (default)
• Boundary composition of the substance
• Composition of the substance generated upon use
• Other
• An IUCLID5 composition is migrated to “Legal entity composition”
• The composition record includes study information
• Introduced mostly because of nanomaterials, as a REACH substance is defined by its main constituent
  • (e.g. all TiO2 materials, regardless of their coatings, are one substance)
• All different nanoforms are described as different compositions of the same substance
  • They differ in shape, size, etc. (i.e. in their characterisation)
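One way to picture the composition types above is a substance that owns several typed compositions. This is a minimal sketch with hypothetical class names; the nanoform labels and shape values are invented placeholders, not data from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class CompositionType(Enum):
    LEGAL_ENTITY = "legal entity composition of the substance"
    BOUNDARY = "boundary composition of the substance"
    GENERATED_UPON_USE = "composition of the substance generated upon use"
    OTHER = "other"


@dataclass
class Composition:
    name: str
    kind: CompositionType = CompositionType.LEGAL_ENTITY  # the default type
    characterisation: Dict[str, str] = field(default_factory=dict)


@dataclass
class Substance:
    name: str
    compositions: List[Composition] = field(default_factory=list)


def migrate_iuclid5_composition(name: str) -> Composition:
    # IUCLID5 compositions carry no type, so they become "legal entity".
    return Composition(name=name, kind=CompositionType.LEGAL_ENTITY)


# One substance, two nanoforms described as separate compositions.
tio2 = Substance("TiO2")
tio2.compositions.append(
    Composition("nanoform A (placeholder)", characterisation={"shape": "spherical"})
)
tio2.compositions.append(
    Composition("nanoform B (placeholder)", characterisation={"shape": "rod"})
)
print(len(tio2.compositions))
```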
Detailed information: Composition (1)
Every constituent, impurity and additive is described in detail by a “Reference substance” with several identifiers
Detailed information: Composition (2)
The structure associated with the reference substance is stored in IUCLID only as a picture, which is normally not searchable.
InChI notation could be used for structure identification.
SMILES notation could be used for structure identification only if unique (canonical) SMILES strings are used both on data import and in the query definition.
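The caveat about SMILES can be illustrated with plain string comparison. Ethanol can be written as either "CCO" or "OCC", while its standard InChI is a single string. The record layout and function names below are hypothetical; no chemistry toolkit is used, so nothing is canonicalised here.

```python
# Standard InChI of ethanol; there is exactly one such string per structure.
ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

# A stored reference substance record (hypothetical layout).
stored = {"name": "ethanol", "smiles": "CCO", "inchi": ETHANOL_INCHI}


def matches_smiles(record: dict, query: str) -> bool:
    # Text comparison only: works if both sides use the same unique SMILES.
    return record["smiles"] == query


def matches_inchi(record: dict, query: str) -> bool:
    return record["inchi"] == query


# "OCC" also denotes ethanol, but a text match does not find it.
print(matches_smiles(stored, "OCC"))
print(matches_smiles(stored, "CCO"))
print(matches_inchi(stored, ETHANOL_INCHI))
```

A structure-aware system avoids the problem by parsing the query into a molecule before comparing, instead of comparing strings.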
Full structure support in AMBIT for all substance components
Various chemoinformatics approaches for handling chemical structures
Motivation to transfer IUCLID data to Ambit chemoinformatic system
IUCLID limitation:
IUCLID allows queries on the substance data but has no functionality for searching chemical structures (exact, similarity or substructure search). Only queries using the SMILES and InChI notations are possible.
In addition, IUCLID describes endpoints in great detail, so extracting the key information relevant for substance evaluation is not convenient.
The IUCLID substance composition and IUCLID endpoint data can be transferred into the Ambit system and updated there. During this process, structures are assigned automatically to the constituents, impurities and additives of the substance.
In contrast to IUCLID, Ambit allows both structure and data search.
Motivation to transfer IUCLID data to Ambit chemoinformatic system
Ambit advantages:
Chemical structure searching: exact, similarity and substructure search;
Read-across workflow;
Flexible faceted and free text searching for structure and data;
Export to various data formats preferred by industry and scientific community;
Modelling, data analysis and visualization utilities;
Support for chemical substances including nanomaterials;
Programmatic access via REST API;
User-friendly web interface.
Extracting data from IUCLID
Substances that should be transferred to AMBIT have to be flagged in IUCLID.
Company-specific flags can be added in the IUCLID chapter “1.3 Identifiers”.
Examples of company-specific flags:
TRA number to identify trade products in the SAP system
Substances will be transferred to Ambit (CompTox – Ambit Transfer)
All flags will be transferred to Ambit and are searchable in Ambit
Import criteria to specify which studies will be imported into AMBIT
Where can I find these fields in IUCLID?
In each Endpoint study record, the relevant fields are located in:
Administrative Data
Data source
Public, LRI Project EEM9.3, IUCLID Substance Data18
Why is a selection reasonable?
Only high-quality study records of the IUCLID substance itself should be imported into AMBIT. We therefore recommend selecting only:
Key studies and supporting studies (Adequacy of study / Purpose flag); records flagged as weight of evidence or disregarded study are not considered high-quality information.
Reliability 1 and 2 (Reliability); 3 (not reliable) and 4 (not assignable) are not helpful for characterizing the relevant endpoint information.
Experimental results (Study result type); read-across information should not be selected, because this information is transferred to AMBIT with the original IUCLID substance.
Study reports, publications and review articles (Reference type); secondary sources and grey literature should not be imported.
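The recommended selection can be written down as a simple record filter. This is a sketch of the criteria listed above, not AMBIT's import code; the field names, the lower-case value spellings and the `should_import` function are assumptions made for illustration.

```python
from dataclasses import dataclass

# Accepted values, following the recommendation above (spelling is assumed).
ACCEPTED_PURPOSE = {"key study", "supporting study"}
ACCEPTED_RELIABILITY = {1, 2}
ACCEPTED_RESULT_TYPE = {"experimental result"}
ACCEPTED_REFERENCE_TYPE = {"study report", "publication", "review article"}


@dataclass
class StudyRecord:
    purpose_flag: str
    reliability: int      # 1..4, where 3 = not reliable, 4 = not assignable
    result_type: str
    reference_type: str


def should_import(record: StudyRecord) -> bool:
    """A record is imported only if it passes all four criteria."""
    return (
        record.purpose_flag in ACCEPTED_PURPOSE
        and record.reliability in ACCEPTED_RELIABILITY
        and record.result_type in ACCEPTED_RESULT_TYPE
        and record.reference_type in ACCEPTED_REFERENCE_TYPE
    )


records = [
    StudyRecord("key study", 1, "experimental result", "study report"),
    StudyRecord("weight of evidence", 2, "experimental result", "publication"),
    StudyRecord("supporting study", 2, "read-across", "publication"),
]
selected = [r for r in records if should_import(r)]
print(len(selected))
```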
Import IUCLID files in AMBIT
In Ambit some import filters can be selected
Retrieve substances in AMBIT from IUCLID server