InstantJChem: a flexible chemical database system

Post on 08-Jan-2016



InstantJChem: a flexible chemical database system

G. Marcou, D. Horvath

Laboratoire d’infochimie, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg

Introduction

The goal is to present InstantJChem for the storage and manipulation of chemical information.

1. General presentation
2. Database search
3. Creation of a database from scratch

What is a database?

• A database stores data on a precise subject in an ordered form.

• A relational database stores information in tables that reference one another.

• A relational database management system (RDBMS) is software that manages relational databases.

InstantJChem is not a database and is not an RDBMS.
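As a minimal sketch of what "tables that reference one another" means, the following Python example uses the standard sqlite3 module as a stand-in RDBMS. The table and column names (compound, measurement) are illustrative assumptions; they do not come from InstantJChem or from the slides.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compound (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE measurement (
        id          INTEGER PRIMARY KEY,
        compound_id INTEGER NOT NULL REFERENCES compound(id),
        property    TEXT NOT NULL,
        value       REAL NOT NULL
    );
""")
conn.execute("INSERT INTO compound VALUES (1, 'benzene')")
conn.execute("INSERT INTO compound VALUES (2, 'pyridine')")
conn.execute("INSERT INTO measurement VALUES (1, 1, 'MolWeight', 78.11)")
conn.execute("INSERT INTO measurement VALUES (2, 2, 'MolWeight', 79.10)")

# The inter-reference (compound_id -> compound.id) lets the RDBMS join the tables.
rows = conn.execute("""
    SELECT c.name, m.property, m.value
    FROM compound AS c
    JOIN measurement AS m ON m.compound_id = c.id
    ORDER BY c.id
""").fetchall()
print(rows)
```

The join is carried out by the RDBMS itself; a tool such as InstantJChem sits on top of this layer rather than replacing it.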

What is InstantJChem?

InstantJChem is a user-friendly interface between an RDBMS, chemical information and the user.

[Diagram: InstantJChem connecting the User, the RDBMS and Chemical Information]

Key concepts of InstantJChem

• Projects

• Schema

• Databases and Tables

• Entities

• Data Trees

• Views

Exercise 1

Create a new project named IJCExercises…

Key concept: Project

A project contains resources and connections to one or more databases.

Exercise 1

…and import the file SC100.SDF into it…
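To make the import step more concrete, here is a hedged sketch of how an SD file is laid out: each record is a molfile block, followed by optional "> <FIELD>" data items, and closed by a "$$$$" line. The reader below is plain Python, not InstantJChem's importer, and the two-record sample is hand-written for illustration; it is not taken from SC100.SDF.

```python
def read_sdf_records(lines):
    """Split SD file lines into (molblock, data_fields) pairs."""
    records, mol_lines, data = [], [], {}
    field, in_data = None, False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("$$$$"):
            # End of record: store it and reset the state.
            records.append(("\n".join(mol_lines), data))
            mol_lines, data, field, in_data = [], {}, None, False
        elif line.startswith(">") and "<" in line:
            # Data header such as "> <NAME>": start collecting a new field.
            field = line[line.index("<") + 1 : line.rindex(">")]
            data[field] = []
            in_data = True
        elif in_data:
            if line.strip() == "":
                field = None  # a blank line closes the current field
            elif field is not None:
                data[field].append(line)
        else:
            mol_lines.append(line)  # still inside the molfile block
    return [(mol, {k: "\n".join(v) for k, v in d.items()}) for mol, d in records]


# Hand-written sample with two single-atom records (hypothetical data).
SAMPLE = """methane
  hand-written

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
methane

$$$$
water
  hand-written

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
water

$$$$
"""

records = read_sdf_records(SAMPLE.splitlines())
names = [fields["NAME"] for _, fields in records]
print(len(records), names)
```

On import, each "> <FIELD>" item would typically become a column of the resulting table, and the molfile block would become the structure.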

Key concept: Schema

A schema (database) contains the connection to a database and special tables (JChemProperties).

Key concept: Database and Tables

Databases and tables are managed by the RDBMS.

They are where the information is actually stored.

What can be stored

Type               Description

Standard table
  Integer          Long integer: 2^32 = 4294967296
  Text             User can specify widths of text fields as large as needed.
  Real             Double-precision real
  Date             Stores dates.
  Boolean          Value is True or False
  List (Standard)  Stores a list of database items

JChem table
  Chemical terms   A list of functions evaluated on chemical structures: logD, pKa, tautomers, ...
  Structure        Chemical structure, automatically created with a JChem table

Key concept: Entities

An entity is a representation of data.

It is a single interface to conceptually different types of tables (Standard, Chemical, SQL, Extractions, etc.).

Key concept: Data Trees

A data tree is a collection of entities and views.

It organizes information using a hierarchy (parent-child relationships between entities).

Exercise 1

…Customize a browser for it.

Key concept: Views

A view is an interface to data.

For simple data, a spreadsheet view is suitable. For complex relational data, a form is required.

Exercise 2

In the SC100 database, search for fluorobenzene-containing and pyridine-containing molecules. Use Substructure or Similarity search.

Substructure search: 20 hits
Similarity search: 0 hits

Substructure search: 14 hits
Similarity search: 0 hits

Similarity search uses Chemical Hashed Fingerprints defined at database creation.

Chemical Hashed Fingerprints (CHF)

• Pattern Length: number of bonds of a pattern

• Fingerprint Length: total number of bits to store the fingerprint

• Bits per pattern: number of bits each pattern switches on

Efficient annotation to accelerate structure search

www.chemaxon.com
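The three parameters above can be illustrated with a toy hashed fingerprint in Python. This is a sketch under stated assumptions, not ChemAxon's actual algorithm: the hash function (SHA-256), the representation of a pattern as a tuple of atom symbols along a path, the hand-listed paths and the default parameter values are all chosen for illustration only.

```python
import hashlib


def hashed_fingerprint(paths, pattern_length=7, fingerprint_length=512,
                       bits_per_pattern=2):
    """Return the set of bit positions switched on by the given atom paths."""
    bits = set()
    for path in paths:
        # A path of n atoms has n - 1 bonds; skip patterns that are too long.
        if len(path) - 1 > pattern_length:
            continue
        key = "-".join(path)
        for k in range(bits_per_pattern):
            digest = hashlib.sha256(f"{k}:{key}".encode()).digest()
            bits.add(int.from_bytes(digest[:8], "big") % fingerprint_length)
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


# Hypothetical, hand-listed paths; a real system enumerates them from the structure.
target_paths = [("C",), ("N",), ("F",), ("C", "C"), ("C", "N"),
                ("C", "F"), ("C", "C", "N")]
query_paths = [("C",), ("F",), ("C", "F")]

target_fp = hashed_fingerprint(target_paths)
query_fp = hashed_fingerprint(query_paths)

# Screening: a target can only contain the query if it has every query bit set.
may_match = query_fp <= target_fp
print(may_match, round(tanimoto(query_fp, target_fp), 2))
```

The subset test is only a necessary condition, so it serves as a fast pre-filter before an exact substructure match. A small query fragment also sets far fewer bits than a full molecule, which is one plausible reason a similarity search can return no hits where a substructure search returns many.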

Exercise 3

Combine molecules 25 and 89 into a pseudo-molecule to perform a superstructure query.

Exercise 4

Use compound 46 as a Full query and as a Full fragment query to search the database. Repeat after removing the bromide from the query.

Structure Searches

www.chemaxon.com

Exercise 5

Search for benzene-containing compounds whose name contains “pyrimidin” and which are annotated as “Good” for aqueous solubility.

Exercise 6

Search for compounds with at least one aromatic ring containing at least one nitrogen atom.

Exercise 7

Search for compounds with MolWeight > 200 that do not contain a benzene ring.

Exercise 8

Search for compounds with MolWeight > 200, then for compounds without a benzene ring, and search for the union of the two hit lists.
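The difference between combining conditions in one query (Exercise 7) and combining hit lists afterwards (Exercise 8) can be sketched with Python sets. The four compounds and their property values below are invented for illustration; they are not records from SC100.

```python
# Hypothetical compounds: id -> properties (values are made up).
compounds = {
    1: {"mol_weight": 250.3, "has_benzene": True},
    2: {"mol_weight": 180.2, "has_benzene": False},
    3: {"mol_weight": 310.4, "has_benzene": False},
    4: {"mol_weight": 150.1, "has_benzene": True},
}

# Two independent hit lists, one per query.
heavy = {cid for cid, c in compounds.items() if c["mol_weight"] > 200}
no_benzene = {cid for cid, c in compounds.items() if not c["has_benzene"]}

# Exercise 7 style: both conditions must hold (intersection).
both = heavy & no_benzene

# Exercise 8 style: union of the two hit lists.
either = heavy | no_benzene

print(sorted(both), sorted(either))
```

The union is always at least as large as the intersection, so the two exercises are expected to return different numbers of hits.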

Exercise 9

Search for compounds possessing more than 4 microspecies at pH=4.0…

Exercise 9

…Export your hit list.

Exercise 10

Import the file ISICCRsm.RDF into your project…

Exercise 10

…Create a browser for this database.

Exercise 11

Search for reactions that include an imidazole ring in their reactants, then in their products.

Exercise 12

Add to your schema a new data tree and structure entity named AlkanBoilingPoint…

Exercise 12

…and add a floating-point field named BoilingPoint.

Exercise 13

Add the following data to the AlkanBoilingPoint entity.

Exercise 14

Add a new date field named Date to the AlkanBoilingPoint entity and fill it in.
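At the RDBMS level, Exercises 12 to 14 amount to creating a table, inserting rows and adding a column afterwards. The sketch below shows this with plain SQLite from Python rather than through InstantJChem. The structure column holding SMILES strings, the three rows (approximate literature boiling points in °C) and the date value are assumptions for illustration; they are not the data shown on the original slide.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")

# Exercise 12 analogue: a table with a structure column and a floating-point field.
conn.execute("""
    CREATE TABLE AlkanBoilingPoint (
        id           INTEGER PRIMARY KEY,
        structure    TEXT NOT NULL,   -- SMILES string (assumption)
        BoilingPoint REAL
    )
""")

# Exercise 13 analogue: placeholder rows, NOT the data from the slides.
placeholder_rows = [("C", -161.5), ("CC", -88.6), ("CCC", -42.1)]
conn.executemany(
    "INSERT INTO AlkanBoilingPoint (structure, BoilingPoint) VALUES (?, ?)",
    placeholder_rows,
)

# Exercise 14 analogue: add a date field to the existing table, then fill it.
conn.execute('ALTER TABLE AlkanBoilingPoint ADD COLUMN "Date" TEXT')
conn.execute(
    'UPDATE AlkanBoilingPoint SET "Date" = ?',
    (date(2016, 1, 8).isoformat(),),  # arbitrary example date
)

result = conn.execute(
    'SELECT structure, BoilingPoint, "Date" FROM AlkanBoilingPoint ORDER BY id'
).fetchall()
print(result)
```

Adding the field after the rows exist leaves it empty until it is filled, which is why Exercise 14 asks for both steps.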

Exercise 15

Add a calculated LogP value to the AlkanBoilingPoint entity using a Chemical terms field.

Summary

• Create a project and schema

• Import data

• Search by substructure, superstructure, similarity, and exact match

• Search by keyword

• Combine queries and result lists

• Export query results

• Create a new database

Conclusion

• InstantJChem is a chemoinformatics layer above a standard DBMS (SGDB in the original French).

• It provides many more chemoinformatics services (database overlap, QSPR modeling, plots, enumeration, scripting).

[Diagram: InstantJChem and the DBMS]
