
Post on 25-Dec-2015


Workflow

The software “CORALSEA” is a tool for building quantitative structure–property/activity relationships (QSPRs/QSARs).

CORALSEA represents the molecular structure by SMILES (simplified molecular input-line entry system).

For details, please see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

For this demo of CORALSEA we use our model from the article “THE DEFINITION OF THE MOLECULAR STRUCTURE FOR POTENTIAL ANTI-MALARIA AGENTS BY THE MONTE CARLO METHOD”, Struct. Chem. 2013; 24:1369–1381.

You can develop a better model, but for now please follow our suggestions.

The first step is to prepare the SMILES file, which is the input for CORALSEA:

+1 COc1ccc2c(c1)NC(C)=C(CCCCCCC)C2=O 7.332
+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903
+3 O=C1c2ccccc2NC(C)=C1CCCCCCC 6.979
+4 O=C1c2ccccc2NC(C)=C1CCCCCCCCC 7.400
#5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652
-6 O=C1c3ccccc3NC(C)=C1c2ccccc2 6.270
+7 O=C2c3ccccc3NC(C)=C2Cc1ccccc1 5.207
+8 O=C1c2ccccc2NC(C)=C1Br 7.110
-9 O=C1c2ccccc2NC(C)=C1\C=C\CCCCCCC 7.824
+10 C=C(CCCCCCC)C=1C(=O)c2ccccc2NC=1C 7.472
+12 O=C2c3ccccc3NC(C)=C2/C=C/c1ccccc1 5.827
+13 COc1ccc2NC(C)=C(Br)C(=O)c2c1 5.934
-14 Cc1ccc2NC(C)=C(Br)C(=O)c2c1 6.583
#15 Brc1ccc2NC(C)=C(Br)C(=O)c2c1 6.470
+17 Fc1ccc2NC(C)=C(Br)C(=O)c2c1 6.903
+18 Clc1ccc2NC(C)=C(C#CCCCC)C(=O)c2c1 4.336
#19 COc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.675
-21 COc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 5.859
-22 COc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.295
-23 COc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 6.570
+24 COc3cccc1c3NC(C)=C(C1=O)c2ccccc2 5.779
-25 Clc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.279
#26 Clc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 5.485
#28 Clc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.324
-29 Clc1ccc2NC(C)=C(C(=O)c2c1)c3ccccc3 6.110
-30 Clc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 5.731
-31 Clc1ccc2NC(C)=C(C(=O)c2c1Cl)c3ccccc3 5.493
#33 Clc1cc2NC(C)=C(C(=O)c2c(Cl)c1)c3ccccc3 5.464
#34 COc1ccc3c(c1)C(=O)C(Cc2ccccc2)=C(C)N3C 5.094
+35 COc1ccc3c(c1)N(C)C(C)=C(Cc2ccccc2)C3=O 5.106
+36 Fc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.081
+37 Clc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.815
+38 Brc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.602
#39 Fc1cc2c(cc1OC)NC(C)=C(CC)C2=O 6.793
+41 Brc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.440
-44 Clc1cc2c(cc1OC)NC(C)=C(C2=O)C3CCCCC3 6.401
+45 Clc1cc3c(cc1OC)NC(C)=C(Cc2ccccc2)C3=O 7.164
-46 Clc1cc2c(cc1OC)NC(C)=C(C)C2=O 7.564
#47 CC(C)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 6.712
+48 CC(CC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.199
+49 Clc1cc2c(cc1OC)NC(C)=CC2=O 5.731
-50 Clc1cc2c(cc1OC)NC(C)=C(C#CCCCC)C2=O 5.376
#53 CC(C)(C)OC(=O)/C=C/C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.271

Each compound should be represented by (1) the type, one of [+, -, #]; (2) the ID, which can be a CAS (Chemical Abstracts Service) number or any other number; (3) the SMILES; and (4) the endpoint value.

“+” indicates the sub-training set; “-” indicates the calibration set; “#” indicates the test set.

The sub-training set acts as the developer of the model; the calibration set acts as the critic of the model; the test set acts as the estimator of the model.
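The record format described above can be checked with a few lines of Python before the file is given to CORALSEA. This is only a minimal sketch: it assumes the fields are separated by whitespace, as they appear on the slide, and the names `SET_NAMES`, `parse_record` and `read_split_file` are hypothetical helpers, not part of CORALSEA.

```python
# Hypothetical helper names; the whitespace-separated layout is an assumption
# based on how the records are shown on the slide.
SET_NAMES = {"+": "sub-training", "-": "calibration", "#": "test"}


def parse_record(line):
    """Split one record 'typeID SMILES endpoint' into its four parts."""
    head, smiles, value = line.split()
    # The first character of the first field is the type, the rest is the ID.
    return SET_NAMES[head[0]], head[1:], smiles, float(value)


def read_split_file(path):
    """Parse every non-empty line of a file such as MyFile.txt."""
    with open(path, encoding="utf-8") as handle:
        return [parse_record(line) for line in handle if line.strip()]


record = parse_record("+8 O=C1c2ccccc2NC(C)=C1Br 7.110")
print(record)
```

A line with an unknown type character raises `KeyError`, and a line without exactly three fields raises `ValueError`, so malformed records are noticed early.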

MyFile.txt

It is a good idea to reserve some substances as an "invisible" validation set for the final estimation of the model:

10

*11 O=C1c2ccccc2NC(C)=C1C\C=C\CCCCCC 6.728

*16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900

*20 COc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 4.624

*27 Clc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 4.805

*32 Clc1cc2c(cc1Cl)NC(C)=C(C2=O)c3ccccc3 6.456

*40 Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.559

*42 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCC)C2=O 8.530

*43 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCCCC)C2=O 8.779

*51 C=C(CCCCC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.830

*52 Clc1cc2c(cc1OC)NC(C)=C(\C=C\CCCCC)C2=O 7.975

The format of the file for this validation is the following:

(1) the number of compounds; (2) the list of compounds in the above-mentioned format: type, ID, SMILES, endpoint value.

MyInput.txt
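The validation file layout (a count line followed by the records) can be verified in the same way. The sketch below assumes whitespace-separated fields and a leading type character such as “*”, as displayed on the slide; `parse_validation` is a hypothetical helper name.

```python
def parse_validation(text):
    """Parse the text of a validation file: a count line, then the records."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    expected = int(lines[0])  # first line: the number of compounds
    records = []
    for line in lines[1:]:
        head, smiles, value = line.split()
        # head is the type character followed by the ID, e.g. '*16'
        records.append((head[0], head[1:], smiles, float(value)))
    if len(records) != expected:
        raise ValueError(f"header says {expected} compounds, found {len(records)}")
    return records


sample = """2
*16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900
*40 Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.559
"""
print(parse_validation(sample))
```

Checking the count against the number of records catches a common editing mistake: adding or removing a compound without updating the first line.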

To start your work, download CORALSEA.zip from www.insilico.eu/coral. When this is done, place the folder "CORALSEA" on your computer:

…and place your data (i.e. “MyTRNCLBTST.txt”) in the folder “MyCORALSEA”:

The contents of MyCORALSEA are the following:

To carry out the QSPR/QSAR analysis of data for a CLASSIFICATION MODEL, do the following:

(i) Insert “#TRNCLBTST-1.txt” in the folder;

(ii) Insert “#Input-1.txt” in the folder;

(iii) Click CORALSEA.exe.

“#TRNCLBTST.txt” is the file which contains the training (TRN), calibration (CLB), and test (TST) sets.

“#Input.txt” contains the data which are not visible while the model is being built.

The following appears on your screen:

Click the button “Load method”…

The following appears on your screen:

Insert the name “#TRNCLBTST-1.txt” in the text box


The following appears on your screen:

Click “SAVE SYSTEM”

The following appears on your screen:

Restart the program and click “Load system”

The following appears on your screen:

Click “OK”

The following appears on your screen:

This plot relates to the external “invisible” validation set

The following appears on your screen:

The file “#Output-1.txt” contains the statistical characteristics for the validation set (“#Output-1.txt” is placed in the folder “Model”)

To carry out the QSPR/QSAR analysis of data for a REGRESSION MODEL, do the following:

(i) Insert “#TRNCLBTST.txt” in the folder;

(ii) Insert “#Input-1.txt” in the folder;

(iii) Click CORALSEA.exe.

“#TRNCLBTST.txt” is the file which contains the training (TRN), calibration (CLB), and test (TST) sets.

“#Input.txt” contains the data which are not visible while the model is being built.

The following appears on your screen:

Insert the name “#TRNCLBTST-1.txt” in the text box. After this, please select “Classic Scheme” or “Balance of Correlation” for your QSPR/QSAR investigation


The following appears on your screen:

Two actions: (1) define the method and (2) save the method


The following appears on your screen:

You can involve graph invariants in addition to the SMILES attributes


The following appears on your screen:

You can use the “classic scheme”, the balance of correlations, and the Ideal slopes C1, C1’

The following appears on your screen:

You can choose your mode, e.g. (1) define Dstart=0.25; (2) set Nepoch=20; after this you must (3) click “Save method”, otherwise the method remains the same


The following appears on your screen:

Click “Search for preferable model (T*,N*)”

The following appears on your screen:

The program will carry out the Monte Carlo optimization with various values of the threshold and of the number of epochs. When the calculation is completed, the preferable values of the threshold and of the number of epochs can be found in the file “Search/BestMDL.txt”.

The contents of the file “Search/BestMDL.txt” will be approximately the following:

One can see that the preferable threshold (T*) is 2, and the preferable number of epochs (N*) is 15. One can use this information to build a robust model.
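The idea of trying various thresholds and numbers of epochs and keeping the preferable pair can be illustrated by a plain grid search. This sketch does not reproduce what CORALSEA does internally: the criterion used to rank the pairs is an assumption, `search_preferable` is a hypothetical helper, and `toy_score` is an artificial stand-in for a real model-quality measure.

```python
from itertools import product


def search_preferable(thresholds, epochs, score):
    """Return the pair (T*, N*) with the highest score(T, N) on the grid."""
    best = None
    for t, n in product(thresholds, epochs):
        s = score(t, n)
        if best is None or s > best[0]:
            best = (s, t, n)
    return best[1], best[2]


def toy_score(t, n):
    # Artificial stand-in for a real quality measure (assumption):
    # it is constructed so that the maximum lies at T=2, N=15.
    return -((t - 2) ** 2) - ((n - 15) ** 2) / 100.0


print(search_preferable(range(1, 6), range(1, 21), toy_score))
```

In a real study the score would come from running the optimization for each pair and evaluating the model on compounds that were not used to fit it.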

An attempt to build a robust model…

Create the folder “MyCORALSEA-T2-N15” (a copy of “MyCORALSEA”)
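If many (T*, N*) combinations are tried, the copy of the working folder can be scripted. The following is a small sketch using only the Python standard library; `copy_project` is a hypothetical helper, and the path in the commented example is an assumption about where CORALSEA was unpacked.

```python
import shutil
from pathlib import Path


def copy_project(source, threshold, n_epochs):
    """Copy a working folder to '<name>-T<threshold>-N<n_epochs>' next to it."""
    source = Path(source)
    target = source.with_name(f"{source.name}-T{threshold}-N{n_epochs}")
    # copytree raises FileExistsError if the target folder already exists,
    # so an earlier model folder is never silently overwritten.
    shutil.copytree(source, target)
    return target


# Example (the path is an assumption):
# copy_project("CORALSEA/MyCORALSEA", 2, 15)  # creates CORALSEA/MyCORALSEA-T2-N15
```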

Run CORALSEA.exe in this folder, “MyCORALSEA-T2-N15”

Click “Load method”

The following appears on your screen:

(1) Insert Nepoch=15, (2) Click “Building up preferable model (T*,N*)”

T*=2, N*=15

(3) Insert Threshold=2, and (4) Click “Continue”


The following appears on your screen:

Click “Yes”

The program will gradually calculate the model:

When the model is ready, the screen will look as follows:

Click “Save system”

The folder “Model” contains the parameters of the QSPR/QSAR model

The file “#Output-1.txt” contains the statistics for the invisible validation set

When the model is ready, the screen will look as follows:

Click “Load system”

The following will appear on the screen:

(1) Insert the name “MyInput.txt” instead of “#Input-1.txt”

(2) Click “Start of DCW and Endpoint calculation for SMILES input file”


The following will appear on the screen:

After these actions, the file “model/Output.txt” will contain the results of the calculation for the compounds from “MyInput.txt”

Click “OK”

The following will appear on the screen:

You will see a graphical representation of the sub-training, calibration, test, and validation sets.

The contents of “model/Output.txt” will be the following:

Last, but not least…

One can apply the model to an individual SMILES

(1) Insert the SMILES in the indicated box; (2) Click “Start of DCW and Endpoint Calculation for Inserted SMILES”


The following appears on your screen:

See the file “Model/DemoDesc.txt”

The contents of “Model/DemoDesc.txt” are the following:

DCW is DCW(2,15) for NC(CCCNC(N)=N)C(O)=O; Endpoint=2.9412. This example is only a demo; NC(CCCNC(N)=N)C(O)=O is apparently outside the domain of applicability.
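To give a feeling for what a DCW value is, the sketch below sums per-symbol weights over a SMILES string and maps the sum to an endpoint with a linear equation. Everything in it is an assumption made for illustration: it uses only single SMILES symbols as attributes, it assumes the relation Endpoint = C0 + C1 × DCW, and the weights and coefficients are arbitrary placeholders. It is not the model from the slides and will not reproduce the value 2.9412.

```python
import re

# Minimal tokenizer (assumption): bracket atoms and the two-letter halogens
# Cl and Br are kept together, every other character is its own symbol.
_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def smiles_atoms(smiles):
    """Split a SMILES string into symbols."""
    return _TOKEN.findall(smiles)


def dcw(smiles, weights):
    """Sum the weights of all symbols; symbols without a weight contribute 0."""
    return sum(weights.get(token, 0.0) for token in smiles_atoms(smiles))


def endpoint(smiles, weights, c0, c1):
    """Linear model Endpoint = C0 + C1 * DCW (assumed form)."""
    return c0 + c1 * dcw(smiles, weights)


# Placeholder weights and coefficients, NOT values from a fitted model.
demo_weights = {"C": 1.0, "N": 0.5, "O": -0.25}
print(dcw("NC(CCCNC(N)=N)C(O)=O", demo_weights))
print(endpoint("NC(CCCNC(N)=N)C(O)=O", demo_weights, c0=1.0, c1=0.5))
```

In a real model the weights and the coefficients come from the folder “Model” produced by the program, not from hand-written numbers.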

These slides have shown the "technology"; to understand the "philosophy", please read the file "ReadMe.pdf".

Some definitions
