Licensee OA Publishing London 2014. Creative Commons Attribution License (CC-BY) Mishra M, Sachan S, Gupta S, Nigam RS, Gupta SP. QSTR with topological indices: Modeling of the acute toxicity of phenylsulfonyl carboxylates to vibrio fischeri using multiple regression analysis. OA Drug Design & Delivery 2014 Feb 25;2(1):3. Competing interests: none declared. Conflict of interests: none declared. All authors contributed to conception and design, manuscript preparation, read and approved the final manuscript. All authors abide by the Association for Medical Ethics (AME) ethical rules of disclosure.




Section: Drug Structure-Activity Relationships

QSTR with topological indices: Modeling of the acute toxicity of phenylsulfonyl carboxylates to Vibrio fischeri using multiple regression analysis

M Mishra1, S Sachan2, S Gupta3, RS Nigam4, SP Gupta5*

1 Department of Chemistry, Govt. Auto. P. G. College, Satna-485001, India
2 Department of Chemistry, Govt. New Science College, Rewa-486001, India
3 Department of Chemistry, A.P.S. University, Rewa-486001, India
4 Rajiv Gandhi College, Sherganj, Panna Road, Satna (MP)-485001
5 Rajiv Gandhi Institute of Pharmacy, Sherganj, Panna Road, Satna (MP)-485001

*E-Mail: [email protected]

Abstract

The present paper deals with modeling of the acute toxicity of 56 phenylsulfonyl carboxylates to Vibrio fischeri. Multiple regression analysis (MLR) has been used as the data-processing step for the selection of independent variables. The statistical quality of the best model (without deleting outliers) using topological and indicator parameters is as follows: N=56, R=0.8802, AR2=0.7570, MSE=0.0516, F=43.834 and Q=17.0576. The statistical quality of the best model (after deleting outliers) is as follows: N=53, R=0.9397, AR2=0.8733, MSE=0.0254, F=90.577 and Q=36.9953. The topological and indicator parameters suggest negative contributions from steric bulk, branching, the functionality of C10, the functionality of the chloro substituent at the X1 position and the presence of unsaturation in the substituent(s) on C10, and positive contributions from the functionality of O13 and the presence of substituents with electronegative atoms at the R2 and R3 positions. The obtained models have been cross-validated by the leave-one-out (LOO) method.


Keywords: QSAR, QSTR, Topological indices, MLR, LOO.

Introduction

Quantitative structure-activity relationship (QSAR) analysis has become an indispensable tool in ecotoxicological risk assessments, which are used in formulating the regulatory decisions of environmental protection agencies.1-3 Owing to the shortage of experimental data, QSAR estimates for the selection of persistent, bio-accumulative and toxic (PBT) substances appear to be an attractive alternative.4 It has been argued that all new chemicals should be assessed using a consistent and transparent methodology that uses chemical property data derived from QSARs, or from experimental determination when possible, and applies evaluative or regio-specific environmental models.5 QSAR methods routinely yield ecotoxicity estimates of acute and chronic toxicity to various organisms, and fate estimates of physical/chemical properties, degradation and bio-concentration.6 It is now possible to predict accurately the potential of organic chemicals to cause diverse effects in a range of organisms and to degrade or partition within the environment.7 QSARs have also been used in exploring the mechanisms of toxic action of chemicals.8

Many QSAR approaches and statistical methods have been adopted to explore ecotoxicological modeling of diverse categories of organic compounds. Cui et al. have reported a holographic QSAR for toxicity data of 83 benzene derivatives to the autotrophic Chlorella vulgaris.9 Comparative molecular field analysis (CoMFA) was used to model the acute toxicity of 56 phenylsulfonyl carboxylates to Vibrio fischeri.10 The acute toxicity data of 20 alpha-substituted phenylsulfonyl acetates against Daphnia magna were modeled using theoretical linear solvation energy relationships and charge model descriptors.11 The joint toxicity of 2,4-dinitrotoluene with aromatic compounds to Vibrio fischeri was subjected to a QSAR study using the energy of the lowest unoccupied molecular orbital.12 Partial least squares and multiple regression analyses were used for modeling the toxicity of aromatic compounds to Chlorella vulgaris.13 Different classification techniques were applied to 235 pesticides using 153 descriptors by Mazzatorta et al. for toxicity prediction.14

Recently the present group of authors has introduced topological and indicator parameters to explore quantitative structure-toxicity relationships of compounds of different chemical groups. In continuation of this effort, the present paper deals with modeling of the acute toxicity of phenylsulfonyl carboxylates to Vibrio fischeri. Because aromatic sulfones are extensively used as intermediates in the manufacture of pesticides, herbicides and anthelmintics, and also as flotation agents and extractants in the petrochemical and metallurgical industries, QSTR modeling of these compounds is timely for predicting their ecological effects in case of accidental discharge.

Methodology

Calculation of Molecular Descriptors

Experimentally observed toxicity values (pC) of 56 substituted aromatic sulfones (phenylsulfonyl carboxylates) to Vibrio fischeri have been collected from the literature.15 We manually calculated various molecular descriptors, namely the Wiener index (W), path numbers (P2, P3 and P3-P2), equalized electronegativity (χeq), molecular redundancy index (MRI), negentropy (N), Szeged index (Sz) and molecular Id number (Id), on the basis of a full optimization of the molecular geometry. During the course of the regression analysis we observed the need for indicator parameters to obtain better results. Therefore we have used five indicator parameters:

* IP1 = 1 if an aromatic ring is not present, otherwise 0.
* IP2 = 1 if (CH2)2 is present at the R2 position, otherwise 0.
* IP3 = 1 if R1 = CH3 and R2 = (CH2)2, otherwise 0.
* IP4 = 1 if NO2 is present, otherwise 0.
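The indicator rules listed above can be sketched in code. This is only an illustration: the `Compound` record, the label strings and the rule that an aromatic ring is detected through the labels "Ph" and "Naph" in R1, R2 or R3 are assumptions made here, not the authors' procedure. Only the four parameters defined in the list are covered.

```python
from dataclasses import dataclass

# Assumption: aromatic substituents are recognisable by these labels.
AROMATIC_TAGS = ("Ph", "Naph")
C2_BRIDGE = "-(CH2)2-"


@dataclass(frozen=True)
class Compound:
    r1: str
    r2_r3: tuple  # one entry if a single group spans R2 and R3, else two
    x1: str
    x2: str


def indicator_parameters(c: Compound) -> tuple:
    """Return (IP1, IP2, IP3, IP4) under the assumptions stated above."""
    substituents = (c.r1, *c.r2_r3)
    has_aromatic = any(tag in s for s in substituents for tag in AROMATIC_TAGS)
    has_c2_bridge = C2_BRIDGE in c.r2_r3
    ip1 = int(not has_aromatic)                      # no aromatic ring
    ip2 = int(has_c2_bridge)                         # (CH2)2 at R2
    ip3 = int(c.r1 == "CH3" and has_c2_bridge)       # R1 = CH3 and (CH2)2
    ip4 = int("NO2" in (c.x1, c.x2))                 # nitro group present
    return ip1, ip2, ip3, ip4


compound_5 = Compound("CH3", ("-(CH2)2-",), "H", "NO2")
compound_41 = Compound("CH3", ("CH2Ph", "CH2Ph"), "H", "NO2")
print(indicator_parameters(compound_5))   # (1, 1, 1, 1)
print(indicator_parameters(compound_41))  # (0, 0, 0, 1)
```

For compounds 5 and 41 the sketch returns the IP1 to IP4 values tabulated for them among the calculated descriptors.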

Structural features (R1, R2, R3, X1, X2) and toxicity to Vibrio fischeri (pC)
(A single entry under "R2, R3" is one group occupying both positions.)

Sl.  R1            R2, R3                       X1   X2   pC
1    CH3           -(CH2)2-                     H    H    2.28
2    CH3           -(CH2)3-                     H    H    2.12
3    CH3           -(CH2)4-                     H    H    1.91
4    CH3           -(CH2)5-                     H    H    1.81
5    CH3           -(CH2)2-                     H    NO2  2.12
6    CH(CH3)2      -(CH2)2-                     H    NO2  1.78
7    CH(CH3)2      -(CH2)3-                     H    NO2  1.81
8    CH(CH3)2      -(CH2)5-                     H    NO2  1.45
9    CH(CH3)2      -(CH2)6-                     H    NO2  1.05
10   CH3           -(CH2)2-                     H    Br   1.89
11   CH3           -(CH2)3-                     H    Br   1.76
12   CH3           -(CH2)4-                     H    Br   1.60
13   CH3           -(CH2)5-                     H    Br   1.31
14   CH3           -(CH2)2-                     H    Cl   1.96
15   CH3           -(CH2)3-                     H    Cl   1.92
16   CH(CH3)2      -(CH2)2-                     H    Cl   1.86
17   CH2(CH2)2CH3  -(CH2)2-                     H    Cl   1.70
18   CH(CH3)2      -(CH2)4-                     H    Cl   1.51
19   CH(CH3)2      -(CH2)5-                     H    Cl   1.32
20   CH(CH3)2      -(CH2)6-                     H    Cl   0.90
21   CH(CH3)2      -(CH2)2-                     H    CH3  1.96
22   CH(CH3)2      -(CH2)3-                     H    CH3  1.46
23   CH3           -(CH2)2-                     H    CH3  2.22
24   CH2CH3        -(CH2)2-                     H    CH3  1.92
25   CH2CH3        -(CH2)3-                     H    CH3  1.68
26   CH(CH3)2      -(CH2)4-                     H    CH3  1.22
27   CH(CH3)2      -(CH2)5-                     H    CH3  1.09
28   CH3           -(CH2)5-                     H    CH3  1.40
29   CH3           H, H                         H    NO2  1.29
30   CH(CH3)2      H, H                         H    NO2  1.29
31   CH3           H, H                         Cl   NO2  0.44
32   CH(CH3)2      H, H                         Cl   NO2  1.13
33   CH3           H, H                         NO2  H    1.49
34   CH(CH3)2      H, H                         NO2  H    1.34
35   CH3           H, H                         NO2  Cl   1.33
36   CH(CH3)2      H, H                         NO2  Cl   1.45
37   CH3           H, CH3                       H    NO2  1.48
38   CH3           CH3, CH3                     H    NO2  1.42
39   CH3           CH2CH3, CH2CH3               H    NO2  1.36
40   CH3           CH2(CH2)2CH3, CH2(CH2)2CH3   H    NO2  1.10
41   CH3           CH2Ph, CH2Ph                 H    NO2  0.60
42   CH2CH3        CH2(CH2)2CH3, CH2(CH2)2CH3   H    NO2  1.08
43   CH2CH3        CH3, CH2Ph                   H    NO2  0.98
44   CH2CH3        CH3, CH2CH=CH2               H    NO2  1.12
45   CH2CH3        CH3, CH2-1-Naph              H    NO2  0.83
46   CH(CH3)2      CH2(CH2)2CH3, CH2(CH2)2CH3   H    NO2  1.05
47   Cyclohexyl    H, CH3                       H    NO2  1.19
48   CH3           H, CH2CO2CH2CH3              H    NO2  1.00
49   CH(CH3)2      H, CH2CO2CH(CH3)2            H    NO2  0.92
50   CH(CH3)2      CH2CO2CH2CH3, CH2CO2CH2CH3   H    NO2  0.66
51   CH3           =CHPh                        H    NO2  0.82
52   CH2CH3        =CHPh                        H    NO2  0.75
53   CH(CH3)2      =CHPh                        H    NO2  0.64
54   CH2CH(CH3)2   =CHPh                        H    NO2  0.66
55   CH(CH3)2      =CHPh                        H    CH3  0.89
56   CH(CH3)2      =CHPh                        H    H    0.80

Results and Discussion

In order to understand the experimental toxicity data of the 56 compounds on a theoretical basis, we established a quantitative structure-toxicity relationship (QSTR) between their in vitro toxicity and the topological descriptors of the molecules under consideration using multiple regression analysis (MLR). Developing a QSTR model requires a diverse set of data, and therefore a large number of descriptors have to be considered.

Descriptors are numerical values that encode different structural features of the molecules. Selecting a set of appropriate descriptors from a large number of them requires a method that is able to discriminate between the parameters. The topological molecular descriptors (independent variables), namely the Wiener index (W), path numbers P2, P3 and P3-P2, negentropy (N), molecular redundancy index (MRI), molecular equalized electronegativity (χeq), Szeged index (Sz) and information theoretical index (Id), are presented along with the indicator parameters in Table 2.
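To illustrate what two of the graph-based descriptors named above measure, the following minimal sketch computes the Wiener index (sum of shortest-path distances over all vertex pairs) and path counts of a given length on a hydrogen-suppressed molecular graph. The adjacency-list representation and the function names are assumptions made for illustration; the paper states only that the descriptors were calculated manually, so this is not necessarily the authors' procedure or their exact definition of P2 and P3.

```python
from collections import deque


def bfs_distances(adj, start):
    """Shortest-path distances (in bonds) from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_index(adj):
    """Sum of distances over all unordered atom pairs."""
    total = sum(sum(bfs_distances(adj, u).values()) for u in adj)
    return total // 2  # every pair was counted from both ends


def path_count(adj, length):
    """Number of simple paths with the given number of bonds."""
    def extend(path):
        if len(path) == length + 1:
            return 1
        return sum(extend(path + [v]) for v in adj[path[-1]] if v not in path)
    return sum(extend([u]) for u in adj) // 2  # each path found in both directions


# Carbon skeletons of n-butane and isobutane as adjacency lists.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane), path_count(n_butane, 2), path_count(n_butane, 3))     # 10 2 1
print(wiener_index(isobutane), path_count(isobutane, 2), path_count(isobutane, 3))  # 9 3 0
```

The two butane isomers show how branching lowers W and shifts path counts from length 3 to length 2.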

Table 2: Calculated Topological Descriptors & Indicator Parameters

No.  W     P2     P3      P3-P2   χeq    N        MRI     Sz       Id        IP1  IP2  IP3  IP4  IP5
1    405   56.00  86.00   78.00   2.402  28.504   -0.012  553.00   -46.639   1    1    1    0    1
2    470   48.00  90.00   84.00   2.456  21.149   -0.148  656.00   -50.422   1    0    0    0    0
3    537   56.00  96.00   80.00   2.367  33.456   -0.010  771.00   -54.263   1    0    0    0    0
4    622   58.00  105.00  94.00   2.355  35.483   -0.026  910.00   -58.161   1    0    0    0    0
5    693   56.00  102.00  92.00   2.482  34.500   -0.102  964.00   -58.735   1    1    1    1    0
6    931   62.00  105.50  87.00   2.434  42.336   -0.113  1242.00  -66.978   1    1    0    1    0
7    1034  66.00  114.00  96.00   2.416  45.318   -0.102  1378.00  -71.029   1    0    0    1    0
8    1256  74.00  129.00  120.00  2.388  50.265   -0.071  1780.00  -79.269   1    0    0    1    0
9    1398  76.00  138.50  131.00  2.376  52.464   -0.055  1986.00  -83.454   1    0    0    1    0
10   489   50.00  90.00   80.00   2.408  29.672   -0.156  678.00   -50.642   1    1    1    0    0
11   562   54.00  96.00   84.00   2.405  33.263   -0.049  806.00   -54.551   1    0    0    0    0
12   637   60.00  102.00  86.00   2.387  35.666   -0.032  922.00   -58.448   1    0    0    0    0
13   714   61.00  111.00  98.00   2.373  37.629   -0.011  1076.00  -62.399   1    0    0    0    0
14   489   50.00  90.00   78.00   2.432  30.660   -0.066  738.00   -50.711   1    1    1    0    0
15   562   53.00  96.00   86.00   2.409  33.294   -0.049  794.00   -54.551   1    0    0    0    0
16   683   55.00  96.00   80.00   2.391  28.290   -0.108  968.00   -58.736   1    1    0    0    0
17   832   56.00  99.00   86.00   2.376  40.367   -0.058  1135.00  -62.399   1    1    0    0    0
18   855   65.00  108.00  77.00   2.363  42.440   -0.038  1176.00  -66.690   1    0    0    0    0
19   963   68.00  127.00  98.00   2.353  46.311   -0.047  1350.00  -70.742   1    0    0    0    0
20   1074  70.00  127.00  115.00  2.343  46.966   -0.013  1524.00  -74.839   1    0    0    0    0
21   683   56.00  94.50   77.00   2.355  41.588   -0.124  920.00   -48.251   1    1    0    0    0
22   768   60.00  100.50  81.00   2.344  103.28   -0.988  1032.00  -62.688   1    0    0    0    0
23   486   50.00  90.00   71.00   2.383  34.968   -0.086  738.00   -50.711   1    1    1    0    1
24   585   52.00  91.50   82.00   2.367  38.692   -0.090  852.00   -54.551   1    1    0    0    0
25   664   56.00  99.00   82.00   2.355  41.551   -0.078  912.00   -58.449   1    0    0    0    0
26   855   66.00  108.00  82.00   2.235  47.214   -0.060  1176.00  -66.690   1    0    0    0    0
27   963   68.00  117.00  98.00   2.327  49.588   -0.047  1350.00  -70.742   1    0    0    0    0
28   730   62.00  111.00  98.00   2.344  42.320   -0.036  1076.00  -62.399   1    0    0    0    0
29   550   48.00  72.00   48.00   2.516  29.328   -0.090  748.00   -50.745   1    0    0    1    0
30   768   54.00  78.00   50.00   2.454  35.680   -0.076  1002.00  -58.764   1    0    0    1    0
31   617   49.00  81.00   60.00   2.551  31.330   -0.145  859.00   -54.869   1    0    0    1    0
32   852   58.00  87.00   58.00   2.481  37.600   -0.116  1111.00  -63.000   1    0    0    1    0
33   502   51.00  75.00   54.00   2.516  29.302   -0.089  652.00   -50.745   1    0    0    1    0
34   708   54.00  79.50   51.00   2.454  35.616   -0.075  882.00   -58.764   1    0    0    1    0
35   1172  52.00  81.00   56.00   2.551  27.222   -0.033  775.00   -54.869   1    0    0    1    0
36   812   58.00  87.00   52.00   2.481  37.000   -0.116  1031.00  -63.000   1    0    0    1    0
37   621   55.00  87.00   70.00   2.482  33.118   -0.097  837.00   -54.869   1    0    0    1    0
38   691   58.00  100.50  85.00   2.454  36.128   -0.086  928.00   -59.169   1    0    0    1    0
39   882   62.00  120.00  116.00  2.414  44.878   -0.115  1152.00  -67.118   1    0    0    1    0
40   1417  70.00  132.00  131.00  2.364  55.250   -0.062  1764.00  -83.587   1    0    0    1    0
41   2446  94.00  162.00  136.00  2.402  58.864   -0.077  3400.00  -109.211  0    0    0    1    0
42   1566  72.00  135.00  128.00  2.355  54.537   -0.017  1941.00  -87.811   1    0    0    1    0
43   1607  78.00  135.00  126.00  2.407  52.560   -0.102  2174.00  -87.679   0    0    0    1    0
44   1029  63.00  117.00  112.00  2.416  48.165   -0.148  1317.00  -71.168   1    0    0    1    0
45   2320  94.00  163.50  139.00  2.400  62.985   -0.138  3407.00  -105.089  0    0    0    1    0
46   1747  76.00  136.50  121.00  2.347  58.072   -0.021  2122.00  -92.362   1    0    0    1    0
47   1276  68.00  108.00  80.00   2.401  46.242   -0.062  1762.00  -74.722   1    0    0    1    0
48   1212  65.00  109.50  89.00   2.473  48.754   -0.179  1518.00  -75.145   1    0    0    1    0
49   1647  74.00  115.50  83.00   2.420  51.606   -0.059  2047.00  -88.268   1    0    0    1    0
50   2476  88.00  154.50  133.00  2.425  65.240   -0.094  2926.00  -110.354  1    0    0    1    0
51   1344  70.00  114.00  88.00   2.460  38.739   -0.029  1817.00  -78.864   0    0    0    1    0
52   1486  73.00  114.00  82.00   2.440  42.840   -0.044  2042.00  -83.048   0    0    0    1    0
53   1662  75.00  118.50  87.00   2.423  45.976   -0.042  2229.00  -87.562   0    0    0    1    0
54   1870  78.00  123.00  58.00   2.399  49.496   -0.046  2464.00  -91.931   0    0    0    1    0
55   1306  70.00  106.50  73.00   2.357  43.032   -0.013  1733.00  -78.864   0    0    0    0    0
56   1304  70.00  108.00  76.00   2.388  41.902   -0.014  1733.00  -78.864   0    0    0    0    0

Table 3: Correlation Matrix

       pC       W        P2       P3       P3-P2    χeq      Sz       Id       IP1      IP2      IP3      IP5
pC     1
W      -0.7465  1
P2     -0.6892  0.9392   1
P3     -0.5342  0.8447   0.9302   1
P3-P2  -0.2591  0.5904   0.6986   0.8913   1
χeq    -0.0758  -0.0975  -0.3134  -0.4235  -0.4368  1
Sz     -0.7284  0.9772   0.9676   0.8828   0.6317   -0.1722  1
Id     0.7555   -0.9747  -0.9679  -0.8984  -0.6608  0.2064   -0.9782  1
IP1    0.5601   -0.6005  -0.5919  -0.4161  -0.1463  -0.0021  -0.6497  0.5767   1
IP2    0.6222   -0.3589  -0.3814  -0.2941  -0.1424  -0.0416  -0.3361  0.4027   0.2041   1
IP3    0.5035   -0.3128  -0.3106  -0.2406  -0.1139  0.0717   -0.2944  0.3373   0.1372   0.6715   1
IP5    0.3751   -0.2178  -0.1803  -0.1814  -0.1129  -0.0521  -0.2068  0.2419   0.0842   0.4127   0.6146   1

In the present paper, a data set of 56 compounds has been subjected to multiple regression analysis for model generation. Preliminary analysis was carried out in terms of correlation analysis (Table 3). The maximum correlation was obtained between pC and Id (0.7555). A high interrelationship was observed between W and Sz (r=0.9772), and a low interrelationship between χeq and IP3 (r=0.0717). The correlation matrix indicates the predominance of topological parameters in describing the acute toxicity of phenylsulfonyl carboxylates to Vibrio fischeri.

It is well known that there are three important components in any QSAR study:

1. Development of models,
2. Validation of models and
3. Utility of developed models.

Validation is a crucial aspect of any QSAR analysis. The statistical quality of the resulting models, as depicted in Table 4, has been determined by R2 (coefficient of determination), M.S.E. (mean square error), F-ratio and Q=R/MSE (quality factor). It is noteworthy that all these equations have been derived using the entire data set of compounds (N=56). We first performed single linear regression analysis and then multiple regression analysis; for the latter we adopted the maximum-R2 method and followed stepwise regression analysis. The results show that, for the set of 56 compounds, statistically significant models are obtained from mono-parametric regressions onwards.
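The entries of a correlation matrix such as Table 3 are pairwise Pearson coefficients. As a minimal sketch of that calculation, the code below correlates pC with W for compounds 1 to 6 only, using the values listed for them; the function name and the choice of this subset are illustrative assumptions, so the result is not expected to equal the full-data value reported in Table 3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# W and pC for compounds 1 to 6 (illustrative subset only).
w = [405, 470, 537, 622, 693, 931]
pc = [2.28, 2.12, 1.91, 1.81, 2.12, 1.78]

r = pearson_r(w, pc)
print(f"r(pC, W) on six compounds = {r:.4f}")
```

On this subset the coefficient is negative, in line with the sign of the pC and W entry in Table 3.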

Single Linear Regression Analysis

Single regression analysis yielded many regression equations, but only three of them are satisfactory, with R2 larger than 0.5. These regression equations are listed in Table 4.

Among these three equations (Eqns. 1, 2 and 3), the highest value of R2 is obtained with the information theoretical index, Id (Eqn. 1); that is, Id is more strongly correlated with pC than any other descriptor. This correlation is shown in Table 3. Eqns. 2 and 3 are less significant because of their lower values of R2, AR2, F-ratio and Q.

ONE-PARAMETRIC MODEL

pC = 2.8659 + 0.0220(±0.0026) Id    (1)
N=56, R2=0.5708, AR2=0.5629, MSE=0.0929, F=71.829, Q=8.1325

pC = 2.0476 - 0.0007(±0.0001) W    (2)
N=56, R2=0.5574, AR2=0.5492, MSE=0.0958, F=68.002, Q=7.7932

pC = 2.0428 - 0.0005(±0.0001) Sz    (3)
N=56, R2=0.5306, AR2=0.5219, MSE=0.1016, F=61.049, Q=7.1695
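The statistics quoted under each equation are linked to one another. The sketch below recomputes AR2, F and Q for Eqn. 1 from its R2, MSE, N and number of descriptors k. Q=R/MSE is the definition given in the text; the adjusted-R2 and F-ratio formulas are the usual textbook ones and are assumed here, since the paper does not state how these quantities were computed.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R2 for n observations and k descriptors (textbook formula)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_ratio(r2, n, k):
    """Overall F statistic of the regression (textbook formula)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def quality_factor(r2, mse):
    """Q = R / MSE, as defined in the text."""
    return math.sqrt(r2) / mse


# Eqn. 1: N=56, one descriptor (Id), R2=0.5708, MSE=0.0929
n, k, r2, mse = 56, 1, 0.5708, 0.0929
print(f"AR2 = {adjusted_r2(r2, n, k):.4f}")
print(f"F   = {f_ratio(r2, n, k):.3f}")
print(f"Q   = {quality_factor(r2, mse):.4f}")
```

Under these assumptions the recomputed values agree with those reported for Eqn. 1 up to the rounding of R2 and MSE.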

Multilinear Regression Analysis

In order to improve the quality of the QSTR model, multiple linear regression analysis has been performed. As is well known, models whose variables are correlated with each other are of no significance. Successive regression analysis resulted in several binary combinations of Id with the path numbers P3 and P3-P2 and with the indicator parameters IP3 and IP2. The best bi-parametric model contains Id and IP2 (Eqn. 7).
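One simple way to act on the remark about intercorrelated variables is to screen descriptor pairs against a cut-off before combining them in one model. The sketch below uses a few coefficients copied from Table 3; the cut-off of 0.95 and the helper name are hypothetical choices made for illustration, as the paper does not specify a threshold.

```python
# Selected pairwise correlations taken from Table 3.
CORR = {
    ("W", "Sz"): 0.9772,
    ("W", "Id"): -0.9747,
    ("Sz", "Id"): -0.9782,
    ("P3", "Id"): -0.8984,
    ("Id", "IP2"): 0.4027,
    ("chi_eq", "IP3"): 0.0717,
}


def collinear_pairs(corr, threshold=0.95):
    """Return descriptor pairs whose |r| reaches the (hypothetical) threshold."""
    return sorted(pair for pair, r in corr.items() if abs(r) >= threshold)


flagged = collinear_pairs(CORR)
print(flagged)  # [('Sz', 'Id'), ('W', 'Id'), ('W', 'Sz')]
```

With this cut-off, W, Sz and Id would not be combined with one another, whereas Id with IP2 or with P3 passes the screen.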

TWO-PARAMETRIC MODEL

pC = 2.4289 + 0.0166(±0.0039) P3 + 0.0417(±0.0052) Id    (4)
N=56, R2=0.6792, AR2=0.6671, MSE=0.0707, F=56.107, Q=11.6568

pC = 2.6947 + 0.0083(±0.0020) (P3-P2) + 0.0303(±0.0031) Id    (5)
N=56, R2=0.6732, AR2=0.6608, MSE=0.0720, F=54.585, Q=11.3957

pC = 2.6370 + 0.0193(±0.0026) Id + 0.4495(±0.1402) IP3    (6)
N=56, R2=0.6406, AR2=0.6270, MSE=0.0793, F=47.230, Q=10.0931

pC = 2.4803 + 0.0176(±0.0024) Id + 0.4527(±0.0994) IP2    (7)
N=56, R2=0.6915, AR2=0.6798, MSE=0.0681, F=59.391, Q=12.2109

In all four of these models, the information theoretical index has a positive coefficient; therefore, toxicity increases with increasing Id. The best regression parameters and model quality are those of Eqn. 7, which indicates that the addition of IP2 improves the explained variance: R2 increases from 0.57 (Eqn. 1) to 0.69.

The best tri-parametric equation contains the following independent variables: P3, Id and IP2.

THREE-PARAMETRIC MODEL

pC = 2.0435 + 0.0259(±0.0123) P2 + 0.0351(±0.0087) Id + 0.4453(±0.0965) IP2    (8)
N=56, R2=0.7156, AR2=0.6992, MSE=0.0639, F=43.606, Q=13.2384

pC = 2.3912 + 0.0069(±0.0018) (P3-P2) + 0.0250(±0.0029) Id + 0.3902(±0.0900) IP2    (9)
N=56, R2=0.7599, AR2=0.7461, MSE=0.0540, F=54.866, Q=16.1431

pC = 6.3321 - 1.5535(±0.5904) χeq + 0.0191(±0.0024) Id + 0.4177(±0.0952) IP2    (10)
N=56, R2=0.7277, AR2=0.7120, MSE=0.0612, F=46.327, Q=13.9388

pC = 2.0285 + 0.0141(±0.0952) Id + 0.2522(±0.1120) IP1 + 0.4608(±0.0959) IP2    (11)
N=56, R2=0.7189, AR2=0.7029, MSE=0.0632, F=44.323, Q=13.4158

pC = 2.1626 + 0.0140(±0.0034) P3 + 0.0348(±0.0047) Id + 0.3915(±0.0885) IP2    (12)
N=56, R2=0.7669, AR2=0.7534, MSE=0.0524, F=57.015, Q=16.7124

In Eqn. 12, all regression coefficients have a positive sign, which indicates that toxicity increases with increasing values of P3, Id and IP2. Eqn. 12 is our best tri-parametric model; it indicates that the addition of the path number P3 significantly improves the correlation: R2 increases from 0.69 to 0.76, and the quality factor Q increases from 12.2109 to 16.7124.

When the information theoretical index, path numbers and indicator parameters were tried together, four-parametric models were obtained. The best four-parametric model (Eqn. 15) contains P3, Id, IP1 and IP2. The adjusted R2 and the quality factor favour this combination, although only a slight improvement is observed in the variance.

FOUR-PARAMETRIC MODEL

pC = 1.8628 + 0.0140(±0.0034) P3 - 0.0002(±0.0002) Sz + 0.0266(±0.0111) Id + 0.4144(±0.0111) IP2    (13)
N=56, R2=0.7699, AR2=0.7518, MSE=0.0528, F=42.652, Q=16.7124

pC = 2.1498 + 0.0139(±0.0034) P3 + 0.0345(±0.0047) Id + 0.3558(±0.0047) IP2 + 0.1980(±0.1815) IP5    (14)
N=56, R2=0.7722, AR2=0.7543, MSE=0.0522, F=43.215, Q=16.8343

pC = 1.9418 + 0.0126(±0.0036) P3 + 0.0311(±0.0054) Id + 0.1409(±0.0054) IP1 + 0.4021(±0.0883) IP2    (15)
N=56, R2=0.7747, AR2=0.7570, MSE=0.0516, F=43.834, Q=17.0576

The residual report of model no. 15 shows high residual values for three compounds (nos. 7, 20 and 31). When these compounds are deleted as outliers, we obtain our best tetra-parametric model (Eqn. 16), which contains P3, Id, IP1 and IP2.

For Eqn. 16, the high correlation coefficient and Q value support the validity of the developed QSAR models. The model described by Eqn. 16 demonstrates the importance of the different topological and indicator parameters.

FOUR-PARAMETRIC MODEL AFTER DELETION OF COMPOUND NOS. 7, 20 & 31

pC = 2.0689 + 0.0113(±0.0025) P3 + 0.0307(±0.0038) Id + 0.1500(±0.0747) IP1 + 0.3655(±0.0624) IP2    (16)
N=53, R2=0.8830, AR2=0.8733, MSE=0.0254, F=90.577, Q=36.9953
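As a worked example of how Eqn. 16 is applied, the sketch below evaluates it for compound 1 using the descriptor values listed for that compound in Table 2 (P3=86.00, Id=-46.639, IP1=1, IP2=1). The dictionary layout and function name are illustrative assumptions; the coefficients are the rounded ones printed in the equation, so the result can differ slightly from a prediction made with unrounded coefficients.

```python
# Coefficients of Eqn. 16 as printed (rounded to four decimals).
EQ16 = {"intercept": 2.0689, "P3": 0.0113, "Id": 0.0307, "IP1": 0.1500, "IP2": 0.3655}


def predict_pc(descriptors, model=EQ16):
    """Evaluate a linear QSTR model: intercept plus sum of coefficient * descriptor."""
    return model["intercept"] + sum(
        coef * descriptors[name] for name, coef in model.items() if name != "intercept"
    )


compound_1 = {"P3": 86.00, "Id": -46.639, "IP1": 1, "IP2": 1}
predicted = predict_pc(compound_1)
print(f"predicted pC = {predicted:.3f}")  # about 2.12; the observed value is 2.28
```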

Table 4: Regression Equations

Model No.  Parameters used    Coefficients A1, A2, ...          Intercept  M.S.E.  R2      AR2     R       F-ratio  Q=R/MSE
1          Id                 0.0220                            2.8659     0.0929  0.5708  0.5629  0.7555  71.829   8.1325
2          W                  -0.0007                           2.0476     0.0958  0.5574  0.5492  0.7466  68.002   7.7932
3          Sz                 -0.0005                           2.0428     0.1016  0.5306  0.5219  0.7284  61.049   7.1695
4          P3, Id             0.0166, 0.0417                    2.4289     0.0707  0.6792  0.6671  0.8241  56.107   11.6568
5          P3-P2, Id          0.0083, 0.0303                    2.6947     0.0720  0.6732  0.6608  0.8205  54.585   11.3957
6          Id, IP3            0.0193, 0.4495                    2.6370     0.0793  0.6406  0.6270  0.8004  47.230   10.0931
7          Id, IP2            0.0176, 0.4527                    2.4803     0.0681  0.6915  0.6798  0.8316  59.391   12.2109
8          P2, Id, IP2        0.0259, 0.0351, 0.4453            2.0435     0.0639  0.7156  0.6992  0.8459  43.606   13.2384
9          P3-P2, Id, IP2     0.0069, 0.0250, 0.3902            2.3912     0.0540  0.7599  0.7461  0.8717  54.866   16.1431
10         χeq, Id, IP2       -1.5535, 0.0191, 0.4177           6.3321     0.0612  0.7277  0.7120  0.8531  46.327   13.9388
11         Id, IP1, IP2       0.0141, 0.2522, 0.4608            2.0285     0.0632  0.7189  0.7029  0.8479  44.323   13.4158
12         P3, Id, IP2        0.0140, 0.0348, 0.3915            2.1626     0.0524  0.7669  0.7534  0.8757  57.015   16.7124
13         P3, Sz, Id, IP2    0.0140, -0.0002, 0.0266, 0.4144   1.8628     0.0528  0.7699  0.7518  0.8757  42.652   16.7124
14         P3, Id, IP2, IP5   0.0139, 0.0345, 0.3558, 0.1980    2.1498     0.0522  0.7722  0.7543  0.8787  43.215   16.8343
15         P3, Id, IP1, IP2   0.0126, 0.0311, 0.1409, 0.4021    1.9418     0.0516  0.7747  0.7570  0.8802  43.834   17.0576

Four-Parametric Model after Deletion of Compound Nos. 7, 20 & 31

Model No.  Parameters used    Coefficients A1, A2, ...          Intercept  M.S.E.  R2      AR2     R       F-ratio  Q=R/MSE
16         P3, Id, IP2, IP5   0.0126, 0.0342, 0.3216, 0.1899    2.2908     0.0263  0.8787  0.8686  0.9374  86.908   35.6423
17         P3, Id, IP1, IP2   0.0113, 0.0307, 0.1500, 0.3655    2.0689     0.0254  0.8830  0.8733  0.9397  90.577   36.9953

To test the validity of the predictive power of the selected MLR models, the LOO technique has been used. The developed models have been validated by calculating the following statistical parameters: PRESS/SSY, SPRESS, R2CV, R2A and PSE (Table 5).

PRESS is used to validate a regression model with regard to predictability. The smaller PRESS is, the better the predictability of the model. A PRESS value smaller than SSY indicates that the model predicts better than chance and can be considered statistically significant. SSY is the sum of squares associated with the corresponding source of variation, expressed in terms of the dependent variable.
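The leave-one-out idea behind PRESS can be sketched for a single-descriptor model: each compound is removed in turn, the line is refitted on the rest, and the squared error of predicting the removed compound is accumulated. The code below does this for pC against Id on compounds 1 to 8 only. The function names, the subset and the use of the total sum of squares about the mean in the cross-validated R2 are assumptions made for illustration and are not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_ss(xs, ys):
    """Residual sum of squares of the fit on all data."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def loo_press(xs, ys):
    """PRESS: sum of squared leave-one-out prediction errors."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return press


# Id and pC for compounds 1 to 8 (illustrative subset only).
id_values = [-46.639, -50.422, -54.263, -58.161, -58.735, -66.978, -71.029, -79.269]
pc_values = [2.28, 2.12, 1.91, 1.81, 2.12, 1.78, 1.81, 1.45]

mean_pc = sum(pc_values) / len(pc_values)
total_ss = sum((y - mean_pc) ** 2 for y in pc_values)
sse = residual_ss(id_values, pc_values)
press = loo_press(id_values, pc_values)
r2 = 1 - sse / total_ss
r2cv = 1 - press / total_ss
print(f"R2 = {r2:.4f}, PRESS = {press:.4f}, R2CV = {r2cv:.4f}")
```

Because every compound is predicted by a model that never saw it, PRESS is never smaller than the ordinary residual sum of squares, and the cross-validated R2 is never larger than R2.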

The PRESS value can be used to compute an R2CV statistic, called the cross-validated R2, which reflects the predictive ability of the model. This is a good way to validate the predictions of a regression model without selecting another sample or splitting the data. It is quite possible to have a high R2 and a very low R2CV. When this occurs, it implies that the fitted model is data dependent. R2CV can range from below zero to above one; when it falls outside the range of zero to one, it is truncated to stay within this range.
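The derived columns of the cross-validation table (Table 5) can be recomputed from its PRESS, SSY and N entries. The relations used below, R2CV = 1 - PRESS/SSY, SPRESS = sqrt(PRESS/(N-k-1)) and PSE = sqrt(PRESS/N) with k the number of descriptors, are not stated in the paper; they are assumed here because they reproduce the tabulated values to the printed precision.

```python
import math

# (model no., N, k descriptors, PRESS, SSY) as reported in Table 5.
ROWS = [
    (1, 56, 1, 5.0168, 6.6731),
    (2, 56, 2, 3.6067, 8.0832),
    (3, 56, 3, 2.7253, 8.9646),
    (4, 56, 4, 2.6341, 9.0559),
    (5, 53, 4, 1.2636, 9.1510),
    (6, 53, 4, 1.2183, 9.1962),
]


def cross_validation_stats(n, k, press, ssy):
    """Derived cross-validation statistics under the assumed relations."""
    ratio = press / ssy
    return {
        "PRESS/SSY": ratio,
        "R2CV": 1 - ratio,
        "SPRESS": math.sqrt(press / (n - k - 1)),
        "PSE": math.sqrt(press / n),
    }


for model, n, k, press, ssy in ROWS:
    stats = cross_validation_stats(n, k, press, ssy)
    print(model, {name: round(value, 4) for name, value in stats.items()})
```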

Table 5: Cross-Validation Parameters

Mode

l No.

N Parameter PRES

S

SSY PRESS/SS

Y

R2CV AR2 SPRESS PSE

1 56 Id 5.0168 6.673

1

0.7518 0.248

2

0.562

9

0.304

8

0.299

3

2 56 Id, IP2 3.6067 8.083

2

0.4462 0.553

8

0.679

8

0.260

9

0.253

8

3 56 P3, Id, IP2 2.7253 8.964

6

0.3040 0.696

0

0.753

4

0.228

9

0.220

6

4 56 P3, Id ,IP1

,IP2

2.6341 9.055

9

0.2909 0.709

1

0.757

0

0.227

3

0.216

9

Page 26: Mishra M, Sachan S, Gupta S, Nigam RS, Gupta SP. …using multiple regression analysis M Mishra 1, S Sachan 2, S Gupta 3, RS Nigam 4, SP Gupta 5* 1 Department of Chemistry, Govt. Auto

25

For a good model, the value of PRESS/SSY should be smaller than 0.4. A value smaller than 0.1 indicates an excellent model. The models given in Table 5 are good, and among them model no. 6 is the best. In many cases R2CV and the quality factor Q are taken as proof of the high predictive ability of QSAR models. A high value of these statistics (>0.7) is considered evidence of high predictive ability. Besides a high R2CV, a reliable model should also be characterized by a high correlation coefficient between the predicted and observed toxicities of pollutants from a set of molecules that was not used to develop the models.

Table 5 shows that the predictive squared error (PSE) can be used to assess the uncertainty of prediction. The PSE is lowest for model no. 6, showing that this model has excellent correlation as well as predictive potential.
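The PRESS/SSY thresholds and the SPRESS and PSE columns of Table 5 can be illustrated with a short sketch. The source does not state formulas for SPRESS and PSE; the tabulated values are consistent with SPRESS = sqrt(PRESS/(N - k - 1)) and PSE = sqrt(PRESS/N), where k is the number of descriptors, so those forms are assumed here. The function names and the verbal labels returned by rate_model are hypothetical.

```python
import math


def spress(press: float, n: int, k: int) -> float:
    """Assumed form: square root of PRESS over the residual degrees of freedom."""
    return math.sqrt(press / (n - k - 1))


def pse(press: float, n: int) -> float:
    """Assumed form: square root of PRESS over the number of compounds."""
    return math.sqrt(press / n)


def rate_model(press: float, ssy: float) -> str:
    """Apply the PRESS/SSY thresholds quoted in the text (0.4 and 0.1)."""
    ratio = press / ssy
    if ratio < 0.1:
        return "excellent"
    if ratio < 0.4:
        return "good"
    return "above the 0.4 threshold"


# Model no. 6 of Table 5: N = 53, four descriptors (P3, Id, IP1, IP2)
press_6, ssy_6, n_6, k_6 = 1.2183, 9.1962, 53, 4
print(round(spress(press_6, n_6, k_6), 4))
print(round(pse(press_6, n_6), 4))
print(rate_model(press_6, ssy_6))
```

Under these assumed formulas the sketch reproduces the SPRESS and PSE entries for model no. 6 to the precision given in the table.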

The data presented in Table 6 show close agreement between the experimental and predicted toxicity values (the residuals are small), indicating the good predictability of the established models. According to the reference, without validating the QSAR models on an external set, we could not have reached a sound conclusion about the high predictive ability of the derived models.

Table 5 (continued)

Model No.  N   Parameters        PRESS   SSY     PRESS/SSY  R2CV    AR2     SPRESS  PSE
5          53  P3, Id, IP2, IP5  1.2636  9.1510  0.1381     0.8619  0.8686  0.1622  0.1544
6          53  P3, Id, IP1, IP2  1.2183  9.1962  0.1325     0.8675  0.8733  0.1593  0.1516

Table 6: Observed (Obs.), Predicted (Pre.) and Residual Values Obtained Using Equation 16

Row  Actual  Predicted  Residual
1    2.28    2.144       0.136
2    2.12    1.716       0.404
3    1.91    1.648       0.262
4    1.81    1.629       0.181
5    2.12    1.944       0.176
6    1.78    1.695       0.085
7    1.45    1.212       0.238
8    1.05    1.189      -0.139
9    1.89    2.058      -0.168
10   1.76    1.643       0.117
11   1.60    1.584       0.016
12   1.31    1.562      -0.252
13   1.96    2.059      -0.099
14   1.92    1.644       0.276
15   1.86    1.851       0.009
16   1.70    1.761      -0.061
17   1.51    1.371       0.139
18   1.32    1.479      -0.159
19   1.96    2.193      -0.233
20   1.46    1.409       0.051
21   2.22    2.052       0.168
22   1.92    1.935      -0.015
23   1.68    1.539       0.141
24   1.22    1.353      -0.133
25   1.09    1.343      -0.253
26   1.40    1.558      -0.158
27   1.29    1.476      -0.186
28   1.29    1.266       0.024
29   1.13    1.240      -0.110
30   1.49    1.516      -0.026
31   1.34    1.286       0.054
32   1.33    1.456      -0.126
33   1.45    1.240       0.210
34   1.48    1.525      -0.045
35   1.42    1.548      -0.128
36   1.36    1.522      -0.162
37   1.10    1.097       0.003
38   0.60    0.603      -0.003
39   1.08    0.988       0.092
40   0.98    1.000      -0.020
41   1.12    1.341      -0.221
42   0.83    0.766       0.064
43   1.05    0.847       0.203
44   1.19    1.096       0.094
45   1.00    1.111      -0.111
46   0.92    0.724       0.196
47   0.66    0.467       0.193
48   0.82    1.039      -0.219
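The residual column of Table 6 is the observed value minus the predicted value, and the agreement between the two can be summarized by a correlation coefficient. The following is a minimal sketch using only the first eight rows of Table 6 as sample input; the function names are hypothetical, and the correlation computed on this subset is illustrative only and is not a statistic reported by the authors.

```python
import math

# (observed, predicted) pairs copied from rows 1-8 of Table 6
rows = [(2.28, 2.144), (2.12, 1.716), (1.91, 1.648), (1.81, 1.629),
        (2.12, 1.944), (1.78, 1.695), (1.45, 1.212), (1.05, 1.189)]


def residuals(pairs):
    """Residual = observed - predicted, rounded to three decimals as in Table 6."""
    return [round(obs - pred, 3) for obs, pred in pairs]


def pearson_r(pairs):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(pairs)
    mean_obs = sum(obs for obs, _ in pairs) / n
    mean_pred = sum(pred for _, pred in pairs) / n
    sxy = sum((obs - mean_obs) * (pred - mean_pred) for obs, pred in pairs)
    sxx = sum((obs - mean_obs) ** 2 for obs, _ in pairs)
    syy = sum((pred - mean_pred) ** 2 for _, pred in pairs)
    return sxy / math.sqrt(sxx * syy)


print(residuals(rows))
print(pearson_r(rows))
```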


very good fit with R2=0.8830. This indicates that model 16 can be successfully applied to predict the toxicity of these classes of molecules.

The applicability domain of the derived QSAR models covers the differently substituted compounds. Care is still needed, because similar molecules can show significantly different biological toxicities. For such molecules, toxicities are often mispredicted, even when the overall predictivity of the models is high.

Acknowledgement

The authors are grateful to Prof. Vijay K Agrawal, Director, NITTTR, Bhopal for his

unforgettable support.


References

1. Russom CL, Anderson EB, Greenwood BE, Pilli A. ASTER: an integration of the AQUIRE data base and the QSAR system for use in ecological risk assessments. Sci. Total Environ., 1991, 109-110: 667-670.
2. Hulzebos EM, Posthumus R. (Q)SARs: gatekeepers against risk on chemicals? SAR QSAR Environ. Res., 2003, 14: 285-316.
3. Cronin MT, Jaworska JD, Walker JD, Comber MH, Watts CD, Worth AP. Use of QSARs in international decision-making frameworks to predict health effects of chemical substances. Environ. Health Perspect., 2003, 111 (10): 1391-1401.
4. Carlsen L, Walker JD. QSARs for prioritizing PTB substances to promote pollution prevention. QSAR Comb. Sci., 2003, 22: 49-57.
5. Mackay D, Webster E. A perspective on environmental models and QSARs. SAR QSAR Environ. Res., 2003, 14 (1): 7-16.
6. Zeeman M, Auer CM, Clements RG, Nabholz JV, Boethling RS. U.S. EPA regulatory perspectives on the use of QSAR for new and existing chemical evaluations. SAR QSAR Environ. Res., 1995, 3: 179-201.
7. Comber MHI, Walker JD, Watts C, Hermens J. Quantitative structure-activity relationships for predicting potential ecological hazard of organic chemicals for use in regulatory risk assessments. Environ. Toxicol. Chem., 2003, 22 (8): 1822-1828.
8. Ren S. Determining the mechanisms of toxic action of phenols to Tetrahymena pyriformis. Environ. Toxicol. Chem., 2002, 17: 119-127.
9. Cui S, Wang X, Liu S, Wang L. Predicting toxicity of benzene derivatives by molecular hologram derived quantitative structure-activity relationships (QSARs). SAR QSAR Environ. Res., 2003, 14 (3): 223-231.


10. Liu X, Yang Z, Wang L. CoMFA of the acute toxicity of phenylsulfonyl carboxylates to Vibrio fischeri. SAR QSAR Environ. Res., 2003, 14 (3): 183-190.
11. Liu X, Wang B, Huang Z, Han S, Wang L. Acute toxicity and quantitative structure-activity relationships of alpha-branched phenylsulfonyl acetates to Daphnia magna. Chemosphere, 2003, 50 (3): 403-408.
12. Yuan X, Lu G, Zhao J. QSAR study on the joint toxicity of 2,4-dinitrotoluene with aromatic compounds to Vibrio fischeri. J. Environ. Sci. Health Part A Tox. Hazard. Subst. Environ. Eng., 2002, 37 (4): 573-578.
13. Netzeva TI, Dearden JC, Edwards R, Worgan AD, Cronin MT. QSAR analysis of the toxicity of aromatic compounds to Chlorella vulgaris in a novel short-term assay. J. Chem. Inf. Comput. Sci., 2004, 44 (1): 258-265.
14. Mazzatorta P, Benfenati E, Lorenzini P, Vighi M. QSAR in ecotoxicity: an overview of modern classification techniques. J. Chem. Inf. Comput. Sci., 2004, 44 (1): 105-112.
15. Roy K, Ghosh G. QSTR with Extended Topochemical Atom Indices. Modeling of the Acute Toxicity of Phenylsulfonyl Carboxylates to Vibrio fischeri Using Principal Component Factor Analysis and Principal Component Regression Analysis. QSAR Comb. Sci., 2004, 23: 526-535.