The model for bioconcentration factor (BCF) in fish

Alessandra Roncaglioni

Istituto di Ricerche Farmacologiche “Mario Negri”

CAESAR workshop on QSAR models for REACH, Mario Negri Institute, Milan, Italy

March 10-11, 2009

Outline

• Bioconcentration factor (BCF)

• BCF & REACH

• Data availability & variability

• Modelling BCF in CAESAR

• Comparison with other approaches

• Applet for the BCF model

• Conclusions


Bioconcentration factor (BCF)

Bioconcentration is the uptake of the test substance in an organism relative to the concentration of test substance in the surrounding medium, leading to an increase in concentration.

BCF = Cf / Cw = k1 / k2

Preferred experimental test standard: OECD 305 (Bioconcentration flow-through fish test)

→ Test duration: 44-116 days

→ Number of fish likely needed for the test: 132-240

→ Cost for each experiment: 50-100 k€

k1 = uptake rate constant
k2 = depuration rate constant
Cf = concentration in fish at steady state conditions
Cw = concentration in water at steady state conditions
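As a small worked illustration of the ratio BCF = Cf / Cw = k1 / k2, the sketch below computes BCF both ways and converts it to log units. The function names and the numeric inputs are hypothetical, chosen only for the example; they do not come from the talk or from OECD 305.

```python
import math


def bcf_from_rate_constants(k1: float, k2: float) -> float:
    """Kinetic BCF (L/kg): uptake rate constant over depuration rate constant.

    k1 is assumed to be in L/kg/day and k2 in 1/day.
    """
    if k1 < 0 or k2 <= 0:
        raise ValueError("k1 must be non-negative and k2 must be positive")
    return k1 / k2


def bcf_from_concentrations(c_fish: float, c_water: float) -> float:
    """Steady-state BCF (L/kg) as Cf / Cw (Cf in mg/kg, Cw in mg/L)."""
    if c_water <= 0:
        raise ValueError("c_water must be positive")
    return c_fish / c_water


# Hypothetical inputs, for illustration only.
bcf_kinetic = bcf_from_rate_constants(k1=500.0, k2=0.25)
bcf_steady = bcf_from_concentrations(c_fish=20.0, c_water=0.01)
log_bcf = math.log10(bcf_kinetic)
print(f"kinetic BCF = {bcf_kinetic:.0f} L/kg, logBCF = {log_bcf:.2f}")
print(f"steady-state BCF = {bcf_steady:.0f} L/kg")
```

Both routes give the same number only when the system has really reached steady state, which is why the test guideline runs for weeks.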

BCF in REACH

• Potential use of BCF information in REACH is for:
  – C&L
  – Prioritization (PBT, vPvB)
  – Chemical Safety Assessment (CSA)

• Quantitative and qualitative (classification) modelling
  – PBT
    o B: BCF > 2000 L/kg (3.3 in log units)
  – vPvB
    o vB: BCF > 5000 L/kg (3.7 in log units)

tonnes/year   C&L   B and vB   CSA   BCF value
> 1            X       X
> 10           X       X        X
> 100          X       X        X        X
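The B and vB criteria quoted on this slide can be written as a simple threshold check. This is a minimal sketch: the function names are hypothetical, the "nB" label for compounds below the B threshold follows the classification tables later in the talk, and treating the limits as strict inequalities follows the "BCF > 2000" / "BCF > 5000" wording.

```python
import math

B_THRESHOLD_L_PER_KG = 2000.0   # B criterion quoted on the slide
VB_THRESHOLD_L_PER_KG = 5000.0  # vB criterion quoted on the slide


def classify_bcf(bcf_l_per_kg: float) -> str:
    """Return 'nB', 'B' or 'vB' for a BCF value in L/kg."""
    if bcf_l_per_kg <= 0:
        raise ValueError("BCF must be positive")
    if bcf_l_per_kg > VB_THRESHOLD_L_PER_KG:
        return "vB"
    if bcf_l_per_kg > B_THRESHOLD_L_PER_KG:
        return "B"
    return "nB"


def classify_log_bcf(log_bcf: float) -> str:
    """Same classification, starting from a logBCF value."""
    return classify_bcf(10.0 ** log_bcf)


# The two thresholds expressed in log units (about 3.3 and 3.7).
log_b = math.log10(B_THRESHOLD_L_PER_KG)
log_vb = math.log10(VB_THRESHOLD_L_PER_KG)
print(f"B: logBCF > {log_b:.2f}, vB: logBCF > {log_vb:.2f}")
print(classify_log_bcf(3.5))
```

The gap between the two thresholds is about 0.4 log units, which is worth keeping in mind when reading the experimental variability figures that follow.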

Experimental variability

• According to Dimitrov et al.¹: 0.75 Log units

• Assessed in other compilations:

→ EURAS database²

– Considered the “golden” standard for BCF

– Reliability scores assigned to judge the quality of the experiments

¹ SAR QSAR Environ. Res. 16, 2005, 531-554
² http://www.euras.be/eng/project.asp?ProjectId=92

[Figure: LogBCF by substance in the EURAS database², two panels: data without reliability score; data with high reliability score (1 & 2).]

Experimental variability

• Assessed in other compilations:

→ Canadian database³

– Large compilation of bioaccumulation data

– Reliability scores assigned to judge the quality of the experiments

• but … B – vB range = only 0.4 log units

³ Arnot et al. Environ. Rev. 14, 2006, 257-297

[Figure: logBCF by substance in the Canadian database, two panels: "logBCF - OECD fish, all score" (caption: all OECD fish; reliability score = 1); "logBCF - Oncorhynchus mykiss, score 1" (caption: rainbow trout; reliability score = 1).]


CAESAR modelling for BCF (I)

• Dataset
  – Dimitrov et al.: data according to official guidelines, widest collection available (~ 500 compounds)
  – Structure check and error pruning (removing ~ 50 compounds)

• Descriptors
  – 2D descriptors & lipophilicity: DRAGON, CODESSA, ACD, Pallas, MDL
  – Tautomerism issue (example next slide)

• Models
  – Descriptor selection: GA, heuristic method
  – Classification: AFP
  – Quantitative: MLR, NN (SVM, CP, MLP), GMDH

Tautomers behaviour in the BCF dataset

[Figure: composition of the BCF dataset: 26 tautomeric forms, 447 non tautomeric forms.]

[Figure: lipophilicity descriptor variation for Moriguchi LogP and KowWin LogP, grouped in bands: > 10%, 5% to 10%, 1% to 5%, < 1% (axis 0 to 26).]

[Figure: predicted values for BCF model (log units), mean and st. dev., for compounds ID 358, 371, 372, 399, 414, 429, 430, 436, 443, 445, 446, 449, 452, 455, 464, 472, 475, 485, 495, 507.]

[Figure: predicted activity vs. observed activity (both axes -1 to 5).]

CAESAR modelling for BCF (II)

• Hybrid model
  – If/then rules in different areas of the relation (increase the slope and reduce the Y intercept)
  – GMDH – self organization

• Validation
  – Training / test splitting based on the chemical composition (atomic fragments): training set n = 378, test set n = 95
  – Cross-validation & test set prediction
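To make the two validation statistics concrete, the sketch below computes R² on the fitted data and R² under leave-one-out (loo) cross-validation. It is an illustration only: it swaps in a one-descriptor least-squares line in place of the neural network and GMDH models used in CAESAR, and the toy data and function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mo = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def loo_predictions(xs, ys):
    """Predict each point from a line fitted on all the other points."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


# Hypothetical toy data (descriptor value, logBCF); not from the CAESAR dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.4, 1.1, 1.3, 2.2, 2.4, 3.1]

slope, intercept = fit_line(xs, ys)
r2_fit = r_squared(ys, [slope * x + intercept for x in xs])
r2_loo = r_squared(ys, loo_predictions(xs, ys))
print(f"R2 (fit) = {r2_fit:.3f}, R2 cv (loo) = {r2_loo:.3f}")
```

For a least-squares fit the loo value cannot exceed the fitted value, so a large gap between the two is a common sign of overfitting.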

Results of CAESAR modelling

Modelling method   Nr. of variables   Descriptors used (SW)   Acc./R² training set   Acc./R² cv (loo) training set   Acc./R² test set
SAR (AFP)          3                  Dragon, ACD             0.86                   0.74                            0.81
QSAR (RBF)         5                  Dragon, MDL             0.81                   0.78                            0.77
QSAR (CP-NN)       8                  Dragon, MDL, ACD        0.95                   0.70                            0.76
QSAR (MLP)         5                  Dragon                  0.80                   0.80                            0.79
QSAR (GMDH)        4                  Dragon, MDL             0.76                   0.76                            0.77
HM (2 models)      8                  Dragon, MDL             0.83                   0.82                            0.80
HM (5 models)      36                 Dragon, MDL, KowWin     0.85                   0.85                            0.80

Description of selected model

• Combination of 2 RBF-NN models (M1, M2) with 5 descriptors each

MlogP    Moriguchi log of the octanol–water partition coefficient (logP)
BEHp2    Highest eigenvalue n. 2 of Burden matrix / weighted by atomic polarizabilities
AEige    Absolute eigenvalue sum from electronegativity weighted distance matrix
GATS5v   Geary autocorrelation – lag 5 / weighted by atomic van der Waals volumes
Cl-089   Cl attached to C1(sp2)
X0sol    Solvation connectivity index chi-0
MATS5v   Moran autocorrelation – lag 5 / weighted by atomic van der Waals volumes
SsCl     Sum of all (–Cl) E-State values in molecule

Legend: "In common in M1 and M2" (the highlighting of the shared descriptors is not preserved).

[Figure: predicted combined activity vs. M1, M2 activity (both axes -1 to 5), with the values 2.41 and 1.355 marked and the equations:]

Pred Comb. = 1.05 mean (M1,M2) - 0.065
Pred Comb. = 0.996 min (M1,M2) + 0.042
Pred Comb. = 0.936 mean (M1,M2) - 0.123

[Figure: predicted logBCF values vs. experimental LogBCF values (both axes -2 to 5); series: training set, test set.]
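The three "Pred Comb." equations shown with the plot can be evaluated directly from the two model outputs. The sketch below only does that arithmetic. Which equation applies in which range of M1 and M2 (the values 2.41 and 1.355 on the plot look like range boundaries) cannot be recovered from the slide, so no if/then selection is implemented; the rule names and function name are hypothetical.

```python
# Coefficients as printed on the slide; the rule names are made up for this sketch.
COMBINATION_RULES = {
    "mean_a": lambda m1, m2: 1.05 * (m1 + m2) / 2 - 0.065,
    "min": lambda m1, m2: 0.996 * min(m1, m2) + 0.042,
    "mean_b": lambda m1, m2: 0.936 * (m1 + m2) / 2 - 0.123,
}


def combined_candidates(m1: float, m2: float) -> dict:
    """Evaluate every combination equation for one pair of model outputs.

    The caller still has to decide which rule is the relevant one:
    the selection logic of the hybrid model is NOT reproduced here.
    """
    return {name: rule(m1, m2) for name, rule in COMBINATION_RULES.items()}


# Hypothetical logBCF predictions from M1 and M2.
candidates = combined_candidates(m1=2.0, m2=3.0)
for name, value in candidates.items():
    print(f"{name}: {value:.3f}")
```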


LogP based BCF estimations

[Figure: correlation between LogP and LogBCF; experimental LogBCF (-2 to 7) vs. experimental LogP (-11 to 11); series: Canadian db, EURAS, Dimitrov; B and vB levels marked.]

                 LogP ≤ 4.5   LogP > 4.5
LogBCF < 3.3     68.98%       14.66%
LogBCF ≥ 3.3      2.26%       14.10%
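The 2x2 table above can be read as the outcome of a simple screen, "LogP > 4.5 flags a compound as potentially B (LogBCF ≥ 3.3)". Reading it that way is an interpretation made for this sketch, not a statement from the slide; the percentages are those in the table and the variable names are hypothetical.

```python
# Percentages of compounds, copied from the 2x2 table.
true_negative = 68.98   # LogP <= 4.5 and LogBCF < 3.3
false_positive = 14.66  # LogP > 4.5 but LogBCF < 3.3
false_negative = 2.26   # LogP <= 4.5 but LogBCF >= 3.3
true_positive = 14.10   # LogP > 4.5 and LogBCF >= 3.3

total = true_negative + false_positive + false_negative + true_positive

# Share of compounds on the diagonal of the table.
accuracy = (true_positive + true_negative) / total
# Share of experimentally bioaccumulative compounds caught by the LogP screen.
sensitivity = true_positive / (true_positive + false_negative)
# Share of non-bioaccumulative compounds correctly left unflagged.
specificity = true_negative / (true_negative + false_positive)
# Share of flagged compounds that are actually bioaccumulative.
precision = true_positive / (true_positive + false_positive)

print(f"accuracy    = {accuracy:.1%}")
print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
print(f"precision   = {precision:.1%}")
```

On these figures the screen misses few bioaccumulative compounds but flags roughly as many non-bioaccumulative ones as bioaccumulative ones.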

External validation & EPI Suite comparison

                     EPI Suite v4.0     CAESAR
Training set         R² = 0.72 (451)    R² = 0.83 (368)
Test set             R² = 0.67 (112)    R² = 0.82 (91)
External compounds   R² = 0.56 (71)     R² = 0.61 (184)

• Experimental BCF values (median) coming from:
  – Dimitrov et al.
  – Canadian
  – EURAS

• Tested models
  – CAESAR BCF model
  – EPI Suite v4.0*

• Three series analyzed
  – Training set
  – Test set
  – Compounds not contained in the database used to develop the respective model

* http://www.epa.gov/opptintr/exposure/pubs/episuitedl.htm

CAESAR model refinement

• Rules for identifying compounds associated with greater uncertainty in CAESAR model predictions

CxHy…Cl6
CxHy…F10
CxHy…Si
CxHy…Sn
Ar-O and Ar-[Br,Cl]3
Ar-(NO2)3
Ar-(tBu)2

[Structure drawings of further fragments (atoms N, O, P, S with Ar and R1, R2, R3 substituents); not recoverable as text.]

[Figure: predicted LogBCF vs. experimental LogBCF (both axes -1 to 6); series: Training (R² = 0.84), Test (R² = 0.81), External (R² = 0.71).]

Classification

EPI Suite v4.0 (634 compounds)

                       Estimated LogBCF
Experimental LogBCF    nB        B        vB
nB                     76.03%    2.68%    3.00%
B                       4.89%    1.10%    1.26%
vB                      2.05%    2.37%    6.62%

CAESAR BCF model, entire dataset (643 compounds)

                       Estimated LogBCF
Experimental LogBCF    nB        B        vB
nB                     79.63%    1.09%    0.47%
B                       5.29%    1.40%    0.93%
vB                      2.80%    2.02%    6.38%

CAESAR BCF model, pruned dataset (542 compounds)

                       Estimated LogBCF
Experimental LogBCF    nB        B        vB
nB                     83.39%    1.29%    0.37%
B                       4.80%    1.66%    0.55%
vB                      0.74%    1.48%    5.72%
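The three classification tables above can be condensed into a few summary figures: the share of compounds placed in the right class, and the shares estimated in a lower or a higher class than the experimental one. The sketch below does that arithmetic on the percentages shown in the tables; the dictionary keys and function name are hypothetical, and the summary statistics themselves are not reported in the talk.

```python
CLASSES = ("nB", "B", "vB")

# Rows: experimental class; columns: estimated class; values in % of compounds.
CONFUSION = {
    "EPI Suite v4.0": [
        [76.03, 2.68, 3.00],
        [4.89, 1.10, 1.26],
        [2.05, 2.37, 6.62],
    ],
    "CAESAR (entire dataset)": [
        [79.63, 1.09, 0.47],
        [5.29, 1.40, 0.93],
        [2.80, 2.02, 6.38],
    ],
    "CAESAR (pruned dataset)": [
        [83.39, 1.29, 0.37],
        [4.80, 1.66, 0.55],
        [0.74, 1.48, 5.72],
    ],
}


def summarize(matrix):
    """Return the correct, underestimated and overestimated fractions."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    # Below the diagonal: estimated class is lower than the experimental one.
    under = sum(matrix[i][j] for i in range(n) for j in range(i))
    # Above the diagonal: estimated class is higher than the experimental one.
    over = sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))
    return {
        "correct": correct / total,
        "underestimated": under / total,
        "overestimated": over / total,
    }


summaries = {name: summarize(m) for name, m in CONFUSION.items()}
for name, s in summaries.items():
    print(
        f"{name}: correct {s['correct']:.1%}, "
        f"under {s['underestimated']:.1%}, over {s['overestimated']:.1%}"
    )
```

Underestimation is the more worrying error for a regulatory screen, since it means a B or vB compound is reported in a lower class.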


CAESAR BCF model applet


http://www.caesar-project.eu/

Conclusions

• New integrated models for BCF with better performance than available methods

• Final model fully implemented and appropriate documentation (QMRF) ensures transparency and reproducibility

• Appreciation of similarity and confidence in prediction

• Feasible to use output as a definitive value or in classification

• Experimental data & quality check

• Use within ITS in collaboration with OSIRIS project

Acknowledgments

• Lab. team & sw developers
  o E. Benfenati
  o C. Zhao
  o E. Boriani
  o A. Lombardo
  o A. Chana
  o O. Schifanella
  o C. Milan
  o R. Gonella Diaza
  o A. Manganaro
  o A. Gomez Delgado
  o D. Bigoni
  o A. Cassano

• CAESAR partners
  o P2 CSL DEFRA
  o P3 BCX
  o P4 POLIMI
  o P5 KM
  o P6 LJMU
  o P7 UFZ
  o P8 NIC-LJ
  o P9 TNO

• OSIRIS project
