The model for bioconcentration factor (BCF) in fish
Alessandra Roncaglioni
Istituto di Ricerche Farmacologiche “Mario Negri”
CAESAR workshop on QSAR models for REACH, Mario Negri Institute, Milan, Italy
March 10-11, 2009
-
Outline
• Bioconcentration factor (BCF)
• BCF & REACH
• Data availability & variability
• Modelling BCF in CAESAR
• Comparison with other approaches
• Applet for the BCF model
• Conclusions
-
Bioconcentration factor (BCF)
Bioconcentration is the uptake of the test substance in an
organism relative to the concentration of test substance in the
surrounding medium leading to an increase in concentration.
BCF = Cf / Cw = k1 / k2
k1 = uptake rate constant
k2 = depuration rate constant
Cf = concentration in fish at steady-state conditions
Cw = concentration in water at steady-state conditions
Preferred experimental test standard: OECD 305 (Bioconcentration: flow-through fish test)
→ Test duration: 44-116 days
→ Likely number of fish recommended for the test: 132-240
→ Cost of each experiment: 50-100 k€
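The two equivalent forms of the BCF definition can be illustrated with a minimal Python sketch. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not taken from the presentation or from OECD 305.

```python
import math


def bcf_steady_state(c_fish, c_water):
    """BCF = Cf / Cw (e.g. Cf in mg/kg and Cw in mg/L give BCF in L/kg)."""
    if c_water <= 0:
        raise ValueError("water concentration must be positive")
    return c_fish / c_water


def bcf_kinetic(k1, k2):
    """BCF = k1 / k2 (uptake rate constant over depuration rate constant)."""
    if k2 <= 0:
        raise ValueError("depuration rate constant must be positive")
    return k1 / k2


# Hypothetical inputs, for illustration only.
bcf = bcf_steady_state(c_fish=500.0, c_water=0.25)
log_bcf = math.log10(bcf)
print(f"BCF = {bcf:.0f} L/kg, logBCF = {log_bcf:.2f}")
```

Working in log units, as the later slides do, keeps values that span several orders of magnitude on one scale.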
-
BCF in REACH
• Potential use of BCF information in REACH is for:
– C&L
– Prioritization (PBT, vPvB)
– Chemical Safety Assessment (CSA)
• Quantitative and qualitative (classification) modelling
– PBT
– vPvB
o B: BCF > 2000 L/kg (3.3 in log units)
o vB: BCF > 5000 L/kg (3.7 in log units)
tonnes/year   C&L   B and vB   CSA   BCF value
> 1            X       X
> 10           X       X        X
> 100          X       X        X        X
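The B and vB criteria above amount to two cut-offs on the BCF value. A minimal sketch follows, assuming strict "greater than" comparisons as written on the slide; the function names are hypothetical, and the "nB" label for substances below the B cut-off is borrowed from the classification slide later in the talk.

```python
import math

B_THRESHOLD = 2000.0   # L/kg, B criterion quoted on the slide
VB_THRESHOLD = 5000.0  # L/kg, vB criterion quoted on the slide


def classify_bcf(bcf):
    """Return 'vB', 'B' or 'nB' for a BCF value in L/kg."""
    if bcf > VB_THRESHOLD:
        return "vB"
    if bcf > B_THRESHOLD:
        return "B"
    return "nB"


def classify_log_bcf(log_bcf):
    """Same classification, starting from a value in log units."""
    return classify_bcf(10.0 ** log_bcf)


# The cut-offs expressed in log units, rounded as on the slide.
print(round(math.log10(B_THRESHOLD), 1), round(math.log10(VB_THRESHOLD), 1))
```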
-
Experimental variability
• According to Dimitrov et al.1: 0.75 Log units
• Assessed in other compilations:
→ EURAS database2
– Considered the “gold standard” for BCF
– Reliability scores assigned to judge the quality of the experiments
[Figure: LogBCF by substance in the EURAS database2, two panels: data without reliability score; data with high reliability score (1 & 2).]
1 SAR QSAR Environ. Res. 16, 2005, 531-554
2 http://www.euras.be/eng/project.asp?ProjectId=92
-
Experimental variability
• Assessed in other compilations:
→ Canadian database3
– Large compilation of bioaccumulation data
– Reliability scores assigned to judge the quality of the experiments
• but … the B – vB range is only 0.4 log units
[Figure: logBCF by substance in the Canadian database, two panels: "logBCF - OECD fish, all score" (all OECD fish; reliability score = 1) and "logBCF - Oncorhynchus mykiss, score 1" (r. trout; reliability score = 1).]
3 Arnot et al. Environ. Rev. 14, 2006, 257-297
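To see why a 0.4 log unit gap between the B and vB cut-offs matters next to an experimental variability of 0.75 log units, the sketch below lists the classes a measured value could fall into. Treating the 0.75 figure as a symmetric plus/minus band around the measurement is an assumption made here for illustration; the slides do not say how the figure is defined.

```python
LOG_B = 3.3         # B cut-off in log units
LOG_VB = 3.7        # vB cut-off in log units
VARIABILITY = 0.75  # log units, assumed here to act as a +/- band

ORDER = ["nB", "B", "vB"]


def classify(log_bcf):
    """Class for a single logBCF value."""
    if log_bcf > LOG_VB:
        return "vB"
    if log_bcf > LOG_B:
        return "B"
    return "nB"


def classes_within_variability(log_bcf, spread=VARIABILITY):
    """All classes reachable when the value shifts by up to +/- spread."""
    lowest = ORDER.index(classify(log_bcf - spread))
    highest = ORDER.index(classify(log_bcf + spread))
    return ORDER[lowest:highest + 1]


# A value in the middle of the B range can land in any of the three classes.
print(classes_within_variability(3.5))
```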
-
CAESAR modelling for BCF (I)
• Dataset
– Dimitrov et al.: data according to official guidelines, widest collection available (~ 500 compounds)
– Structure check and error pruning (removing ~ 50 compounds)
• Descriptors
– 2D descriptors & lipophilicity: DRAGON, CODESSA, ACD, Pallas, MDL
– Tautomerism issue (example next slide)
• Models
– Descriptor selection: GA, heuristic method
– Classification: AFP
– Quantitative: MLR, NN (SVM, CP, MLP), GMDH
-
Tautomers behaviour in the BCF dataset
[Figure: BCF dataset split into 26 tautomeric forms and 447 non tautomeric forms; bar chart "Lipophilicity descriptor variation" for Moriguchi LogP and KowWin LogP, counts from 0 to 26, bars grouped by variation: > 10%, 5% to 10%, 1% to 5%, < 1%.]
-
Tautomers behaviour in the BCF dataset
[Figure: same charts as on the previous slide, plus "Predicted values for BCF model (log units)": mean and st. dev., axis from 0.00 to 2.50, for compounds ID 358, 371, 372, 399, 414, 429, 430, 436, 443, 445, 446, 449, 452, 455, 464, 472, 475, 485, 495, 507.]
-
CAESAR modelling for BCF (II)
• Hybrid model
– If/then rules in different areas of the relation (increase the slope and reduce the Y intercept)
– GMDH: self-organization
• Validation
– Training / test splitting based on the chemical composition (atomic fragments): training set n = 378, test set n = 95
– Cross-validation & test set prediction
[Figure: predicted activity vs. observed activity, both axes from -1 to 5.]
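The validation steps named above (fit on the training set, leave-one-out cross-validation, R2) can be sketched on a toy problem. This is not the CAESAR procedure: it swaps in a one-descriptor least-squares line instead of the neural network models, the data points are invented, and the cross-validated R2 is computed as 1 - PRESS / SS_tot, which is one common convention and an assumption here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def loo_predictions(xs, ys):
    """Predict each point from a model fitted on all the other points."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return predictions


# Invented toy data (a logP-like descriptor vs. logBCF), for illustration only.
log_p = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_bcf = [0.4, 1.1, 1.9, 2.6, 3.6, 4.1]

slope, intercept = fit_line(log_p, log_bcf)
r2_fit = r_squared(log_bcf, [slope * x + intercept for x in log_p])
r2_loo = r_squared(log_bcf, loo_predictions(log_p, log_bcf))
print(f"R2 (fit) = {r2_fit:.3f}, R2 cv (loo) = {r2_loo:.3f}")
```

A cross-validated R2 close to the fitted R2 suggests the model is not merely memorising the training compounds, which is the comparison the results table on the next slide invites.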
-
Results of CAESAR modelling
Modelling method   Nr. of variables   Descriptors used (SW)   Acc./R2 training set   Acc./R2 cv (loo) training set   Acc./R2 test set
SAR (AFP)          3                  Dragon, ACD             0.86                   0.74                            0.81
QSAR (RBF)         5                  Dragon, MDL             0.81                   0.78                            0.77
QSAR (CP-NN)       8                  Dragon, MDL, ACD        0.95                   0.70                            0.76
QSAR (MLP)         5                  Dragon                  0.80                   0.80                            0.79
QSAR (GMDH)        4                  Dragon, MDL             0.76                   0.76                            0.77
HM (2 models)      8                  Dragon, MDL             0.83                   0.82                            0.80
HM (5 models)      36                 Dragon, MDL, KowWin     0.85                   0.85                            0.80
-
Description of selected model
• Combination of 2 RBF-NN models (M1, M2) with 5 descriptors each
MlogP    Moriguchi log of the octanol–water partition coefficient (logP)
BEHp2    Highest eigenvalue n. 2 of Burden matrix, weighted by atomic polarizabilities
AEige    Absolute eigenvalue sum from electronegativity weighted distance matrix
GATS5v   Geary autocorrelation, lag 5, weighted by atomic van der Waals volumes
Cl-089   Cl attached to C1(sp2)
X0sol    Solvation connectivity index chi-0
MATS5v   Moran autocorrelation, lag 5, weighted by atomic van der Waals volumes
SsCl     Sum of all (–Cl) E-State values in molecule
[Legend: In common in M1 and M2]
-
Description of selected model
[Figure: predicted combined activity vs. M1, M2 activity, both axes from -1 to 5, with marked values 2.41 and 1.355 and the following relations:]
Pred Comb. = 1.05 mean (M1,M2) - 0.065
Pred Comb. = 0.996 min (M1,M2) + 0.042
Pred Comb. = 0.936 mean (M1,M2) - 0.123
-
Description of selected model
[Figure: predicted logBCF values vs. experimental LogBCF values, both axes from -2 to 5; series: Training Set, Test set.]
-
LogP based BCF estimations
[Figure: Correlation between LogP and LogBCF. x-axis: Experimental LogP (-11 to 11); y-axis: Experimental LogBCF (-2 to 7); series: Canadian db, EURAS, Dimitrov; B and vB levels marked.]
-
LogP based BCF estimations
[Figure: Correlation between LogP and LogBCF. x-axis: Experimental LogP (-11 to 11); y-axis: Experimental LogBCF (-2 to 7); series: Canadian db, EURAS, Dimitrov; B and vB levels marked.]
                LogP ≤ 4.5   LogP > 4.5
LogBCF < 3.3      68.98%       14.66%
LogBCF ≥ 3.3       2.26%       14.10%
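The 2 x 2 table can be read as the performance of a simple screen that flags a compound as B whenever LogP > 4.5. That reading is an interpretation made here, not a statement from the slide. Under it, the usual screening figures follow directly from the four percentages; the variable names are hypothetical.

```python
# Percentages of compounds, copied from the table above.
not_b_low_logp = 68.98   # LogBCF < 3.3 and LogP <= 4.5
not_b_high_logp = 14.66  # LogBCF < 3.3 and LogP > 4.5
b_low_logp = 2.26        # LogBCF >= 3.3 and LogP <= 4.5
b_high_logp = 14.10      # LogBCF >= 3.3 and LogP > 4.5

total = not_b_low_logp + not_b_high_logp + b_low_logp + b_high_logp

# Fraction of compounds where the LogP cut-off agrees with the BCF class.
agreement = (not_b_low_logp + b_high_logp) / total
# Fraction of LogBCF >= 3.3 compounds that the LogP cut-off flags.
sensitivity = b_high_logp / (b_high_logp + b_low_logp)
# Fraction of LogBCF < 3.3 compounds that the LogP cut-off leaves unflagged.
specificity = not_b_low_logp / (not_b_low_logp + not_b_high_logp)

print(f"agreement = {agreement:.1%}")
print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
```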
-
External validation & EPI Suite comparison
• Experimental BCF values (median) coming from:
– Dimitrov et al.
– Canadian
– EURAS
• Tested models
– CAESAR BCF model
– EPI Suite v4.0*
• Three series analyzed
– Training set
– Test set
– Compounds not contained in the databases used to develop the respective model
                     EPI Suite v4.0     CAESAR
Training set         R2 = 0.72 (451)    R2 = 0.83 (368)
Test set             R2 = 0.67 (112)    R2 = 0.82 (91)
External compounds   R2 = 0.56 (71)     R2 = 0.61 (184)
*http://www.epa.gov/opptintr/exposure/pubs/episuitedl.htm
-
CAESAR model refinement
• Rules for identifying compounds associated with greater uncertainty in CAESAR model predictions
CxHy…Cl6
CxHy…F10
CxHy…Si
CxHy…Sn
Ar-O and Ar-[Br,Cl]3
Ar-(NO2)3
Ar-(tBu)2
[Figure: further rules drawn as generic structures with N, O, P and S atoms and Ar, R1, R2, R3 substituents.]
[Figure: predicted LogBCF vs. experimental LogBCF, both axes from -1 to 6; series: Training, Test, External; R2 = 0.84, 0.81, 0.71.]
-
Classification
Rows: experimental LogBCF class; columns: estimated LogBCF class.
EPI Suite v4.0 (634 compounds)
        nB       B      vB
nB   76.03%   2.68%   3.00%
B     4.89%   1.10%   1.26%
vB    2.05%   2.37%   6.62%
CAESAR BCF model, entire dataset (643 compounds)
        nB       B      vB
nB   79.63%   1.09%   0.47%
B     5.29%   1.40%   0.93%
vB    2.80%   2.02%   6.38%
CAESAR BCF model, pruned dataset (542 compounds)
        nB       B      vB
nB   83.39%   1.29%   0.37%
B     4.80%   1.66%   0.55%
vB    0.74%   1.48%   5.72%
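The three classification tables can be compared by summing their cells: the diagonal gives the share of compounds put in the right class, the cells below it are underestimates (estimated class lower than experimental) and the cells above it are overestimates. The sketch below does this sum; the percentages are copied from the slide, while the function and variable names are hypothetical.

```python
# Rows: experimental class (nB, B, vB); columns: estimated class (nB, B, vB).
# All values are percentages of the compounds in each comparison.
matrices = {
    "EPI Suite v4.0": [
        [76.03, 2.68, 3.00],
        [4.89, 1.10, 1.26],
        [2.05, 2.37, 6.62],
    ],
    "CAESAR (entire dataset)": [
        [79.63, 1.09, 0.47],
        [5.29, 1.40, 0.93],
        [2.80, 2.02, 6.38],
    ],
    "CAESAR (pruned dataset)": [
        [83.39, 1.29, 0.37],
        [4.80, 1.66, 0.55],
        [0.74, 1.48, 5.72],
    ],
}


def summarize(matrix):
    """Return (correct, underestimated, overestimated) percentages."""
    size = len(matrix)
    correct = sum(matrix[i][i] for i in range(size))
    under = sum(matrix[i][j] for i in range(size) for j in range(i))
    over = sum(matrix[i][j] for i in range(size) for j in range(i + 1, size))
    return correct, under, over


for name, matrix in matrices.items():
    correct, under, over = summarize(matrix)
    print(f"{name}: correct {correct:.2f}%, under {under:.2f}%, over {over:.2f}%")
```

Underestimates are the errors of most regulatory concern, since they assign a bioaccumulative substance to a lower class than the experiment supports.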
-
CAESAR BCF model applet
http://www.caesar-project.eu/
-
Conclusions
• New integrated models for BCF with better
performance than available methods
• Final model fully implemented and appropriate
documentation (QMRF) ensures transparency and
reproducibility
• Appreciation of similarity and confidence in prediction
• Feasible to use output as a definitive value or in
classification
• Experimental data & quality check
• Use within ITS in collaboration with OSIRIS project
-
Acknowledgments
• Lab. team & software developers
o E. Benfenati
o C. Zhao
o E. Boriani
o A. Lombardo
o A. Chana
o O. Schifanella
o C. Milan
o R. Gonella Diaza
o A. Manganaro
o A. Gomez Delgado
o D. Bigoni
o A. Cassano
• CAESAR partners
o P2 CSL DEFRA
o P3 BCX
o P4 POLIMI
o P5 KM
o P6 LJMU
o P7 UFZ
o P8 NIC-LJ
o P9 TNO
• OSIRIS project