overview of activities within wp 3 development and...
TRANSCRIPT
Overview of activities within WP 3
Development and validation of QSAR models
Overview of activities within WP 3Overview of activities within WP 3
Development and validation of QSAR modelsDevelopment and validation of QSAR models
Paola Paola GramaticaGramatica
WP3 leaderWP3 leader
QSAR Research Unit in Environmental Chemistry and QSAR Research Unit in Environmental Chemistry and EcotoxicologyEcotoxicologyDBSF DBSF --University of University of InsubriaInsubria, Varese , Varese -- ItalyItaly
EE--mail: mail: [email protected]@uninsubria.ithttp://www.qsar.ithttp://www.qsar.it
EmergingEmerging PollutantsPollutants –– CADASTER CADASTER classesclasses
TriazolesTriazoles and and BenzotriazolesBenzotriazoles
(B)(B)TAZsTAZs
PolyBrominatedPolyBrominatedDiPhenylDiPhenyl EthersEthers (PBDE) (PBDE) and and otherother BrominatedBrominatedFlameFlame RetardantsRetardants
((BFRsBFRs))
SubstitutedSubstituted musksmusks//FragrancesFragrances
PolyPoly-- and and PerfluorinatedPerfluorinatedcompoundscompounds ((PFCsPFCs))
WP3 PartnersWP3 WP3 PartnersPartners
• University of Insubria (Italy)- P. Gramatica• Linneus University (Sweden)- T. Oberg• IVL Swedish Environ. Res. Inst. (Sweden)- M. Rahmberg• IDEAConsult Ltd. (Bulgaria) - N. Jeliazkova• Helmholtz Zentrum München (Germany) - I.Tetko
WP3 TasksWP3 WP3 TasksTasks
•Task 3.1 - Preparation of the chemical structures and molecular descriptors
database for the chemicals of the four selected classes.
•Task 3.2 - Evaluation of existing QSARs.
•Task 3.3 - Gap analysis.
•Task 3.4 - Prioritization for experimental tests.
•Task 3.5 - Development of new QSARs.
•Task 3.6 - Development of multi-model approaches (consensus modeling).
•Task 3.7 - External Validation of QSAR models with experimental data from
the Project.
CADASTER CADASTER PartnersPartners at 21at 21°° SETAC meeting SETAC meeting May 2011, Milan (Italy)May 2011, Milan (Italy)
in Environmental Chemistryin Environmental Chemistry
and and EcotoxicologyEcotoxicology
Staff Staff
Prof.Prof. Paola GramaticaPaola Gramatica
Dr. Ester Papa, Dr. Ester Papa, Ph.DPh.D
Dr. Simona KovarichDr. Simona Kovarich
Dr. Mara Dr. Mara LuiniLuini
Dr. Dr. BarunBarun BhhataraiBhhatarai, , Ph.DPh.D
Dr. Dr. JiazhongJiazhong Li, Li, Ph.DPh.D
http://http://www.qsar.itwww.qsar.it
DBSF DBSF -- University of InsubriaUniversity of Insubria
Varese Varese -- ItalyItaly
Task 3.1 Task 3.1 -- Preparation of the chemical structures and molecular Preparation of the chemical structures and molecular descriptors database for the chemicals of the 4 selected classesdescriptors database for the chemicals of the 4 selected classes..
a) Minimization of Chemical Structures by Hyperchem
b) Calculation of DRAGON molecular descriptors
Database
c) Calculation of CADASTER online molecular descriptors
To facilitate the consideration of a QSAR model for regulatory pTo facilitate the consideration of a QSAR model for regulatory purposes, it should urposes, it should be associated with the following informationbe associated with the following information::
�� a defined endpoint a defined endpoint
�� an unambiguous algorithman unambiguous algorithm
�� a defined domain of applicabilitya defined domain of applicability
�� appropriate measures of goodness of fit, robustness and appropriate measures of goodness of fit, robustness and predictivitypredictivity
�� a mechanistic interpretation, if possiblea mechanistic interpretation, if possible
OECD Principles for QSAR models in REACHOECD Principles for QSAR models in REACH
Task 3.2 Task 3.2 -- Evaluation of existing Evaluation of existing QSARsQSARs Task 3.3 Task 3.3 -- Gap analysisGap analysis
--
• In literature, very few local models for the specific CADASTER classes, mainly
not validated and without information on their Applicability Domain (AD)
• Only general models (as EPISUITE), mainly without AD
• Very few experimental data for SIDS end-points, useful for Risk Assessment
• Need for specific local models for CADASTER chemicals
verifying external validation and AD
PrioritizationPrioritization through through structuralstructural similaritysimilarity analysisanalysis and and ranking ranking methodsmethods, , basedbased on on availableavailable toxicitytoxicity datadata
Task 3.4 Task 3.4 -- Prioritization for experimental testsPrioritization for experimental tests..
BFRsprioritization based on toxicityprofile (exp+pred ED potency) (UI), structural diversity (LNU/UI)
BFRsBFRsprioritizationprioritization basedbased on on toxicitytoxicityprofileprofile ((exp+predexp+pred ED ED potencypotency) (UI), ) (UI), structuralstructural diversitydiversity (LNU/UI) (LNU/UI)
PFCsprioritization based on toxicityprofile (exp+pred mammaliantoxicity) (UI), structuralrepresentativity (LNU/UI)
PFCsPFCsprioritizationprioritization basedbased on on toxicitytoxicityprofileprofile ((exp+predexp+pred mammalianmammaliantoxicitytoxicity) (UI), ) (UI), structuralstructuralrepresentativityrepresentativity (LNU/UI(LNU/UI))
FRAGRANCESprioritization based on toxicityprofile (exp+pred mammalian and cyto-toxicity), structural representativity (UI).
FRAGRANCESFRAGRANCESprioritizationprioritization basedbased on on toxicitytoxicityprofileprofile ((exp+predexp+pred mammalianmammalian and and cytocyto--toxicitytoxicity), ), structural structural representativityrepresentativity (UI). (UI).
(B)TAZsprioritization based on toxicityprofile (exp aquatic toxicity) (UI) and structural diversity (UI-LNU).
(B)(B)TAZsTAZsprioritizationprioritization basedbased on on toxicitytoxicityprofileprofile ((expexp aquaticaquatic toxicitytoxicity) (UI) ) (UI) and and structural diversity structural diversity (UI(UI--LNU).LNU).
and and compoundcompound’’s commercial s commercial availabilityavailability..
Task 3.4 : Task 3.4 : Prioritization for experimental tests in the ProjectPrioritization for experimental tests in the Project
PrioritizationPrioritization basedbased on on toxicitytoxicity profileprofile((exp+predexp+pred mammalianmammalian toxicitytoxicity: : ratrat and mouse and mouse oraloral and and InhalationInhalation toxicitytoxicity) )
(UI)(UI)
PFOA and PFOSHPFOA and PFOSHamongamong the more the more toxictoxic
QSAR prediction of QSAR prediction of physicophysico--chemical properties and chemical properties and
biological activitiesbiological activities
Task 3.5 Task 3.5 -- Development of new Development of new QSARsQSARs..
��DevelopmentDevelopment of QSAR of QSAR modelsmodels forfor availableavailable endend--pointspoints ((sufficientsufficient
experimentalexperimental data data forfor modelingmodeling), ), payingpaying attentionattention toto externalexternal validationvalidation
and and structuralstructural applicabilityapplicability domaindomain analysisanalysis..
��IdentificationIdentification of more of more toxictoxic and and environmentalenvironmental hazardoushazardous chemicalschemicals
basedbased on the on the studiedstudied endend--pointspoints
SummarySummary on on BFRsBFRs
XX
LNULNU
ccDragon 1.11Dragon 1.11PLSRPLSRBCFBCF
PublishedPublished + +
CAD. CAD. Dat.Dat.
CADASTER CADASTER
DatabaseDatabase
PublishedPublished
++
CADASTER CADASTER
webweb
++
CADASTER CADASTER
DatabaseDatabase
Data setData set
� �� �
PublishedPublished
++
uploadeduploaded
in in CAD.CAD.
DatabaseDatabase
(243 (243 BFRsBFRs))
PredPred & AD& AD
XX
XX
FateFate
XX
XX
XX
XX
XX
PhysPhys--chemchem
SIDSSIDS
dd--eeDragon 5.5Dragon 5.5MLR/KMLR/K--NN NN
classclassED ED ActivityActivity
((~~20)20)
� �� �MLRMLRHLpHLp (15)(15)
� �� �MLRMLRKpKp (15)(15)
aa--bb� �� �MLR MLR **logKlogKOWOW**(20)(20)
aa--bb� �� �MLR MLR **logKlogKOAOA**(30)(30)
aa� �� �MLR MLR WS WS * * (12)(12)
aa� �� �MLR MLR **VP VP ** (34)(34)
aa--bb� �� �MLR MLR **MP MP ** (25)(25)
aaDragon 5.5Dragon 5.5MLR MLR **H H ** (7)(7)
((**onon--line) UIline) UI((NNobjobj))
RefRefDescDescModel Model D 3.5D 3.5EndpointEndpoint
aa Papa, E.; Papa, E.; KovarichKovarich, S.; , S.; GramaticaGramatica P., P., QSAR Comb. Sci.QSAR Comb. Sci. 2009, 28, 7902009, 28, 790--796.796.bb Papa, E.; Papa, E.; KovarichKovarich, S.; , S.; GramaticaGramatica P., P., Molecular Informatics Molecular Informatics 2011 , 30, 2322011 , 30, 232--240 (Proceedings issue of Euro240 (Proceedings issue of Euro--QSAR2010).QSAR2010).cc ObergOberg T., T., EnvironEnviron Sci & Sci & PollutPollut ResRes 2002, 9, 4052002, 9, 405--411.411.dd Papa, E.; Papa, E.; KovarichKovarich, S.; , S.; GramaticaGramatica, P., P., Chem. Res. , Chem. Res. ToxicolToxicol. . 2010, 23, 9462010, 23, 946--954.954.ee KovarichKovarich, S.; Papa, E.; , S.; Papa, E.; GramaticaGramatica, P., , P., J.J. Hazardous Mat.Hazardous Mat. 2011,190, 1062011,190, 106--112.112.
PredictionPrediction of SIDS of SIDS endpointsendpointsand and characterizationcharacterization of of environmentalenvironmental behaviourbehaviour
* * PredictionsPredictions comparedcompared withwith EPI Suite softwareEPI Suite software
ExpExp and and PredPred data data usedused forforPrioritizationPrioritization (Del 3.4)(Del 3.4)�� experimentalexperimental testingtesting
SummarySummary on on FragrancesFragrances
Data set Data set
publishedpublished + +
CADASTER CADASTER
web / web /
DatabaseDatabase
ExpExp data data
uploadeduploaded in in
CADASTER CADASTER
web / web /
DatabaseDatabase
Data setData set
PublishedPublished
(79 (79 fragfrag))
PredPred & AD& AD
� �� �
� �� �
� �� �
� �� �
Dragon 5.5Dragon 5.5
DescriptorsDescriptors
XX
ToxTox
XX
XX
XX
PhysPhys--chemchem
SIDSSIDS
� �� �CytotoxicityCytotoxicity
((~~20)20)
bb
� �� �
RodentRodent
acute acute
toxicitytoxicity (23)(23)
� �� �logKowlogKow** (52)(52)
� �� �WSWS** (37)(37) aa
MLRMLRVPVP** (37)(37)
((* * onon--line) UIline) UI((NNobjobj))
RefRefModel Model Del 3.5Del 3.5EndpointEndpoint
aa Papa, E.; Papa, E.; LuiniLuini, M.; , M.; GramaticaGramatica, POSTER presented at SETAC, POSTER presented at SETAC--Europe 2009, Europe 2009, GGööteborgteborg, Sweden, 31 May , Sweden, 31 May –– 4 June 2009.4 June 2009.
bb Papa, E.; Papa, E.; LuiniLuini, M.; , M.; GramaticaGramatica, P. , P. SAR QSAR Environ. ResSAR QSAR Environ. Res., 2009, 20, 767., 2009, 20, 767––779.779.
PredictionPrediction of SIDS of SIDS endpointsendpoints and and
characterizationcharacterization of of environmentalenvironmental behaviourbehaviour
* * PredictionsPredictions comparedcompared withwith EPI Suite softwareEPI Suite softwareExpExp and and PredPred data data usedused forforPrioritizationPrioritization (Del 3.4)(Del 3.4)�� experimentalexperimental testingtesting
SummarySummary on on PFCsPFCs
ff
dd--ee
bb
aa
RefRef
PublishedPublished
++
CAD. webCAD. web
++
CAD. CAD.
DatabaseDatabase
Data setData set
PublishedPublished in in
respectiverespective
paperspapers
(221 (221 PFCsPFCs))
PredPred & AD& AD
LNU (PLS) LNU (PLS) refref cc
LNU, IDEA, LNU, IDEA,
HMGUHMGU
ModelsModels byby
WP3 WP3 PartnersPartners
MLR/KMLR/K--NNNN
Drag5.5Drag5.5
MLRMLR--Drag5.5Drag5.5
MLRMLR--Drag5.5Drag5.5
MLRMLR**--Drag5.5Drag5.5
MLRMLR--Drag5.5Drag5.5
MLRMLR**--Drag5.5Drag5.5
MLRMLR--Drag5.5Drag5.5
((* * onon--line) UIline) UI
Model & Model & DescDesc
XXRodentRodent acute acute
toxicitytoxicity ((~~55)55)
CMC (10)CMC (10)
ED ED activityactivity
(24)(24)
XXWSWS** (20)(20)
XXVPVP** (35)(35)
XXXXBPBP** (93)(93)
XXXXMPMP** (94)(94)
ConsCons
ensusensusToxToxPhPh--chch
SIDSSIDS
((NNobjobj))
EndpointEndpoint
aa BhhataraiBhhatarai, B. et al. (, B. et al. (WP3 partnersWP3 partners), ), Molecular InformaticsMolecular Informatics , 2011 30, 189, 2011 30, 189--204 (Proceedings issue of Euro204 (Proceedings issue of Euro--QSAR2010).QSAR2010).
bb BhhataraiBhhatarai, B.; , B.; GramaticaGramatica, P., , P., Environ. Sci. Technol.,Environ. Sci. Technol., 2010, Online first (2011 45: Special issue on 2010, Online first (2011 45: Special issue on PFCsPFCs), ), OralOral at CMTPI11at CMTPI11
CC Oberg T.; Liu T., Oberg T.; Liu T., Chemo Lab, Chemo Lab, 2011, 107, 592011, 107, 59--64.64.
dd BhhataraiBhhatarai, B.; , B.; GramaticaGramatica, P., P. ,, Molecular DiversityMolecular Diversity,, 2010, 2010, 15, 46715, 467--476. 476.
ee BhhataraiBhhatarai, B.; , B.; GramaticaGramatica P., P., Chem. Res. Chem. Res. ToxicolToxicol.,., 2010, 23, 5282010, 23, 528--539.539.
ff KovarichKovarich, S.; Papa, E., , S.; Papa, E., GramaticaGramatica, P , P SubmittedSubmitted toto SAR & QSAR SAR & QSAR EnvironEnviron. Res. Res., 2011., 2011..
PredictionPrediction of SIDS of SIDS endpointsendpointsand and characterizationcharacterization of of environmentalenvironmental behaviourbehaviour
ExpExp and and PredPred data data usedused forforPrioritizationPrioritization (Del 3.4)(Del 3.4)�� experimentalexperimental testingtesting
* * PredictionsPredictions comparedcompared withwith EPI Suite softwareEPI Suite software
SummarySummary on (B)on (B)TAZsTAZs
bb
aa
UIUI
RefRef
in in
progressprogress
WP3WP3
PartnPartn
ersers
PublishedPublished
(351 TAZ)(351 TAZ)
Pred&ADPred&AD
in progressin progress
(MLR)(MLR)
XXLCLC5050 EarthwormEarthworm
BCFBCF
XXLCLC5050 FishFish
in progressin progress
(MLR)(MLR)
XXECEC5050 DaphniaDaphnia
XXLDLD5050 HoneybeesHoneybees
XXLDLD5050 BirdBird
ExpExp data data
uploadeduploaded
in CAD. web in CAD. web
/ Database/ Database
Drag 5.5 Drag 5.5
"pruned"pruned““
++
onon--line line descdesc..
XXECEC5050 AlgaeAlgae
� �� �MLRMLRXXMPMP** (56)(56)
� �� �MLRMLRXXVPVP** (33)(33)
� �� �MLRMLRXXLogKowLogKow** (64)(64)PublishedPublished
++
CAD. web / CAD. web /
DatabaseDatabase
Drag 5.5Drag 5.5MLR MLR **XXWSWS** (49)(49)
DescDesc Data setData set
EcotoxEcotoxPhPh--chch
SIDSSIDS
((**onon--line) UIline) UI((NNobjobj))
Model Model D3.5D3.5EndpointEndpoint
aa BhhataraiBhhatarai, B.; , B.; GramaticaGramatica, P.,, P., Water ResearchWater Research,, 2011, 45 (3) 14632011, 45 (3) 1463--1471. 1471. OralOral at CMTPI11at CMTPI11bb KovarichKovarich S.etS.et UI UI groupgroup, , OralOral at CMTPI1at CMTPI1
PredictionPrediction of SIDS of SIDS endpointsendpointsand and characterizationcharacterization of of environmentalenvironmental behaviourbehaviour
ExperimentalExperimental data data usedused forforPrioritizationPrioritization (Del 3.4)(Del 3.4)�� experimentalexperimental testingtesting
* * PredictionsPredictions comparedcompared withwith EPI Suite softwareEPI Suite software
A A limitedlimited amountamount of of developeddeveloped and and publishedpublished locallocal modelsmodels forfor
CADASTER CADASTER chemicalschemicals are are nownow uploadeduploaded in the CADASTER in the CADASTER
databasedatabase::
BetterBetter resultsresults thanthan EPI Suite EPI Suite
and and checkcheck of of structuralstructural AD: AD:
definitiondefinition of of interpolationinterpolation or or
extrapolationextrapolation
WP3 in WP5 : WP3 in WP5 : ModelsModels in CADASTER webin CADASTER web
�� BFRsBFRs:: VP, MP, VP, MP, logKowlogKow, , logKoalogKoa
�� PFCsPFCs:: BP, WS, VPBP, WS, VP
�� (B)(B)TAZsTAZs: WS, : WS, logKowlogKow
OngoingOngoing modellingmodelling forfor (B)TAZ are (B)TAZ are basedbased alsoalso on on
descriptorsdescriptors calculatedcalculated in the CADASTER database in the CADASTER database ��������
uploadableuploadable modelsmodels
ExampleExample: : LocalLocal Model Model forfor Log Log KoaKoa of PBDEof PBDE
LogKoa=LogKoa= 6.654 +06.654 +0.222.222 T(O.T(O..Br.Br))
Experimental range of LogKoa: 7.34 (mono-BDE) – 11.96 (hepta-BDE)
1
2
7 8
10
1213
15
17
21
28
30 32
35
37
47
66
69
75
77
82
8599
100
119
126
153
154
156
183
7 8 9 10 11 12 13
LogKoa Exp.
7
8
9
10
11
12
13
Lo
gK
oa
Pre
d.
Training set
Prediction set
0
0.2
0.4
1 21 41 61 81 101 121 141 161 181 201
BDE
dis
tan
ce
fro
m t
he
str
uc
tura
l
Do
ma
in (
ha
t)
nona-deca
Are the predictions in the structural domain ?
90.4 % 90.4 % intointo ADAD
n° Obj Descriptor R2% Q2boot% Q2
EXT(rand20%) %
30 T(O..Br) 97.36 96.77 99.56
Exp. vs. Pred. data
6
7
8
9
10
11
12
13
14
15
1 2 7 8 10 12 13 15 17 21 28 30 32 35 37 47 66 69 75 77 82 85 99 100 119 126 153 154 156 183
PBDE
Lo
gK
oa
Exp. LogKoa Chen (2003) Papa (2008) Xu (2007) KoaWIN
ComparisonComparison withwith some some existingexisting modelsmodels, in , in particularparticular EPISuiteEPISuite
monomonomonomono----tritritritri
tetratetratetratetra----heptaheptaheptahepta
PredictedPredicted and and ExperimentalExperimental data data forfor 30 30 PBDEsPBDEs
Author Method N° obj. N° vars R2% Q2LOO% Q2
EXT % RMSE(30 obj)
Papa et al. MLR 30 1 97.4 96.8 99.6 0.23
Xu et al. MLR 22 2 97.6 97.2 - 0.31
Chen et al. PLS 13 10 97.9 97.5 - -
KoaWIN (Episuite) KOW/KAW 0.81
0
0.5
1
1.5
2
2.5
3
3.5
monoBDE diBDE triBDE tetraBDE pentaBDE hexaBDE heptaBDE octaBDE nonaBDE decaBDE
∆∆ ∆∆ lo
g u
nit
s
average ∆ (|YPapa- YKoaWIN|) average ∆ (|YPapa-YXu|)
YYPapaPapa = = PredictionsPredictions byby ourour model (model (rangerange Log Log KoaKoa: 7.32 : 7.32 –– 15.09)15.09)YYEpisuiteEpisuite = = PredictionsPredictions byby KoaWINKoaWIN ((DmaxDmax = 3.33 log = 3.33 log unitsunits; ; rangerange Log Log KoaKoa: 6.81: 6.81--18.23)18.23)YYXuXu = = PredictionsPredictions byby XuXu etet al.al. (2007) ((2007) (DmaxDmax =1=1.06.06 log log unitsunits; ; rangerange Log Log KoaKoa: 7.4: 7.4--15.73) 15.73)
n° bromine increase = D increase
PredictionsPredictions forfor allall the 209 the 209 PBDEsPBDEs congenerscongeners
High difference with EPISUITE for highly brominated PBDEs
ComparisonComparison withwith some some existingexisting modelsmodels, in , in particularparticular EPISuiteEPISuite
Papa, E.; Papa, E.; KovarichKovarich, S.; , S.; GramaticaGramatica P., P., Molecular Informatics Molecular Informatics 2011 , 30, 2322011 , 30, 232--240 (Proceedings issue of Euro240 (Proceedings issue of Euro--QSAR2010).QSAR2010).
to develop and propose in:
• local QSAR models• specific for the 4 classes of emerging pollutants• externally validated• verified for their structural Applicability Domain
AimAim of WP3 in CADASTER Projectof WP3 in CADASTER Project
http://www.cadaster.euhttp://www.cadaster.eu
http://www.qsar.ithttp://www.qsar.it
[email protected]@uninsubria.it
Thanks to all for your attentionThanks to all for your attention !!!!
Prof. Paola Gramatica - QSAR Research Unit - DBSF - University of Insubria - Varese (Italy)