Contemporary QSAR Classifiers Compared
Craig Bruce
School of Chemistry
Craig Bruce, HPC User Meeting, 17th January 2007
Introduction
QSAR: Quantitative Structure-Activity Relationship
Similar Property Principle
Similar structure » similar properties
Methods
Support Vector Machine
Decision Tree
Random Forest
Ensemble: Bagging, Boosting
Parameter Tuning
Datasets

| Dataset | Compound type | No. Compounds | No. Descriptors (2.5D) | No. Descriptors (Fragments) |
|---|---|---|---|---|
| ACE | Angiotensin converting enzyme | 114 | 56 | 1024 |
| AchE | Acetyl-cholinesterase inhibitors | 111 | 63 | 774 |
| BZR | Benzodiazepine receptor | 163 | 75 | 832 |
| COX2 | Cyclooxygenase-2 inhibitors | 322 | 74 | 660 |
| DHFR | Dihydrofolate reductase inhibitors | 397 | 70 | 952 |
| GPB | Glycogen phosphorylase b | 66 | 70 | 692 |
| THER | Thermolysin inhibitors | 76 | 64 | 575 |
| THR | Thrombin inhibitors | 88 | 66 | 527 |

Sutherland, J. J.; O'Brien, L. A.; Weaver, D. F. J. Med. Chem. 2004, 47(22), 5541-5554.
Cross-Validation
Trained on full dataset
CV to measure classifier
[Figure: diagram labelled "Dataset"]
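The cross-validation step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the slides do not give the number of folds, so `k=10` is assumed, and the majority-class classifier and toy data are hypothetical stand-ins for the real classifiers and datasets.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cv_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in held_out)
    return correct / len(labels)

# Hypothetical stand-in classifier: always predict the majority training class.
def fit_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]

def predict_majority(model, x):
    return model

# Toy data (not from the slides): 30 actives and 10 inactives.
toy_x = [[float(i)] for i in range(40)]
toy_y = [1] * 30 + [0] * 10
accuracy = cv_accuracy(toy_x, toy_y, fit_majority, predict_majority, k=10)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Repeating the procedure with different `seed` values gives the repeated cross-validation runs; the final model is still trained on the full dataset.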
Need for HPC
8 datasets
2 descriptor sets
7 classifiers
10 repeats of CV
1120 models to generate
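The model count is the product 8 × 2 × 7 × 10. A small sketch that enumerates the jobs makes the arithmetic explicit; the dataset and classifier names are taken from the tables in this deck, while the node count `n_nodes = 16` is a hypothetical value chosen only to show how the jobs could be dealt out.

```python
from itertools import product

datasets = ["ACE", "AchE", "BZR", "COX2", "DHFR", "GPB", "THER", "THR"]
descriptor_sets = ["2.5D", "Fragments"]
classifiers = ["Tree", "Bagged Tree", "Boosted Tree", "Random Forest",
               "SVM", "Tuned Forest", "Tuned SVM"]
cv_repeats = range(10)

# One job per (dataset, descriptor set, classifier, CV repeat) combination.
jobs = list(product(datasets, descriptor_sets, classifiers, cv_repeats))
print(len(jobs))  # 8 * 2 * 7 * 10 = 1120

# Hypothetical: deal the jobs round-robin over 16 nodes.
n_nodes = 16
per_node = [jobs[i::n_nodes] for i in range(n_nodes)]
print(len(per_node[0]))  # 1120 / 16 = 70 jobs per node
```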
Results - 2.5D

| Dataset | Tree | Bagged Tree | Boosted Tree | Random Forest | SVM | Tuned Forest (a) | Tuned SVM (b) |
|---|---|---|---|---|---|---|---|
| ACE | 86.9 | 86.5 | 86.6 | 85.4 | 90.3 | 89.3 | 89.9 |
| AchE | 70.6 | 71.6 | 72.7 | 72.6 | 72.0 | 79.5 | 74.3 |
| BZR | 71.7 | 75.5 | 75.4 | 74.0 | 77.4 | 79.5 | 81.6 |
| COX2 | 75.6 | 75.7 | 76.1 | 73.4 | 75.4 | 75.7 | 75.2 |
| DHFR | 78.8 | 83.2 | 83.4 | 83.1 | 79.6 | 84.9 | 82.2 |
| GPB | 70.6 | 74.5 | 76.2 | 74.1 | 73.9 | 76.7 | 75.3 |
| THER | 67.2 | 69.2 | 67.8 | 69.7 | 69.5 | 74.6 | 74.6 |
| THR | 66.5 | 69.1 | 68.0 | 69.1 | 67.2 | 72.5 | 69.0 |

(a) 100 trees
(b) Polynomial kernel; exponent = 2; complexity constant = 0.05
Results - Fragments

| Dataset | Tree | Bagged Tree | Boosted Tree | Random Forest | SVM | Tuned Forest (a) | Tuned SVM (b) |
|---|---|---|---|---|---|---|---|
| ACE | 80.4 | 82.0 | 81.0 | 80.5 | 78.9 | 80.0 | 82.2 |
| AchE | 64.1 | 68.0 | 68.8 | 70.5 | 69.4 | 70.5 | 77.1 |
| BZR | 74.0 | 75.0 | 69.8 | 67.3 | 74.0 | 68.7 | 75.8 |
| COX2 | 71.1 | 71.5 | 71.0 | 68.1 | 72.6 | 68.7 | 71.1 |
| DHFR | 84.4 | 85.4 | 83.1 | 84.9 | 83.5 | 85.5 | 86.5 |
| GPB | 73.8 | 75.6 | 76.2 | 74.5 | 77.4 | 75.2 | 76.7 |
| THER | 72.2 | 75.8 | 75.5 | 75.4 | 75.3 | 76.7 | 73.4 |
| THR | 71.5 | 69.2 | 68.8 | 66.7 | 71.1 | 68.4 | 69.8 |

(a) 100 trees
(b) RBF kernel; width = 0.1; complexity constant = 1
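The kernel settings named in the results footnotes (polynomial with exponent 2 for the 2.5D descriptors, RBF with width 0.1 for the fragments) can be written out as plain functions. This is a minimal sketch: the slides do not say which SVM implementation was used, so treating the polynomial kernel as a bare dot product raised to a power, and reading "width" as the multiplier `gamma` in exp(-gamma * ||x - y||^2), are both assumptions. The complexity constant is the soft-margin penalty used during training and does not appear in the kernel itself.

```python
import math

def polynomial_kernel(x, y, exponent=2):
    """Dot product raised to a power (assumed form, no lower-order terms)."""
    return sum(a * b for a, b in zip(x, y)) ** exponent

def rbf_kernel(x, y, gamma=0.1):
    """exp(-gamma * squared Euclidean distance); gamma is the assumed 'width'."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Toy descriptor vectors, invented for illustration.
u = [1.0, 2.0]
v = [3.0, 4.0]
print(polynomial_kernel(u, v))  # (1*3 + 2*4) ** 2
print(rbf_kernel(u, v))         # exp(-0.1 * 8)
```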
Statistics
Paired t-test
Multiple Comparison Tests
Nonparametric Friedman test (Iman & Davenport correction)
Post-hoc Nemenyi test
Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1-30.
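As a rough illustration of the Friedman test with the Iman & Davenport correction, the sketch below ranks the classifiers within each dataset (rank 1 = best, ties share the mean rank) and computes both statistics from the average ranks. The input is the 2.5D results table from this deck; whether the original analysis used exactly these numbers, or pooled both descriptor sets, is not stated, so treat the setup as an assumption.

```python
def rank_row(scores):
    """Rank one dataset's scores: 1 = highest; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in order[pos:end + 1]:
            ranks[j] = mean_rank
        pos = end + 1
    return ranks

def friedman_iman_davenport(table):
    """Return (average ranks, Friedman chi-square, Iman-Davenport F statistic)."""
    n, k = len(table), len(table[0])
    rank_rows = [rank_row(row) for row in table]
    avg_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    f_stat = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return avg_ranks, chi2, f_stat

# Columns: Tree, Bagged Tree, Boosted Tree, Random Forest, SVM, Tuned Forest, Tuned SVM
results_25d = [
    [86.9, 86.5, 86.6, 85.4, 90.3, 89.3, 89.9],  # ACE
    [70.6, 71.6, 72.7, 72.6, 72.0, 79.5, 74.3],  # AchE
    [71.7, 75.5, 75.4, 74.0, 77.4, 79.5, 81.6],  # BZR
    [75.6, 75.7, 76.1, 73.4, 75.4, 75.7, 75.2],  # COX2
    [78.8, 83.2, 83.4, 83.1, 79.6, 84.9, 82.2],  # DHFR
    [70.6, 74.5, 76.2, 74.1, 73.9, 76.7, 75.3],  # GPB
    [67.2, 69.2, 67.8, 69.7, 69.5, 74.6, 74.6],  # THER
    [66.5, 69.1, 68.0, 69.1, 67.2, 72.5, 69.0],  # THR
]
avg_ranks, chi2, f_stat = friedman_iman_davenport(results_25d)
print("average ranks:", [round(r, 2) for r in avg_ranks])
print("Friedman chi-square:", round(chi2, 3))
print("Iman-Davenport F:", round(f_stat, 3))
```

The F statistic is compared against the F distribution with k - 1 and (k - 1)(N - 1) degrees of freedom, here 6 and 42 for 7 classifiers and 8 datasets.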
Statistical Results
10 vs 100 trees in random forest tuning in 2.5D
Across classifiers, statistical difference detected
Tuned SVM & RF better than decision tree
Other differences not significant
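For the post-hoc Nemenyi test, two classifiers differ significantly when their average ranks differ by at least the critical difference CD = q_alpha * sqrt(k(k+1) / (6N)). The sketch below assumes k = 7 classifiers compared over N = 8 datasets at alpha = 0.05; the critical value 2.949 is the k = 7 entry from the table in Demšar (2006) and should be checked against that paper. The rank values in the usage lines are hypothetical.

```python
import math

# q_alpha for the two-tailed Nemenyi test at alpha = 0.05 with k = 7 classifiers,
# taken from Demšar (2006); verify against the cited table before relying on it.
Q_ALPHA_005_K7 = 2.949

def nemenyi_critical_difference(k, n_datasets, q_alpha):
    """Smallest average-rank gap that counts as a significant difference."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

def significantly_different(rank_a, rank_b, critical_difference):
    return abs(rank_a - rank_b) >= critical_difference

cd = nemenyi_critical_difference(k=7, n_datasets=8, q_alpha=Q_ALPHA_005_K7)
print(f"critical difference: {cd:.3f}")

# Hypothetical average ranks, for illustration only.
print(significantly_different(1.5, 6.0, cd))
print(significantly_different(3.0, 4.0, cd))
```

With only 8 datasets the critical difference is large relative to the rank range of 1 to 7, which is consistent with only the most extreme pairs being separated.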
Problems
Datasets are large
2GB RAM quickly used (unfairly)
Although larger amounts of RAM can be supported, it is very expensive
Problem for larger datasets and running ensemble classifiers
HPC solutions
Split task over many nodes
Parallel
Random Forest
Bagging
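Bagging and random forests parallelise naturally because each tree is trained on its own bootstrap sample, independently of the others, and only the final vote needs all of them. The sketch below shows that structure with one-split trees (stumps) on invented toy data. It is not the implementation used in this work: a thread pool stands in for separate cluster nodes, and all function names are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def fit_stump(rows, labels):
    """Fit a one-split tree: the (feature, threshold) with fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def train_on_bootstrap(task):
    """Independent unit of work: draw a bootstrap sample and fit one tree to it."""
    rows, labels, seed = task
    rng = random.Random(seed)
    picks = [rng.randrange(len(rows)) for _ in rows]
    return fit_stump([rows[i] for i in picks], [labels[i] for i in picks])

def bagged_predict(stumps, row):
    """Final classification: majority vote over all trees."""
    return Counter(predict_stump(s, row) for s in stumps).most_common(1)[0][0]

# Toy data (invented): one descriptor, class 1 when the value is 1.0 or more.
rows = [[x / 10.0] for x in range(20)]
labels = [0] * 10 + [1] * 10
tasks = [(rows, labels, seed) for seed in range(25)]

# Threads stand in for cluster nodes; each task could run on a different node.
with ThreadPoolExecutor(max_workers=4) as pool:
    stumps = list(pool.map(train_on_bootstrap, tasks))

print(bagged_predict(stumps, [0.2]), bagged_predict(stumps, [1.7]))
```

Because each task carries only its data and a seed, the memory needed per node is that of a single tree's training run, not the whole ensemble's.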
Tree computation
[Figure: diagram labelled "Final Classification"]
Interpretation
QSAR needs good accuracy and interpretability
SVMs transform the data
Decision trees produce instant classification rules
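The interpretability point can be made concrete: every root-to-leaf path in a decision tree reads directly as an IF ... THEN rule. The sketch below walks a small tree stored as nested dictionaries. The tree, its descriptor names (`logP`, `n_rings`) and its thresholds are hypothetical and chosen only for illustration.

```python
def extract_rules(node, conditions=()):
    """Walk the tree and return one IF ... THEN rule per leaf."""
    if "label" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    test = f"{node['feature']} <= {node['threshold']}"
    negated = f"{node['feature']} > {node['threshold']}"
    return (extract_rules(node["left"], conditions + (test,))
            + extract_rules(node["right"], conditions + (negated,)))

# Hypothetical tree; descriptors and thresholds are invented.
toy_tree = {
    "feature": "logP", "threshold": 3.5,
    "left": {"label": "active"},
    "right": {
        "feature": "n_rings", "threshold": 2,
        "left": {"label": "inactive"},
        "right": {"label": "active"},
    },
}

for rule in extract_rules(toy_tree):
    print(rule)
```

An SVM with a nonlinear kernel offers no comparable readout, since its decision function lives in the transformed feature space.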
Trees
Conclusions
SVM excellent classifier
Ensemble of trees very competitive
Universal parameters for random forest; SVM more dataset specific
Trees have interpretability advantage
Future work
Extraction of information from ensembles
Bruce, C. L.; Melville, J. L.; Pickett, S. D.; Hirst, J. D. Contemporary QSAR Classifiers Compared. J. Chem. Inf. Model. 2007, 47, 219-227.
Acknowledgements
Jonathan Hirst
James Melville
Stephen Pickett
Chris Luscombe
Gavin Harper
Any Questions?