A Novel SAR-Driven Approach for Identifying True High-Throughput Screening Hits
S. Frank Yan, Hayk Asatryan, Jing Li, Kaisheng Chen, and Yingyao Zhou
Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, CA 92121, USA
ChemAxon User Group Meeting, June 2006
Modern drug discovery relies heavily on large-scale high-throughput screening (HTS) to identify potential starting points for medicinal chemistry optimization. The typical “top X” activity cutoff method used to generate hits from large amounts of raw HTS data is intrinsically error-prone because single-dose HTS is noisy, and it often yields a large number of false positives. Here we propose a novel knowledge-based, SAR-driven statistical approach for primary HTS hit generation that uses ChemAxon technology for clustering and chemical fingerprints. The method is also implemented in SciTegic Pipeline Pilot. In a proof-of-concept study of an in-house HTS campaign, the new approach proved more effective at identifying confirmed active compounds across diverse chemical scaffolds containing valuable SAR information, as shown by a significantly higher confirmation rate than the traditional “top X” cutoff method.
A Proof-of-Concept Study
•HTS data from an internal project were used, with results from secondary experiments serving as the benchmark. The 50,000 most active compounds were selected for analysis (HTS activity < ~0.76).
•Compound clusters and fingerprints were generated using ChemAxon software.
Scaffold-based Probability Score Alone Is Sufficient to Prioritize Hits
[Figure: confirmation rate of the selected compounds, OPI approach vs. Top X method.]
Significant Structural Diversity in the Selected Hits
Some Scaffolds Picked by OPI
SIDXXXX645
SIDXXX4148 compounds selected, 5/6 confirmed active; mean = 0.05, stdev. = 0.46
SIDXXX5988 compounds selected, 7/7 confirmed active; mean = 0.05, stdev. = 0.18
28 compounds selected, 12/28 confirmed active; mean = 0.11, stdev. = 0.30
57 compounds selected, 31/36 confirmed active; mean = 0.31, stdev. = 0.09
SIDXXXX000
Great Improvement over the Traditional “Top X” Method
Advantages of OPI Hit-picking
•An individualized activity threshold for every cluster/scaffold instead of a one-size-fits-all cutoff
•Effective in eliminating experimental artifacts (particularly those in the high-activity region)
•Improved hit confirmation rate (85% vs. 55% for the “Top X” method)
•Hits are inherently analyzed on a cluster/scaffold basis, so SAR information can be readily extracted, facilitating the hit-to-lead process
•Limitation: some level of library redundancy is required, because a scaffold needs several tested analogs to reach a meaningful P-value
Ontology-Based Pattern Identification* in Hit Selection
*S. Yan et al. (2005) Novel statistical approach for primary high-throughput screening hit selection. J. Chem. Inf. Model., 45(6), 1784-1790
Y. Zhou et al. (2005) In silico gene function prediction using ontology-based pattern identification. Bioinformatics, 21(7), 1237-1245
Guilt by association ↔ Structure–activity relationship
Goal: automatically determine, for each cluster/scaffold, a subset of compounds that share not only similar structure but also similarly high HTS activity
1. Cluster all tested, quality-controlled (QC-ed) compounds (>1,000,000) from an HTS campaign and rank them by activity
2. For a given cluster, select more and more compounds by lowering the activity cutoff, and compute the corresponding hypergeometric P-value at each cutoff
3. Set this cluster's cutoff where the P-value reaches its minimum, P0; member compounds more active than the cutoff are selected as potential hits and assigned the score P0
4. Repeat steps 2 and 3 for all clusters
5. Rank/select hits based on score P0 and HTS activity
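The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' ChemAxon/Pipeline Pilot implementation: it assumes the compounds are already ranked most active first (whichever way the assay readout is oriented), that clusters are given, and that P is the upper tail of the hypergeometric distribution. The function names `hypergeom_tail`, `opi_cluster_cutoff` and `opi_select` are hypothetical.

```python
from math import comb


def hypergeom_tail(N: int, n: int, m: int, k: int) -> float:
    """Assumed form of P: P(X >= k) for X ~ Hypergeometric, with N compounds
    in total, n of them in the cluster, and m compounds above the cutoff."""
    numerator = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, min(n, m) + 1))
    return numerator / comb(N, m)


def opi_cluster_cutoff(ranked_ids: list[str], members: set[str]) -> tuple[float, list[str]]:
    """Steps 2-3 for one cluster. ranked_ids lists all N compounds, most active
    first. Returns (P0, cluster members above the P-minimizing cutoff)."""
    N, n = len(ranked_ids), len(members)
    best_p, best_m = 1.0, 0
    k = 0  # m': cluster members among the top m compounds
    for m, cid in enumerate(ranked_ids, start=1):
        if cid not in members:
            # With m' fixed, adding non-members only raises the tail P-value,
            # so the minimum can only occur right after a member is added.
            continue
        k += 1
        p = hypergeom_tail(N, n, m, k)
        if p < best_p:
            best_p, best_m = p, m
    hits = [cid for cid in ranked_ids[:best_m] if cid in members]
    return best_p, hits


def opi_select(ranked_ids: list[str], clusters: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Steps 4-5: score every cluster, then rank hits by P0 and, for equal P0,
    by activity rank. Returns (compound_id, P0) pairs, best first."""
    rank = {cid: i for i, cid in enumerate(ranked_ids)}
    scored = []
    for members in clusters.values():
        p0, hits = opi_cluster_cutoff(ranked_ids, set(members))
        scored.extend((p0, rank[cid], cid) for cid in hits)
    scored.sort()
    return [(cid, p0) for p0, _, cid in scored]


ranked = list("abcdefghij")  # toy screen of 10 compounds, most active first
clusters = {"A": ["a", "b", "d"], "B": ["c", "j"]}
print(opi_select(ranked, clusters))
```

In this toy run, all three members of cluster A clear its cutoff with P0 = 7/210 ≈ 0.033, while cluster B keeps only "c" (P0 = 64/120 ≈ 0.53) because its second member sits at the very bottom of the ranking. Exact big-integer arithmetic with `math.comb` keeps the sketch dependency-free but is slow at HTS scale (N > 1,000,000); a log-space computation, or `scipy.stats.hypergeom.sf(k - 1, N, n, m)`, would be the practical choice there.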
[Figure: OPI schematic. N compounds from HTS are ranked by activity, and a cluster contains n compounds. Lowering the activity cutoff (lower activity, more compounds) selects increasingly many compounds, m, of which m' belong to the cluster. The m' compounds at the cutoff where P = P0 are selected as potential hits for this compound cluster/scaffold.]
Cluster probability score: $P_0 = \min_{m} P(N, n, m, m')$
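For reference, P(N, n, m, m') can be written as the one-sided hypergeometric tail: the chance that at least m' of the m compounds above the cutoff fall, by random selection, into a cluster of n out of N compounds. This standard form is our reading of the method, not a quotation; the cited Yan et al. (2005) paper gives the authors' exact definition. The cluster score P0 is its minimum over all cutoffs.

```latex
P(N, n, m, m') = \sum_{i=m'}^{\min(n,\,m)}
  \frac{\binom{n}{i}\binom{N-n}{m-i}}{\binom{N}{m}},
\qquad
P_0 = \min_{m} P(N, n, m, m')
```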
Implementation Using Pipeline Pilot
The Hit-to-Lead Paradigm
Two important milestones with fundamental, far-reaching effects
Bleicher et al. (2003) Nat. Rev. Drug Discov., 2, 369
“Cherry-Pick” the HTS Hits
A new approach to more effectively select primary hits is urgently needed!
[Figure: number of compounds versus HTS activity (low to high), with an arbitrary activity cutoff; annotation: “~100 to ~5000”.]
In many real cases, the confirmation rate is low.
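For contrast, the traditional “Top X” cherry-pick and the confirmation rate used to judge it can be sketched as below. The function names and the `higher_is_more_active` flag are hypothetical. The default treats smaller readouts as more active, an assumption drawn from the proof-of-concept data, where the most active compounds have HTS activity < ~0.76. The confirmation rate is taken as confirmed actives over retested compounds, the same form as “31/36 confirmed active” in the scaffold panels.

```python
def top_x_hits(activities: dict[str, float], x: int,
               higher_is_more_active: bool = False) -> list[str]:
    """Traditional cherry-pick: keep the x most active compounds under one
    global cutoff. Assumption: smaller readout = more active by default."""
    ranked = sorted(activities, key=activities.get, reverse=higher_is_more_active)
    return ranked[:x]


def confirmation_rate(selected: list[str], retest_active: dict[str, bool]) -> float:
    """Fraction of the selected compounds that were retested and confirmed active."""
    tested = [cid for cid in selected if cid in retest_active]
    if not tested:
        raise ValueError("none of the selected compounds were retested")
    return sum(retest_active[cid] for cid in tested) / len(tested)


readouts = {"c1": 0.12, "c2": 0.55, "c3": 0.05, "c4": 0.90}
picked = top_x_hits(readouts, 2)  # ['c3', 'c1'] when smaller = more active
print(picked, confirmation_rate(picked, {"c3": False, "c1": True}))
```

Every compound is judged against the same threshold regardless of its scaffold, which is exactly the one-size-fits-all behavior the OPI approach replaces with per-cluster cutoffs.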
The HTS Approach
•Initial HTS campaign: >1,000,000 compounds
•Quality control: 1,000,000
•Primary hit selection: 1,000
•Hit validation: 100
[Figure: HTS assay activity by compound group, comparing individual per-group cutoffs with a single traditional cutoff. Groups: highly active singletons; scaffolds with good activity and good SAR; scaffolds with good activity but okay SAR; scaffolds with very bad SAR; scaffolds with okay activity but good SAR. One annotation reads “Likely a false positive”.]
Valuable SAR Is Immediately Caught for This Scaffold
[Figure: imidazopyridine scaffold members with HTS activities from 0.12 to 0.67, divided into selected hits and compounds not selected.]