Luddite: An Information Theoretic Library Design Tool
Jennifer L. Miller, Erin K. Bradley, and Steven L. Teig
July 18, 2002
Outline
- Overview
- Search Strategy
- Cost Function
- Algorithms
- Algorithm Extensions
- Implementation Details
- Results
Overview
- Genomics and proteomics provide many novel targets, and drugs need to be found for those targets.
- Which compound should be screened against which target? Methods to answer this (e.g., QSAR) have been debated for many years.
- Recently, combinatorial and parallel synthesis techniques have transformed the question of which single compound to analyze into one of which collection of compounds (library) to analyze.
Overview
- Develop an algorithm for designing libraries:
  - Discrete: a collection of individual compounds
  - Combinatorial: collections of compounds synthesized in a parallel or combinatorial fashion
- Based on information-theoretic techniques
Overview
- Idea: use molecules to "interrogate" the target receptor about which chemical features are required for binding.
- Objective: compose a library that maximizes the conclusions that can be drawn from the "answers" across all possible experimental outcomes.
- Goal: design a library that allows discovery of the most information about the optimization target.
Search Strategy
- Strategies used in "20 Questions" are applicable.
- Binary search: every guess eliminates half of the search space.
- Codeword search:
  - Every outcome corresponds to a single codeword.
  - The optimal set of questions can be asked simultaneously.
  - The same set of optimal questions can be used every time.
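The codeword idea can be illustrated with a toy sketch (my own illustration, not from the presentation): to identify one of 16 outcomes, the 4 yes/no questions "is bit k of the outcome set?" can all be asked up front; the 4-bit answer vector is the outcome's codeword, so no adaptive guessing is needed.

```python
# Toy illustration (not from the presentation): decoding one of 16
# outcomes with 4 fixed yes/no questions asked simultaneously.
import math

N = 16                               # number of possible outcomes
n_bits = math.ceil(math.log2(N))     # questions needed: log2(16) = 4

def codeword(outcome):
    """Answers to the fixed questions: 'is bit k of the outcome set?'"""
    return tuple((outcome >> k) & 1 for k in range(n_bits))

# Every outcome maps to a distinct codeword, so the same 4 questions
# decode any outcome.
codebook = {codeword(o): o for o in range(N)}
assert len(codebook) == N            # all codewords are distinct

print(codebook[(1, 0, 1, 0)])        # -> 5 (binary 0101)
```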
Search Strategy
- Library design is analogous to "20 Questions."
- Instead of a number, we search for the features required for ligand binding, a desired phenotype, and/or good pharmacokinetic properties.
- Here a "feature" is a four-point pharmacophore.
Search Strategy - Example
Search Strategy - Assumptions
The "20 Questions" analogy is useful but assumes:
1. Every compound tests half of the possible features.
2. Any compound in the design space can be synthesized.
3. Every assay value is accurate.
4. The goal is a single feature.
Search Strategy - Remedies
Eliminating the assumptions:
1. A minimum of log2(F) bits is needed to decode F outcomes, giving a loose upper bound on the number of compounds.
2. The ability of a set of questions to decode a message is invariant to column reordering; therefore it is not necessary that every compound in the design space be obtainable in order to find a maximally efficient set of questions.
3. Error-correcting codes (ECC) based on Hamming distance.
4. Adjust the probability of features in an iterative process and prune unlikely features. This will likely lead to convergence, enhances efficiency, and improves the probability of success.
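Remedy 3 can be sketched in miniature (an assumed illustration, not the presentation's code): if the minimum Hamming distance between any two feature codewords is d, nearest-codeword decoding tolerates up to (d-1)//2 flipped assay bits.

```python
# Sketch (assumed, not the paper's implementation): nearest-codeword
# decoding. A code with minimum Hamming distance d corrects up to
# (d - 1) // 2 bit errors.

def hamming(a, b):
    """Number of positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))

# Two toy feature codewords at Hamming distance 4 -> 1 error correctable.
codewords = {"feature_A": (0, 0, 0, 0, 0, 0),
             "feature_B": (1, 1, 1, 1, 0, 0)}

def decode(observed):
    """Assign a noisy assay readout to the closest codeword."""
    return min(codewords, key=lambda f: hamming(codewords[f], observed))

noisy = (1, 0, 1, 1, 0, 0)           # feature_B with one flipped bit
print(decode(noisy))                 # -> feature_B
```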
Cost Function
- Given a set of features, search for a set of compounds that allows decoding of each individual feature.
- If that is not possible, seek to decode as many features as possible, with the flattest distribution across feature-class sizes.
- Feature class: a subset of features that all have the same codeword.
- Entropy is well suited to this calculation.
Cost Function - Entropy
- Entropy is a measure of uncertainty.
- All codewords the same -> no uncertainty -> minimal entropy.
- All codewords different -> maximum entropy.
- We wish to optimize the following measure (given as an equation on the slide), where:
  - M is the library measure
  - H is the entropy of the feature classes
  - C is the number of distinct classes
  - ||ci|| is the size of feature class i
  - F is the number of features
Cost Function – Entropy Example
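Using the symbols defined above, the Shannon entropy of the feature classes can be computed as H = -Σ_i (||ci||/F) log2(||ci||/F). This is the standard entropy form consistent with the definitions on the slide; the paper's full measure M may combine it with other terms, so treat this as a sketch.

```python
# Shannon entropy of the feature classes. Standard form implied by the
# slide's symbol definitions; the paper's exact measure M may differ.
import math

def class_entropy(class_sizes):
    """H = -sum_i (||ci||/F) * log2(||ci||/F) over the C feature classes."""
    F = sum(class_sizes)                 # total number of features
    return sum(-(s / F) * math.log2(s / F) for s in class_sizes)

print(class_entropy([8]))                # one class (all codewords same) -> 0.0
print(class_entropy([4, 2, 1, 1]))       # mixed class sizes -> 1.75 bits
print(class_entropy([1] * 8))            # all codewords distinct -> 3.0 (maximal)
```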
Algorithm - Overview
- Start with a list of synthesized compounds.
- Goal: select a subset that maximizes entropy.
- State: the set of compounds whose entropy can be calculated.
- Note: from the entropy calculation, the state is a function of the classes, but our moves through state space are a function of the compounds. In general the measure cannot be updated incrementally and must be completely reevaluated whenever the state changes.
- This is in stark contrast with other library design methods; despite this seeming limitation, the method is very efficient.
Algorithm - Details
- The approaches to the discrete and combinatorial designs are very similar.
- Both use a greedy build-up of the library to the desired number of compounds.
  - Greedy: a technique that makes the locally optimal choice at each step in the hope of reaching a global optimum.
- A second phase then reevaluates each of the library components, looking for a better selection.
- Repeat until no further improvement is found.
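The two phases above can be sketched as follows. This is a hedged illustration under my own assumptions (compounds represented as sets of feature indices they test; a feature's codeword is its hit/miss pattern across the library), not the presentation's C++ implementation. Note that, as the overview states, the entropy is recomputed from scratch on every candidate move.

```python
# Hedged sketch of the two-phase selection: greedy build-up, then a
# swap-refinement pass repeated until no improvement. Data layout and
# helper names are assumptions, not the paper's code.
import math
from collections import Counter

def entropy_of(library, n_features):
    """Entropy of the feature classes induced by a candidate library.
    Each compound is a frozenset of the feature indices it tests."""
    patterns = Counter(
        tuple(f in cpd for cpd in library) for f in range(n_features))
    F = n_features
    return sum(-(s / F) * math.log2(s / F) for s in patterns.values())

def design_library(pool, k, n_features):
    # Phase 1: greedily add the compound that most increases entropy.
    library = []
    for _ in range(k):
        candidates = [c for c in pool if c not in library]
        library.append(max(
            candidates, key=lambda c: entropy_of(library + [c], n_features)))
    # Phase 2: revisit each slot, swapping in a better compound when one
    # exists; repeat until a full pass makes no improvement.
    improved = True
    while improved:
        improved = False
        for i in range(k):
            rest = library[:i] + library[i + 1:]
            candidates = [c for c in pool if c not in rest]
            best = max(candidates,
                       key=lambda c: entropy_of(rest + [c], n_features))
            if (entropy_of(rest + [best], n_features)
                    > entropy_of(library, n_features) + 1e-12):
                library[i] = best
                improved = True
    return library

# Toy usage: pick 2 of 4 compounds over 4 features.
pool = [frozenset({0}), frozenset({1}), frozenset({2}), frozenset({0, 1, 2, 3})]
lib = design_library(pool, k=2, n_features=4)
print(entropy_of(lib, 4))            # two singletons beat the promiscuous compound
```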
Algorithm - Extensions
1. It is often desirable to guarantee that certain items are included in the library.
2. Subsample the source pool during the build-up and optimization phases.
   - Dramatically decreases run time
   - Only slightly impacts the quality of the designs
3. Define a minimum Tanimoto fingerprint similarity between any two compounds in the discrete library.
Extension 1 is implemented for both the discrete and combinatorial algorithms; extensions 2 and 3 are implemented only for the discrete algorithm.
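Extension 3 relies on the Tanimoto coefficient, which for bit-vector fingerprints is the size of the intersection of "on" bits over the size of their union. A minimal sketch, assuming fingerprints represented as Python sets of on-bit positions:

```python
# Tanimoto similarity between two fingerprints, represented here as
# sets of "on" bit positions (a representational assumption).
def tanimoto(fp_a, fp_b):
    """|A ∩ B| / |A ∪ B| for two bit-set fingerprints."""
    if not fp_a and not fp_b:
        return 1.0                   # convention for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)

a = {1, 4, 7, 9}
b = {1, 4, 8}
print(tanimoto(a, b))                # 2 common bits / 5 total -> 0.4
```

In a design loop, the constraint could be enforced by rejecting any candidate whose similarity to an already-selected compound falls below the chosen threshold.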
Implementation Details
- C++
- Microsoft Windows NT
- 500 MHz Intel Pentium III, 500 MB RAM
Results
- 9 different libraries were selected with the algorithm.
- 273,373-compound source pool from a three-component reaction A + B + C -> D, with monomer lists of length 33, 436, and 19.
- Four-point pharmacophore signatures were calculated for all compounds in the source pool.
- Final measures were compared to the optimal result and to a random result.
Results
Results - Entropy
- The combinatorial algorithm lags behind the discrete one in performance.
- A discrete library of 91 compounds has the same measure as the optimal combinatorial library of 250 compounds.
- Even so, it may be more cost-effective to synthesize the combinatorial library.
- General rule: roughly twice as many compounds are required in a combinatorial library to achieve the same information as in a discrete library.
- In an iterative setting: use the combinatorial algorithm early, for discovery; use the discrete algorithm later, to cherry-pick specific compounds.