Luddite: An Information Theoretic Library Design Tool
Jennifer L. Miller, Erin K. Bradley, and Steven L. Teig
July 18, 2002


Outline
- Overview
- Search Strategy
- Cost Function
- Algorithms
- Algorithm Extensions
- Implementation Details
- Results

Overview
- Genomics and proteomics provide many novel targets, and drugs must be found for them.
- Which compound should be screened? Against which target? Methods for answering these questions (e.g., QSAR) have been debated for many years.
- Recently, combinatorial and parallel synthesis techniques have transformed the question of which single compound to analyze into one of which collection of compounds (library) to make.

Overview
- Develop an algorithm for designing libraries:
  - Discrete: a collection of individual compounds
  - Combinatorial: a collection of compounds synthesized in a parallel or combinatorial fashion
- Based on information theoretic techniques

Overview
- Idea: use molecules to “interrogate” the target receptor about which chemical features are required for binding.
- Objective: compose the library that maximizes the conclusions that can be drawn from the “answers” across all possible experimental outcomes.
- Goal: design a library that allows discovery of the most information about the optimization target.

Search Strategy
- Strategies used in “20 Questions” are applicable:
  - Binary search: every guess eliminates half of the search space.
  - Codeword search: every outcome corresponds to a single codeword, so the optimal set of questions can be asked simultaneously, and the same set of optimal questions can be used every time.
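The two “20 Questions” strategies can be contrasted in a short sketch. It assumes the hidden answer is an integer from 0 to 2^k - 1. Binary search asks “is it at least mid?” and adapts to each answer. The codeword strategy fixes k questions in advance (question i: “is bit i of the number 1?”). Those questions can therefore be asked simultaneously, and the answers, read as bits, form the codeword. All function names here are hypothetical.

```python
def binary_search_questions(secret: int, k: int) -> list[bool]:
    """Adaptive strategy: each question depends on the previous answers."""
    lo, hi = 0, 2 ** k  # remaining search space is [lo, hi)
    answers = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        answer = secret >= mid  # "Is the number at least mid?"
        answers.append(answer)
        if answer:
            lo = mid  # discard the lower half
        else:
            hi = mid  # discard the upper half
    return answers


def codeword_questions(k: int) -> list[set[int]]:
    """Non-adaptive strategy: question i is 'is the number in this set?',
    where the set holds every number whose bit i is 1. The same k questions
    serve every game and can be asked all at once."""
    return [{n for n in range(2 ** k) if (n >> i) & 1} for i in range(k)]


def decode(answers: list[bool]) -> int:
    """Each outcome (answer pattern) corresponds to exactly one number."""
    return sum(1 << i for i, yes in enumerate(answers) if yes)


k = 4  # 16 possible numbers, so 4 questions suffice
questions = codeword_questions(k)
for secret in range(2 ** k):
    answers = [secret in q for q in questions]
    assert decode(answers) == secret
    assert len(binary_search_questions(secret, k)) == k
```

Both strategies use k = log2(16) = 4 questions here. Only the codeword version can be posed up front, which is the property that lets a single library act as a fixed set of questions.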

Search Strategy
- Library design is analogous to “20 Questions”: instead of a number, we search for the features required for ligand binding, a desired phenotype, and/or good pharmacokinetic properties.
- “Feature”: a four-point pharmacophore.

Search Strategy - Assumptions
The “20 Questions” analogy is useful but assumes that:
1. Every compound tests half of the possible features.
2. Any compound in the design space can be synthesized.
3. Every assay value is accurate.
4. The goal is a single feature.

Search Strategy - Remedies (Eliminating the Assumptions)
1. A minimum of log2(F) bits is needed to decode F outcomes; this gives a loose upper bound on the number of compounds.
2. The ability of a set of questions to decode a message is invariant to column reordering. Therefore, not every compound in the design space needs to be obtainable in order to find a maximally efficient set of questions.
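The “minimum of log2(F) bits” in item 1 follows from a counting argument. The sketch below assumes each answer is binary (yes/no), so n answers can produce at most 2^n distinct patterns:

```latex
% n binary answers yield at most 2^n distinct codewords.
% Decoding F outcomes uniquely requires F distinct codewords:
2^{n} \ge F \quad\Longrightarrow\quad n \ge \lceil \log_2 F \rceil
% Example: F = 1000 requires n \ge \lceil \log_2 1000 \rceil = 10, since 2^{9} = 512 < 1000 \le 1024 = 2^{10}.
```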

Search Strategy - Remedies
3. Error-correcting codes (ECC) based on Hamming distance.
4. Adjust the probabilities of features in an iterative process and prune unlikely features. This will probably lead to convergence, enhances efficiency, and improves the probability of success.
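Remedy 3 depends on keeping the features' codewords far apart in Hamming distance, so that a few inaccurate assay values cannot turn one feature's codeword into another's. The sketch below uses standard coding-theory facts, not necessarily Luddite's exact scheme. With minimum pairwise distance d, up to (d - 1) // 2 flipped bits can be corrected by decoding to the nearest codeword. The feature names, codewords, and function names are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codewords: dict[str, str]) -> int:
    """Smallest pairwise Hamming distance among the features' codewords."""
    return min(hamming(a, b) for a, b in combinations(codewords.values(), 2))


def correctable_errors(codewords: dict[str, str]) -> int:
    """With minimum distance d, up to (d - 1) // 2 wrong assay bits are correctable."""
    return (min_distance(codewords) - 1) // 2


def decode_nearest(observed: str, codewords: dict[str, str]) -> str:
    """Assign an observed assay pattern to the feature with the closest codeword."""
    return min(codewords, key=lambda f: hamming(observed, codewords[f]))


# Hypothetical features, each with a 6-compound codeword (1 = compound has the feature).
codes = {"f1": "000000", "f2": "111000", "f3": "000111", "f4": "111111"}
print(min_distance(codes))              # 3
print(correctable_errors(codes))        # 1
print(decode_nearest("101000", codes))  # f2 (one flipped bit is corrected)
```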

Cost Function
- Given a set of features, search for a set of compounds that allows each individual feature to be decoded.
- If that is not possible, seek to decode as many features as possible, with the flattest distribution of feature-class sizes.
- Feature class: a subset of features that all have the same codeword.
- Entropy is well suited to this calculation.
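The following sketch makes “feature class” concrete. It assumes each compound is represented by the set of features (for example, pharmacophore IDs) it contains. A feature's codeword is then its presence/absence pattern across the library's compounds, and features that share a codeword fall into the same class and cannot be told apart. The data and the `feature_classes` name are hypothetical.

```python
from collections import defaultdict


def feature_classes(library: list[set[str]], features: list[str]) -> list[list[str]]:
    """Group features whose codewords (presence/absence across the library) are identical."""
    classes: dict[tuple[bool, ...], list[str]] = defaultdict(list)
    for f in features:
        codeword = tuple(f in compound for compound in library)
        classes[codeword].append(f)
    return list(classes.values())


# Hypothetical two-compound library; each compound is listed by the features it contains.
library = [{"p1", "p2"}, {"p2", "p3"}]
features = ["p1", "p2", "p3", "p4", "p5"]
# p4 and p5 occur in neither compound, so both get codeword (False, False).
print(feature_classes(library, features))  # [['p1'], ['p2'], ['p3'], ['p4', 'p5']]
```

Here the library decodes p1, p2 and p3 individually but leaves {p4, p5} as one class of size 2. The entropy-based cost function penalizes exactly this kind of merged class.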

Cost Function - Entropy
- Entropy: a measure of uncertainty.
  - All codewords the same: no uncertainty, so minimal entropy.
  - All codewords different: maximal entropy.
- We wish to optimize the following equation:

  H = -\sum_{i=1}^{C} \frac{\|c_i\|}{F} \log_2 \frac{\|c_i\|}{F}

  where M is the library measure, H is the entropy of the feature classes, C is the number of distinct classes, \|c_i\| is the size of feature class i, and F is the number of features.
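The entropy H of the feature classes can be computed directly from the class sizes. This minimal sketch assumes base-2 logarithms and evaluates only H. It does not attempt the library measure M, whose exact form is not given in the slide text. It reproduces the two extremes: one class holding all F features gives H = 0, and F singleton classes give H = log2(F).

```python
import math


def class_entropy(class_sizes: list[int]) -> float:
    """H = sum_i (|c_i| / F) * log2(F / |c_i|), which equals
    -sum_i (|c_i| / F) * log2(|c_i| / F), with F = total number of features."""
    F = sum(class_sizes)
    return sum((c / F) * math.log2(F / c) for c in class_sizes if c > 0)


F = 8
print(class_entropy([F]))        # 0.0  (all codewords identical)
print(class_entropy([1] * F))    # 3.0  (all codewords distinct: log2(8))
print(class_entropy([4, 2, 2]))  # 1.5  (partial decoding)
```

Writing the term as log2(F / |c_i|) keeps every summand non-negative, so the all-identical case prints 0.0 rather than -0.0.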

Algorithm - Overview
- Start with a list of synthesized compounds.
- Goal: select the subset that maximizes entropy.
- State: a set of compounds whose entropy can be calculated.
- Note: the entropy calculation makes the state a function of the feature classes, but our moves through state space are a function of the compounds. In general, the entropy cannot be calculated incrementally and must be completely reevaluated whenever the state changes, in stark contrast with other library design methods.
- Despite this seeming limitation, the method is very efficient.

Algorithm - Details
- The approaches to discrete and combinatorial designs are very similar.
- Both use a greedy build-up of the library to the desired number of compounds. (Greedy: a technique that makes the locally best choice at each step in the hope of reaching the global maximum.)
- This is followed by a second phase that reevaluates each of the library components, looking for a better selection.
- Repeat until there is no improvement.
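The two-phase discrete selection described above can be sketched as follows. Phase 1 greedily adds the compound that yields the largest entropy until the library reaches the requested size. Phase 2 tries replacing each member with an unused compound and repeats until a full pass brings no improvement. Every candidate state is re-scored from scratch, which matches the note that the entropy generally cannot be updated incrementally. The sketch assumes compounds are given as feature sets, uses base-2 entropy, and breaks ties by pool order. It is not the authors' C++ implementation, and all names are hypothetical.

```python
import math
from collections import Counter


def library_entropy(library: list[int], pool: list[frozenset[str]], features: list[str]) -> float:
    """Recompute H from scratch for the compounds indexed by `library`."""
    F = len(features)
    # A feature's codeword is its presence/absence pattern across the library.
    counts = Counter(tuple(f in pool[i] for i in library) for f in features)
    # H = sum over classes of (|c_i| / F) * log2(F / |c_i|)
    return sum((n / F) * math.log2(F / n) for n in counts.values())


def greedy_build(pool, features, size):
    """Phase 1: repeatedly add the compound that gives the largest entropy."""
    if size > len(pool):
        raise ValueError("library size exceeds source pool")
    library: list[int] = []
    while len(library) < size:
        remaining = [i for i in range(len(pool)) if i not in library]
        # max() keeps the first maximal candidate, so ties go to pool order.
        best = max(remaining, key=lambda i: library_entropy(library + [i], pool, features))
        library.append(best)
    return library


def refine(library, pool, features):
    """Phase 2: try swapping each member for an unused compound; repeat until
    a full pass produces no improvement."""
    best_h = library_entropy(library, pool, features)
    improved = True
    while improved:
        improved = False
        for pos in range(len(library)):
            for cand in range(len(pool)):
                if cand in library:
                    continue
                trial = library[:pos] + [cand] + library[pos + 1:]
                h = library_entropy(trial, pool, features)
                if h > best_h + 1e-12:  # tolerance guards against float noise
                    library, best_h, improved = trial, h, True
    return library, best_h


# Hypothetical source pool: each compound is the set of features it contains.
pool = [frozenset({"p1", "p2"}), frozenset({"p1", "p3"}), frozenset({"p2", "p3"}),
        frozenset({"p1"}), frozenset({"p4"})]
features = ["p1", "p2", "p3", "p4"]
print(greedy_build(pool, features, 2))  # [0, 1]
print(refine([3, 2], pool, features))   # ([0, 2], 2.0)
```

In the usage lines, two compounds suffice to give all four features distinct codewords (H = log2 4 = 2.0). The refinement pass lifts a weaker starting library from H = 1.5 to that maximum with a single swap.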

Algorithm - Extensions
1. It is often desirable to guarantee that certain items are included in the library.
2. Ability to subsample the source pool during the build-up and optimization phases: this dramatically decreases run time while only slightly impacting the quality of the designs.
3. Define a minimum Tanimoto fingerprint similarity between any two compounds in a discrete library.
Extension 1 is implemented for both the discrete and combinatorial algorithms; extensions 2 and 3 are implemented only for the discrete algorithm.
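Extension 3 relies on the Tanimoto coefficient. For binary fingerprints stored as sets of on-bit indices, it is |A ∩ B| / |A ∪ B|. The slide does not say how the similarity threshold is applied. This sketch assumes it acts as a cutoff that a candidate's similarity to each library member may not exceed, which is a diversity-style check a discrete selection could run before adding a compound. The direction of that comparison, the `respects_cutoff` helper, and the fingerprints are all hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Convention assumed here: two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0


def respects_cutoff(candidate: set[int], library: list[set[int]], cutoff: float) -> bool:
    """Hypothetical check: accept the candidate only if its similarity to
    every compound already in the library is at most `cutoff`."""
    return all(tanimoto(candidate, member) <= cutoff for member in library)


fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
print(tanimoto(fp_a, fp_b))                       # 0.4 (2 shared bits / 5 bits in union)
print(respects_cutoff(fp_b, [fp_a], cutoff=0.5))  # True
```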

Implementation Details
- C++
- Microsoft Windows NT
- 500 MHz Intel Pentium III, 500 MB RAM

Results
- 9 different libraries selected with the algorithm
- 273,373-compound source pool
- 3-component reaction A + B + C -> D, with monomer lists of length 33, 436, and 19
- 4-point pharmacophore signatures calculated for all compounds in the source pool
- Final measures compared to the optimal result and to a random result
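In a combinatorial design for a three-component reaction A + B + C -> D, the library is the full cross product of the chosen monomer subsets. The designer therefore picks monomers rather than individual products, and a single product cannot be dropped without dropping a whole monomer. This constraint may help explain why combinatorial designs need more compounds than discrete ones to reach the same information. In the sketch, each product is identified by its (A, B, C) monomer triple, and the monomer names and the `combinatorial_library` helper are hypothetical.

```python
from itertools import product


def combinatorial_library(a_monomers, b_monomers, c_monomers):
    """Every product D of A + B + C -> D from the selected monomers (full cross product)."""
    return list(product(a_monomers, b_monomers, c_monomers))


# Hypothetical monomer selections: 2 A, 3 B and 2 C monomers.
library = combinatorial_library(["A1", "A2"], ["B1", "B2", "B3"], ["C1", "C2"])
print(len(library))  # 12 (= 2 * 3 * 2 products)
print(library[0])    # ('A1', 'B1', 'C1')
```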

Results - Entropy
- The combinatorial algorithm lags behind the discrete one in performance.
- A discrete library of 91 compounds has the same measure as an optimal combinatorial library of 250 compounds.
- It may still be more cost-effective to synthesize the combinatorial library.
- General rule: a combinatorial library requires twice as many compounds as a discrete library to achieve the same information.
- Iterative setting: use the combinatorial algorithm early to discover, then use the discrete algorithm later to cherry-pick specific compounds.