Chemoinformatics in Drug Design

Biological Sequence Analysis, May 6, 2011

Irene Kouskoumvekaki, Associate Professor, Computational Chemical Biology, CBS, DTU-Systems Biology

CBS, Department of Systems Biology

Computational Chemical Biology group

Irene Kouskoumvekaki

Associate Professor

Olivier Taboureau

Associate Professor

Sonny Kim Nielsen

PhD student

Kasper Jensen

PhD student

Tudor Oprea

Guest Professor

Ulrik Plesner

Master's student


Definition: Chemoinformatics

Gathering and systematic use of chemical information, and application of this information to predict the behavior of unknown compounds in silico.

data → prediction


Definition: A drug candidate…

... is a (ligand) compound that binds to a biological target (protein, enzyme, receptor, ...) and in this way either initiates a process (agonist) or inhibits it (antagonist)

The structure/conformation of the ligand is complementary to the space defined by the protein’s active site

The binding is caused by favorable interactions between the ligand and the side chains of the amino acids in the active site (electrostatic interactions, hydrogen bonds, hydrophobic contacts, ...).


Drug Discovery

In vitro / in silico studies → Animal studies → Clinical studies


The Drug Discovery Process

Chemoinformatics


The Drug Discovery Process

MKTAALAPLFFLPSALATTVYLA

GDSTMAKNGGGSGTNGWGEYL

ASYLSATVVNDAVAGRSAR…(etc)

We know the structure of the biological target

We identify/predict the binding pocket

Challenge:

To design an organic molecule that binds strongly enough to the biological target to modulate its activity.

New drug candidate


Example: Alzheimer’s disease

What is it? Alzheimer's is a disease that causes failure of brain functions and dementia. It starts with poor memory and an inability to function in common everyday activities.

How do you get it? Alzheimer's disease is the result of malfunctioning neurons in different parts of the brain. This, in turn, is due to an imbalance in the concentration of neurotransmitters.


Example: Alzheimer’s disease

How can we treat it?

Acetylcholine (neurotransmitter)

Drug against Alzheimer’s


Old School Drug discovery process

Screening collection: 10^6 compounds

HTS → Actives: 10^3 actives

Follow-up → Hits: 1-10 hits

Hit-to-lead → Lead series: 0-3 lead series

Lead-to-drug → Drug candidate: 0-1

Clinical trials

High rate of false positives!


Failures


Drug discovery in the 21st Century

In vitro: Diverse set of molecules tested in the lab

In silico + in vitro: Computational methods to select subsets (to be tested in the lab) based on prediction of drug-likeness, solubility, binding, pharmacokinetics, toxicity, side effects, ...


The Lipinski ‘rule of five’ for drug-likeness prediction

• Octanol-water partition coefficient (logP) ≤ 5

• Molecular weight ≤ 500

• Number of hydrogen bond acceptors (HBA) ≤ 10

• Number of hydrogen bond donors (HBD) ≤ 5

If two or more of these rules are violated, the compound might have problems with oral bioavailability. (Lipinski et al., Adv. Drug Delivery Rev., 23, 1997, 3.)
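The rule of five can be sketched as a simple filter. The snippet below is a minimal illustration, assuming the four descriptor values have already been computed by some other tool; the `Descriptors` container and the function names are hypothetical and not part of the lecture.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptor values for one compound (hypothetical container)."""
    log_p: float       # octanol-water partition coefficient (logP)
    mol_weight: float  # molecular weight
    hba: int           # number of hydrogen bond acceptors
    hbd: int           # number of hydrogen bond donors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.log_p <= 5,
        d.mol_weight <= 500,
        d.hba <= 10,
        d.hbd <= 5,
    ]
    return sum(not ok for ok in checks)


def may_have_bioavailability_problems(d: Descriptors) -> bool:
    """Flag a compound when two or more rules are violated."""
    return lipinski_violations(d) >= 2


# Illustrative descriptor values, not measured data.
small = Descriptors(log_p=1.2, mol_weight=180.2, hba=4, hbd=1)
large = Descriptors(log_p=6.0, mol_weight=650.0, hba=4, hbd=1)
print(lipinski_violations(small), may_have_bioavailability_problems(small))
print(lipinski_violations(large), may_have_bioavailability_problems(large))
```

Counting violations rather than returning a single yes/no keeps the "two or more" threshold explicit and easy to change.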


Major Aspects of Chemoinformatics



•Information Acquisition and Management: Methods for collecting data (mainly experimental). Development of databases for storage and retrieval of information.

•Information Use: Data analysis, correlation and model building.

•Information Application: Prediction of molecular properties relevant to chemical and biochemical sciences.


Information Acquisition and Management


Small molecule databases


[Figure: Growth in PubChem Substances & Compounds, May 2005 to September 2007; series: Compound, Substance; vertical axis from 0 to 20,000,000]

Recent count: Substance: 72,156,631; Compound: 28,807,320; Rule of 5: 20,692,980


Searching in PubChem


Structural representation of molecules


Beyond the Lipinski Rule of 5...

•Chemometrics: The application of mathematical or statistical methods to chemical data (simple, linear methods), e.g. Principal Component Analysis

•Machine Learning: The design and development of algorithms and techniques that allow computers to learn (complex, non-linear algorithms), e.g. Artificial Neural Networks, K-means clustering


Prediction of Solubility, ADME & Toxicity

Solid drug → (dissolution; solubility) → Drug in solution → (membrane transfer; absorption) → Absorbed drug → (liver extraction; metabolism) → Systemic circulation


Prediction of biological activity/selectivity


Prediction models at CBS


Virtual screening


Virtual Screening Flavors

1D filters

e.g. Lipinski’s Rule of Five

LIGAND-BASED

TARGET-BASED


Molecular similarity on the Chemical Space

• Similar Property Principle – Molecules having similar structures and properties are expected to exhibit similar biological activity. (Not always true!)

• Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.


Ligand-based VS: Fingerprints

– widely used similarity search tool

– consists of descriptors encoded as bit strings

– bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient

MACCS fingerprints: 166 structural keys that answer questions of the type:

• Is there a ring of size 4?

• Is at least one F, Br, Cl, or I present?

where the answer is either TRUE (1) or FALSE (0)
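The idea of structural keys can be illustrated with a toy two-bit fingerprint built from the two example questions above. This is only a sketch: the hand-written molecule description (element list plus ring sizes) is a hypothetical format chosen for the example, whereas real MACCS keys are derived from the full molecular structure by a chemoinformatics toolkit.

```python
from typing import Callable, Dict, List

# A molecule is described by hand here (hypothetical format): the element
# symbols of its heavy atoms and the sizes of its rings.
Molecule = Dict[str, list]

HALOGENS = {"F", "Br", "Cl", "I"}

# Each structural key is a yes/no question about the molecule.
STRUCTURAL_KEYS: List[Callable[[Molecule], bool]] = [
    lambda m: 4 in m["ring_sizes"],                      # Is there a ring of size 4?
    lambda m: any(e in HALOGENS for e in m["elements"]),  # Is a halogen present?
]


def fingerprint(mol: Molecule) -> List[int]:
    """Encode the answers as bits: TRUE -> 1, FALSE -> 0."""
    return [int(key(mol)) for key in STRUCTURAL_KEYS]


# Chlorobenzene, C6H5Cl: six carbons, one chlorine, one six-membered ring.
chlorobenzene = {"elements": ["C"] * 6 + ["Cl"], "ring_sizes": [6]}
# Cyclobutane, C4H8: four carbons, one four-membered ring.
cyclobutane = {"elements": ["C"] * 4, "ring_sizes": [4]}

print(fingerprint(chlorobenzene))  # [0, 1]
print(fingerprint(cyclobutane))    # [1, 0]
```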


Tanimoto Similarity

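$T_c = \frac{c}{a + b - c} = \frac{9}{10 + 9 - 9} = 0.9$

or 90% similarity

Here a and b are the numbers of bits set in the two fingerprints, and c is the number of bits set in both.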
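The Tanimoto coefficient, c / (a + b − c), can be computed directly from two bit strings, where a and b count the bits set in each fingerprint and c counts the bits set in both. The sketch below uses made-up 12-bit fingerprints chosen only to reproduce the counts a = 10, b = 9, c = 9; returning 0.0 for two empty fingerprints is a convention picked for this example.

```python
from typing import Sequence


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) for two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                               # bits set in the first fingerprint
    b = sum(fp_b)                               # bits set in the second fingerprint
    c = sum(x & y for x, y in zip(fp_a, fp_b))  # bits set in both
    if a + b - c == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return c / (a + b - c)


# Made-up fingerprints with a = 10, b = 9 and c = 9 bits.
query = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
database_entry = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(tanimoto(query, database_entry))  # 0.9
```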




Ligand-based VS: Pharmacophore


Structure-based Virtual Screening: Docking

Given a protein and a database of ligands, docking scores determine which ligands are most likely to bind.

Binding pocket of target Library of small compounds


Energy of binding

Binding pocket of target; library of small compounds

Example binding energies: -10 kcal/mol, +1 kcal/mol, -1 kcal/mol, +10 kcal/mol

ΔG = ΔH - TΔS

Contributions: vdW, H-bond, desolvation energy, electrostatic energy, torsional free energy
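To give a feeling for what binding energies such as -10 or +1 kcal/mol mean, they can be converted to a dissociation constant. The sketch below evaluates ΔG = ΔH - TΔS and then applies the standard thermodynamic relation ΔG = RT ln(Kd); that relation, the temperature of 298.15 K and the 1 M standard state are assumptions added here and do not appear on the slide.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(delta_h: float, delta_s: float,
                        temperature: float = 298.15) -> float:
    """ΔG = ΔH - TΔS, with ΔH in kcal/mol and ΔS in kcal/(mol*K)."""
    return delta_h - temperature * delta_s


def dissociation_constant(delta_g: float, temperature: float = 298.15) -> float:
    """Kd in mol/L from ΔG = RT ln(Kd); 1 M standard state (added assumption)."""
    return math.exp(delta_g / (R_KCAL * temperature))


# The four example energies from the slide, in kcal/mol.
for dg in (-10.0, -1.0, 1.0, 10.0):
    print(f"ΔG = {dg:+.0f} kcal/mol -> Kd ≈ {dissociation_constant(dg):.1e} M")
```

Under these assumptions, more negative ΔG values correspond to smaller Kd, that is, tighter binding.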


“Docking” and “Scoring”

•Docking involves the prediction of the binding mode of individual molecules

– Goal: new ligand orientation closest in geometry to the observed X-ray structure (Conformations of ligands in complexes often have very similar geometries to minimum-energy conformations of the isolated ligand)

•Scoring ranks the ligands using some function related to the free energy of association of the two partners, looking at attractive and repulsive regions and taking into account steric and hydrogen bonding interactions

– Goal: new ligand score closest in value to the docking score of the X-ray structure
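One common way to quantify "closest in geometry" is the root-mean-square deviation (RMSD) between the atoms of a docked pose and the X-ray pose. The slides do not name a specific measure, so using RMSD here is an assumption. The sketch also assumes that both poses list the same atoms in the same order and share the coordinate frame of the protein, so no superposition is performed; the coordinates are made up.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two poses with matching atom order."""
    if len(pose) != len(reference) or len(pose) == 0:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def closest_pose(poses: Sequence[Sequence[Coord]], reference: Sequence[Coord]) -> int:
    """Index of the candidate pose with the lowest RMSD to the reference."""
    return min(range(len(poses)), key=lambda i: rmsd(poses[i], reference))


# Made-up coordinates in Å for a three-atom ligand (illustration only).
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose_a = [(x, y, z + 1.0) for x, y, z in xray]  # every atom shifted by 1.0 Å
pose_b = [(x + 0.1, y, z) for x, y, z in xray]  # every atom shifted by 0.1 Å

print(rmsd(pose_a, xray), rmsd(pose_b, xray))
print(closest_pose([pose_a, pose_b], xray))
```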


Docking algorithms

•Most exhaustive algorithms:

– Accurate prediction of a binding pose

•Most efficient algorithms:

– Docking of small ligand databases in reasonable time

•Rapid algorithms:

– Virtual high-throughput screening of millions of compounds


Scoring functions

•Molecular mechanics force field-based

Score is estimated by summing the strength of intermolecular van der Waals and electrostatic interactions between all atoms of the ligand-target complex

-CHARMM, AMBER

•Empirical-based

Based on summing various types of interactions between the two binding partners (hydrogen bonds, hydrophobic, …)

- ChemScore, GlideScore, AutoDock

•Knowledge-based

Based on statistical observations of intermolecular close contacts from large 3D databases, which are used to derive potentials or mean forces

-PMF, DrugScore
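The force field-based idea, summing van der Waals and electrostatic terms over all ligand-target atom pairs, can be sketched as follows. This is a deliberately simplified illustration and not the CHARMM or AMBER functional form as implemented: it assumes a 12-6 Lennard-Jones term with a single `epsilon` and `sigma` for every atom pair (real force fields use parameters per atom type), a Coulomb term with a constant dielectric, and made-up coordinates and charges.

```python
import math
from typing import Sequence, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z in Å and partial charge in e

COULOMB_KCAL = 332.06  # converts e^2/Å to kcal/mol


def pair_energy(r: float, q1: float, q2: float, epsilon: float = 0.1,
                sigma: float = 3.5, dielectric: float = 1.0) -> float:
    """Lennard-Jones 12-6 plus Coulomb energy (kcal/mol) for one atom pair, r > 0."""
    sr6 = (sigma / r) ** 6
    vdw = 4.0 * epsilon * (sr6 * sr6 - sr6)
    elec = COULOMB_KCAL * q1 * q2 / (dielectric * r)
    return vdw + elec


def force_field_score(ligand: Sequence[Atom], target: Sequence[Atom]) -> float:
    """Sum pair energies over all ligand-target atom pairs (lower is better)."""
    total = 0.0
    for lx, ly, lz, lq in ligand:
        for tx, ty, tz, tq in target:
            r = math.dist((lx, ly, lz), (tx, ty, tz))
            total += pair_energy(r, lq, tq)
    return total


# Made-up two-atom ligand and two-atom pocket (illustration only).
ligand = [(0.0, 0.0, 0.0, 0.5), (1.5, 0.0, 0.0, -0.1)]
pocket = [(4.0, 0.0, 0.0, -0.5), (0.0, 4.0, 0.0, 0.1)]
print(force_field_score(ligand, pocket))
```

Empirical and knowledge-based functions follow the same pattern of summing contributions, but with terms and weights fitted to experimental data or derived from structural statistics.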


Ligand-based VS

✓ good enrichment of candidate molecules from the screening of large databases with less computational effort

× too coarse to pick up subtle differences induced by small structural variations in the ligands

✓ many options for model refinement

Structure-based VS

✓ better fit for analyzing smaller sets of compounds, especially in retrospective analysis

✓ includes all possible interactions, thus allowing the detection of unexpected binding modes

× changing parameters for docking algorithms and scores is demanding

Mutants are being developed:

• pharmacophore methods with information about the target’s binding site

• docking programs that incorporate pharmacophore constraints

Combination of pharmacophore, docking and molecular dynamics (MD) screens


http://www.vcclab.org/lab/edragon/


Public Web Chemoinformatics Tools

http://pasilla.health.unm.edu/


ChemSpider

www.chemspider.com


Open Babel

http://openbabel.org/wiki/Main_page

D. Vidal et al., Ligand-based Approaches to In Silico Pharmacology, in Chemoinformatics and Computational Chemical Biology, Ed. J. Bajorath, Springer, 2011


Questions?
