234th ACS Meeting Boston 2007 C. Laggner

Construction of a Virtual Library of Potential Endocrine Disruptors for in silico Target Fishing

Christian Laggner, PhD.
Computer Aided Molecular Design Group
Pharm. Chem. Dept.
University of Innsbruck, Austria


Page 2

Overview

• What are Endocrine Disruptors?
• Need for computational screening methods
• Construction of the compound library
• Applicability of publicly available compound collections: problems, needs

Page 3

Endocrine Disruptors

• Exogenous substances that interfere with the endocrine system of humans or animals
– Mimic endogenous hormones
– Block the effects of hormones
– Change the levels of hormones: stimulate or inhibit production, transport, or degradation
• Disturb regulation of development, growth, reproduction, and behavior
• Some common targets:
– nuclear hormone receptors (ER, AR, PR, AhR, PPAR, RXR, TR, …)
– oxidoreductases (Aro, 11β-HSD, …)

Page 4

ED chemicals come from various sources:

• Pesticides
– Insecticides
– Bactericides, fungicides
• Additives in polymers
• Drugs
– Side-effect
– Release in wildlife (wastewater)
• Phytoestrogens
• Produced from precursor substances
– Incomplete combustion
– Wastewater

[Figure: Some Examples (chemical structure drawings, including an organotin compound with R = Ph, nBu)]

Page 5

ED Screening Programs

• US: Endocrine Disruptor Screening Program (EDSP) http://www.epa.gov/scipoly/oscpendo/index.htm
• EU: REACH program (Registration, Evaluation, Authorisation and Restriction of Chemicals), Endocrine Disrupters Website http://ec.europa.eu/environment/endocrine/index_en.htm

Tens to hundreds of thousands of compounds from various sources to be screened against multiple targets
– prioritize small subset for initial screening

Page 6

Virtual High-Throughput Screening

• Collection of pharmacophore models for over 300 unique targets, also ED targets
• Fast screening of x compounds against y targets -> activity profiles
– Find new candidates
– Find new targets

More on pharmacophore-based parallel screening in Thierry's talk at 3:15 pm…
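The idea of screening x compounds against y targets can be sketched as building one hit list per compound. This is a minimal illustration only: `fits` is a hypothetical stand-in for the pharmacophore fitting step (which the talk performs with dedicated software), and the compound and model contents are placeholders.

```python
from typing import Callable, Dict, Iterable, List


def activity_profiles(
    compounds: Iterable[str],
    models: Dict[str, object],
    fits: Callable[[str, object], bool],
) -> Dict[str, List[str]]:
    """Map each compound to the targets whose model it matches."""
    profiles: Dict[str, List[str]] = {}
    for compound in compounds:
        profiles[compound] = [t for t, m in models.items() if fits(compound, m)]
    return profiles


def candidates_per_target(profiles: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the profiles: for each target, list the compounds that hit it."""
    by_target: Dict[str, List[str]] = {}
    for compound, targets in profiles.items():
        for target in targets:
            by_target.setdefault(target, []).append(compound)
    return by_target


# Toy stand-in for pharmacophore matching (assumption): a "model" is just
# the set of compound names declared to fit it.
toy_models = {"ER": {"cmpd_A", "cmpd_B"}, "PPAR": {"cmpd_B"}}


def toy_fits(compound: str, model: object) -> bool:
    return compound in model


profiles = activity_profiles(["cmpd_A", "cmpd_B", "cmpd_C"], toy_models, toy_fits)
by_target = candidates_per_target(profiles)
```

Reading the result by compound corresponds to "find new targets"; reading the inverted mapping by target corresponds to "find new candidates".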

Page 7

But What Shall We Screen?

• Endocrine Disruption Priority Setting Database v.2 http://www.ergweb.com/endocrine/

– For selecting chemicals for Tier 1 Screening

– Pesticides, commercial chemicals, cosmetic ingredients, food additives, nutritional supplements, mixtures, …

– 142,975 entries

– No structures, but compound names and CAS numbers

– Merge with structures from a public substance library (PubChem)
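Since the list is keyed by CAS numbers, one cheap sanity check before merging is the CAS check digit. The sketch below is not part of the talk; it only applies the published checksum rule (last digit equals the weighted digit sum modulo 10, weights counted from the right).

```python
import re

CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if the string has CAS format and a correct check digit."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight 1 for the rightmost digit before the check digit, 2 for the next, ...
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check
```

A failed check only says that the number is malformed; a passing number can still be attached to the wrong substance.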

Page 8

The PubChem Project

• Part of NIH's Molecular Libraries Roadmap Initiative
• Collects structures and information about molecules from various databases
– DB sources: substance vendors, biological properties, toxicology, metabolic pathways, …
– links to original database
• Mixed bag of goodies: different information for various molecules

Page 9

The PubChem Project

• Data organized into 3 sub-databases:
– PCSubstance: More than 19 million substance records (= original database entries)
– PCCompound: More than 10 million compound records (= unique structures)
– PCBioAssay: almost 600 bioassays with data for selected compounds
• Data publicly accessible via
– web browser: http://pubchem.ncbi.nlm.nih.gov/
– ftp client: ftp://ftp.ncbi.nlm.nih.gov/pubchem/
– access via a programmatic XML interface (PUG) http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi
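As an illustration of programmatic access, the sketch below only builds a request URL and sends nothing. Note the assumption: it targets PUG REST, a REST-style PubChem interface that is different from (and later than) the XML-based PUG endpoint listed on the slide, and the `compound/name/<name>/cids/JSON` pattern is taken from the PUG REST documentation as I understand it.

```python
from urllib.parse import quote

# Assumption: PUG REST base URL, not the pug.cgi XML endpoint from the slide.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_by_name_url(name: str) -> str:
    """Build a PUG REST URL that asks for the compound IDs matching a name."""
    # safe="" forces every reserved character (spaces, slashes, ...) to be escaped.
    return f"{PUG_REST_BASE}/compound/name/{quote(name, safe='')}/cids/JSON"


example_url = cids_by_name_url("ferric ammonium citrate")
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose so that the sketch runs without network access.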

Page 10

Pipeline Pilot

• Graphically compose data processing networks (“protocols”)

• Configurable components for each step

Page 11

Library Generation Overview

CAS nr. list / Name list
-> Search PC Substances -> Unmerged hits
-> Merge by same name / CAS nr. -> Merged hits
-> Merge by structure -> Unique structures
-> Filter, 3D conversion -> 3D database

Page 12

Initial Searches

Name list:
• Names exist for 65.1% of initial 143.0k list entries
• Filtered out:
– No CAS number ("Roofing paper", "Putrescent whole egg solids", "Red pepper", "Paint", …)
– Name contains "polymer", "derivative", or "analogue"
– Name shorter than 4 characters
– String length distribution peaks: truncated names

62305 (43.6%) unique names remaining
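The first three name filters above can be sketched as a single predicate. This is a minimal sketch, not the Pipeline Pilot protocol used in the talk; the function name and the sample entries are illustrative, and the check for truncated names (string length distribution peaks) is not covered.

```python
from typing import List, Optional, Tuple

EXCLUDED_WORDS = ("polymer", "derivative", "analogue")
MIN_NAME_LENGTH = 4


def keep_entry(name: str, cas: Optional[str]) -> bool:
    """Apply the name-list filters: CAS present, name long enough, no excluded word."""
    if not cas:
        return False
    cleaned = name.strip()
    if len(cleaned) < MIN_NAME_LENGTH:
        return False
    lowered = cleaned.lower()
    return not any(word in lowered for word in EXCLUDED_WORDS)


# Illustrative entries (not taken from the database itself).
entries: List[Tuple[str, Optional[str]]] = [
    ("Roofing paper", None),
    ("Citric acid", "77-92-9"),
    ("Styrene polymer", "9003-53-6"),
    ("DDT", "50-29-3"),
]
kept = sorted({name for name, cas in entries if keep_entry(name, cas)})
```

Collecting the survivors in a set mirrors the "unique names remaining" count on the slide.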

Page 13

Initial Searches

• Search with name list in PubChem Substances, July 2007 (17.8 million entries):
– 85,000 hits
– 46.6% of list entries found
– Takes 11.5 h on a Pentium 4, 3.0 GHz

CAS number list:
• 97.0% had unique number
• Search in PubChem Substances:
– 179,000 hits
– 83.5% of list entries found
– Takes 46 min
• Only 3060 entries found by name and not by CAS number

Page 14

Merge Hits for Same Search Terms

Have molecular structure, not isotope-labeled, no R-groups. Correct protonation states.

Merge name hits by name / CAS hits by CAS number

How to check whether different structures describe the same molecule?
– Stereochemistry not always fully described
• Solution: remove stereochemistry and compare SMILES string
– Different tautomers for the same compound give different SMILES strings
• Solution (not for all cases): InChI
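Removing stereochemistry before comparing SMILES strings can be sketched with plain string handling. This is a rough illustration under strong assumptions: both SMILES must come from the same canonicalizer with the same atom order, and the bracket clean-up only handles plain carbon centres. A cheminformatics toolkit would do this on the molecular graph instead.

```python
import re

# '@' marks tetrahedral centres, '/' and '\' mark double-bond geometry.
_STEREO_MARKS = re.compile(r"[@/\\]")
# After stripping '@', a carbon stereocentre is left as "[CH]" or "[C]";
# rewrite it to a bare "C" so it matches a SMILES written without stereo.
_BARE_CARBON = re.compile(r"\[CH?\]")


def strip_stereo(smiles: str) -> str:
    """Return the SMILES string with stereo markers removed (heuristic)."""
    plain = _STEREO_MARKS.sub("", smiles)
    return _BARE_CARBON.sub("C", plain)


def same_without_stereo(smiles_a: str, smiles_b: str) -> bool:
    """Compare two SMILES strings after dropping stereo information."""
    return strip_stereo(smiles_a) == strip_stereo(smiles_b)
```

The comparison still fails for tautomers, which is the limitation the slide addresses with InChI.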

Page 15

InChI

• IUPAC International Chemical Identifier
• Describes chemical structures in layers and sublayers: chemical formula, connectivity, charges, protonation states, stereochemistry, isotopes, tautomerism
• Different layers allow adjusting the level of similarity/identity

but

• tautomerism detection does not include keto-enol and ring-chain tautomerism (sugars…)

InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
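The layered layout makes it possible to compare identifiers at a chosen level of detail by dropping layers. A minimal sketch, assuming the usual single-letter layer prefixes (`b`, `t`, `m`, `s` for stereochemistry); the helper names are illustrative, and the string is the example from the slide.

```python
from typing import List, Tuple

SLIDE_EXAMPLE = (
    "InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5"
    "/h2,5,7-10H,1H2/t2-,5+/m0/s1"
)


def split_layers(inchi: str) -> Tuple[str, str, List[str]]:
    """Split an InChI string into (version, formula, prefixed layers)."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    if len(parts) < 2:
        raise ValueError("InChI string has no formula layer")
    return parts[0], parts[1], parts[2:]


def drop_layers(inchi: str, prefixes: str = "btms") -> str:
    """Rebuild the InChI without the layers whose prefix letter is listed."""
    version, formula, layers = split_layers(inchi)
    kept = [layer for layer in layers if layer and layer[0] not in prefixes]
    return "InChI=" + "/".join([version, formula] + kept)


stereo_free = drop_layers(SLIDE_EXAMPLE)
```

Two records whose `stereo_free` strings are equal differ at most in the dropped stereo layers.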

Page 16

Merge Hits for Same Search Terms

Merge multiple structures per entry by InChI without stereo and tautomerism layers. Prefer the structure with the most stereo information (longest SMILES string).

4.5% (by name) and 7.9% (by CAS number) still had different structures: errors, additional components. Check whether we can find a preferred structure.
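The merge rule above can be sketched as grouping hits by search term and reduced identifier, keeping the longest SMILES per group. The merge keys here are opaque placeholders for the reduced InChI strings, and the data are illustrative rather than taken from the talk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, str, str]  # (search term, merge key, SMILES)


def merge_hits(hits: List[Hit]) -> Dict[str, Dict[str, str]]:
    """For each search term, keep one SMILES per merge key: the longest one."""
    merged: Dict[str, Dict[str, str]] = defaultdict(dict)
    for term, key, smiles in hits:
        best = merged[term].get(key)
        if best is None or len(smiles) > len(best):
            merged[term][key] = smiles
    return dict(merged)


def unresolved_terms(merged: Dict[str, Dict[str, str]]) -> List[str]:
    """Search terms that still map to more than one distinct structure."""
    return sorted(term for term, by_key in merged.items() if len(by_key) > 1)


# "K1".."K3" stand in for stereo-free identifiers (hypothetical values).
hits = [
    ("alanine", "K1", "CC(N)C(=O)O"),
    ("alanine", "K1", "C[C@H](N)C(=O)O"),
    ("some mixture", "K2", "CCO"),
    ("some mixture", "K3", "CO"),
]
merged = merge_hits(hits)
```

Terms returned by `unresolved_terms` correspond to the 4.5% and 7.9% of cases that need a preferred-structure decision.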

Page 17

Looking for the Preferred Structure

"ferric ammonium citrate" gives two results:
– 3 hits (from three different databases), 6 carbon atoms
– 4 hits (all from one database), 7 carbon atoms: WRONG

1. Preferred structure among hits from one database
2. Preferred structure among all hits
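One way to read the two-step rule is as a vote in which each database counts once, so that a single source repeating an error cannot outvote several independent sources. The sketch below implements that reading; it is an interpretation, not necessarily the exact procedure used in the talk, and the structure labels are placeholders.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, str]  # (source database, structure key)


def preferred_structure(hits: List[Hit]) -> Optional[str]:
    """Pick a structure by per-database vote; return None on a tie."""
    if not hits:
        return None
    # Step 1: within each database, keep only its most frequent structure.
    per_db: Dict[str, Counter] = {}
    for db, key in hits:
        per_db.setdefault(db, Counter())[key] += 1
    votes = Counter(counts.most_common(1)[0][0] for counts in per_db.values())
    # Step 2: majority across databases.
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# The "ferric ammonium citrate" situation from the slide, with placeholder keys.
hits = [
    ("db_a", "six_carbons"),
    ("db_b", "six_carbons"),
    ("db_c", "six_carbons"),
    ("db_d", "seven_carbons"),
    ("db_d", "seven_carbons"),
    ("db_d", "seven_carbons"),
    ("db_d", "seven_carbons"),
]
chosen = preferred_structure(hits)
naive = Counter(key for _, key in hits).most_common(1)[0][0]
```

Counting raw hits (`naive`) would select the seven-carbon structure, while the per-database vote selects the six-carbon one.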

Page 18

Merge Salts and Mixtures

Citric acid
1,2,3-Propanetricarboxylic acid, 2-hydroxy-, manganese salt
Magnesium citrate
Citric acid monohydrate
1,2,3-Propanetricarboxylic acid, 2-hydroxy-, lead(2+) salt (2:3)
Ferrous citrate
1,2,3-Propanetricarboxylic acid, 2-hydroxy-, iron salt
1,2,3-Propanetricarboxylic acid, 2-hydroxy-, iron(3+) salt (1:1)
Ferric citrate
Ferric ammonium citrate
Sodium citrate
Sodium citrate dihydrate

Remove small counterions, mixture components, neutralise compounds: preference for one structure for >80% of problematic hits

Check for wrong valences

Merge all compounds with same structure
– prioritize names of unmodified compounds
– save names, CAS, … of the others

[Figure: structure of citric acid]
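Removing counterions and solvent can be approximated on SMILES strings by keeping the dot-separated fragment with the most heavy atoms. This is a rough sketch with a simplified atom tokenizer; neutralising the remaining charges, which the slide also lists, is not attempted here, and the example SMILES are written by hand for illustration.

```python
import re

# Bracket atoms, two-letter halogens, then one-letter organic-subset atoms.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Bracket hydrogens such as [H], [H+] or [2H] are not heavy atoms.
_BRACKET_H = re.compile(r"\[\d*H\d*[+-]?\d*\]")


def heavy_atom_count(fragment: str) -> int:
    """Count non-hydrogen atoms in one SMILES fragment (simplified parser)."""
    tokens = _ATOM.findall(fragment)
    return sum(1 for token in tokens if not _BRACKET_H.fullmatch(token))


def largest_fragment(smiles: str) -> str:
    """Return the dot-separated component with the most heavy atoms."""
    return max(smiles.split("."), key=heavy_atom_count)


SODIUM_CITRATE = "[Na+].[Na+].[Na+].OC(CC([O-])=O)(CC([O-])=O)C([O-])=O"
CITRIC_ACID_HYDRATE = "OC(CC(O)=O)(CC(O)=O)C(O)=O.O"
```

After this step the citrate anion and citric acid still differ by their protonation state, which is why a neutralisation step is needed before merging by structure.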

Page 19

Checking for Right Valences

Pentavalent carbon atoms are not as rare as you might think…
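A valence check of this kind can be sketched on a simple atom and bond list. The limits below are assumptions for neutral atoms only (no charges, no hypervalent S or P), and the data layout is illustrative, not the one used in the talk.

```python
from typing import Dict, List, Tuple

# Assumed maximum valences for neutral atoms.
MAX_VALENCE: Dict[str, int] = {
    "H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1,
}

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)


def overvalent_atoms(atoms: List[str], bonds: List[Bond]) -> List[int]:
    """Return indices of atoms whose summed bond orders exceed the limit."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [
        idx
        for idx, element in enumerate(atoms)
        if element in MAX_VALENCE and used[idx] > MAX_VALENCE[element]
    ]


# A carbon with one double bond and three single bonds: five bonds in total.
bad_atoms = ["C", "O", "C", "C", "C"]
bad_bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1), (0, 4, 1)]

# Acetone heavy-atom skeleton for comparison.
ok_atoms = ["C", "C", "C", "O"]
ok_bonds = [(0, 1, 1), (1, 2, 1), (1, 3, 2)]
```

Only excess valence is flagged; atoms with open valences are assumed to carry implicit hydrogens.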

Page 20

Final Filtering

Keep only compounds suitable for pharmacophore screening:
– Only selected elements: H, C, N, O, S, P, F, Cl, Br, I, B, Al, Si, Ge, As, Se, Sn, Sb, Te, Pb
– Must have at least one C atom
– 70 ≤ MW ≤ 1000

76754 compounds, 63.9% of search list
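The three criteria can be sketched on molecular formulas alone. This is a minimal sketch, assuming flat formulas without brackets or charges and standard average atomic masses; the function names are illustrative.

```python
import re
from typing import Dict

# Average atomic masses for the allowed elements (rounded standard values).
ATOMIC_MASS: Dict[str, float] = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06,
    "P": 30.974, "F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904,
    "B": 10.81, "Al": 26.982, "Si": 28.085, "Ge": 72.630, "As": 74.922,
    "Se": 78.971, "Sn": 118.710, "Sb": 121.760, "Te": 127.60, "Pb": 207.2,
}
ALLOWED = set(ATOMIC_MASS)

_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a flat formula such as 'C6H8O6' into element counts."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    counts: Dict[str, int] = {}
    for element, number in _TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(counts: Dict[str, int]) -> float:
    """Sum the atomic masses; only valid for elements in ATOMIC_MASS."""
    return sum(ATOMIC_MASS[el] * n for el, n in counts.items())


def passes_filter(formula: str) -> bool:
    """Allowed elements only, at least one carbon, 70 <= MW <= 1000."""
    counts = parse_formula(formula)
    if not set(counts) <= ALLOWED:
        return False
    if counts.get("C", 0) < 1:
        return False
    return 70.0 <= molecular_weight(counts) <= 1000.0
```

The element test runs first, so the mass table never has to cover elements outside the allowed list.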

Page 21

Construction of the 3D Database

• Prepare 3D start conformation: add H atoms, generate 3D coordinates, minimize
• Generate 3D database with Catalyst catDB (FAST, MaxConfs = 255): 76677 successfully converted (99.9%)

Page 22

Analysis of the Database

• Derwent WDI 2005 (67050 entries): filtered, desalted, merged in same way; 57667 entries remaining
• Overlap: 8513 entries (14.8% WDI, 9.0% EDPSD)
• Oral bioavailability (Lipinski's Rule of 5):
– WDI 64.0%
– EDPSD 79.2%
• Druglikeness (Ghose et al. 1999):
– WDI 39.7%
– EDPSD 18.2%
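Both filters reduce to range checks on precomputed descriptors. The sketch below uses the commonly cited thresholds (Rule of 5: MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, acceptors ≤ 10; Ghose: 160 ≤ MW ≤ 480, -0.4 ≤ logP ≤ 5.6, 40 ≤ molar refractivity ≤ 130, 20 ≤ atoms ≤ 70). How many Rule of 5 violations the talk tolerated is not stated, so `max_violations=1` is an assumption, and the descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int
    molar_refractivity: float
    atom_count: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of 5 limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    # Assumption: at most one violation is tolerated.
    return lipinski_violations(d) <= max_violations


def passes_ghose(d: Descriptors) -> bool:
    """Check the Ghose et al. qualifying ranges."""
    return (
        160 <= d.mol_weight <= 480
        and -0.4 <= d.logp <= 5.6
        and 40 <= d.molar_refractivity <= 130
        and 20 <= d.atom_count <= 70
    )


# Hypothetical descriptor sets, not measured values for real compounds.
small = Descriptors(150.0, 1.0, 1, 2, 35.0, 18)
large = Descriptors(650.0, 6.2, 2, 8, 160.0, 80)
```

A small molecule such as `small` satisfies the Rule of 5 but falls below the Ghose lower bounds, which shows that the two filters can disagree.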

Page 23

Analysis of Results

[Figure; legend: Red: WDI, Blue: EDPSD]

Page 24

First Screening Results

Target   % EDPSD   % WDI
ER       0.52      0.86
ER       0.13      0.43
PPAR     0.40      0.70
PPAR     2.53      4.41
PPAR     8.20      16.74
RXR      0.80      0.59
RXR      1.68      2.13
TR       0.03      0.17
TR       0.01      0.07

Page 25

Harvesting Structures from Public DBs

• Many common chemicals can be retrieved by comparing public compound lists
• Searching via a registry number (CAS, SID, CID, EINECS/ELINCS, …) is much faster than via name
– Names split between PCSubstances and PCCompounds
– Often wrong CAS number given (salts, hydrates, mixtures, …)

PCS:
PUBCHEM_EXT_DATASOURCE_REGID: 408148
PUBCHEM_SUBSTANCE_SYNONYM:
1H-Benzimidazol-5-amine, 2-(4-aminophenyl)-
2-(4-Aminophenyl)-5-aminobenzimidazole
7621-86-5
NSC408148

PCC:
PUBCHEM_IUPAC_OPENEYE_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
PUBCHEM_IUPAC_CAS_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
PUBCHEM_IUPAC_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
PUBCHEM_IUPAC_SYSTEMATIC_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
PUBCHEM_IUPAC_TRADITIONAL_NAME: [2-(4-aminophenyl)-3H-benzimidazol-5-yl]amine
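Pulling CAS numbers out of the synonym field of a substance record can be sketched as follows. The sketch assumes the usual SD file data-item layout (`> <TAG>` line, value lines, blank line) and uses only the two substance tags shown above; the record text is assembled from the slide's example.

```python
import re
from typing import Dict, List

_CAS = re.compile(r"^\d{2,7}-\d{2}-\d$")
_TAG_HEADER = re.compile(r">\s*<([^>]+)>")

RECORD = """
> <PUBCHEM_EXT_DATASOURCE_REGID>
408148

> <PUBCHEM_SUBSTANCE_SYNONYM>
1H-Benzimidazol-5-amine, 2-(4-aminophenyl)-
2-(4-Aminophenyl)-5-aminobenzimidazole
7621-86-5
NSC408148

"""


def parse_sd_tags(record: str) -> Dict[str, List[str]]:
    """Collect the value lines of each '> <TAG>' data item in an SD record."""
    tags: Dict[str, List[str]] = {}
    current = None
    for line in record.splitlines():
        header = _TAG_HEADER.match(line)
        if header:
            current = header.group(1)
            tags[current] = []
        elif current is not None:
            if line.strip() == "":
                current = None  # a blank line ends the data item
            else:
                tags[current].append(line.strip())
    return tags


def cas_numbers(tags: Dict[str, List[str]]) -> List[str]:
    """Synonyms that look like CAS registry numbers."""
    synonyms = tags.get("PUBCHEM_SUBSTANCE_SYNONYM", [])
    return [s for s in synonyms if _CAS.match(s)]


tags = parse_sd_tags(RECORD)
```

Indexing records by the extracted numbers once turns each later CAS query into a dictionary lookup, which is one plausible reason why registry-number searches are so much faster than name searches.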

Page 26

Harvesting Structures from Public DBs

• Chirality information is often missing or unclearly defined
– 2D structures: wedged bonds or pseudo-3D
– 3D structures: atom stereo parity set or take it from the 3D structure
• Tautomerism: partially solved by InChI
– No keto-enol tautomerism
– No ring-chain tautomerism
– Workaround: connectivity? (together with MW, MF)
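The connectivity workaround can be sketched by comparing only the formula and the connectivity (`/c`) layer of two InChI strings, ignoring the hydrogen layer. The two InChI strings below are, as far as I know, those of acetone and its enol form (propen-2-ol); they are given for illustration and should be verified with an InChI generator. The molecular formula stands in for both MF and MW.

```python
from typing import Tuple

# Illustrative keto/enol pair (assumed strings, see lead-in).
ACETONE = "InChI=1S/C3H6O/c1-3(2)4/h1-2H3"
PROPEN_2_OL = "InChI=1S/C3H6O/c1-3(2)4/h4H,1H2,2H3"


def connectivity_key(inchi: str) -> Tuple[str, str]:
    """Return (molecular formula, connectivity layer) of an InChI string."""
    parts = inchi.split("/")
    if len(parts) < 2:
        raise ValueError("InChI string has no formula layer")
    formula = parts[1]
    c_layer = next((p for p in parts[2:] if p.startswith("c")), "")
    return formula, c_layer


def same_skeleton(inchi_a: str, inchi_b: str) -> bool:
    """True if both identifiers share formula and heavy-atom connectivity."""
    return connectivity_key(inchi_a) == connectivity_key(inchi_b)
```

This would group keto-enol pairs, since only hydrogen positions and bond orders change. Ring-chain tautomers form or break a bond between heavy atoms, so their connectivity layers differ and this key alone would not merge them.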

Page 27

Conclusions

• Public databases and compound lists useful for in silico reprofiling of known compounds

• Different sources - different level of information

– Need standards for treating stereo information

– Problem of tautomerism

• There are always some errors…

– Comparison of different data sources may help us find some of them

– How can we give feedback about wrong structures and avoid further spreading of errors?

Page 28

Acknowledgements

• Simona Distinto
• Johannes Kirchmair
• Thierry Langer
• Patrick Markt
• Daniela Schuster
• Gudrun Spitzer
• Theodora Steindl

• Fabian Bendix
• Martin Biely
• Alois Dornhofer
• Robert Kosara
• Judith Rollinger
• Gerhard Wolber

• Rémy D. Hoffmann
• Nicolas Triballeau

• Lyubomir G. Nashev
• Alex Odermatt

NIH / PubChem Project
EPA / Endocrine Disruptor Screening Program

Page 29

Finally…

Thank you for your attention!