C3 Introduction into CI; SS 03/1st lecture© Gasteiger et al.
History and Challenges of Chemoinformatics
Johann Gasteiger
Computer-Chemie-Centrum
University of Erlangen-Nürnberg
D-91052 Erlangen, Germany
www2.chemie.uni-erlangen.de/
Overview
• the scope of chemoinformatics
• the beginnings
• a field of its own
• scientific challenges
• political challenges
Synthesis of Properties
"The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties."
George S. Hammond, Norris Award Lecture, 1968
Fundamental Questions in Chemistry
• What structure do I need for a certain property?
– structure-activity relationships
• How do I make this structure?
– synthesis design
• What is the product of my reaction?
– reaction prediction
– structure elucidation
[Diagram relating: starting materials; chemical structure; physical property; chemical property; biological property; synthesis planning; reaction prediction; structure elucidation]
Chemoinformatics - Why?
• complex relationships
– structure - biological activity
– chemical reactivity
• amount of information
– many millions of compounds and reactions
– many millions of publications
Number of Compounds in Chemistry
[Figure: compounds published in CAS, in millions (axis 0 to 50), plotted against year (1965 to 2005)]
From Data to Knowledge
[Diagram: levels data, information, knowledge; labels: measurement, calculation, context, generalization, deductive learning, inductive learning]
Chemoinformatics: Definition
"The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and organization."
F. K. Brown, Annual Reports in Medicinal Chemistry 1998, 33, 375-384
Chemoinformatics: Definition
The application of informatics methods to solve chemical problems
The Scope of Chemoinformatics
• structure representation and searching
• data analysis and chemometrics
• molecular modeling
• spectra analysis and structure elucidation
• reaction representation and searching
• reaction modeling and synthesis design
Application Areas for Chemoinformatics
• drug design
• analytical chemistry
• chemical engineering
• inorganic chemistry
• medicinal chemistry
• organic chemistry
• physical chemistry
• theoretical chemistry
Chemoinformatics – An Old Discipline
• structure representation: 1965, Morgan
• structure elucidation: 1965, Sasaki, Munk, DENDRAL
• synthesis design: 1970, Corey & Wipke, Ugi, Gelernter, Hendrickson
• molecular modeling: 1970, Langridge, Marshall
• data analysis / chemometrics: 1970, Kowalski, Wold
Structure Representation
• European industry: BASF, Hoechst, ICI, Thomae, BASIC, IDC (1965 - )
• Wiswesser Line Notation (1969 - )
• Chemical Abstracts Service: Morgan Algorithm 1965
• Sheffield: M. Lynch, P. Willett (1970 - )
• Paris: J.E. Dubois, DARC (1970 - )
• Munich: I. Ugi, J. Gasteiger, C. Jochum (1972 - )
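The Morgan algorithm listed above ranks the atoms of a structure by "extended connectivity": each atom starts with its number of neighbours, and the values are repeatedly replaced by the sum of the neighbours' values until a further round no longer separates more atoms. The following is only a minimal sketch of that first step, not the full canonical numbering used at CAS; the adjacency-list input format and the function name are assumptions made for illustration.

```python
def morgan_extended_connectivity(adjacency):
    """Return extended connectivity (EC) values for each atom.

    adjacency: dict mapping atom index -> list of neighbour indices
    (hypothetical input format; hydrogens are left out).
    """
    # Round 0: the EC value of an atom is its number of neighbours.
    ec = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    n_classes = len(set(ec.values()))
    while True:
        # Next round: sum the current EC values of all neighbours.
        new_ec = {atom: sum(ec[n] for n in nbrs)
                  for atom, nbrs in adjacency.items()}
        new_classes = len(set(new_ec.values()))
        # Stop once another round no longer distinguishes more atoms,
        # and keep the values from the last round that did.
        if new_classes <= n_classes:
            return ec
        ec, n_classes = new_ec, new_classes


# Carbon skeleton of n-pentane: a chain 0-1-2-3-4.
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(morgan_extended_connectivity(pentane))
```

For the pentane chain the plain neighbour counts give only two classes (terminal and inner atoms); one round of summation separates the central atom from its two neighbours, and the symmetry-equivalent atoms keep equal values.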
Computer-Assisted Structure Elucidation
• DENDRAL: C. Djerassi, J. Lederberg, E. Feigenbaum (1965)
• CHEMICS: S. Sasaki (1965)
• M. Munk (1968)
• C. Steinbeck (1998)
Computer-Assisted Synthesis Design
1969  Corey + Wipke      OCSS       LHASA, SECS
1973  Ugi + Gasteiger    CICLOPS    WODCA, THERESA
1971  Hendrickson        SYNGEN
1976  Bersohn            SYNSUP
1977  Gelernter          SYNCHEM
1985  Hanessian          CHIRON
1988  Zefirov            FLAMINGOES
1988  Sasaki + Funatsu   AIPHOS
Visualization of Chemical Structures
LHASA 1970
Data Analysis Methods
• Chemometrics: B. Kowalski (1970)
• PLS: S. Wold (1978)
• Self-organizing neural network: Kohonen (1983)
• Backpropagation Algorithm: Rumelhart, Hinton (1987)
Databases
• Chemical Abstracts Service (1975)
• DARC System (1980)
• Cambridge CSD (1984)
• Inorganic Structures Database (1985)
• Beilstein (1990)
• Gmelin (1990)
• ChemInformRX (1991)
• SpecInfo (1991)
Common Topics: Structure Representation
• data storage and retrieval
• property prediction
• drug design
• synthesis design
• spectra analysis and prediction
Common Topics: Data Analysis Methods
• property prediction
• drug design
• analytical chemistry
• spectra analysis and prediction
Common Topics
• representation of chemical structures
• searching structures in databases
• visualization of chemical structures
• representation of chemical reactions
• data analysis methods
Handbook of Chemoinformatics: From Data to Knowledge
J. Gasteiger (Editor)
65 authors, 73 contributions
4 volumes, 1900 pages
Wiley-VCH, Weinheim (August 2003)
Chemical Structures - Challenges
• representation of Markush structures (patents)
• representation of polymers
• conformational flexibility (bioactive conformation)
• similarity searching (beyond fingerprints and Tanimoto coefficient)
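The similarity-searching item names fingerprints and the Tanimoto coefficient as the baseline to go beyond. For two fingerprints with a and b bits set and c bits in common, the Tanimoto coefficient is c / (a + b - c). A minimal sketch, assuming each fingerprint is given as a set of on-bit positions; the function names and the value returned for two empty fingerprints are choices made for this illustration only.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    common = len(a & b)
    union = len(a) + len(b) - common
    # Two empty fingerprints: return 0.0 by convention (an assumption).
    return common / union if union else 0.0


def rank_by_similarity(query, library):
    """Sort (name, fingerprint) pairs by decreasing similarity to query."""
    return sorted(library, key=lambda item: tanimoto(query, item[1]),
                  reverse=True)


# Hypothetical fingerprints, not derived from real structures.
query = {1, 2, 3, 4}
library = [("A", {3, 4, 5, 6}), ("B", {1, 2, 3, 4}), ("C", {7, 8})]
print([name for name, _ in rank_by_similarity(query, library)])
```

A measure of this kind only counts shared bits, which is one reason the slide asks for similarity methods beyond it.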
Proteins - Challenges
• scoring functions for docking
• flexibility of proteins
[Diagram: gene, protein, lead, drug; spanned by Bioinformatics and Chemoinformatics]
Information Acquisition - Challenges
• electronic laboratory notebooks
• publishing chemical information (3D structures, spectra)
• publishing and searching on the internet
• text mining
• optical character recognition
• input of chemical structures (handwriting, voice)
Data Mining - Challenges
• descriptor elimination
• model validation
• automatic model building
• definition of applicability domain
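To make the last item concrete: an applicability domain states for which new compounds a model's predictions can be trusted. One of the simplest possible definitions is a bounding box over the descriptor values seen in training. The sketch below illustrates only that idea, with hypothetical descriptor values and function names; it is not a method taken from the lecture, and more refined definitions exist.

```python
def descriptor_ranges(training_rows):
    """Return (min, max) per descriptor column of the training set."""
    columns = zip(*training_rows)
    return [(min(col), max(col)) for col in columns]


def in_domain(ranges, row):
    """True if every descriptor of row lies within the training range."""
    return all(low <= x <= high for (low, high), x in zip(ranges, row))


# Hypothetical training set: two descriptors per compound.
training = [(1.0, 10.0), (2.0, 30.0), (3.0, 20.0)]
ranges = descriptor_ranges(training)

print(in_domain(ranges, (2.5, 15.0)))  # inside the box for both descriptors
print(in_domain(ranges, (4.0, 15.0)))  # first descriptor outside the range
```

A box of this kind ignores correlations between descriptors and empty regions inside the box, which is part of why defining the domain is listed as a challenge.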
Chemical Reactions - Challenges
• modeling of chemical reactivity
• prediction of the course of chemical reactions
• synthesis design
• prediction of metabolism/degradation (abiotic and biotic)
• analysis of biochemical pathways
What is a Chemical Reaction?
[Reaction scheme, EC No.: 4.1.3.7: structure drawings of oxaloacetate, acetyl-CoA, and citrate]
• the bioinformatician: an event influenced by a gene, a protein
• the computer scientist: a context-sensitive graph rewriting rule
• the chemist: an event breaking and making bonds
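The chemist's and the computer scientist's views on this slide can be combined in a few lines: a molecule is a graph of atoms and bonds, and a reaction is a rule that lowers some bond orders (bonds broken) and raises others (bonds made). The sketch below is a deliberately simplified illustration under stated assumptions: atoms carry fixed hypothetical labels instead of being found by subgraph matching, and the example is a generic addition of H–X to a C=C double bond, not the enzyme reaction shown on the slide.

```python
def bond(a, b):
    """Unordered pair of atom labels used as a dictionary key."""
    return frozenset((a, b))


def apply_rule(bonds, rule):
    """Apply a rewriting rule to a molecular graph.

    bonds: dict bond(a, b) -> bond order of the reactants
    rule:  dict bond(a, b) -> change in bond order (-1 break, +1 make)
    Raises ValueError if the rule tries to break a bond that is absent,
    i.e. the required context is not present in the reactants.
    """
    product = dict(bonds)
    for pair, delta in rule.items():
        order = product.get(pair, 0) + delta
        if order < 0:
            raise ValueError(f"rule does not match at {sorted(pair)}")
        if order == 0:
            product.pop(pair, None)  # bond fully broken
        else:
            product[pair] = order
    return product


# Reactants: C1=C2 and H-X (hypothetical atom labels).
reactants = {bond("C1", "C2"): 2, bond("H", "X"): 1}
# Rule: break H-X and one C1=C2 bond, make C1-H and C2-X.
addition = {bond("C1", "C2"): -1, bond("H", "X"): -1,
            bond("C1", "H"): +1, bond("C2", "X"): +1}

product = apply_rule(reactants, addition)
print({tuple(sorted(pair)): order for pair, order in product.items()})
```

The rule is context sensitive in the sense that it only applies where the bonds to be broken actually exist; applied to a graph without the H–X bond it raises an error instead of producing a product.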
Biochemical Pathways
Pathway Searching
[Figure: reaction network with numbered reactions and compounds: Glucose 6-phosphate, NADP+, NADPH, H+, 6-Phospho-gluconolactone, H2O, 6-Phospho-gluconate, Ribulose 5-phosphate, CO2, Xylulose 5-phosphate, Ribose 5-phosphate, Glyceraldehyde 3-phosphate, Sedoheptulose 7-phosphate, Erythrose 4-phosphate, Fructose 6-phosphate]
maximize NADPH production
5[r10] 5[r12] 10[r14] 10[r1] 10[r2] 10[r3] 8[r4] 3[r6] 3[r8]
2[c13] 20[c2] 10[c6] 1[c8] ---> 20[c4] 20[c5] 10[c9] 3[c12]
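Searching a pathway for an objective such as maximizing NADPH production means adding up reaction equations, each weighted by how often it runs, and reading off the overall balance. The sketch below shows only that bookkeeping step, not the optimization itself. It uses three well-known steps from glucose 6-phosphate to ribulose 5-phosphate; the reaction labels ox1 to ox3 are hypothetical (the slide's own reaction and compound numbering is not recoverable here), and protonation states are simplified.

```python
# Stoichiometric coefficients: negative = consumed, positive = produced.
# Labels ox1..ox3 are hypothetical, chosen for this sketch.
REACTIONS = {
    "ox1": {"Glucose 6-phosphate": -1, "NADP+": -1,
            "6-Phospho-gluconolactone": +1, "NADPH": +1, "H+": +1},
    "ox2": {"6-Phospho-gluconolactone": -1, "H2O": -1,
            "6-Phospho-gluconate": +1},
    "ox3": {"6-Phospho-gluconate": -1, "NADP+": -1,
            "Ribulose 5-phosphate": +1, "CO2": +1, "NADPH": +1},
}


def net_stoichiometry(reactions, multiplicities):
    """Sum reaction equations weighted by how often each reaction runs."""
    net = {}
    for name, times in multiplicities.items():
        for compound, coeff in reactions[name].items():
            net[compound] = net.get(compound, 0) + times * coeff
    # Intermediates that cancel out do not appear in the overall balance.
    return {c: v for c, v in net.items() if v != 0}


net = net_stoichiometry(REACTIONS, {"ox1": 1, "ox2": 1, "ox3": 1})
for compound, coeff in sorted(net.items(), key=lambda kv: kv[1]):
    print(f"{coeff:+d} {compound}")
```

Running each step once yields two NADPH per glucose 6-phosphate; a pathway search varies the multiplicities over the whole network to find the combination that is best for the chosen objective.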
Prediction of Properties - Challenges
• physical
– spectra (CASE)
– color of dyes etc.
• chemical
– chemical reactivity
• biological
– toxicity
• risk assessment (chemical + biological) REACH
Application Areas for Chemoinformatics - Challenges
• drug design
• analytical chemistry
• chemical engineering
• inorganic chemistry
• medicinal chemistry
• organic chemistry
• physical chemistry
• theoretical chemistry
Teaching
Sheffield
UMIST
Strasbourg
Erlangen
Indiana University
Textbooks on Chemoinformatics
• V. Gillet, A. Leach
• J. Gasteiger, T. Engel
• J. Bajorath
Chemoinformatics - A Textbook -
J. Gasteiger, T. Engel (Editors)
650 pages
Wiley-VCH, Weinheim (September 2003)
Teaching
• define a curriculum in chemoinformatics
• decide which chemoinformatics content has to go into regular chemistry curricula
Cooperation Industry - Academia
• industry: generate data
• academia: develop methods
• provide academia with access to data
Funding
• increase awareness of the importance of chemoinformatics
• join committees