C571/C696 Chemical Information Tech. 2004, Lecture 7. Page 1
Indiana University School of
C571/C696 Chemical Information Technology
David Wild
http://www.informatics.indiana.edu/djwild
Representing 3D Structures
What we’ll cover today
• Sources of 3D information (X-ray, NMR)
• Experimental 3D databases
• Rotatable bonds & conformational flexibility
• Representing 3D structures using distance matrices
• Estimation of 3D structure on computer
• Conformational search and minimization
• 3D descriptors and fingerprints
• Types & sources of protein information
• How proteins are represented on computer
• PDB file format
Sources of 3D information
• X-ray Crystallography
• NMR Spectroscopy
• Computer-generated 3D structures
• X-ray and NMR methods apply to both small molecules and proteins
X-ray crystallography
• Exploits diffraction of x-rays by electron clouds
• Allows 3D location of atoms to be inferred
• Requires sample to be in crystalline form
• More info:
– http://www-structure.llnl.gov/Xray/101index.html
X-ray crystallography
Taken from http://www-structure.llnl.gov/Xray/101index.html
NMR Spectroscopy
• Exploits magnetic fields created by quantum spin in nuclei
• Atomic spin can switch state when radio waves are applied
• Different atoms and groups resonate at different frequencies
• Information can be pieced together to infer 3D structure
• More info:
– http://www.rod.beavon.clara.net/nmr1.htm
Experimental 3D Databases – Cambridge Structural Database
• Holds 261,000 experimental X-ray structures (Jan 2004)
• Various tools for searching the database (some available free)
• More info at: http://www.ccdc.cam.ac.uk/
CSD Growth since 1970
Factors involved in 3D representation
• Rotatable bonds and conformational flexibility
• Sampling conformations or including flexibility in algorithms
• Measuring energy of conformations
• Representation of electronic and other characteristics
Rotatable bonds and conformational flexibility
• Most compounds have rotatable bonds. This means that the molecule can take on many 3D conformations.
• Molecules prefer low-energy states, so low-energy conformations are more likely
• How do we work out which bonds are rotatable?
• Do we pick one particular conformation (e.g. lowest energy), or pick several, or allow for flexibility?
Working definition of a rotatable bond
Any single bond which is:
– Not part of a ring
– Not terminal (e.g. methyl)
– Not in a conjugated system (e.g. amide)
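The working definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bond list covers heavy atoms only (so a methyl carbon has a single neighbour), ring membership is tested by checking whether the two atoms stay connected once the bond is ignored, and conjugation is not detected here but passed in by the caller as a set of flagged bonds. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def rotatable_bonds(bonds, conjugated=frozenset()):
    """Return single bonds that are non-ring, non-terminal and not flagged as conjugated.

    bonds: list of (atom_i, atom_j, order) tuples over heavy atoms only.
    conjugated: set of frozenset({i, j}) pairs flagged by the caller
    (assumption: conjugation, e.g. an amide bond, is detected elsewhere).
    """
    neighbours = defaultdict(set)
    for i, j, _ in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    def in_ring(i, j):
        # The bond i-j is in a ring if j is still reachable from i
        # when the direct i-j bond is ignored.
        seen, stack = {i}, [i]
        while stack:
            a = stack.pop()
            for b in neighbours[a]:
                if {a, b} == {i, j}:
                    continue
                if b == j:
                    return True
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False

    result = []
    for i, j, order in bonds:
        if order != 1:
            continue  # only single bonds qualify
        if len(neighbours[i]) < 2 or len(neighbours[j]) < 2:
            continue  # terminal bond (e.g. to a methyl carbon)
        if frozenset((i, j)) in conjugated or in_ring(i, j):
            continue
        result.append((i, j))
    return result


butane = [(1, 2, 1), (2, 3, 1), (3, 4, 1)]
cyclohexane = [(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (6, 1, 1)]
print(rotatable_bonds(butane))       # only the central C2-C3 bond
print(rotatable_bonds(cyclohexane))  # every bond is in the ring
```

Real toolkits usually express these rules as substructure patterns; the graph walk here just makes the three criteria explicit.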
Torsion (dihedral) angle
• The torsion angle (τ), also known as the dihedral angle, is the angle between the A-B bond and the C-D bond, viewed along the B-C bond, for four atoms connected in the order A-B-C-D
[Figure: two views of atoms A-B-C-D with the torsion angle τ marked]
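As a rough sketch, the torsion angle can be computed from four sets of atom coordinates by projecting the A-B and C-D bonds onto the plane perpendicular to B-C and measuring the angle between the projections. The function name is hypothetical and the sign convention is an assumption; different programs may report the opposite sign.

```python
import math


def torsion_angle(a, b, c, d):
    """Signed torsion angle A-B-C-D in degrees, from four (x, y, z) tuples."""

    def sub(p, q):
        return tuple(pi - qi for pi, qi in zip(p, q))

    def dot(p, q):
        return sum(pi * qi for pi, qi in zip(p, q))

    def cross(p, q):
        return (p[1] * q[2] - p[2] * q[1],
                p[2] * q[0] - p[0] * q[2],
                p[0] * q[1] - p[1] * q[0])

    b0, b1, b2 = sub(a, b), sub(c, b), sub(d, c)
    length = math.sqrt(dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the B-C bond
    # Remove the components along B-C, leaving the projections onto the
    # plane perpendicular to the B-C bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Four points with B-C along the z axis; D is rotated a quarter turn from A.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```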
Ring flexibility
• Chair & boat conformations
• Occur with non-aromatic rings (e.g. cyclohexane)
[Figure: chair and boat conformations]
3D representation on computer
• The Coordinate Table is an extension of the atom table which lists coordinates of atoms in 3D space relative to a defined origin
• The Distance Matrix gives distances (in ångströms) between all pairs of atoms. Its main use is in comparison of 3D structures. It can be derived from the coordinate table.
• These are usually stored in addition to a connection table.
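To illustrate how a distance matrix follows from a coordinate table, here is a minimal sketch using the first four ring carbons from the aspirin MOL file in this lecture. The variable and function names are hypothetical, and `math.dist` assumes Python 3.8 or later.

```python
import math

# Coordinates (in ångströms) of the first four atoms of the aspirin MOL file.
coords = [
    (-1.8920, -0.9920, -1.5760),  # atom 1, C
    (-1.3680, -2.1480, -0.9880),  # atom 2, C
    (-0.0760, -2.1440, -0.4640),  # atom 3, C
    (0.7080, -0.9840, -0.5200),   # atom 4, C
]


def distance_matrix(coords):
    """Full symmetric matrix of Euclidean distances between all atom pairs."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


dm = distance_matrix(coords)
for row in dm:
    print(" ".join(f"{d:4.1f}" for d in row))
```

Because the matrix depends only on interatomic distances, it is unchanged when the molecule is rotated or translated, which is what makes it convenient for comparing 3D structures.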
Coordinate Table
Atom  Label        X        Y        Z
   1  C      -1.8920  -0.9920  -1.5760
   2  C      -1.3680  -2.1480  -0.9880
   3  C      -0.0760  -2.1440  -0.4640
   4  C       0.7080  -0.9840  -0.5200
   5  C       0.2000   0.1560  -1.1960
   6  C      -1.1080   0.1600  -1.6520
   7  C       2.0840  -1.0280   0.1040
   8  O       2.5320  -2.0320   0.6360
   9  O       2.8760   0.0240   0.1120
  10  O       0.7520   1.3320  -1.0840
  11  C       0.6680   2.0240   0.0320
  12  O       1.3000   3.0600   0.1520
  13  C      -0.2400   1.5760   1.1440
[Figure: aspirin structure with heavy atoms numbered 1 to 13]
Distance Matrix
[Figure: aspirin structure with heavy atoms numbered 1 to 13 and two example interatomic distances marked, 4.8 Å and 3.5 Å]
       1    2    3    4    5    6    7    8    9   10   11   12   13
  1       1.4  2.4  2.8  2.4  3.8  4.8  4.2  1.4  2.4  2.7  2.9  4.3
  2            1.4  2.4  2.8  4.3  5.1  5.0  2.4  3.7  3.9  4.2  5.6
  3                 1.4  2.4  3.8  4.2  4.8  2.8  4.2  4.7  4.9  6.4
  4                      1.4  2.5  2.8  3.6  2.4  3.7  4.7  4.6  6.1
  5                           1.5  2.4  2.3  1.4  2.3  3.7  3.5  4.8
  6                                1.3  1.2  2.5  2.8  4.4  3.9  5.0
  7                                     2.2  3.7  4.1  5.7  5.2  6.3
  8                                          2.8  2.5  4.2  3.5  4.3
  9                                               1.4  2.6  2.3  3.7
 10                                                    2.2  1.3  2.5
 11                                                         1.2  2.4
 12                                                              1.5
 13
3D Molecule file formats
• All tend to include coordinate/atom lookup table and connection table information
• Examples: MOL file (MDL), Sybyl MOL2 file (Tripos)
3D MOL file for Aspirin
  Chime 12290214053D
 21 21  0  0  1 V2000
   -1.8920   -0.9920   -1.5760 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3680   -2.1480   -0.9880 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0760   -2.1440   -0.4640 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7080   -0.9840   -0.5200 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.2000    0.1560   -1.1960 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.1080    0.1600   -1.6520 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0840   -1.0280    0.1040 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5320   -2.0320    0.6360 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.8760    0.0240    0.1120 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.7520    1.3320   -1.0840 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6680    2.0240    0.0320 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3000    3.0600    0.1520 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2400    1.5760    1.1440 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.8760   -0.9600   -1.9840 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9880   -3.0360   -0.9520 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.3000   -3.0600   -0.0040 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4880    1.0840   -2.0560 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.5640    0.7800   -0.3240 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7600    0.6360    0.9320 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0080    2.3480    1.2880 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.3440    1.4320    2.0560 H   0  0  0  0  0  0  0  0  0  0  0  0
 13 21  1  0
 13 20  1  0
 13 19  1  0
 11 13  1  0
 11 12  1  0
 10 11  1  0
  9 18  1  0
  7  9  1  0
  7  8  1  0
  6 17  1  0
  5 10  1  0
  5  6  1  0
  4  7  1  0
  4  5  1  0
  3 16  1  0
  3  4  1  0
  2 15  1  0
  2  3  1  0
  1 14  1  0
  1  6  1  0
  1  2  1  0
M  END
Computer estimation of 3D structure
• Programs take as input 2D structures (e.g. in SMILES) and output 3D structures
• There is no one correct 3D structure, since in three dimensions a molecule is conformationally flexible
• Methods may output one single conformation, or an ensemble of possible conformations
Fragment / Rule Based 3D Structure Generation
• Split 2D structure into small fragments matched to a pre-defined empirical database
• Generally use a combination of real fragment coordinates, theory and rules to generate the 3D structure
• Generally produce one or more low-energy conformations
• Examples: Concord, Corina, Omega
Distance Geometry Based Structure Generation
• Rapidly samples “conformational space” of molecule, looking for valid conformations based on distance bounds.
• Outputs an ensemble of possible conformations, which can then be scored, e.g. by energy
• For algorithm, see
– http://www.daylight.com/meetings/summerschool01/course/basics/dist.html
Concord
• Distributed by Tripos, Inc.
• One of the earliest structure generators
• Fragment / rule-based
• Produces a low-energy, geometry-optimized conformation
• An industry standard
• More information:
– http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/concord.html
Corina
• Created by Gasteiger lab in Germany
• Fragment / Rule-based
• Similar to Concord
• More information, plus 1,000 free structure generations on the web, at:
– http://www2.chemie.uni-erlangen.de/software/corina/free_struct.html
Omega
• Recently introduced by OpenEye
• Rule-based
• Systematically tests conformations, not stochastic
• Extremely fast generation of multiple low-energy conformations
• Can handle 100,000 compounds/processor/day
• Free academic use license
• More information at:
– http://www.eyesopen.com/products/applications/omega.html
Rubicon
• Marketed by Daylight
• Mixture of Distance Geometry and SMARTS-based rules
• Rules can be user-defined
• For more information, see
– http://www.daylight.com/products/rubicon.html
Structure Minimization
• Finding the conformer or conformers that have the lowest energy, and are therefore most likely to be found in nature (“conformational search”)
• May start with an existing non-optimized structure
• Can use standard optimization methods such as exhaustive search, simulated annealing, Monte Carlo, or genetic algorithms
• Can attempt to use ab initio derivation
• More info:
– http://www.chem.swin.edu.au/modules/mod6/
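A minimal sketch of the exhaustive-search idea for a single rotatable bond: evaluate an energy function on a regular grid of torsion angles and keep the lowest. The three-fold cosine potential and its barrier height are hypothetical stand-ins chosen only for illustration; a real search would use a force field and scan several torsions at once.

```python
import math


def torsional_energy(phi_degrees, barrier=3.0):
    """Toy three-fold torsional potential (hypothetical, for illustration only)."""
    return 0.5 * barrier * (1.0 + math.cos(3.0 * math.radians(phi_degrees)))


def exhaustive_search(energy, step=10):
    """Evaluate the energy on a grid of torsion angles and return the lowest-energy angle."""
    return min(range(0, 360, step), key=energy)


best = exhaustive_search(torsional_energy)
print(best, round(torsional_energy(best), 6))
```

With several rotatable bonds the grid grows exponentially, which is why stochastic methods such as simulated annealing, Monte Carlo and genetic algorithms are also used.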
3D small molecule databases and searching
• Databases store coordinate tables and often distance matrices
• Searching is a little different from 2D searching:
– Needs to take into account conformational flexibility
– Requirements different
• Less common and less mature than 2D databases and searching
• See http://www.netsci.org/Science/Cheminform/feature06.html for a review
3D substructure (“pharmacophore”) search
• A pharmacophore is a set of features in 3D required for binding to a particular protein
• E.g. “find all of the molecules that have an OH group between 2 and 5 Å away from a carboxyl oxygen, both of which are 7-8 Å from a benzene ring”
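The example query above can be sketched as a set of distance constraints between feature points. This assumes the features (hydroxyl, carboxyl oxygen, ring centroid) have already been located in a single conformation; the feature names, the function and the coordinates below are all hypothetical and invented purely for illustration.

```python
import math
from itertools import product


def matches_pharmacophore(features, constraints):
    """Check a single conformation against distance constraints.

    features: dict mapping feature type -> list of (x, y, z) points.
    constraints: list of (type_a, type_b, min_dist, max_dist) in ångströms.
    Returns True if one point per feature type can be chosen so that
    every constraint holds.
    """
    types = sorted(features)
    for combo in product(*(features[t] for t in types)):
        chosen = dict(zip(types, combo))
        if all(lo <= math.dist(chosen[a], chosen[b]) <= hi
               for a, b, lo, hi in constraints):
            return True
    return False


query = [
    ("hydroxyl", "carboxyl_O", 2.0, 5.0),
    ("hydroxyl", "ring_centroid", 7.0, 8.0),
    ("carboxyl_O", "ring_centroid", 7.0, 8.0),
]

# Hypothetical feature coordinates, not taken from any real molecule.
molecule = {
    "hydroxyl": [(0.0, 0.0, 0.0)],
    "carboxyl_O": [(3.0, 0.0, 0.0)],
    "ring_centroid": [(1.5, 7.3, 0.0)],
}
print(matches_pharmacophore(molecule, query))
```

For a flexible molecule the check would be repeated over an ensemble of conformations, counting a hit if any one of them satisfies the query.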
3D Similarity Searching
• Can use 3D fingerprints based on pharmacophore “fragments”
– See, e.g., Comparing 3D Pharmacophore Triplets and 2D Fingerprints for Selecting Diverse Compound Subsets. H. Matter and T. Pötter, J. Chem. Inf. Comput. Sci.; 1999; 39(6) pp 1211 - 1225
• Can be atom based, involving comparison of distance matrices
– E.g. finding pairs of most-similar atoms between molecules, based on their distances from other atoms in the molecule
• But other forms are also used, e.g. using fields
– See, e.g., Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials, D. Thorner, D. Wild, P. Willett, & M. Wright, Perspectives in Drug Discovery and Design, 9/10/11, 301-320, 1998
• May be used for searching databases or ranking small datasets
A debate! – 2D vs 3D similarity
Which is more effective…
… for retrieving molecules with similar biological activity?
… for retrieving molecules with similar 2D structures?
… for retrieving related molecules of interest to chemists?
… for ranking molecules for a particular target?
WDI - Mean Actives Retrieved in Top 300
[Bar chart comparing 2D Finger, 3D Atom and 3D Fields; vertical axis 0 to 60]
Agrochemicals Dataset - Correlation between similarity and activity with four activities
[Bar chart for activities A1 to A4 comparing 2D Finger, 3D Atom and 3D Field; vertical axis -0.1 to 0.6]
Any consensus?
Which is more effective…
… for retrieving molecules with similar biological activity?
Usually 2D
… for retrieving molecules with similar 2D structures?
2D
… for retrieving related molecules of interest to chemists?
Sometimes 2D, sometimes 3D (bioisosteres)
… for ranking molecules for a particular target?
Sometimes 2D, sometimes 3D
Other forms of 3D information
• Surface (van der Waals, Connolly, volume)
• Properties projected onto surface (electrostatics, hydrophobics)
• Fields (energy, force, electrostatic, steric, hydrophobic)
• Atom-based properties (charge, hydrophobicity, etc)
What is a macromolecule?
• Any very large molecule (>1000 atoms)
• Usually made up of repeating building block molecules (amino acids, nucleic bases, etc) in a chain
• Polypeptides (amino acid building blocks)
• Proteins (amino acid building blocks)
• Nucleic acids (made up of bases)
• Polysaccharides (made up of sugars)
• We shall be focusing on polypeptides and proteins
Types of protein information
• Atomic (3D atom coordinates and bond information)
• Primary (Amino acid sequence)
• Secondary (Alpha helices, beta sheets, etc)
• Tertiary (3D folding of protein)
• Quaternary (dimers, protein families)
Atomic information
• 3D coordinates of all atoms in the protein
• Derived from X-ray crystallography or NMR Spectroscopy
Primary structure (Sequence)
• Lists amino acids in the order they appear in the chain
• Uses three-letter or one-letter abbreviations, e.g.:
Ser-Tyr-Ser-Met-Glu-His-Phe-Arg-Trp-Gly-Lys
S Y S M E H F R W G K
• Essentially “1-dimensional” representation of the protein
• Can be stored on computer as a text string
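Since the sequence is just a text string, converting between the two notations is a simple lookup. The sketch below uses the standard three-letter and one-letter codes for the 20 common amino acids and the example peptide from this slide; the function name is hypothetical.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(sequence):
    """Convert a hyphen-separated three-letter sequence to a one-letter string."""
    return "".join(THREE_TO_ONE[res.capitalize()] for res in sequence.split("-"))


print(to_one_letter("Ser-Tyr-Ser-Met-Glu-His-Phe-Arg-Trp-Gly-Lys"))  # SYSMEHFRWGK
```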
Secondary structure
Certain groups of amino acids tend to form themselves into regular 3D shapes (α-helix, β-sheet, turn):
• α-helix – C=O and NH groups hydrogen bond to the group four residues along the chain, forming a coil shape
• β-sheet – flat structure due to hydrogen-bonding between two or more chains
Secondary structure (2)
• Secondary structural features can be fairly well predicted from primary structure, or inferred from atom coordinates
• Primary sequence can be ‘tagged’ with secondary structure information
• E.g.
G A F T G E I S P G M I K D C G A T W V
β β β β β β β α α α α α α α
Tertiary structure
• How the protein chain is folded in three dimensions
• Information mostly derived from atomic coordinate information
• Extremely difficult to predict from scratch using computational methods
• May be predicted by finding proteins with similar primary and secondary structures that have known coordinates (homology modeling, threading).
Tertiary structure example (HIV)
Protein information representation
• Atomic – coordinate/connection table
• Primary – text string
• Secondary – text string
• Tertiary – set of points and vectors
File formats
• Tripos Sybyl MOL2
– For storage of atomic coordinate information
– Same as 3D small molecule file format
• PDB format
– Special format for proteins
– Complex and somewhat ill-defined
– Allows representation of multiple types of information (primary, secondary, tertiary, atomic)
PDB file format
• Official guide: http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html
• Different sections to specify different kinds of information
– Title
– Primary structure
– Heterogen
– Secondary Structure
– Connectivity Annotation
– Miscellaneous
– Crystallographic / Co-ordinate
– Connectivity
– Book-keeping
• Each section made up of keywords, one per line
PDB Title section
• HEADER – Type, date, ID code
• COMPND – Description of compound
• TITLE – Title of experiment used to produce structure
• AUTHOR
• JRNL – Reference publication
• REMARK – Comments
Primary structure section
• SEQRES – specifies amino acid sequence
• MODRES – specifies modifications to amino acids
Secondary structure section
• HELIX – specifies start & end of helical section
• SHEET – specifies start & end of each strand in a sheet
• TURN – specifies location of turn
Coordinates section
• ATOM – specifies coordinates for an atom in a residue
• HETATM – specifies coordinates for other atoms (e.g. in drug)
• TER – specifies end of list of coordinates for a chain
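As a rough sketch of how the coordinate records might be read, the function below pulls the serial number, atom name and x, y, z values out of ATOM and HETATM lines. The sample lines are copied from the HIV protease example in this lecture, with fields separated by single spaces as they appear on the slides. Splitting on whitespace and reading the coordinates from the end of the line are simplifying assumptions: real PDB files use fixed columns, and adjacent fields can run together, so production code should slice by column position instead. The function name is hypothetical.

```python
def parse_coordinates(lines):
    """Collect (record, serial, atom_name, (x, y, z)) from ATOM/HETATM lines."""
    atoms = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] not in ("ATOM", "HETATM"):
            continue  # skip TER, CONECT and every other record type
        # Assumption: each line ends with x, y, z, occupancy, temperature factor.
        x, y, z = (float(v) for v in fields[-5:-2])
        atoms.append((fields[0], int(fields[1]), fields[2], (x, y, z)))
    return atoms


sample = [
    "ATOM 1 N PRO A 1 8.133 -13.258 12.706 1.00 0.00",
    "ATOM 2 CA PRO A 1 9.325 -12.418 13.001 1.00 0.00",
    "TER 1848 PHE B 99",
    "HETATM 1849 C1 A 1 -3.676 0.038 -4.301 1.00 0.00",
]
atoms = parse_coordinates(sample)
for atom in atoms:
    print(atom)
```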
Connectivity section
• CONECT – specifies connectivity between atoms (usually used for non amino-acids)
PDB file example HIV Protease
HEADER    PROTEIN 28-OCT-96
COMPND    HIV-1 PROTEASE COMPLEXED WITH THE INHIBITOR A77003 (R,S)
AUTHOR    GENERATED BY SYBYL, A PRODUCT OF TRIPOS ASSOCIATES, INC.
SEQRES   1 A   99  PRO GLN ILE THR LEU TRP GLN ARG PRO LEU VAL THR ILE
SEQRES   2 A   99  LYS ILE GLY GLY GLN LEU LYS GLU ALA LEU LEU ASP THR
SEQRES   3 A   99  GLY ALA ASP ASP THR VAL LEU GLU GLU MET SER LEU PRO
SEQRES   4 A   99  GLY ARG TRP LYS PRO LYS MET ILE GLY GLY ILE GLY GLY
SEQRES   5 A   99  PHE ILE LYS VAL ARG GLN TYR ASP GLN ILE LEU ILE GLU
SEQRES   6 A   99  ILE CYS GLY HIS LYS ALA ILE GLY THR VAL LEU VAL GLY
SEQRES   7 A   99  PRO THR PRO VAL ASN ILE ILE GLY ARG ASN LEU LEU THR
SEQRES   8 A   99  GLN ILE GLY CYS THR LEU ASN PHE
SEQRES   1 B   99  PRO GLN ILE THR LEU TRP GLN ARG PRO LEU VAL THR ILE
SEQRES   2 B   99  LYS ILE GLY GLY GLN LEU LYS GLU ALA LEU LEU ASP THR
SEQRES   3 B   99  GLY ALA ASP ASP THR VAL LEU GLU GLU MET SER LEU PRO
SEQRES   4 B   99  GLY ARG TRP LYS PRO LYS MET ILE GLY GLY ILE GLY GLY
SEQRES   5 B   99  PHE ILE LYS VAL ARG GLN TYR ASP GLN ILE LEU ILE GLU
SEQRES   6 B   99  ILE CYS GLY HIS LYS ALA ILE GLY THR VAL LEU VAL GLY
SEQRES   7 B   99  PRO THR PRO VAL ASN ILE ILE GLY ARG ASN LEU LEU THR
SEQRES   8 B   99  GLN ILE GLY CYS THR LEU ASN PHE
ATOM      1  N   PRO A   1       8.133 -13.258  12.706  1.00  0.00
ATOM      2  CA  PRO A   1       9.325 -12.418  13.001  1.00  0.00
ATOM      3  C   PRO A   1       8.939 -10.978  13.283  1.00  0.00
ATOM      4  O   PRO A   1       7.813 -10.607  13.030  1.00  0.00
ATOM      5  CB  PRO A   1      10.211 -12.484  11.768  1.00  0.00
ATOM      6  CG  PRO A   1       9.219 -12.779  10.674  1.00  0.00
ATOM      7  CD  PRO A   1       8.271 -13.768  11.335  1.00  0.00
ATOM      8  H1  PRO A   1       7.974 -14.024  13.392  1.00  0.00
PDB file example HIV Protease (2)
ATOM   1844  CE2 PHE B  99       5.527 -13.746   8.735  1.00  0.00
ATOM   1845  CZ  PHE B  99       6.308 -12.665   8.239  1.00  0.00
ATOM   1846  OXT PHE B  99       5.672 -12.903  13.426  1.00  0.00
ATOM   1847  H   PHE B  99       5.668 -10.590  12.626  1.00  0.00
TER    1848      PHE B  99
HETATM 1849  C1    A   1      -3.676   0.038  -4.301  1.00  0.00
HETATM 1850  N21   A   1      -2.730  -0.070  -5.222  1.00  0.00
HETATM 1851  H28   A   1      -2.958   0.299  -6.126  1.00  0.00
HETATM 1852  C22   A   1      -1.389  -0.623  -4.962  1.00  0.00
HETATM 1853  H29   A   1      -1.369  -1.096  -3.981  1.00  0.00
HETATM 1854  C25   A   1      -1.031  -1.707  -6.000  1.00  0.00
HETATM 1855  H30   A   1      -1.021  -1.235  -6.985  1.00  0.00
HETATM 1856  C27   A   1      -2.085  -2.821  -6.044  1.00  0.00
HETATM 1857  H36   A   1      -1.845  -3.547  -6.818  1.00  0.00
HETATM 1858  H35   A   1      -3.079  -2.429  -6.267  1.00  0.00
HETATM 1859  H34   A   1      -2.140  -3.350  -5.091  1.00  0.00
HETATM 1860  C26   A   1       0.365  -2.310  -5.758  1.00  0.00
HETATM 1861  H33   A   1       0.450  -2.709  -4.748  1.00  0.00
HETATM 1862  H32   A   1       1.159  -1.573  -5.891  1.00  0.00
HETATM 1863  H31   A   1       0.564  -3.134  -6.440  1.00  0.00
HETATM 1864  C23   A   1      -0.360   0.506  -4.927  1.00  0.00
HETATM 1865  N37   A   1      -0.195   1.091  -3.733  1.00  0.00
HETATM 1866  H59   A   1      -0.715   0.711  -2.967  1.00  0.00
HETATM 1867  C38   A   1       0.602   2.329  -3.511  1.00  0.00
HETATM 1868  H60   A   1       1.052   2.671  -4.449  1.00  0.00
HETATM 1869  C46   A   1       1.713   2.066  -2.491  1.00  0.00
HETATM 1870  H68   A   1       1.221   1.950  -1.522  1.00  0.00
PDB file example HIV Protease (3)
CONECT 1943 1934 1941 1944
CONECT 1944 1943
CONECT 1945 1864
CONECT 1946 1849 1947 1960
CONECT 1947 1946 1948 1949 1950
CONECT 1948 1947
CONECT 1949 1947
CONECT 1950 1947 1951 1958
CONECT 1951 1950 1952
CONECT 1952 1951 1953 1954
CONECT 1953 1952
CONECT 1954 1952 1955 1956
CONECT 1955 1954
CONECT 1956 1954 1957 1958
CONECT 1957 1956
CONECT 1958 1950 1956 1959
CONECT 1959 1958
CONECT 1960 1946 1961 1962 1963
CONECT 1961 1960
CONECT 1962 1960
CONECT 1963 1960
CONECT 1964 1849
MASTER 0 0 0 0 0 0 0 0 1965 2 126 16
END
Protein Databases
• The PDB (www.pdb.org) is the main worldwide repository for the processing and distribution of 3-D structure data of large molecules of proteins and nucleic acids. It currently holds around 24,000 structures
• Other databases (e.g. SwissProt http://au.expasy.org/sprot/) contain only sequence data, but cover many more proteins
• See also EBI: http://www.ebi.ac.uk/Databases/
PDB Growth
Follow-up
• Read chapter 2 of Leach & Gillet
• Read chapters 3 & 4 of Getting Started in Chemoinformatics