Post on 20-Jan-2016
Zhi-Jie Liu
Institute of Biophysics, Chinese Academy of Sciences
Introduction to Macromolecular Structures
Outline
1. Varieties of macromolecules
2. Macromolecular structures
3. Structure determination by X-ray crystallography
4. Structure validation and deposition.
Varieties of macromolecules
1. Proteins
2. DNA
3. RNA
4. Complexes: protein-protein, protein-DNA/RNA
Lipids, peptides, sugars, etc. are categorized as non-macromolecules.
Our discussion focuses mainly on proteins.
DNA/RNA
Deoxyribonucleic acid (DNA) consists of two long polymers of simple units called nucleotides, which carry one of four bases: cytosine, guanine, adenine and thymine. The sequence of these four bases along the backbone encodes information, the genetic code.
RNA has the same nucleotides except that thymine is replaced by uracil.
A series of codons in part of an mRNA molecule. Each codon consists of three nucleotides, usually representing a single amino acid.
Genetic code
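The two rules above (RNA swaps thymine for uracil; mRNA is read three nucleotides at a time) can be illustrated with a short Python sketch. It assumes the standard genetic code and a DNA coding-strand input; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with bases in the order U, C, A, G
# (first position varies slowest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: thymine (T) is replaced by uracil (U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA one codon (three nucleotides) at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTGGTAA")
print(mrna)             # AUGUUUGGCUGGUAA
print(translate(mrna))  # MFGW (Met-Phe-Gly-Trp, then the stop codon UAA)
```

The 64 codons map onto 20 amino acids plus three stop signals, which is why a codon only usually represents an amino acid.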
Macromolecular structures: Proteins
A protein is composed of one or more polypeptides, each a single linear polymer chain of amino acids. The sequence of amino acids in a protein is defined by the sequence of a gene, which is encoded in the genetic code.
Proteins are the molecular building blocks of life. Protein molecules are three-dimensional, and so is life.
General Amino Acid Structure (at pH 7.0)
[Figure: a central Cα atom bonded to H, the side chain R, a carboxylate group COO⁻ and an amino group ⁺H3N]
Chirality of amino acids
The "CORN" rule for determining the D/L isomeric form of an amino acid:
The groups COOH, R, NH2 and H (where R is the side chain) are arranged around the chiral Cα atom. With the hydrogen atom pointing away from the viewer, if COOH → R → NH2 run clockwise around the carbon atom, it is the D-form; if counter-clockwise, it is the L-form.
[Figure: the L- and D-forms]
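The CORN rule can also be applied numerically to atomic coordinates, as a validation program might. The sketch below assumes a roughly tetrahedral Cα and uses the sign of a triple product of the bonds to the carbonyl C, Cβ (R) and N; the function name and the idealized coordinates are hypothetical. Glycine has no Cβ (R = H) and is achiral, so it is not handled.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def amino_acid_form(ca, n, c, cb):
    """Return 'L' or 'D' from the coordinates of Cα, N, carbonyl C and Cβ.

    CORN rule: with H pointing away from the viewer, CO -> R -> N counter-clockwise means L.
    For a tetrahedral centre this equals a positive triple product
    (C - Cα) . ((Cβ - Cα) x (N - Cα)), so the H atom itself is not needed.
    """
    volume = dot(sub(c, ca), cross(sub(cb, ca), sub(n, ca)))
    if abs(volume) < 1e-6:
        raise ValueError("substituents are coplanar; chirality is undefined")
    return "L" if volume > 0 else "D"


# Idealized centre: Cα at the origin, H along -z (away from a viewer on the +z axis),
# CO, R and N placed counter-clockwise as seen by that viewer.
ca = (0.0, 0.0, 0.0)
c, cb, n = (0.0, 1.0, 0.33), (-0.866, -0.5, 0.33), (0.866, -0.5, 0.33)
print(amino_acid_form(ca, n, c, cb))  # L
print(amino_acid_form(ca, cb, c, n))  # D (swapping N and R gives the mirror image)
```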
Varieties of amino acids: hydrophobic
Tending to avoid water: nonpolar and uncharged, relatively insoluble in water. Their side chains tend to associate with each other to minimize their contact with water or with polar side chains.
Protein Structure & Function, ©2004 New Science Press Ltd
Varieties of amino acids: hydrophilic
Interact with water: polar or charged, very soluble in water. Their side chains tend to associate with other hydrophilic side chains or with water molecules, usually by means of hydrogen bonds.
Varieties of amino acids: amphipathic
Having both polar and nonpolar character, and therefore a tendency to form interfaces between hydrophobic and hydrophilic molecules.
Peptide Chain
Peptide Bond Lengths
Protein Conformation Framework
• Bond rotation determines protein folding, 3D structure
• The partial double-bond character of the peptide bond prevents rotation about it
Bond Rotation Determines Protein Folding
Dihedral angle
Protein Conformation Framework
• Torsion angle (dihedral angle): measures the relative orientation of four linked atoms A, B, C, D in a molecule
– The dihedral angle θ(ABCD) is defined as the angle between the normal to the plane of atoms A-B-C and the normal to the plane of atoms B-C-D
– Three repeating torsion angles along the protein backbone: ω, φ, ψ
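The definition above translates directly into code: build the two plane normals from the bond vectors and take the signed angle between them. This is a minimal sketch assuming Cartesian coordinates as (x, y, z) tuples and the IUPAC sign convention; the helper names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(a, b, c, d):
    """Signed torsion angle A-B-C-D in degrees, in the range [-180, 180].

    n1 and n2 are the normals to the planes A-B-C and B-C-D. The angle between them gives
    the magnitude; the sign is positive when, looking along B->C, bond B-A must turn
    clockwise to eclipse bond C-D.
    """
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(round(dihedral(a, b, c, (1.0, 0.0, 1.0)), 1))   # 0.0   (cis, eclipsed)
print(round(dihedral(a, b, c, (0.0, 1.0, 1.0)), 1))   # 90.0
print(round(dihedral(a, b, c, (-1.0, 0.0, 1.0)), 1))  # 180.0 (trans)
```

Using atan2 rather than acos of the normalized dot product keeps the sign and stays numerically stable near 0° and 180°.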
Backbone Torsion Angles
• Dihedral angle ω: rotation about the peptide bond, namely Cα1-{C-N}-Cα2
Backbone Torsion Angles
• Dihedral angle φ: rotation about the bond between N and Cα (atoms C(i-1)-N(i)-Cα(i)-C(i))
Backbone Torsion Angles
• Dihedral angle ψ: rotation about the bond between Cα and the carbonyl carbon (atoms N(i)-Cα(i)-C(i)-N(i+1))
Backbone Torsion Angles
• The ω angle is restricted to near 0° (cis) or 180° (trans), keeping the peptide group planar, because the carbonyl π electrons and the nitrogen lone pair are delocalized over the C-N bond
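To see how φ, ψ and ω fix the backbone, the sketch below builds a chain from internal coordinates (the NeRF "place the next atom from the previous three" construction) and then measures the three torsions back. The bond lengths and angles are approximate ideal values assumed for illustration, not taken from a particular restraint library; all function names are hypothetical.

```python
import math

# Approximate ideal backbone geometry (Å and degrees), assumed for illustration.
BOND = {"N-CA": 1.458, "CA-C": 1.525, "C-N": 1.329}
ANGLE = {"N-CA-C": 111.2, "CA-C-N": 116.2, "C-N-CA": 121.7}


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def unit(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(a, b, c, d):
    """Signed torsion angle A-B-C-D in degrees (IUPAC sign convention)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2), dot(n1, n2)))


def place_atom(a, b, c, bond, angle, torsion):
    """Position D so that |C-D| = bond, angle B-C-D = angle and torsion A-B-C-D = torsion."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))  # normal to plane A-B-C
    m = cross(n, bc)
    th, ph = math.radians(angle), math.radians(torsion)
    local = (-bond * math.cos(th), bond * math.sin(th) * math.cos(ph), bond * math.sin(th) * math.sin(ph))
    return add(c, add(scale(bc, local[0]), add(scale(m, local[1]), scale(n, local[2]))))


def build_backbone(n_res, phi, psi, omega):
    """Atoms [N, CA, C, N, CA, C, ...] for n_res residues with the same phi/psi/omega throughout."""
    t = math.radians(180.0 - ANGLE["N-CA-C"])
    atoms = [(0.0, 0.0, 0.0), (BOND["N-CA"], 0.0, 0.0)]
    atoms.append(add(atoms[1], (BOND["CA-C"] * math.cos(t), BOND["CA-C"] * math.sin(t), 0.0)))
    for _ in range(n_res - 1):
        atoms.append(place_atom(*atoms[-3:], BOND["C-N"], ANGLE["CA-C-N"], psi))    # N(i+1): sets psi(i)
        atoms.append(place_atom(*atoms[-3:], BOND["N-CA"], ANGLE["C-N-CA"], omega))  # CA(i+1): sets omega(i)
        atoms.append(place_atom(*atoms[-3:], BOND["CA-C"], ANGLE["N-CA-C"], phi))    # C(i+1): sets phi(i+1)
    return atoms


def backbone_torsions(atoms):
    """(phi, psi, omega) per residue; None where the angle is undefined at a chain end."""
    n_res = len(atoms) // 3
    result = []
    for i in range(n_res):
        n_i, ca_i, c_i = atoms[3 * i:3 * i + 3]
        phi = dihedral(atoms[3 * i - 1], n_i, ca_i, c_i) if i > 0 else None
        psi = omega = None
        if i < n_res - 1:
            n_next, ca_next = atoms[3 * i + 3], atoms[3 * i + 4]
            psi = dihedral(n_i, ca_i, c_i, n_next)
            omega = dihedral(ca_i, c_i, n_next, ca_next)
        result.append((phi, psi, omega))
    return result


for phi, psi, omega in backbone_torsions(build_backbone(4, -57.0, -47.0, 180.0)):
    print(phi, psi, omega)
```

The first residue has no φ and the last has no ψ or ω, because each needs an atom from the neighbouring residue.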
Backbone Torsion Angles
• φ and ψ are flexible, so rotation about the N-Cα and Cα-C bonds is possible
• However, the φ and ψ values available to a given amino acid residue are limited by steric hindrance
Steric Hindrance
• Interference with rotation caused by the spatial arrangement of atoms within the molecule
• Atoms cannot overlap
• Atom size defined by van der Waals radii
• Electron clouds repel each other
G.N. Ramachandran
• Used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations
• For each conformation, the structure was examined for close contacts between atoms
• Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii
• Therefore, φ and ψ angles which cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone
• Only 10% of the {φ, ψ} combinations are generally observed for proteins
• First noticed by G.N. Ramachandran
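Ramachandran's hard-sphere test reduces to a pairwise distance check. The sketch below assumes Bondi van der Waals radii (not the exact contact limits Ramachandran used) and leaves it to the caller to list covalently bonded pairs that should be skipped; the function name is hypothetical.

```python
import math
from itertools import combinations

# Bondi van der Waals radii in Å (an assumption; Ramachandran used his own contact limits).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def hard_sphere_clashes(atoms, skip_pairs=frozenset(), tolerance=0.0):
    """Return index pairs (i, j) whose spheres overlap.

    atoms: list of (element, (x, y, z)) in Å.
    skip_pairs: pairs (i, j) with i < j to ignore, e.g. covalently bonded atoms.
    tolerance: overlap in Å allowed before a contact counts as a clash.
    """
    clashes = []
    for (i, (elem_i, xyz_i)), (j, (elem_j, xyz_j)) in combinations(enumerate(atoms), 2):
        if (i, j) in skip_pairs:
            continue
        if math.dist(xyz_i, xyz_j) < VDW_RADII[elem_i] + VDW_RADII[elem_j] - tolerance:
            clashes.append((i, j))
    return clashes


atoms = [("O", (0.0, 0.0, 0.0)), ("N", (2.5, 0.0, 0.0)), ("C", (8.0, 0.0, 0.0))]
print(hard_sphere_clashes(atoms))                       # [(0, 1)]: 2.5 Å < 1.52 + 1.55 Å
print(hard_sphere_clashes(atoms, skip_pairs={(0, 1)}))  # []
```

In a Ramachandran-style scan, a (φ, ψ) pair would be marked disallowed whenever any non-bonded pair in the model built for that conformation clashes.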
Ramachandran Plot
• Plot of φ vs. ψ
• The computed angles which are sterically allowed fall on certain regions of plot
Computed Ramachandran Plot
White = sterically disallowed conformations (atoms come closer than sum of van der Waals radii)
Blue = sterically allowed conformations
Experimental Ramachandran Plot
φ, ψ distribution in 42 high-resolution protein structures (x-ray crystallography)
Ramachandran Plot and Secondary Structure
• Repeating values of φ and ψ along the chain result in regular structure
• For example, repeating values of φ ~ -57° and ψ ~ -47° give a right-handed helical fold (the alpha-helix)
The structure of cytochrome c shows many segments of helix, and its Ramachandran plot shows a tight grouping of φ, ψ angles near (-50°, -50°).
[Figures: alpha-helix; cytochrome c; Ramachandran plot]
Similarly, repetitive values in the region of φ = -110° to -140° and ψ = +110° to +135° give beta sheets. The structure of plastocyanin is composed mostly of beta sheets; its Ramachandran plot shows values in the (-110°, +130°) region.
[Figures: beta-sheet; plastocyanin; Ramachandran plot]
φ, ψ and Secondary Structure
Name             φ (°)   ψ (°)   Structure
---------------  ------  ------  ---------------------------------------------------------
alpha-L              57      47   left-handed alpha helix
3-10 helix          -49     -26   right-handed
π helix             -57     -80   right-handed
Type II helices     -79     150   left-handed helices formed by polyglycine and polyproline
Collagen            -51     153   right-handed coil formed of three left-handed helices
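A crude way to read a (φ, ψ) pair against the values quoted in this section is a nearest-reference lookup on the periodic Ramachandran plane. This is only an illustration, not a real secondary-structure assignment (methods such as DSSP use hydrogen-bond patterns); the β-sheet point is assumed to be the centre of the quoted region, and the names are hypothetical.

```python
import math

# Reference (phi, psi) points in degrees: alpha helix and beta sheet from the text,
# the remaining rows from the table. The beta point is the centre of the quoted region (assumption).
REFERENCE_CONFORMATIONS = {
    "right-handed alpha helix": (-57.0, -47.0),
    "beta sheet": (-125.0, 122.5),
    "left-handed alpha helix": (57.0, 47.0),
    "3-10 helix": (-49.0, -26.0),
    "pi helix": (-57.0, -80.0),
    "type II helix (polyglycine/polyproline)": (-79.0, 150.0),
    "collagen": (-51.0, 153.0),
}


def angle_difference(a, b):
    """Signed difference a - b wrapped into [-180, 180), since -180° and +180° are the same angle."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_conformation(phi, psi):
    """Name of the reference conformation closest to (phi, psi), treating both axes as periodic."""
    return min(
        REFERENCE_CONFORMATIONS,
        key=lambda name: math.hypot(angle_difference(phi, REFERENCE_CONFORMATIONS[name][0]),
                                    angle_difference(psi, REFERENCE_CONFORMATIONS[name][1])),
    )


print(nearest_conformation(-60.0, -45.0))   # right-handed alpha helix
print(nearest_conformation(-120.0, 130.0))  # beta sheet
print(nearest_conformation(-75.0, -175.0))  # type II helix (polyglycine/polyproline), via the ±180° wrap
```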
Four levels of protein structure
The Universe of Protein Structures
How many proteins are there in the universe?
The smallest archaeal genome encodes more than 600 ORFs
Pyrococcus furiosus encodes 2200 ORFs
Homo sapiens encodes around 30,000 ORFs
The facts: the number of different protein folds in nature is large but limited. The folds are used repeatedly in different combinations to create the diversity of proteins found in living organisms.
The Universe of Protein Structures
Protein structures are modular, and proteins can be grouped into families on the basis of the domains they contain.
There are around 1000 different protein folds.
The Universe of Protein Structures
Protein motifs may be defined by their primary sequence or by the arrangement of secondary structure elements.
Zinc finger motif
EF-hand motif
Protein Function in the Cell
1. Enzymes
• Catalyze biological reactions
2. Structural role
• Cell wall
• Cell membrane
• Cytoplasm
Structure determination by X-ray crystallography
H   K   L   I     SigmaI   Phi
2   5   9   101   5
3   7   8   49    4
…
[Flowchart boxes: X-Ray Diffraction Data, Phasing, Fourier Transforms, Model building, Refinement, Inverse Fourier Transforms]
Phase problem: current X-ray techniques record only the intensities of the diffracted waves, so the phase angles cannot be recorded.
Data processing
Crystal mounting and Cryo-Crystallography
X-ray sources: rotating-anode X-rays
X-ray sources: synchrotron X-rays, 10⁶ times stronger.
Shanghai Synchrotron Radiation Facility
Data Collection:
Advantages:
1. Greatly reduced radiation damage, thus increased crystal lifetime
2. Lower X-ray background and increased resolution
3. Fewer crystals required
4. Crystals can be transported and shipped in liquid nitrogen (LN2)
5. Crystals can be mounted and frozen as soon as they are ready
Mounting:
Robotic crystal diffraction quality screen
Crystal mounting robot
Bragg’s law
In 1913, William Henry Bragg (1862–1942) and his son, William Lawrence Bragg (1890–1971), derived a formula to explain the diffraction of X-rays by crystals.
They shared the 1915 Nobel Prize in Physics for their seminal roles in X-ray crystallography.
Data collection strategy and data processing
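[Photos: Lawrence Bragg and Henry Bragg]
[Figure: an incident wave reflected by lattice planes 1, 2 and 3, separated by the interplanar spacing d]
An incident wave (wavelength λ) strikes planes "1" and "2" at the glancing angle θ. Lines AB and AC are drawn perpendicular to rays a and a′, respectively.
The path difference for rays from adjacent planes:
$$\Delta = 2d\sin\theta$$
The condition for constructive interference:
$$2d\sin\theta = k\lambda, \qquad k = 1, 2, 3, \ldots$$
This relation is called Bragg's law.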
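Bragg's law links the diffraction angle to the resolution (d-spacing) of a reflection. The sketch below solves it both ways; the Cu Kα wavelength of 1.5418 Å is an assumed example for a copper rotating-anode source, and the function names are hypothetical.

```python
import math


def d_spacing(theta_deg, wavelength, order=1):
    """Interplanar spacing d from Bragg's law: 2 d sin(theta) = k * lambda."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def bragg_angle(d, wavelength, order=1):
    """Bragg angle theta in degrees for spacing d, or None when k * lambda > 2d (no reflection)."""
    s = order * wavelength / (2.0 * d)
    return math.degrees(math.asin(s)) if s <= 1.0 else None


CU_K_ALPHA = 1.5418  # Å, assumed wavelength of a copper rotating-anode source
theta = bragg_angle(2.4, CU_K_ALPHA)
print(f"2.4 Å reflections appear at 2θ = {2 * theta:.1f}°")
print(d_spacing(theta, CU_K_ALPHA))
```

Because sin θ cannot exceed 1, no reflection finer than d = λ/2 can be recorded at a given wavelength, which is one reason shorter wavelengths help reach higher resolution.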
[Image: diffraction pattern with a resolution ring labelled 2.5 Å]
Diffraction image from a RAXIS-IV image plate
Frame oscillation = 1°
Exposure time = 20 min
Maximum resolution = 2.4 Å
Data processing:
• Indexing (finding the unit cell, orientation and space group)
• Integrating (determining the intensity of each spot)
• Merging (scaling data, averaging data and determining data quality)
• Calculating structure-factor amplitudes from merged intensities
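The last two steps can be illustrated with a toy merge. The sketch assumes the observations are already scaled and mapped to a common (h, k, l) index, so only averaging, the data-quality statistic R_merge = Σ|I_i − ⟨I⟩| / ΣI_i and the conversion |F| = √I are shown. Real programs handle negative intensities with a French-Wilson treatment rather than the clipping used here; the function names are hypothetical.

```python
import math
from statistics import fmean


def merge(observations):
    """Average equivalent intensities and report R_merge.

    observations: dict mapping (h, k, l) -> list of intensities already on a common scale.
    Returns (merged, r_merge) with R_merge = sum |I_i - <I>| / sum I_i.
    """
    merged, abs_dev, total = {}, 0.0, 0.0
    for hkl, intensities in observations.items():
        mean_i = fmean(intensities)
        merged[hkl] = mean_i
        abs_dev += sum(abs(i - mean_i) for i in intensities)
        total += sum(intensities)
    return merged, abs_dev / total


def amplitudes(merged):
    """|F| = sqrt(I); negative intensities are clipped to zero (crude stand-in for French-Wilson)."""
    return {hkl: math.sqrt(max(i, 0.0)) for hkl, i in merged.items()}


obs = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 1, 0): [50.0, 50.0]}
merged, r_merge = merge(obs)
print(merged, round(r_merge, 3))  # {(1, 0, 0): 100.0, (0, 1, 0): 50.0} 0.05
print(amplitudes(merged))
```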
[Flowchart boxes: Diffraction Data, Sequence, Initial Phases, Phase Combination, Model Building, Refinement, Quality Control, Validation]
The steps to solve the macromolecular crystal structure
Phasing Methods in Macromolecular Crystallography
• Molecular Replacement Method (MR)
• Isomorphous Replacement Method (MIR, SIR)
• Anomalous Dispersion Method (MAD, SAD, SIRAS)
• Direct Methods
• Other Methods
[Figure: The phasing problem. Vector diagram relating |FP(h)|, |FPH(h)| and FH(h), showing the phase ambiguity in SIR]
[Figure: MIR. How to break the phase ambiguity?]
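The SIR ambiguity and its MIR resolution can be worked through numerically. Requiring |FP + FH| = |FPH| gives cos(α − α_H) = (|FPH|² − |FP|² − |FH|²) / (2|FP||FH|), which has two solutions for the protein phase α; a second derivative with a differently placed FH removes the wrong one. The sketch assumes error-free amplitudes and a known heavy-atom contribution FH; real programs weight each phase probabilistically. The function names and test values are hypothetical.

```python
import cmath
import math


def sir_phases(fp, fph, fh):
    """Two candidate protein phases (degrees) from one derivative.

    Solves |FP + FH| = |FPH| for the phase of FP, given amplitudes fp = |FP|, fph = |FPH|
    and a complex heavy-atom structure factor fh (assumed known from the heavy-atom model).
    """
    cos_delta = (fph ** 2 - fp ** 2 - abs(fh) ** 2) / (2.0 * fp * abs(fh))
    delta = math.degrees(math.acos(max(-1.0, min(1.0, cos_delta))))  # clamp rounding noise
    alpha_h = math.degrees(cmath.phase(fh))
    return [(alpha_h + delta) % 360.0, (alpha_h - delta) % 360.0]


def angular_distance(a, b):
    return abs((a - b + 180.0) % 360.0 - 180.0)


def mir_phase(fp, derivatives, tol=1.0):
    """Keep the candidate phases from the first derivative that every other derivative also allows."""
    candidates = sir_phases(fp, *derivatives[0])
    for fph, fh in derivatives[1:]:
        others = sir_phases(fp, fph, fh)
        candidates = [a for a in candidates if any(angular_distance(a, b) < tol for b in others)]
    return candidates


# Synthetic example: true protein phase 40°, two heavy-atom derivatives.
fp_true = cmath.rect(100.0, math.radians(40.0))
fh1, fh2 = cmath.rect(30.0, math.radians(100.0)), cmath.rect(25.0, math.radians(250.0))
derivs = [(abs(fp_true + fh1), fh1), (abs(fp_true + fh2), fh2)]
print(sir_phases(100.0, *derivs[0]))  # two solutions: the SIR ambiguity
print(mir_phase(100.0, derivs))       # one solution, close to 40°
```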
Fourier Transformation and Electron Density Maps
[Diagram: X-ray diffraction experiment, phasing method and Fourier transformation]
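$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} |F(h,k,l)| \exp\left[-2\pi i(hx+ky+lz) + i\alpha(h,k,l)\right]$$
where V is the unit-cell volume, |F(h,k,l)| is the structure-factor amplitude derived from the measured intensity and α(h,k,l) is the phase supplied by the phasing method.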
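A one-dimensional toy version of the electron-density Fourier synthesis shows why the phases matter. The sketch samples a made-up density on 8 grid points, computes F(h) with the exp(+2πihx) convention, and then resynthesizes the map once with the true phases and once with all phases set to zero (amplitudes only, as if the phase problem were ignored). The density values and function names are hypothetical.

```python
import cmath
import math


def structure_factors(density):
    """F(h) = sum_x rho(x) exp(+2*pi*i*h*x/N) for a 1-D density sampled on N grid points."""
    n = len(density)
    return [sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
            for h in range(n)]


def fourier_synthesis(factors):
    """rho(x) = (1/N) sum_h |F(h)| exp(-2*pi*i*h*x/N + i*alpha(h)), the 1-D analogue of the 3-D sum."""
    n = len(factors)
    return [sum(f * cmath.exp(-2j * math.pi * h * x / n) for h, f in enumerate(factors)).real / n
            for x in range(n)]


density = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]
F = structure_factors(density)
amplitudes = [abs(f) for f in F]      # what the experiment provides (I is proportional to |F|^2)
phases = [cmath.phase(f) for f in F]  # what the experiment loses

with_phases = fourier_synthesis([cmath.rect(a, p) for a, p in zip(amplitudes, phases)])
without_phases = fourier_synthesis([complex(a, 0.0) for a in amplitudes])  # every alpha(h) = 0
print([round(v, 3) for v in with_phases])     # recovers the original density
print([round(v, 3) for v in without_phases])  # a different, symmetric map
```

With correct phases the original density returns exactly; with zero phases the map is forced to be symmetric about x = 0 and no longer matches, even though the amplitudes are identical.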
Fig. 1 Effect of changing the contour level on the electron density map. (A) A section of aldehyde dehydrogenase [2] density at 3.0 Å resolution contoured at a minimum level of 0.33σ: the solvent is very noisy and the difference between protein and solvent is not obvious. (B) The minimum contour level is increased to 0.5σ: the solvent is less noisy and protein and solvent are distinguishable. (C) The minimum contour level is increased to 1.0σ: the solvent is very clean and the protein boundary is easy to identify.
[Figure panels: (A) 0.33σ, (B) 0.5σ, (C) 1.0σ]
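Contouring at "n sigma" means drawing only grid points whose density exceeds n times the map's r.m.s. deviation above its mean (crystallographic maps usually have a mean near zero because F(000) is omitted). This sketch counts how many points of a made-up 1-D map survive each of the three levels in Fig. 1; the numbers and function name are hypothetical.

```python
from statistics import fmean, pstdev


def contour_mask(density, sigma_level):
    """True for grid points above mean + sigma_level * (r.m.s. deviation of the map)."""
    mean, sigma = fmean(density), pstdev(density)
    threshold = mean + sigma_level * sigma
    return [rho > threshold for rho in density]


# Toy 1-D "map": a few strong protein peaks plus weak solvent noise (made-up numbers).
density = [0.1, -0.2, 0.3, 2.5, 3.1, 2.8, 0.2, -0.1, 0.4, 1.9, 2.2, -0.3]
for level in (0.33, 0.5, 1.0):
    shown = sum(contour_mask(density, level))
    print(f"{level:4.2f} sigma: {shown} of {len(density)} points above the contour")
```

Raising the level can only remove points, which is why the solvent noise in Fig. 1 fades first while the strong protein density remains.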
FIG. 2 Effect of increasing phase error on the electron density map.
A: Density map at 2.0 Å resolution is shown using the final refined phases
B: An average of 22˚ of random error has been added to each phase.
C: An average of 45˚ of random error has been added to each phase.
D: An average of 67˚ of random error has been added to each phase.
(“Practical Protein Crystallography” by D. E. McRee, page 190)
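The trend in Fig. 2 can be reproduced on a toy 1-D map: add random errors to the phases, resynthesize, and measure the correlation with the true map. In this sketch the errors are uniform in [−m, +m], so their mean absolute value is m/2; m = 45°, 90° and 135° roughly match the 22°, 45° and 67° averages of panels B to D. Friedel mates receive opposite errors so the map stays real. The density and function names are hypothetical, and the exact correlations depend on the random seed.

```python
import cmath
import math
import random


def structure_factors(density):
    """F(h) = sum_x rho(x) exp(+2*pi*i*h*x/N) for a 1-D density on N grid points."""
    n = len(density)
    return [sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
            for h in range(n)]


def fourier_synthesis(factors):
    """rho(x) = (1/N) sum_h F(h) exp(-2*pi*i*h*x/N)."""
    n = len(factors)
    return [sum(f * cmath.exp(-2j * math.pi * h * x / n) for h, f in enumerate(factors)).real / n
            for x in range(n)]


def add_phase_error(factors, max_error_deg, rng):
    """Rotate each phase by a random angle in [-max, +max] degrees.

    F(h) and its Friedel mate F(N-h) get opposite shifts so the map stays real;
    F(0) (and F(N/2) for even N) are left untouched.
    """
    n = len(factors)
    noisy = list(factors)
    for h in range(1, (n + 1) // 2):
        shift = cmath.exp(1j * math.radians(rng.uniform(-max_error_deg, max_error_deg)))
        noisy[h] = factors[h] * shift
        noisy[n - h] = factors[n - h] / shift
    return noisy


def map_correlation(a, b):
    """Pearson correlation coefficient between two maps on the same grid."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
density = [0.0, 0.5, 3.0, 4.0, 1.0, 0.0, 0.0, 0.2, 2.5, 3.5, 0.5, 0.0, 0.0, 1.0, 0.3, 0.0]
F = structure_factors(density)
for max_error in (0, 45, 90, 135):  # mean absolute phase error is max_error / 2
    noisy_map = fourier_synthesis(add_phase_error(F, max_error, rng))
    print(f"phase error within ±{max_error}°: map correlation {map_correlation(density, noisy_map):.2f}")
```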
3. A good map should show clear secondary structure (α-helices or β-sheets).
Model Building: Steps in making the first trace in an electron density map
(1). Generating the Cα chain trace. The only rule one has to observe is that the distance between Cα atoms of adjacent residues is always approximately 3.8 Å. Look for large pieces of secondary structure, such as helices and sheets, to start the Cα trace.
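The 3.8 Å rule gives a quick sanity check on a Cα trace. This sketch flags consecutive Cα pairs whose spacing deviates from 3.8 Å by more than an assumed tolerance of 0.3 Å. A cis peptide bond legitimately brings Cα atoms closer (about 2.9 Å) and would also be flagged, so a hit means "inspect", not "wrong". The function name and coordinates are hypothetical.

```python
import math


def check_ca_trace(ca_coords, expected=3.8, tolerance=0.3):
    """Return (i, distance) for consecutive Cα atoms i, i+1 whose spacing is far from ~3.8 Å."""
    suspicious = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - expected) > tolerance:
            suspicious.append((i, d))
    return suspicious


trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 2.4)]
for i, d in check_ca_trace(trace):
    print(f"Cα {i} -> Cα {i + 1}: {d:.2f} Å")  # Cα 2 -> Cα 3: 2.40 Å
```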
(2). Identifying chain direction
The side chains on a helix point toward the N-terminal end. Another way to put it: the α-helix resembles a Christmas tree when viewed with the N-terminal end down and the C-terminal end up.
(3). Generating the main-chain trace
The main chain can be generated automatically from a well-traced Cα chain by many computer programs. In helices, the geometry is so highly constrained that you can accurately predict the main-chain and Cβ atom positions from a refined α-helix taken from another protein.
Example of generated α-helix and β-sheet in electron density map
(4). Fitting the chemical sequence. Finding the first match of the sequence to the map is a milestone in structure determination. Some tips:
• Heavy atoms bind to specific residues, e.g. Hg-Cys, Pt-Met.
• Start the fitting from a well-defined main-chain trace where the density is clear and rich in side-chain information. These regions are often located inside the molecule.
• Sulfur atoms or Se-methionines are the perfect starting point for sequence fitting if the map is from sulfur SAS or Se-MAD phases.
• Tryptophan is so much larger than all the other amino acids that it can often be recognized.
• Hydrophilic side chains are often disordered.
• A correct fitting should be easily extended in both directions.
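The Se-Met tip can be turned into a simple register search: if Se peaks sit on a traced segment at known positions, only sequence offsets that place a methionine at every one of those positions are candidates. This is a minimal sketch assuming 0-based positions along the traced segment and a hypothetical sequence; the function name is also hypothetical.

```python
def candidate_registers(sequence, anchor_positions, anchor_residue="M"):
    """Sequence offsets that put `anchor_residue` at every anchored position of a traced segment.

    anchor_positions: 0-based positions along the traced Cα segment where a marker
    (for example an Se peak from Se-MAD phasing) sits on the chain.
    """
    span = max(anchor_positions)
    return [offset for offset in range(len(sequence) - span)
            if all(sequence[offset + p] == anchor_residue for p in anchor_positions)]


sequence = "AMGSTLMKAM"  # hypothetical protein sequence
print(candidate_registers(sequence, [0, 5]))  # [1]: Met1 and Met6 are exactly 5 residues apart
```

With one anchor every Met is a candidate; each additional Se peak on the same segment usually cuts the list down quickly, after which the fit is extended in both directions as described above.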
Representative electron density for amino acid side chains arranged in order of increasing size.
From an experimental electron density map calculated at 1.5 Angstrom resolution.
Generating the first model
Generating the side chains from the fitted sequence can be automated, but the generated side chains may not point in the correct direction. In most cases, manual adjustments are needed.
Structure Validation and Deposition
Generate symmetry-related molecules. The atoms at the crystal contacts cannot come any closer than the van der Waals packing distance.
The side chains should fit the electron density map over the whole molecule. If the fit suddenly becomes bad in some region, something may be wrong with the fitting there.
Missing density is much better than extra density: extra side-chain density should rarely appear at a residue assigned as Gly, Ala or Pro.
The model should make chemical sense and satisfy all that is known about the macromolecule.
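Generating symmetry mates and measuring crystal contacts can be sketched for a simple case. The example assumes space group P2₁2₁2₁ (whose four general-position operators are listed below) and an orthorhombic cell, so fractional coordinates convert to Å by multiplying by the cell lengths; it also assumes the model lies in or near the unit cell so that lattice shifts of −1, 0, +1 suffice. The function name and the one-atom "model" are hypothetical. The returned distance would then be compared with van der Waals packing distances.

```python
import itertools
import math

# General-position symmetry operators of space group P2(1)2(1)2(1) (No. 19), fractional coordinates.
P212121_OPERATORS = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (-x + 0.5, -y, z + 0.5),
    lambda x, y, z: (-x, y + 0.5, -z + 0.5),
    lambda x, y, z: (x + 0.5, -y + 0.5, -z),
]


def closest_symmetry_contact(frac_atoms, cell_lengths):
    """Shortest distance (Å) between the model and any symmetry- or lattice-related copy of it.

    frac_atoms: fractional coordinates of the model, assumed to lie in or near the unit cell.
    cell_lengths: (a, b, c) of an orthorhombic cell, so Cartesian = fractional * cell length.
    """
    best = math.inf
    for op_index, operator in enumerate(P212121_OPERATORS):
        for shift in itertools.product((-1, 0, 1), repeat=3):
            if op_index == 0 and shift == (0, 0, 0):
                continue  # the untransformed copy is the model itself
            for atom in frac_atoms:
                mate = [m + s for m, s in zip(operator(*atom), shift)]
                for other in frac_atoms:
                    d = math.sqrt(sum(((u - v) * length) ** 2
                                      for u, v, length in zip(mate, other, cell_lengths)))
                    best = min(best, d)
    return best


model = [(0.1, 0.1, 0.1)]  # a one-atom toy "model"
print(round(closest_symmetry_contact(model, (10.0, 10.0, 10.0)), 3))  # 6.164 Å
```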
Structure Validation and Deposition
It may be useful to evaluate the overall distribution of some residues, such as hydrophobic residues, glycine and proline. If certain residues have been identified as being in the active site, are they close together in the model?
Structure Validation and Deposition
The stereochemical parameters, such as bond lengths and bond angles, should lie within the expected standard deviations of their ideal values.
The Ramachandran plot should look normal, with nearly all residues in the allowed regions.
http://molprobity.biochem.duke.edu/
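The stereochemistry criterion can be illustrated with bond-length z-scores, the deviation from the ideal value divided by its standard deviation. The targets below are approximate backbone values in the spirit of Engh & Huber, assumed for illustration only, and the 4σ outlier cutoff is likewise an assumption; a real check should use the restraint library of your refinement program or a service such as MolProbity. The function names and model bonds are hypothetical.

```python
import math

# Approximate target lengths (Å) and standard deviations for backbone bonds (illustrative values).
IDEAL_BONDS = {"N-CA": (1.458, 0.019), "CA-C": (1.525, 0.021), "C-O": (1.231, 0.020), "C-N": (1.329, 0.014)}


def bond_outliers(bonds, z_cutoff=4.0):
    """bonds: list of (bond_type, length). Returns (bond_type, length, z) for |z| > z_cutoff."""
    flagged = []
    for kind, length in bonds:
        target, sigma = IDEAL_BONDS[kind]
        z = (length - target) / sigma
        if abs(z) > z_cutoff:
            flagged.append((kind, length, z))
    return flagged


def rmsd_from_ideal(bonds):
    """Root-mean-square deviation of the bond lengths from their targets."""
    return math.sqrt(sum((length - IDEAL_BONDS[kind][0]) ** 2 for kind, length in bonds) / len(bonds))


model_bonds = [("N-CA", 1.461), ("CA-C", 1.650), ("C-O", 1.228), ("C-N", 1.332)]
print(bond_outliers(model_bonds))  # the 1.650 Å CA-C bond is about 6 sigma too long
print(f"{rmsd_from_ideal(model_bonds):.3f} Å")
```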
Structure Validation and Deposition
Atomic coordinates should be deposited in the Protein Data Bank
http://www.pdb.org
Thank you!