zhi-jie liu institute of biophysics chinese academy of sciences introduction to macromolecular...

Zhi-Jie Liu

Institute of Biophysics, Chinese Academy of Sciences

Introduction to Macromolecular Structures

Outline

1. Varieties of macromolecules

2. Macromolecular structures

3. Structure determination by X-ray crystallography

4. Structure validation and deposition.

Varieties of macromolecules

1. Proteins

2. DNA

3. RNA

4. Complexes: protein-protein, protein-DNA/RNA

Lipids, peptides, sugars, etc. are categorized as non-macromolecules

Our discussion is more focused on protein molecules

DNA/RNA

Deoxyribonucleic acid (DNA) consists of two long polymers of simple units called nucleotides, each carrying one of four bases: cytosine, guanine, adenine and thymine.

The sequence of these four bases along the backbone encodes information, or the genetic code.

RNA uses the same bases except that thymine is replaced by uracil.

A series of codons in part of an mRNA molecule. Each codon consists of three nucleotides, usually representing a single amino acid.

Genetic code
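As a small illustration of how codons map to amino acids, the sketch below translates a short mRNA string three nucleotides at a time. The table is deliberately partial (only a handful of the 64 standard codons), and the names CODON_TABLE and translate are hypothetical, chosen only for this example.

```python
# Partial codon table (standard genetic code); a full table has 64 entries.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GCU": "Ala", "UGG": "Trp", "AAA": "Lys", "GAU": "Asp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna):
    """Read an mRNA string codon by codon until a stop codon is reached."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE.get(codon, "???")  # unknown to this partial table
        if amino_acid == "Stop":
            break
        residues.append(amino_acid)
    return residues


print(translate("AUGUUUGGCUGGUAA"))
```

Each codon usually stands for a single amino acid, which is why the loop advances in steps of three.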

Macromolecular structures

Proteins

Proteins are composed of one or more polypeptides, each of which is a single linear polymer chain of amino acids. The sequence of amino acids in a protein is defined by the sequence of a gene, which is encoded in the genetic code.

Proteins are the molecular building blocks of life. Protein molecules are three-dimensional, and so is life.

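General Amino Acid Structure

[Figure: general amino acid structure at pH 7.0. The central Cα is bonded to H, the side chain R, a carboxylate group (COO-) and a protonated amino group (+H3N).]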

Chirality of amino acids

The "CORN" rule for determining the D/L isomeric form of an amino acid:

COOH, R, NH2 and H (where R is the variable side chain) are arranged around the chiral center C atom. Viewed with the hydrogen atom pointing away from the viewer, if the groups CO, R and N are arranged clockwise around the carbon atom, then it is the D-form. If counter-clockwise, it is the L-form.

[Figure: L and D isomers]

Varieties of amino acids

Hydrophobic: tending to avoid water; nonpolar and uncharged; relatively insoluble in water. Side chains tend to associate with each other to minimize their contact with water or polar side chains.

Protein Structure & Function, ©2004 New Science Press Ltd

Varieties of amino acids

Hydrophilic: tending to interact with water; polar or charged; very soluble in water. Side chains tend to associate with other hydrophilic side chains, or with water molecules, usually by means of hydrogen bonds.


Varieties of amino acids

Amphipathic: having both polar and nonpolar character and therefore a tendency to form interfaces between hydrophobic and hydrophilic molecules.


Peptide Chain

Peptide Bond Lengths

Protein Conformation Framework

• Bond rotation determines protein folding, 3D structure

• Double bond disallows rotation

Bond Rotation Determines Protein Folding


• Torsion angle (dihedral angle): measures the orientation of four linked atoms in a molecule: A, B, C, D

Dihedral angle

• The dihedral angle A-B-C-D is defined as the angle between the normal to the plane of atoms A-B-C and the normal to the plane of atoms B-C-D

• Three repeating torsion angles along the protein backbone: ω, φ, ψ
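The definition of a dihedral angle translates directly into a few vector operations. The sketch below is one minimal way to compute it from four atom positions in plain Python; the function name dihedral and the sign convention (positive when the front bond must be rotated clockwise to eclipse the rear bond, looking along B to C) are assumptions of this example rather than something stated on the slides.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(a, b, c, d):
    """Dihedral angle A-B-C-D in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)  # normal to the plane of A-B-C
    n2 = _cross(b2, b3)  # normal to the plane of B-C-D
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Four atoms with the central bond along z: cis, a quarter turn, and trans.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Applied to consecutive backbone atoms, the same function would give ω, φ and ψ, depending on which four atoms are passed in.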

Backbone Torsion Angles

• Dihedral angle ω: rotation about the peptide bond, namely Cα1-{C-N}-Cα2


• Dihedral angle φ : rotation about the bond between N and Cα


• Dihedral angle ψ : rotation about the bond between Cα and the carbonyl carbon


• The ω angle tends to be planar (0° for cis, or 180° for trans) due to delocalization of the carbonyl π electrons and the nitrogen lone pair



• φ and ψ are flexible, therefore rotation occurs here

• However, φ and ψ of a given amino acid residue are limited due to steric hindrance


Steric Hindrance

• Interference to rotation caused by spatial arrangement of atoms within molecule

• Atoms cannot overlap

• Atom size defined by van der Waals radii

• Electron clouds repel each other

G.N. Ramachandran

• Used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations

• For each conformation, the structure was examined for close contacts between atoms

• Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii

• Therefore, φ and ψ angles which cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone

• Only 10% of the {φ, ψ} combinations are generally observed for proteins

• First noticed by G.N. Ramachandran
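The hard-sphere test described above can be sketched as a simple distance check: two non-bonded atoms clash when they are closer than the sum of their van der Waals radii. The radii below are commonly quoted Bondi values and are an assumption of this example (the original analysis may have used different contact limits); the names VDW_RADII and find_clashes are hypothetical.

```python
import math

# Commonly quoted van der Waals radii in angstroms (assumed for illustration).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def find_clashes(atoms, bonded=frozenset()):
    """Return index pairs of atoms closer than the sum of their radii.

    atoms  : list of (element, (x, y, z)) tuples
    bonded : set of (i, j) pairs with i < j to skip, since covalently
             bonded atoms are always closer than the van der Waals sum
    """
    clashes = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if (i, j) in bonded:
                continue
            (elem_i, pos_i), (elem_j, pos_j) = atoms[i], atoms[j]
            limit = VDW_RADII[elem_i] + VDW_RADII[elem_j]
            if math.dist(pos_i, pos_j) < limit:
                clashes.append((i, j))
    return clashes


atoms = [("C", (0.0, 0.0, 0.0)), ("C", (3.0, 0.0, 0.0)), ("O", (8.0, 0.0, 0.0))]
print(find_clashes(atoms))
```

A conformation search in this spirit would rebuild the coordinates for each {φ, ψ} pair and mark the pair as disallowed whenever the returned list is non-empty.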

Ramachandran Plot

• Plot of φ vs. ψ

• The computed angles which are sterically allowed fall on certain regions of plot

Computed Ramachandran Plot

White = sterically disallowed conformations (atoms come closer than sum of van der Waals radii)

Blue = sterically allowed conformations

Experimental Ramachandran Plot

φ, ψ distribution in 42 high-resolution protein structures (x-ray crystallography)

Ramachandran Plot and Secondary Structure

• Repeating values of φ and ψ along the chain result in regular structure

• For example, repeating values of φ ~ -57° and ψ ~ -47° give a right-handed helical fold (the alpha-helix)

The structure of cytochrome C shows many segments of helix and the Ramachandran plot shows a tight grouping of φ, ψ angles near -50,-50

[Figure: alpha-helix; cytochrome C; Ramachandran plot]

Similarly, repetitive values in the region of φ = -110 to -140 and ψ = +110 to +135 give beta sheets. The structure of plastocyanin is composed mostly of beta sheets; the Ramachandran plot shows values in the -110, +130 region:

[Figure: beta-sheet; plastocyanin; Ramachandran plot]

φ, ψ and Secondary Structure

Name             φ      ψ      Structure
---------------  -----  -----  ---------------------------------
alpha-L            57     47   left-handed alpha helix
3-10 helix        -49    -26   right-handed
π helix           -57    -80   right-handed
Type II helices   -79    150   left-handed helices formed by polyglycine and polyproline
Collagen          -51    153   right-handed coil formed of three left-handed helices
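To make the link between {φ, ψ} values and secondary structure concrete, the sketch below assigns a pair of angles to the nearest reference conformation. The reference values are the ones quoted in these slides, except the beta-sheet entry, which is simply the midpoint of the quoted ranges and is an assumption of this example. Nearest-point matching is a crude illustration, not how real secondary-structure assignment programs work.

```python
import math

# Reference (phi, psi) pairs in degrees. The beta-sheet entry is an assumed
# midpoint of the ranges phi = -110 to -140 and psi = +110 to +135.
REFERENCE = {
    "alpha helix (right-handed)": (-57, -47),
    "alpha-L (left-handed)": (57, 47),
    "3-10 helix": (-49, -26),
    "pi helix": (-57, -80),
    "beta sheet": (-125, 122),
    "type II helix": (-79, 150),
    "collagen": (-51, 153),
}


def angle_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180) % 360 - 180)


def nearest_conformation(phi, psi):
    """Name of the reference conformation closest to (phi, psi)."""
    def distance(name):
        ref_phi, ref_psi = REFERENCE[name]
        return math.hypot(angle_difference(phi, ref_phi),
                          angle_difference(psi, ref_psi))
    return min(REFERENCE, key=distance)


print(nearest_conformation(-60, -45))
print(nearest_conformation(-120, 130))
```

The angle_difference helper wraps around at ±180°, so that, for example, ψ = 179° and ψ = -179° are treated as 2° apart.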

Four levels of protein structure

The Universe of Protein Structures

How many proteins in the universe?

The smallest archaeal genome encodes more than 600 ORFs

Pyrococcus furiosus encodes 2200 ORFs

Homo sapiens encodes around 30,000 ORFs

The facts:

The number of different protein folds in nature is large but limited. They are used repeatedly in different combinations to create the diversity of proteins found in living organisms.

The Universe of Protein Structures

Protein structures are modular and proteins can be grouped into families on the basis of the domains they contain

There are around 1000 different protein folds

The Universe of Protein Structures

Protein motifs may be defined by their primary sequence or by the arrangement of secondary structure elements

Zinc finger motif

The Universe of Protein Structures

EF-hand motif

Protein Function in Cell

1. Enzymes

• Catalyze biological reactions

2. Structural role

• Cell wall

• Cell membrane

• Cytoplasm

Structure determination by X-ray crystallography

X-Ray Diffraction Data

H   K   L   I     SigmaI   Phi
2   5   9   101   5
3   7   8   49    4
…

[Flowchart labels: X-ray diffraction data; data processing; phasing; Fourier transforms; model building; refinement; inverse Fourier transforms]

Phase problem: phase angles cannot be recorded by current X-ray techniques.

Crystal mounting and Cryo-Crystallography

X-ray sources: rotating anode X-rays


X-ray sources: synchrotron X-rays, 10^6 times stronger.

Shanghai Synchrotron Radiation Facility


Data Collection:

Advantages:

1. Greatly reduced radiation damage and thus increased crystal lifetime

2. Lower X-ray background and increased resolution

3. Fewer crystals required

4. Crystals can be transported and shipped in liquid nitrogen (LN2)

5. Mount when crystals are ready.


Mounting:

Robotic crystal diffraction quality screen

Crystal mounting robot


Bragg’s law

In 1913, William Henry Bragg (1862–1942) and his son, William Lawrence Bragg (1890–1971), derived a formula to explain the diffraction of X-rays by crystals.

They won the Nobel Prize in Physics for their seminal roles in X-ray crystallography.

Data collection strategy and data processing

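[Photos: Lawrence Bragg, Henry Bragg]

[Figure: incident rays a and b strike parallel lattice planes 1, 2 and 3 with spacing d and leave as rays a' and b'; the points A, B, C and D mark the ray geometry.]

An incident wave (wavelength λ) strikes the planes "1" and "2".

AB and AC are perpendicular to rays a and a', respectively.

The path difference for rays from adjacent planes:

$$BD + DC = 2d\sin\theta$$

The condition for constructive interference:

$$2d\sin\theta = k\lambda \qquad (k = 1, 2, 3, \ldots)$$

This relation is called Bragg's law. Here θ is the angle between the incident ray and the lattice planes.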
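Bragg's law, 2d sin θ = kλ, can be rearranged to convert between a diffraction angle and a lattice-plane spacing. The sketch below does this in both directions; the function names are hypothetical, and the Cu Kα wavelength of 1.5418 Å used in the usage lines is an assumed value typical of rotating anode sources, not a number taken from the slides.

```python
import math


def bragg_spacing(wavelength, theta_deg, k=1):
    """Plane spacing d (same unit as wavelength) from 2 d sin(theta) = k lambda."""
    return k * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def bragg_angle(wavelength, d, k=1):
    """Angle theta in degrees at which planes of spacing d diffract."""
    return math.degrees(math.asin(k * wavelength / (2.0 * d)))


CU_K_ALPHA = 1.5418  # angstroms; assumed wavelength for illustration

# Angle needed to record a 2.4 angstrom reflection, and the reverse calculation.
theta = bragg_angle(CU_K_ALPHA, 2.4)
print(theta)
print(bragg_spacing(CU_K_ALPHA, theta))
```

Smaller spacings d (higher resolution) require larger angles θ, which is why high-resolution reflections appear near the edge of the detector.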

Diffraction image from a RAXIS-IV image plate (image label: 2.5 Å)

Frame oscillation = 1°

Exposure time = 20 min

Maximum resolution = 2.4 Å


Data processing:

• Indexing (finding the unit cell, orientation & space group)

• Integrating (determining the intensities of each spot)

• Merging (scaling data, averaging data & determining data quality)

• Calculating structure factor amplitudes from merged intensities
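The last two data-processing steps can be illustrated with a toy example: repeated measurements of the same reflection are averaged, a merging R-factor summarizes their agreement, and amplitudes are taken as the square root of the merged intensities. This is a simplified sketch. It assumes the observations are already scaled and mapped to a common index, and it ignores the statistical treatment that weak or negative intensities need in practice. All names are hypothetical.

```python
import math
from collections import defaultdict


def merge_intensities(observations):
    """Average repeated (hkl, intensity) observations.

    Returns (merged, r_merge), where
    r_merge = sum |I - <I>| / sum I over all observations.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    merged = {hkl: sum(values) / len(values) for hkl, values in groups.items()}
    numerator = sum(abs(i - merged[hkl])
                    for hkl, values in groups.items() for i in values)
    denominator = sum(i for values in groups.values() for i in values)
    return merged, numerator / denominator


def amplitudes(merged):
    """Structure factor amplitudes |F| = sqrt(I) for positive intensities."""
    return {hkl: math.sqrt(i) for hkl, i in merged.items() if i > 0}


observations = [((2, 5, 9), 100.0), ((2, 5, 9), 102.0), ((3, 7, 8), 49.0)]
merged, r_merge = merge_intensities(observations)
print(merged, r_merge)
print(amplitudes(merged))
```

A low merging R-factor indicates that repeated measurements agree well, which is one of the quality indicators examined at the merging step.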

The steps to solve the macromolecular crystal structure

[Flowchart labels: diffraction data; sequence; initial phases; quality control; refinement; validation; phase combination; model building]

Phasing Methods in Macromolecular Crystallography

Molecular Replacement Method (MR)

Isomorphous Replacement Method (MIR, SIR)

Anomalous Dispersion Method (MAD, SAD, SIRAS)

Direct Method

Other Methods


The phasing problem

The phase ambiguity in SIR

[Figure labels: |FP(h)|, |FPH(h)|, FH(h)]

How to break the phase ambiguity?

MIR
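The two-fold ambiguity in SIR follows from the vector relation FPH = FP + FH: knowing only the amplitudes |FP| and |FPH| plus the heavy-atom contribution FH, the cosine rule leaves two possible protein phases, placed symmetrically about the heavy-atom phase. The sketch below computes both candidates for one reflection; the function name and the test values are hypothetical, and measurement errors are ignored.

```python
import cmath
import math


def sir_phase_candidates(fp_amplitude, fph_amplitude, fh):
    """Two candidate protein phases (degrees) for one reflection.

    fp_amplitude  : measured native amplitude |FP|
    fph_amplitude : measured derivative amplitude |FPH|
    fh            : complex heavy-atom structure factor FH
    Uses |FPH|^2 = |FP|^2 + |FH|^2 + 2 |FP| |FH| cos(phi_P - phi_H).
    """
    fh_amplitude = abs(fh)
    phi_h = cmath.phase(fh)
    cos_delta = ((fph_amplitude ** 2 - fp_amplitude ** 2 - fh_amplitude ** 2)
                 / (2.0 * fp_amplitude * fh_amplitude))
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against rounding
    delta = math.acos(cos_delta)
    return (math.degrees(phi_h + delta), math.degrees(phi_h - delta))


# Synthetic reflection: true protein phase 30 degrees, heavy-atom phase 100 degrees.
fp = cmath.rect(2.0, math.radians(30.0))
fh = cmath.rect(1.0, math.radians(100.0))
fph = fp + fh
print(sir_phase_candidates(abs(fp), abs(fph), fh))
```

One of the two printed phases is the true one; a single derivative cannot tell which. A second derivative with a different FH gives another pair of candidates, and only the true phase is common to both pairs, which is the idea behind MIR.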

Fourier Transformation and Electron Density Maps

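Fourier Transformation

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} |F(h,k,l)| \exp\left[-2\pi i(hx+ky+lz) + i\alpha(h,k,l)\right]$$

The amplitudes |F(h,k,l)| come from the X-ray diffraction experiment; the phases α(h,k,l) come from the phasing method.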
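The Fourier synthesis can be tried out on a one-dimensional toy crystal. In the sketch below, two point "atoms" generate structure factors F(h), and the density is then rebuilt from the amplitudes and phases by summing over h. The atom positions, weights and the cutoff of |h| ≤ 10 are all assumptions chosen for illustration; a real map is three-dimensional and uses measured amplitudes.

```python
import cmath
import math


def structure_factor(h, atoms):
    """F(h) for point atoms given as (fractional position, weight) pairs."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)


def density(x, factors, cell_length=1.0):
    """rho(x) = (1/V) * sum_h |F(h)| exp[-2 pi i h x + i alpha(h)]."""
    total = sum(abs(F) * cmath.exp(-2j * math.pi * h * x + 1j * cmath.phase(F))
                for h, F in factors.items())
    return total.real / cell_length  # imaginary parts cancel between h and -h


atoms = [(0.2, 6.0), (0.7, 8.0)]  # a lighter and a heavier atom
factors = {h: structure_factor(h, atoms) for h in range(-10, 11)}

grid = [i / 100 for i in range(100)]
rho = [density(x, factors) for x in grid]
peak_position = grid[rho.index(max(rho))]
print(peak_position)
```

The highest density peak falls at the position of the heavier atom. Replacing cmath.phase(F) with wrong phases while keeping the amplitudes would move or destroy the peaks, which is the practical meaning of the phase problem.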

Fig. 1 Effect of changing the contour level on the electron density map. In (A) a section of aldehyde dehydrogenase [2] density at 3.0 Å resolution is shown using 0.33 sigma as the minimum contour level. The solvent is very noisy and the difference between protein and solvent is not obvious. In (B) the minimum contour level is increased to 0.5 sigma. The solvent is less noisy and the protein and solvent are distinguishable. In (C) the minimum contour level is increased to 1.0 sigma. The solvent is very clean and it is very easy to identify the protein boundary.

Fig. 2 Effect of increasing phase error on the electron density map.

A: Density map at 2.0 Å resolution, shown using the final refined phases.

B: An average of 22° of random error has been added to each phase.

C: An average of 45° of random error has been added to each phase.

D: An average of 67° of random error has been added to each phase.

("Practical Protein Crystallography" by D. E. McRee, page 190)

3. A good map should show clear secondary structures (α-helices or β-sheets).

Model Building: Steps in making the first trace in the electron density map

(1). Generating the Cα chain trace. The only rule one has to observe is that the distance between Cα atoms of adjacent residues is always approximately 3.8 Å. Look for large pieces of secondary structure, such as helices and sheets, to start the Cα trace.
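The 3.8 Å rule is easy to check automatically while building a trace. The sketch below flags consecutive Cα pairs whose separation deviates from 3.8 Å by more than a tolerance; the function name and the 0.3 Å tolerance are assumptions of this example, and the coordinates are made up.

```python
import math


def check_ca_trace(ca_coords, ideal=3.8, tolerance=0.3):
    """Return (i, i + 1, distance) for adjacent C-alpha pairs that break the rule."""
    problems = []
    for i in range(len(ca_coords) - 1):
        distance = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(distance - ideal) > tolerance:
            problems.append((i, i + 1, distance))
    return problems


# Hypothetical trace: the last step is far too long.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 6.0)]
print(check_ca_trace(trace))
```

A flagged pair usually means a residue was skipped or the trace jumped to a neighboring strand.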

(2). Identifying chain direction

The side chains on an α-helix point toward the N-terminal end. Another way to put it: the α-helix resembles a Christmas tree when viewed with the N-terminal end down and the C-terminal end up.

(3). Generating main chain trace

The main chain can be generated automatically from a well-traced Cα chain by many computer programs. In helices, the side chain positions are so highly constrained that you can accurately predict the main chain and Cβ atom positions with a refined α-helix from another protein.

Example of generated α-helix and β-sheet in electron density map

(4). Fitting the chemical sequence.

Finding the first match of sequence to the map is a milestone in structure determination. Some tips are listed below:

• Heavy atoms bind to some specific residues: Hg-Cys, Pt-Met.

• Start the fitting from a well-defined main chain trace where the density should be clear and rich in side chain information. These regions are often located inside the molecule.

• The sulfur atoms or Se-methionines are the perfect starting point for the sequence fitting if the map is from sulfur SAS or Se-MAD phases.

• Tryptophan is so much larger than all the other amino acids that it can often be recognized.

• Hydrophilic side chains are often disordered.

• A correct fitting should be easily extended in both directions.

Representative electron density for amino acid side chains arranged in order of increasing size.

From an experimental electron density map calculated at 1.5 Angstrom resolution.

Generating the first model

Generating the side chains based on the fitted sequence can be automated, but the generated side chains may not point in the correct direction. In most cases, manual adjustments are needed.

Structure Validation and Deposition

Generate symmetry-related molecules. The atoms at the contacts cannot come any closer than the van der Waals packing distance.

• The side chains should fit the electron density map over the whole molecule. If the fit suddenly becomes bad in some region, it may indicate that something is wrong with the fitting.

• Missing density is much better than extra density. A blob of extra density is rarely seen for a Gly, Ala or Pro residue.

• The model should make chemical sense and satisfy all that is known about the macromolecule.

Structure Validation and Deposition

• It may be useful to evaluate the overall distribution of some residues, such as hydrophobic residues, glycine, and proline.

• If certain residues have been identified as being in the active site, are they close together in the model?



The stereochemical parameters, such as bond lengths and bond angles, should be within the standard deviations of their ideal values.
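A stereochemistry check of this kind boils down to comparing each measured value with an ideal value and its standard deviation. The sketch below does this for backbone bond lengths using a z-score. The ideal values and standard deviations are approximate numbers of the kind found in standard peptide geometry libraries and should be treated as illustrative assumptions, as should the cutoff of 4 standard deviations and all names.

```python
# Approximate ideal backbone bond lengths and standard deviations in angstroms.
# These are illustrative values, not an authoritative restraint library.
IDEAL_BONDS = {
    "N-CA": (1.458, 0.019),
    "CA-C": (1.525, 0.021),
    "C-O": (1.231, 0.020),
    "C-N": (1.329, 0.014),
}


def bond_outliers(measured, z_cutoff=4.0):
    """Return (bond_type, length, z_score) for bonds beyond the cutoff.

    measured : list of (bond_type, length) pairs taken from a model
    """
    outliers = []
    for bond_type, length in measured:
        ideal, sigma = IDEAL_BONDS[bond_type]
        z_score = (length - ideal) / sigma
        if abs(z_score) > z_cutoff:
            outliers.append((bond_type, length, z_score))
    return outliers


model_bonds = [("N-CA", 1.46), ("CA-C", 1.53), ("C-N", 1.45)]
print(bond_outliers(model_bonds))
```

Validation servers apply the same idea to bond angles, planarity and other geometric terms, with their own reference libraries and cutoffs.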

The Ramachandran Plot should be normal.

http://molprobity.biochem.duke.edu/


Atomic coordinates should be deposited in the Protein Data Bank

http://www.pdb.org

Thank you!
