x-ray crystallography - university of texas at...

X-Ray Crystallography“If a picture is worth a thousand words, then a macromolecular

structure is priceless to a physical biochemist.” – van Holde

Science 5 January 2007:Vol. 315. no. 5808, pp. 40 - 41Chemistry Nobel Rich in Structure

…In particular, there were 12 Nobel Prizes in chemistry and physiology or medicine awarded for work in this field from 1956 to 2006 (table S1) (2). Almost one in four chemistry prizes since 1956 have been for structure work, and in the last decade, fully half have dealt with work related to macromolecular structure.

…From 1970 to 2006, 4% of all chemistry publications dealt with crystallography, yet this subfield captured 19% of the available Nobel Prizes (table S3) (2). During the past decade, crystallography papers represented 7% of all chemistry publications, but commanded 4 of 10 available prizes.

http://www.sciencemag.org/cgi/content/full/315/5808/40#ref2

http://www.sciencemag.org/cgi/content/full/315/5808/40#ref2

Structural Analysis has Revolutionized Biochemistry and the Life Sciences

The X-ray Method

1. Grow USEFUL crystals from purified protein

2. Collect X-ray diffraction data; these are only the amplitudes of a complex number

3. The PHASE problem (recover information lost in detection)

4. Build the model, validate it, and interpret the results.

Variables that influence crystal growth

1. Nature of macromolecule – Purity and concentration of macromolecule

2. Nature and concentration of precipitant

3. pH / Temperature / Pressure

4. Level of reducing agent or oxidant

5. Substrates, coenzymes, and ligands / Metal ions

6. Preparation and storage of macromolecule / Proteolysis and fragmentation

7. Age of macromolecule / Degree of denaturation

8. Vibration and sound

9. Volume of crystallization sample

10. Seeding

11. Amorphous precipitate

12. Buffers

13. Cleanliness

14. Organism or species from which the macromolecule was isolated

15. Gravity, gradients and convection

Protein solubility as a function of the precipitant concentration

Crystal growing formats

Crystals can take up one of 14 lattices.

There are primitive lattices, with operators at the corners, and centered lattices.

Symmetry operators can be combined with the various lattices to generate SPACE GROUPS.

Some Packing Examples

Down a 3-fold axis

Down a 2-fold axis of P21212

(4 au/cell; 1 mol/au)

4X-ray tubes: the “sealed” tube

Object Transform Image

Object / Real Space

Transform / Reciprocal Space

Models

Electron Density Maps

What are transforms?• Transforms convert one kind of information into another.

The two forms can work “back and forth”.• A picture from your digital camera starts as a map of

pixels, each of different color and intensity, but to store it on your memory stick, it is transformed into a jpg file. This is a set of compact rules that allow you to recover the pixel image by reversing the jpg operation.

• An X-ray scatter pattern carries information about what scattered it. We want to measure that transform and convert it back to the image of the scatterer.

Scattering, Fourier Transform, and Phases1. X-rays impinge on object from direction s0

2. Consider an electron at origin, O

3. Like all other electrons it scatters in ALL directions, but focus on s

4. All other volume elements also scatter in all directions, we show one at r1 scattering in direction s.

5. The path difference is segment Oa - r1b. That is ∆ = (r1 · s) - (r1 · s0).

6. ∆ phase = 2πr1 · (s-so/λ) = 2πr1 · S, where S = (s-so/λ). This is shown in small drawing where s0 and s are unit vectors.

7. S is the "monitor" for the scattering; it is normal to the "plane" P "reflecting" s0 in direction s. S is short when the scattering angle is low, and long when the scattering angle is high. This will correspond to low and high resolution diffraction data. S will be associated with the crystallographic index, hkl, in crystallographic "reciprocal space".

8. Total scatter in direction s arises from scattering in ALL volume elements, call it F F(S) = ∫ ρ(r)e2πirS dr

9. This has the form of a Fourier transform -- the physical rule governing scattering.

Fourier Transform and Resolution

Crystallization and Transforms

If you measure the amplitude, and have the phase, for Fourier terms,

you can use that to construct the image of the scatterer.

Representation of the electron density of a one-dimensional "crystal" by a superposition of waves. The crystal is formed by a periodic repetition of a diatomic molecule, as shown at the top of the right-hand column. The component waves, each with proper phase and amplitude, are on the left. The curves on the right show the successive superposition of the five waves on the left. (From Waser, 1968.)

As you add these waves in the 1D cell, the image sharpens up.

Phases here are:

-, +, +, -, +

Increasing Res.

Phases here are:

-,-, +, +, -

Representation of another one-dimensional crystal, this one containing a triatomicmolecule. Note that this crystal is built up from the same waves as the crystal of (a) ; only the amplitudes and phases have been changed. (From Waser, 1968.)

8Solving the Phase Problem

1. MIR: Multiple Isomorphous Replacement (Heavy Atom)

2. MR: Molecular Replacement

3. MAD: multiwavelength anomolous dispersion

1. Use of Heavy Metal Ions for Phasing by MIR Methods

Phosphorylase + Ethyl Hg thiosalicylate

Native Phosphorylase

Metals are located by Difference Patterson

• The Diff Patterson is a map generated as: P(u,v,w) = Σ( FD – FN )2*cos (hu+kv+lw)

• Once a metal is located its Fourier transform (amp and phase) can be computed; this is a vector called fH.

• The known fH can be combined with the measured amplitudes of native protein and the derivative to estimate the protein phase

A really simple exampleSome special reflections have strong phase constraints, usually due to symmetry. For example the reflections viewed down a 2-fold axis, like the h0l reflections in space group P2, are constrained to be 0 or 180 degrees. If we know fH, then getting the protein phase is simple arithmetic. We know FD=FP+FH..

Suppose we measure amp FP = 25, FD=15, and we compute fH to have amp =10 and phase 0. Then phase of the protein must be 180.

For general reflections, the math is more complicated.

2. “Molecular Replacement"If an approximate model of the protein structure is known in

advance, approximate phases can be guessed, and the unknown parts of the structure can be calculated in an iterative

procedure (model building and refinement).

No heavy atom derivative required.

BUT – need starting model and orientation

(rotation and translation)

Molecular replacement can be used to determine the structure of an complex with inhibitor bound to an enzyme active site, if the structure of the enzyme itself is already known. Also, MR is often used to solve the structures of closely related proteins

in a superfamily.

3. "Multiwavelength Anomolous Dispersion"

(MAD) methodsAdditional information used in calculating phases can be obtained

if x-ray diffraction intensities can be measured at wavelengths near the absorption edge of the heavy atom derivative.

A tunable x-ray source is required (provided by a synchrotron).

a) often only a single heavy atom derivative is required to solve a structure (selenomethionine).

b) it is possible to solve the structure of higher molecularweight molecules (such as the ribosome, at MW = 2,500,000).

Protein Models are Built from Electron Density Maps

Once phases are obtained, they are combined with amplitudes to compute an electron density map (or maps really).

The more terms included, the better the resolution. There are ~8 times more terms in a 2.5 A map than in a 5 A map. Typically a map at ~2.8 A resolution has 4 terms for each non hydrogen atom. That is, there are 4 pieces of information to help you set the x,y,z, and B of each atom.

A map for a typical protein might be 50 x 50 x 50 A, and we might compute on 0.5 A grid, ie,106 points. Each point would be a summation over ~20,000 reflections. Clearly a computer job.

Electron Density and Resolution

Protein Models Need to be Refined

• Refinement means adjusting the model so that the calculated transform agrees as closely as possible with observed transform

• Residual factor is a key parameterR=Σhkl|Fobs – Fcalc|/ Σhkl|Fobs| • Solvent needs to be added

Data Collection

Wavelength (Å) 1.5418 1.5418

Space group P212121 C2221

Cell dimensions

a, b, c (Å) 47.92, 61.46, 132.13 62.82, 74.03, 121.88

Resolution (Å) (last shell)

30 – 2.1 (2.13-2.21) 30 – 2.6 (2.59-2.68)

Rmerge (%) (last shell) 6.6 (22.9) 6.0 (14.9)

<I/σI> (last shell) 28.5 (2.6) 32.0 (3.7)

Completeness (last shell)

87.7 (29.7) 85.7 (26.3)

Redundancy 6.1 6.8

Refinement

No.reflections 17766 7261

Rworking (last shell) 0.197 (0.234) 0.197 (0.284)

Rfree (last shell) 0.232 (0.287) 0.235 (0.316)

Average B factor for protein atom (Å2)

35.4 38.8

R.m.s deviation from ideality

Bond lengths (Å) 0.011 0.014

Bond angles (°) 1.236 1.482

X-ray data statistics for 2 forms of influenza NS1 –effector domain.

Ramachandran Plot for NS1-eff domain, space group P212121

PDB is a Structural Resource

http://www.rcsb.org/pdb/home/home.do

Vector Alignment Search Tool (VAST) algorithm.

VAST Search is NCBI's structure-structure similarity search service. It compares 3D coordinates of a newly determined protein structure to those in the MMDB/PDB database. VAST Search computes a list of structure neighbors that you may browse interactively, viewing superpositions and alignments by molecular graphics

SCOP Structural Classification of Proteins

Note that these links can be expanded to view details of various structural relatives

CATH - Protein Structure Classification

CATH is a novel hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T), and Homologous (H) Superfamily

Class, derived from secondary structure content, is assigned for more than 90% of protein structures automatically. Architecture, which describes the gross orientation of secondary structures, independent of connectivities, is currently assigned manually. The topology level clusters structures according to their topological connections and numbers of secondary structures. The homologous superfamilies cluster proteins with highly similar structures and functions. The assignments of structures to toplogy families and homologous superfamilies are made by sequence and structure comparisons.

x-ray crystallography - university of texas at...

Documents