Diffraction analysis of biological macromolecular structure has revolutionized biology. Over fifty years ago, modern molecular biology was ushered in by the DNA double helix, whose structure was elucidated by Watson and Crick with the help of x-ray diffraction patterns obtained by Rosalind Franklin. Within five years of that event, the first complete protein crystal structure was reported. In the last 15 years, a number of important technical advances have allowed a dramatic increase in the pace of crystal structure determination, such that more than ten new protein structures are now deposited in the Protein Data Bank every day. One dramatic achievement was the structure of the large subunit of the ribosome, comprising over 3000 nucleotides of RNA and 31 unique proteins. As with other important crystal structure determinations, this structure is dramatically altering our understanding of biological function, in this case the mechanism of protein synthesis.

Protein Crystallography
Bill Royer

Office: LRB 921
Phone: x6-6912

[Figure: crystal structures of a voltage-dependent K+ channel, an ammonia transporter and the ribosome]

Overview of protein crystal structure analysis

Crystals provide the very large numbers of protein molecules needed for detectable diffraction intensities. In microscopy, scattering from an object is focused by a lens to obtain an image of the object. Since there are no useful lenses for x-rays, the scattering itself is recorded by exposing a crystal to x-rays. In a typical diffraction experiment, a crystal is exposed to x-rays in many different orientations, and the intensities of thousands of diffracted x-ray maxima (commonly referred to as "reflections") are measured. Unfortunately, the intensity of each reflection is only part of the information needed to obtain an image of the contents of the crystal. The other part is a phase for each reflection, which is lost during the intensity measurement. A major aspect of protein structure solution is solving this "phase problem" by obtaining reasonable estimates of the phase of each of thousands of reflections. Once a preliminary image is obtained at sufficient detail (known as "resolution"), an atomic model of the three-dimensional structure of the protein can be constructed. Partly because of errors in the initial phases, this model will usually contain significant errors. The model is then refined by comparing the observed diffracted intensities with those calculated from the model: minimizing the discrepancies between calculated and observed intensities yields a much improved model.

An important place to start our discussion is to address two questions: why are crystals used, and does the structure of a protein in a crystal have any relevance to its structure in a cell? The hallmark of crystals is the presence of regular edges and faces (see below) that result from the underlying regular arrangement of molecules. A suitable protein crystal is likely to contain 10¹³ to 10¹⁴ molecules, all arranged in the same, or a symmetrically related, orientation. While the scattering of x-rays from a single protein molecule is very weak, a crystal provides trillions of protein molecules. Each of these contributes to the overall scattering, which amplifies the signal to a measurable level.
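A rough check on the 10¹³ to 10¹⁴ figure: the number of molecules in a crystal is its volume divided by the volume of one repeating unit (the unit cell, described below), multiplied by the number of molecules per cell. The sketch below is a back-of-the-envelope estimate that does not come from the original text. It assumes a 0.1 mm cube-shaped crystal and approximate cell values for tetragonal hen egg-white lysozyme (a = b ≈ 79.1Å, c ≈ 37.9Å, 8 molecules per cell), and these values are illustrative only.

```python
def molecules_in_crystal(edge_mm: float, a: float, b: float, c: float,
                         molecules_per_cell: int) -> float:
    """Rough number of molecules in a cube-shaped crystal.

    Assumes a unit cell with 90° angles (edges a, b, c in Å), so its volume is a*b*c.
    """
    edge_angstrom = edge_mm * 1e7                  # 1 mm = 10^7 Å
    crystal_volume = edge_angstrom ** 3            # Å^3
    cells = crystal_volume / (a * b * c)           # number of unit cells in the crystal
    return cells * molecules_per_cell


# Illustrative values only: a 0.1 mm crystal with a lysozyme-like tetragonal cell.
n = molecules_in_crystal(0.1, 79.1, 79.1, 37.9, 8)
print(f"about {n:.1e} molecules")
```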

Protein Crystals

We are used to thinking about crystals as solid objects, like salt or sugar. Protein crystals are quite different. Because irregularly shaped protein molecules cannot fill three-dimensional space completely, the spaces between them are occupied by solvent. Typically, only small portions of the protein surface are in contact with other protein molecules, and most of the surface is bathed in an aqueous medium. Most protein crystals have between 40 and 60% of their volume filled with solvent (see below). Interestingly, a protein crystal is only slightly more densely packed than a cell. For instance, in the human red blood cell about 1/3 of the mass, including water, is hemoglobin, whereas in a hemoglobin crystal about 1/2 of the mass is hemoglobin. The environment within a crystal is therefore not all that different from that within a cell. It is perhaps not surprising, then, that many lines of evidence indicate that the crystal structure of a protein molecule is, for the most part, quite similar to its structure in solution and, presumably, in the cell. The same protein crystallized in different lattices shows nearly the same three-dimensional structure, indicating that the crystal lattice does not significantly alter a protein molecule's structure. In addition, protein structures determined by x-ray crystallography largely agree with those determined by NMR on proteins in solution.

One of the strongest lines of evidence is that many proteins can still function within their crystal lattice. In fact, with the development of fast x-ray data collection techniques, it is becoming possible to watch enzymatic activity and binding processes occur within a crystal by obtaining structures at various time points (in the nanosecond to microsecond range) after a reaction is initiated. Finally, crystallographic structures often allow a rational interpretation of mutagenesis experiments, which would not be the case if the structures in the crystal and in solution were quite different.

Packing of molecules of psoriasin in a tetragonal lattice. Note the large solvent-filled spaces.

Protein Crystallization

One of the major difficulties in protein crystallography is the growth of suitable crystals. To be useful, protein crystals need to be single crystals, preferably larger than 0.1 mm (100 μm) in each dimension. High-resolution analysis normally requires larger crystals, although the actual size needed depends upon the protein. Crystal growth has often been considered more of an art than a science, since the conditions for optimal growth can be very different for each protein. Nonetheless, extensive experience in growing diffraction-quality crystals has led to a very large database and many useful strategies for growing and improving protein crystals. The basic strategy is to take a pure protein solution at rather high concentration (normally 10 mg/ml or greater) and slowly force the protein out of solution using precipitating agents. Commonly, the protein is brought out of solution with salts (ammonium sulfate, phosphate), polyethylene glycols of various sizes, organic solvents (isopropanol, 2-methyl-2,4-pentanediol) or by lowering the ionic strength. Most of the time, and especially if precipitation is too rapid, an amorphous precipitate forms rather than crystals. Precipitation is normally slowed by using vapor diffusion or dialysis to bring the solution slowly to the desired conditions. Usually hundreds or even thousands of different conditions, varying pH, ionic strength, precipitating agent, temperature and additives, are tried before suitable crystals are obtained. Once small crystals are obtained, they can often be improved by subtle changes in conditions and/or by using them as "seeds" that provide a limited number of nuclei for crystal growth.
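Vapor diffusion brings a drop slowly to the desired conditions, and an idealized mass balance shows how. This sketch assumes a common setup that the text does not describe: a drop made by mixing protein solution with reservoir solution, sealed over the reservoir. It also assumes that only water moves, through the vapor phase, until the precipitant concentration in the drop equals that of the reservoir. Real drops only approximate this, and the 10 mg/ml and 2.0 M values are hypothetical.

```python
def vapor_diffusion_equilibrium(protein_mg_ml: float, reservoir_precipitant: float,
                                reservoir_to_protein: float = 1.0):
    """Idealized vapor-diffusion drop: (start, end) as (protein mg/ml, precipitant conc).

    The drop is protein solution mixed with reservoir solution at the given volume ratio.
    Only water is assumed to move (through the vapor) until the drop's precipitant
    concentration matches the reservoir; the protein is concentrated by the same factor.
    """
    fraction_reservoir = reservoir_to_protein / (1 + reservoir_to_protein)
    start_protein = protein_mg_ml * (1 - fraction_reservoir)
    start_precipitant = reservoir_precipitant * fraction_reservoir
    factor = reservoir_precipitant / start_precipitant   # drop volume shrinks by this factor
    return (start_protein, start_precipitant), (start_protein * factor, reservoir_precipitant)


# Hypothetical drop: 10 mg/ml protein mixed 1:1 with a 2.0 M ammonium sulfate reservoir.
start, end = vapor_diffusion_equilibrium(10.0, 2.0)
print("start:", start, "end:", end)
```

In this idealization a 1:1 drop starts at half the reservoir's precipitant concentration and half the original protein concentration, and both double as water leaves the drop. Because the approach to equilibrium is gradual, the protein is pushed toward supersaturation slowly rather than all at once.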

[Figure: unit cell with axes labeled a, b and c]

The Unit Cell

The basic repeating unit of a crystal is termed the unit cell. It is defined as the parallelepiped-shaped box from which the whole crystal can be constructed by translations along the three major axes, termed a, b and c. The cell edges have lengths a, b and c, and the angles between the axes are α, β and γ, defined as shown to the right (α is the angle between axes b and c, β the angle between axes a and c, and γ the angle between axes a and b). Most often the unit cell possesses symmetry that relates an integral number of copies of what is termed the asymmetric unit. The asymmetric unit is the unique portion of a crystal, often corresponding to one protein molecule or to several protein molecules.

As will be discussed later, the basic properties of the unit cell are obtained from the initial analysis of the diffraction pattern. From this information, one can determine properties of the packing of molecules in the crystal including the percent of solvent in the crystal. This is done by calculating the volume of the cell and the volume occupied by the number of protein molecules per cell.
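The calculation just described can be sketched as follows. The general cell-volume formula handles cells whose angles are not 90°. The protein volume uses a commonly quoted value of about 1.23Å³ per dalton, which corresponds to a protein partial specific volume of roughly 0.74 cm³/g. The ratio V/(M·Z) is the quantity usually called the Matthews coefficient. The lysozyme-like cell and the molecular mass of about 14,300 Da are illustrative assumptions.

```python
import math


def cell_volume(a: float, b: float, c: float,
                alpha: float = 90.0, beta: float = 90.0, gamma: float = 90.0) -> float:
    """Unit cell volume in Å^3 from edges (Å) and interaxial angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def solvent_fraction(volume: float, mass_da: float, molecules_per_cell: int,
                     a3_per_dalton: float = 1.23) -> float:
    """Fraction of the unit cell volume not occupied by protein."""
    protein_volume = a3_per_dalton * mass_da * molecules_per_cell
    return 1.0 - protein_volume / volume


# Illustrative: lysozyme-like tetragonal cell, 8 molecules of ~14,300 Da per cell.
v = cell_volume(79.1, 79.1, 37.9)
matthews = v / (14_300 * 8)             # Å^3 per dalton
print(f"V = {v:,.0f} Å^3, V_M = {matthews:.2f} Å^3/Da, "
      f"solvent ≈ {solvent_fraction(v, 14_300, 8):.0%}")
```

For these assumed values the solvent content comes out near 40%, at the low end of the typical 40 to 60% range quoted earlier.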

X-rays

Why are x-rays used? X-rays are electromagnetic radiation covering the spectrum between ultraviolet light and gamma radiation, that is, wavelengths from about 0.1Å to 100Å. To resolve atoms in a molecular structure, one needs radiation with a wavelength on the order of the distance between atoms. Since bonded carbon atoms are about 1.5Å apart, we need radiation of comparable wavelength. The most common x-ray wavelength used for diffraction experiments is 1.5418Å, which can be conveniently generated from a Cu anode. (In contrast, visible light is electromagnetic radiation with wavelengths from about 4000Å to 7000Å.)
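The wavelengths above can be converted into photon energies with E = hc/λ. With E in keV and λ in Å, hc ≈ 12.398 keV·Å. This conversion is standard physics rather than something stated in the text, and the sketch simply applies it to the wavelengths mentioned above.

```python
HC_KEV_ANGSTROM = 12.39842  # Planck constant × speed of light, in keV·Å


def photon_energy_kev(wavelength_angstrom: float) -> float:
    """Photon energy (keV) for a wavelength given in Å."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


for label, wavelength in [("Cu anode x-rays", 1.5418),
                          ("visible light, 4000 Å", 4000.0),
                          ("visible light, 7000 Å", 7000.0)]:
    print(f"{label}: {photon_energy_kev(wavelength) * 1000:,.1f} eV")
```

Cu radiation comes out near 8 keV, several thousand times more energetic than visible photons (roughly 1.8 to 3.1 eV).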

Principles of diffraction

Shown to the right is a diffraction pattern from a protein crystal (lysozyme). The pattern consists of a regular series of spots of varying intensity. The position of each spot is determined entirely by the packing of the protein molecules in the crystal. The intensities reflect the underlying molecular structure and the symmetrical arrangements within the crystal. Note the four-fold symmetry in the pattern, which reflects the four-fold arrangement of lysozyme molecules in the crystal. We will first discuss how the packing of molecules leads to the observed positions of the spots. Extracting information from the intensities of the reflections requires solution of the phase problem, which will be discussed shortly.

[Figure: x-rays scattered by an electron (e-); scattered intensity I ∝ 1/m²]

X-ray Scattering

We now come to a consideration of how the crystal lattice dictates where diffraction maxima occur. X-rays, being electromagnetic waves, induce an oscillation in charged particles such as electrons and protons. These oscillating particles then emit radiation (scattering) in all directions. As suggested above, the scattering intensity at a given angle (2θ) is inversely proportional to the square of the mass of the particle. Since protons are about 1,836 times heavier than electrons, the scattering from an electron is about 3.4 million (1,836²) times more intense than that from a proton. Therefore, we can neglect proton scattering and treat all observed scattering (diffraction) as coming from electrons in the crystal. For this reason, the images (maps) obtained from a crystallographic investigation show the density of electrons in the crystal, and they are therefore termed "electron density" maps.
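The figure of about 3.4 million follows from squaring the proton/electron mass ratio, since the scattered intensity from particles of equal charge scales as 1/m². The one-line check below uses the standard value of the mass ratio (1836.15), a physical constant that is not given in the text.

```python
PROTON_TO_ELECTRON_MASS = 1836.15267  # proton/electron mass ratio (rounded constant)

# Scattered intensity ∝ 1/m² for equal charge, so electron/proton intensity = (m_p/m_e)².
intensity_ratio = PROTON_TO_ELECTRON_MASS ** 2
print(f"an electron scatters ≈ {intensity_ratio:,.0f} times more intensely than a proton")
```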

Consider the scattering from two electrons as a function of diffraction angle. We must now take the wave nature of x-ray radiation into account. What determines how two waves interact is the phase difference between them. If two waves are "in phase", their crests and troughs are aligned, and adding them amplifies the signal ("constructive interference"). If two waves are completely "out of phase" (the crests of one lining up with the troughs of the other), they cancel and no scattering is observed ("destructive interference"). Scattering from the pair of electrons therefore alternates between maxima and minima as a function of scattering angle, as the two waves alternate between being in phase and out of phase.
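The in-phase and out-of-phase cases can be checked numerically by adding two waves of equal amplitude as complex exponentials. The sketch assumes unit amplitudes. The observed intensity is the squared magnitude of the sum, which works out to 2 + 2 cos(Δφ) for a phase difference Δφ.

```python
import cmath
import math


def two_wave_intensity(phase_difference_rad: float) -> float:
    """Intensity of two superposed unit-amplitude waves with the given phase difference."""
    total = 1 + cmath.exp(1j * phase_difference_rad)   # wave 1 + wave 2, complex notation
    return abs(total) ** 2                              # equals 2 + 2*cos(phase difference)


for degrees in (0, 90, 180, 270, 360):
    intensity = two_wave_intensity(math.radians(degrees))
    print(f"phase difference {degrees:3d}°: intensity {intensity:.3f}")
```

Waves that are in phase give four times the intensity of a single wave, because the amplitude doubles. More generally, N waves in phase give N² times the single-wave intensity, which is part of why the many molecules in a crystal produce measurable spots.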

Waves in phase or out of phase?

[Figure panels: constructive interference; destructive interference]

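[Figure: Bragg reflection from lattice planes of spacing d_hkl. The ray reflected from the second plane travels an extra d sin θ on the way in and again on the way out, so the total path difference is 2d sin θ.]

$$n\lambda = 2d\sin\theta \qquad\Longleftrightarrow\qquad \frac{n}{d} = \frac{2\sin\theta}{\lambda}$$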

In 1913 William Bragg and his son Lawrence Bragg derived a fundamental equation that reveals the conditions under which diffraction from a crystal will occur. They compared the conditions under which the planes of the crystal lattice scatter x-rays in phase with those under which the scattering is out of phase and thus unobserved. In their ground-breaking analysis, the Braggs treated the scattering as reflection from repeating planes within a crystal lattice. Shown to the right is a schematic diagram of their analysis. A given set of lattice planes has a spacing d_hkl. X-rays strike these planes at an incoming angle θ and are "reflected" at an angle θ relative to the planes, so the overall scattering angle is 2θ. At what values of θ are the waves in phase? The waves reflected from the first and second planes differ in path length by 2d_hkl sin θ. If this distance equals an integral number of wavelengths, the scattered waves are in phase. This leads to the famous Bragg equation, nλ = 2d sin θ. If the distance is half a wavelength (or an odd multiple of half a wavelength), the scattered waves are totally out of phase.
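Bragg's law can be solved for the scattering angle at which a given set of planes diffracts, or inverted to give the spacing (resolution) of a spot observed at a given angle. The sketch below assumes first-order reflections (n = 1) by default and uses the Cu wavelength of 1.5418Å mentioned earlier.

```python
import math


def bragg_two_theta(d_spacing: float, wavelength: float, n: int = 1) -> float:
    """Scattering angle 2θ (degrees) satisfying n·λ = 2·d·sin θ (d and λ in Å)."""
    sin_theta = n * wavelength / (2 * d_spacing)
    if sin_theta > 1:
        raise ValueError("no solution: n·λ / (2d) exceeds 1")
    return 2 * math.degrees(math.asin(sin_theta))


def bragg_d_spacing(two_theta: float, wavelength: float, n: int = 1) -> float:
    """Spacing d (Å) of the planes that diffract at scattering angle 2θ (degrees)."""
    return n * wavelength / (2 * math.sin(math.radians(two_theta / 2)))


CU_WAVELENGTH = 1.5418  # Å
for d in (6.0, 3.0, 2.0, 1.5):
    print(f"d = {d}Å diffracts at 2θ = {bragg_two_theta(d, CU_WAVELENGTH):.1f}°")
```

Smaller spacings diffract at larger angles. No spacing smaller than λ/2 can satisfy the equation, which sets an absolute limit on the resolution attainable with a given wavelength.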

Bragg’s Law

When one considers not just two planes but the several thousand in each direction of a crystal, measurable diffraction occurs only when Bragg's law is obeyed. As a result, diffraction from crystals is observed only at discrete locations, as seen to the right. Note that Bragg's law implies an inverse relationship between the spacing of the planes and the scattering angle (through sin θ): n/d = 2 sin θ/λ. As a result, crystals with large unit cell dimensions have closely spaced reflections in their diffraction patterns, and those with small unit cell dimensions have widely spaced reflections. Bragg's law also gives rise to our designation of the "resolution" of a structure. When a structure is said to have been analyzed at 2Å resolution, this means that diffraction spots are included out to Bragg spacings of 2Å.

Bragg’s law dictates how the planes of the crystal cell determine the position of the reflections in a diffraction pattern. As a result, from observing the position of the diffracted x-rays, one can determine the crystal spacings. However, what we are really interested in is determining the structure of the protein in the crystal. Each individual reflection provides information on a unique orientation through the crystal. In this way, a diffraction pattern can be considered to be similar to a CAT scan, with each reflection providing one view or projection of the structure. In order to obtain a three-dimensional structure, one needs to have information from all views. The intensity of the diffracted rays provides some of this information. Unfortunately, as described next, the intensities are only half of the information needed to determine the crystal structure of the protein.

In light microscopy, light is scattered from the object of interest and then recombined by a lens to obtain a magnified image. In x-ray crystallography, the scattered x-rays cannot be recombined directly because there is no satisfactory material from which to make an x-ray lens. Instead, the scattered rays themselves are measured and then recombined computationally to form an image. During x-ray data collection, only the intensity of each reflection is measured; all phase information is lost. The phase problem refers to the need to determine, for each reflection hkl, not only the measurable intensity but also its associated phase. This often means determining tens of thousands or even hundreds of thousands of phases, one for each unique reflection hkl. In this section we will briefly discuss two experimental methods (MIR and MAD) and one computational method (Molecular Replacement) for solving the phase problem.

The Phase Problem

The relative importance of a reflection's phase and amplitude is tested in the example shown below from small-molecule crystallography. The first image shows the very high resolution electron density of this molecule, calculated using correct amplitudes and phases in the Fourier synthesis. Note the very clear outline of the atomic structure. Next is a map calculated from correct amplitudes but with all phases set to zero. Despite the correct amplitudes, this map is totally uninterpretable in terms of the atomic structure. The image to the lower right shows a map calculated with correct phases but with all amplitudes given the same constant value. This map, while not perfect, has many correct features and might actually be interpretable in terms of the correct structure. This is because the phases correctly position the density contributed by each Fourier wave, even though the amplitudes of the waves are incorrect. The example illustrates that the phases, which cannot be directly measured, are even more important than the amplitudes, which are readily measured. Correct determination of phases is therefore critical to a correct crystallographic analysis of a protein structure.
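The same experiment can be mimicked in one dimension. In the sketch below, the "structure" is a hypothetical row of Gaussian "atoms" on a periodic grid. NumPy's discrete Fourier transform stands in for diffraction, so its complex coefficients play the role of structure factors. This is a toy illustration, not the small-molecule example shown in the figure.

```python
import numpy as np

N = 256                                   # grid points in a 1-D "unit cell"
x = np.arange(N)
atom_positions = [40, 75, 130, 190]       # hypothetical atoms, none near the origin
rho = sum(np.exp(-0.5 * ((x - p) / 2.0) ** 2) for p in atom_positions)

F = np.fft.fft(rho)                       # complex coefficients: amplitude and phase

# Correct amplitudes, all phases set to zero.
map_zero_phases = np.fft.ifft(np.abs(F)).real
# Correct phases, all amplitudes set to the same constant (1).
map_unit_amplitudes = np.fft.ifft(np.exp(1j * np.angle(F))).real


def correlation(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])


print("zero-phase map: highest point at x =", int(np.argmax(map_zero_phases)))
print("zero-phase map vs true density:", round(correlation(map_zero_phases, rho), 2))
print("phase-only map vs true density:", round(correlation(map_unit_amplitudes, rho), 2))
```

Whatever the atom positions, the zero-phase map is symmetric about the origin and has its highest point there, even though no atom sits at the origin, so it cannot locate the atoms. The phase-only map, by contrast, always correlates positively with the true density, and its peaks tend to fall near the atoms.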

Correct Amplitudes & Phases

[Figure: native crystal (free cysteine -SH groups) + HgCl2 → heavy-atom isomorphous derivative (-S-HgCl groups)]

$$\mathbf{F}_{P} + \mathbf{F}_{H} = \mathbf{F}_{PH}$$

Multiple Heavy Atom Isomorphous Replacement (MIR)

In the MIR method, the diffraction pattern of a protein crystal is perturbed by specific labeling with heavy-atom compounds. Electron-dense atoms, such as mercury, gold, platinum, lead or uranium, work best. Shown to the right is a protein labeled at free cysteine positions by HgCl2. The diffraction pattern of the derivative differs from that of the native (unlabeled) protein only by the presence of one HgCl at the same position in every molecule. The structure factor of each reflection in the derivative's diffraction pattern (F_PH) is the vector sum of the protein structure factor (F_P) and the contribution of the heavy atom alone (F_H). (The structure factor amplitude is proportional to the square root of the intensity of a reflection.) The method requires that the two crystals be isomorphous, meaning that the only difference between native and derivative crystals is the presence of the additional heavy atoms. A single heavy-atom derivative is, in general, not sufficient to solve the phase problem, as it leaves a two-fold phase ambiguity for most reflections. This ambiguity can be broken by using additional, different heavy-atom derivatives. This method was pioneered by Max Perutz at the MRC.
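The phase ambiguity can be made concrete. Writing the vector sum F_PH = F_P + F_H in terms of amplitudes gives |F_PH|² = |F_P|² + |F_H|² + 2|F_P||F_H| cos(α_P − φ_H), so one derivative yields two candidate protein phases α_P. The sketch below assumes that the heavy-atom contribution F_H is already known; in practice it is calculated once the heavy-atom positions have been found. It uses invented, error-free amplitudes for a single reflection. With two derivatives, only the true phase is consistent with both.

```python
import cmath
import math


def phase_candidates(fp_amp: float, fph_amp: float, fh: complex) -> list[float]:
    """Protein phases (degrees) consistent with |F_PH| = |F_P + F_H| for one reflection."""
    fh_amp, fh_phase = abs(fh), cmath.phase(fh)
    cos_term = (fph_amp**2 - fp_amp**2 - fh_amp**2) / (2 * fp_amp * fh_amp)
    delta = math.acos(max(-1.0, min(1.0, cos_term)))   # clamp against rounding error
    return [math.degrees(fh_phase + delta) % 360, math.degrees(fh_phase - delta) % 360]


def angular_distance(a: float, b: float) -> float:
    d = abs(a - b) % 360
    return min(d, 360 - d)


# Hypothetical reflection: true protein phase 60°, two heavy-atom derivatives.
fp_true = cmath.rect(100.0, math.radians(60))
fh1 = cmath.rect(30.0, math.radians(10))
fh2 = cmath.rect(25.0, math.radians(130))
cands1 = phase_candidates(abs(fp_true), abs(fp_true + fh1), fh1)
cands2 = phase_candidates(abs(fp_true), abs(fp_true + fh2), fh2)

# The phase shared by both derivatives resolves the ambiguity.
best = min(cands1, key=lambda a: min(angular_distance(a, b) for b in cands2))
print("derivative 1:", [round(a, 1) for a in cands1])
print("derivative 2:", [round(a, 1) for a in cands2])
print("resolved phase:", round(best, 1))
```

With real, noisy data the candidates never agree exactly. Phasing programs therefore combine probability distributions over phases instead of picking a single intersection, but the geometry is the same.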

[Figure: protein with selenomethionine side chains (-SeCH3)]

The use of anomalous scattering is similar to MIR in that small changes in diffraction intensities are used for phase information, but in this case the intensity differences can be measured from a single crystal. Anomalous scattering results when the energy of the incoming x-ray photons is close to the binding energy of an electron; resonance effects then alter the way the x-rays are scattered. Significant resonance effects occur only with heavier atoms, not with hydrogen, carbon, nitrogen or oxygen, the primary components of protein molecules. Anomalous scattering is exploited most fully for phase determination by collecting data at several x-ray wavelengths (energies) at a synchrotron source. Each chosen wavelength has different anomalous effects and is thus equivalent to one derivative, so multiple wavelengths are equivalent to multiple heavy-atom derivatives. Perhaps the most elegant application of this method is to substitute selenomethionine for ordinary (sulfur-containing) methionine by expressing the protein in a methionine-auxotrophic strain of E. coli in the presence of selenomethionine. This is the most general approach for solving the phase problem for a protein of unknown structure. This method was pioneered by Wayne Hendrickson at Columbia University.

Multiwavelength Anomalous Diffraction (MAD)

Typical wavelengths for measurements with selenium (Se-Met):

λ1 = 0.9000Å
λ2 = 0.9795Å
λ3 = 0.9802Å
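A toy calculation shows why anomalous scattering carries phase information. Normally Friedel's law holds: the reflections h and −h (a Bijvoet pair) have equal amplitudes. When an atom's scattering factor acquires an imaginary component f'' near an absorption edge, the law breaks and the two amplitudes differ. The sketch uses a hypothetical one-dimensional cell with one light atom and one Se atom. The values f' = −8 and f'' = 4 electrons are rough illustrative numbers, not measured values for any of the wavelengths listed above, and the angle dependence of scattering factors is ignored.

```python
import cmath
import math


def structure_factor(h: int, atoms: list[tuple[complex, float]]) -> complex:
    """F(h) = sum over atoms of f_j * exp(2πi h x_j) in a 1-D toy cell."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)


def toy_cell(f_prime: float, f_double_prime: float) -> list[tuple[complex, float]]:
    # Light atom (f = 6, carbon-like) at x = 0; Se (f0 ≈ 34 electrons) at x = 0.25.
    se_factor = complex(34 + f_prime, f_double_prime)   # f0 + f' + i f''
    return [(6.0, 0.0), (se_factor, 0.25)]


for label, (fp, fpp) in [("no anomalous signal", (0.0, 0.0)),
                         ("near the Se edge (illustrative)", (-8.0, 4.0))]:
    cell = toy_cell(fp, fpp)
    f_plus, f_minus = abs(structure_factor(1, cell)), abs(structure_factor(-1, cell))
    print(f"{label}: |F(+1)| = {f_plus:.2f}, |F(-1)| = {f_minus:.2f}")
```

Both f' and f'' change with wavelength, so measuring such differences at several wavelengths lets a single crystal play the role of several derivatives.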

Molecular Replacement

[Figure: the same molecule in crystal A and in crystal B]

$$\mathbf{x}_B = R\,\mathbf{x}_A + \mathbf{t}(x, y, z)$$

Molecular Replacement is a computational technique by which a crystal structure can be determined if the protein is similar in structure to one already known. For example, it can be used to determine the structure of a mutant protein, or of a complex of two proteins if each has a known structure. Computationally, the known structure is first rotated and then positioned in the unit cell to best match the observed diffraction pattern. Key to its success is the ability to separate the two steps, so that the best orientation can be found first and then used when positioning the molecule in the unit cell. This technique was pioneered by Michael Rossmann at Purdue University.

Orientation determined first by the “Rotation Function”
Position determined second by the “Translation Function”
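Orientation can be found before position because the vectors between atoms within a molecule (the "self-vectors" that dominate the Patterson function used by real rotation functions) do not change when the molecule is translated. The sketch below is a toy, real-space analogue of this idea, not a molecular replacement program. The "observed" crystal B is simply a rotated and shifted copy of a hypothetical five-atom model. Only rotations about one axis are searched, whereas real searches cover three rotation angles, and no diffraction data are used.

```python
import numpy as np


def rot_z(degrees: float) -> np.ndarray:
    """Rotation matrix for a rotation about the z axis."""
    t = np.radians(degrees)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def self_vectors(coords: np.ndarray) -> np.ndarray:
    """All intramolecular vectors x_i - x_j (i != j); unchanged by any translation."""
    diff = coords[:, None, :] - coords[None, :, :]
    return diff[~np.eye(len(coords), dtype=bool)]


def mismatch(model_vecs: np.ndarray, target_vecs: np.ndarray) -> float:
    """Sum of distances from each model vector to its nearest target vector (0 = match)."""
    dist = np.linalg.norm(model_vecs[:, None, :] - target_vecs[None, :, :], axis=-1)
    return float(dist.min(axis=1).sum())


# Hypothetical five-atom search model (coordinates in Å), placed in "crystal A".
model_a = np.array([[0.0, 0.0, 0.0], [1.5, 0.2, 0.3], [2.1, 1.4, -0.5],
                    [0.7, 2.6, 0.9], [-1.2, 1.1, 1.8]])
true_angle, true_t = 40, np.array([3.0, -2.0, 1.0])
model_b = model_a @ rot_z(true_angle).T + true_t      # x_B = R x_A + t, row vectors

# Step 1 ("rotation function"): compare translation-free self-vectors only.
target = self_vectors(model_b)
scores = [mismatch(self_vectors(model_a @ rot_z(a).T), target) for a in range(360)]
best_angle = int(np.argmin(scores))

# Step 2 ("translation function"): with R fixed, find t; here simply from centroids.
R = rot_z(best_angle)
t = model_b.mean(axis=0) - model_a.mean(axis=0) @ R.T
print("angle:", best_angle, "translation:", np.round(t, 3))
```

The rotation search never looks at t, which is the separation of steps described above. A real translation function instead scores candidate positions against the observed intensities, because the absolute coordinates in crystal B are what MR is trying to find.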

The quality of an electron density map depends upon two factors: the accuracy of the phases and the resolution of the structure. Resolution refers to the level of detail in the map. The concept of resolution is apparent from a consideration of the scattering from a duck. Shown to the right is a duck (A) and its diffraction pattern (B). If one were to calculate a Fourier transform using the entire diffraction pattern, the original duck would be regenerated. However, if only the center region of the diffraction pattern shown in C were used, a less detailed picture of the duck would be obtained (D). Using the even smaller center region shown in E would produce a very low resolution image of the duck (F), in which one can see only its gross shape.
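The duck experiment can be mimicked in one dimension. A Fourier synthesis restricted to the central terms of the transform blurs two nearby "atoms" into a single blob. In the sketch, the two Gaussian atoms, their 6-grid-point separation and the cutoff of 8 terms are all hypothetical choices, and NumPy's discrete Fourier transform stands in for the diffraction pattern.

```python
import numpy as np

N = 256
x = np.arange(N)
# Two hypothetical 1-D "atoms", 6 grid points apart (Gaussian width 1 grid point).
rho = np.exp(-0.5 * (x - 100) ** 2) + np.exp(-0.5 * (x - 106) ** 2)

F = np.fft.fft(rho)                       # the full "diffraction pattern"
k = np.fft.fftfreq(N, d=1.0 / N)          # integer frequency index of each term


def truncated_map(k_max: int) -> np.ndarray:
    """Fourier synthesis using only the central terms, |k| <= k_max."""
    return np.fft.ifft(np.where(np.abs(k) <= k_max, F, 0)).real


for k_max in (128, 8):                     # all terms vs. only the central region
    m = truncated_map(k_max)
    print(f"|k| <= {k_max}: dip between the atoms = {bool(m[103] < m[100])}")
```

With all terms, the density midway between the atoms is far lower than at the atoms, so they are resolved. Keeping only |k| ≤ 8 raises the midpoint above the atom positions, merging them into one blob, just as the gross shape of the duck survives while its details disappear.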

Resolution of a Crystal Structure

Turning now to protein crystallography, consider the diffraction pattern below. In a crystallographic analysis, the resolution is given by the minimum Bragg spacing (highest 2θ angle) of the reflections included in the analysis. It can be calculated directly from Bragg's law (λ = 2d sin θ), where θ corresponds to the highest angle of any reflection used in the map calculation. One way to consider the effects of resolution is that increasing the resolution increases the number of Fourier terms (reflections) that contribute to the map. For example, consider a crystal that has 1,000 unique reflections out to a minimum Bragg spacing of 6Å; that is, 1,000 reflections would be used for a structural analysis at 6Å resolution. (This is approximately the number of unique reflections expected for a crystal with a 35 kDa molecule in its asymmetric unit.) The number of reflections to 3Å would be 1,000 × (6Å/3Å)³, or 8,000 reflections. The ratio 6/3 is the factor by which the data extend further in each direction in going from 6Å to 3Å; it is cubed because the diffraction pattern is three-dimensional. (In the precession picture shown below, the edge of the data corresponds to Bragg spacings of 3Å.) A 2Å analysis would use 1,000 × 3³, or 27,000 reflections, and a 1.5Å analysis would include 1,000 × 4³, or 64,000 reflections. Increasing the number of terms provides a great deal more detail about the structure. Normally, it is the degree of order in the protein crystals that limits the resolution of the analysis. Probably fewer than half of all protein crystals used for structural analysis diffract to better than 2.0Å. (Remember that high resolution refers to a large diffraction angle and thus a small Bragg spacing, such as 1.5Å, while low resolution refers to a smaller diffraction angle and thus a large Bragg spacing, such as 6Å.)
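The cube-law scaling in this example can be written as a one-line function. It is an approximation that assumes complete data. The count grows with the volume of the diffraction pattern that is sampled, and the starting value of 1,000 reflections at 6Å is the text's example.

```python
def reflections_at_resolution(n_ref: float, d_ref: float, d_new: float) -> float:
    """Scale a count of unique reflections from resolution d_ref to d_new (both in Å).

    The count grows with the volume of the diffraction pattern that is sampled,
    i.e. as the cube of the ratio of the two resolution limits.
    """
    return n_ref * (d_ref / d_new) ** 3


for d in (6.0, 3.0, 2.0, 1.5):
    print(f"{d}Å: {reflections_at_resolution(1000, 6.0, d):,.0f} reflections")
# 6.0Å: 1,000   3.0Å: 8,000   2.0Å: 27,000   1.5Å: 64,000
```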

Let us now consider the effect of adding more terms on the quality of electron density maps. Shown below are electron density maps of an α-helix. At 5.0Å resolution the helix appears as a rod of density. In fact, at 6.0Å resolution and even lower, helices show up as clear rods of density. It was very fortunate for the history of protein crystallography that the first two protein crystal structures, myoglobin and hemoglobin, determined by Kendrew and Perutz, had very high α-helical content. These structures were first determined at about 6Å resolution, which allowed the course of the polypeptide chain to be followed, and it was clear from these low resolution studies that myoglobin had a fold similar to that of the hemoglobin subunits. The success at low resolution provided the impetus to continue to higher resolution. Another important example is the analysis of bacteriorhodopsin by electron crystallography in the 1970s. In a spectacular feat, Unwin and Henderson obtained a 7Å structure that clearly showed the protein to be formed from 7 α-helical segments spanning the membrane, providing the first structural insight into membrane proteins. Had bacteriorhodopsin spanned the membrane with β-sheets, as porin does, very little could have been discerned from the analysis. As the resolution is increased to 4Å, the helical path of the main chain might, under optimal conditions, be evident, while at 3Å it would be quite clear. Also at 3Å resolution, side-chain density becomes evident, especially for large residues. Initial phasing at 3.0Å often suffices for following the course of the peptide chain, although errors can occur. (However, it is essential that the protein sequence is known.) Higher resolution maps provide significantly more accuracy. For instance, at 2.5Å clear bumps appear for carbonyl oxygen atoms, which make the direction of the chain unambiguous. At higher resolution, details become even clearer.
When refining structures, it is very advantageous to use data to better than 2.5Å, preferably to 2.0Å. Refinement at 1.5Å can often lead to molecular models with error estimates of less than 0.2Å. (The errors are much smaller than the resolution because the centroids of the atom positions can be located much more precisely than the minimum Bragg spacing.) Additionally, the ordered solvent structure, which can play a very important role in protein function, can be reasonably well determined at a resolution of 2.0Å and quite well determined in a structural analysis at 1.5Å resolution.

6 Å resolution crystal structure: α-helices show up as rods of density, Fe atoms as dense spheres

3 Å resolution crystal structure: main-chain and side-chain features can be discerned, but care and prior stereochemical knowledge are needed for full interpretation of the structure.

0.8 Å resolution crystal structure: individual atoms can be unambiguously located, bonding geometry is clear

Examples of Crystallographic Resolution


Crystallographic Refinement:  Once an atomic model has been fit to an electron density map, the expected diffraction from such a model can be calculated. The agreement between the calculated and observed structure factors is usually monitored during the course of refinement with the R-factor:

$$R = \frac{\sum_{hkl} \big|\,|F_{obs}| - |F_{calc}|\,\big|}{\sum_{hkl} |F_{obs}|}$$

R-factors for an initially fitted model are generally around 40-50% (0.4-0.5), but refinement can often lower the R-factor to below 20%. (A random model would give an R-factor of about 58%; a perfect match to the data would give 0%.) Refinement based on this agreement also makes use of known stereochemistry (bond lengths, bond angles, van der Waals radii) to achieve convergence to a reasonable structure. The classical refinement method uses least-squares minimization, in which the diffraction data and the stereochemistry both serve as restraints whose deviations are minimized. In the last decade, molecular dynamics routines have been incorporated into the refinement procedure, allowing a much greater radius of convergence from the initial atomic model.

A particularly useful statistic for monitoring the progress of refinement is "Free-R" cross-validation. In this method, a small fraction (usually 5-10%) of the reflections is not used in the refinement but is set aside as a test set. Since the discrepancy between calculated and observed amplitudes is not minimized for these reflections, the R-factor calculated with them (the Free R) will not be quite as good as that calculated with the other reflections. This provides an independent measure of the progress of the refinement. Normally, the Free R should be no more than 5-10% higher than the conventional R-factor, and it can be a very powerful means of revealing errors in the structure.
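The sketch below shows both statistics: the R-factor as defined above, and a random split of reflection indices into a working set and a Free-R test set. The 5% default, the fixed random seed and the small amplitude lists are illustrative choices. In practice, refinement programs typically keep the test-set flags with the data so that the same reflections stay excluded throughout refinement.

```python
import random


def r_factor(f_obs: list[float], f_calc: list[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def split_free_set(n_reflections: int, test_fraction: float = 0.05,
                   seed: int = 0) -> tuple[list[int], list[int]]:
    """Randomly assign reflection indices to a working set and a Free-R test set."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(test_fraction * n_reflections))
    return sorted(indices[n_test:]), sorted(indices[:n_test])   # (work, test)


work, test = split_free_set(1000)
print(len(work), len(test))                                 # 950 50
print(round(r_factor([100.0, 200.0], [90.0, 210.0]), 3))    # 0.067
```

During refinement only the working-set reflections are fitted. The conventional R is then computed over the indices in `work` and the Free R over those in `test`.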

 

 

All refined, published structures are (or should be) deposited in the Protein Data Bank. From this site, you can view structures or, better yet, download them for viewing with other software, such as PyMOL.

http://www.rcsb.org/

Quality of a Crystallographic Structure: How can one judge the quality of a reported crystallographic structure? Probably the simplest way is to consider how much observed data was used in the analysis and how well the model agrees with those data. The resolution of the analysis (assuming reasonably complete data) is the best indicator of whether a sufficient quantity of data is available. If the analysis is at a resolution of 2.0Å or better, a structure should normally be quite accurate. Medium resolution structures, in the range of 2.5Å to 3.0Å, tend to be less accurate, and more precautions need to be taken in the refinement. However, with proper analysis, medium resolution structures can still be quite accurate, with overall coordinate errors of less than 0.4Å. Even lower resolution structures can be very useful, especially if there is significant redundancy that allows molecular averaging within the "asymmetric unit" of the structure, as in the case of virus structures. For agreement with the observed data, the most commonly used statistic is the R-factor, although the "Free R-factor" (see above) is a much more useful measure. One would expect the conventional R-factor to be about 0.20 (20%) or lower and the Free R to be less than 30%, optimally 25% or even lower. In the few well-documented cases of incorrect published structures, the analysis was carried out at medium resolution and the atomic models were refined to conventional R-factors of 25-30%. These structures were determined before the introduction of the Free R, which undoubtedly would have revealed the errors prior to publication. However, careful analysis of the stereochemical details, including the number of water molecules incorporated into the model, could, and did, raise questions about the accuracy of the reported model in each of these studies.
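The rules of thumb in this paragraph, together with the Free R guidance above, can be collected into a simple checklist. The thresholds come from the text, but treating them as hard cutoffs is a simplification. `quality_flags` is a hypothetical helper, not a standard validation tool.

```python
def quality_flags(resolution: float, r_work: float, r_free: float) -> list[str]:
    """Return warnings based on rough rules of thumb (R values as fractions, resolution in Å)."""
    flags = []
    if resolution > 2.0:
        flags.append("resolution worse than 2.0Å: accuracy depends more on careful refinement")
    if r_work > 0.20:
        flags.append("conventional R-factor above about 20%")
    if r_free >= 0.30:
        flags.append("Free R at or above 30%")
    if r_free - r_work > 0.10:
        flags.append("Free R more than ~10% above R: look for errors in the model")
    return flags


print(quality_flags(1.8, 0.18, 0.22))   # []
print(quality_flags(2.8, 0.27, 0.40))
```

An empty list means only that no rule of thumb was tripped. As the text notes, stereochemical details still need to be examined.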