Structural Biology
8/27/10
Why determine structures?
Visualize primary sequence in context of folded protein (buried vs. solvent exposed)
Highlight residues important for intermolecular interactions (co-crystals, packing, or computational (docking))
Allow for the design of properly folded mutant proteins
Visualize surface features to aid in identifying or designing binding partners (e.g. clefts, promontories, hydrophobic patches, or specifics of fold)
Provide a template for modeling studies to understand the function of related molecules
Allow use of structural databases to gain insight into function/evolution
Structural Biology techniques
● Electron microscopy ≈ 5Å ?
● NMR equivalent resolution ≈ 2Å
● X-ray crystallography ≈ 1Å
● Hybrid techniques EM + NMR/Crystallography
[Figure: resolution scale, low to high. 4.0 Å: molecules; 3.0 Å: secondary structure elements; 1.8 Å: residues; 1.0 Å: atoms]
Center for Biological Sequence analysis DTU
Resolution particulars
X-ray crystallography & Nuclear Magnetic Resonance (NMR)
X-ray crystallography utilizes information gleaned from bouncing X-rays off an ordered array of molecules. NMR utilizes information about the magnetic environment of nuclei with non-zero spin.
NMR provides several snapshots of the object of interest, all ~equally valid. X-ray crystallography provides one snapshot of the object of interest.
NMR cannot be practically used for large molecules (at least not yet). X-ray can be used for even very large molecules and complexes.
Most importantly, structures that have been determined using both techniques are very similar!
NMR (very basic drill)
Purify protein
Collect data
Analyze data (make assignments)
Apply distance constraints and calculate structures
[Figure: spectrometer block diagram with magnet, probe, sample, transmitter, pre-amp, receiver, detector, continuous reference, and ADC (binary numbers to computer). Reference frequency = 500 MHz = 500,000,000 Hz; observed frequencies lie between 499,995,000 and 500,005,000 Hz, i.e. within ±5,000 Hz of the reference.]
Varian Inova-600 spectrometer
NMR - How it works
NMR uses the behavior of nuclei with magnetic moments in an applied magnetic field.
For a given type of nucleus (1H), introduce RF radiation and excite transitions of nuclei from the low to the high energy state. Monitor emitted RF radiation as nuclei descend to the low energy state (decay).
Biochemistry, 5th edition, Berg, Tymoczko & Stryer
FID (Free induction decay)
NMR (continued)
De-convolute (separate) all the RF emissions from the FID to get a spectrum
Individual nuclei transition at slightly different frequencies (resonances) depending on their chemical environments (electron clouds, other nuclei). The differences between the resonance frequencies of nuclei and those of the same nuclei in a standard compound are called chemical shifts. Therefore, each protein has a unique spectrum for a given nucleus (1H, 13C, 15N, etc.).
Example reference compound: tetramethylsilane (TMS)
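As a rough worked example of a chemical shift, the sketch below expresses a frequency difference in parts per million (ppm) of the spectrometer frequency. The ppm convention and the function name `chemical_shift_ppm` are assumptions added here for illustration, and the reference is assumed to resonate at exactly 500 MHz.

```python
def chemical_shift_ppm(nu_sample_hz, nu_reference_hz, spectrometer_hz):
    """Chemical shift: frequency difference from the reference, in ppm
    of the spectrometer frequency (assumed convention)."""
    return (nu_sample_hz - nu_reference_hz) / spectrometer_hz * 1e6


# A nucleus resonating 5,000 Hz above the reference on a 500 MHz instrument
delta = chemical_shift_ppm(500_005_000, 500_000_000, 500_000_000)
print(f"{delta:.2f} ppm")  # prints "10.00 ppm"
```

On these assumptions, a window of ±5,000 Hz around a 500 MHz reference corresponds to about ±10 ppm.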
A nuclear Overhauser effect (NOE) experiment gives peaks between protons that are close in space even though they’re not bonded.
A correlation spectroscopy (COSY) experiment results in peaks between protons that are connected through covalent bonds. In this way, individual amino acids have a characteristic signature (i.e. Ala vs. Ser).
Intro to Protein Structure, Branden & Tooze
By using COSY and NOESY experiments, one can identify various AAs and their neighboring AAs (sequential assignment). Once assignments are made, NOE info gives distance constraints. Distance constraints between atoms, once the atoms have been identified, reveal the structure!
Refinement is used in conjunction with known geometric and energetic constraints (in addition to the acquired distance constraints).
Because of the limited number of distance constraints and the nature of solution-structure determination, one ends up with a set of structures that satisfy the distance criteria. These are the so-called “lowest penalty structures”.
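To make the idea of a "penalty" concrete, here is a minimal sketch that scores a candidate structure against upper-bound distance constraints. The scoring rule (sum of squared violations), the atom names, and the coordinates are all hypothetical; real refinement programs use more elaborate penalty functions together with geometric and energetic terms.

```python
import math


def restraint_penalty(coords, restraints):
    """Sum of squared violations of upper-bound distance restraints.

    coords: dict mapping atom name -> (x, y, z) in Å
    restraints: list of (atom_a, atom_b, upper_bound_in_Å)
    """
    penalty = 0.0
    for atom_a, atom_b, upper in restraints:
        distance = math.dist(coords[atom_a], coords[atom_b])
        if distance > upper:  # only distances beyond the bound are penalized
            penalty += (distance - upper) ** 2
    return penalty


# Hypothetical atoms and restraints, for illustration only
restraints = [("HA_5", "HN_6", 3.0), ("HA_5", "HB_20", 5.0)]
model_1 = {"HA_5": (0, 0, 0), "HN_6": (2.5, 0, 0), "HB_20": (0, 4.0, 0)}
model_2 = {"HA_5": (0, 0, 0), "HN_6": (4.0, 0, 0), "HB_20": (0, 7.0, 0)}

for name, model in [("model_1", model_1), ("model_2", model_2)]:
    print(name, restraint_penalty(model, restraints))
```

Here model_1 satisfies both restraints (penalty 0), while model_2 violates both and scores worse.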
Kim et al., Nature 404, 151-158 (09 March 2000)
X-ray crystallography (basic drill)
Grow crystals
Collect diffraction data
Solve structure
Protein phase diagram
[Figure: phase diagram with axes [protein] and [precipitant] at constant temperature, pressure, and pH; labeled regions: undersaturated, clear, metastable, nucleation, precipitate]
How do you get a protein crystal? This is the hard part!
Start with very pure protein
Get a supersaturated solution
Wait (sometimes a long time!)
Keep trying….
Crystallization
Most common technique is vapor diffusion.
Reservoir (0.5 mL of 20% PEG 8,000, 200 mM MgCl2, 100 mM Tris pH 8)
Drop (2 µL protein (20 mg/mL), 2 µL reservoir solution)
Cover with clear tape and place at RT or 4ºC
The reservoir will slowly pull water out of the drop, and the drop will concentrate. Hopefully you’ll get crystals. Many screens are commercially available.
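The arithmetic behind the drop concentrating can be sketched as follows. This is an idealized estimate, assuming that water leaves the drop until its precipitant concentration matches the reservoir and ignoring everything else in the protein buffer; the function name is hypothetical.

```python
def vapor_diffusion_drop(protein_mg_ml, protein_ul, reservoir_ul, precipitant_pct):
    """Estimate drop composition at setup and at equilibrium (idealized)."""
    start_volume = protein_ul + reservoir_ul
    start_protein = protein_mg_ml * protein_ul / start_volume
    start_precipitant = precipitant_pct * reservoir_ul / start_volume
    # Assumption: the drop shrinks until its precipitant concentration
    # equals that of the (much larger) reservoir.
    end_volume = start_volume * start_precipitant / precipitant_pct
    end_protein = start_protein * start_volume / end_volume
    return start_protein, start_precipitant, end_volume, end_protein


# 2 µL of 20 mg/mL protein + 2 µL of reservoir containing 20% PEG 8,000
start_protein, start_peg, end_volume, end_protein = vapor_diffusion_drop(20, 2, 2, 20)
print(f"start: {start_protein} mg/mL protein, {start_peg}% PEG")
print(f"equilibrium: {end_volume} µL drop, {end_protein} mg/mL protein")
```

Under these assumptions the 1:1 drop starts at half the stock concentrations and roughly halves in volume, returning the protein to about 20 mg/mL in 20% PEG.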
0.3mm
If you want to make a well-behaved, soluble expression construct spanning a region of a protein with unknown structure, you would:
A) Do a database search to identify other proteins with similar protein sequences
B) Use several different sequence alignment algorithms to align any homologous sequences
C) Use several different 2º structure prediction algorithms with your sequence of interest and any homologs
D) Compare all of these 2º structure predictions (and decide!)
E) Make several different constructs with different starts and stops
F) All of the above
But remember, before you try crystallizing…..
Then you must work out expression, purification details!
When X-rays shine on atoms, the atoms become new sources of X-radiation. Each atom scatters X-rays in all directions. There is structural information in the scattered X-rays, but it’s too weak when the atoms are from just one protein molecule.
A crystal aligns a very large number of molecules in the same orientation.
This provides the potential for a much stronger signal than when using just one molecule.
X-rays
crystal
Scattered X-rays reinforce in certain directions and cancel in most others
Home X-ray setup
Cryo-protected crystal in rayon loop
Another way of thinking about it…
A crystal is composed of many families of “planes” of atoms. The planes in each family are parallel, and each is separated from the next by a specific distance “d”. Reflection of X-rays from these planes is reinforced when the geometric situation pictured above is achieved.
Bragg’s law: 2d sin θ = nλ (n usually = 1; λ is the wavelength and is known)
Two dimensional crystal
“a” and “b” are the lengths of the sides of the unit cell (each unit cell in black). O is the origin.
The sets of planes (green, blue, pink) are called Miller planes. The green set intersects the cell edge “a” at a=1/2 and cell edge “b” at b=1. Therefore, the green set of planes is the (2,1) family of Miller planes. What you do is invert the 1/2 and it becomes 2. If the planes intersected “a” at 1/3, and “b” at 1/4, they would be the (3,4) family of Miller planes, etc. You just look at the unit cell in the upper left corner. The planes are drawn in all the cells to show they intersect all the cells in the same way.
Note: if you slowly rotated this crystal in the X-ray beam, you would satisfy the requirements of Bragg’s law. Each set of planes would diffract in different directions.
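The "invert the intercept" rule for Miller indices can be sketched in a few lines. The function name and the use of `None` for a plane that never crosses an edge are conventions chosen here for illustration.

```python
from fractions import Fraction


def miller_indices(*intercepts):
    """Miller indices from fractional intercepts on the cell edges.

    Each intercept is given as a fraction of the cell edge (e.g. "1/2").
    Use None for a plane parallel to an edge (index 0).
    """
    indices = []
    for intercept in intercepts:
        if intercept is None:
            indices.append(0)
            continue
        reciprocal = 1 / Fraction(intercept)  # invert the intercept
        if reciprocal.denominator != 1:
            raise ValueError(f"intercept {intercept} does not give an integer index")
        indices.append(int(reciprocal))
    return tuple(indices)


print(miller_indices("1/2", 1))      # (2, 1)
print(miller_indices("1/3", "1/4"))  # (3, 4)
print(miller_indices(1, -1, None))   # (1, -1, 0)
```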
This is a real diffraction pattern of a crystal in a special orientation(X-rays are being shined directly into the side of a unit cell)
a
b
Color    h k l
Green    (2,1,0)
Blue     (1,1,0)
Pink     (1,-1,0)
Orange   (-4,4,0)
Every reflection arises from a different set of Miller planes. Every reflection has an index h,k,l; no two are the same.
So since we know the crystal to “film” distance, the wavelength, and where the spots are on the film, we can use geometry and calculate the size of “a” and “b”.
This diffraction pattern shows that this crystal has systematic absences. But given the regularity of the diffraction pattern, we can easily measure the spacings along “a” and “b”.
[Figure: X-rays pass through the crystal to a detector 80 mm away. The position where the (1,0,0) reflection would be lies 1.5 mm from the direct beam.]
So 1.5/80 = tan 2θ
tan 2θ = 0.01875
2θ = tan⁻¹(0.01875) = 1.074º
θ = 0.537º
2d sin θ = 1.5418 Å (Cu Kα)
Re-arranging: d = 1.5418 Å / (2 sin θ)
Solving: a = 82.2 Å
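The same calculation can be written as a short function, which makes it easy to repeat for other spots. This is a minimal sketch using only the geometry above (spot position, crystal-to-detector distance, wavelength); the function name is made up for illustration.

```python
import math


def spacing_from_spot(spot_mm, detector_mm, wavelength_angstrom, n=1):
    """Plane spacing d (Å) from a spot's distance to the direct beam.

    tan(2θ) = spot distance / crystal-to-detector distance,
    then Bragg's law: d = nλ / (2 sin θ).
    """
    two_theta = math.atan(spot_mm / detector_mm)
    theta = two_theta / 2
    d = n * wavelength_angstrom / (2 * math.sin(theta))
    return d, math.degrees(theta)


d, theta_deg = spacing_from_spot(1.5, 80.0, 1.5418)
print(f"theta = {theta_deg:.3f} deg, d = {d:.1f} Å")  # theta = 0.537 deg, d = 82.2 Å
```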
We can do the same for b and c. Actually, programs have gotten so sophisticated, you feed any random orientation picture to a program and it scans the image, finds the spots and uses them to determine a, b, c, and any angles between them and the lattice type, the symmetry and the orientation of the crystal! In other words, the program knows the Miller indices for all the spots.
So you simply start turning the crystal and collecting images. For example, you turn the crystal 1º and take a 1º oscillation picture. Do this for 180º, and you have a full data set.
Integrate spot (add counts in pixels)
Integrate background (then subtract from spot)
Do this for all (e.g.) ~40,000 spots in your data set.
Now you must scale the spots from one image to the next (sometimes you’re shooting through a thicker part of the crystal, etc.)
When all spots have been integrated and scaled, you have a data set.
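Spot integration with background subtraction can be sketched on a toy image. The box layout, pixel values, and function name below are hypothetical; real integration programs also model spot profiles and estimate errors, which this sketch does not attempt.

```python
def integrate_spot(image, spot_box, background_box):
    """Background-subtracted spot intensity.

    Boxes are (row_start, row_stop, col_start, col_stop), stop exclusive.
    The spot box lies inside the background box; pixels in the background
    box but outside the spot box estimate the background per pixel.
    """
    def pixels(box):
        r0, r1, c0, c1 = box
        return [(r, c) for r in range(r0, r1) for c in range(c0, c1)]

    spot = set(pixels(spot_box))
    background = [p for p in pixels(background_box) if p not in spot]
    spot_counts = sum(image[r][c] for r, c in spot)
    mean_background = sum(image[r][c] for r, c in background) / len(background)
    return spot_counts - mean_background * len(spot)


# Toy 5x5 image: flat background of 10 counts with a spot in the middle
image = [
    [10, 10, 10, 10, 10],
    [10, 30, 30, 30, 10],
    [10, 30, 110, 30, 10],
    [10, 30, 30, 30, 10],
    [10, 10, 10, 10, 10],
]
print(integrate_spot(image, (1, 4, 1, 4), (0, 5, 0, 5)))  # 260.0
```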
Now each spot (h,k,l) should really be considered to be a wave. The intensity of the spot gives the amplitude (the amplitude is proportional to the square root of the intensity), and the number of oscillations across the unit cell is revealed by its Miller indices. The (1,0,0) reflection would have one wavelength (of a sinusoidal wave) in the unit cell along the a direction, the (2,0,0) would be two wavecrests, etc.
These waves can be added together, sometimes reinforcing, sometimes cancelling out. When they’ve all been added together, they describe the shape of the “thing” that scattered them originally.
X-ray diffraction data
 h   k   l   I
 2   0   3   1483.6
 3  -1  -3   19999.9
 3  -1  -2   6729.6
 3  -1  -1   30067.1
 3  -1   1   8227.0
 3  -1   2   29901.5
 3  -1   3   24487.5
 3  -1   4   502.1
Each data point has an index and an intensity
Bragg’s Law: nλ = 2d sin θ
Now all we need is the “phase” for each data point (reflection)
3 dimensions
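\[ f(x) = F_0 \cos 2\pi(0x + \alpha_0) + F_1 \cos 2\pi(1x + \alpha_1) + F_2 \cos 2\pi(2x + \alpha_2) + F_3 \cos 2\pi(3x + \alpha_3) + F_4 \cos 2\pi(4x + \alpha_4) + \dots + F_n \cos 2\pi(nx + \alpha_n) \]
\[ f(x) = \sum_{h=0}^{n} F_h \cos 2\pi(hx + \alpha_h) \]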
Gale Rhodes, Crystallography Made Crystal Clear (2nd edition)
Fourier Series (1D example)
The only trouble is, we must know the offset (phase) for each of the waves. In the previous 1D example, the phases were either 0º or 180º. Remember, we have ~40,000 of these “waves”. We know how tall they are and we know their wavelengths, but we don’t know the phases. This is the so-called “Phase Problem”.
[Figure: two waves, each one wavelength across the width of the cell, with different offsets from the origin. This? Or this?]
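A small numerical sketch shows why the phases matter: the same amplitudes summed with different phases give different functions. It uses the 1D Fourier series form f(x) = Σ F_h cos 2π(hx + α_h), with phases expressed as fractions of a cycle; the amplitudes and phases below are made up for illustration.

```python
import math


def fourier_sum(amplitudes, phases, x):
    """f(x) = sum over h of F_h * cos(2*pi*(h*x + alpha_h)).

    amplitudes[h] is F_h and phases[h] is alpha_h (in fractions of a cycle),
    with h starting at 0; x is a fractional position across the cell.
    """
    return sum(
        amplitude * math.cos(2 * math.pi * (h * x + alpha))
        for h, (amplitude, alpha) in enumerate(zip(amplitudes, phases))
    )


amplitudes = [0.0, 1.0, 0.5]  # hypothetical |F| values for h = 0, 1, 2
phases_a = [0.0, 0.0, 0.0]    # all waves start at a crest (0º)
phases_b = [0.0, 0.0, 0.5]    # h = 2 wave shifted by half a cycle (180º)

for x in (0.0, 0.25, 0.5):
    print(x, round(fourier_sum(amplitudes, phases_a, x), 3),
          round(fourier_sum(amplitudes, phases_b, x), 3))
```

With identical amplitudes, the two phase sets give 1.5 versus 0.5 at x = 0, so measured amplitudes alone cannot pin down the shape.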
One way to address this is to introduce a “heavy” atom into a crystal and collect another data set (say HG dataset).
Now sinusoidal waves can also be represented as vectors.
[Figure: vector on a circle marked 0º/360º, drawn with a phase of 45º]
The length of the vector is the amplitude of the wave, the direction is the phase.
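One convenient way to handle such vectors in code is as complex numbers, where the modulus is the amplitude and the argument is the phase. This is a sketch of that representation, not something taken from the slides; the function name is hypothetical.

```python
import cmath
import math


def wave_vector(amplitude, phase_deg):
    """Represent a wave as a complex number: length = amplitude, angle = phase."""
    return cmath.rect(amplitude, math.radians(phase_deg))


a = wave_vector(1.0, 45.0)
b = wave_vector(1.0, 45.0)   # same phase as a: the waves reinforce
c = wave_vector(1.0, 225.0)  # 180º away from a: the waves cancel

reinforced = a + b
cancelled = a + c
print(round(abs(reinforced), 3), round(math.degrees(cmath.phase(reinforced)), 1))
print(round(abs(cancelled), 3))
```

Adding the complex numbers is the same as adding the waves, which is why the vector picture is useful for the steps that follow.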
Now we have two data sets. One set is HG, the other is native (NAT). We can use a technique called the Patterson function to locate the coordinates of the Hg atom. The Patterson function doesn’t require phases.
Once we locate (in x,y,z) the Hg atom, we actually know its contribution to each diffraction spot: its little vector!
Now Miller indices are a very convenient way of thinking about diffraction from a crystal. A more accurate way of thinking about what makes a given data point (h,k,l) relatively intense or weak is given by the structure-factor formula:
\[ F_{hkl} = \sum_{j} f_j \, e^{2\pi i (h x_j + k y_j + l z_j)} \]
where f_j is the scattering factor of atom j and (x_j, y_j, z_j) are its fractional coordinates in the unit cell.
Fhkl is a vector. It is the sum of all the little vectors from all the atoms in the cell. But we have located the Hg atom, so we know its x, y, z. So we know the length and the direction (phase) of the contribution to the reflection made by the Hg atom! We will call this Fhg.
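As an illustration of how one atom's "little vector" follows from its position, the sketch below evaluates a single term of the standard structure-factor sum, f·exp[2πi(hx + ky + lz)]. The Hg coordinates are hypothetical, and the scattering factor is assumed to be a constant 80 (the number of electrons in Hg), ignoring its fall-off with scattering angle.

```python
import cmath
import math


def atom_contribution(scattering_factor, xyz, hkl):
    """One atom's contribution to F_hkl: f * exp(2*pi*i*(h*x + k*y + l*z)).

    xyz are fractional coordinates in the unit cell.
    """
    x, y, z = xyz
    h, k, l = hkl
    return scattering_factor * cmath.exp(2j * math.pi * (h * x + k * y + l * z))


hg_xyz = (0.25, 0.10, 0.40)  # hypothetical Hg position (fractional coordinates)
f_hg = 80.0                  # assumed constant scattering factor for Hg

for hkl in [(1, 0, 0), (2, 1, 0), (1, -1, 0)]:
    contribution = atom_contribution(f_hg, hg_xyz, hkl)
    phase_deg = math.degrees(cmath.phase(contribution)) % 360
    print(hkl, round(abs(contribution), 1), round(phase_deg, 1))
```

Each reflection gets its own Fhg: the length is set by the scattering factor and the phase by where the atom sits relative to that family of planes.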
[Figure: vectors Fhg and Fhkl]
So what we have are a bunch of |Fhkl|s: we have magnitudes but not directions. So we will represent them as circles with radii that are proportional to their magnitudes.
|FNAT|
Native reflection hkl
|FHG|
HG reflection hkl
And for the hkl reflection, we know vector Fhg (Note an Fhg for each hkl)
We also know that FNAT + Fhg = FHG (a vector sum; only the magnitudes |FNAT| and |FHG| are measured)
Or FHG - Fhg = FNAT
[Figure: Harker construction. The native circle has radius |FNAT|; the HA (heavy-atom) circle has radius |FHG| and is offset by -Fhg.]
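The circle construction can also be worked out algebraically. The sketch below finds the native phases for which a vector of length |FNAT| plus the known Fhg has length |FHG|, using the law of cosines. All numbers are made up, and the function and variable names are hypothetical.

```python
import cmath
import math


def possible_native_phases(f_nat_mag, f_derivative_mag, f_heavy_atom):
    """Phases (degrees) of F_NAT consistent with |F_NAT + Fhg| = |F_HG|.

    From |F_NAT|^2 + |Fhg|^2 + 2|F_NAT||Fhg|cos(phi - phi_hg) = |F_HG|^2.
    """
    fh_mag = abs(f_heavy_atom)
    cos_term = (f_derivative_mag**2 - f_nat_mag**2 - fh_mag**2) / (
        2 * f_nat_mag * fh_mag
    )
    if abs(cos_term) > 1:
        return []  # the circles do not intersect (e.g. measurement error)
    delta = math.acos(cos_term)
    phi_h = cmath.phase(f_heavy_atom)
    return sorted(math.degrees(phi_h + sign * delta) % 360 for sign in (1, -1))


# Made-up example: true native phase 60º, heavy-atom vector of length 30 at 10º
f_nat_true = cmath.rect(100.0, math.radians(60.0))
f_heavy = cmath.rect(30.0, math.radians(10.0))
f_derivative_mag = abs(f_nat_true + f_heavy)

phases = possible_native_phases(100.0, f_derivative_mag, f_heavy)
print([round(p, 1) for p in phases])
```

The two circles generally cross at two points, so a single derivative leaves two candidate phases (here about 60º and 320º); a second derivative or other information is needed to choose between them.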
Another derivative (or other help)…
Structure solved at CAMD
IQGAP1 “GAP-related domain” 43 kD
IQGAP1 GRD vs p120 RasGAP
HIV matrix
Tiam1 Rac1