New Phasing Strategies

Carmelo Giacovazzo
Istituto di Cristallografia, CNR, Bari, Italy

MCS2016 MADRID, 25-29 May 2016

Ab initio Patterson techniques: based on superposition techniques and extrapolation of structure factors

Tests on proteins with:
• data at non-atomic resolution (1.5 Å < RES < 2.1 Å)
• number of non-H atoms in the asymmetric unit up to about 7890
• an atomic species from Fe upwards

The superposition method is based on the following: the Patterson method provides the interatomic vectors

$$\{\mathbf{r}_i - \mathbf{r}_j,\ i,j = 1,\dots,n_{atoms}\}$$

If $\mathbf{r}_H$ is located, then

$$\{\mathbf{r}_i - \mathbf{r}_j + \mathbf{r}_H,\ i,j = 1,\dots,n_{atoms}\}$$

provides, say, the structure

$$\{\mathbf{r}_i,\ i = 1,\dots,n_{atoms}\}$$

together with big noise. Extrapolation techniques will be explained later.
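The containment stated above can be checked on a toy case. The following is a minimal sketch, assuming a hypothetical 2D point structure on an integer grid and ignoring lattice periodicity and symmetry; the function names are illustrative and do not come from any phasing program.

```python
import itertools

def patterson_vectors(positions):
    """All interatomic vectors r_i - r_j (null vectors i == j included)."""
    return {tuple(a - b for a, b in zip(ri, rj))
            for ri, rj in itertools.product(positions, repeat=2)}

def shifted_vectors(vectors, r_h):
    """The vector set translated by the located position r_H."""
    return {tuple(v + h for v, h in zip(vec, r_h)) for vec in vectors}

# Hypothetical 2D structure; the second atom plays the role of r_H.
atoms = [(0, 0), (3, 1), (5, 4), (1, 6)]
r_h = atoms[1]

shifted = shifted_vectors(patterson_vectors(atoms), r_h)
recovered = set(atoms) <= shifted      # every atomic position is present (case j = H)
noise = shifted - set(atoms)           # all the other vectors: the "big noise"
print(recovered, len(noise))
```

The positions appear because the term with j = H reduces to r_i; every other pair contributes a spurious point, which is why the slide speaks of big noise.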

Code Name   PDB Code   Nasym (Resid)   HA   RES (RESEff)   RSh (%)

VIVAX 1Z1Y 3243 (372) 9 Yb 2.00 (2.00) 20

CATHE 2ATO 1648 (215) 1 Au 2.00 (2.00) 18

PROTA 1BUU 1283 (168) 1 Ho 1.92 (1.93) 21

CYANA 1F2W 2077 (259) 2 Hg 1 Zn 1.90 (1.95) 13

RUBR_NI 1R0J 420 (54) 1 Ni 1.90 (1.95) 18

R2YFD 1YFD 6115 (750) 13 Hg 4 Fe 1.90 (1.90) 24

R2JPR 1JPR 6112 (750) 14 Hg 4 Mn 1.88 (1.89) 23

R2PM2 1PM2 5562 (678) 14 Hg 4 Yb 1.80 (1.80) 27

SUBMBPA 1YTT 1770 (115) 4 Yb 1.80 (1.80) 28

CARBOPEP 1ARM 2463 (309) 4 Hg 1 Cu 1.76 (1.80) 27

CALMOPB 1N0Y 2339 (296) 14 Pb 1 As 1.75 (1.75) 29

LYSOH87 1H87 1054 (129) 2 Gd 1.72 (1.72) 32

CA2F14 2F14 2123 (259) 1 Hg 1 Zn 1.71 (1.71) 30

CUTA1 1NAQ 5832 (672) 10 Hg 1.70 (1.81) 24


ADP_GD 2YZW 2325 (303) 2 Gd 1.70 (1.70) 33

RUBR_CO 1R0H 420 (54) 1 Co 1.70 (1.70) 32

OXA10 1E3U 7890 (984) 8 Au 1.65 (1.65) 36

R2JQC 1JQC 6111 (750) 13 Hg 4 Mn 1.61 (1.61) 38

APHA 1N8N 1654 (212) 1 Au 1.60 (1.71) 31

CA2G0E 1G0E 2072 (260) 1 Hg 1 Zn 1.60 (1.65) 36

FERRE170 1A70 735 (97) 2 Fe 1.60 (1.62) 34

RUBR_HG 1R0G 420 (54) 1 Hg 1.60 (1.61) 42

RUBR_GA 1R0F 420 (54) 1 Ga 1.60 (1.61) 38

PYRR 1A3C 1416 (181) 2 Sm 1.60 (1.60) 30

LIPID 2FJ9 700 (86) 1 Pb 1 Zn 2 Cl 1.56 (1.56) 45

PAZUR 1PAZ 917 (123) 1 Cu 1.55 (1.56) 43

PCOC 1IX2 1335 (208) 8 Se 1.54 (1.55) 38

FERRICYTO 1CCR 907 (112) 1 Fe 1.50 (1.69) 38

MUTA 2FP1 2606 (166) 2 Pb 1.50 (1.53) 42

RUBR_CD 1R0I 420 (54) 1 Cd 1.50 (1.53) 50

PDK1 1W1D 1289 (151) 1 Au 1.50 (1.50) 45

Code Name    CORR_SIR2007   CORR   CORRextr   Trial   Time
VIVAX        0.40           0.33   0.42       1       −
CATHE        0.21           0.21   0.23       16      −
PROTA        0.78           0.75   0.78       1       1h6’
CYANA        0.28           0.27   0.35       1       −
RUBR_NI      0.22           0.21   0.23       27      −
R2YFD        0.55           0.49   0.66       2       24h44’
R2JPR        0.54           0.61   0.68       1       2h49’
R2PM2        0.65           0.58   0.71       3       35h42’
SUBMBPA      0.68           0.71   0.73       1       2h18’
CARBOPEP     0.56           0.45   0.66       1       2h31’
CALMOPB      0.75           0.74   0.75       1       1h7’
LYSOH87      0.43           0.35   0.55       3       −
CA2F14       0.37           0.31   0.42       1       −
CUTA1        0.65           0.68   0.74       1       6h47’
ADP_GD       0.65           0.54   0.70       1       2h04’
RUBR_CO      0.28           0.31   0.73       9       22h27’
OXA10        0.79           0.75   0.79       1       1h18’
R2JQC        0.66           0.66   0.77       1       2h7’
APHA         0.26           0.33   0.83       34      864h
CA2G0E       0.55           0.63   0.74       1       1h27’
FERRE170     0.29           0.73   0.82       1       1h14’
RUBR_HG      0.74           0.81   0.85       1       17’
RUBR_GA      0.78           0.78   0.78       1       9’
PYRR         0.75           0.48   0.75       13      58h47’
LIPID        0.66           0.68   0.76       1       48’
PAZUR        0.84           0.83   0.86       1       14’
PCOC         0.78           0.80   0.81       1       1h37’
FERRICYTO    0.73           0.33   0.78       1       46’
MUTA         0.77           0.72   0.77       1       17’
RUBR_CD      0.30           0.79   0.82       3       1h38’
PDK1         0.65           0.65   0.76       1       11’
PPEC150      0.79           0.27   0.80       2       3h52’

Direct Methods at non-atomic resolution

A new formula has recently been developed, exploiting the model that becomes progressively available during the phasing process. The formula is the following:

$$\tan \xi_{\mathbf{h}} = b_{\mathbf{h}} / a_{\mathbf{h}}$$

[The explicit expressions of $b_{\mathbf{h}}$ and $a_{\mathbf{h}}$ are not recoverable from the transcript. What survives shows that $b_{\mathbf{h}}$ collects sine terms and $a_{\mathbf{h}}$ the corresponding cosine terms: a model term in $\sigma_A R_{\mathbf{h}} R_{p\mathbf{h}}$ with phase $\phi_{p\mathbf{h}}$, plus terms weighted by $\beta_{\mathbf{h},\mathbf{k}}$ that combine the triplet contribution $R_{\mathbf{k}} R_{\mathbf{h}-\mathbf{k}}$, with phase $\phi_{\mathbf{k}} + \phi_{\mathbf{h}-\mathbf{k}}$, and its $\sigma_A$-weighted analogues in which $R_p$, $\phi_p$ replace $R$, $\phi$ for one or both reflections.]

• We will apply the formula to the set of structures with data resolution between 1.5 Å and 2.1 Å, containing atoms heavier than Ca, to which Patterson methods were successfully applied.

• Showing that our approach succeeds should weaken the common belief that atomic resolution is a necessary ingredient for ab initio phasing by Direct Methods.

The three tangent procedure
• After the first tangent (from random phases), phase estimates are processed by EDM+FL cycles,
• then a second tangent procedure is applied to those phases, followed by EDM+FL cycles.
• A third tangent is finally applied, followed by EDM+FL cycles.
• Without FL the full procedure is less effective.
• A suitable FOM stops the trial and recognizes the correct solution.

PDB code   RES    Resid   HA            Ntrial   CC     COV
3ajw       2.10   135     Hg 1          101      0.62   0
1crm*      2.02   260     Hg 4          -        -      0
1z1y*      2.00   316     Yb 9          -        -      0
1buu       1.93   149     Ho 1          11       0.79   99
1yfd       1.90   694     Hg 13 Fe 4    -        -      0
1jpr       1.89   695     Hg 14 Mn 4    -        -      0
1naq*      1.81   636     Hg 18         -        -      0
1arm*      1.80   312     Hg 4 Cu 1     65       0.52   0
1pm2       1.80   692     Hg 14 Mn 4    -        -      0
1ytt       1.80   227     Yb 4          178      0.72
1r0h*      1.78   53      Co 1          -        -      0
1n0y       1.75   168     Pb 14         42       0.90   21
1h87*      1.72   129     Gd 2          -        -      0
2f14*      1.71   258     Hg 1          -        -      0
2yzw       1.70   283     Gd 2          44       0.53   0
1g0e       1.70   259     Hg 1          52       0.62   0
1ccr       1.69   112     Fe 1          -        -      0
1e3u       1.65   978     Au 8          192      0.83   99
2p09       1.65   70      Zn 1          -        -      0
1a70       1.62   97      Fe 2          -        -      0
1jqc       1.61   694     Hg 13 Mn 4    -        -      0
1r0f       1.61   54      Ga 1          4        0.85   98
1r0g       1.61   54      Hg 1          10       0.86   97
1iha       1.60   414*    Rh 2 Br 2     47       0.63   -
2fj9       1.56   86      Pb 1 Zn 1     143      0.80   98
1paz       1.56   121     Cu 1          2        0.89   99
1ix2       1.55   205     Se 10         -        -      -
2fp1       1.53   329     Pb 2          81       0.83   96
1r0i       1.53   53      Cd 1          5        0.87   98


PDB code tan1(Ffom2) tan2(Ffom2) tan3(Ffom2) Final

3ajw 88.19(1.65) 85.39(1.92) 72.45(2.38) 47.13

1crm 50.86(1.09) 53.28(1.10) 54.02(1.00) 71.27

1z1y 69.21(1.15) 59.53(1.55) 53.78(1.54) 67.74

1buu 86.28(1.75) 76.80(3.82) 15.17(3.67) 38.38

1yfd 63.54(1.76) 48.56(2.09) 48.67(1.81) 66.69

1jpr - - - -

1naq - - - -

1arm 70.43(1.12) 46.49(1.19) 42.74(1.28) 57.58

1pm2 - - - -

1ytt 48.09(3.05) 31.75(3.24) 33.54(3.13) 40.94

1r0h - - - -

1n0y 54.06(7.04) 41.79(7.28) 42.69(7.27) 24.71

1h87 - - - -

2f14 57.71(2.06) 54.12(2.17) 50.04(2.03) 68.62

2yzw 69.85(2.33) 47.93(2.10) 37.28(2.41) 52.94

1g0e 51.41(2.58) 43.87(2.73) 41.68(2.68) 52.94

1ccr - - - -

1e3u 76.37(3.43) 18.20(4.10) 14.41(4.00) 37.56

2p09 - - - -

1a70 - - - -

1jqc - - - -

1r0f 57.67(4.55) 34.11(4.60) 34.49(4.53) 36.06

1r0g 69.87(1.58) 32.78(1.51) 33.18(1.58) 32.58

1iha 63.28(3.19) 44.05(3.14) 44.93(3.28) 52.00

2fj9 72.10(1.36) 69.58(1.48) 64.74(2.18) 39.37

1paz 64.00(3.90) 38.48(3.96) 35.17(4.05) 32.69

1ix2 - - - -

2fp1 63.76(3.65) 33.05(3.92) 32.47(3.80) 39.35

1r0i 74.73(1.88) 45.25(3.76) 28.35(3.65) 32.20

TECHNIQUES FOR PHASE IMPROVEMENT

EDM techniques
• Suppose we have successfully phased by one of the ab initio, SAD-MAD, SIR-MIR or MR techniques.
• The phases obtained are usually not good enough to allow automated model building (AMB) programs [e.g., ARP/wARP, PHENIX, MAID, MAIN, Buccaneer] to provide reliable models of the protein.
• EDM (electron density modification) techniques are the traditional intermediate step: they modify electron density maps to improve the phases.

Main sources of information for improving the electron densities

• Positivity of the electron density
• Experimental phases (used in phase combination)
• Flatness of the solvent region (used in solvent flattening)
• Distribution of the density in the e.d. map (used in Histogram Matching procedures)
• Non-crystallographic symmetry (used in molecular averaging)

The iterative flow-chart

• EDM are essentially iterative techniques:

  {Fobs, φexp} → FT → ρ(r) → EDM
        ↑                     ↓
  {Fobs, φcomb}               ↓
        ↑                     ↓
  {Fmod, φmod} ← FT⁻¹ ← ρmod(r)

• Usually φcomb is the combination of φmod with φexp.
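The loop in the flow-chart can be sketched in a few lines of NumPy. This is a toy 1D illustration under stated assumptions, not the scheme of any EDM program: the modification step is plain positivity truncation, the starting phases come from a noisy copy of a synthetic map (standing in for φexp), and φcomb is simply set equal to φmod because no experimental phase weights are modelled.

```python
import numpy as np

def edm_cycle(f_obs, phases, modify):
    """One cycle: FT -> rho -> modification -> FT^-1 -> phases for the next map."""
    rho = np.fft.ifft(f_obs * np.exp(1j * phases)).real   # current map from {Fobs, phases}
    rho_mod = modify(rho)                                  # density modification
    f_mod = np.fft.fft(rho_mod)                            # {Fmod, phi_mod}
    return np.angle(f_mod), rho_mod                        # phi_mod is used as phi_comb

def positivity(rho):
    """Assumed modification: truncate negative density."""
    return np.clip(rho, 0.0, None)

rng = np.random.default_rng(0)
n = 128
rho_true = np.zeros(n)
rho_true[rng.choice(n, size=8, replace=False)] = 1.0       # toy point-atom structure
f_obs = np.abs(np.fft.fft(rho_true))                        # observed moduli

# Starting phases from a noisy copy of the map (stand-in for phi_exp).
phases = np.angle(np.fft.fft(rho_true + 0.5 * rng.standard_normal(n)))
for _ in range(10):
    phases, rho_mod = edm_cycle(f_obs, phases, positivity)

f_new = f_obs * np.exp(1j * phases)                         # {Fobs, phi_comb}
```

The design point mirrored from the flow-chart is that the observed moduli are re-imposed at every cycle; only the phases are taken from the modified map.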


Structure factor extrapolation FREE LUNCH

About the data resolution limit

Data resolution limits the information provided by the diffraction experiment. This information may not be enough for solving a protein by ab initio methods.
• The basic question is: is it possible to add some supplementary information to the experimental data by extrapolating moduli and phases of unobserved reflections?
• The answer is positive.

[Figure: panels labelled 1. Å, 2.5 Å, 3.5 Å and 4.0 Å]

• The structure factor extrapolation (SFE) method was applied to two different tasks:

• i) increasing the rate of success of the ab initio phasing procedures;

• ii) improving the electron density available at the end of the phasing process (the best attainable by the current phasing techniques: SIR-MIR, SAD-MAD, MR, ab initio). In this last case it takes the name of FREE LUNCH.

SFE during ab initio phasing procedures
• The number of extrapolated reflections should be neither too large (they should not dominate the phasing process) nor too small (they would be negligible for the success of the phasing process).

• SFE essentially improves the current electron density. Modifying that density and Fourier inverting it improves the phase values of the observed reflections.

• Thus the phasing procedure becomes more robust.
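How unobserved reflections can receive estimated moduli and phases may be illustrated as follows. This is a minimal 1D sketch, assuming a hypothetical resolution cut-off `h_max` and a crude modification (keeping only the strongest 10% of pixels); the function name and parameters are illustrative and are not taken from the FREE LUNCH implementation.

```python
import numpy as np

def extrapolate(f_known, h_max, keep_fraction=0.1):
    """Estimate reflections with |h| > h_max from a modified map (sketch)."""
    n = f_known.size
    h = np.abs(np.fft.fftfreq(n, d=1.0 / n))                 # |h| of each coefficient
    observed = h <= h_max
    rho = np.fft.ifft(np.where(observed, f_known, 0)).real   # map from observed data only
    threshold = np.quantile(rho, 1.0 - keep_fraction)
    rho_mod = np.where(rho >= threshold, rho, 0.0)            # keep the strongest pixels
    f_mod = np.fft.fft(rho_mod)
    # Observed reflections keep their values; the others take the extrapolated ones.
    return np.where(observed, f_known, f_mod), observed

rng = np.random.default_rng(1)
n, h_max = 128, 20
rho_true = np.zeros(n)
rho_true[rng.choice(n, size=6, replace=False)] = 1.0          # toy point-atom structure
f_true = np.fft.fft(rho_true)

f_ext, observed = extrapolate(f_true, h_max)
```

Because the modified map is no longer band-limited, its transform is non-zero beyond `h_max`; those coefficients are the extrapolated estimates, while the observed ones are left untouched.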

THE FREE LUNCH APPROACH
• Phasing processes (ab initio, SAD-MAD, SIR-MIR, SIRAS-MIRAS) usually end with phase errors varying from 30° to 75°.

• Therefore the crucial task of interpreting the final electron density map may not be straightforward: additional work may be required to locate main and side chains.

• Structure factor extrapolation (beyond and behind RES) helps to overcome some of the drawbacks generated by the limited RES value.

• The following slide concerns the protein R2pm2 (1PM2): nCh = 2, NRES = 750
• Heavy atoms: 14 Hg, 4 Yb; Resolution = 1.80 Å
• Images by PyMOL corresponding to ARP/wARP autobuilding under different conditions.
• The order in the slide is the following: True Structure; After SIR2008; After Solomon; After Solomon+Free Lunch

[Figure: R2PM2, four panels: true structure; after ab initio; after EDM; after EDM+FL]

The VLD algorithm for ab initio phasing and for improving electron densities

Difference Fourier synthesis
• Let $\rho(\mathbf{r})$ be the target structure and $\rho_p(\mathbf{r})$ be a model structure; then

$$\rho(\mathbf{r}) - \rho_p(\mathbf{r}) = \frac{1}{V}\sum_{\mathbf{h}} \left[\,|F_{\mathbf{h}}|\exp(i\phi_{\mathbf{h}}) - |F_{p\mathbf{h}}|\exp(i\phi_{p\mathbf{h}})\right]\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{r})$$

• Since the phases of $\rho(\mathbf{r})$ are unknown, it is usually chosen $\phi_{\mathbf{h}} = \phi_{p\mathbf{h}}$, so that

$$\rho(\mathbf{r}) - \rho_p(\mathbf{r}) \approx \frac{1}{V}\sum_{\mathbf{h}} \left[\,|F_{\mathbf{h}}| - |F_{p\mathbf{h}}|\right]\exp(i\phi_{p\mathbf{h}})\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{r})$$
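The approximation with model phases can be tried on a toy case. The following is a minimal sketch, assuming a hypothetical 1D structure of 20 equal point atoms of which the model contains 19; NumPy's FFT sign convention is used instead of the crystallographic one, which does not affect the map.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
sites = rng.choice(n, size=20, replace=False)    # hypothetical point atoms on a 1D grid
missing = sites[-1]                               # the atom absent from the model

rho = np.zeros(n)
rho[sites] = 1.0                                  # target structure
rho_p = np.zeros(n)
rho_p[sites[:-1]] = 1.0                           # model structure

f = np.fft.fft(rho)
f_p = np.fft.fft(rho_p)

# Coefficients (|F| - |Fp|) exp(i phi_p): target phases replaced by model phases.
coeff = (np.abs(f) - np.abs(f_p)) * np.exp(1j * np.angle(f_p))
diff_map = np.fft.ifft(coeff).real

strongest = int(np.argmax(diff_map))              # position of the highest difference peak
```

In this toy case the highest peak of the difference map falls on the missing atom, at reduced height and surrounded by noise, which motivates the weighting discussed next.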

The relation $\phi_{\mathbf{h}} = \phi_{p\mathbf{h}}$ needs to be weighted. A good way of doing that is to replace the Fourier coefficients

$$\left[\,|F_{\mathbf{h}}| - |F_{p\mathbf{h}}|\right]$$

• by (Read, 1986)

$$\left(m|F| - D|F_p|\right)\exp(i\phi_p)$$

• In terms of normalized structure factors, by

$$\left(mR - \sigma_A R_p\right)$$

• where m is the weight of the reflection, and

$$\sigma_A = D\left(\Sigma_p / \Sigma_N\right)^{1/2}$$

• takes into account the correlation between model and target structure.

The VLD formula
• The best coefficient for a difference Fourier synthesis is

$$\left(mR - \sigma_A R_p\right) - R_p\,(1 - D)$$

• When the model is sufficiently good, D ≈ 1 and VLD agrees with Read's expression.

• When the model is poor, D ≈ 0 (and therefore $\sigma_A \approx 0$) and the VLD coefficient becomes

$$\left(mR - R_p\right)$$
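The two limits quoted above follow directly from the coefficient. A minimal sketch, with illustrative numbers that are not taken from the slides and function names chosen here for convenience:

```python
def read_coefficient(m, r, r_p, sigma_a):
    """Read (1986) difference coefficient in normalized form: mR - sigma_A * Rp."""
    return m * r - sigma_a * r_p

def vld_coefficient(m, r, r_p, sigma_a, d):
    """VLD difference coefficient: (mR - sigma_A * Rp) - Rp * (1 - D)."""
    return read_coefficient(m, r, r_p, sigma_a) - r_p * (1.0 - d)

# Illustrative values only.
m, r, r_p = 0.8, 1.5, 1.0

good = vld_coefficient(m, r, r_p, sigma_a=0.9, d=1.0)   # good model: D = 1
poor = vld_coefficient(m, r, r_p, sigma_a=0.0, d=0.0)   # poor model: D = sigma_A = 0
```

With D = 1 the extra term vanishes and the Read coefficient is returned; with D = σ_A = 0 the result is mR − Rp.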

• Steps of the VLD algorithm (Burla, Giacovazzo & Polidori (2010), J. Appl. Cryst. 43, 825-836)
• A random E-electron density map is calculated.
• The model structure $\rho_p$ is obtained by selecting the 2.5% of pixels with the largest intensity.
• The calculation of $\sigma_A$ is performed.
• A difference electron density map $\rho_q$ is calculated via the best coefficients.
• $\rho_q$ is modified (by selecting the 4% of pixels with the largest positive values and the 4% of pixels with the largest negative values) and added to $\rho_p$ to obtain a new estimate of $\rho$:

$$\rho_{new} = \rho_p + \rho_q$$

• The corresponding E-map is calculated and submitted to cycles of EDM. In each cycle the parameter $\sigma_A$ is updated.

• At the end a new model is obtained and a next iteration starts.

• Please note, direct methods are not needed. All is made in direct space.
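The direct-space selection steps listed above can be sketched as follows. This is an illustrative sketch only: random arrays stand in for the E-map and for the difference map, the helper names are hypothetical, and the Fourier steps (σ_A estimation, calculation of the difference coefficients) are omitted.

```python
import numpy as np

def select_model(rho, fraction=0.025):
    """Keep the given fraction of pixels with the largest density; zero the rest."""
    threshold = np.quantile(rho, 1.0 - fraction)
    return np.where(rho >= threshold, rho, 0.0)

def modify_difference(rho_q, fraction=0.04):
    """Keep the strongest positive and the strongest negative pixels of a difference map.

    Assumes the upper and lower quantiles are positive and negative respectively.
    """
    hi = np.quantile(rho_q, 1.0 - fraction)
    lo = np.quantile(rho_q, fraction)
    return np.where((rho_q >= hi) | (rho_q <= lo), rho_q, 0.0)

rng = np.random.default_rng(0)
rho = rng.standard_normal(10_000)       # stand-in for a random E-map
rho_q = rng.standard_normal(10_000)     # stand-in for the difference map

rho_p = select_model(rho)                          # model: 2.5% strongest pixels
rho_q_mod = modify_difference(rho_q)               # 4% positive + 4% negative pixels
rho_new = rho_p + rho_q_mod                        # new estimate of the density
```

Everything here operates on pixels of the map, in line with the remark that the procedure works in direct space.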

• SMALL STRUCTURES (100 test structures, up to 80 non-H atoms in the asymmetric unit)
• All solved in default, <t> = 21 sec

• 24 MEDIUM SIZE STRUCTURES
• From 81 to 250 non-H atoms in the a.u.
• All solved in default, <t> = 8.55 min

• PROTEINS AT ATOMIC RESOLUTION
• Better than standard DM

MOLECULAR REPLACEMENT

REVAN, a pipeline for MR
• The pipeline REVAN aims at solving protein structures via molecular replacement (MR) even when the sequence identity (SI) between target and model is smaller than 0.30.

• REVAN combines a variety of programs and algorithms (REMO09, REFMAC, DM, DSR, VLD, FREE LUNCH, COOT, BUCCANEER and PHENIX-AutoBuild).

• The MR model, suitably rotated and positioned, is first refined by REFMAC, used in an extraordinarily intensive way. A large number of cycles, usually between 75 and 200, is performed: the number is fixed by an automatic stop criterion.

• That leads to significant model improvements.

The last REFMAC cycle provides a new model template defined by a set of atomic positions and by the corresponding vibrational factors.

The corresponding electron density is submitted to cycles of DM-VLD-FL, which usually improve the REFMAC phases by 4°-5°.

How can the better electron density be exploited to obtain a new, better REFMAC molecular model without passing through the ex-novo model rebuilding step, which may fail because the electron density, at this stage, is still of poor quality?

In a recent paper on crystallographic least squares (Giacovazzo, 2015) the use of the so-called vector refinement (Arnold & Rossmann, 1988), a special least-squares procedure, was suggested.

It was shown that such refinement minimizes the difference between the current electron density, as computable directly from the model (the REFMAC model in our case), and the electron density corresponding to the higher quality reflections (obtained by DM-VLD-FL in our case).

• Luckily, REFMAC has an option (denoted as phased maximum likelihood application) substantially equivalent to vector refinement.

• Cycles of DM-VLD-FL-REFMACpd are performed, which are able to improve previous molecular models.

• If the previous steps do not lead to a model with high FOM, the template is remodeled in order to be more similar to the target (first, a polyalanine model is created, and cycles of REFMACP are launched to refine this pruned model).

• COOT is applied for automatically mutating the model residues with low thermal factors.

• Then the sequence EDM-VLD-EDM is launched to obtain a new and probably better electron density.


• The best rotamer orientations, selected using the MolProbity library (Lovell et al., 2000), are scored by searching for the best fitting between the current molecular model and the current electron density.

• Gaps between the protein-chain fragments are filled using COOT: the fragments are extended via additional residues at the N- and C-termini according to the protein-model alignment

• FREE LUNCH is launched; then BUCCANEER automatically starts to perform the AMB process. The program stops if success is obtained (in practice, if Rfree is significantly smaller than 0.4); otherwise the phases obtained from the BUCCANEER model are used as the starting point for an additional application of the sequence EDM-VLD-EDM. The resulting values are then used as the starting point for a new AMB application.

• The sequence EDM-VLD-EDM-BUCCANEER is cycled up to four times, and stops if Rfree, calculated by BUCCANEER at cycle (n+1), is larger than the previous one.
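The stop logic described above can be written down as a small control loop. This is a sketch of the control flow only: `edm_vld_edm` and `buccaneer` are caller-supplied placeholders (hypothetical wrappers, not the real program interfaces), the threshold 0.4 is used as a plain cut-off because the slides do not quantify "significantly smaller", and keeping the previous model when Rfree rises is an assumption.

```python
def cycle_amb(phases, edm_vld_edm, buccaneer, rfree_ok=0.4, max_cycles=4):
    """Alternate EDM-VLD-EDM and model building; stop on success or when Rfree rises.

    edm_vld_edm(phases) -> improved phases            (hypothetical wrapper)
    buccaneer(phases)   -> (model_phases, rfree)      (hypothetical wrapper)
    """
    phases, rfree = buccaneer(phases)             # first AMB run, after FREE LUNCH
    history = [rfree]
    for _ in range(max_cycles):
        if rfree < rfree_ok:                      # success criterion
            break
        new_phases, new_rfree = buccaneer(edm_vld_edm(phases))
        history.append(new_rfree)
        if new_rfree > rfree:                     # Rfree got worse: keep previous model
            break
        phases, rfree = new_phases, new_rfree
    return phases, rfree, history

# Usage with fake callables returning a scripted sequence of Rfree values.
fake_rfree = iter([0.52, 0.47, 0.45, 0.46])
result = cycle_amb(
    phases=None,
    edm_vld_edm=lambda p: p,
    buccaneer=lambda p: (p, next(fake_rfree)),
)
```

In the scripted run the loop stops at the fourth model building, where Rfree rises from 0.45 to 0.46, and returns the model with Rfree = 0.45.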

DIFFICULT CASES
• DiMaio et al. (2011; see also Adams et al., 2013) described new algorithms for difficult MR cases, where it is possible to correctly locate the model, but the EDM-improved electron density map is too noisy to be interpreted.

• This usually occurs when SI < 0.30, where atomic positions of the template and of the target structures may differ by 2-3 Å.


• To overcome the problem, a suite using physically realistic all-atom potential functions, originally designed to predict protein structures given their amino acid sequence (i.e., the program ROSETTA , see Das et al., 2009), was used for:

• a) identifying the correct MR solution, when ambiguous ;

• b) improving the model, until it may be interpreted, by combining force field and experimental electron density.


• The procedure is very efficient when looking at the results: it was able to solve, among others, the 13 structures reported in Table 1 ( see below) , all characterized by SI < 0.30.

• The procedure however requires considerable computing time: indeed, up to several thousand Rosetta models should be generated for each structure, and that leads the overall cpu time to vary from approximately 30 to 130 hours per structure.

• REVAN is able to solve nearly all DiMaio structures.

No.   (a)   <|Δφ|>(VE)   <|Δφ|>(VE)4   <|Δφ|>(VE)5   <|Δφ|>(VE)6   <|Δφ|>(VE)7   <|Δφ|>(VE)8   (b)   Rfree
1     76    65 / 57      66 / 58       61 / 54       63 / 54       62 / 54       60 / 53       31    0.29
2     76    70 / 67      66 / 63       64 / 61       66 / 63       67 / 64       n.p. / n.p.   62    0.48 P
3     75    69 / 65      66 / 63       56 / 54       58 / 54       n.p. / n.p.   55 / 53       32    0.31
6     81    53 / 48      52 / 46       51 / 46       n.p. / n.p.   n.p. / n.p.   51 / 46       30    0.29
7     81    60 / 57      59 / 55       58 / 54       58 / 55       60 / 55       n.p. / n.p.   30    0.30
10    75    51 / 47      51 / 47       51 / 46       n.p. / n.p.   n.p. / n.p.   51 / 46       47    0.44 P
11    84    72 / 66      67 / 59       68 / 58       70 / 57       72 / 59       n.p. / n.p.   27    0.29
12    75    35 / 32      34 / 31       n.p. / n.p.   n.p. / n.p.   n.p. / n.p.   n.p. / n.p.   25    0.29
13    73    50 / 44      50 / 44       50 / 44       49 / 43       50 / 43       49 / 43       32    0.32

Columns (a) and (b) carry no header in the transcript; each <|Δφ|> cell lists the two values printed for that column.

The PHANTOM DERIVATIVE METHOD (PhD)

• Let ρ be the unknown electron density of the target structure, and let |F| and φ be the corresponding amplitudes and phases.

• A large number of structures (the ancil structures), with the same unit cell parameters and the same space group as the target structure, uncorrelated with each other and with the target, are randomly created.

• Let ρa(j), j = 1,...,n, be the electron densities of the ancil structures, and |Fa(j)| and φa(j) the corresponding amplitudes and phases, both being a priori perfectly known.

The PhD method aims at exploiting the derivative electron densities

$$\rho_d(j) = \rho + \rho_a(j), \quad j = 1,\dots,n$$

for phasing the target. Since the ancil structures are not real structures (they are randomly generated), the derivative structures are also unreal, and therefore no experimental amplitude is available for them. That justifies the name of phantom derivative method.

We do not treat the ab initio case here. We suppose that a model map $\rho_0$ is available, e.g., from an MR technique.

The derivative model electron densities

$$\rho_{d0}(j) = \rho_0 + \rho_a(j), \quad j = 1,\dots,n$$

are calculated and summed into the sum function

$$\rho_s = \sum_{j=1}^{n} \rho_{d0}(j) = n\rho_0 + \sum_{j=1}^{n} \rho_a(j)$$

$\rho_s$ cannot provide any additional information on the target structure: indeed the ancil structures are uncorrelated with each other and with the target. Their sum goes into the background of the map.
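The statement that the ancil sum ends up in the background can be illustrated numerically. This is a toy sketch, assuming random arrays as stand-ins for the model map and for the ancil maps; it shows only that nρ0 grows faster than the sum of uncorrelated ancils, not that any information is gained beyond ρ0.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_ancils = 4096, 20

rho_0 = rng.standard_normal(n_pixels)                   # stand-in for the model map
ancils = rng.standard_normal((n_ancils, n_pixels))      # random, mutually uncorrelated ancil maps

derivatives = rho_0 + ancils                            # rho_d0(j) = rho_0 + rho_a(j)
rho_s = derivatives.sum(axis=0)                         # sum function

def corr(a, b):
    """Pearson correlation between two maps."""
    return float(np.corrcoef(a, b)[0, 1])

cc_single = corr(derivatives[0], rho_0)                 # one derivative against rho_0
cc_sum = corr(rho_s, rho_0)                             # sum function against rho_0
```

The sum function correlates more strongly with ρ0 than a single derivative does, because the model term adds coherently (n times) while the ancil terms add incoherently.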

If we modify $\rho_s$ by EDM techniques we obtain

$$\rho_{s\,mod} = \sum_{j=1}^{n} \rho_{d\,mod}(j) = \sum_{j=1}^{n} \rho_{0\,mod}(j) + \sum_{j=1}^{n} \rho_a(j)$$

It may be expected that each $\rho_{0\,mod}(j)$ will show structural features originally present in $\rho_0$ plus additional structural features generated by the EDM procedure. Summing the n maps $\rho_{0\,mod}(j)$ will increase the contrast between a better target model and the background created by the ancil densities. Cyclic applications of EDM+VLD+FL are more appropriate than a single EDM.

CODE   Res(Å)   NresT   MR   DM   PhD
1dy5   0.87     248     74   76   21
1bxo   0.90     323     74   71   21
2fc3   1.54     124     54   49   35
1tgx   1.55     180     58   54   41
2a46   1.65     217     69   57   37
1lys   1.72     258     53   51   45
1cgo   1.79     127     74   66   55
2otb   1.79     432     58   54   42
1kqw   1.80     134     59   54   43
2sar   1.85     192     52   47   37
1lat   1.90     145     70   68   66
1e8a   1.95     175     69   57   48
2f53   1.99     811     59   52   45
2ayv   2.00     148     53   48   43
2pby   2.07     1155    79   75   72
2f8m   2.09     472     64   57   51
1yxa   2.10     740     74   69   66
2f84   2.10     323     55   49   44
1cgn   2.15     125     73   65   57
1xyg   2.19     1380    64   58   54
2a4k   2.30     439     60   55   47
2b5o   2.50     584     50   49   45
1ycn   2.51     619     56   50   44
2iff   2.58     555     62   61   61

Procedure specifically designed for MR cases:

• Intensive application of REFMAC to MR phases, as in REVAN;

• The REFMAC phases are submitted to VLD+FL+PhD, which ends with new phase estimates: let <|Δφ|>R+V+F+P be their average phase error (again, in the VLD notation the DM step is contained).

CODE   <|Δφ|>MR   <|Δφ|>R   <|Δφ|>R+D   <|Δφ|>R+V+F+P
1dy5   74         21        21          22
1bxo   74         34        34          19
2fc3   54         34        33          31
1tgx   58         36        35          34
2a46   69         40        37          23
1lys   53         29        29          32
1cgo   74         57        53          35
2otb   58         37        36          32
1kqw   59         35        34          31
2sar   52         44        41          33
1lat   70         61        59          50
1e8a   69         40        39          36
2f53   59         30        29          36
2ayv   53         33        32          33
2pby   79         45        43          35
2f8m   64         44        42          37
1yxa   74         45        43          38
2f84   55         37        36          36
1cgn   73         49        46          30
1xyg   64         42        40          38
2a4k   60         37        35          31
2b5o   50         38        37          36
1ycn   56         33        32          33
2iff   62         65        64          79

Structures for which Buccaneer is unable to automatically provide a reliable molecular model (Unsolved Structures) under different conditions

Conditions    Unsolved structures
MD            1dy5, 1tgx, 1cgx, 1lat, 2pby, 1yxa, 1cgn, 2iff
PhD           1lat, 2pby, 1yxa, 1cgn, 2iff
REF+V+F+Pr    1lat, 2iff

Phase improvement via ancils related to the target structure

• Ancil structures, obtained by shifting the target by permissible origin translations, may be employed for refining model phases: i.e.,

$$\rho_a(\mathbf{r}) = \rho(\mathbf{r} - \mathbf{t})$$

• In this case

$$|F_a(\mathbf{h})| = |F(\mathbf{h})|, \qquad \phi_{a\mathbf{h}} = \phi_{\mathbf{h}} + 2\pi\,\mathbf{h}\cdot\mathbf{t}$$

$$|F_{d\mathbf{h}}| = 2\,|F_{\mathbf{h}}|\,|\cos(\pi\,\mathbf{h}\cdot\mathbf{t})|, \qquad \phi_{d\mathbf{h}} = \phi_{\mathbf{h}} + \pi\,\mathbf{h}\cdot\mathbf{t}$$

• The method enlarges the concept of ancil, provides the correct values of $|F_{d\mathbf{h}}|$ and significantly reduces the cpu refinement time.
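The relations above follow from F_d = F + F_a = F[1 + exp(2πi h·t)] = 2cos(π h·t) exp(iπ h·t) F, and can be checked numerically. This is a minimal 1D sketch, assuming a random array as the target density and a shift of a quarter of the cell; NumPy's FFT uses the opposite sign in the exponent, so the phase shift appears as −π h·t while the amplitude relation is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 64, 16                                  # grid size and shift in grid units (t/n of the cell)
rho = rng.random(n)                            # stand-in for the target density

rho_a = np.roll(rho, t)                        # rho_a(r) = rho(r - t)
f = np.fft.fft(rho)
f_d = np.fft.fft(rho + rho_a)                  # derivative = target + ancil

h = np.arange(n)
half_angle = np.pi * h * t / n                 # pi * h * t in fractional units
expected_amp = 2.0 * np.abs(f) * np.abs(np.cos(half_angle))
expected_f_d = 2.0 * np.cos(half_angle) * f * np.exp(-1j * half_angle)
```

The derivative amplitudes are therefore fixed algebraically by the target amplitudes; when cos(π h·t) is negative the sign is absorbed into the phase as an additional π.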

• The new PhD technique has advantages and disadvantages:
• in some space groups, only a limited number of permissible origin translations is available;
• centred or pseudocentred cell;
• but the derivative amplitudes are perfectly known (algebraically related to the target amplitudes).
• A maximum of 9 ancils is sufficient to improve the original model phases.

CODE   MR   DM   PhDr   PhDs
1dy5   74   76   21     19
1bxo   74   71   21     19
2fc3   54   49   35     30
1tgx   58   54   41     37
2a46   69   57   37     28
1lys   53   51   45     43
1cgo   74   66   55     56
2otb   58   54   42     37
1kqw   59   54   43     38
2sar   52   47   37     33
1lat   70   68   66     62
1e8a   69   57   48     48
2f53   59   52   45     44
2ayv   53   48   43     41
2pby   79   75   72     72
2f8m   64   57   51     48
1yxa   74   69   66     66
2f84   55   49   44     44
1cgn   73   65   57     54
1xyg   63   58   54     55
2a4k   60   55   47     45
2b5o   50   49   45     45
1ycn   56   50   44     45
2iff   62   61   61     64

• Under development:
• 1) PhD ab initio and non ab initio;
• 2) VLD based on the Fourier difference mF-2Fp rather than on mF-Fp. Much more powerful indeed.