

AMPLE – Using de novo protein structure modelling techniques to create and enhance search models for use in Molecular Replacement

Figure taken from: http://en.wikiversity.org/wiki/File:Cytochrome_C_Oxidase_1OCC_in_Membrane_2.png

Summary

Ab initio modelling offers exciting prospects for solving structures for novel proteins that cannot be solved with existing methods. There are many possibilities for extending and improving our approach that we will be exploring in future work.

REFS:
[1] Simons, K.T., Kooperberg, C., Huang, E., Baker, D., 1997. Journal of Molecular Biology 268, 209–225.
[2] Bibby, J., Keegan, R.M., Mayans, O., Winn, M.D., Rigden, D.J., 2012. Acta Crystallographica Section D Biological Crystallography 68, 1622–1631.
[3] Barth, P., Wallner, B., Baker, D., 2009. Proceedings of the National Academy of Sciences 106, 1409–1414.
[4] Marks, D.S., Colwell, L.J., Sheridan, R., Hopf, T.A., Pagnani, A., et al., 2011. PLoS ONE 6(12), e28766. doi:10.1371/journal.pone.002876

Ab initio Modelling with Rosetta

The Rosetta [1] suite of programs can be used to generate structures for proteins starting only from their sequence. Rosetta collects short fragments of residues of known structures similar to the target and then strings these fragments together to generate a collection of models of the target structure (decoys).

Ab initio Models for Molecular Replacement

• Modelling produces clusters of similar models: MR works effectively with superimposed ensembles approximating the target.
• Similarity within and between clusters indicates accuracy, so we can trim (truncate) inaccurate regions: MR may only require a partial model.

AMPLE

AMPLE [2] takes ab initio models (such as those generated with Rosetta) and clusters the most similar ones together. For each cluster it then truncates the most dissimilar residues using a range of different variance thresholds, leaving a core of similar residues.
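The truncation step can be illustrated with a minimal sketch. It is not AMPLE's actual code: it assumes the models in a cluster are already superimposed and reduced to one C-alpha coordinate per residue, and the function names and threshold values are hypothetical, chosen only for illustration.

```python
def residue_variances(models):
    """Per-residue positional variance across superimposed models.

    models: list of models, each a list of (x, y, z) C-alpha coordinates.
    All models are assumed to have the same length and a common frame.
    """
    n_models = len(models)
    variances = []
    for i in range(len(models[0])):
        points = [model[i] for model in models]
        centroid = [sum(p[k] for p in points) / n_models for k in range(3)]
        # Mean squared distance of this residue from its centroid position.
        spread = sum(
            sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in points
        ) / n_models
        variances.append(spread)
    return variances


def truncate_by_threshold(variances, threshold):
    """Indices of residues whose variance does not exceed the threshold."""
    return [i for i, v in enumerate(variances) if v <= threshold]


def truncation_levels(variances, thresholds):
    """One retained-residue list per threshold (hypothetical values)."""
    return {t: truncate_by_threshold(variances, t) for t in thresholds}


# Toy cluster: residues 0 and 1 agree, residue 2 varies a little,
# residue 3 varies a lot between the three models.
models = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 1.0, 0.0), (11.4, 6.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, -1.0, 0.0), (11.4, -6.0, 0.0)],
]
levels = truncation_levels(residue_variances(models), [0.5, 1.0, 30.0])
for threshold, kept in levels.items():
    print(threshold, kept)
```

Tighter thresholds keep fewer residues, so a range of thresholds yields a series of progressively smaller cores, each of which can be tried as a search model.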

The truncated models are clustered together and aligned to create an ensemble of similar models.
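As a rough illustration of grouping similar models, the sketch below picks the largest group of models lying within an RMSD radius of a central model. This is a simple neighbour-count scheme swapped in for illustration, not the clustering method AMPLE uses; it assumes the models are already superimposed, and the function names and the radius value are hypothetical.

```python
import math


def rmsd(model_a, model_b):
    """Root-mean-square deviation between two superimposed coordinate lists."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(model_a, model_b))
    return math.sqrt(total / len(model_a))


def largest_cluster(models, radius):
    """Indices of the largest group of models within `radius` of one centre."""
    best = []
    for centre in models:
        members = [
            j for j, model in enumerate(models) if rmsd(centre, model) <= radius
        ]
        if len(members) > len(best):
            best = members
    return best


# Toy example: models 0 and 1 are close, model 2 is an outlier.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0)],
]
print(largest_cluster(models, radius=1.0))
```

The members of the selected group would then be aligned and written out together as one ensemble.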

These ensembles are then used as candidate models in molecular replacement and have been found to be very successful.

Previous work was restricted to relatively small globular proteins, but we have been looking at extending the approach to more complex proteins.

We were granted early-access time on the Blue Wonder supercomputer at STFC's Hartree Centre at Daresbury Laboratory. This is an IBM System iDataPlex cluster comprising 8,192 Intel Xeon E5-2670 processor cores.

AMPLE was extended to make efficient use of the cluster, allowing us to rapidly benchmark the code on 113 different protein structures.

Transmembrane Proteins

Transmembrane proteins sit within the cell membrane and are in a non-aqueous environment that makes them difficult to work with experimentally. However, the membrane environment imposes additional constraints on the protein that can be used to guide and focus the modelling. This work has looked at extending AMPLE to work with Rosetta transmembrane models [3] for 18 different proteins.

Coiled-Coils

These are proteins in which the alpha helices are themselves coiled around each other. Placement of a small portion of one helix (sometimes even the wrong one) enables MR to "bootstrap" into the correct whole structure, so we are also looking at a collection of 95 different coiled-coil structures.

Contact Prediction

Recent work has significantly improved the accuracy with which residues that are in close contact with each other ("contacts") can be predicted from the sequence alone. This is due to a combination of the huge number of new sequences now available and improved contact prediction algorithms [4].

Small numbers of contacts can be used to guide modelling, significantly improving the results and extending the approach to much larger proteins than had previously been possible. We are currently extending AMPLE to take advantage of these contacts.
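One simple way to see how predicted contacts could inform modelling is to score a model by the fraction of predicted contacts it satisfies. The sketch below is illustrative only and is not how AMPLE or Rosetta uses contacts; the function name is hypothetical, and the 8 Å distance cutoff is an assumed, commonly used convention rather than a value taken from this work.

```python
import math


def contact_satisfaction(coords, contacts, cutoff=8.0):
    """Fraction of predicted contacts satisfied by a model.

    coords:   list of (x, y, z) positions, one per residue.
    contacts: list of (i, j) residue index pairs predicted to be in contact.
    cutoff:   distance in angstroms below which a pair counts as in contact
              (assumed value).
    """
    if not contacts:
        return 0.0
    satisfied = sum(
        1 for i, j in contacts if math.dist(coords[i], coords[j]) <= cutoff
    )
    return satisfied / len(contacts)


# Toy model: four residues spaced 3.8 angstroms apart along the x axis.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0, 1), (0, 3)]
print(contact_satisfaction(coords, predicted))
```

A score like this could be used to rank or filter decoys, favouring those that agree with more of the predicted contacts.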

Figure from reference 4.

High-Performance Computing

Ronan Keegan, Jens Thomas, Jaclyn Bibby, Martyn Winn and Dan Rigden
CCP4, STFC Rutherford Appleton Laboratory and University of Liverpool, UK