
FITZ - interactive graphics tool for investigating molecular symmetry and homology

Garry Taylor*

Laboratory of Molecular Biology, Department of Crystallography, Birkbeck College, Malet St, London WC1E 7HX, UK

A program written for an Evans and Sutherland Picture System 2 is described which uses the visual feedback of molecular graphics to aid in the comparison of atomic structures. Quantitative comparisons can be made by invoking an algorithm of McLachlan. The program also allows a crystal packing arrangement to be investigated, which is shown to be a valuable adjunct to the crystallographic method of molecular replacement. Finally, an object may be created using the pen and tablet, allowing a balsa wood-like representation of the electron density of a low-resolution protein structure to be built for subsequent viewing and comparison on a graphics screen.

Keywords: protein crystallography, graphics, homology

received 16 November 1982, revised 9 December 1982

Protein crystallography is constantly adding to our knowledge of the 3D atomic structures of biologically important molecules and is providing a database from which to draw conclusions regarding patterns of design at the secondary structural level and beyond. For example, there is often a high degree of structural homology between functionally similar proteins from distinct species, sometimes despite a very low degree of amino acid sequence homology¹. Many proteins consist of several structurally similar domains, with such internal symmetry leading to thermodynamically stable molecules². Further, several structural motifs have been defined which are common to proteins that are quite distinct functionally but play common topological roles in maintaining tertiary structure³.

Such observations have prompted much speculation about the evolutionary implications of these structural similarities. For example, a gene coding for a primitive structural domain in some primordial organism may have been duplicated and, with suitable mutations, have formed a gene coding for a two-domain structure of high specificity in a more highly differentiated organism.

With evolutionary and functional goals in mind, protein crystallographers have therefore often studied families of proteins from several species, eg insulins from primitive fish to human sources or acid proteases from microbial and mammalian sources. Structural homology within such families can, in theory, be used to solve an unknown member of a family ab initio by the crystallographic technique of molecular replacement. This will be discussed later.

*Present address: Laboratory of Molecular Biophysics, Department of Zoology, South Parks Road, Oxford OX1 3PS, UK

EXPERIMENTAL

FITZ was written in Fortran IV for use on a PDP-11 running RSX-11M, driving an Evans and Sutherland Picture System 2 with a monochrome scope and 64 kword of 16-bit memory. The Picture System 2 is a high-resolution calligraphic display with a 4096 × 4096 pixel addressable screen. The picture processor contains a matrix arithmetic unit for rapid transformation of the displayed vectors. FITZ is heavily overlaid because of the 16-bit virtual address limitation of the PDP-11, but uses memory management directives to create extra dynamic regions of memory when it needs to manipulate large data arrays. As a result it occupies only 28 kword of PDP-11 memory, the linear display lists being created in Picture System memory.

Full advantage is taken of the Picture System’s ability to perform depth cueing, windowing, clipping, etc, with interaction being via the pen and tablet; functions are chosen from a screen menu. The colour photographs were simulated by multiple exposures using colour filters.

STRUCTURAL COMPARISONS

Up to four objects, within memory limitations, can be displayed and transformed (rotated and translated) independently. An object may be an alpha carbon backbone, a full atomic model (both displayed as stick representations, with or without labels) or a balsa wood-like contour model created previously using FITZ. A file of alpha carbon coordinates is prepared before FITZ is run, arranged so that a backbone can be drawn with only one call to the 3D drawing routine; this minimizes overheads and allows many such representations to be displayed.
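The paper does not describe the layout of this prepared file. As an illustration only, the sketch below assumes a hypothetical vertex list in which each alpha carbon carries a move/draw flag, so that a whole backbone, including chain breaks, can be handed to a single drawing call. The function name and the 0.42 nm break threshold are assumptions; consecutive alpha carbons in a polypeptide chain are normally about 0.38 nm apart.

```python
# Illustrative sketch; the vertex format, names and threshold are hypothetical, not FITZ's.
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]
Vertex = Tuple[Point, bool]  # (coordinates in nm, draw flag)


def backbone_polyline(chains: List[List[Point]], max_gap: float = 0.42) -> List[Vertex]:
    """Flatten alpha-carbon chains into one vertex list for a single draw call.

    A draw flag of False means 'move here with the pen up'; True means
    'draw a line from the previous vertex'. The first vertex of each chain,
    and any vertex further than max_gap (nm) from its predecessor (a chain
    break), is a move, so no spurious bonds are drawn.
    """
    vertices: List[Vertex] = []
    for chain in chains:
        previous: Optional[Point] = None
        for ca in chain:
            draw = previous is not None and math.dist(previous, ca) <= max_gap
            vertices.append((ca, draw))
            previous = ca
    return vertices
```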

All rotations are relative to fixed screen axes and are about the current centre of mass of each object. To accomplish this, a reasonably complex set of transformation matrix concatenations is required⁴, and these have been extended to cope with several objects in FITZ.
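The concatenations themselves are not given in the text. One standard construction, sketched here under the assumption of column vectors and 4 × 4 homogeneous matrices (the Picture System's own conventions may differ), is to find where the object's centre of mass currently lies on the screen, translate it to the origin, rotate about the screen axes and translate it back. All names below are illustrative.

```python
# Illustrative sketch; function names are hypothetical, not FITZ's.
import numpy as np


def translation(t):
    """4x4 homogeneous translation matrix."""
    m = np.eye(4)
    m[:3, 3] = t
    return m


def rotation_x(angle):
    """4x4 rotation about the screen x axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    m = np.eye(4)
    m[1:3, 1:3] = [[c, -s], [s, c]]
    return m


def rotate_about_centre(current, centre_model, rot):
    """Apply a screen-axis rotation about an object's current centre of mass.

    current:      4x4 matrix taking model coordinates to screen coordinates
    centre_model: centre of mass in model coordinates
    rot:          4x4 rotation about the fixed screen axes
    """
    # Where the centre of mass currently sits on the screen.
    c = (current @ np.append(centre_model, 1.0))[:3]
    # Move the centre to the origin, rotate about screen axes, move it back.
    return translation(c) @ rot @ translation(-c) @ current
```

Because the rotation is multiplied on the left of the current matrix, it always acts about the fixed screen axes; multiplying on the right would instead rotate about the object's own model axes.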

To facilitate the differentiation of several objects displayed simultaneously, a small palette is provided on the menu to allow the representation of each object to be changed, eg blinking, dotted lines, etc (Table 1).

Volume 1 Number 1 March 1983 0263-7855/83/010005-04 $3.00 © 1983 Butterworth & Co (Publishers) Ltd 5

A facility exists for an object (eg a bilobal protein) to be split into several domains (two lobes) for independent manipulation.

Table 1. Brief description of the FITZ menu

Menu item                Function
START DENSITY            Invokes tablet drawing mode
NEW CONTOUR              Closes the last contour polygon drawn
NEW PLANE                As NEW CONTOUR, then joins contours in adjacent sections
DELETE PLANE             Allows a section to be re-entered
END DENSITY              As for NEW PLANE, then places the centre of mass at the screen origin
SAVE DENSITY             Allows the drawn density to be saved on disc
CELL                     Packing option: draws the unit cell parallelepiped and allows input
                         of a (rotated) molecule
PAC1                     Packing option: invokes static packing
PAC2                     Packing option: invokes real-time packing
REST                     Restores disc coordinates and draws the molecule
FIT                      Invokes the EZIFIT algorithm
VIEW                     Selects global transformation of all objects
1 2 3 4                  Selects an object for independent transformation
-TRANX+ -TRANY+ -TRANZ+  Transformations with respect to fixed screen axes, except with
-ROTX+ -ROTY+ -ROTZ+     PAC2, where the TRANs are relative to the cell axes
-ZOOM+                   Scales the whole picture
-SLAB+                   Changes the thickness of the data space being viewed
1 2 3 4 TXT BLK OFF DIM  Palette for selecting object representations: select object
                         1, 2, 3 or 4, then
                           TXT  full line/dotted line switch
                           BLK  blink on/off switch
                           OFF  object displayed/not displayed switch
                           DIM  bright/dim switch
PLOT                     Dumps the screen to the plotter
PERS                     Switches perspective on
INFO                     Prints the current transformation matrices
CLR                      Clears the screen and reinitializes FITZ
EXIT                     Exits from FITZ
MENU                     Temporarily removes the menu, leaving an asterisk

A useful aid to discriminating between the multiple solutions of a molecular replacement search (discussed later) is to eliminate those solutions which produce unreasonable packing arrangements in the crystal, eg when symmetry-related molecules interpenetrate. Programs that analyse packing are often laborious and expensive to run, whereas visualizing the packing gives instant feedback. For this purpose FITZ has two options: static packing and real-time packing.
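The visual check can be complemented by a simple numerical one. The following is a minimal sketch, not part of FITZ: it counts atom pairs between two molecules (for example a molecule and one symmetry mate, both in Cartesian coordinates in nm) that are closer than a cutoff. The function name and the 0.3 nm default are hypothetical, chosen only for illustration.

```python
# Illustrative sketch; name and cutoff are hypothetical, not FITZ's.
import numpy as np


def count_contacts(mol_a, mol_b, cutoff=0.3):
    """Count atom pairs between two molecules that are closer than cutoff.

    mol_a, mol_b: (n, 3) and (m, 3) arrays of Cartesian coordinates in nm.
    A non-zero count flags possible interpenetration of the two molecules.
    """
    a = np.asarray(mol_a, dtype=float)
    b = np.asarray(mol_b, dtype=float)
    # All pairwise squared distances via broadcasting: shape (n, m).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return int(np.count_nonzero(d2 < cutoff ** 2))
```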

Static packing

A molecule can be placed within a unit cell (drawn as a parallelepiped with or without perspective) in a given orientation, expressed either as a matrix or in terms of Eulerian angles. This alone is of great value when interpreting the results of a rotation function search: Eulerian angles are convenient for computational purposes, but a rotation expressed in them is difficult to picture mentally.
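Several Eulerian angle conventions are in use, and the text does not say which one FITZ adopted. Purely as an illustration, the sketch below assumes the z-y-z (α, β, γ) convention, a common choice in rotation-function work, and builds the corresponding rotation matrix. The function names are hypothetical.

```python
# Illustrative sketch assuming the z-y-z convention; FITZ's own convention is not stated.
import numpy as np


def _rz(a):
    """Rotation by angle a (radians) about z."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def _ry(b):
    """Rotation by angle b (radians) about y."""
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def euler_zyz_to_matrix(alpha, beta, gamma):
    """Rotation matrix for z-y-z Euler angles given in degrees.

    Intrinsic order: alpha about z, then beta about the new y axis, then
    gamma about the new z axis, i.e. R = Rz(alpha) Ry(beta) Rz(gamma).
    """
    a, b, g = np.radians([alpha, beta, gamma])
    return _rz(a) @ _ry(b) @ _rz(g)
```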

Given a file of symmetry operations, molecules are generated either within a given distance of the centre of mass of the original molecule or by explicitly given operations. Colour plate 7 shows a packing diagram of hen egg-white lysozyme molecules generated within a given volume; 32 molecules are present in this c-axis projection, viewed in perspective. The whole picture may be rotated, zoomed, set to perspective, etc.
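Generating molecules within a given distance can be sketched as follows. The sketch assumes symmetry operators expressed in fractional coordinates as a rotation matrix plus a translation vector, and a hypothetical search over neighbouring cells controlled by `reach`. The orthogonalization uses the common convention with a along x and b in the xy plane; the function names are illustrative, not FITZ's.

```python
# Illustrative sketch; names and the `reach` search are hypothetical, not FITZ's.
import itertools
import numpy as np


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> Cartesian matrix whose columns are the cell edge vectors.

    Convention: a along x, b in the xy plane. Angles in degrees.
    """
    ca, cb, cg = np.cos(np.radians([alpha, beta, gamma]))
    sg = np.sin(np.radians(gamma))
    cy = (ca - cb * cg) / sg
    cz = np.sqrt(1.0 - cb ** 2 - cy ** 2)
    return np.array([[a, b * cg, c * cb],
                     [0.0, b * sg, c * cy],
                     [0.0, 0.0, c * cz]])


def molecules_within(centre_frac, symops, cell, radius, reach=1):
    """Operators (rot, trans) placing a copy's centre of mass within radius
    of the original centre of mass.

    centre_frac: centre of mass of the original molecule (fractional)
    symops:      list of (3x3 rotation, 3-vector translation), fractional
    cell:        (a, b, c, alpha, beta, gamma); lengths in the units of radius
    reach:       number of whole cells searched in each direction
    """
    ortho = orthogonalization_matrix(*cell)
    f0 = np.asarray(centre_frac, dtype=float)
    origin = ortho @ f0
    kept = []
    for rot, trans in symops:
        rot = np.asarray(rot, dtype=float)
        for shift in itertools.product(range(-reach, reach + 1), repeat=3):
            t = np.asarray(trans, dtype=float) + shift  # add whole-cell translation
            image = ortho @ (rot @ f0 + t)
            if np.linalg.norm(image - origin) <= radius:
                kept.append((rot, t))
    return kept
```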

Realtime packing

A rapid means of investigating allowed packing arrangements is to translate the centre of mass of one of the molecules in the unit cell and let the symmetry-related molecules, whose centres of mass fall within the unit cell, move in response in real time. As in the previous option, the initial molecule can be oriented within the cell; a file of symmetry operations is again given and the cell is filled with as many molecules as there are symmetry operators. The initial molecule blinks, and its centre of mass is displayed within the translation boxes on the menu, which now correspond to translations parallel to the unit cell edges a, b and c rather than parallel to fixed screen axes.

The real-time response is achieved by expressing the symmetry operators as 4 × 4 transformation matrices whose translation components are updated within the display loop and which are concatenated in turn with the global transformation, thus allowing the whole picture to be rotated while maintaining the translations along the cell edges.
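A minimal sketch of the matrix update, assuming each symmetry operator is given in fractional coordinates as a rotation matrix R_j and a translation t_j. Shifting the reference molecule by s along the cell edges changes copy j's translation to R_j s + t_j, less whatever whole cell lengths bring its centre of mass back inside the cell. Only the translation column depends on s, which is why it can be updated inside the display loop. The function name is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical, not FITZ's.
import numpy as np


def packing_matrices(symops, centre_frac, shift_frac):
    """Per-copy 4x4 matrices (fractional space) after shifting the reference molecule.

    symops:      list of (3x3 rotation, 3-vector translation), fractional
    centre_frac: centre of mass of the unshifted reference molecule
    shift_frac:  current user translation along a, b and c (cell fractions)

    Each matrix maps fractional coordinates of the unshifted reference
    molecule to those of one copy.
    """
    c0 = np.asarray(centre_frac, dtype=float)
    s = np.asarray(shift_frac, dtype=float)
    mats = []
    for rot, trans in symops:
        rot = np.asarray(rot, dtype=float)
        t = rot @ s + np.asarray(trans, dtype=float)
        # Subtract whole cell lengths so this copy's centre lies in [0, 1).
        t -= np.floor(rot @ c0 + t)
        m = np.eye(4)
        m[:3, :3] = rot
        m[:3, 3] = t
        mats.append(m)
    return mats
```

To display copy j, its matrix would still be concatenated with the orthogonalization matrix (and its inverse, if atoms are stored in Cartesian coordinates) and with the global viewing transformation; that step is omitted here.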

Choosing the FIT option on the menu invokes the EZIFIT algorithm⁵. This takes n pairs of atoms from two molecules and gives the transformation mapping one set onto the other with the lowest RMS deviation between the two sets. FITZ allows the n equivalent atoms, which are deemed to be related (eg topologically or functionally similar), to be chosen in one of three ways:

• explicitly, by typing the atom names at the VDU
• by proximity, by allowing FITZ to accept as equivalent those atoms which are less than a specified distance apart; in this case a best by-eye fit must already have been made
• by choosing the atoms in pairs with the pen and tablet.
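In symbols (our notation, not the paper's), with x_i the atoms of the moving molecule and y_i their chosen partners in the fixed one, the fit seeks a proper rotation R and translation t that minimize the RMS deviation. For any R the best translation superimposes the two centroids, so the problem reduces to finding R.

```latex
\begin{aligned}
E(\mathbf{R},\mathbf{t}) &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl\lVert \mathbf{R}\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \bigr\rVert^{2}},
  \qquad \mathbf{R}^{\mathsf{T}}\mathbf{R} = \mathbf{I},\quad \det\mathbf{R} = +1,\\
\mathbf{t}^{*} &= \bar{\mathbf{y}} - \mathbf{R}\,\bar{\mathbf{x}},
  \qquad \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i,\quad \bar{\mathbf{y}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{y}_i .
\end{aligned}
```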

The fitting algorithm is almost instantaneous, giving very satisfying visual feedback. It is planned to automate the proximity-fitting routine so that two structures can gradually refine to their best fit from some rough starting position.

Journal of Molecular Graphics
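EZIFIT follows McLachlan's algorithm⁵, whose details are not reproduced in the text. As a stand-in, the sketch below uses the SVD-based Kabsch solution, which minimizes the same RMS deviation, and wraps it in a hypothetical proximity-refinement loop of the kind proposed above: pair each atom with its nearest partner within a cutoff, refit, and repeat until the pairing no longer changes. The cutoff, cycle limit and function names are assumptions.

```python
# Illustrative sketch: Kabsch fit substituted for EZIFIT; names and defaults are hypothetical.
import numpy as np


def best_fit(x, y):
    """Least-squares superposition of paired points x onto y (both (n, 3)).

    Returns (R, t, rms) with y approximately equal to x @ R.T + t.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x.mean(axis=0), y.mean(axis=0)
    u, _, vt = np.linalg.svd((x - xc).T @ (y - yc))  # SVD of 3x3 covariance
    d = np.sign(np.linalg.det(vt.T @ u.T))            # -1 would mean a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = yc - r @ xc                                   # superimpose the centroids
    rms = float(np.sqrt(np.mean(np.sum((x @ r.T + t - y) ** 2, axis=1))))
    return r, t, rms


def proximity_refine(x, y, cutoff=0.3, cycles=10):
    """Iterative proximity fitting from a rough (eg by-eye) starting position.

    Each cycle pairs every atom of x with its nearest atom of y if they are
    closer than cutoff (nm) under the current fit, then refits on those pairs.
    Stops when the pair list is unchanged or fewer than 3 pairs remain.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r, t, rms = np.eye(3), np.zeros(3), None
    pairs = None
    for _ in range(cycles):
        moved = x @ r.T + t
        dist = np.linalg.norm(moved[:, None, :] - y[None, :, :], axis=2)
        nearest = dist.argmin(axis=1)
        close = dist[np.arange(len(x)), nearest] < cutoff
        new_pairs = [(int(i), int(nearest[i])) for i in np.flatnonzero(close)]
        if len(new_pairs) < 3 or new_pairs == pairs:
            break
        pairs = new_pairs
        ix, iy = np.array(pairs).T
        r, t, rms = best_fit(x[ix], y[iy])
    return r, t, rms, pairs
```

The nearest-neighbour pairing may map two atoms onto the same partner; a fuller version would enforce one-to-one pairs.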

Comparisons investigated

γ-Crystallin II

FITZ was used to investigate the homology between the four structural Greek key motifs of this eye lens protein⁶. This remarkable protein has very high internal symmetry: it is a two-domain β-structure, the domains are related by an approximate diad, and each domain contains two motifs which are again related by an approximate diad. Colour plate 1 shows the alpha carbon backbone of the whole molecule and Colour plate 2 the superposition of the four motifs I-IV. Motif I contains residues 1-39, motif II 40-79, motif III 84-122 and motif IV 123-165. The best superposition was made by fitting III onto I, II onto I and finally IV onto II. Fits were first made by entering topologically equivalent pairs explicitly and then refined by several cycles of proximity fitting. The RMS deviation was 0.174 nm for the alpha carbons of the 36 topologically equivalent residues between I and III, and 0.105 nm for the alpha carbons of the 42 residues between II and IV.

Lysozymes

Colour plate 3 shows two lysozyme molecules, human and hen egg-white, in all-atom representation with the menu switched off for clarity. A best fit of the two molecules was made based on 129 topologically equivalent alpha carbon atoms, as shown in Colour plate 4; the RMS deviation for these 129 atoms was 0.061 nm. The plate shows the conserved tertiary fold of the lysozymes despite a 40 per cent change in primary sequence.

Acid proteases

These proteins form a family of proteolytic enzymes with structures available from several sources, from mammalian (pepsin) to microbial (eg penicillopepsin). Comparison of these structures shows a marked structural homology¹. Colour plate 5 shows the alpha carbon backbones of the acid proteases from Endothia parasitica and Penicillium janthinellum after finding the best fit based on only three catalytically important conserved residues, 32, 75 and 215. Colour plate 6 shows the result of further proximity fitting.

CRYSTAL PACKING AND MOLECULAR REPLACEMENT

Two similar molecular structures crystallizing in two different crystal forms will have similar molecular transforms, but these will be sampled at different positions in the resulting diffraction patterns. The technique of molecular replacement rotates and translates the known transform over the unknown one, searching for the maximum overlap, which allows the known structure to be positioned correctly in the unknown structure's crystal cell. This technique involves the comparison of interatomic vector sets (Patterson functions). Because of the large number of vectors in a protein, it often leads to ambiguous solutions, especially when determining the position of the known molecule in the unknown cell. The problem can be resolved into two parts: molecular orientation and translation. The former is reasonably straightforward⁷.

Figure 1. A balsa wood-like low-resolution representation of Mucor pusillus pepsin, shown with the alpha carbon backbone of the homologous enzyme endothiapepsin. The density model was entered using the pen and tablet.

The packing facility described above has proved invaluable in interpreting the multiple solutions of a translation function for pepsin (using penicillopepsin as the search model) and for proinsulin (using an insulin dimer as the search model), as solutions can be disregarded immediately on the grounds of interpenetration of molecules. Surface contacts between symmetry-related molecules may also be investigated in detail to discriminate further between solutions (see Colour plate 8).

Molecular replacement - a special problem

The ability to enter an object into the Picture System 2 via the pen and tablet was used in the particular case of trying to solve the structure of Mucor pusillus pepsin (Figure 1). A medium-quality, low-resolution (0.5 nm) multiple isomorphous replacement electron density map of this enzyme had been obtained, but the isomorphous derivatives did not give phase information beyond this spacing. A real-space molecular replacement was attempted using the graphics in a simplistic but effective fashion. A balsa wood-like representation of the electron density map was entered section by section into the Picture System memory by drawing around contours on Fourier sections placed on the tablet. The quality of the map dictated this approach to delineating the molecular boundary, rather than inputting a file of contours by other means. Contours were joined in the third dimension simply by joining the closest contour points in adjacent levels. Once entered (a process taking about 20 min), the skeletal representation of the balsa wood model can be rotated, etc, overcoming the limitations of stacked perspex sheets.
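The joining rule is simple enough to sketch. Assuming each contour is a list of (x, y, z) points with z giving the section level, the function below links every vertex on one section to the closest vertex on the next. This is one reading of "joining the closest contour points", and the names are illustrative.

```python
# Illustrative sketch; names are hypothetical, not FITZ's.
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def join_adjacent_sections(lower: List[Point3], upper: List[Point3]) -> List[Tuple[Point3, Point3]]:
    """Link each vertex of a contour on one section to the closest vertex of
    the contour on the next section, giving the edges that join the stacked
    contours into a balsa wood-like skeleton."""
    links = []
    for p in lower:
        q = min(upper, key=lambda v: math.dist(p, v))  # nearest upper vertex
        links.append((p, q))
    return links
```

Vertices on the upper contour that are no vertex's nearest neighbour remain unlinked; calling the function again with the two sections swapped would cover them.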

The alpha carbon backbone of another acid protease, thought to have a homologous structure, was also displayed and fitted visually within the molecular boundary of Mucor pusillus pepsin. This both helped in the interpretation of the electron density map and gave a rough solution to the molecular replacement problem, the transformation matrix relating the two molecules being obtained easily by choosing INFO on the menu.

The orientation and translation of the model are now serving as the starting point for rigid-body refinement of the whole molecule, in an attempt to provide phase information for the native protein data, which extend to 0.26 nm.

CONCLUSIONS

FITZ is a useful tool for problems involving structural comparisons, molecular symmetry, crystal symmetry and molecular replacement. The visual dimension gained by using a graphics station as powerful as the Evans and Sutherland Picture System 2 provides a useful adjunct to more conventional crystallographic techniques. It is hoped to implement further facilities in the future, such as surface representations; a more powerful and flexible means of specifying the molecular fragments to be drawn, perhaps by fast direct access to the protein data base on disc; and a more general way of specifying crystallographic and noncrystallographic symmetry and pseudosymmetry. The use of a colour monitor would clearly enhance the superposition of structures and allow many more vectors to be displayed.

ACKNOWLEDGEMENTS

The majority of this work was carried out in the Laboratory of Molecular Biology, Department of Crystallography at Birkbeck College, London, and I am grateful to all of my colleagues there: first to Professor Tom Blundell for encouragement and inspiration, secondly to Dr Ian Tickle for his expert help and advice, and lastly to all who provided the problems which served to mould the program. I thank Drs Blake and Artymiuk for supplying the lysozyme coordinates. Finally I must thank Lance Mangold for his help in taking the colour photographs. The SERC is thanked for financial support.

REFERENCES

1 Tang, J, James, M N G, Hsu, I N, Jenkins, J A and Blundell, T L Nature Vol 271 (1978) pp 618-621

2 Blundell, T L, Sewell, T and Turnell, W G 'Symmetry in the structure and organisation of proteins' in A volume in honour of Dorothy Hodgkin OUP (1981)

3 Richardson, J S Nature Vol 268 (1977) pp 495-500

4 Tickle, I J personal communication

5 McLachlan, A D J Mol Biol Vol 128 (1979) pp 49-80

6 Blundell, T L, Lindley, P F, Miller, L, Moss, D S, Slingsby, C, Tickle, I J, Turnell, W G and Wistow, G Nature Vol 289 (1981) pp 771-777

7 Rossmann, M G (ed) The molecular replacement method Gordon and Breach, New York, USA (1972)
