BMMB597E Protein Evolution Protein classification 1

Upload: august-bridges

Post on 16-Jan-2016



Protein families

• The first protein structures determined by X-ray crystallography, myoglobin and haemoglobin, were solved (in 1959–60) before the amino acid sequences were determined

• It came as a surprise that the structures were quite similar

• Soon it became clear, on the basis of both sequences and structures, that there were families of proteins

Figure: myoglobin and haemoglobin


50 years earlier, there were some hints …

• E.T. Reichert & A.P. Brown. The differentiation and specificity of corresponding proteins and other vital substances in relation to biological classification and organic evolution: the crystallography of hemoglobins. (Carnegie Institution of Washington, 1909)

• Crystallography 3 years before discovery of X-ray diffraction?


Reichert and Brown studied interfacial angles in haemoglobin crystals

• Steno’s law (1669): different crystals of the same substance may have different sizes and shapes, but the angles between corresponding faces are constant for each substance

• They found that the angles differed from species to species

• Similarities in values of interfacial angles were consistent with the classical taxonomic tree

• They even found differences between oxy- and deoxyhaemoglobin


Most premature scientific result ever?

• These results implied:

  – That proteins adopted (or at least could adopt) unique structures, to form a crystal

  – That protein structures varied between species

  – That this variation was parallel with the evolution of the species

  – That proteins could change structure as a result of changes in state of ligation

• In 1909!


M.O. Dayhoff

• Pioneer of bioinformatics

• Collected protein sequences

• First curated ‘database’

• Recognized that proteins form families, on the basis of amino acid sequences

• Computational sequence alignments

• First evolutionary tree

• First amino-acid substitution matrix (later replaced by BLOSUM)


Can relationships among proteins be extended beyond families?

• Families = sets of proteins with such obvious similarities that we assume that they are related

• One question: how much similarity do we need to believe in a relationship?

• How far can evolution go?

• Convergent evolution?

• Cautionary tale: chymotrypsin / subtilisin


Chymotrypsin-subtilisin

• Both proteolytic enzymes

  – Chymotrypsin: mammalian

  – Subtilisin: from B. subtilis

• Both have catalytic triads

• Same function – same mechanism

• Sequences 12% similar (near noise level)

• However, structures show them to be unrelated

Figure: chymotrypsin / subtilisin

Figure: catalytic triad in serine proteinases


Chymotrypsin and subtilisin have similar catalytic triads


How can we classify proteins that belong to families?

• Align sequences

• Calculate a phylogenetic tree (there are various ways to do this; all depend on the sequence alignment)

• Usually, the phylogenetic tree of homologous proteins from different species follows the phylogenetic tree based on classical taxonomy

• That is reassuring

• But what happens as divergence proceeds?
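
The align-then-build-a-tree recipe can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the sequences are taken to be already aligned, the four sequences and their species labels are invented toy data (not real globins), and the identity-based distance with UPGMA clustering is just one of the "various ways" to compute a tree, not necessarily the one used in the course.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of mismatches over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns in common")
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(aligned):
    """Average-linkage clustering; returns the tree topology as a Newick-like string."""
    sizes = {name: 1 for name in aligned}  # cluster label -> number of leaves
    dist = {frozenset(p): p_distance(aligned[p[0]], aligned[p[1]])
            for p in combinations(aligned, 2)}
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Distance to the merged cluster is the size-weighted mean.
            da = dist.pop(frozenset((a, other)))
            db = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (na * da + nb * db) / (na + nb)
        del dist[frozenset((a, b))]
        sizes[merged] = na + nb
    return next(iter(sizes))

# Invented, pre-aligned toy sequences; the labels are for illustration only.
toy = {
    "human":   "MVLSPADKTN",
    "mouse":   "MVLSGADKTN",
    "chicken": "MVLSAADKNN",
    "carp":    "MSLSDKDKAA",
}
tree = upgma(toy)
print(tree)
```

With these toy data the two most similar sequences are joined first and the most divergent one last, which is the pattern the slide describes as following classical taxonomy. Branch lengths are not computed here.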


How can we classify proteins that do not obviously belong to families?

• Base this on structure rather than sequence

• Structural similarities are maintained better than sequence similarities as divergence proceeds

• For closely related proteins, expect no difference between sequence-based and structure-based classification

• How far can classification be extended?


SCOP Structural Classification of Proteins

• Idea of A.G. Murzin, based on old work by C. Chothia and M. Levitt

• Even if two proteins are not obviously homologous, they may share structural features, to a greater or lesser degree.

• For instance, the secondary structures of some proteins are only α-helices

• Others have β-sheets but no α-helices


SCOP

• SCOP is a database that gives a hierarchical classification of all protein domains

• Recall that a domain is a compact subunit of a protein structure that ‘looks as if’ it would have independent stability

Figure: fragment of fibronectin


Dissection of structure into domains

• It is not always quite so obvious how to divide a protein into domains

• There is some (not a lot of) room for argument

• Note that sometimes the chain passes back and forth between domains

• In these cases one or both domains do not consist entirely of a consecutive set of residues
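
One way to record a domain whose residues are not consecutive is as a list of residue ranges. The following is a minimal sketch: the `Domain` class and the residue numbers are hypothetical and are not the actual domain boundaries of lactoferrin or of any SCOP entry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:
    """A structural domain described by one or more inclusive residue ranges."""
    name: str
    segments: tuple  # e.g. ((1, 90), (250, 330))

    def contains(self, residue):
        return any(lo <= residue <= hi for lo, hi in self.segments)

    def is_contiguous(self):
        return len(self.segments) == 1

    def size(self):
        return sum(hi - lo + 1 for lo, hi in self.segments)

# Hypothetical chain: it starts in domain A, crosses into B, then returns to A.
dom_a = Domain("A", ((1, 90), (250, 330)))
dom_b = Domain("B", ((91, 249),))

for dom in (dom_a, dom_b):
    print(dom.name, dom.size(), "contiguous" if dom.is_contiguous() else "discontinuous")
```

Here domain A is built from two separate stretches of the chain, so it does not consist of a consecutive set of residues, while domain B does.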

Figure: lactoferrin


SCOP, CATH and the DALI Database classify protein structures

• SCOP (Structural Classification of Proteins)

• CATH (Class, Architecture, Topology, Homologous superfamily)

• DALI Database

• These web sites have many useful features:

  – information-retrieval engines, including search by keyword or sequence

  – presentation of structure pictures

  – links to other related sites, including bibliographical databases


SCOP

http://www.scop.mrc-lmb.cam.ac.uk

• SCOP organizes protein structures in a hierarchy according to evolutionary origin and structural similarity.

• Domains are extracted from the Protein Data Bank entries.

• Sets of domains are grouped into families: sets of domains for which similarities in structure, function and sequence imply a common evolutionary origin.


The SCOP hierarchy

• Families that share a common structure, or even a common structure and a common function, but lack adequate sequence similarity – so that the evidence for evolutionary relationship is suggestive but not compelling – are grouped into superfamilies

• Superfamilies that share a common folding topology, for at least a large central portion of the structure, are grouped as folds.

• Finally, each fold group falls into one of the general classes.


Major classes in SCOP

• α – secondary structure all helical

• β – secondary structure all sheet

• α/β – contain β-α-β supersecondary structure

• α+β – helices and sheets, but in different parts of the structure

• ‘small proteins’ – which often have little secondary structure and are held together by disulphide bridges or ligands (for instance, wheat-germ agglutinin)
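
A very rough version of the class distinction can be read off a per-residue secondary-structure string. This is a hedged sketch only: the input format ('H' for helix, 'E' for strand, anything else for coil), the function name `rough_class` and the 15% threshold are assumptions made for illustration, and SCOP itself does not assign classes this way. Telling α/β from α+β needs the three-dimensional arrangement of the strands, so the sketch does not attempt it.

```python
def rough_class(ss, min_fraction=0.15):
    """Guess a broad structural class from a secondary-structure string.

    ss: one character per residue, 'H' = helix, 'E' = strand, other = coil.
    min_fraction: arbitrary cut-off below which an element type is ignored.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    has_helix = ss.count("H") / len(ss) >= min_fraction
    has_sheet = ss.count("E") / len(ss) >= min_fraction
    if has_helix and has_sheet:
        return "alpha and beta (alpha/beta or alpha+beta)"
    if has_helix:
        return "all alpha"
    if has_sheet:
        return "all beta"
    return "little secondary structure"

# Invented example strings, not taken from real proteins.
examples = {
    "helical": "HHHHHHHHHHHH----HHHHHHHH",
    "sheet":   "EEEEEE--EEEEEE--EEEEE",
    "mixed":   "EEEEE-HHHHHHHH-EEEEE-HHHHHHHH",
    "small":   "------H-----",
}
for label, ss in examples.items():
    print(label, "->", rough_class(ss))
```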


Summary of SCOP hierarchy

• Class

• Fold

• Superfamily

• Family

• Domain


SCOP classification of flavodoxin

Protein: Flavodoxin from Clostridium beijerinckii [TaxId: 1520]

Lineage:

Root: scop

Class: Alpha and beta proteins (a/b) [51349]
  Mainly parallel beta sheets (beta-alpha-beta units)

Fold: Flavodoxin-like [52171]
  3 layers, a/b/a; parallel beta-sheet of 5 strand, order 21345

Superfamily: Flavoproteins [52218]

Family: Flavodoxin-related [52219]
  binds FMN

Protein: Flavodoxin [52220]

Species: Clostridium beijerinckii [TaxId: 1520] [52226]

PDB Entry Domains:

5nul complexed with fmn; mutant chain a [31191]

2fax complexed with fmn; mutant chain a [31194]

… many others

Figure: Clostridium beijerinckii flavodoxin (stereo pair)

Figure: flavodoxin and NADPH-cytochrome P450 reductase (same superfamily, different family)

Figure: flavodoxin and CHEY (same fold, different superfamily)

Figure: flavodoxin and spinach ferredoxin reductase (same class, different folds)

Flavodoxin in the SCOP hierarchy

• To give some idea of the nature of the similarities expressed by the different levels of the hierarchy:

• Flavodoxin from Clostridium beijerinckii and NADPH-cytochrome P450 reductase are in the same superfamily, but different families.

• Flavodoxin and the signal transduction protein CHEY are in the same fold category, but different superfamilies.

• Flavodoxin and spinach ferredoxin reductase are in the same class (α/β) but have different folds.
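
The three comparisons amount to asking how far down the hierarchy two lineages agree. A minimal sketch follows: the flavodoxin lineage uses the names shown on the SCOP classification slide, while the `<...>` entries are placeholders for names the slides do not give, and `deepest_shared_level` is a hypothetical helper, not part of any SCOP tool.

```python
LEVELS = ("class", "fold", "superfamily", "family")

def deepest_shared_level(a, b):
    """Return the lowest level at which two lineages still agree, or None."""
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared

flavodoxin = {
    "class": "Alpha and beta proteins (a/b)",
    "fold": "Flavodoxin-like",
    "superfamily": "Flavoproteins",
    "family": "Flavodoxin-related",
}
# Placeholders in angle brackets stand for names not stated in the slides.
p450_reductase = dict(flavodoxin, family="<another family>")
chey = dict(flavodoxin, superfamily="<another superfamily>",
            family="<a family in that superfamily>")
ferredoxin_reductase = dict(flavodoxin, fold="<another fold>",
                            superfamily="<a superfamily in that fold>",
                            family="<a family in that superfamily>")

for name, lineage in [("P450 reductase", p450_reductase),
                      ("CHEY", chey),
                      ("ferredoxin reductase", ferredoxin_reductase)]:
    print(name, "shares with flavodoxin down to:",
          deepest_shared_level(flavodoxin, lineage))
```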


CATH presents a classification scheme similar to that of SCOP

• CATH = Class, Architecture, Topology, Homologous superfamily, the levels of its hierarchy.

• In CATH, proteins with very similar structures, sequences and functions are grouped into sequence families.

• A homologous superfamily contains proteins for which similarity of sequence and structure gives evidence of common ancestry

• A topology or fold family comprises sets of homologous superfamilies that share the spatial arrangement and connectivity of helices and strands

• Architectures are groups of proteins with similar arrangements of helices and sheets, but with different connectivity. For instance, different four-α-helix bundles with different connectivities would share the same architecture but not the same topology in CATH

• General classes of architectures in CATH are: α, β, α–β (subsuming the α/β and α+β classes of SCOP), and domains of low secondary structure content.


Do different classification schemes agree?

• To classify protein structures (or any other set of objects) you need to be able to measure the similarities among them.

• The measure of similarity induces a tree-like representation of the relationships.

• CATH, SCOP, DALI and the others agree, for the most part, on what is similar, and the tree structures of their classifications are therefore also similar.

• However, even an objective measure of similarity does not specify how to define the different levels of the hierarchy.

• These are interpretative decisions, and any apparent differences in the names and distinctions between the levels disguise the underlying general agreement about what is similar and what is different.
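
The point about levels can be illustrated with a toy calculation: one set of similarity scores, grouped at different cut-offs, gives nested groupings, and where a scheme places its cut-offs is its own decision. This is a sketch under stated assumptions: the four items, their scores and the cut-off values are invented, and single-linkage grouping is used only because it is simple, not because any of the databases works this way.

```python
def groups_at(similarity, items, cutoff):
    """Single-linkage groups: two items are joined if similarity >= cutoff."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())

# Invented similarity scores between four hypothetical domains.
items = ["P", "Q", "R", "S"]
similarity = {
    ("P", "Q"): 0.90, ("P", "R"): 0.60, ("Q", "R"): 0.55,
    ("P", "S"): 0.20, ("Q", "S"): 0.25, ("R", "S"): 0.30,
}

for cutoff in (0.8, 0.5, 0.1):
    print(cutoff, groups_at(similarity, items, cutoff))
```

Two schemes that used these same scores but drew their "family" line at 0.8 and 0.5 respectively would name the groups differently while agreeing that P and Q are closer to each other than either is to S.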