
Protein Structure Prediction

Matthew Betts, Russell Group, University of Heidelberg, Germany

www.russelllab.org

http://www.landtag-bw.de/index.html

http://www.cellnetworks.uni-hd.de/pub/start_start_for.php

Sequence

Active/inactive? Binds/does not bind? Substrate specificity?

Function

Structure

• What we do to find out what a protein might be doing

• Looking at sequences, with a particular emphasis on finding out something about the protein structure

• Some background for practical work

What is this about?

• Functional domains (Pfam, SMART, COGs, CDD, etc.)

• Intrinsic features

– Signal peptide, transit peptides (SignalP)

– Transmembrane segments (TMpred, etc.)

– Coiled-coils (COILS server)

– Low complexity regions, disorder (e.g. SEG, DisEMBL)

• Hints about structure?

Given a sequence, what should you look for?
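As a rough illustration of how low-complexity regions can be flagged, the sketch below scores sliding windows by the Shannon entropy of their residue composition. It is not the SEG algorithm itself; the window length of 12 and the cutoff of 2.2 bits are illustrative assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def window_entropy(seq, window=12):
    """Shannon entropy (bits) of the residue composition of each sliding window."""
    scores = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        entropy = -sum((n / window) * math.log2(n / window) for n in counts.values())
        scores.append(entropy)
    return scores


def low_complexity_windows(seq, window=12, threshold=2.2):
    """Start positions of windows whose entropy is at or below the (assumed) cutoff."""
    return [i for i, h in enumerate(window_entropy(seq, window)) if h <= threshold]


if __name__ == "__main__":
    # A made-up sequence: a poly-Q stretch followed by a more varied segment.
    example = "QQQQQQQQQQQQACDEFGHIKLMN"
    print(low_complexity_windows(example))
```

A window made of a single repeated residue has entropy 0, while a window of 12 different residues reaches the maximum of log2(12) bits, so low scores point at compositionally biased stretches.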

“Low sequence complexity” (Linker regions? Flexible? Junk?)

Signal peptide (secreted or membrane attached)

Transmembrane segment (crosses the membrane)

Tyrosine kinase (phosphorylates Tyr)

Immunoglobulin domains (bind ligands?)

SMART domain ‘bubblegram’ for human fibroblast growth factor (FGF) receptor 1 (type P11362 into web site: smart.embl.de)

Given a sequence, what should you look for?

• Intrinsic features generally mean trouble for structure determination, so they are usually skipped

• The knock-on effect is that structures for large, flexible multi-domain proteins are rare

• Structure determination/prediction is therefore typically restricted to parts (with exceptions, obviously)


What about structure?

[Figure: Sequence → algorithm → Structure]

Structure prediction

• Is your sequence homologous to a known structure?

• If yes, then often very good models of structure can be constructed.

• This is what we will do in the practical

Best predictions are by homology

+

algorithm

Homology Modelling

• Identify a homologue of known structure

• Get the best alignment of your sequence to the structure

• Model building

– Side-chain replacement

– Loop building

– Optimisation/relaxation/minimisation

Homology Modelling Steps
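The alignment step can be sketched with a minimal global alignment (Needleman-Wunsch) followed by a percent-identity calculation. This is only an illustration: the match, mismatch and gap scores are arbitrary assumptions, and real modelling pipelines use substitution matrices, affine gap penalties and profile or structure information.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        else:
            diag = None
        if diag is not None and score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    target, template, total = needleman_wunsch("ACDE", "ADE")
    print(target, template, total, percent_identity(target, template))
```

The alignment quality matters because every later step (side-chain replacement, loop building) inherits its errors.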

Two subtilisin-like serine proteases

Problems with loops

Sanchez et al, Nature Struct. Biol. (Suppl), 7, 986-990, 2000

Based on Sander & Schneider, Proteins, 9, 56, 1991

Sander & Schneider (EMBL, ca. 1990)

1. Compared all known structures to each other using sequence comparison.

2. For each fragment of a particular length & sequence identity, simply asked the question: is the structure similar or different?

3. The line to the right is where one can be 90% confident that an alignment of a particular length & sequence identity implies a similar structure.

4. Below the line, structures can be either similar or different: the twilight zone.

(Basis for much of the sequence alignment statistics that are now in use today)

The Twilight Zone
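The length-dependent threshold can be written as a small function. The sketch assumes the commonly quoted form of the Sander & Schneider curve, t(L) = 290.15 · L^(-0.562) for alignment lengths between 10 and 80 and a flat 24.8% for longer alignments. These constants are recalled from the literature rather than taken from the slides, so check them against the paper before relying on them; the function names are hypothetical.

```python
def hssp_threshold(length):
    """Assumed Sander & Schneider identity threshold (%) for a given alignment length."""
    if length < 10:
        raise ValueError("the curve is not defined for alignments shorter than 10 residues")
    if length > 80:
        return 24.8
    return 290.15 * length ** -0.562


def above_twilight_zone(percent_identity, length):
    """True if the alignment lies on or above the threshold curve."""
    return percent_identity >= hssp_threshold(length)


if __name__ == "__main__":
    for identity in (80.0, 8.8, 4.4):
        print(identity, above_twilight_zone(identity, 100))
```

Short alignments need a much higher identity than long ones before structural similarity can be inferred; anything below the curve is in the twilight zone and tells you nothing either way.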

…can we find these similarities without known structures if sequence searches fail?

Russell et al, J.Mol. Biol., 1997

sequence identity: 80% 8.8% 4.4%

Similar structures within the twilight zone

Does the sequence “fit” on any of a library of known 3D structures?


>C562_RHOSH
TQEPGYTRLQITLHWAIAGL…

Fold Recognition (‘Threading’)

Jones, Taylor, Thornton, Nature, 358, 86-89, 1992.

Fold Recognition (‘Threading’)

[Figure: example residue pairs (Asp, Arg, Phe) labelled GOOD or BAD]

Residue pair potentials
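A toy version of scoring a sequence against a fold with residue pair potentials might look like the sketch below. The energies are invented for illustration (which pairs count as good or bad here is an assumption, not data from the slides), the threading is ungapped (sequence position i sits on structure position i), and the fold names and contact lists are hypothetical.

```python
# Invented pair energies: negative is favourable, positive is unfavourable.
TOY_PAIR_ENERGY = {
    frozenset(("D", "R")): -1.0,  # opposite charges in contact (assumed favourable)
    frozenset(("F", "F")): -1.0,  # hydrophobic pair (assumed favourable)
    frozenset(("D", "D")): 1.0,   # like charges (assumed unfavourable)
    frozenset(("D", "F")): 0.5,   # charged next to hydrophobic (assumed unfavourable)
}


def threading_energy(seq, contacts, potential=TOY_PAIR_ENERGY):
    """Sum pair energies over all structure positions (i, j) that are in contact."""
    return sum(potential.get(frozenset((seq[i], seq[j])), 0.0) for i, j in contacts)


def best_fold(seq, fold_library):
    """Name of the fold whose contact map gives the lowest energy for seq."""
    return min(fold_library, key=lambda name: threading_energy(seq, fold_library[name]))


if __name__ == "__main__":
    folds = {
        "fold_a": [(0, 1), (2, 3)],
        "fold_b": [(0, 2), (1, 3)],
    }
    print(best_fold("DRFF", folds))
```

Real threading methods also allow gaps, use potentials derived from statistics over known structures, and often add solvation terms.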

• Works some of the time

• Probably best at identifying distant homologues, where sequence identity is in the twilight zone

• Useful sites:

– 3D-PSSM, FUGUE, (Gen)-Threader

• Meta predictions are the best - combine all and get a consensus

– E.g. bioinfo.pl/meta

Fold Recognition: Executive Summary
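The consensus idea behind meta predictions can be sketched as a simple majority vote over the top hit reported by each method. The server names and fold identifiers below are hypothetical, and real meta servers weight methods and compare the models themselves instead of just counting names.

```python
from collections import Counter


def consensus_fold(top_hits):
    """Return the most frequently reported fold and the fraction of methods agreeing."""
    counts = Counter(top_hits.values())
    fold, votes = counts.most_common(1)[0]
    return fold, votes / len(top_hits)


if __name__ == "__main__":
    hits = {"server_a": "fold_1", "server_b": "fold_1", "server_c": "fold_2"}
    print(consensus_fold(hits))
```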

• Is your sequence homologous to a known structure?

• If no, then actual models are less accurate, but structural insights are still possible

• First, secondary structure prediction

If no homology…

• Neural networks

• Inductive logic programming

• Spin-glass theory

• Human intuition

algorithm

Secondary-structure prediction

E.g. Chou & Fasman, 1974

Helix forming: Glu, Ala, Leu

Helix breaking: Pro, Gly

Strand forming: Met, Val, Ile

Strand breaking: Glu, Lys, Ser, His, Asn

Etc.

Numerical approach + simple protocol = prediction of secondary structure

Claimed “80%” accuracy. Reality: 50-60%. The method was tested on the same proteins used to derive the parameters… big no-no.

Secondary-structure prediction
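A propensity-style predictor in the spirit of this approach can be sketched in a few lines. It is not the published Chou & Fasman procedure: the +1/-1 scores simply encode the former and breaker classes listed above, and the window size and decision rule are assumptions. The `q3` function measures per-residue accuracy, which only means something when the test proteins were not used to set the parameters.

```python
# Toy scores: +1 for formers, -1 for breakers, 0 for everything else.
HELIX = {"E": 1, "A": 1, "L": 1, "P": -1, "G": -1}
STRAND = {"M": 1, "V": 1, "I": 1, "E": -1, "K": -1, "S": -1, "H": -1, "N": -1}


def predict_secondary_structure(seq, window=5):
    """Assign H (helix), E (strand) or C (coil) to each residue from window sums."""
    half = window // 2
    states = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half):i + half + 1]
        helix = sum(HELIX.get(res, 0) for res in segment)
        strand = sum(STRAND.get(res, 0) for res in segment)
        if helix > 0 and helix >= strand:
            states.append("H")
        elif strand > 0:
            states.append("E")
        else:
            states.append("C")
    return "".join(states)


def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    return 100.0 * sum(p == o for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    print(predict_secondary_structure("EALLAEALVIVIMVPGPG"))
```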

SS pred

70% accuracy!

Homologous proteins add a lot of information

• Can you simulate folding using physics to predict the structure of a protein?

• No, not usually.

• However, advances have been made…

• David Baker, co-workers and subsequent followers: fragment-based structure prediction. De novo, not ab initio

What about de novo or ab initio prediction?

Preferences learned from all stretches with a similar structure

Predicting Fragments

Database of structures

Fragments matching the target sequence

Assembly of fragments

Selection of best model

Assembling Fragments
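The fragment-picking step can be illustrated with a toy search that, for each window of the target, ranks fragments of known structures by sequence identity. This is a sketch under strong assumptions: the library format (name mapped to a sequence and a secondary-structure string) is hypothetical, a secondary-structure string stands in for real backbone geometry, and actual methods use profile and predicted-structure scores as well as plain identity.

```python
def pick_fragments(target, library, length=9, top=3):
    """For each target window, return the `top` library fragments by identity count."""
    picks = {}
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        scored = []
        for name, (seq, ss) in library.items():
            for j in range(len(seq) - length + 1):
                identity = sum(a == b for a, b in zip(window, seq[j:j + length]))
                scored.append((identity, name, j, ss[j:j + length]))
        # Stable sort: fragments with equal identity keep their library order.
        scored.sort(key=lambda item: -item[0])
        picks[start] = scored[:top]
    return picks


if __name__ == "__main__":
    toy_library = {
        "known_1": ("ACDEFG", "HHHHCC"),
        "known_2": ("KLMNPQ", "EEEECC"),
    }
    print(pick_fragments("CDE", toy_library, length=3, top=1))
```

Assembly then combines one candidate per window into full models and keeps the best-scoring one; that search is the expensive part and is not shown here.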

• General trend: increasing accuracy is more a function of data than algorithms

• In other words: as we know more structure, and indeed even sequence data, we get better at predicting

• Probably we will have a perfect algorithm for protein structure prediction when we know all of the answers

• Structural genomics & the generally increased pace of structure determination mean there aren’t many really “new” structures anymore

The Prediction Irony

• Methods have mostly been developed for soluble, globular proteins or domains

• Problems with membrane proteins, low-complexity, etc.

• Many segments in proteins should be studied with other methods:

– Signal peptides

– TM regions

– Coiled-coils

– Intrinsic Disorder (e.g. http://dis.embl.de)

Things to Remember

What we use this for…

We aim to:

• Understand molecular interactions

• Predict molecular interactions

• Focus on those interactions of biomedical importance

• Apply tools to large datasets

• Use interaction networks predictively

– To predict new interactions

– To predict other details like pathologies, toxicities

Your favourite protein (N-terminus to C-terminus)

Your second favourite protein (N-terminus to C-terminus)

tRNA Synthetase

Histidyl adenylate

Templates in contact?

Modelled Interaction

Match to known structure

Match to known structure

Modelling or predicting interactions by homology
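The logic of the figure (match each protein to a known structure, then ask whether the templates are in contact) can be sketched as follows. All identifiers are hypothetical, the template hits are assumed to come from a separate sequence search (e.g. BLAST), and the contact set is assumed to have been extracted from known complex structures beforehand.

```python
def predict_interactions(template_hits, template_contacts):
    """Pairs of proteins whose template chains are in contact in a known complex.

    template_hits: protein name -> list of template chains it matches.
    template_contacts: set of frozensets, each a pair of chains in contact.
    """
    predicted = set()
    proteins = sorted(template_hits)
    for index, first in enumerate(proteins):
        for second in proteins[index + 1:]:
            for chain_a in template_hits[first]:
                for chain_b in template_hits[second]:
                    if chain_a != chain_b and frozenset((chain_a, chain_b)) in template_contacts:
                        predicted.add((first, second, chain_a, chain_b))
    return sorted(predicted)


if __name__ == "__main__":
    hits = {
        "favourite": ["complex_1:A"],
        "second_favourite": ["complex_1:B"],
        "unrelated": ["other_1:A"],
    }
    contacts = {frozenset(("complex_1:A", "complex_1:B"))}
    print(predict_interactions(hits, contacts))
```

A hit only suggests that the interaction is possible; whether the interface residues are conserved in your proteins still has to be checked on the model.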

homology (e.g. BLAST)

homology

homology

homology

X-ray

Five component complex

Two-hybrid network

+

Electron microscopy & Mass Spectrometry

Russell et al, Curr. Opin. Struct. Biol. 2004

Aloy & Russell, Nature Rev. Mol. Cell. Biol. 2006

Taverner et al, Acc. Chem. Res. 2008

Prediction of Structures of Complexes

RGS-4

Adding Mechanisms to Interaction Networks

RGS-3

Gαi

Gαq

Which piece from which protein?

What does the interaction look like?

Who interacts with whom?

How strong? How fast?

PP

Modelled complexes

Aloy & Russell, Nature Rev. Mol. Cell. Biol., 2006.

Bridging the information gap

Kuehner et al, Science, 2009

From Proteomics to Cellular Anatomy?


www.russelllab.org/aas: Guide to the amino acids

www.russelllab.org/gtsp: Guide to Structure Prediction

meta.bioinfo.pl: Meta server (runs virtually all reliable prediction methods)

Some Links

Sequence

In groups of two or more, you will attempt to answer functional questions about a particular protein target

Active/inactive? Binds/does not bind? Substrate specificity?

Function

Structure

Structure Prediction Practical: www.russelllab.org/wiki

Acknowledgements

www.russelllab.org

Current group members: Rob Russell (the boss), Matthew Betts, Leonardo Trabuco, Oliver Wichmann, Mathias Utz, Yvonne Lara

Alumni: Chad Davis, Olga Kalinina, Ricardo de la Vega, Victor Neduva, Evangelia Petsalaki, Damien Devos

Complex modeling & interactions collaborators:

• Patrick Aloy (IRB Barcelona)

• Anne-Claude Gavin (EMBL Heidelberg)

• Peer Bork (EMBL Heidelberg)

• Luis Serrano (CRG Barcelona)

• Achilleas Frangakis (Uni Frankfurt)

• Bettina Boettcher (Edinburgh)
