Post on 06-Feb-2016

Protein Sequences

The Genetic Code

The natural extension of the genetic code…

1. Overall amino acid structure
2. Amino acid stereochemistry
3. Amino acid sidechain structure & classification
4. ‘Non-standard’ amino acids
5. Amino acid ionization
6. Formation of the peptide bond
7. Disulfide bonds
8. Comparing protein sequences to describe evolutionary processes.

Q: How many amino acids are there?

The twenty alpha-amino acids that are encoded by the genetic code share the generic structure…

Atom nomenclature within amino acids (as used within the PDB)

[Figure: two amino acids with PDB atom labels. Thr: N, CA, C, O, CB, OG1, CG2. Lys: N, CA, C, O, OXT, CB, CG, CD, CE, NZ.]

ATOM      1  N   PRO A   2      22.126  26.173   0.149  1.00 28.61           N
ATOM      2  CA  PRO A   2      21.848  26.169   1.597  1.00 27.50           C
ATOM      3  C   PRO A   2      20.582  25.363   1.875  1.00 26.69           C
ATOM      4  O   PRO A   2      19.724  25.215   0.973  1.00 26.48           O
ATOM      5  CB  PRO A   2      21.874  27.626   1.981  1.00 28.55           C
ATOM      6  CG  PRO A   2      21.899  28.434   0.721  1.00 29.65           C
ATOM      7  CD  PRO A   2      21.761  27.465  -0.440  1.00 28.77           C
ATOM      8  N   LYS A   3      20.499  24.795   3.073  1.00 22.80           N
ATOM      9  CA  LYS A   3      19.360  23.972   3.469  1.00 22.07           C
ATOM     10  C   LYS A   3      18.610  24.700   4.597  1.00 18.49           C
ATOM     11  O   LYS A   3      19.262  25.140   5.536  1.00 17.98           O
ATOM     12  CB  LYS A   3      19.669  22.668   4.145  1.00 24.58           C
ATOM     13  CG  LYS A   3      20.495  21.675   3.360  1.00 36.59           C
ATOM     14  CD  LYS A   3      20.652  20.419   4.220  1.00 48.23           C
ATOM     15  CE  LYS A   3      19.341  19.779   4.628  1.00 53.43           C
ATOM     16  NZ  LYS A   3      19.502  19.003   5.891  1.00 57.07           N
ATOM     17  N   ALA A   4      17.319  24.698   4.389  1.00 17.98           N
ATOM     18  CA  ALA A   4      16.468  25.371   5.384  1.00 17.19           C

The .pdb file format

Columns, left to right: record name, atom number, atom name, residue name, chain ID, residue number, X-coordinate, Y-coordinate, Z-coordinate, occupancy, B-factor (aka temp factor), atom type.
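The column layout above can be read programmatically. The following is a minimal sketch, assuming the standard fixed-width layout of the legacy PDB ATOM record; the function name `parse_atom_record` and the dictionary keys are illustrative choices, not part of any library.

```python
def parse_atom_record(line):
    """Split one fixed-width PDB ATOM record into named fields.

    Column positions follow the legacy PDB ATOM layout; the key names
    are hypothetical labels chosen for this example.
    """
    return {
        "record": line[0:6].strip(),      # record name, e.g. ATOM
        "serial": int(line[6:11]),        # atom number
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),      # residue number
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),   # atom type
    }


# First record from the example file shown in the slides.
example = "ATOM      1  N   PRO A   2      22.126  26.173   0.149  1.00 28.61           N"
atom = parse_atom_record(example)
print(atom["atom_name"], atom["res_name"], atom["res_seq"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace is the safer choice here, because adjacent fields in real PDB files can touch without a separating space.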

[Figure: Lys and Arg side chains]

To Do: Learn how to name the atoms of all amino acids.

Hint: look at any generic PDB file to get a list of atom types.

-The alpha carbon (CA) is immediately adjacent to the most oxidized carbon (which is the CO2- in amino acids).

-All the other heavy atoms are named by their distance from CA, following the Greek alphabet: B (beta), G (gamma), D (delta), E (epsilon), Z (zeta), H (eta).

-For example, LYS can be described by: CA, CB, CG, CD, CE, and NZ.

Atom nomenclature within amino acids (as used within the PDB)

Numbers are used to discriminate between similar positions…

[Figure: Asn side chain (CB, CG, OD1, ND2) and His side chain (CB, CG, ND1, CD2, CE1, NE2)]

Here are some harder examples…

[Figure: Tyr (CB, CG, CD1, CD2, CE1, CE2, CZ, OH), Trp (CB, CG, CD1, CD2, NE1, CE2, CE3, CZ2, CZ3, CH2), Leu (CB, CG, CD1, CD2) and Thr (CB, OG1, CG2)]
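The naming rules above (element, then Greek-letter position, then an optional number for similar positions) can be sketched as a small decoder. This is an illustration only: `decode_atom_name` is a hypothetical helper, and it assumes one-letter element symbols, which holds for the twenty standard amino acids. Backbone names such as N, C, O and OXT do not follow the Greek-letter scheme and are rejected.

```python
# Greek-letter codes used for positions counted outward from the alpha carbon.
GREEK = {"A": "alpha", "B": "beta", "G": "gamma", "D": "delta",
         "E": "epsilon", "Z": "zeta", "H": "eta"}


def decode_atom_name(name):
    """Split a name such as 'OD1' into (element, position, branch number)."""
    if len(name) < 2 or name[1] not in GREEK:
        raise ValueError(f"{name!r} does not follow the Greek-letter scheme")
    element = name[0]                       # assumes a one-letter element symbol
    position = GREEK[name[1]]
    branch = int(name[2:]) if len(name) > 2 else None
    return element, position, branch


for atom_name in ["CA", "NZ", "OD1", "ND2", "CH2", "OH"]:
    print(atom_name, decode_atom_name(atom_name))
```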

Side-chain torsion angles

-With the exception of Ala and Gly, all sidechains also have torsion angles (the chi angles).

-To Do on your own:
- Count the # of chi’s in each amino acid.
- Determine why Ala doesn’t have a chi angle.
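A torsion angle is defined by four consecutive atoms. As a hedged sketch, the function below computes a torsion from Cartesian coordinates and applies it to chi1 of LYS 3 (atoms N, CA, CB, CG), using the coordinates from the ATOM records shown earlier. The helper names are illustrative, and the sign convention is the usual one in which a cis arrangement gives 0 degrees.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180..180."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# LYS 3 atoms taken from the example ATOM records.
N = (20.499, 24.795, 3.073)
CA = (19.360, 23.972, 3.469)
CB = (19.669, 22.668, 4.145)
CG = (20.495, 21.675, 3.360)

chi1 = dihedral(N, CA, CB, CG)
print(f"chi1 of LYS 3: {chi1:.1f} degrees")
```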


Fischer projection


Terminologies

• Hydrophobic: Hydrophobic amino acids are those with side chains that do not like to reside in an aqueous environment. Hence, these amino acids are generally buried within the hydrophobic core of the protein.

– Aliphatic: A hydrophobic side chain that contains only carbon and hydrogen atoms.

– Aromatic: A side chain is considered aromatic when it contains an aromatic ring system.

• Polar: Polar amino acids are those with side chains that prefer to reside in an aqueous environment and hence are generally found exposed on the surface of a protein.

It’s actually a bit more complicated…

Twenty Amino acids

• Hydrophobic (non polar)

– Aromatic (PHE, TRP)

– Aliphatic (ALA, VAL, LEU, ILE, MET, PRO)

• Polar

– Polar Neutral: Amide (ASN, GLN), -OH (THR, SER), -SH (CYS)

– Charged: Acidic (ASP, GLU), Basic (HIS, LYS, ARG)

TYR: Amphipathic

GLY: Unclassifiable

HINT: You should definitely know this!!!
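One way to drill the classification is to encode it as a lookup table. The sketch below mirrors the grouping on the slide; the group labels ("hydroxyl", "sulfhydryl") and the `classify` helper are naming choices made for this example.

```python
# Classification from the slide: (class, subclass) -> three-letter codes.
GROUPS = {
    ("hydrophobic", "aromatic"): ["PHE", "TRP"],
    ("hydrophobic", "aliphatic"): ["ALA", "VAL", "LEU", "ILE", "MET", "PRO"],
    ("polar neutral", "amide"): ["ASN", "GLN"],
    ("polar neutral", "hydroxyl"): ["THR", "SER"],
    ("polar neutral", "sulfhydryl"): ["CYS"],
    ("charged", "acidic"): ["ASP", "GLU"],
    ("charged", "basic"): ["HIS", "LYS", "ARG"],
    ("amphipathic", None): ["TYR"],
    ("unclassifiable", None): ["GLY"],
}

# Invert the table so each residue maps to its (class, subclass) pair.
CLASS_OF = {res: group for group, members in GROUPS.items() for res in members}


def classify(residue):
    """Return the (class, subclass) pair for a three-letter residue code."""
    return CLASS_OF[residue.upper()]


print(len(CLASS_OF), "residues classified")
print("LYS ->", classify("LYS"))
print("TRP ->", classify("TRP"))
```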


These amino acids are not uncommon in biochemistry, but they are not encoded within the genetic code (meaning they are not incorporated into proteins)…


Primary structure = the complete set of covalent bonds within a protein

Polypeptides

Linear arrangement of n amino acid residues linked by peptide bonds.

Polymers composed of two, three, a few, and many amino acid residues are called dipeptides, tripeptides, oligopeptides and polypeptides, respectively.

Proteins are molecules that consist of one or more polypeptide chains.

Q: Why is the pentapeptide SGYAL different from LAYGS?

Amino acid to Dipeptide

Amino Acid 1 Amino Acid 2

The peptide bond is the amide linkage formed between two amino acids, which results in the (net) release of one molecule of water (H2O).

The four atoms in the yellow box form a rigid planar unit and, as we will see next, there is no rotation around the C-N bond.
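The net loss of one water per peptide bond can be checked with simple formula arithmetic. This is a minimal sketch: it adds the molecular formulas of the free amino acids and subtracts H2O once for each of the n-1 peptide bonds in a linear chain. Only the five residues of the pentapeptide SGYAL are tabulated, and `peptide_formula` is an illustrative name.

```python
from collections import Counter

# Molecular formulas of the free (neutral) amino acids, by one-letter code.
FORMULAS = {
    "S": Counter(C=3, H=7, N=1, O=3),
    "G": Counter(C=2, H=5, N=1, O=2),
    "Y": Counter(C=9, H=11, N=1, O=3),
    "A": Counter(C=3, H=7, N=1, O=2),
    "L": Counter(C=6, H=13, N=1, O=2),
}
WATER = Counter(H=2, O=1)


def peptide_formula(sequence):
    """Formula of a linear peptide: sum of residues minus (n - 1) waters."""
    total = Counter()
    for residue in sequence:
        total += FORMULAS[residue]
    for _ in range(len(sequence) - 1):   # one water lost per peptide bond
        total -= WATER
    return dict(total)


print("GA    ->", peptide_formula("GA"))
print("SGYAL ->", peptide_formula("SGYAL"))
```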

Peptide bond

Note: this chemistry will not work as drawn!

The peptide bond has a partial double bond character, estimated at 40% under typical conditions. It is this fact that makes the peptide bond planar and rigid.

A quick aside…

[Figure: reaction schemes contrasting a horrible leaving group with a viable leaving group]


-- The primary structure is a complete description of the covalent bond network within a protein.

-- This is almost(!) completely described by the sequence of amino acids.

-- If you know that the protein is AVG…, you can look up the structures of A, V and G; combined with what you know about peptide bonding, that allows you to complete the covalent bond structure.

-- So, when does the primary structure not fully describe the covalent bond network?

-- BTW, this is a HUGE pet peeve of mine…there is no such thing as a primary sequence, despite its rather common usage (including in journal article titles…UGG!).

A primary sequence implies a secondary sequence, which is nonsense. While there are of course primary, secondary, tertiary and quaternary structures, there is only the “sequence”.


Multiple sequence alignments

Given the sequences:

INDUSTRY
INTERESTING
IMPORTANT

One example of an MSA is:

IN-DUST--RY
INTERESTING
IMPOR--TANT

But is it better than:

INDU--ST-RY
INTERESTING
IMPOR-T-ANT
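One hedged way to make "better" concrete is a sum-of-pairs score: every pair of rows is compared column by column and the results are added up. The scoring values below (+1 for a match, 0 for a mismatch, -1 for a residue against a gap, nothing for gap against gap) are assumptions chosen for illustration, not values from the slides. Real tools use substitution matrices and more elaborate gap penalties.

```python
from itertools import combinations


def sum_of_pairs(rows, match=1, mismatch=0, gap=-1):
    """Score an alignment by summing over all row pairs in every column."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue                 # gap against gap: ignored
            if a == "-" or b == "-":
                total += gap             # residue against gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


first = ["IN-DUST--RY", "INTERESTING", "IMPOR--TANT"]
second = ["INDU--ST-RY", "INTERESTING", "IMPOR-T-ANT"]
print("first :", sum_of_pairs(first))
print("second:", sum_of_pairs(second))
```

Changing the three scoring parameters changes which alignment wins, which is the point of the question on the slide.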

Multiple sequence alignments

I-N-DU-ST-RY
I-NTERESTING
IMPO-R--TANT

I--NDU-ST-RY-
I--NTERESTING
I-MPO-R--TANT

IN-DUTS--RY
INTERESTING
IMPOR--TANT

INDU--ST-RY
INTERESTING
IMPOR-T-ANT

I-NDUS--T-RY-
INT-ERES-TING
IMPOR--TAN--T

I-N--D--U-S-T-RY
I-N-TE-RE-S-TING
IM-PO--RTA-NT---

Multiple sequence alignments

Possible MSA:

I-N-DU-ST-RY
I-NTERESTING
IMPO-R--TANT

Entire column can NOT have only gaps!

I--NDU-ST-RY-
I--NTERESTING
I-MPO-R--TANT

Can NOT move residues around:

IN-DUTS--RY
INTERESTING
IMPOR--TANT

Possible:

INDU--ST-RY
INTERESTING
IMPOR-T-ANT

Very few matches!

I-NDUS--T-RY-
INT-ERES-TING
IMPOR--TAN--T

Too many gaps!

I-N--D--U-S-T-RY
I-N-TE-RE-S-TING
IM-PO--RTA-NT---

Which alignment pairs make the most sense?

AVGTLE
VLASID

VS.

AVGTLE
EKWVKV

More similar amino acids

A-VT-G-R-L-E
AA-TA-Q-V-IE

VS.

AVTG-RLE
AATAQ-IE

Fewer gaps

AVWF----VLIM
ALWFAMVFILIM

VS.

ESQG----KTD
DTQADGKCRTD

Gap location makes more sense because gaps are less frequent in nonpolar regions.

A multiple sequence alignment:

-CAPSRPLNENDDGR-QAFELIGTAVNM...
-CVPGRGEMEHDD-RDQVLELFGTVVNL...
-AVPKRAALQNDDGR-QGWELYGTVSAQ...
-AVPTKMNCFNDDGR-QSVNLIGTVSGN...
-ILPARTSMCNDDGR-QTIEMKGTPAGG...
--APGK--NGHKLV--Q-FELKGTYSRT...
AFAPRRIKMVNKLGR-QNFTLLGTFERT...
AYRPDRCNTCNKLGR-QDVELMGTDART...
-YRPEEWFGENKLGR-QSAELIGTDERS...
--APL-ETYWPKLGR-QTGALAGTNSAV...
--RPY-KAGWNKLGR-QSYELGGTNPYI...
---PARAKNMG---R-QSYHL--TMEWQ...

Chothia & Lesk. EMBO J. 5:823-826 (1986).

An example multiple sequence alignment. Conserved residues are indicated by color. Note that gaps tend to cluster together.

Also gaps at the N- and C-terminal ends are more common. Why?
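Since the coloring is lost in this transcript, conserved positions can be recovered from the alignment itself. The sketch below uses a deliberately strict definition, assumed for this example: a column counts as conserved only when every row carries the same residue and none has a gap. The trailing "..." of each row is dropped.

```python
ALIGNMENT = [
    "-CAPSRPLNENDDGR-QAFELIGTAVNM",
    "-CVPGRGEMEHDD-RDQVLELFGTVVNL",
    "-AVPKRAALQNDDGR-QGWELYGTVSAQ",
    "-AVPTKMNCFNDDGR-QSVNLIGTVSGN",
    "-ILPARTSMCNDDGR-QTIEMKGTPAGG",
    "--APGK--NGHKLV--Q-FELKGTYSRT",
    "AFAPRRIKMVNKLGR-QNFTLLGTFERT",
    "AYRPDRCNTCNKLGR-QDVELMGTDART",
    "-YRPEEWFGENKLGR-QSAELIGTDERS",
    "--APL-ETYWPKLGR-QTGALAGTNSAV",
    "--RPY-KAGWNKLGR-QSYELGGTNPYI",
    "---PARAKNMG---R-QSYHL--TMEWQ",
]


def conserved_columns(rows):
    """Return (index, residue) for columns identical in every row, gaps excluded."""
    hits = []
    for i, column in enumerate(zip(*rows)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            hits.append((i, column[0]))   # 0-based column index
    return hits


for index, residue in conserved_columns(ALIGNMENT):
    print(f"column {index}: {residue}")
```

Loosening the definition, for example by tolerating gaps or by grouping chemically similar residues, would report more columns.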

Regular expressions and sequence logos

Regular expressions provide a coarse-grain summary of an alignment segment.

Sequence logos essentially do the same, but without information loss (cf. http://en.wikipedia.org/wiki/Sequence_logo).
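The difference between the two summaries can be sketched in a few lines. The segment used here is the gap-free stretch at columns 19 to 21 (1-based) of the multiple sequence alignment shown earlier. `alignment_to_regex` is an illustrative helper that builds one character class per column, and the per-column counts are the raw material a sequence logo is drawn from.

```python
import re
from collections import Counter

# Columns 19-21 (1-based) of the example alignment, one string per row.
SEGMENTS = ["FEL", "LEL", "WEL", "VNL", "IEM", "FEL",
            "FTL", "VEL", "AEL", "GAL", "YEL", "YHL"]


def alignment_to_regex(rows):
    """Build a regular expression with one character class per column."""
    parts = []
    for column in zip(*rows):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


def column_counts(rows):
    """Residue counts per column, which a regular expression throws away."""
    return [Counter(column) for column in zip(*rows)]


pattern = alignment_to_regex(SEGMENTS)
print(pattern)
print(all(re.fullmatch(pattern, s) for s in SEGMENTS))
for counts in column_counts(SEGMENTS):
    print(counts.most_common())
```

The pattern also accepts strings that never occur in the alignment, such as AHM, and it treats the single M in the last column the same as the eleven L's. The counts keep that distinction.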

A phylogenetic tree describes an evolutionary process. But from a more pragmatic viewpoint, it also visually describes the similarities and dissimilarities between sequences within a multiple alignment.
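The similarities a tree summarizes start out as pairwise comparisons between aligned rows. As a rough sketch, the code below computes the fraction of identical residues for each pair of rows, counting only columns where neither row has a gap. That definition, the helper name, and the choice of four rows from the example alignment are all assumptions made for illustration. Actual tree building adds a substitution model and a clustering method on top of distances like these.

```python
from itertools import combinations


def fraction_identical(row_a, row_b):
    """Identity over columns where both rows have a residue (gaps skipped)."""
    compared = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    return sum(a == b for a, b in compared) / len(compared)


# Four rows picked from the example alignment, keyed by their row number.
ROWS = {
    "row1": "-CAPSRPLNENDDGR-QAFELIGTAVNM",
    "row2": "-CVPGRGEMEHDD-RDQVLELFGTVVNL",
    "row7": "AFAPRRIKMVNKLGR-QNFTLLGTFERT",
    "row8": "AYRPDRCNTCNKLGR-QDVELMGTDART",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(ROWS.items(), 2):
    identity = fraction_identical(seq_a, seq_b)
    print(f"{name_a} vs {name_b}: identity {identity:.2f}, distance {1 - identity:.2f}")
```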
