slide 1

Benjamin Good

March 17, 2008

The Genetic Code

• Sequence constructed from 4 “letters” known as nucleotides or bases, denoted “A”, “G”, “C”, and “U” (in RNA) or “T” (in DNA).

• These letters form fixed length “words” known as codons.

• Groups of codons form “sentences” which encode proteins.

Codon

The Genetic Code

• A given codon can either stand for a specific amino acid or act as a “start/stop codon”, which signals either the beginning or end of a protein’s code respectively.

• There are 4*4*4=64 different codons but only 20 amino acids to code for, making a total of 21 different possible meanings for a given codon (including start/stop).

• How are codons distributed among the 21 different categories?
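The counting argument above can be checked directly. The sketch below enumerates all 64 codons and tallies how many codons map to each meaning in the standard code table. The names `BASES`, `AMINO_ACIDS`, and `canonical_code` are illustrative choices, not from the slides. Note that in this table the start signal is not a separate symbol: AUG doubles as methionine, so the 21 meanings are the 20 amino acids plus "stop".

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # RNA alphabet; DNA would use T in place of U

# Standard genetic code, one letter per codon, listed in UCAG order
# (UUU, UUC, UUA, UUG, UCU, ...). '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, matching the string above.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
canonical_code = dict(zip(codons, AMINO_ACIDS))

# How many codons are assigned to each meaning?
codons_per_meaning = Counter(canonical_code.values())

print(len(codons))               # number of distinct codons
print(len(codons_per_meaning))   # number of distinct meanings
print(codons_per_meaning.most_common())
```

The distribution is uneven: leucine, serine, and arginine get six codons each, while methionine and tryptophan get only one.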

The Genetic Code

• The “Canonical Code”

• But why this arrangement and not another?

• Crick: the canonical code is a “frozen accident”, a code that was “good enough” to work and then became fixed.

Why the canonical code?

• An alternative is that the canonical code itself evolved to optimize for some selected trait.

• Noting the connection between similar codons and similar amino acids, several researchers hypothesized that the canonical code evolved to optimize against copying/transcription errors.

The Polar Requirement

• Woese and Alf-Steinberger came up with a measure for error susceptibility in the genetic code based on hydrophobicity.

• A given codon is subject to a single mutation. The polar difference between the new amino acid and the old one is calculated.

• The “error” resulting from each mutation is taken as the squared distance; averaging over all possible single mutations gives the code’s mean squared distance (MSD).
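The procedure above can be sketched as a short function. This is a simplified illustration, not the exact published calculation: it averages over all three codon positions at once and skips any substitution involving a codon with no polarity value (such as a stop codon). The function names and the toy code and polarity values below are made up for the example and are not real polar requirement data.

```python
BASES = "UCAG"

def single_base_neighbors(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for position, original in enumerate(codon):
        for base in BASES:
            if base != original:
                yield codon[:position] + base + codon[position + 1:]

def mean_squared_error(code, polarity):
    """Average squared polarity change over all single-base substitutions.

    `code` maps codon -> amino acid; `polarity` maps amino acid -> number.
    Substitutions to or from anything without a polarity value (stop
    codons, codons missing from `code`) are skipped.
    """
    total, count = 0.0, 0
    for codon, amino_acid in code.items():
        if amino_acid not in polarity:
            continue
        for neighbor in single_base_neighbors(codon):
            new_amino_acid = code.get(neighbor)
            if new_amino_acid not in polarity:
                continue
            total += (polarity[amino_acid] - polarity[new_amino_acid]) ** 2
            count += 1
    if count == 0:
        raise ValueError("no scorable substitutions in this code")
    return total / count

# Toy three-codon code with invented polarity values (NOT real data).
toy_code = {"UUU": "X", "UUC": "X", "UUA": "Y"}
toy_polarity = {"X": 1.0, "Y": 3.0}
print(mean_squared_error(toy_code, toy_polarity))
```

In the toy code, mutations between the two “X” codons cost nothing, which is the pattern the hypothesis predicts for the canonical code: neighboring codons tend to encode similar amino acids.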

How Optimal is the Canonical Code?

• Unfortunately, Alf-Steinberger’s results have not been reproducible.

• The first reproducible “test” of the polar requirement was published by Haig and Hurst in 1991.

• Using this method, they calculated the total error for a large sample of possible code assignments.

• Out of 10,000, only two other codes had lower error values than the canonical code!
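A minimal sketch of this kind of sampling test follows. It assumes alternative codes are generated by keeping the canonical code’s blocks of synonymous codons and its stop codons fixed, and shuffling which amino acid each block encodes. The helper names `random_code` and `fraction_better` are hypothetical, and `error_fn` stands for whatever error measure is being tested (for example a mean squared polarity change); it is supplied by the caller.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CANONICAL = dict(zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS))

def random_code(code, rng):
    """Return a code with the same codon blocks but shuffled amino acid labels.

    Stop codons keep their positions; only the 20 amino acids are permuted.
    """
    amino_acids = sorted(set(code.values()) - {"*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(amino_acids, shuffled))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in code.items()}

def fraction_better(code, error_fn, n_samples, seed=0):
    """Fraction of sampled random codes whose error is below that of `code`."""
    rng = random.Random(seed)
    baseline = error_fn(code)
    better = sum(
        error_fn(random_code(code, rng)) < baseline for _ in range(n_samples)
    )
    return better / n_samples
```

With a real error function, a call such as `fraction_better(CANONICAL, error_fn, 10_000)` would estimate how rare it is for a random code to beat the canonical one.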

One in a million?

• Freeland and Hurst built upon H&H’s model to introduce more realistic assumptions.

• Two types of code errors possible: transition and transversion.

• Introduced weighting for two types of errors because they are not equally probable in nature.

• Also introduced bias towards mistranslation rather than mutation (higher rates of errors in 1st and 3rd slots)
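The transition/transversion distinction can be sketched in a few lines. A transition swaps a purine for a purine (A ↔ G) or a pyrimidine for a pyrimidine (U ↔ C); anything else is a transversion. The function names and the default weight `w=3.0` are assumptions for illustration, and the position-dependent mistranslation bias is not modelled here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"U", "C"}

def is_transition(a, b):
    """True if swapping base a for base b stays within purines or within pyrimidines."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)

def substitution_weight(a, b, w=3.0):
    """Relative weight of the substitution a -> b.

    Transitions count w times as much as transversions. The default w is
    an illustrative assumption, not a measured rate.
    """
    if a == b:
        raise ValueError("a and b must be different bases")
    return w if is_transition(a, b) else 1.0

print(is_transition("U", "C"), is_transition("U", "A"))
print(substitution_weight("A", "G"), substitution_weight("C", "G"))
```

Multiplying each squared error by this weight before averaging turns the plain mean squared distance into a weighted one.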

One in a million?

Weighted errors make the canonical code even more optimized relative to the rest.

Peak efficiency around w = 3

One in a million?

Out of a sample of 1,000,000 random codes, only 1 had a lower error value than the canonical code (CC)!

It was relatively far away in search space, but behaved similarly to the CC.

Beyond the Polar Requirement

• In the paper we read for class, Freeland and Hurst question previous studies (including their own).

• Is the polar requirement a biased measurement?

• Is using the (W)MSD a biased measurement?

• Some biosynthetic acids might be tied to particular codons, so code space could be artificially symmetric.

• They proposed a new measurement based on PAM matrices, which measure the “similarity” of two amino acids on a functional level.


General error metric:

A code’s total error =

αi is the number of transition errors leading to substitution i, i.e. U ↔ C, A ↔ G

βi is the number of transversion errors leading to substitution i, i.e. U,C ↔ A,G

ei is the physical error resulting from substitution i
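The equation itself did not survive on this slide. One form consistent with the three definitions above, assuming a transition weight w and no normalization (both assumptions, not taken from the slide), would be:

```latex
\[
  E_{\text{code}} \;=\; \sum_{i} \left( w\,\alpha_i + \beta_i \right) e_i
\]
```

Here the sum runs over all amino acid substitutions i, and w > 1 makes transitions count more heavily than transversions. The original formula may differ, for example by dividing by the total weight to give a mean.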

Polar requirement

PAM matrix


• Results:

[Figure: results in two panels, “PAM Matrix” and “Polar Requirement”]

Far from overturning the adaptive hypothesis, this new study showed the canonical code to be even more optimized than previously thought!

Other optimizations…

• Studies of the assignment of stop codons found that the canonical code is highly optimized against frameshift and nonsense mutations. (S. Naumenko et al., 2008)

• Furthermore, these same optimizations against frameshift errors allow the CC to be more efficient at encoding parallel information on top of a protein coding sequence. (Itzkovitz and Alon, 2007)

Is the canonical code optimized?

• YES!

• But many aspects are still unclear – e.g. a mechanism for code selection.

• Conditions in precanonical times are still relatively unknown and the canonical code seems to be universally adhered to in modern organisms.
