biological coding theory error-control code models for prokaryotic

Coding Theory and Protein Synthesis

Avogadro-Scale Engineering: Form and Function
November 18-19, 2003

Elebeoba E. May
Computational Biology Department
Sandia National Laboratories
*[email protected]

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.



Agenda

It is the glory of God to conceal a matter; to search out a matter is the glory of kings. Proverbs 25:2 (NIV)

• Error Control at Diverse Molecular Scales

• Coding Theory Models of Protein Synthesis
– Gatlin
– Yockey
– May et al.

• Applications of Coding Theory to
– Genetic Classification
– Molecular Computation
– Construction and Control in Protein Synthesis

Nucleotides: Did nature select a parity check code?

D. A. Mac Dónaill: “Numerical interpretation of nucleotides depicted as positions on a B^4 hypercube: (a) even-parity nucleotides; (b) odd-parity nucleotides. The natural alphabet is structured as an error-checking code.”

*D.A. Mac Dónaill, “A parity code interpretation of nucleotide alphabet composition,” Chem. Comm. (2002) 2062-2063 and http://www.tcd.ie/Chemistry/People/macdonaill/
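The parity idea on this slide can be illustrated without committing to any particular nucleotide-to-bit assignment. The sketch below enumerates all 4-bit words (the vertices of the B^4 hypercube), splits them by parity, and checks that any single bit flip moves an even-parity word into the odd-parity set, so the error is detectable. The function names are illustrative assumptions, and the mapping from actual nucleotides to 4-bit words is deliberately left out; see the cited paper for that.

```python
from itertools import combinations, product

def parity(word):
    """Return 0 for even parity, 1 for odd parity."""
    return sum(word) % 2

def hamming(a, b):
    """Number of positions where two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(code):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming(a, b) for a, b in combinations(code, 2))

def single_bit_flips(word):
    """All words reachable from `word` by flipping exactly one bit."""
    return [word[:i] + (1 - word[i],) + word[i + 1:] for i in range(len(word))]

# All 16 vertices of the B^4 hypercube, split by parity.
WORDS = list(product((0, 1), repeat=4))
EVEN = [w for w in WORDS if parity(w) == 0]
ODD = [w for w in WORDS if parity(w) == 1]

# Every single-bit error on an even-parity word lands in the odd set.
ALL_SINGLE_ERRORS_DETECTED = all(
    parity(flipped) == 1 for w in EVEN for flipped in single_bit_flips(w)
)
```

Because the even-parity words are at least two bit flips apart, a receiver that accepts only even-parity words rejects every single-bit error, though it cannot tell which bit was flipped.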

Protein: Degeneracy of the genetic code

B. Hayes: “how quickly a biochemical puzzle … was reduced to an abstract problem in symbol manipulation.” B. Hayes, “The Invention of the Genetic Code,” American Scientist, 1998. (Physicist George Gamow and coding theorist Solomon W. Golomb; experimental evidence from Marshall W. Nirenberg and J. Heinrich Matthaei, NIH.)

http://www.people.virginia.edu/~rjh9u/code.html
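To make "degeneracy" concrete, the following sketch builds the standard genetic code as a 64-entry lookup table and counts how many codons map to each amino acid. The compact amino-acid string is the usual one-letter encoding of the standard table with bases ordered U, C, A, G ("*" marks a stop codon); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each codon (e.g. "AUG") to its one-letter amino acid.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Degeneracy: number of distinct codons that encode each amino acid.
DEGENERACY = Counter(CODON_TABLE.values())
```

Leucine, serine, and arginine each have six codons, while methionine and tryptophan have one each, so the 64-to-21 mapping (20 amino acids plus stop) is many-to-one in a very uneven way.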

Protein: Information theory and binding sites

T. D. Schneider : “Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation,” Nucleic Acids Research, 2001, Vol. 29, No. 23 4881-4891
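Sequence logos plot, for each position of an aligned set of binding sites, the information content in bits: the maximum entropy (2 bits for four bases) minus the observed entropy at that position. A minimal sketch of that calculation follows. It omits the small-sample correction used in practice, and the example sites are hypothetical, not data from the cited paper.

```python
import math
from collections import Counter

def information_per_position(sites, alphabet_size=4):
    """Information content (bits) at each column of equal-length aligned sites.

    Computes log2(alphabet_size) - H(column), with no small-sample correction.
    """
    n = len(sites)
    info = []
    for column in zip(*sites):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info

# Hypothetical aligned sites, for illustration only.
EXAMPLE_SITES = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
EXAMPLE_INFO = information_per_position(EXAMPLE_SITES)
```

A fully conserved column scores 2 bits and a column with all four bases equally represented scores 0 bits; summing over columns gives the total information of the site.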

Comparison of microbial genome base mutation rate to genome size: exhibits power law behavior; inverse relation between genome size and base mutation rate.

Mutation Rates
• RNA viruses: 1 to 0.1
• DNA microbes: 1/300
• Higher eukaryotes: 1/300 EfGn

Genome: Increased length, increased fidelity

Comparison of higher eukaryotic genome base mutation rate to genome size: inverse relation between genome size and base mutation rate.

G. Battail: “… increasing the codeword length results in a decreasing probability of error…”
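One simple way to see the trend Battail describes is a repetition code over a binary symmetric channel with majority-vote decoding: as the number of copies n grows, the probability that the majority is wrong falls. This is only an illustration under assumed parameters (bit-flip probability p = 0.1, odd n). Unlike the fixed-rate setting of the channel coding theorem, the rate of a repetition code drops as 1/n.

```python
from math import comb

def majority_error_probability(n, p):
    """Probability that majority vote over n copies (n odd) decodes wrongly.

    Each copy is flipped independently with probability p; decoding fails
    when more than half of the copies are flipped.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1)
    )

# Error probability for increasing codeword lengths at p = 0.1 (assumed value).
ERROR_BY_LENGTH = {n: majority_error_probability(n, 0.1) for n in (1, 3, 5, 7)}
```

At p = 0.1 the error probability falls from 0.1 with a single copy to 0.028 with three copies and keeps decreasing for longer codewords.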

Evidence: Is there evidence of error control in protein synthesis process?

• Liebovitch et al. (1996) and Rosen and Moore (2003): computational experiments did not find evidence for linear block codes.
• Their approach was not comprehensive: it did not consider convolutional coding or noise.
• May et al. looked for an optimal generator for translation initiation sites.
• It is highly probable that the encoding model does not conform to known error control codes.


Central Dogma of Genetics = Genetic Information Transmission

Figure: central dogma diagram (eukaryotes), annotated with the communication-system blocks Encode, Channel, and Decode between endpoints A and B. (http://www-stat.stanford.edu/~susan/courses/s166/central.gif)

Coding Theory Models of Protein Synthesis

• Gatlin, L. L., Information Theory and the Living System. 1972.

• Yockey, Hubert, Information Theory and Molecular Biology. 1992.

• Coding Theory View of Protein Synthesis, May et al., JFI 2004

Figure: Genetic Information enters the Genetic Encoder, passes through the Genetic Channel (where errors are introduced), and reaches the Genetic Decoder as mRNA, drawn with start codon AUG, stop codon UAA, and the 3’ end marked.

Principal Hypothesis: If mRNA is viewed as a noisy encoded signal, it is feasible to use principles of error control coding theory to interpret the genetic translation initiation mechanism.

Engineering Communication System

Figure: error control in an engineering communication system. At A, k-bit information (1-0-0-1) enters the Encoder, which outputs n-bit information (111-000-000-111). The Channel adds noise, producing errors (111-001-000-110). The Decoder recovers an estimate of the k-bit information (1-0-0-1) at B. The diagram appears on two consecutive slides; the second copy is marked “????????”.
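The bit patterns in the diagram (1-0-0-1 encoded as 111-000-000-111, received as 111-001-000-110, decoded back to 1-0-0-1) are consistent with a length-3 repetition code and majority-vote decoding. A minimal sketch under that reading, with illustrative function names:

```python
def encode(bits, n=3):
    """Repeat every information bit n times (k bits in, n*k bits out)."""
    return [b for b in bits for _ in range(n)]

def decode(received, n=3):
    """Majority vote over each block of n received bits."""
    blocks = [received[i:i + n] for i in range(0, len(received), n)]
    return [1 if sum(block) > n // 2 else 0 for block in blocks]

MESSAGE = [1, 0, 0, 1]
SENT = encode(MESSAGE)                               # 111-000-000-111
RECEIVED = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0]      # 111-001-000-110
RECOVERED = decode(RECEIVED)
```

Each block carries at most one flipped bit in this example, so the majority vote restores the original message; two flips within the same block would be decoded incorrectly.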


• Efficient Coding for the Desoxyribonucleic Channel (S. W. Golomb 1962)
– Applied biorthogonal codes to the genetic coding problem (the codon to amino acid mapping challenge)

• Andrzej K. Konopka (1984)

• Gerard Battail

• Table-Based Convolutional Code for E. coli Promoter (P. Bermel)
– Based on the informational content of the E. coli promoter, approximates the coding rate for the promoter region as 1/9.
– Developed a possible 1/5 binary code for the E. coli promoter region.
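For readers unfamiliar with convolutional codes and coding rates, the sketch below shows a generic rate-1/n binary convolutional encoder. It defaults to the common textbook rate-1/2 code with generator taps 111 and 101; this is not the table-based promoter code mentioned above, whose generators are not given in these slides. A rate-1/5 code would simply use five generator tuples.

```python
def conv_encode(bits, generators=((1, 1, 1), (1, 0, 1))):
    """Rate-1/n binary convolutional encoder (n = number of generators).

    Each generator is a tuple of taps applied to the current input bit
    followed by the most recent past inputs. The taps used by default are a
    textbook example, chosen only for illustration.
    """
    memory = len(generators[0]) - 1
    state = [0] * memory  # past inputs, most recent first
    out = []
    for u in bits:
        window = [u] + state
        for g in generators:
            # Output bit is the XOR of the tapped positions.
            out.append(sum(t & w for t, w in zip(g, window)) % 2)
        state = window[:memory]
    return out

INPUT_BITS = [1, 0, 1, 1]
ENCODED = conv_encode(INPUT_BITS)
RATE = len(INPUT_BITS) / len(ENCODED)
```

Each input bit produces n output bits, so the rate is 1/n; unlike a block code, every output depends on the current bit and on the encoder's memory of earlier bits.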

Biological Coding Theory

David Loewenstern, et al.
• Compression for DNA sequence classification

Leonard Adleman, et al.; Lila Kari, et al.
• Molecular computation
• Encoding for DNA computing
• Error-control coding

Thomas Schneider, et al.
• Biological information theory
• Error-control via sphere packing

Slide labels: Transmission; Storage

Error-Control Coding Based Methods

Coding Theory in RBS Classification

May et al., BioSystems 2004

Figure: mean minimum Hamming distance by position for the 3 sequence groups. Horizontal axis is position relative to the first base of the initiation codon. Vertical axis is the mean of the aligned minimum Hamming distance values by position (Hamming distance = number of positions where two vectors differ). Regions labeled on the plot: NRD, SD, AUG, DB.

Schematic: bases b-15, b-14, …, b-11, …, b-1 precede the initiation codon A U G. A window covering b-15 through b-7 is shown, with per-position averages Davg-15, Davg-14, …, Davg-11.

Figure: bar chart of classification results (vertical axis 0 to 80), with series Correct Classification and Incorrect Classification and categories PDF, PDF (p=0.5), CDF, CDF (p=0.5). Bar values as extracted (correct / incorrect): 59.065 / 40.935; 73.81 / 26.19; 62.105 / 37.895; 50 / 50.
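The quantity plotted in the Hamming-distance figure can be sketched as follows: slide a window along each sequence, take the minimum Hamming distance from the window to any codeword, and average those minima by position across the sequences in a group. The codewords and sequences below are hypothetical placeholders; the actual codebook and sequence groups from May et al. are not reproduced in these slides.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance_profile(seq, codewords, width):
    """Minimum Hamming distance to any codeword, for each window start."""
    return [
        min(hamming(seq[i:i + width], c) for c in codewords)
        for i in range(len(seq) - width + 1)
    ]

def mean_profile(seqs, codewords, width):
    """Position-wise mean of the minimum-distance profiles of aligned seqs."""
    profiles = [min_distance_profile(s, codewords, width) for s in seqs]
    return [sum(col) / len(col) for col in zip(*profiles)]

# Hypothetical codewords and aligned sequences, for illustration only.
CODEWORDS = ["AGG", "GGA"]
GROUP = ["AGGAG", "UUGGA"]
PROFILE = mean_profile(GROUP, CODEWORDS, width=3)
```

Positions where a group's mean minimum distance dips are positions where its sequences sit close to valid codewords, which is the kind of contrast a classifier between sequence groups can exploit.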

Coding Theory and Molecular Computation

Leonard M. Adleman, et al.; Lila Kari, et al.
• Molecular computation
• Encoding for DNA computing
• Error-control coding

Figure: strands labeled v1 and v2 joined by ligase. (http://www.scs.uiuc.edu/~scott/index_files/ligation.gif)

M. Stojanovic and D. Stefanovic, “A deoxyribozyme-based molecular automaton.” Nature Biotech. 2003

• Can achieve computational robustness using coding theory

Construction and control: Quantify and Optimize Protein Translation

Figure: messenger RNA (mRNA) drawn 5’ to 3’ with a leader* and a coding region, start codons (AUG, GUG, UUG), and stop codons (UAA, UAG, UGA). Initiation factors and the 30S and 50S subunits assemble on the mRNA to produce the polypeptide (protein).
*Ribosome binding site contained in leader region

• Phases of translation: initiation, elongation, termination
• Initiation is the most time consuming and affects the overall gene expression level
• A qualitative outline for the initiation process exists: 1) 30S + IFs bind to mRNA and fMet-tRNA; 2) ternary complex binds 50S subunit; 3) IFs released prior to elongation.

mRNA is the only variable aspect of translation initiation. Information encoded in mRNA determines specificity and efficiency.

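Construction and control: Quantify and Optimize Protein Translation

Figure: mRNA leader region (UTR) drawn 5’ to 3’, with the non-random domain, ribosome binding site, start codons (AUG, GUG, UUG), and downstream box marked. The strand 3’..AUUCCUCCACUAG…. 5’ is shown alongside the ribosome binding site.

Modify E. coli Intergenic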
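The strand 3’..AUUCCUCCACUAG…. 5’ in the figure is written 3’ to 5’ and appears to correspond to the 3’ end of E. coli 16S rRNA, so it can be compared base by base against a leader written 5’ to 3’. The sketch below scores Watson-Crick complementarity between every window of a leader and every window of that strand. The window width, the scoring (one point per complementary pair, no G-U wobble, no energy model), and the example leaders are all assumptions made for illustration.

```python
# Strand from the figure, written 3' -> 5' as on the slide.
ANTI_STRAND_3TO5 = "AUUCCUCCACUAG"
# Watson-Crick pairs only; G-U wobble pairing is ignored in this sketch.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def best_pairing(leader, anti=ANTI_STRAND_3TO5, width=6):
    """Best complementary window between a 5'->3' leader and the 3'->5' strand.

    Returns (score, leader_start, strand_start), where score counts the
    complementary base pairs in the best-matching pair of windows. Ties are
    resolved in favor of the first window pair found.
    """
    best = (-1, None, None)
    for i in range(len(leader) - width + 1):
        for j in range(len(anti) - width + 1):
            score = sum(
                PAIR[a] == b
                for a, b in zip(leader[i:i + width], anti[j:j + width])
            )
            if score > best[0]:
                best = (score, i, j)
    return best

# Hypothetical leaders: one containing AGGAGG, one with no complementarity.
STRONG = best_pairing("UUUAAGGAGGUAAAU")
WEAK = best_pairing("CCCCCCCC")
```

A leader window that pairs perfectly across all six positions scores 6, while a leader with no complementary stretch scores near 0; a score like this is one crude proxy for how strongly a leader could bind the ribosome.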

Acknowledgments

• Collaborators
– NCSU: Mladen Vouk, Donald Bitzer, Winser Alexander, and Ann Stomp
– SNL: Anna Johnston, William Hart, Jean-Paul Watson, Richard Pryor

• NIEHS: John Drake (Mutagenesis data)

• Support:
– SNL Tier 1 Seniors Council LDRD/DOE
– NSF, Ford Foundation