molecular biology basics
DESCRIPTION
This note on the basics aspects of molecular biology e.g. fine structure of gene, wobble hypothesis, one gene one polypeptide theory, mechanism of transcription, translation, reverse transcription, different aspects of mutation, transcription in bacteriophase (Lamda phase) etc are compiled from different sources of internet for the students of degree level.TRANSCRIPT
1
History of Genetics - Fine Structure of The Gene
The hereditary units which are transmitted from one generation to the
next generation are called genes. A gene is the fundamental biologic unit,
like the atom which is the fundamental physical unit. Mendel while
explaining the result of his monohybrid and dihybrid crosses, first of all
conceived of the genes as particulate units and referred them by various
names such as hereditary factors or hereditary elements. But his concept
about the gene was entirely hypothetical and he remained ignorant about the physical and chemical nature of gene.
Even before the rediscovery of Mendel’s laws, it was already established
that chromosomes have a definite role in the inheritance because it was
found that chromosomes were the only link between one generation and
the next generation and a diploid chromosome set consists of two
morphologically similar sets, one is derived from the mother and the other
from the father at fertilization. Later on, a parallel behaviour among chromosomes and genes was discovered.
Earlier workers proposed various hypotheses to explain the nature of genes. For instance, De Vries postulated one gene one character hypothesis according to which a particular trait of an individual is controlled by a particular gene. Bateson and Punnett proposed the presence or absence theory. According to them, in a cross the character which dominates the other has a determiner, while, the recessive character has no such determiner. But all the theories were discarded by Morgan, who produced the particulate gene theory in 1926. He considered genes as corpuscles which are arranged in a linear order on the chromosomes and appear like beads on a string. Each gene was supposed to be different from all others. The particulate theory of gene was widely accepted and supported by cytological observations. But, the discovery of DNA molecule as a sole carrier of genetic informations has altogether discarded the Morgan's theory. Therefore, before defining the gene it will be advisable consider the both classical as well as modern definitions of gene.
Morgan's introduction of the fruit fly to genetics revolutionized it because the fly's
rapid life cycle and minute size enabled the scale of experimentation to be markedly
increased. Contrast the whole year required between generations of peas and corn
with the two weeks needed for the fruit fly. This meant that in a short time over a
hundred characters had been studied and many mutants found. Drosophila became
as a result the most prominent "model organism" of genetics. Bridges oversaw and
maintained the growing stock of the mutant types and made them freely available
internationally. The lab at Columbia, known as the "Fly Room," was an example of
team effort, led by a genial, exuberant boss. Morgan had to undergo quite a
conversion by his team, but the outcome was a giant step forward in genetics,
crowned with the award of the Nobel Prize in 1933.
2
H. J. Muller was less close to Morgan than the others and did not long remain in the
group. Their views on genetics differed. Whereas Morgan was happy to leave to one
side the question of the material basis of the gene, Muller wanted to know the answer.
His pioneer work on the production of mutations by X rays not only won him the
Nobel Prize but offered him the hope of establishing the size of the gene. This
approach was used by the brilliant Russian geneticist N. V. Timoféeff-Ressovsky, in
Germany, to yield an estimate of the "sensitive volume" of the gene as that space
needed by one thousand atoms, or about the size of an average protein.
Unfortunately, as later work revealed, the methodology and interpretation of this
experiment proved faulty.
Drosophila was by no means the only model organism for genetics. In addition to
commercial cereal crops, poultry, mice, and yeast, the bread mold Neurospora
figures prominently in the development of the field. Using this organism, George
Beadle and Edward Tatum concluded that there is a 1:1 relation between a gene and a
given enzyme, thus suggesting that the primary product of a gene is an enzyme. But
for the fine structure analysis of the gene the model system that was to bring the
analysis down to the molecular level was the viral infected colon bacillus (Escherichia
coli). Here the bacterial virus (bacteriophage, or phage) has just one chromosome,
and in mixed infections this chromosome can recombine with one from another, thus
permitting recombination and making fine structure mapping possible. By 1957
Seymour Benzer had used this system to make an estimate of the likelihood of
crossing-over between two mutants one DNA base apart in the bacteriophage T4 to
be 1 in 10,000. His own analysis had then reached 1 in 20,000.
Split gene
Split gene is a gene whose continuity is interrupted. An interrupted gene (also called a split gene) is simply a strand of DNA that contains both introns and exons.
Most higher-level eukaryotes have interrupted genes and have longer introns than exons, creating a gene that is longer than its coding region. Interrupted genes are also found in some bacteria. Some eukaryotes, including yeast, have many uninterrupted regions, as they contain long stretches of exons that create necessary mRNA, leading to the development of proteins. This does not mean, however, they are fully uninterrupted, as tRNA synthesis requires excision of a nucleotide sequence, followed by ligation.
Onco gene
A gene that causes normal cell to become cancerous either because the gene is mutated or because the gene is expressed at the wrong time in development.
3
An oncogene is a gene that, when mutated or expressed at high levels, helps turn a normal cell
into a tumor cell. Many abnormal cells normally undergo a programmed form of death
(apoptosis). Activated oncogenes can cause those cells to survive and proliferate instead.
Most oncogenes require an additional step, such as mutations in another gene, or
environmental factors, such as viral infection, to cause cancer. Since the 1970s, dozens of
oncogenes have been identified in human cancer. Many cancer drugs target those DNA
sequences and their products.
Proto-oncogene A proto-oncogene is a normal gene that can become an oncogene due to mutations or
increased expression. Proto-oncogenes code for proteins that help to regulate cell growth and
differentiation. Proto-oncogenes are often involved in signal transduction and execution of mitogenic signals, usually through their protein products. Upon activation, a proto-oncogene
(or its product) becomes a tumor-inducing agent, an oncogene. Examples of proto-oncogenes include RAS, WNT, MYC, ERK, and TRK.
Activation
The proto-oncogene can become an oncogene by a relatively small modification of its original
function. There are three basic activation types:
*A mutation within a proto-oncogene can cause a change in the protein structure, causing
- an increase in protein (enzyme) activity
- a loss of regulation
* An increase in protein concentration, caused by
- an increase of protein expression (through misregulation)
- an increase of protein (mRNA) stability, prolonging its existence and thus its activity in the
cell
- a gene duplication (one type of chromosome abnormality), resulting in an increased amount
of protein in the cell
*A chromosomal translocation (another type of chromosome abnormality), causing
- an increased gene expression in the wrong cell type or at wrong times
- the expression of a constitutively active hybrid protein. This type of aberration in a dividing stem cell in the bone marrow leads to adult leukemia
Mutations in microRNAs can lead to activation of oncogenes. New research indicates that
small RNAs 21-25 nucleotides in length called microRNAs (miRNAs) can control expression of these genes by downregulating them. Antisense messenger RNAs could theoretically be
used to block the effects of oncogenes.
Classification There are several systems for classifying oncogenes, but there is not yet a widely accepted
standard. They are sometimes grouped both spatially (moving from outside the cell inwards)
and chronologically (parallelling the "normal" process of signal transduction). There are
several categories that are commonly used:
Category Examples Description
Growth factors, or mitogens
c-Sis
Usually secreted by specialized cells to
induce cell proliferation in themselves,
nearby cells, or distant cells. An oncogene
may cause a cell to secrete growth factors
4
even though it does not normally do so. It
will thereby induce its own uncontrolled
proliferation (autocrine loop), and
proliferation of neighboring cells. It may
also cause production of growth hormones
in other parts of the body.
Receptor tyrosine
kinases
epidermal growth factor
receptor (EGFR), platelet-
derived growth factor
receptor (PDGFR), and vascular endothelial growth
factor receptor (VEGFR), HER2/neu
Kinases add phosphate groups to other proteins to turn them on or off. Receptor
kinases add phosphate groups to receptor proteins at the surface of the cell (which
receive protein signals from outside the
cell and transmit them to the inside of the
cell). Tyrosine kinases add phosphate
groups to the amino acid tyrosine in the target protein. They can cause cancer by
turning the receptor permanently on (constitutively), even without signals from
outside the cell.
Cytoplasmic tyrosine kinases
Src-family, Syk-ZAP-70 family, and BTK family of
tyrosine kinases, the Abl
gene in CML - Philadelphia
chromosome
-
Cytoplasmic
Serine/threonine kinases and their
regulatory subunits
Raf kinase, and cyclin-
dependent kinases (through
overexpression).
-
Regulatory GTPases Ras protein
Ras is a small GTPase which hydrolyses
GTP into GDP and phosphate. Ras is activated by growth factor signaling (ie.
EGF, TGFalpha) and acting like a binary switch (on/off) in growth signaling
pathways. Downstream effectors of Ras include Raf, MEK, MEKK, MAPK, ERK,
most of which in turn regulate genes that mediate cell proliferation.
Transcription factors
myc gene -
Conversion of proto-oncogenes:
There are two mechanisms by which proto-oncogenes can be converted to cellular oncogenes:
Quantitative: Tumor formation is induced by an increase in the absolute number of proto-oncogene products or by its production in inappropriate cell types.
Qualitative: Conversion from proto-oncogene to transforming gene (c-onc) with changes in the nucleotide sequence which are responsible for the acquisition of the new properties.
5
History: The first oncogene was discovered in 1970 and was termed src (pronounced sarc as in
sarcoma). Src was in fact first discovered as an oncogene in a chicken retrovirus. Experiments
performed by Dr G. Steve Martin of the University of California, Berkeley demonstrated that
the SRC was indeed the oncogene of the virus.
In 1976 Drs. J. Michael Bishop and Harold E. Varmus of the University of California,
San Francisco demonstrated that oncogenes were defective proto-oncogenes, found in many
organisms including humans. For this discovery Bishop and Varmus were awarded the Nobel
Prize in 1989.
Pseudogenes
What are pseudogenes? Pseudogenes are genomic DNA sequences similar to normal genes but non-functional; they are regarded as defunct relatives of functional genes.
What causes pseudogenes to arise? There are two accepted processes during which pseudogenes may arise:
• duplication - modifications (mutations, insertions, deletions, frame shifts) to the DNA sequence of a gene can occur during duplication.
These disablements can result in loss of gene function at the
transcription or translation level (or both) since the sequence no
longer results in the production of a protein. Copies of genes that
are disabled in such a manner are termed non-processed or
duplicated pseudogenes. • retrotransposition - reverse transcription of an mRNA transcript with
subsequent re-integration of the cDNA into the genome. Such copies
of genes are termed processed pseudogenes. These pseudogenes
can also accumulate random disablements over the course of
evolution.
6
Why are pseudogenes interesting? In any study of molecular evolution, it is necessary to compare and contrast genes from a variety of organisms to gauge how the organisms have adapted to ensure their survival. Pseudogenes are vitally important since they provide a record of how the genomic DNA has been changed without such evolutionary pressure and can be used as a model for determining the underlying rates of nucleotide substitution, insertion and deletion in the greater genome.
7
How can a pseudogene be identified? Once gene sequences have been identified in the genome, it is possible to use sequence alignment programs (such as FASTA or BLAST) to detect matching regions in the nucleotide sequence. These matching regions are potential gene homologs and are termed pseudogenes if there is some evidence that either of the causes (see above) are satisfied.
In these analyses, genes from annotated genomes and protein databases have first been clustered into paralog families and then used to survey whole genomes for copies or homologs. For each potential pseudogene (or fragment) match, a number of steps have been taken to assess its validity as a pseudogene. These steps include checking for overcounting and repeat elements, overlap on the genomic DNA with other homologs and cross-referencing with exon assignments from genome annotations. The resulting pseudogenes or pseudogenic fragments have then been assigned to the paralog family of the most homologous gene (or assigned to a singleton gene if the probe gene has no obvious paralog).
Relating pseudogenes to known protein structures In a number of cases, more distant evolutionary and functional relationships between proteins can only be elucidated through the analysis of the folds that their structures adopt. While it must not be forgotten that the assignment of function to a gene is often implied from that of a gene with a homologous sequence, the added information that protein structures can provide is very desirable in genome annotation.
In the case of pseudogenes, structural information can give extra evolutionary clues and facilitate analysis of the scope of folds in the pseudogene population ("pseudo"-folds) in contrast to those observed for the genes themselves. Where possible, i.e. where a gene can be matched to a SCOP domain, assignment of fold to a pseudogene or pseudogenic fragment is based upon the assignment of the most homologous gene.
8
Wobble Hypothesis
The triplet code is a degenerate one with many more codons than the number of amino acid
types coded. An explanation for this degeneracy is provided by the 'wobble hypothesis'
proposed by Crick (1966). Since there are 61 codons specifying amino acids, the cell should
contain 61 different tRNA molecules, each with a different anticodon. Actually, however, the
number of tRNA molecule types discovered is much Jess than 61. This implies that the
anticodons of some tRNAs read more than one codon on mRNA. According to the wobble
hypothesis only the first two positions of a triplet codon on mRNA have a precise pairing with
the bases of the tRNA anticodon. The pairing of the third position bases of the codon may be
ambiguous, and varies according to the nucleotide present in this position. Thus a single
tRNA type is able to recognize two or more codons differing only in the third base.
The anticodon UCG of serine tRNA recognizes two codons, AGC and AGU.
The bonding between UCG and AGC follows the usual Watson-Crick pairing pattern. In
UCGAGU pairing, however, hydrogen bonding takes place between G and U.
This is a departure from the usual Watson-Crick pairing mechanism where G pairs with C and
A with U. Such interaction between the third bases is referred to as 'wobble pairing'.
The degeneracy of the code is not random. Mostly, the different co dons for a particular
amino acid have the same first two letters (leucine, serine and arginine are exceptions). Thus
the first two letters of all the four codons for valine are GU and for alanine GC.
Wobble base pair
In molecular biology, a wobble base pair is a non-Watson-Crick base pairing between two
nucleotides in RNA molecules. The four main wobble base pairs are guanine-uracil, inosine-
uracil, inosine-adenine, and inosine-cytosine (G-U, I-U, I-A and I-C). The thermodynamic
stability of a wobble base pair is comparable to that of a Watson-Crick base pair. Wobble
base pairs are fundamental in RNA secondary structure and are critical for the proper
translation of the genetic code.
9
Fig. Wobble base pairs for inosine and guanine
tRNA wobble
In the genetic code there are 43 = 64 possible codons (tri-nucleotide sequences). For
translation each of these codons requires a tRNA molecule with a complementary anticodon.
If each tRNA molecule paired with its complementary mRNA codon using canonical Watson-
Crick base pairing, then 64 types (species) of tRNA molecule would be required. Since most
organisms have fewer than 45 species of tRNA[1], some tRNA species must pair with more
than one codon. In 1966 Francis Crick proposed the Wobble hypothesis to account for this.
He postulated that the 5' base on the anticodon, which binds to the 3' base on the mRNA, was
not as spatially confined as the other two bases, and could thus have non-standard base
pairing.[2]
10
As an example yeast tRNAPhe
has the anticodon 5'-GmAA-3' and can recognize the codons 5'-
UUC-3' and 5'-UUU-3'. It is, therefore, possible for non-Watson–Crick base pairing to occur
at the third codon position; i.e. the 3' nucleotide of the mRNA codon and the 5' nucleotide of
the tRNA anticodon.
tRNA Base pairing schemes
The original wobble pairing rules, as proposed by Crick. Watson-Crick base pairs are shown
in bold, wobble base pairs in italic:
tRNA 5' anticodon base mRNA 3' codon base
A U
C G
G C or U
U A or G
I A or C or U
Revised pairing rules
tRNA 5' anticodon base mRNA 3' codon base
G U,C
C G
k2C A
A U,C,(A),G
unmodified U U,(C),A,G
xm5s2U,xm5Um,Um,xm5U A,(G)
xo5U U,A,G
I A,C,U
References
1. http://gtrnadb.ucsc.edu/
2. Crick F (1966). "Codon–anticodon pairing: the wobble hypothesis". J Mol Biol 19 (2):
548–55. PMID 5969078. http://profiles.nlm.nih.gov/SC/B/C/B/S/_/scbcbs.pdf.
• Varani G, McClain W (2000). "The G × U wobble base pair. A fundamental building block of RNA structure crucial to RNA function in diverse biological systems".
EMBO Rep 1 (1): 18–23. doi:10.1093/embo-reports/kvd001. PMID 11256617. http://www.nature.com/cgi-
taf/DynaPage.taf?file=/embor/journal/v1/n1/full/embor635.html
11
One Gene-One Polypeptide Hypothesis
In 1941, George Beadle and Edward Lawrie Tatum proposed the one gene-one enzyme
theory. The four main tenets of this theory (as modified by Tatum in 1959) were:
• All biochemical processes in all living organisms are under genetic control.
• All biochemical reactions in an organism are resolvable into separate steps.
• Each step or reaction is under the control of a single gene.
Mutation of a single gene results in the loss of function of the appropriate enzyme. In other
words, each gene controls the reproduction, function, and specificity of a particular enzyme.
The theory was based on results originally obtained from Neurospora crassa, a fungus that
was grown in a medium containing only the bare minimum of nutrients necessary (the fungus
being capable of manufacturing the rest). After inducing mutations in the mold using
radiation, some of the progeny were unable to grow on the medium. By testing with different
supplements, it was found that the mutants had lost the ability to manufacture a single amino
acid. By breeding the lab specimens with wild specimens, it was found that the mutation was
transmitted in a simple Mendelian fashion. It was assumed that the ability to synthesize the
appropriate amino acid was caused by the loss of a single enzyme. The work was supported
by similar evidence found in humans, plants, and Drosophila (genus of fruit fly).
The hypothesis was further modified in 1962 by Vernon Ingram, and from it, the one gene-
one polypeptide hypothesis was born. The modification arose from research conducted on
sickle cell anemia and sickle cell trait. In 1949, it was proposed that sickling was caused by a
single gene mutation, which was heterozygous in sickle cell trait individuals and homozygous
in individuals with full sickle cell anemia. Simultaneously, it was also noted that the
hemoglobin from normal individuals and that from sickle cell anemic individuals migrated
differently on an electrophoresis plate, illustrating that there was a physical difference in the
hemoglobin types and supporting the single gene mutation. A normal hemoglobin molecule is
made of four different polypeptide chains--two identical alpha chains and two identical beta
chains. All of the chains are approximately the same length, but they can be distinguished by
their chemical and electrophoretic properties. Each of these chains contains approximately
140 amino acids, and Ingram analyzed them using a modified form of Frederick Sanger's
protein analysis. This technique gave a fingerprint of the different hemoglobin types. The
fingerprint showed that the differences between the two types of hemoglobin could be found
in one peptide section of eight amino acids. When this section was isolated and analyzed, the
12
only difference was in one amino acid (glutamic acid in normal and valine in sickle cell
hemoglobin). The difference between these amino acids was one base in the triplet codon.
Further analysis showed that amino acid changes in one chain were independent of changes in
the other chain, suggesting that the genes determining the alpha and beta chains were located
at different loci. The alpha and beta chains show independent assortment.
From this, it can be seen that hemoglobin is composed of two independent gene products,
each of which is a separate polypeptide. The gene is a section of DNA that determines the
amino acid sequence of a polypeptide. One gene codes for one polypeptide and several
polypeptides may be required for a functional protein or enzyme.
13
TRANSCRIPTION AND TRANSLATION
Central dogma of genetic transmission.
Information flow (with the exception of reverse transcription) is from DNA to RNA via the
process of transcription, and thence to protein via translation.
This process can be divided into two parts:
1. Transcription Before the synthesis of a protein begins, the corresponding RNA molecule is produced by
RNA transcription. One strand of the DNA double helix is used as a template by the RNA
polymerase to synthesize a messenger RNA (mRNA). This mRNA migrates from the nucleus
to the cytoplasm. During this step, mRNA goes through different types of maturation
including one called splicing when the non-coding sequences are eliminated. The coding
mRNA sequence can be described as a unit of three nucleotides called a codon.
2. Translation
The ribosome binds to the mRNA at the start codon (AUG) that is recognized only by the
initiator tRNA. The ribosome proceeds to the elongation phase of protein synthesis. During
this stage, complexes, composed of an amino acid linked to tRNA, sequentially bind to the
appropriate codon in mRNA by forming complementary base pairs with the tRNA anticodon.
The ribosome moves from codon to codon along the mRNA. Amino acids are added one by
one, translated into polypeptidic sequences dictated by DNA and represented by mRNA. At
the end, a release factor binds to the stop codon, terminating translation and releasing the
complete polypeptide from the ribosome.
One specific amino acid can correspond to more than one codon. The genetic code is said to
be degenerate.
Genetic Code
DNA transfers information to mRNA in the form of a code defined by a sequence of
nucleotides bases. During protein synthesis, ribosomes move along the mRNA molecule and
"read" its sequence three nucleotides at a time (codon) from the 5' end to the 3' end. Each
amino acid is specified by the mRNA's codon, and then pairs with a sequence of three
complementary nucleotides carried by a particular tRNA (anticodon).
Since RNA is constructed from four types of nucleotides, there are 64 possible triplet
sequences or codons (4x4x4). Three of these possible codons specify the termination of the
polypeptide chain. They are called "stop codons". That leaves 61 codons to specify only 20
different amino acids. Therefore, most of the amino acids are represented by more than one
codon. The genetic code is said to be degenerate.
1. Genes (DNA) are transcribed into RNA by the enzyme RNA polymerase
• control by promoters
• control by regulatory proteins
• control by "cell state" (G1, G2, S, etc)
2. RNA transcripts are subjected to post-transcriptional modification and control
• rRNA transcript cut into appropriate size classes and initial assembly in
nucleolar organizer
14
• tRNA transcript folds into shape
• mRNA transcripts are capped at 5' end, polyA tail added to 3' end, noncoding
sequences (introns) removed from interior of transcript
• all RNA types must move to the cytoplasm via the nuclear membrane pores
3. mRNA molecules are translated by ribosomes (rRNA + ribosomal proteins) which
match the 3-base codons of the mRNA to the 3-base anticodons of the appropriate
tRNA molecules
• ribosomes initiate using the first AUG codon to start protein synthesis with
methionine
• message is read three bases at a time, consecutive codons, no commas, no
overlap
• peptide bonds are made between adjacent amino acids to produce a growing polypeptide chain
• when a "stop" codon (UAA, UGA, UAG) is encountered, translation ceases 4. Newly synthesized proteins are often modified after translation (post-translation)
• proteins undergo a final conformation adaptation in conjunction with chaperon proteins
• soluble proteins may have sugars added • secreted proteins must be synthesized through the membrane and the initial
portion of the protein is removed in the process
• multimeric proteins must assemble from subunit proteins
• some proteins (e.g. insulin) have a portion of the protein removed and
discarded
5. the protein carries out its function
The first step in gene action is making an RNA copy of the gene, a process called
transcription. A. Only one of the two DNA strands is copied. The copying is catalyzed by an
enzyme, RNA polymerase. Transcription begins at a specific point on the DNA and
terminates at a specific point. The initial product of transcription is called the primary
transcript. Some of the primary transcript codes for proteins and is also called pre-
messenger RNA. It is further processed in the nucleus in eukaryotes and then moves
to the cytoplasm to serve as messenger RNA.
B. The remainder of the primary transcript functions in protein synthesis but does not
code for amino acid sequences.
C. A gene locus contains the information both for the amino acid sequence of a
polypeptide chain and the instructions that regulate the amount of transcription. The 5'
flanking region is the site at which RNA polymerase binds (also called the promoter
region) and is the primary region that regulates the extent of transcription. Many other
proteins, called transcription factors, also bind in the promoter region and regulate
the amount of transcription.
D. In most eukaryotic genes, pre-mRNA has introns that are spliced out. The
remaining exons are joined together to form mRNA. In order to function properly, a cap and a poly-A tail must also be added.
E. When mRNA is read by ribosomes, the direction is 5' to 3'. The DNA strand with
the same nucleotide sequence as the mRNA is often called the sense strand. In transcription, the complementary 3' to 5' strand is used for the template and is
sometimes called the antisense strand. The sense strand is also called the coding
15
strand, since it has the same nucleotide sequence as mRNA (considering T and U to
be equivalent).
III. Translation is the process of converting the information contained in the nucleic acid
code (DNA and RNA) into the amino acid sequences of proteins.
A. Three types of RNA are involved.
1. Messenger RNA (mRNA) carries information for amino acid sequences
from nucleus to cytoplasm.
2. Ribosomal RNA (rRNA), along with various proteins, forms ribosomes.
There are two rRNA components of ribosomes: a large one and a small one.
There are hundreds of copies of the genes that code for rRNA.
3. Transfer RNA (tRNA) are small molecules that provide the key for
translating the nucleic acid code into the protein code.
1. Each tRNA molecule can bind only to one of the 20 amino acids.
That binding is catalyzed by a series of enzymes, each of which
recognizes one tRNA and one amino acid.
2. In protein synthesis, each tRNA is "charged" with its specific amino acid by means of a covalent bond involving the carboxyl group of the
amino acid.
3. Every cell must be able to make amino acyl-tRNAs for each of the
20 amino acids used to make proteins. Most cells make many more
than 20.
B. The genetic code refers to the nucleotide sequences that cause a particular amino
acid to be added to the growing polypeptide chain.
1. Such nucleotide sequences are called codons. A codon consists of a
sequence of three nucleotides. Since mRNA is read 5' —> 3', codons are also
always written in the 5' —> 3' direction.
2. There are 64 possible nucleotide triplets.
a. Sixty-one code for amino acids.
b. Three code for termination of synthesis. No amino acid is added.
c. One codon, AUG, codes for the initiation of synthesis, always with
the amino acid methionine (Met). AUG also codes for Met in the
interior of polypeptides.
d. Many amino acids are coded by several codons. Therefore, the
genetic code is degenerate.
e. All forms of life use the same genetic code; it is universal.
C. In polypeptide synthesis, a ribosome binds to the 5' end of the mRNA and moves
toward the 3' end.
1. The first AUG is the initiation codon and starts the assembly of the
polypeptide chain from the N-terminal end. Only a tRNA with a
complementary sequence (UAC) in the anticodon region will bind, starting the polypeptide with the amino acid Met.
16
2. The ribosome then moves to the next codon. The tRNA with the matching
anticodon binds. The covalent bond that the Met formed with its tRNA is
transferred to the amino group of the second amino acid, forming a peptide
bond.
3. With its amino acid gone, the first tRNA is now released and recycles by
picking up another Met.
4. The ribosome now moves to the third codon, and the process repeats.
5. If the ribosome encounters a termination codon (UAA, UAG, or UGA) as it moves toward the 3' end of the mRNA, the growing polypeptide chain is
released and synthesis is over.
6. The initiation codon sets the reading frame. Thereafter, nucleotides are
read three at a time.
Coding Example
DNA Sequence TAC ATG
CAC GTG
GTG CAC
GAC CTG
TGA ACT
GGA CCT
CTC GAG
CTC GAG
ACT TGA
mRNA Sequence AUG GUG CAC CUG ACU CCU GAG GAG UGA
Amino Acid
Sequence Met Val His Leu Thr Pro Glu Glu Stop
Protein Synthesis: Transcription and Translation
Review
Central Dogma of Molecular Biology
Protein synthesis requires two steps: transcription and translation.
17
DNA contains codes
Three bases in DNA code for one amino acid. The DNA code is copied to produce mRNA.
The order of amino acids in the polypeptide is determined by the sequence of 3-letter codes in
mRNA.
DNA vs RNA
DNA RNA
Sugar: deoxyribose ribose
Bonds with Adenine: thymine uracil
# of Strands: two one
Kinds of RNA
Messenger RNA (mRNA)
Messenger RNA contains genetic information. It is a copy of a portion of the DNA.
It carries genetic information from the gene (DNA) out of the nucleus, into the cytoplasm of
the cell where it is translated to produce protein.
Ribosomal RNA (rRNA)
This type of RNA is a structural component of the ribosomes. It does not contain a genetic
message.
Transfer RNA (tRNA)
Transfer RNA functions to transport amino acids to the ribosomes during protein synthesis.
18
Transcription
Transcription is the synthesis of mRNA from a DNA template.
It is like DNA replication in that a DNA strand is used to synthesize a strand of mRNA.
Only one strand of DNA is copied.
A single gene may be transcribed thousands of times.
After transcription, the DNA strands rejoin.
Steps involved in transcription
DNA unwinds.
RNA polymerase recognizes a specific base sequence in the DNA called a promoter and binds to it. The promoter identifies the start of a gene, which strand is to be copied, and the
direction that it is to be copied.
Complementary bases are assembled (U instead of T).
A termination code in the DNA indicates where transcription will stop.
The mRNA produced is called a mRNA transcript.
Processing the mRNA Transcript
In eukaryotic cells, the newly-formed mRNA transcript (also called heterogenous nuclear RNA or hnRNA) must be further modified before it can be used.
A cap is added to the 5’ end and a poly-A tail (150 to 200 Adenines) is added to the 3’end of
the molecule.
The newly-formed mRNA has regions that do not contain a genetic message. These regions
are called introns and must be removed. Their function is unknown.
19
The remaining portions of mRNA are called exons. They are spliced together to form a
mature mRNA transcript.
The Nucleus
DNA is located in an organelle called the nucleus.
Transcription and mRNA processing occur in the nucleus.
The nucleus is surrounded by a double membrane. After the mature mRNA transcript is
produced, it moves out of the nucleus and into the cytoplasm through pores in the nuclear
membrane.
Translation
Translation is the process where ribosomes synthesize proteins using the mature mRNA
transcript produced during transcription.
Overview
The diagram below shows a ribosome attach to mRNA, and then move along the mRNA
adding amino acids to the growing polypeptide chain.
20
Translation - Details
A mature mRNA transcript, a ribosome, several tRNA molecules and amino acids are shown.
There is a specific tRNA for each of the 20 different amino acids.
Below: A ribosome attaches to the mRNA transcript.
21
A tRNA molecule transports an amino acid to the ribosome. Notice that the 3-letter anticodon
on the tRNA molecule matches the 3-letter code (called a codon) in the mRNA. The tRNA with the anticodon "UAC" bonds with methionine. It always transports methionine. Transfer
RNA molecules with different anticodons transport other amino acids.
A second tRNA molecule bonds to the mRNA at the ribosome. Again, the codes must match.
22
A bond is formed between the two amino acids.
The tRNA bonded to methionine drops off and can be reused later.
23
The ribosome moves along the mRNA to expose another codon (GAU) for a tRNA molecule.
The only tRNA molecule that can bond to the GAU site is a molecule with a CUA anticodon. Transfer RNA molecules with CUA anticodons are specific for asparagine.
24
Asparagine is now added to the growing amino acid chain.
25
Initiation and Termination Codes
An initiation code signals the start of a genetic message. As the ribosome moves along a mRNA transcript, it will not begin synthesizing protein until it reaches an initiation code.
Termination codes signal the end of the genetic message. Synthesis stops when the ribosome
reaches a terminator codon.
Genetic Code
The table below can be used to determine what amino acid corresponds to any 3-letter codon.
Second Base First
Base U C A G
Third
Base
UUU
phenylalanine
UCU
serine
UAU
tyrosine
UGU
cysteine U
UUC phenylalanine
UCC serine
UAC tyrosine
UGC cysteine
C
UUA leucine
UCA serine
UAA stop
UGA stop
A
U
UUG
leucine
UCG
serine
UAG
stop
UGG
tryptophan G
CUU
leucine
CCU
proline
CAU
histidine
CGU
arginine
U
C
CUC
leucine
CCC
proline
CAC
histidine
CGC
arginine C
26
CUA
leucine
CCA
proline
CAA
glutamine
CGA
arginine A
CUG
leucine
CCG
proline
CAG
glutamine
CGG
arginine G
AUU
isoleucine
ACU
threonine
AAU
asparagine
AGU
serine U
AUC isoleucine
ACC threonine
AAC asparagine
AGC serine
C
AUA isoleucine
ACA threonine
AAA lysine
AGA arginine
A
A
AUG (start)
methionine
ACG
threonine
AAG
lysine
AGG
arginine G
GUU valine
GCU alanine
GAU aspartate
GGU glycine
U
GUC valine
GCC alanine
GAC aspartate
GGC glycine
C
GUA
valine
GCA
alanine
GAA
glutamate
GGA
glycine A
G
GUG
valine
GCG
alanine
GAG
glutamate
GGG
glycine G
Mutation
Mutations are changes in the DNA.
Frameshift
A frameshift mutation is usually severe, producing a completely nonfunctional protein.
The priniciple of a frameshift can be explained using the sentence below. If the letters are read
three at a time and one is deleted, the second sentence becomes meaningless.
Original DNA:
Frameshift mutation:
THE BIG RED ANT ATE ONE FAT BUG
THB IGR EDA NTA TEO NEF ATB UG?
Point Mutation
Point mutations involve a single nucleotide, thus a single amino acid.
In the sentence below, eliminating one letter does not change in the remaining three-letter
words and therefore may not cause a significant change in the meaning of the sentence.
27
Original DNA:
Point mutation:
THE BIG RED ANT ATE ONE FAT BUG
THA BIG RED ANT ATE ONE FAT BUG
Silent, Missense, and Nonsense Mutations
Three kinds of point mutations can occur. A mutation that results in an amino acid
substitution is called a missense mutation.
A mutation that results in a stop codon so that incomplete proteins are produced, it is called a
nonsense mutation.
A mutation that produces a functioning protein is called a silent mutation.
28
Mechanism of Reverse Transcription
After the RNA retrovirus enters a host cell, its genomic RNA will be transcribed into a double
stranded DNA and then integrated into the host DNA. The RNA to DNA transcription is called
reverse transcription.
29
Figure 4-J-1. Mechanism of reverse transcription. The entire process is catalyzed by reverse
transcriptase which has both DNA polymerase and RNase H activities.
1. A retrovirus-specific cellular tRNA hybridizes with a complementary region called the primer-binding site (PBS).
2. A DNA segment is extended from tRNA based on the sequence of the retroviral genomic RNA.
3. The viral R and U5 sequences are removed by RNase H. 4. First jump: DNA hybridizes with the remaining R sequence at the 3' end.
5. A DNA strand is extended from the 3' end. 6. Most viral RNA is removed by RNase H.
7. A second DNA strand is extended from the viral RNA.
8. Both tRNA and the remaining viral RNA are removed by RNase H.
9. Second jump: The PBS region of the second strand hybridizes with the PBS region of the
first strand.
10. Extension on both DNA strands. LTR stands for "long terminal repeat".
30
MUTATIONS
Mutations are changes in the DNA sequence of a cell's genome and are caused by radiation,
viruses, transposons and mutagenic chemicals, as well as errors that occur during meiosis or
DNA replication. They can also be induced by the organism itself, by cellular processes such
as hypermutation.
Mutation can result in several different types of change in DNA sequences; these can either
have no effect, alter the product of a gene, or prevent the gene from functioning. Studies in
the fly Drosophila melanogaster suggest that if a mutation changes a protein produced by a gene, this will probably be harmful, with about 70 percent of these mutations having
damaging effects, and the remainder being either neutral or weakly beneficial. Due to the damaging effects that mutations can have on cells, organisms have evolved mechanisms such
as DNA repair to remove mutations.[1]
Therefore, the optimal mutation rate for a species is a trade-off between costs of a high mutation rate, such as deleterious mutations, and the
metabolic costs of maintaining systems to reduce the mutation rate, such as DNA repair enzymes. Viruses that use RNA as their genetic material have rapid mutation rates, which can
be an advantage since these viruses will evolve constantly and rapidly, and thus evade the
defensive responses of e.g. the human immune system.
Mutations can involve large sections of DNA becoming duplicated, usually through genetic recombination.[8] These duplications are a major source of raw material for evolving new
genes, with tens to hundreds of genes duplicated in animal genomes every million years. Most genes belong to larger families of genes of shared ancestry. Novel genes are produced by
several methods, commonly through the duplication and mutation of an ancestral gene, or by recombining parts of different genes to form new combinations with new functions. Here,
domains act as modules, each with a particular and independent function, that can be mixed together to produce genes encoding new proteins with novel properties. For example, the
human eye uses four genes to make structures that sense light: three for color vision and one
for night vision; all four arose from a single ancestral gene. Another advantage of duplicating
a gene (or even an entire genome) is that this increases redundancy; this allows one gene in
the pair to acquire a new function while the other copy performs the original function. Other
types of mutation occasionally create new genes from previously noncoding DNA.
Changes in chromosome number may involve even larger mutations, where segments of the
DNA within chromosomes break and then rearrange. For example, two chromosomes in the
Homo genus fused to produce human chromosome 2; this fusion did not occur in the lineage
of the other apes, and they retain these separate chromosomes. In evolution, the most
important role of such chromosomal rearrangements may be to accelerate the divergence of a
population into new species by making populations less likely to interbreed, and thereby
preserving genetic differences between these populations.
Sequences of DNA that can move about the genome, such as transposons, make up a major
fraction of the genetic material of plants and animals, and may have been important in the
evolution of genomes. For example, more than a million copies of the Alu sequence are
present in the human genome, and these sequences have now been recruited to perform
functions such as regulating gene expression. Another effect of these mobile DNA sequences
is that when they move within a genome, they can mutate or delete existing genes and thereby
produce genetic diversity.
31
In multicellular organisms with dedicated reproductive cells, mutations can be subdivided into
germ line mutations, which can be passed on to descendants through their reproductive cells,
and somatic mutations, which involve cells outside the dedicated reproductive group and
which are not usually transmitted to descendants. If the organism can reproduce asexually
through mechanisms such as cuttings or budding the distinction can become blurred.
For example, plants can sometimes transmit somatic mutations to their descendants asexually
or sexually where flower buds develop in somatically mutated parts of plants. A new mutation
that was not inherited from either parent is called a de novo mutation. The source of the
mutation is unrelated to the consequence[clarification needed], although the consequences are related
to which cells were mutated.
Nonlethal mutations accumulate within the gene pool and increase the amount of genetic
variation. The abundance of some genetic changes within the gene pool can be reduced by natural selection, while other "more favorable" mutations may accumulate and result in
adaptive evolutionary changes.
For example, a butterfly may produce offspring with new mutations. The majority of these
mutations will have no effect; but one might change the color of one of the butterfly's
offspring, making it harder (or easier) for predators to see. If this color change is
advantageous, the chance of this butterfly surviving and producing its own offspring are a
little better, and over time the number of butterflies with this mutation may form a larger
percentage of the population.
Neutral mutations are defined as mutations whose effects do not influence the fitness of an individual. These can accumulate over time due to genetic drift. It is believed that the
overwhelming majority of mutations have no significant effect on an organism's fitness. Also, DNA repair mechanisms are able to mend most changes before they become permanent
mutations, and many organisms have mechanisms for eliminating otherwise permanently mutated somatic cells.
Mutation is generally accepted by biologists as the mechanism by which natural selection
acts, generating advantageous new traits that survive and multiply in offspring as well as
disadvantageous traits, in less fit offspring, that tend to die out.
A mutation has caused this garden moss rose to produce flowers of different colours. This is a
somatic mutation that may also be passed on in the germ line.
32
Classification of mutation types
Illustrations of five types of chromosomal mutations.
33
Selection of disease-causing mutations, in a standard table of the genetic code of amino acids.
By effect on structure
The sequence of a gene can be altered in a number of ways. Gene mutations have varying
effects on health depending on where they occur and whether they alter the function of
essential proteins. Mutations in the structure of genes can be classified as:
• Small-scale mutations, such as those affecting a small gene in one or a few
nucleotides, including: o Point mutations, often caused by chemicals or malfunction of DNA
replication, exchange a single nucleotide for another. These changes are classified as transitions or transversions. Most common is the transition that
exchanges a purine for a purine (A ↔ G) or a pyrimidine for a pyrimidine, (C ↔ T). A transition can be caused by nitrous acid, base mis-pairing, or
mutagenic base analogs such as 5-bromo-2-deoxyuridine (BrdU). Less common is a transversion, which exchanges a purine for a pyrimidine or a
pyrimidine for a purine (C/T ↔ A/G). An example of a transversion is adenine
(A) being converted into a cytosine (C). A point mutation can be reversed by
another point mutation, in which the nucleotide is changed back to its original
state (true reversion) or by second-site reversion (a complementary mutation
elsewhere that results in regained gene functionality). Point mutations that
occur within the protein coding region of a gene may be classified into three
kinds, depending upon what the erroneous codon codes for:
� Silent mutations: which code for the same amino acid.
� Missense mutations: which code for a different amino acid.
34
� Nonsense mutations: which code for a stop and can truncate the
protein.
o Insertions add one or more extra nucleotides into the DNA. They are usually
caused by transposable elements, or errors during replication of repeating
elements (e.g. AT repeats[citation needed]
). Insertions in the coding region of a gene
may alter splicing of the mRNA (splice site mutation), or cause a shift in the
reading frame (frameshift), both of which can significantly alter the gene
product. Insertions can be reverted by excision of the transposable element.
o Deletions remove one or more nucleotides from the DNA. Like insertions,
these mutations can alter the reading frame of the gene. They are generally
irreversible: though exactly the same sequence might theoretically be restored
by an insertion, transposable elements able to revert a very short deletion (say 1–2 bases) in any location are either highly unlikely to exist or do not exist at
all. Note that a deletion is not the exact opposite of an insertion: the former is quite random while the latter consists of a specific sequence inserting at
locations that are not entirely random or even quite narrowly defined. • Large-scale mutations in chromosomal structure, including:
o Amplifications (or gene duplications) leading to multiple copies of all chromosomal regions, increasing the dosage of the genes located within them.
o Deletions of large chromosomal regions, leading to loss of the genes within
those regions.
o Mutations whose effect is to juxtapose previously separate pieces of DNA,
potentially bringing together separate genes to form functionally distinct fusion
genes (e.g. bcr-abl). These include:
� Chromosomal translocations: interchange of genetic parts from
nonhomologous chromosomes.
� Interstitial deletions: an intra-chromosomal deletion that removes a
segment of DNA from a single chromosome, thereby apposing
previously distant genes. For example, cells isolated from a human
astrocytoma, a type of brain tumor, were found to have a chromosomal
deletion removing sequences between the "fused in glioblastoma" (fig)
gene and the receptor tyrosine kinase "ros", producing a fusion protein (FIG-ROS). The abnormal FIG-ROS fusion protein has constitutively
active kinase activity that causes oncogenic transformation (a transformation from normal cells to cancer cells).
� Chromosomal inversions: reversing the orientation of a chromosomal segment.
o Loss of heterozygosity: loss of one allele, either by a deletion or recombination event, in an organism that previously had two different alleles.
By effect on function
• Loss-of-function mutations are the result of gene product having less or no function.
When the allele has a complete loss of function (null allele) it is often called an
amorphic mutation. Phenotypes associated with such mutations are most often
recessive. Exceptions are when the organism is haploid, or when the reduced dosage
of a normal gene product is not enough for a normal phenotype (this is called
haploinsufficiency).
• Gain-of-function mutations change the gene product such that it gains a new and abnormal function. These mutations usually have dominant phenotypes. Often called a
neomorphic mutation.
35
• Dominant negative mutations (also called antimorphic mutations) have an altered
gene product that acts antagonistically to the wild-type allele. These mutations usually
result in an altered molecular function (often inactive) and are characterised by a
dominant or semi-dominant phenotype. In humans, Marfan syndrome is an example of
a dominant negative mutation occurring in an autosomal dominant disease. In this
condition, the defective glycoprotein product of the fibrillin gene (FBN1) antagonizes
the product of the normal allele.
• Lethal mutations are mutations that lead to the death of the organisms which carry
the mutations.
• A back mutation or reversion is a point mutation that restores the original sequence
and hence the original phenotype.
By effect on fitness
In applied genetics it is usual to speak of mutations as either harmful or beneficial.
• A harmful mutation is a mutation that decreases the fitness of the organism. • A beneficial mutation is a mutation that increases fitness of the organism, or which
promotes traits that are desirable.
In theoretical population genetics, it is more usual to speak of such mutations as deleterious or
advantageous. In the neutral theory of molecular evolution, genetic drift is the basis for most variation at the molecular level.
• A neutral mutation has no harmful or beneficial effect on the organism. Such
mutations occur at a steady rate, forming the basis for the molecular clock. • A deleterious mutation has a negative effect on the phenotype, and thus decreases the
fitness of the organism. • An advantageous mutation has a positive effect on the phenotype, and thus increases
the fitness of the organism.
• A nearly neutral mutation is a mutation that may be slightly deleterious or
advantageous, although most nearly neutral mutations are slightly deleterious.
By inheritance
• inheritable generic in pro-generic tissue or cells on path to be changed to gametes.
• non inheritable somatic (eg, carcinogenic mutation)
• non inheritable post mortem aDNA mutation in decaying remains.
By pattern of inheritance
The human genome contains two copies of each gene – a paternal and a maternal allele.
• A heterozygous mutation is a mutation of only one allele.
• A homozygous mutation is an identical mutation of both the paternal and maternal
alleles.
• Compound heterozygous mutations or a genetic compound comprises two different
mutations in the paternal and maternal alleles.
• A wildtype or homozygous non-mutated organism is one in which neither allele is
mutated. (Just not a mutation)
36
By impact on protein sequence
• A frameshift mutation is a mutation caused by insertion or deletion of a number of
nucleotides that is not evenly divisible by three from a DNA sequence. Due to the triplet nature of gene expression by codons, the insertion or deletion can disrupt the
reading frame, or the grouping of the codons, resulting in a completely different
translation from the original. The earlier in the sequence the deletion or insertion
occurs, the more altered the protein produced is.
• A nonsense mutation is a point mutation in a sequence of DNA that results in a
premature stop codon, or a nonsense codon in the transcribed mRNA, and possibly a truncated, and often nonfunctional protein product.
• Missense mutations or nonsynonymous mutations are types of point mutations where
a single nucleotide is changed to cause substitution of a different amino acid. This in turn can render the resulting protein nonfunctional. Such mutations are responsible for
diseases such as Epidermolysis bullosa, sickle-cell disease, and SOD1 mediated ALS (Boillée 2006, p. 39).
• A neutral mutation is a mutation that occurs in an amino acid codon which results in the use of a different, but chemically similar, amino acid. The similarity between the
two is enough that little or no change is often rendered in the protein. For example, a change from AAA to AGA will encode lysine, a chemically similar molecule to the
intended arginine.
• Silent mutations are mutations that do not result in a change to the amino acid sequence of a protein. They may occur in a region that does not code for a protein, or
they may occur within a codon in a manner that does not alter the final amino acid sequence. The phrase silent mutation is often used interchangeably with the phrase
synonymous mutation; however, synonymous mutations are a subcategory of the
former, occurring only within exons. The name silent could be a misnomer. For
example, a silent mutation in the exon/intron border may lead to alternative splicing
by changing the splice site (see Splice site mutation), thereby leading to a changed
protein.
Special classes
• Conditional mutation is a mutation that has wild-type (or less severe) phenotype
under certain "permissive" environmental conditions and a mutant phenotype under certain "restrictive" conditions. For example, a temperature-sensitive mutation can
cause cell death at high temperature (restrictive condition), but might have no deleterious consequences at a lower temperature (permissive condition).
Causes of mutation
Two classes of mutations are spontaneous mutations (molecular decay) and induced
mutations caused by mutagens.
Spontaneous mutations on the molecular level include:
37
• Tautomerism – A base is changed by the repositioning of a hydrogen atom, altering
the hydrogen bonding pattern of that base resulting in incorrect base pairing during
replication.
• Depurination – Loss of a purine base (A or G) to form an apurinic site (AP site).
• Deamination – Hydrolysis changes a normal base to an atypical base containing a keto
group in place of the original amine group. Examples include C → U and A → HX
(hypoxanthine), which can be corrected by DNA repair mechanisms; and 5MeC (5-
methylcytosine) → T, which is less likely to be detected as a mutation because
thymine is a normal DNA base.
• Transition – A purine changes to another purine, or a pyrimidine to a pyrimidine.
• Transversion – A purine becomes a pyrimidine, or vice versa.
A covalent adduct between benzo[a]pyrene, the major mutagen in tobacco smoke, and DNA
Induced mutations on the molecular level can be caused by:
• Chemicals
o Hydroxylamine NH2OH o Base analogs (e.g. BrdU)
o Alkylating agents (e.g. N-ethyl-N-nitrosourea) These agents can mutate both
replicating and non-replicating DNA. In contrast, a base analog can only
mutate the DNA when the analog is incorporated in replicating the DNA. Each
of these classes of chemical mutagens has certain effects that then lead to
transitions, transversions, or deletions.
o Agents that form DNA adducts (e.g. ochratoxin A metabolites)
o DNA intercalating agents (e.g. ethidium bromide)
o DNA crosslinkers
o Oxidative damage
38
o Nitrous acid converts amine groups on A and C to diazo groups, altering their
hydrogen bonding patterns which leads to incorrect base pairing during
replication.
• Radiation
o Ultraviolet radiation (nonionizing radiation). Two nucleotide bases in DNA –
cytosine and thymine – are most vulnerable to radiation that can change their
properties. UV light can induce adjacent thymine bases in a DNA strand to pair
with each other, as a bulky dimer.
o Ionizing radiation
• Viral infections
DNA has so-called hotspots, where mutations occur up to 100 times more frequently than the normal mutation rate. A hotspot can be at an unusual base, e.g., 5-methylcytosine.
Mutation rates also vary across species. Evolutionary biologists have theorized that higher mutation rates are beneficial in some situations, because they allow organisms to
evolve and therefore adapt more quickly to their environments. For example, repeated exposure of bacteria to antibiotics, and selection of resistant mutants, can result in the
selection of bacteria that have a much higher mutation rate than the original population (mutator strains).
Nomenclature
Nomenclature of mutations specify the type of mutation and base or amino acid changes.
• Nucleotide substitution (e.g. 76A>T) - The number is the position of the nucleotide
from the 5' end, the first letter represents the wild type nucleotide, and the second
letter represents the nucleotide which replaced the wild type. In the given example, the
adenine at the 76th position was replaced by a thymine.
o If it becomes necessary to differentiate between mutations in genomic DNA,
mitochondrial DNA, and RNA, a simple convention is used. For example, if
the 100th base of a nucleotide sequence mutated from G to C, then it would be
written as g.100G>C if the mutation occurred in genomic DNA, m.100G>C if the mutation occurred in mitochondrial DNA, or r.100g>c if the mutation
occurred in RNA. Note that for mutations in RNA, the nucleotide code is written in lower case.
• Amino acid substitution (e.g. D111E) – The first letter is the one letter code of the wild type amino acid, the number is the position of the amino acid from the N
terminus, and the second letter is the one letter code of the amino acid present in the mutation. Nonsense mutations are represented with an X for the second amino acid
(e.g. D111X).
• Amino acid deletion (e.g. ∆F508) – The Greek letter ∆ (delta) indicates a deletion.
The letter refers to the amino acid present in the wild type and the number is the
position from the N terminus of the amino acid were it to be present as in the wild
type.
Harmful mutations
Changes in DNA caused by mutation can cause errors in protein sequence, creating partially or completely non-functional proteins. To function correctly, each cell depends on thousands
of proteins to function in the right places at the right times. When a mutation alters a protein that plays a critical role in the body, a medical condition can result. A condition caused by
mutations in one or more genes is called a genetic disorder. Some mutations alter a gene's
39
DNA base sequence but do not change the function of the protein made by the gene. Studies
of the fly Drosophila melanogaster suggest that if a mutation does change a protein, this will
probably be harmful, with about 70 percent of these mutations having damaging effects, and
the remainder being either neutral or weakly beneficial. However, studies in yeast have shown
that only 7% of mutations that are not in genes are harmful.
If a mutation is present in a germ cell, it can give rise to offspring that carries the mutation in
all of its cells. This is the case in hereditary diseases. On the other hand, a mutation may occur
in a somatic cell of an organism. Such mutations will be present in all descendants of this cell
within the same organism, and certain mutations can cause the cell to become malignant, and
thus cause cancer.
Often, gene mutations that could cause a genetic disorder are repaired by the DNA repair system of the cell. Each cell has a number of pathways through which enzymes
recognize and repair mistakes in DNA. Because DNA can be damaged or mutated in many ways, the process of DNA repair is an important way in which the body protects itself from
disease.
Beneficial mutations
Although most mutations that change protein sequences are harmful, some mutations have a positive effect on an organism. In this case, the mutation may enable the mutant organism to
withstand particular environmental stresses better than wild-type organisms, or reproduce more quickly. In these cases a mutation will tend to become more common in a population
through natural selection. For example, a specific 32 base pair deletion in human CCR5 (CCR5-∆32) confers
HIV resistance to homozygotes and delays AIDS onset in heterozygotes. The CCR5 mutation is more common in those of European descent. One possible explanation of the etiology of the
relatively high frequency of CCR5-∆32 in the European population is that it conferred resistance to the bubonic plague in mid-14th century Europe. People with this mutation were
more likely to survive infection; thus its frequency in the population increased. This theory
could explain why this mutation is not found in Africa, where the bubonic plague never
reached. A newer theory suggests that the selective pressure on the CCR5 Delta 32 mutation
was caused by smallpox instead of the bubonic plague.
Mutation subclasses
The following is a list of mutation subclasses that can fall into the three major classes of
mutation.
Morphological
Morphological mutants affect the outward appearance of an individual. Plant height mutations could changes a tall plant to a short one, or from having smooth to round seeds.
Biochemical
Biochemical mutations have a lesion in one specific step of an enzymatic pathway. For
bacteria, biochemical mutants need to be grown on a media supplemented with a specific nutrient. Such mutants are called auxotrophs. Often though, morphological mutants are the
direct result of a mutation in a biochemical pathway. In humans, albinism is the result of a mutation in the pathway from converts the amino acid tyrosine to the skin pigment melanin.
40
Similarly, cretinism results when the tyrosine to thyroxine pathway is mutated. Therefore, in a
strict genetic sense, if appropriate experiments are performed, a morphological mutation can
be explained at the biochemical level.
For some mutations to be expressed, the individual needs to be placed in a specific
environment. This is called the restrictive condition. But if the individual grow in any other
environment (permissive condition), the wild type phenotype is expressed. These are called
conditional mutations. Mutations that only expressed at a specific temperature (temperature
sensitive mutants), usually elevated, can be considered to be conditional mutations.
Lethal
Lethal mutations are mutations that lead to the death of the individual. Death does not have to occur immediately, it may take several months or even years. But if the expected longevity of
an individual is significantly reduced, the mutation is considered a lethal mutation.
Wild type alleles typically encode a product necessary for a specific biological function. If a
mutation occurs in that allele, the function for which it encodes is also lost. The general term
for these mutations is loss-of-function mutations. The degree to which the function is lost can
vary. If the function is entirely lost, the mutation is called a null mutation. If is also possible that some function may remain, but not at the level of the wild type allele. These are called
leaky mutations.
41
Gene Mutations at the Molecular Level (in brief)
42
43
Bacteriophage Lambda Gene Organization & Expression - Overview
Organization
The chromosome of bacteriophage lambda is organized more or less according to function:
The HEAD & TAIL genes code for the structural proteins of the bacteriophage capsid as well
as the terminase enzyme required to process rolling circle multimers into unit genome-length
pieces during packaging
The RECOMBINATION genes code for Int and Xis, which are required for integration of
the bacteriophage into the bacterial host chromosome during lysogenic growth and excision from the bacterial host chromosome during induction, as well as a number of other genes.
The REGULATION region includes the immunity region as well as the genes that are
responsible for controlling the switch between lysogenic and lytic growth. The Q
antiterminator protein, as well as the anti-Q RNA and PR' constitute a second regulation
region.
The REPLICATION region includes two replication protein genes O and P and the origin of
replication.
There are four genes in the LYSIS region.
Promoters
There are 7 promoters that are active at different stages of the bacteriophage lambda life cycles and which govern expression of bacteriophage lambda.
• PR expresses the replication genes as well as the anti-repressor, Cro, the
transcriptional activator CII, and the anti-terminator, Q protein.
• PL expresses the recombination genes as well as the anti-terminator, N, and the CIII
protein.
• PR' expresses the lysis proteins, and the head and tail proteins.
• PRE expresses the repressor gene, cI, to establish lysogeny.
• PRM expresses the repressor gene, cI, to maintain lysogeny.
• PI expresses the int gene to synthesize the Integrase protein.
• PaQ drives synthesis of a short anti-sense RNA which blocks translation of Q gene
mRNA.
44
Expression
In any bacteriophage (or viral) infection cycle, gene expression can be classified into 3
distinct phases. The first phase involves synthesis of proteins that will take over or hijack the
host cell. These proteins often include a phage-specific RNA polymerase. The second phase
involves replication of the bacteriophage. The third phase is the assembly and packaging of
mature bacteriophage capsids. Gene expression in the bacteriophage is generally coordinated
so that the appropriate proteins for each of these phases are synthesized at the appropriate
time.
The 3 phases in bacteriophage lambda are:
• very early • early
• either late lytic or late lysogenic
Note: the names given to the different life-cycle phases differ from phage to phage and virus to virus.
VERY EARLY EXPRESSION
Bacteriophage lambda has only three moderate or strong promoters that are recognized by the
host RNA polymerase. Transcription from PL causes expression of the anti-terminator protein,
N. Transcription from PR causes expression of the anti-repressor protein, Cro. Transcription from PR' pauses after a short distance and no protein is expressed.
EARLY EXPRESSION
The second phase of genes expression depends on the action of the N protein. N is an
antiterminator which causes expression from PR and PL to continue past transcription
terminators. Many genes are expressed including cII, cIII, O, P, and Q. CII and CIII favour
lysogenic growth; O & P are required for bacteriophage replication; and, Q favours lytic
growth by helping late lytic gene expression.
LATE LYTIC EXPRESSION
If the bacteriophage follows the lytic growth pathway, then the only genes expressed are the lysis genes and the genes coding for the head and tail proteins. The antiterminator protein, Q,
is required for expression of these genes. At the same time, Cro will prevent any further gene expression from PR or PL.
LATE LYSOGENIC EXPRESSION
If the bacteriophage follows the lysogenic growth pathway, then the only genes expressed are
int and cI. Once the bacteriophage has integrated into the bacterial host chromosome, cI is
the only gene that will continue to be expressed.