molecular biology basics


History of Genetics - Fine Structure of The Gene

The hereditary units which are transmitted from one generation to the

next generation are called genes. A gene is the fundamental biologic unit,

like the atom which is the fundamental physical unit. Mendel, while explaining the results of his monohybrid and dihybrid crosses, was the first to conceive of genes as particulate units, and he referred to them by various names such as hereditary factors or hereditary elements. But his concept of the gene was entirely hypothetical, and he remained ignorant of the physical and chemical nature of the gene.

Even before the rediscovery of Mendel’s laws, it was already established

that chromosomes have a definite role in the inheritance because it was

found that chromosomes were the only link between one generation and

the next generation and a diploid chromosome set consists of two

morphologically similar sets, one is derived from the mother and the other

from the father at fertilization. Later on, a parallel behaviour among chromosomes and genes was discovered.

Earlier workers proposed various hypotheses to explain the nature of genes. For instance, De Vries postulated the one gene-one character hypothesis, according to which a particular trait of an individual is controlled by a particular gene. Bateson and Punnett proposed the presence or absence theory: in a cross, the character which dominates the other has a determiner, while the recessive character has no such determiner. All of these theories were discarded by Morgan, who put forward the particulate gene theory in 1926. He considered genes to be corpuscles arranged in a linear order on the chromosomes, like beads on a string. Each gene was supposed to be different from all others. The particulate theory of the gene was widely accepted and supported by cytological observations. However, the discovery of DNA as the carrier of genetic information has superseded Morgan's theory. Therefore, before defining the gene, it is advisable to consider both the classical and the modern definitions of the gene.

Morgan's introduction of the fruit fly to genetics revolutionized it because the fly's

rapid life cycle and minute size enabled the scale of experimentation to be markedly

increased. Contrast the whole year required between generations of peas and corn

with the two weeks needed for the fruit fly. This meant that in a short time over a

hundred characters had been studied and many mutants found. Drosophila became

as a result the most prominent "model organism" of genetics. Bridges oversaw and

maintained the growing stock of the mutant types and made them freely available

internationally. The lab at Columbia, known as the "Fly Room," was an example of

team effort, led by a genial, exuberant boss. Morgan had to undergo quite a

conversion by his team, but the outcome was a giant step forward in genetics,

crowned with the award of the Nobel Prize in 1933.


H. J. Muller was less close to Morgan than the others and did not long remain in the

group. Their views on genetics differed. Whereas Morgan was happy to leave to one

side the question of the material basis of the gene, Muller wanted to know the answer.

His pioneer work on the production of mutations by X rays not only won him the

Nobel Prize but offered him the hope of establishing the size of the gene. This

approach was used by the brilliant Russian geneticist N. V. Timoféeff-Ressovsky, in

Germany, to yield an estimate of the "sensitive volume" of the gene as that space

needed by one thousand atoms, or about the size of an average protein.

Unfortunately, as later work revealed, the methodology and interpretation of this

experiment proved faulty.

Drosophila was by no means the only model organism for genetics. In addition to

commercial cereal crops, poultry, mice, and yeast, the bread mold Neurospora

figures prominently in the development of the field. Using this organism, George

Beadle and Edward Tatum concluded that there is a 1:1 relation between a gene and a

given enzyme, thus suggesting that the primary product of a gene is an enzyme. But

for the fine structure analysis of the gene, the model system that was to bring the analysis down to the molecular level was the virus-infected colon bacillus (Escherichia coli). Here the bacterial virus (bacteriophage, or phage) has just one chromosome, and in mixed infections this chromosome can recombine with one from another phage, making fine structure mapping possible. By 1957 Seymour Benzer had used this system to estimate the likelihood of crossing-over between two mutants one DNA base apart in the bacteriophage T4 to be 1 in 10,000. His own analysis had then reached 1 in 20,000.

Split gene

A split gene is a gene whose continuity is interrupted. An interrupted gene (also called a split gene) is simply a stretch of DNA that contains both introns and exons.

Most higher-level eukaryotes have interrupted genes and have longer introns than exons, creating a gene that is longer than its coding region. Interrupted genes are also found in some bacteria. Some eukaryotes, including yeast, have many uninterrupted regions, as they contain long stretches of exons that create necessary mRNA, leading to the development of proteins. This does not mean, however, they are fully uninterrupted, as tRNA synthesis requires excision of a nucleotide sequence, followed by ligation.

Oncogene

An oncogene causes a normal cell to become cancerous, either because the gene is mutated or because it is expressed at the wrong time in development.


An oncogene is a gene that, when mutated or expressed at high levels, helps turn a normal cell

into a tumor cell. Many abnormal cells normally undergo a programmed form of death

(apoptosis). Activated oncogenes can cause those cells to survive and proliferate instead.

Most oncogenes require an additional step, such as mutations in another gene, or

environmental factors, such as viral infection, to cause cancer. Since the 1970s, dozens of

oncogenes have been identified in human cancer. Many cancer drugs target those DNA

sequences and their products.

Proto-oncogene

A proto-oncogene is a normal gene that can become an oncogene due to mutations or increased expression. Proto-oncogenes code for proteins that help to regulate cell growth and differentiation. Proto-oncogenes are often involved in signal transduction and execution of mitogenic signals, usually through their protein products. Upon activation, a proto-oncogene (or its product) becomes a tumor-inducing agent, an oncogene. Examples of proto-oncogenes include RAS, WNT, MYC, ERK, and TRK.

Activation

The proto-oncogene can become an oncogene by a relatively small modification of its original

function. There are three basic activation types:

* A mutation within a proto-oncogene can cause a change in the protein structure, causing
  - an increase in protein (enzyme) activity
  - a loss of regulation

* An increase in protein concentration, caused by
  - an increase of protein expression (through misregulation)
  - an increase of protein (mRNA) stability, prolonging its existence and thus its activity in the cell
  - a gene duplication (one type of chromosome abnormality), resulting in an increased amount of protein in the cell

* A chromosomal translocation (another type of chromosome abnormality), causing
  - an increased gene expression in the wrong cell type or at wrong times
  - the expression of a constitutively active hybrid protein. This type of aberration in a dividing stem cell in the bone marrow leads to adult leukemia

Mutations in microRNAs can lead to activation of oncogenes. New research indicates that

small RNAs 21-25 nucleotides in length called microRNAs (miRNAs) can control expression of these genes by downregulating them. Antisense messenger RNAs could theoretically be

used to block the effects of oncogenes.

Classification

There are several systems for classifying oncogenes, but there is not yet a widely accepted standard. They are sometimes grouped both spatially (moving from outside the cell inwards) and chronologically (parallelling the "normal" process of signal transduction). There are several categories that are commonly used:

Category: Growth factors, or mitogens
Examples: c-Sis
Description: Usually secreted by specialized cells to induce cell proliferation in themselves, nearby cells, or distant cells. An oncogene may cause a cell to secrete growth factors even though it does not normally do so. It will thereby induce its own uncontrolled proliferation (autocrine loop), and proliferation of neighboring cells. It may also cause production of growth hormones in other parts of the body.

Category: Receptor tyrosine kinases
Examples: epidermal growth factor receptor (EGFR), platelet-derived growth factor receptor (PDGFR), vascular endothelial growth factor receptor (VEGFR), and HER2/neu
Description: Kinases add phosphate groups to other proteins to turn them on or off. Receptor kinases add phosphate groups to receptor proteins at the surface of the cell (which receive protein signals from outside the cell and transmit them to the inside of the cell). Tyrosine kinases add phosphate groups to the amino acid tyrosine in the target protein. They can cause cancer by turning the receptor permanently on (constitutively), even without signals from outside the cell.

Category: Cytoplasmic tyrosine kinases
Examples: Src-family, Syk-ZAP-70 family, and BTK family of tyrosine kinases; the Abl gene in CML (Philadelphia chromosome)
Description: -

Category: Cytoplasmic serine/threonine kinases and their regulatory subunits
Examples: Raf kinase, and cyclin-dependent kinases (through overexpression)
Description: -

Category: Regulatory GTPases
Examples: Ras protein
Description: Ras is a small GTPase which hydrolyses GTP into GDP and phosphate. Ras is activated by growth factor signaling (e.g. EGF, TGFalpha) and acts like a binary switch (on/off) in growth signaling pathways. Downstream effectors of Ras include Raf, MEK, MEKK, MAPK, and ERK, most of which in turn regulate genes that mediate cell proliferation.

Category: Transcription factors
Examples: myc gene
Description: -

Conversion of proto-oncogenes:

There are two mechanisms by which proto-oncogenes can be converted to cellular oncogenes:

Quantitative: Tumor formation is induced by an increase in the absolute number of proto-oncogene products or by its production in inappropriate cell types.

Qualitative: Conversion from proto-oncogene to transforming gene (c-onc) with changes in the nucleotide sequence which are responsible for the acquisition of the new properties.


History: The first oncogene was discovered in 1970 and was termed src (pronounced sarc as in

sarcoma). Src was in fact first discovered as an oncogene in a chicken retrovirus. Experiments

performed by Dr G. Steve Martin of the University of California, Berkeley demonstrated that

src was indeed the oncogene of the virus.

In 1976 Drs. J. Michael Bishop and Harold E. Varmus of the University of California,

San Francisco demonstrated that oncogenes were defective proto-oncogenes, found in many

organisms including humans. For this discovery Bishop and Varmus were awarded the Nobel

Prize in 1989.

Pseudogenes

What are pseudogenes? Pseudogenes are genomic DNA sequences similar to normal genes but non-functional; they are regarded as defunct relatives of functional genes.

What causes pseudogenes to arise? There are two accepted processes during which pseudogenes may arise:

• duplication - modifications (mutations, insertions, deletions, frame shifts) to the DNA sequence of a gene can occur during duplication. These disablements can result in loss of gene function at the transcription or translation level (or both) since the sequence no longer results in the production of a protein. Copies of genes that are disabled in such a manner are termed non-processed or duplicated pseudogenes.

• retrotransposition - reverse transcription of an mRNA transcript with subsequent re-integration of the cDNA into the genome. Such copies of genes are termed processed pseudogenes. These pseudogenes can also accumulate random disablements over the course of evolution.


Why are pseudogenes interesting? In any study of molecular evolution, it is necessary to compare and contrast genes from a variety of organisms to gauge how the organisms have adapted to ensure their survival. Pseudogenes are vitally important since they provide a record of how the genomic DNA has been changed without such evolutionary pressure and can be used as a model for determining the underlying rates of nucleotide substitution, insertion and deletion in the greater genome.


How can a pseudogene be identified? Once gene sequences have been identified in the genome, it is possible to use sequence alignment programs (such as FASTA or BLAST) to detect matching regions in the nucleotide sequence. These matching regions are potential gene homologs and are termed pseudogenes if there is some evidence that either of the causes described above applies.

In these analyses, genes from annotated genomes and protein databases have first been clustered into paralog families and then used to survey whole genomes for copies or homologs. For each potential pseudogene (or fragment) match, a number of steps have been taken to assess its validity as a pseudogene. These steps include checking for overcounting and repeat elements, overlap on the genomic DNA with other homologs and cross-referencing with exon assignments from genome annotations. The resulting pseudogenes or pseudogenic fragments have then been assigned to the paralog family of the most homologous gene (or assigned to a singleton gene if the probe gene has no obvious paralog).
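One simple check on a candidate match is to look for the disablements described earlier (frame shifts and premature stop codons). The following is a minimal Python sketch, not the procedure used in the analyses above. It assumes the candidate has already been placed in the parent gene's reading frame without gaps; the function name `find_disablements` and the example sequences are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_disablements(candidate, parent_length):
    """List simple disablements in a candidate DNA sequence.

    candidate is assumed to start at the parent gene's start codon;
    parent_length is the length of the parent coding sequence in bases.
    """
    found = []
    # A length difference that is not a multiple of three shifts the frame.
    if (len(candidate) - parent_length) % 3 != 0:
        found.append("frameshift")
    codons = [candidate[i:i + 3] for i in range(0, len(candidate) - 2, 3)]
    # The final codon is skipped because a stop codon is expected there.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            found.append(f"premature stop at codon {index + 1}")
            break
    return found

PARENT = "ATGGTGCACCTGACTTGA"      # hypothetical intact coding sequence
POINT_MUTANT = "ATGGTGTAACTGACTTGA"  # third codon changed to TAA
DELETION = "ATGGTCACCTGACTTGA"       # one base deleted from the second codon

print(find_disablements(PARENT, len(PARENT)))        # []
print(find_disablements(POINT_MUTANT, len(PARENT)))  # ['premature stop at codon 3']
print(find_disablements(DELETION, len(PARENT)))      # ['frameshift', 'premature stop at codon 4']
```

A real survey would start from FASTA or BLAST alignments and handle gaps, so this sketch only illustrates the kind of evidence that marks a homolog as a pseudogene.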

Relating pseudogenes to known protein structures

In a number of cases, more distant evolutionary and functional relationships between proteins can only be elucidated through the analysis of the folds that their structures adopt. While it must not be forgotten that the assignment of function to a gene is often implied from that of a gene with a homologous sequence, the added information that protein structures can provide is very desirable in genome annotation.

In the case of pseudogenes, structural information can give extra evolutionary clues and facilitate analysis of the scope of folds in the pseudogene population ("pseudo"-folds) in contrast to those observed for the genes themselves. Where possible, i.e. where a gene can be matched to a SCOP domain, assignment of fold to a pseudogene or pseudogenic fragment is based upon the assignment of the most homologous gene.


Wobble Hypothesis

The triplet code is a degenerate one with many more codons than the number of amino acid

types coded. An explanation for this degeneracy is provided by the 'wobble hypothesis'

proposed by Crick (1966). Since there are 61 codons specifying amino acids, one might expect the cell to contain 61 different tRNA molecules, each with a different anticodon. In fact, however, the number of tRNA molecule types discovered is much less than 61. This implies that the

anticodons of some tRNAs read more than one codon on mRNA. According to the wobble

hypothesis only the first two positions of a triplet codon on mRNA have a precise pairing with

the bases of the tRNA anticodon. The pairing of the third position bases of the codon may be

ambiguous, and varies according to the nucleotide present in this position. Thus a single

tRNA type is able to recognize two or more codons differing only in the third base.

The anticodon 3'-UCG-5' of serine tRNA recognizes two codons, 5'-AGC-3' and 5'-AGU-3'.

The bonding between UCG and AGC follows the usual Watson-Crick pairing pattern. In UCG-AGU pairing, however, hydrogen bonding takes place between G and U.

This is a departure from the usual Watson-Crick pairing mechanism where G pairs with C and

A with U. Such interaction between the third bases is referred to as 'wobble pairing'.

The degeneracy of the code is not random. Mostly, the different codons for a particular

amino acid have the same first two letters (leucine, serine and arginine are exceptions). Thus

the first two letters of all the four codons for valine are GU and for alanine GC.
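The claim about shared first two letters can be checked directly against the standard codon table. The sketch below is illustrative only: it builds the standard table from its usual compact 64-letter form (bases ordered U, C, A, G; one-letter amino acid codes; `*` for stop), and the helper name `prefixes_by_amino_acid` is an invented one.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def prefixes_by_amino_acid(table):
    """Map each amino acid to the set of two-letter codon prefixes encoding it."""
    prefixes = defaultdict(set)
    for codon, amino_acid in table.items():
        if amino_acid != "*":
            prefixes[amino_acid].add(codon[:2])
    return dict(prefixes)

prefixes = prefixes_by_amino_acid(CODON_TABLE)
# Amino acids whose codons do not all share the same first two letters.
exceptions = sorted(aa for aa, p in prefixes.items() if len(p) > 1)

print(prefixes["V"], prefixes["A"])  # {'GU'} {'GC'}
print(exceptions)                    # ['L', 'R', 'S']
```

The three exceptions printed are leucine (L), arginine (R) and serine (S), in agreement with the text.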

Wobble base pair

In molecular biology, a wobble base pair is a non-Watson-Crick base pairing between two

nucleotides in RNA molecules. The four main wobble base pairs are guanine-uracil, inosine-

uracil, inosine-adenine, and inosine-cytosine (G-U, I-U, I-A and I-C). The thermodynamic

stability of a wobble base pair is comparable to that of a Watson-Crick base pair. Wobble

base pairs are fundamental in RNA secondary structure and are critical for the proper

translation of the genetic code.


Fig. Wobble base pairs for inosine and guanine

tRNA wobble

In the genetic code there are 4^3 = 64 possible codons (tri-nucleotide sequences). For

translation each of these codons requires a tRNA molecule with a complementary anticodon.

If each tRNA molecule paired with its complementary mRNA codon using canonical Watson-

Crick base pairing, then 64 types (species) of tRNA molecule would be required. Since most

organisms have fewer than 45 species of tRNA[1], some tRNA species must pair with more

than one codon. In 1966 Francis Crick proposed the Wobble hypothesis to account for this.

He postulated that the 5' base on the anticodon, which binds to the 3' base on the mRNA, was

not as spatially confined as the other two bases, and could thus have non-standard base

pairing.[2]


As an example, yeast tRNA-Phe has the anticodon 5'-GmAA-3' and can recognize the codons 5'-UUC-3' and 5'-UUU-3'. It is, therefore, possible for non-Watson–Crick base pairing to occur

at the third codon position; i.e. the 3' nucleotide of the mRNA codon and the 5' nucleotide of

the tRNA anticodon.

tRNA Base pairing schemes

The original wobble pairing rules, as proposed by Crick. Pairs marked (wobble) are wobble base pairs; unmarked pairs are Watson-Crick base pairs:

tRNA 5' anticodon base    mRNA 3' codon base
A                         U
C                         G
G                         C or U (wobble)
U                         A or G (wobble)
I                         A, C or U (all wobble)
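Crick's original rules in the table above can be written as a small lookup. The sketch below is an illustration under simplifying assumptions: modified bases are ignored (the Gm of yeast tRNA-Phe is treated as plain G), inosine is allowed only at the wobble position, and the function name `codons_recognized` is invented.

```python
# Crick's original rules: 5' anticodon base -> 3' codon bases it can pair with.
WOBBLE = {"A": "U", "C": "G", "G": "CU", "U": "AG", "I": "ACU"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codons_recognized(anticodon):
    """Return the mRNA codons (5'->3') read by an anticodon written 5'->3'."""
    wobble_base, middle, last = anticodon
    # Codon positions 1 and 2 pair strictly with anticodon positions 3 and 2.
    first_two = WATSON_CRICK[last] + WATSON_CRICK[middle]
    # Codon position 3 pairs with the 5' anticodon base under the wobble rules.
    return sorted(first_two + third for third in WOBBLE[wobble_base])

print(codons_recognized("GAA"))  # ['UUC', 'UUU']
print(codons_recognized("GCU"))  # ['AGC', 'AGU']
print(codons_recognized("IGC"))  # ['GCA', 'GCC', 'GCU']
```

The first call reproduces the tRNA-Phe example, and the second reproduces the serine example (the anticodon 3'-UCG-5' is 5'-GCU-3' when written in the 5' to 3' direction).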

Revised pairing rules

tRNA 5' anticodon base mRNA 3' codon base

G U,C

C G

k2C A

A U,C,(A),G

unmodified U U,(C),A,G

xm5s2U,xm5Um,Um,xm5U A,(G)

xo5U U,A,G

I A,C,U

References

1. http://gtrnadb.ucsc.edu/

2. Crick F (1966). "Codon–anticodon pairing: the wobble hypothesis". J Mol Biol 19 (2): 548–55. PMID 5969078. http://profiles.nlm.nih.gov/SC/B/C/B/S/_/scbcbs.pdf.

• Varani G, McClain W (2000). "The G × U wobble base pair. A fundamental building block of RNA structure crucial to RNA function in diverse biological systems". EMBO Rep 1 (1): 18–23. doi:10.1093/embo-reports/kvd001. PMID 11256617. http://www.nature.com/cgi-taf/DynaPage.taf?file=/embor/journal/v1/n1/full/embor635.html


One Gene-One Polypeptide Hypothesis

In 1941, George Beadle and Edward Lawrie Tatum proposed the one gene-one enzyme

theory. The four main tenets of this theory (as modified by Tatum in 1959) were:

• All biochemical processes in all living organisms are under genetic control.

• All biochemical reactions in an organism are resolvable into separate steps.

• Each step or reaction is under the control of a single gene.

• Mutation of a single gene results in the loss of function of the appropriate enzyme.

In other words, each gene controls the reproduction, function, and specificity of a particular enzyme.

The theory was based on results originally obtained from Neurospora crassa, a fungus that

was grown in a medium containing only the bare minimum of nutrients necessary (the fungus

being capable of manufacturing the rest). After inducing mutations in the mold using

radiation, some of the progeny were unable to grow on the medium. By testing with different

supplements, it was found that the mutants had lost the ability to manufacture a single amino

acid. By breeding the lab specimens with wild specimens, it was found that the mutation was

transmitted in a simple Mendelian fashion. It was assumed that the inability to synthesize the appropriate amino acid was caused by the loss of a single enzyme. The work was supported

by similar evidence found in humans, plants, and Drosophila (genus of fruit fly).

The hypothesis was further modified in 1962 by Vernon Ingram, and from it, the one gene-

one polypeptide hypothesis was born. The modification arose from research conducted on

sickle cell anemia and sickle cell trait. In 1949, it was proposed that sickling was caused by a

single gene mutation, which was heterozygous in sickle cell trait individuals and homozygous

in individuals with full sickle cell anemia. Simultaneously, it was also noted that the

hemoglobin from normal individuals and that from sickle cell anemic individuals migrated

differently on an electrophoresis plate, illustrating that there was a physical difference in the

hemoglobin types and supporting the single gene mutation. A normal hemoglobin molecule is made of four polypeptide chains: two identical alpha chains and two identical beta chains. All of the chains are approximately the same length, but they can be distinguished by

their chemical and electrophoretic properties. Each of these chains contains approximately

140 amino acids, and Ingram analyzed them using a modified form of Frederick Sanger's

protein analysis. This technique gave a fingerprint of the different hemoglobin types. The

fingerprint showed that the differences between the two types of hemoglobin could be found

in one peptide section of eight amino acids. When this section was isolated and analyzed, the


only difference was in one amino acid (glutamic acid in normal and valine in sickle cell

hemoglobin). The difference between these amino acids was one base in the triplet codon.

Further analysis showed that amino acid changes in one chain were independent of changes in

the other chain, suggesting that the genes determining the alpha and beta chains were located

at different loci. The alpha and beta chains show independent assortment.
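The statement that glutamic acid and valine differ by one base in the triplet codon can be verified against the standard codon table. The following sketch is illustrative; the table is built from its usual compact form and the helper name `single_base_changes` is invented.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def codons_for(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)

def single_base_changes(from_aa, to_aa):
    """Codon pairs that convert from_aa into to_aa by changing exactly one base."""
    return sorted(
        (a, b)
        for a in codons_for(from_aa)
        for b in codons_for(to_aa)
        if sum(x != y for x, y in zip(a, b)) == 1
    )

# E = glutamic acid (normal hemoglobin), V = valine (sickle cell hemoglobin)
print(single_base_changes("E", "V"))  # [('GAA', 'GUA'), ('GAG', 'GUG')]
```

Both possible conversions change the middle base of the codon from A to U, so a single point mutation is enough to account for the amino acid difference.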

From this, it can be seen that hemoglobin is composed of two independent gene products,

each of which is a separate polypeptide. The gene is a section of DNA that determines the

amino acid sequence of a polypeptide. One gene codes for one polypeptide and several

polypeptides may be required for a functional protein or enzyme.


TRANSCRIPTION AND TRANSLATION

Central dogma of genetic transmission.

Information flow (with the exception of reverse transcription) is from DNA to RNA via the

process of transcription, and thence to protein via translation.

This process can be divided into two parts:

1. Transcription

Before the synthesis of a protein begins, the corresponding RNA molecule is produced by RNA transcription. One strand of the DNA double helix is used as a template by the RNA polymerase to synthesize a messenger RNA (mRNA). In eukaryotes, this mRNA undergoes several maturation steps, including splicing, in which the non-coding sequences are eliminated, and it migrates from the nucleus to the cytoplasm. The coding sequence of the mRNA is read in units of three nucleotides called codons.

2. Translation

The ribosome binds to the mRNA at the start codon (AUG) that is recognized only by the

initiator tRNA. The ribosome proceeds to the elongation phase of protein synthesis. During

this stage, complexes, composed of an amino acid linked to tRNA, sequentially bind to the

appropriate codon in mRNA by forming complementary base pairs with the tRNA anticodon.

The ribosome moves from codon to codon along the mRNA. Amino acids are added one by

one, translated into polypeptidic sequences dictated by DNA and represented by mRNA. At

the end, a release factor binds to the stop codon, terminating translation and releasing the

complete polypeptide from the ribosome.

One specific amino acid can correspond to more than one codon. The genetic code is said to

be degenerate.

Genetic Code

DNA transfers information to mRNA in the form of a code defined by a sequence of nucleotide bases. During protein synthesis, ribosomes move along the mRNA molecule and "read" its sequence three nucleotides at a time (codon) from the 5' end to the 3' end. Each amino acid is specified by a codon of the mRNA, which pairs with a sequence of three complementary nucleotides carried by a particular tRNA (anticodon).

Since RNA is constructed from four types of nucleotides, there are 64 possible triplet

sequences or codons (4x4x4). Three of these possible codons specify the termination of the

polypeptide chain. They are called "stop codons". That leaves 61 codons to specify only 20

different amino acids. Therefore, most of the amino acids are represented by more than one

codon. The genetic code is said to be degenerate.
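The counts in this section (64 triplets, 3 stop codons, 61 codons for 20 amino acids) can be reproduced by enumerating the standard codon table. This is a small illustrative sketch; the table is written in its usual compact form with bases ordered U, C, A, G, and the variable names are arbitrary.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
# Amino acids that are specified by only one codon.
single_codon = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)

print(len(CODON_TABLE))            # 64
print(stop_codons)                 # ['UAA', 'UAG', 'UGA']
print(len(sense_codons))           # 61
print(len(codons_per_amino_acid))  # 20
print(single_codon)                # ['M', 'W']
```

Only methionine (M) and tryptophan (W) have a single codon; every other amino acid has between two and six, which is what "degenerate" means here.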

1. Genes (DNA) are transcribed into RNA by the enzyme RNA polymerase

• control by promoters

• control by regulatory proteins

• control by "cell state" (G1, G2, S, etc)

2. RNA transcripts are subjected to post-transcriptional modification and control

• rRNA transcript cut into appropriate size classes and initial assembly in

nucleolar organizer


• tRNA transcript folds into shape

• mRNA transcripts are capped at 5' end, polyA tail added to 3' end, noncoding

sequences (introns) removed from interior of transcript

• all RNA types must move to the cytoplasm via the nuclear membrane pores

3. mRNA molecules are translated by ribosomes (rRNA + ribosomal proteins) which

match the 3-base codons of the mRNA to the 3-base anticodons of the appropriate

tRNA molecules

• ribosomes initiate using the first AUG codon to start protein synthesis with

methionine

• message is read three bases at a time, consecutive codons, no commas, no

overlap

• peptide bonds are made between adjacent amino acids to produce a growing polypeptide chain

• when a "stop" codon (UAA, UGA, UAG) is encountered, translation ceases

4. Newly synthesized proteins are often modified after translation (post-translation)

• proteins undergo a final conformation adaptation in conjunction with chaperone proteins

• soluble proteins may have sugars added

• secreted proteins must be synthesized through the membrane and the initial portion of the protein is removed in the process

• multimeric proteins must assemble from subunit proteins

• some proteins (e.g. insulin) have a portion of the protein removed and

discarded

5. the protein carries out its function

The first step in gene action is making an RNA copy of the gene, a process called transcription.

A. Only one of the two DNA strands is copied. The copying is catalyzed by an

enzyme, RNA polymerase. Transcription begins at a specific point on the DNA and

terminates at a specific point. The initial product of transcription is called the primary

transcript. Some of the primary transcript codes for proteins and is also called pre-

messenger RNA. It is further processed in the nucleus in eukaryotes and then moves

to the cytoplasm to serve as messenger RNA.

B. The remainder of the primary transcript functions in protein synthesis but does not

code for amino acid sequences.

C. A gene locus contains the information both for the amino acid sequence of a

polypeptide chain and the instructions that regulate the amount of transcription. The 5'

flanking region is the site at which RNA polymerase binds (also called the promoter

region) and is the primary region that regulates the extent of transcription. Many other

proteins, called transcription factors, also bind in the promoter region and regulate

the amount of transcription.

D. In most eukaryotic genes, pre-mRNA has introns that are spliced out. The

remaining exons are joined together to form mRNA. In order to function properly, a cap and a poly-A tail must also be added.

E. When mRNA is read by ribosomes, the direction is 5' to 3'. The DNA strand with

the same nucleotide sequence as the mRNA is often called the sense strand. In transcription, the complementary 3' to 5' strand is used for the template and is

sometimes called the antisense strand. The sense strand is also called the coding


strand, since it has the same nucleotide sequence as mRNA (considering T and U to

be equivalent).

III. Translation is the process of converting the information contained in the nucleic acid

code (DNA and RNA) into the amino acid sequences of proteins.

A. Three types of RNA are involved.

1. Messenger RNA (mRNA) carries information for amino acid sequences

from nucleus to cytoplasm.

2. Ribosomal RNA (rRNA), along with various proteins, forms ribosomes.

There are two rRNA components of ribosomes: a large one and a small one.

There are hundreds of copies of the genes that code for rRNA.

3. Transfer RNA (tRNA) are small molecules that provide the key for

translating the nucleic acid code into the protein code.

1. Each tRNA molecule can bind only to one of the 20 amino acids.

That binding is catalyzed by a series of enzymes, each of which

recognizes one tRNA and one amino acid.

2. In protein synthesis, each tRNA is "charged" with its specific amino acid by means of a covalent bond involving the carboxyl group of the

amino acid.

3. Every cell must be able to make amino acyl-tRNAs for each of the

20 amino acids used to make proteins. Most cells make many more

than 20.

B. The genetic code refers to the nucleotide sequences that cause a particular amino

acid to be added to the growing polypeptide chain.

1. Such nucleotide sequences are called codons. A codon consists of a

sequence of three nucleotides. Since mRNA is read 5' —> 3', codons are also

always written in the 5' —> 3' direction.

2. There are 64 possible nucleotide triplets.

a. Sixty-one code for amino acids.

b. Three code for termination of synthesis. No amino acid is added.

c. One codon, AUG, codes for the initiation of synthesis, always with

the amino acid methionine (Met). AUG also codes for Met in the

interior of polypeptides.

d. Many amino acids are coded by several codons. Therefore, the

genetic code is degenerate.

e. Nearly all forms of life use the same genetic code; it is almost universal, with minor variations in a few systems such as mitochondria.

C. In polypeptide synthesis, a ribosome binds to the 5' end of the mRNA and moves

toward the 3' end.

1. The first AUG is the initiation codon and starts the assembly of the polypeptide chain from the N-terminal end. Only a tRNA with a complementary sequence (3'-UAC-5') in the anticodon region will bind, starting the polypeptide with the amino acid Met.


2. The ribosome then moves to the next codon. The tRNA with the matching

anticodon binds. The covalent bond that the Met formed with its tRNA is

transferred to the amino group of the second amino acid, forming a peptide

bond.

3. With its amino acid gone, the first tRNA is now released and recycles by

picking up another Met.

4. The ribosome now moves to the third codon, and the process repeats.

5. If the ribosome encounters a termination codon (UAA, UAG, or UGA) as it moves toward the 3' end of the mRNA, the growing polypeptide chain is

released and synthesis is over.

6. The initiation codon sets the reading frame. Thereafter, nucleotides are

read three at a time.
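The steps above (start at the first AUG, read three bases at a time, stop at a termination codon) can be sketched in a few lines of Python. This is a simplified illustration, not a model of the ribosome: the function name `translate_from_first_aug` and both example messages are invented, and the output uses one-letter amino acid codes.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_from_first_aug(mrna):
    """Translate an mRNA (5'->3') starting at its first AUG."""
    start = mrna.find("AUG")  # the initiation codon sets the reading frame
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination codon: release the chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)

message = "GGAUGGUGCACUGAAA"  # hypothetical mRNA
shifted = "GGAUGUGCACUGAAA"   # same message with one base deleted after AUG

print(translate_from_first_aug(message))  # MVH
print(translate_from_first_aug(shifted))  # MCTE
```

Deleting a single base after the start codon changes every downstream codon, which is why the reading frame set by the initiation codon matters.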

Coding Example

DNA Sequence (template strand):  TAC CAC GTG GAC TGA GGA CTC CTC ACT
DNA Sequence (coding strand):    ATG GTG CAC CTG ACT CCT GAG GAG TGA
mRNA Sequence:                   AUG GUG CAC CUG ACU CCU GAG GAG UGA
Amino Acid Sequence:             Met Val His Leu Thr Pro Glu Glu Stop
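The coding example can be reproduced programmatically. The sketch below is one possible way to do it: it complements the DNA triplets that pair with the mRNA (taken here to be the template strand, read in the order listed) to obtain the mRNA, then translates codon by codon using the standard table. The function names are invented for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Stop",
}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template):
    """Build the mRNA by pairing each template DNA base with its RNA partner."""
    return "".join(DNA_TO_RNA[base] for base in template)

def translate(mrna):
    """Translate codon by codon, ending at (and reporting) the first stop."""
    result = []
    for i in range(0, len(mrna) - 2, 3):
        name = THREE_LETTER[CODON_TABLE[mrna[i:i + 3]]]
        result.append(name)
        if name == "Stop":
            break
    return result

template = "TACCACGTGGACTGAGGACTCCTCACT"  # template triplets from the example
mrna = transcribe(template)

print(mrna)                       # AUGGUGCACCUGACUCCUGAGGAGUGA
print(" ".join(translate(mrna)))  # Met Val His Leu Thr Pro Glu Glu Stop
```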

Protein Synthesis: Transcription and Translation

Review

Central Dogma of Molecular Biology

Protein synthesis requires two steps: transcription and translation.


DNA contains codes

Three bases in DNA code for one amino acid. The DNA code is copied to produce mRNA.

The order of amino acids in the polypeptide is determined by the sequence of 3-letter codes in

mRNA.

DNA vs RNA

                       DNA            RNA
Sugar:                 deoxyribose    ribose
Bonds with adenine:    thymine        uracil
# of strands:          two            one

Kinds of RNA

Messenger RNA (mRNA)

Messenger RNA contains genetic information. It is a copy of a portion of the DNA.

It carries genetic information from the gene (DNA) out of the nucleus, into the cytoplasm of

the cell where it is translated to produce protein.

Ribosomal RNA (rRNA)

This type of RNA is a structural component of the ribosomes. It does not contain a genetic

message.

Transfer RNA (tRNA)

Transfer RNA functions to transport amino acids to the ribosomes during protein synthesis.


Transcription

Transcription is the synthesis of mRNA from a DNA template.

It is like DNA replication in that a DNA strand is used to synthesize a strand of mRNA.

Only one strand of DNA is copied.

A single gene may be transcribed thousands of times.

After transcription, the DNA strands rejoin.

Steps involved in transcription

DNA unwinds.

RNA polymerase recognizes a specific base sequence in the DNA called a promoter and binds to it. The promoter identifies the start of a gene, which strand is to be copied, and the

direction that it is to be copied.

Complementary bases are assembled (U instead of T).

A termination code in the DNA indicates where transcription will stop.

The mRNA produced is called an mRNA transcript.

Processing the mRNA Transcript

In eukaryotic cells, the newly-formed mRNA transcript (also called heterogeneous nuclear RNA or hnRNA) must be further modified before it can be used.

A cap is added to the 5’ end and a poly-A tail (150 to 200 adenines) is added to the 3’ end of the molecule.

The newly-formed mRNA has regions that do not contain a genetic message. These regions are called introns and must be removed. Their function is not fully understood.

The remaining portions of mRNA are called exons. They are spliced together to form a mature mRNA transcript.

The Nucleus

DNA is located in an organelle called the nucleus.

Transcription and mRNA processing occur in the nucleus.

The nucleus is surrounded by a double membrane. After the mature mRNA transcript is

produced, it moves out of the nucleus and into the cytoplasm through pores in the nuclear

membrane.

Translation

Translation is the process where ribosomes synthesize proteins using the mature mRNA

transcript produced during transcription.

Overview

The diagram below shows a ribosome attaching to the mRNA and then moving along it, adding amino acids to the growing polypeptide chain.


Translation - Details

A mature mRNA transcript, a ribosome, several tRNA molecules and amino acids are shown.

There is a specific tRNA for each of the 20 different amino acids.

Below: A ribosome attaches to the mRNA transcript.


A tRNA molecule transports an amino acid to the ribosome. Notice that the 3-letter anticodon

on the tRNA molecule matches the 3-letter code (called a codon) in the mRNA. The tRNA with the anticodon "UAC" bonds with methionine. It always transports methionine. Transfer

RNA molecules with different anticodons transport other amino acids.

A second tRNA molecule bonds to the mRNA at the ribosome. Again, the codes must match.


A bond is formed between the two amino acids.

The tRNA bonded to methionine drops off and can be reused later.


The ribosome moves along the mRNA to expose another codon (GAU) for a tRNA molecule.

The only tRNA molecule that can bond to the GAU site is a molecule with a CUA anticodon. Transfer RNA molecules with CUA anticodons are specific for aspartate.

Aspartate is now added to the growing amino acid chain.
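The codon and anticodon matching used in this walk-through can be written as a base-by-base complement. The sketch below is illustrative only; the function name is hypothetical, and it follows the convention of these notes, in which the anticodon is written in the direction opposite to the codon so that the paired bases line up position by position.

```python
# RNA base-pairing rules between an mRNA codon and a tRNA anticodon.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the anticodon that pairs with an mRNA codon.

    The codon is read 5'->3'; the result is written 3'->5', so each
    letter sits opposite the base it pairs with.
    """
    return "".join(PAIR[base] for base in codon)


print(anticodon_for("AUG"))  # UAC
print(anticodon_for("GAU"))  # CUA
```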


Initiation and Termination Codes

An initiation code signals the start of a genetic message. As the ribosome moves along an mRNA transcript, it will not begin synthesizing protein until it reaches an initiation code.

Termination codes signal the end of the genetic message. Synthesis stops when the ribosome reaches a termination codon.

Genetic Code

The table below can be used to determine what amino acid corresponds to any 3-letter codon.

First                           Second Base                                Third
Base   U                       C              A               G               Base

U      UUU phenylalanine       UCU serine     UAU tyrosine    UGU cysteine    U
       UUC phenylalanine       UCC serine     UAC tyrosine    UGC cysteine    C
       UUA leucine             UCA serine     UAA stop        UGA stop        A
       UUG leucine             UCG serine     UAG stop        UGG tryptophan  G

C      CUU leucine             CCU proline    CAU histidine   CGU arginine    U
       CUC leucine             CCC proline    CAC histidine   CGC arginine    C
       CUA leucine             CCA proline    CAA glutamine   CGA arginine    A
       CUG leucine             CCG proline    CAG glutamine   CGG arginine    G

A      AUU isoleucine          ACU threonine  AAU asparagine  AGU serine      U
       AUC isoleucine          ACC threonine  AAC asparagine  AGC serine      C
       AUA isoleucine          ACA threonine  AAA lysine      AGA arginine    A
       AUG (start) methionine  ACG threonine  AAG lysine      AGG arginine    G

G      GUU valine              GCU alanine    GAU aspartate   GGU glycine     U
       GUC valine              GCC alanine    GAC aspartate   GGC glycine     C
       GUA valine              GCA alanine    GAA glutamate   GGA glycine     A
       GUG valine              GCG alanine    GAG glutamate   GGG glycine     G
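The same table can be held in a dictionary, which makes it easy to count how many codons specify each amino acid (the degeneracy of the code). This is a sketch under stated assumptions: the compact 64-letter string is simply one common way of writing the standard code in U, C, A, G order, with one-letter amino acid codes and "*" for stop. It is not taken from the notes.

```python
from collections import Counter

# Standard genetic code in U, C, A, G order: first base, then second, then third.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of codons for each amino acid (or for stop).
codons_per_amino_acid = Counter(CODON_TABLE.values())

print(codons_per_amino_acid["L"])  # 6  (leucine)
print(codons_per_amino_acid["M"])  # 1  (methionine, AUG only)
print(codons_per_amino_acid["W"])  # 1  (tryptophan, UGG only)
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "*"))  # ['UAA', 'UAG', 'UGA']
```

Only methionine and tryptophan have a single codon; every other amino acid has between two and six.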

Mutation

Mutations are changes in the DNA.

Frameshift

A frameshift mutation is usually severe, producing a completely nonfunctional protein.

The principle of a frameshift can be explained using the sentence below. If the letters are read three at a time and one is deleted, the second sentence becomes meaningless.

Original DNA:          THE BIG RED ANT ATE ONE FAT BUG

Frameshift mutation:   THB IGR EDA NTA TEO NEF ATB UG?
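The sentence analogy can be reproduced in a few lines. The helper name read_in_threes is hypothetical; the code removes the third letter and regroups the remaining letters three at a time.

```python
def read_in_threes(text):
    """Drop the spaces and regroup the letters three at a time."""
    letters = text.replace(" ", "")
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


original = "THE BIG RED ANT ATE ONE FAT BUG"
letters = original.replace(" ", "")
mutated = letters[:2] + letters[3:]     # delete the third letter ("E")

print(read_in_threes(original))  # THE BIG RED ANT ATE ONE FAT BUG
print(read_in_threes(mutated))   # THB IGR EDA NTA TEO NEF ATB UG
```

Every group after the deletion is shifted, which is why a frameshift usually changes the whole downstream message.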

Point Mutation

Point mutations involve a single nucleotide, and thus at most a single amino acid.

In the sentence below, changing one letter does not change the remaining three-letter words and therefore may not cause a significant change in the meaning of the sentence.

Original DNA:     THE BIG RED ANT ATE ONE FAT BUG

Point mutation:   THA BIG RED ANT ATE ONE FAT BUG

Silent, Missense, and Nonsense Mutations

Three kinds of point mutations can occur. A mutation that results in an amino acid substitution is called a missense mutation.

A mutation that results in a stop codon, so that incomplete proteins are produced, is called a nonsense mutation.

A mutation that does not change the amino acid, so that a normally functioning protein is still produced, is called a silent mutation.
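The three kinds of point mutation can be told apart by comparing what the original and the mutated codon code for. The sketch below is an illustration under assumptions: the function name is made up, the codon table is the standard code in one-letter notation, and the original codon is assumed to code for an amino acid rather than a stop.

```python
# Standard genetic code: codon -> one-letter amino acid ("*" = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutation(original_codon, mutant_codon):
    """Classify a single-codon change as silent, missense or nonsense.

    Assumes the original codon codes for an amino acid, not a stop.
    """
    before = CODON_TABLE[original_codon]
    after = CODON_TABLE[mutant_codon]
    if after == before:
        return "silent"        # same amino acid
    if after == "*":
        return "nonsense"      # premature stop codon
    return "missense"          # different amino acid


print(classify_point_mutation("GAG", "GAA"))  # silent   (Glu -> Glu)
print(classify_point_mutation("GAG", "GUG"))  # missense (Glu -> Val)
print(classify_point_mutation("GAG", "UAG"))  # nonsense (Glu -> stop)
```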


Mechanism of Reverse Transcription

After the RNA retrovirus enters a host cell, its genomic RNA will be transcribed into a double

stranded DNA and then integrated into the host DNA. The RNA to DNA transcription is called

reverse transcription.


Figure 4-J-1. Mechanism of reverse transcription. The entire process is catalyzed by reverse

transcriptase which has both DNA polymerase and RNase H activities.

1. A retrovirus-specific cellular tRNA hybridizes with a complementary region called the primer-binding site (PBS).

2. A DNA segment is extended from tRNA based on the sequence of the retroviral genomic RNA.

3. The viral R and U5 sequences are removed by RNase H.

4. First jump: DNA hybridizes with the remaining R sequence at the 3' end.

5. A DNA strand is extended from the 3' end.

6. Most viral RNA is removed by RNase H.

7. A second DNA strand is extended from the viral RNA.

8. Both tRNA and the remaining viral RNA are removed by RNase H.

9. Second jump: The PBS region of the second strand hybridizes with the PBS region of the

first strand.

10. Extension on both DNA strands. LTR stands for "long terminal repeat".


MUTATIONS

Mutations are changes in the DNA sequence of a cell's genome and are caused by radiation,

viruses, transposons and mutagenic chemicals, as well as errors that occur during meiosis or

DNA replication. They can also be induced by the organism itself, by cellular processes such

as hypermutation.

Mutation can result in several different types of change in DNA sequences; these can either

have no effect, alter the product of a gene, or prevent the gene from functioning. Studies in

the fly Drosophila melanogaster suggest that if a mutation changes a protein produced by a gene, this will probably be harmful, with about 70 percent of these mutations having

damaging effects, and the remainder being either neutral or weakly beneficial. Due to the damaging effects that mutations can have on cells, organisms have evolved mechanisms such

as DNA repair to remove mutations.[1]

Therefore, the optimal mutation rate for a species is a trade-off between costs of a high mutation rate, such as deleterious mutations, and the

metabolic costs of maintaining systems to reduce the mutation rate, such as DNA repair enzymes. Viruses that use RNA as their genetic material have rapid mutation rates, which can

be an advantage since these viruses will evolve constantly and rapidly, and thus evade the

defensive responses of e.g. the human immune system.

Mutations can involve large sections of DNA becoming duplicated, usually through genetic recombination.[8] These duplications are a major source of raw material for evolving new

genes, with tens to hundreds of genes duplicated in animal genomes every million years. Most genes belong to larger families of genes of shared ancestry. Novel genes are produced by

several methods, commonly through the duplication and mutation of an ancestral gene, or by recombining parts of different genes to form new combinations with new functions. Here,

domains act as modules, each with a particular and independent function, that can be mixed together to produce genes encoding new proteins with novel properties. For example, the

human eye uses four genes to make structures that sense light: three for color vision and one

for night vision; all four arose from a single ancestral gene. Another advantage of duplicating

a gene (or even an entire genome) is that this increases redundancy; this allows one gene in

the pair to acquire a new function while the other copy performs the original function. Other

types of mutation occasionally create new genes from previously noncoding DNA.

Changes in chromosome number may involve even larger mutations, where segments of the

DNA within chromosomes break and then rearrange. For example, two chromosomes in the

Homo genus fused to produce human chromosome 2; this fusion did not occur in the lineage

of the other apes, and they retain these separate chromosomes. In evolution, the most

important role of such chromosomal rearrangements may be to accelerate the divergence of a

population into new species by making populations less likely to interbreed, and thereby

preserving genetic differences between these populations.

Sequences of DNA that can move about the genome, such as transposons, make up a major

fraction of the genetic material of plants and animals, and may have been important in the

evolution of genomes. For example, more than a million copies of the Alu sequence are

present in the human genome, and these sequences have now been recruited to perform

functions such as regulating gene expression. Another effect of these mobile DNA sequences

is that when they move within a genome, they can mutate or delete existing genes and thereby

produce genetic diversity.


In multicellular organisms with dedicated reproductive cells, mutations can be subdivided into

germ line mutations, which can be passed on to descendants through their reproductive cells,

and somatic mutations, which involve cells outside the dedicated reproductive group and

which are not usually transmitted to descendants. If the organism can reproduce asexually

through mechanisms such as cuttings or budding the distinction can become blurred.

For example, plants can sometimes transmit somatic mutations to their descendants asexually

or sexually where flower buds develop in somatically mutated parts of plants. A new mutation

that was not inherited from either parent is called a de novo mutation. The source of the mutation is unrelated to the consequence, although the consequences are related to which cells were mutated.

Nonlethal mutations accumulate within the gene pool and increase the amount of genetic

variation. The abundance of some genetic changes within the gene pool can be reduced by natural selection, while other "more favorable" mutations may accumulate and result in

adaptive evolutionary changes.

For example, a butterfly may produce offspring with new mutations. The majority of these

mutations will have no effect; but one might change the color of one of the butterfly's

offspring, making it harder (or easier) for predators to see. If this color change is

advantageous, the chance of this butterfly surviving and producing its own offspring are a

little better, and over time the number of butterflies with this mutation may form a larger

percentage of the population.

Neutral mutations are defined as mutations whose effects do not influence the fitness of an individual. These can accumulate over time due to genetic drift. It is believed that the

overwhelming majority of mutations have no significant effect on an organism's fitness. Also, DNA repair mechanisms are able to mend most changes before they become permanent

mutations, and many organisms have mechanisms for eliminating otherwise permanently mutated somatic cells.

Mutation is generally accepted by biologists as the source of the genetic variation on which natural selection acts, generating advantageous new traits that survive and multiply in offspring as well as disadvantageous traits, in less fit offspring, that tend to die out.

A mutation has caused this garden moss rose to produce flowers of different colours. This is a

somatic mutation that may also be passed on in the germ line.


Classification of mutation types

Illustrations of five types of chromosomal mutations.


Selection of disease-causing mutations, in a standard table of the genetic code of amino acids.

By effect on structure

The sequence of a gene can be altered in a number of ways. Gene mutations have varying

effects on health depending on where they occur and whether they alter the function of

essential proteins. Mutations in the structure of genes can be classified as:

• Small-scale mutations, affecting one or a few nucleotides of a gene, including:

o Point mutations, often caused by chemicals or malfunction of DNA replication, exchange a single nucleotide for another. These changes are classified as transitions or transversions. Most common is the transition, which exchanges a purine for a purine (A ↔ G) or a pyrimidine for a pyrimidine (C ↔ T). A transition can be caused by nitrous acid, base mis-pairing, or mutagenic base analogs such as 5-bromo-2-deoxyuridine (BrdU). Less common is a transversion, which exchanges a purine for a pyrimidine or a pyrimidine for a purine (C/T ↔ A/G). An example of a transversion is adenine (A) being converted into a cytosine (C). A point mutation can be reversed by another point mutation, in which the nucleotide is changed back to its original state (true reversion), or by second-site reversion (a complementary mutation elsewhere that results in regained gene functionality). Point mutations that occur within the protein coding region of a gene may be classified into three kinds, depending upon what the erroneous codon codes for:

▪ Silent mutations: which code for the same amino acid.

▪ Missense mutations: which code for a different amino acid.

▪ Nonsense mutations: which code for a stop and can truncate the protein.

o Insertions add one or more extra nucleotides into the DNA. They are usually caused by transposable elements, or errors during replication of repeating elements (e.g. AT repeats). Insertions in the coding region of a gene may alter splicing of the mRNA (splice site mutation), or cause a shift in the reading frame (frameshift), both of which can significantly alter the gene product. Insertions can be reverted by excision of the transposable element.

o Deletions remove one or more nucleotides from the DNA. Like insertions, these mutations can alter the reading frame of the gene. They are generally irreversible: though exactly the same sequence might theoretically be restored by an insertion, transposable elements able to revert a very short deletion (say 1–2 bases) in any location are either highly unlikely to exist or do not exist at all. Note that a deletion is not the exact opposite of an insertion: the former is quite random while the latter consists of a specific sequence inserting at locations that are not entirely random or even quite narrowly defined.

• Large-scale mutations in chromosomal structure, including:

o Amplifications (or gene duplications) leading to multiple copies of the affected chromosomal regions, increasing the dosage of the genes located within them.

o Deletions of large chromosomal regions, leading to loss of the genes within

those regions.

o Mutations whose effect is to juxtapose previously separate pieces of DNA,

potentially bringing together separate genes to form functionally distinct fusion

genes (e.g. bcr-abl). These include:

▪ Chromosomal translocations: interchange of genetic parts from nonhomologous chromosomes.

▪ Interstitial deletions: an intra-chromosomal deletion that removes a segment of DNA from a single chromosome, thereby apposing previously distant genes. For example, cells isolated from a human astrocytoma, a type of brain tumor, were found to have a chromosomal deletion removing sequences between the "fused in glioblastoma" (fig) gene and the receptor tyrosine kinase "ros", producing a fusion protein (FIG-ROS). The abnormal FIG-ROS fusion protein has constitutively active kinase activity that causes oncogenic transformation (a transformation from normal cells to cancer cells).

▪ Chromosomal inversions: reversing the orientation of a chromosomal segment.

o Loss of heterozygosity: loss of one allele, either by a deletion or recombination event, in an organism that previously had two different alleles.

By effect on function

• Loss-of-function mutations are the result of a gene product having less or no function.

When the allele has a complete loss of function (null allele) it is often called an

amorphic mutation. Phenotypes associated with such mutations are most often

recessive. Exceptions are when the organism is haploid, or when the reduced dosage

of a normal gene product is not enough for a normal phenotype (this is called

haploinsufficiency).

• Gain-of-function mutations change the gene product such that it gains a new and abnormal function. These mutations usually have dominant phenotypes. Such a mutation is often called a neomorphic mutation.


• Dominant negative mutations (also called antimorphic mutations) have an altered

gene product that acts antagonistically to the wild-type allele. These mutations usually

result in an altered molecular function (often inactive) and are characterised by a

dominant or semi-dominant phenotype. In humans, Marfan syndrome is an example of

a dominant negative mutation occurring in an autosomal dominant disease. In this

condition, the defective glycoprotein product of the fibrillin gene (FBN1) antagonizes

the product of the normal allele.

• Lethal mutations are mutations that lead to the death of the organisms which carry

the mutations.

• A back mutation or reversion is a point mutation that restores the original sequence

and hence the original phenotype.

By effect on fitness

In applied genetics it is usual to speak of mutations as either harmful or beneficial.

• A harmful mutation is a mutation that decreases the fitness of the organism.

• A beneficial mutation is a mutation that increases fitness of the organism, or which promotes traits that are desirable.

In theoretical population genetics, it is more usual to speak of such mutations as deleterious or

advantageous. In the neutral theory of molecular evolution, genetic drift is the basis for most variation at the molecular level.

• A neutral mutation has no harmful or beneficial effect on the organism. Such mutations occur at a steady rate, forming the basis for the molecular clock.

• A deleterious mutation has a negative effect on the phenotype, and thus decreases the fitness of the organism.

• An advantageous mutation has a positive effect on the phenotype, and thus increases the fitness of the organism.

• A nearly neutral mutation is a mutation that may be slightly deleterious or

advantageous, although most nearly neutral mutations are slightly deleterious.

By inheritance

• Inheritable (germ-line): a mutation in germ-line tissue or in cells on the path to becoming gametes.

• Non-inheritable somatic (e.g. a carcinogenic mutation).

• Non-inheritable post-mortem: ancient DNA (aDNA) mutation in decaying remains.

By pattern of inheritance

The human genome contains two copies of each gene – a paternal and a maternal allele.

• A heterozygous mutation is a mutation of only one allele.

• A homozygous mutation is an identical mutation of both the paternal and maternal

alleles.

• Compound heterozygous mutations or a genetic compound comprises two different

mutations in the paternal and maternal alleles.

• A wild-type or homozygous non-mutated organism is one in which neither allele is mutated (that is, no mutation is present).
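The four patterns in this list can be expressed as a small decision rule. This is a hypothetical sketch, not a standard tool: each allele is represented by a plain string, and an allele counts as mutated when it differs from the wild-type string. The sample allele labels are invented for the example.

```python
def zygosity(paternal, maternal, wild_type):
    """Name the inheritance pattern for one gene, given its two alleles."""
    paternal_mutated = paternal != wild_type
    maternal_mutated = maternal != wild_type
    if not paternal_mutated and not maternal_mutated:
        return "wild type"
    if paternal_mutated and maternal_mutated:
        # Both alleles mutated: identical changes or two different ones.
        if paternal == maternal:
            return "homozygous"
        return "compound heterozygous"
    return "heterozygous"               # only one allele is mutated


# Illustrative allele labels (assumed for the example).
print(zygosity("A", "A", "A"))      # wild type
print(zygosity("a1", "A", "A"))     # heterozygous
print(zygosity("a1", "a1", "A"))    # homozygous
print(zygosity("a1", "a2", "A"))    # compound heterozygous
```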


By impact on protein sequence

• A frameshift mutation is a mutation caused by insertion or deletion of a number of

nucleotides that is not evenly divisible by three from a DNA sequence. Due to the triplet nature of gene expression by codons, the insertion or deletion can disrupt the

reading frame, or the grouping of the codons, resulting in a completely different

translation from the original. The earlier in the sequence the deletion or insertion

occurs, the more altered the protein produced is.

• A nonsense mutation is a point mutation in a sequence of DNA that results in a

premature stop codon, or a nonsense codon in the transcribed mRNA, and possibly a truncated, and often nonfunctional protein product.

• Missense mutations or nonsynonymous mutations are types of point mutations where

a single nucleotide is changed to cause substitution of a different amino acid. This in turn can render the resulting protein nonfunctional. Such mutations are responsible for

diseases such as Epidermolysis bullosa, sickle-cell disease, and SOD1 mediated ALS (Boillée 2006, p. 39).

• A neutral mutation is a mutation that occurs in an amino acid codon which results in the use of a different, but chemically similar, amino acid. The similarity between the two is enough that little or no change is often rendered in the protein. For example, a change from AAA to AGA will encode arginine, a chemically similar molecule to the intended lysine.

• Silent mutations are mutations that do not result in a change to the amino acid sequence of a protein. They may occur in a region that does not code for a protein, or

they may occur within a codon in a manner that does not alter the final amino acid sequence. The phrase silent mutation is often used interchangeably with the phrase

synonymous mutation; however, synonymous mutations are a subcategory of the

former, occurring only within exons. The name silent could be a misnomer. For

example, a silent mutation in the exon/intron border may lead to alternative splicing

by changing the splice site (see Splice site mutation), thereby leading to a changed

protein.
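Two points from the frameshift entry above can be illustrated with a short script: an insertion or deletion shifts the reading frame only when its length is not a multiple of three, and the earlier a deletion occurs, the more codons it changes. The function names and the sample sequence are assumptions made for the illustration.

```python
def is_frameshift(indel_length):
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def codons(seq):
    """Split a sequence into complete three-base codons, dropping any leftover bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def changed_codons_after_deletion(seq, position):
    """Count the codons that differ after deleting the single base at `position`."""
    mutated = seq[:position] + seq[position + 1:]
    return sum(a != b for a, b in zip(codons(seq), codons(mutated)))


mrna = "AUGGUGCACCUGACUCCUGAGGAG"   # eight codons, used only as sample input

print(is_frameshift(1), is_frameshift(3))        # True False
print(changed_codons_after_deletion(mrna, 3))    # 6 (early deletion)
print(changed_codons_after_deletion(mrna, 18))   # 1 (late deletion)
```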

Special classes

• Conditional mutation is a mutation that has wild-type (or less severe) phenotype

under certain "permissive" environmental conditions and a mutant phenotype under certain "restrictive" conditions. For example, a temperature-sensitive mutation can

cause cell death at high temperature (restrictive condition), but might have no deleterious consequences at a lower temperature (permissive condition).

Causes of mutation

Two classes of mutations are spontaneous mutations (molecular decay) and induced

mutations caused by mutagens.

Spontaneous mutations on the molecular level include:


• Tautomerism – A base is changed by the repositioning of a hydrogen atom, altering

the hydrogen bonding pattern of that base resulting in incorrect base pairing during

replication.

• Depurination – Loss of a purine base (A or G) to form an apurinic site (AP site).

• Deamination – Hydrolysis changes a normal base to an atypical base containing a keto

group in place of the original amine group. Examples include C → U and A → HX

(hypoxanthine), which can be corrected by DNA repair mechanisms; and 5MeC (5-

methylcytosine) → T, which is less likely to be detected as a mutation because

thymine is a normal DNA base.

• Transition – A purine changes to another purine, or a pyrimidine to a pyrimidine.

• Transversion – A purine becomes a pyrimidine, or vice versa.
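The distinction between the last two entries can be checked mechanically: a substitution is a transition when both bases belong to the same chemical class, and a transversion otherwise. A minimal sketch for DNA bases, with a hypothetical function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(old_base, new_base):
    """Classify a single-base DNA substitution as a transition or a transversion."""
    for base in (old_base, new_base):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if old_base == new_base:
        raise ValueError("the two bases are identical, so nothing was substituted")
    same_class = (old_base in PURINES) == (new_base in PURINES)
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # transition   (purine -> purine)
print(substitution_type("C", "T"))  # transition   (pyrimidine -> pyrimidine)
print(substitution_type("A", "C"))  # transversion (purine -> pyrimidine)
```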

A covalent adduct between benzo[a]pyrene, the major mutagen in tobacco smoke, and DNA

Induced mutations on the molecular level can be caused by:

• Chemicals

o Hydroxylamine (NH2OH)

o Base analogs (e.g. BrdU)

o Alkylating agents (e.g. N-ethyl-N-nitrosourea). These agents can mutate both replicating and non-replicating DNA. In contrast, a base analog can only mutate the DNA when the analog is incorporated during replication of the DNA. Each of these classes of chemical mutagens has certain effects that then lead to transitions, transversions, or deletions.

o Agents that form DNA adducts (e.g. ochratoxin A metabolites)

o DNA intercalating agents (e.g. ethidium bromide)

o DNA crosslinkers

o Oxidative damage


o Nitrous acid converts amine groups on A and C to diazo groups, altering their

hydrogen bonding patterns which leads to incorrect base pairing during

replication.

• Radiation

o Ultraviolet radiation (nonionizing radiation). Two nucleotide bases in DNA – cytosine and thymine – are most vulnerable to radiation that can change their properties. UV light can induce adjacent thymine bases in a DNA strand to bond covalently with each other, forming a bulky dimer.

o Ionizing radiation

• Viral infections

DNA has so-called hotspots, where mutations occur up to 100 times more frequently than the normal mutation rate. A hotspot can be at an unusual base, e.g., 5-methylcytosine.

Mutation rates also vary across species. Evolutionary biologists have theorized that higher mutation rates are beneficial in some situations, because they allow organisms to

evolve and therefore adapt more quickly to their environments. For example, repeated exposure of bacteria to antibiotics, and selection of resistant mutants, can result in the

selection of bacteria that have a much higher mutation rate than the original population (mutator strains).

Nomenclature

Nomenclature of mutations specifies the type of mutation and the base or amino acid changes.

• Nucleotide substitution (e.g. 76A>T) - The number is the position of the nucleotide

from the 5' end, the first letter represents the wild type nucleotide, and the second

letter represents the nucleotide which replaced the wild type. In the given example, the

adenine at the 76th position was replaced by a thymine.

o If it becomes necessary to differentiate between mutations in genomic DNA,

mitochondrial DNA, and RNA, a simple convention is used. For example, if

the 100th base of a nucleotide sequence mutated from G to C, then it would be

written as g.100G>C if the mutation occurred in genomic DNA, m.100G>C if the mutation occurred in mitochondrial DNA, or r.100g>c if the mutation

occurred in RNA. Note that for mutations in RNA, the nucleotide code is written in lower case.

• Amino acid substitution (e.g. D111E) – The first letter is the one letter code of the wild type amino acid, the number is the position of the amino acid from the N

terminus, and the second letter is the one letter code of the amino acid present in the mutation. Nonsense mutations are represented with an X for the second amino acid

(e.g. D111X).

• Amino acid deletion (e.g. ∆F508) – The Greek letter ∆ (delta) indicates a deletion.

The letter refers to the amino acid present in the wild type and the number is the

position from the N terminus of the amino acid were it to be present as in the wild

type.
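The notation described above is regular enough to be read with a few patterns. The parser below is only a sketch covering the examples given in this section (76A>T, g.100G>C, D111E, D111X, ∆F508); the function name, the dictionary keys and the regular expressions are assumptions for illustration and do not implement any full nomenclature standard.

```python
import re

MOLECULES = {"g": "genomic DNA", "m": "mitochondrial DNA", "r": "RNA"}

# Optional g./m./r. prefix, position, wild-type base, ">", replacement base.
NUCLEOTIDE = re.compile(r"^(?:([gmr])\.)?(\d+)([ACGTUacgtu])>([ACGTUacgtu])$")
# Delta sign (either of two Unicode forms), wild-type amino acid, position.
DELETION = re.compile(r"^[\u2206\u0394]([A-Z])(\d+)$")
# Wild-type amino acid, position, replacement amino acid (X = stop).
AMINO_ACID = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(name):
    """Parse one mutation name into a dictionary describing the change."""
    match = NUCLEOTIDE.match(name)
    if match:
        prefix, position, old, new = match.groups()
        return {"kind": "nucleotide substitution",
                "molecule": MOLECULES.get(prefix),   # None when no prefix is given
                "position": int(position), "wild_type": old, "mutant": new}
    match = DELETION.match(name)
    if match:
        return {"kind": "amino acid deletion",
                "position": int(match.group(2)), "wild_type": match.group(1)}
    match = AMINO_ACID.match(name)
    if match:
        old, position, new = match.groups()
        kind = "nonsense" if new == "X" else "amino acid substitution"
        return {"kind": kind, "position": int(position),
                "wild_type": old, "mutant": new}
    raise ValueError(f"unrecognised mutation name: {name!r}")


print(parse_mutation("76A>T"))
print(parse_mutation("r.100g>c")["molecule"])   # RNA
print(parse_mutation("D111X")["kind"])          # nonsense
print(parse_mutation("\u2206F508"))
```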

Harmful mutations

Changes in DNA caused by mutation can cause errors in protein sequence, creating partially or completely non-functional proteins. To function correctly, each cell depends on thousands

of proteins to function in the right places at the right times. When a mutation alters a protein that plays a critical role in the body, a medical condition can result. A condition caused by

mutations in one or more genes is called a genetic disorder. Some mutations alter a gene's


DNA base sequence but do not change the function of the protein made by the gene. Studies

of the fly Drosophila melanogaster suggest that if a mutation does change a protein, this will

probably be harmful, with about 70 percent of these mutations having damaging effects, and

the remainder being either neutral or weakly beneficial. However, studies in yeast have shown

that only 7% of mutations that are not in genes are harmful.

If a mutation is present in a germ cell, it can give rise to offspring that carries the mutation in

all of its cells. This is the case in hereditary diseases. On the other hand, a mutation may occur

in a somatic cell of an organism. Such mutations will be present in all descendants of this cell

within the same organism, and certain mutations can cause the cell to become malignant, and

thus cause cancer.

Often, gene mutations that could cause a genetic disorder are repaired by the DNA repair system of the cell. Each cell has a number of pathways through which enzymes

recognize and repair mistakes in DNA. Because DNA can be damaged or mutated in many ways, the process of DNA repair is an important way in which the body protects itself from

disease.

Beneficial mutations

Although most mutations that change protein sequences are harmful, some mutations have a positive effect on an organism. In this case, the mutation may enable the mutant organism to

withstand particular environmental stresses better than wild-type organisms, or reproduce more quickly. In these cases a mutation will tend to become more common in a population

through natural selection. For example, a specific 32 base pair deletion in human CCR5 (CCR5-∆32) confers

HIV resistance to homozygotes and delays AIDS onset in heterozygotes. The CCR5 mutation is more common in those of European descent. One possible explanation of the etiology of the

relatively high frequency of CCR5-∆32 in the European population is that it conferred resistance to the bubonic plague in mid-14th century Europe. People with this mutation were

more likely to survive infection; thus its frequency in the population increased. This theory

could explain why this mutation is not found in Africa, where the bubonic plague never

reached. A newer theory suggests that the selective pressure on the CCR5 Delta 32 mutation

was caused by smallpox instead of the bubonic plague.

Mutation subclasses

The following is a list of mutation subclasses that can fall into the three major classes of

mutation.

Morphological

Morphological mutants affect the outward appearance of an individual. Plant height mutations could change a tall plant to a short one, and seed shape mutations could change a plant from having smooth seeds to having wrinkled seeds.

Biochemical

Biochemical mutations have a lesion in one specific step of an enzymatic pathway. For bacteria, biochemical mutants need to be grown on a medium supplemented with a specific nutrient. Such mutants are called auxotrophs. Often though, morphological mutants are the direct result of a mutation in a biochemical pathway. In humans, albinism is the result of a mutation in the pathway that converts the amino acid tyrosine to the skin pigment melanin.


Similarly, cretinism results when the tyrosine to thyroxine pathway is mutated. Therefore, in a

strict genetic sense, if appropriate experiments are performed, a morphological mutation can

be explained at the biochemical level.

For some mutations to be expressed, the individual needs to be placed in a specific

environment. This is called the restrictive condition. But if the individual grows in any other

environment (permissive condition), the wild type phenotype is expressed. These are called

conditional mutations. Mutations that are expressed only at a specific temperature (temperature

sensitive mutants), usually elevated, can be considered to be conditional mutations.
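
The restrictive/permissive rule can be written as a small decision function. This is only a sketch for a hypothetical temperature-sensitive mutant: the function name and the 37 °C threshold are assumptions made for illustration, not values from the text.

```python
def expressed_phenotype(has_mutation, temperature_c, restrictive_from_c=37.0):
    """Phenotype shown by a hypothetical temperature-sensitive mutant.

    has_mutation: True if the individual carries the conditional mutation
    temperature_c: growth temperature in degrees Celsius
    restrictive_from_c: assumed temperature at or above which the
        condition is restrictive (illustrative value only)
    """
    restrictive = temperature_c >= restrictive_from_c
    if has_mutation and restrictive:
        return "mutant"      # restrictive condition: mutation is expressed
    return "wild type"       # permissive condition, or no mutation at all


if __name__ == "__main__":
    for temp in (30.0, 42.0):
        print(temp, expressed_phenotype(True, temp))
```

The point of the sketch is that the phenotype depends on both the genotype and the environment: the same mutant genotype gives two different phenotypes at the two temperatures.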

Lethal

Lethal mutations are mutations that lead to the death of the individual. Death does not have to occur immediately; it may take several months or even years. But if the expected longevity of

an individual is significantly reduced, the mutation is considered a lethal mutation.

Wild type alleles typically encode a product necessary for a specific biological function. If a

mutation occurs in that allele, the function for which it encodes is also lost. The general term

for these mutations is loss-of-function mutations. The degree to which the function is lost can

vary. If the function is entirely lost, the mutation is called a null mutation. It is also possible that some function may remain, but not at the level of the wild type allele. These are called

leaky mutations.

Gene Mutations at the Molecular Level (in brief)

Bacteriophage Lambda Gene Organization & Expression - Overview

Organization

The chromosome of bacteriophage lambda is organized more or less according to function:

The HEAD & TAIL genes code for the structural proteins of the bacteriophage capsid as well

as the terminase enzyme required to process rolling circle multimers into unit genome-length

pieces during packaging.

The RECOMBINATION genes code for Int and Xis, as well as a number of other proteins. Int and Xis are required for integration of

the bacteriophage into the bacterial host chromosome during lysogenic growth and for its excision from the host chromosome during induction.

The REGULATION region includes the immunity region as well as the genes that are

responsible for controlling the switch between lysogenic and lytic growth. The Q

antiterminator protein, together with the anti-Q RNA and PR', constitutes a second regulation

region.

The REPLICATION region includes two replication protein genes O and P and the origin of

replication.

There are four genes in the LYSIS region.

Promoters

There are 7 promoters that are active at different stages of the bacteriophage lambda life cycle and that govern expression of the bacteriophage lambda genes.

• PR expresses the replication genes as well as the anti-repressor, Cro, the

transcriptional activator CII, and the anti-terminator, Q protein.

• PL expresses the recombination genes as well as the anti-terminator, N, and the CIII

protein.

• PR' expresses the lysis proteins, and the head and tail proteins.

• PRE expresses the repressor gene, cI, to establish lysogeny.

• PRM expresses the repressor gene, cI, to maintain lysogeny.

• PI expresses the int gene to synthesize the Integrase protein.

• PaQ drives synthesis of a short anti-sense RNA which blocks translation of Q gene

mRNA.
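
The promoter list above can be captured as a simple lookup table, which makes it easy to ask which promoters drive a given product. This is a minimal sketch: the names LAMBDA_PROMOTERS and promoters_expressing are hypothetical, and the product labels are shorthand for the entries in the list rather than standard identifiers.

```python
# Promoter -> products, transcribed from the list above.
# The labels are informal shorthand chosen for this sketch.
LAMBDA_PROMOTERS = {
    "PR": {"O", "P", "Cro", "CII", "Q"},
    "PL": {"recombination genes", "N", "CIII"},
    "PR'": {"lysis proteins", "head proteins", "tail proteins"},
    "PRE": {"cI"},   # establishes lysogeny
    "PRM": {"cI"},   # maintains lysogeny
    "PI": {"Int"},
    "PaQ": {"anti-Q RNA"},
}


def promoters_expressing(product):
    """Return the sorted names of promoters whose products include `product`."""
    return sorted(
        name for name, products in LAMBDA_PROMOTERS.items() if product in products
    )


if __name__ == "__main__":
    print(promoters_expressing("cI"))
    print(promoters_expressing("Q"))
```

Looking up cI returns two promoters, which reflects the point made in the list: the repressor gene is expressed from one promoter to establish lysogeny and from another to maintain it.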

Expression

In any bacteriophage (or viral) infection cycle, gene expression can be classified into 3

distinct phases. The first phase involves synthesis of proteins that will take over or hijack the

host cell. These proteins often include a phage-specific RNA polymerase. The second phase

involves replication of the bacteriophage. The third phase is the assembly and packaging of

mature bacteriophage capsids. Gene expression in the bacteriophage is generally coordinated

so that the appropriate proteins for each of these phases are synthesized at the appropriate

time.

The 3 phases in bacteriophage lambda are:

• very early

• early

• either late lytic or late lysogenic

Note: the names given to the different life-cycle phases differ from phage to phage and virus to virus.

VERY EARLY EXPRESSION

Bacteriophage lambda has only three moderate or strong promoters that are recognized by the

host RNA polymerase. Transcription from PL causes expression of the anti-terminator protein,

N. Transcription from PR causes expression of the anti-repressor protein, Cro. Transcription from PR' pauses after a short distance and no protein is expressed.

EARLY EXPRESSION

The second phase of gene expression depends on the action of the N protein. N is an

antiterminator which causes expression from PR and PL to continue past transcription

terminators. Many genes are expressed including cII, cIII, O, P, and Q. CII and CIII favour

lysogenic growth; O & P are required for bacteriophage replication; and, Q favours lytic

growth by helping late lytic gene expression.

LATE LYTIC EXPRESSION

If the bacteriophage follows the lytic growth pathway, then the only genes expressed are the lysis genes and the genes coding for the head and tail proteins. The antiterminator protein, Q,

is required for expression of these genes. At the same time, Cro will prevent any further gene expression from PR or PL.

LATE LYSOGENIC EXPRESSION

If the bacteriophage follows the lysogenic growth pathway, then the only genes expressed are

int and cI. Once the bacteriophage has integrated into the bacterial host chromosome, cI is

the only gene that will continue to be expressed.
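
The dependencies between the phases described above can be summarized as a small rule-based sketch: early expression requires N, and late lytic expression requires Q. The function and constant names are hypothetical, and treating the cascade as simply stopping when N or Q is non-functional is a deliberate simplification of the description above, not a quantitative model.

```python
# Genes named in each phase of the description above.
VERY_EARLY = {"N", "cro"}
EARLY = {"cII", "cIII", "O", "P", "Q"}
LATE_LYTIC = {"lysis genes", "head genes", "tail genes"}
LATE_LYSOGENIC = {"int", "cI"}


def expression_timeline(pathway, functional_n=True, functional_q=True):
    """Return a list of (phase, sorted gene names) for the chosen pathway.

    pathway: "lytic" or "lysogenic"
    functional_n / functional_q: whether the N and Q antiterminators work
    """
    if pathway not in ("lytic", "lysogenic"):
        raise ValueError("pathway must be 'lytic' or 'lysogenic'")

    timeline = [("very early", sorted(VERY_EARLY))]
    if not functional_n:
        # Without N, transcription from PL and PR stops at the terminators.
        return timeline

    timeline.append(("early", sorted(EARLY)))
    if pathway == "lytic":
        if functional_q:
            # Q is required for expression of the late lytic genes.
            timeline.append(("late lytic", sorted(LATE_LYTIC)))
    else:
        timeline.append(("late lysogenic", sorted(LATE_LYSOGENIC)))
    return timeline


if __name__ == "__main__":
    for phase, genes in expression_timeline("lytic"):
        print(f"{phase}: {', '.join(genes)}")
```

Calling the function with functional_n=False returns only the very early phase, which mirrors the statement that the second phase depends on the action of the N protein.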
