u. n. dwivedi smita rastogi department of biotechnology...


Page 1: U. N. Dwivedi Smita Rastogi Department of Biotechnology ...nsdl.niscair.res.in/jspui/bitstream/123456789/582/1/Transcription.pdf · Smita Rastogi Department of Biotechnology, Integral

MOLECULAR BIOLOGY

Transcription

U. N. Dwivedi Department of Biochemistry

University of Lucknow, Lucknow-226007 and

Smita Rastogi Department of Biotechnology, Integral University, Lucknow

20-Jul-2006 (Revised 25-Jan-2008)

CONTENTS

Introduction
Transcription in prokaryotes (Synthesis of mRNA/rRNA/tRNA)
Prokaryotic transcription apparatus
RNA polymerase (RNA Pol) or DNA dependent RNA Polymerase
Structure of RNA polymerase
Synthesis of RNA in 5’ → 3’ direction
Requirement of Mg++
Significance of σ subunit of RNA Pol
Functions of RNA polymerase
Fidelity of RNA synthesis
Promoters
Overall process of prokaryotic transcription
Initiation
Elongation
Termination
Transcription in eukaryotes
Eukaryotic transcription apparatus
RNA polymerase or DNA dependent RNA Polymerase (RNA Pol)
Eukaryotic promoters
Enhancers
Transcription Factors
Elongation factors
Overall process of eukaryotic transcription
Post transcriptional processing
Post transcriptional processing of mRNA (maturation of mRNA)
Post transcriptional processing of mRNA in prokaryotes
Post transcriptional processing of mRNA in Eukaryotes
Alternative mRNA processing
Post transcriptional processing of tRNA and rRNA (maturation of tRNA and rRNA)
Post transcriptional processing of tRNA
Post transcriptional processing of rRNA
Inhibitors of transcription
RNA Pol binding inhibitors
DNA specific inhibitors
Reverse transcriptase (RT) (RNA directed DNA polymerase)

Key words Synthesis of mRNA, rRNA and tRNA; Prokaryotic and eukaryotic RNA polymerases; Promoters; Transcription factors; Enhancers; Post transcriptional RNA processing: Capping; Splicing; Polyadenylation; Inhibition of transcription; Reverse transcriptase


Introduction

DNA stores genetic information in a stable form that can be readily replicated. However, expression of this genetic information requires its flow from DNA to RNA to protein. The first step, i.e. the conversion of DNA sequence information into RNA sequence information, or more precisely the synthesis of RNA according to the instructions of a DNA template, is called transcription. Before studying the details of transcription, a few points need mention:

The two strands of double-stranded DNA are the coding strand and the template strand. The coding strand has the same sequence as the RNA transcript, except that it contains thymine (T) where the RNA contains uracil (U). The coding strand is also called the sense or (+) strand. The template strand is also called the antisense or (-) strand. The sequence of the template strand is the complement of the RNA transcript (Fig. 1).

[Figure: double-stranded DNA with a promoter; coding (plus, sense) strand drawn 5’ → 3’; template (minus, antisense) strand drawn 3’ → 5’]

Fig. 1: Coding and non coding strands in a DNA
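The strand relationships described above can be checked with a small Python sketch. It assumes plain A/C/G/T strings, writes the template strand 3’ → 5’ directly beneath the coding strand, and uses a hypothetical 8-nucleotide fragment; the function names are illustrative only.

```python
# Base-pairing rules: DNA-DNA complement, and template DNA base -> RNA base (U replaces T).
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3: str) -> str:
    """Return the template strand written 3'->5', base by base under the coding strand."""
    return "".join(DNA_COMPLEMENT[base] for base in coding_5to3.upper())


def rna_from_template(template_3to5: str) -> str:
    """Return the RNA (5'->3') obtained by pairing each template base read 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3to5.upper())


def rna_from_coding(coding_5to3: str) -> str:
    """The transcript equals the coding strand with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGCTTAG"                      # hypothetical coding-strand fragment, 5'->3'
    template = template_from_coding(coding)  # TACGAATC (3'->5')
    print(template)
    print(rna_from_template(template))       # AUGCUUAG
    print(rna_from_coding(coding))           # AUGCUUAG
```

Both routes give the same transcript, which is the sense in which the coding strand "has the same sequence" as the RNA while the template strand is its complement.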

The first nucleotide of a transcribed DNA sequence is denoted +1 and is called the start site.

The sequences towards the 5’ side of start site are referred to as upstream sequences and denoted with minus sign. The sequences towards the 3’ side are downstream sequences and denoted with plus sign. Thus, the second nucleotide downstream of + 1 site is + 2 and so on. The nucleotide preceding the start site is denoted as - 1 and so on. There is no 0 (zero) nucleotide. These designations refer to the coding strand of DNA. The coding strand for a particular gene may be located in either strand of a given DNA.
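Because there is no position 0, ordinary subtraction gives the wrong distance between an upstream and a downstream position. A minimal Python sketch of the convention, assuming positions are stored as 0-based offsets from the start site (offset 0 = +1); the helper names are hypothetical:

```python
def to_position(offset: int) -> int:
    """Map a 0-based offset from the start site to the +1/-1 numbering (no 0)."""
    return offset + 1 if offset >= 0 else offset


def to_offset(position: int) -> int:
    """Inverse of to_position; position 0 does not exist in this convention."""
    if position == 0:
        raise ValueError("there is no position 0")
    return position - 1 if position > 0 else position


def span_length(first: int, last: int) -> int:
    """Number of nucleotides from `first` to `last` inclusive, skipping the absent 0."""
    return to_offset(last) - to_offset(first) + 1


if __name__ == "__main__":
    print([to_position(o) for o in range(-3, 3)])  # [-3, -2, -1, 1, 2, 3]
    print(span_length(-1, 1))                      # 2 nucleotides, not 3
    print(span_length(-11, 3))                     # 14
```

For example, a region running from -11 to +3 spans 14 nucleotides rather than the 15 that naive subtraction would suggest.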

Different parts of the genome can be transcribed to different extents; the choice of which part to transcribe, and how extensively, is controlled by regulatory elements.

RNA synthesis occurs in 5’ → 3’ direction.

Transcription in prokaryotes (Synthesis of mRNA/rRNA/tRNA)

RNA synthesis in prokaryotes, like all biological polymerization reactions, takes place in three stages: initiation, elongation and termination. Transcription is initiated by the binding of RNA polymerase, in a defined orientation, to a specific DNA sequence called the ‘promoter’; this orientation determines which strand is transcribed from that promoter. Understanding the transcription process therefore requires detailed information about RNA polymerase and the promoter. The following sections deal with the properties and functions of RNA Pol and of the prokaryotic promoter.


Prokaryotic transcription apparatus

(1) RNA polymerase (RNA Pol) or DNA dependent RNA polymerase

RNA Pol is present in all prokaryotic cells and was first discovered in 1960 by Samuel Weiss and Jerard Hurwitz. In E. coli (a eubacterium), a single type of RNA Pol appears to be responsible for almost all RNA synthesis, including mRNA, rRNA and tRNA. Various bacteriophages also encode their own RNA Pol, which synthesizes only phage-specific RNAs. RNA Pol moves along the template, synthesizing RNA starting from the promoter (described below) until it reaches a sequence called the terminator. This defines a transcription unit that extends from the promoter to the terminator; the immediate product of transcription is called the primary transcript. The primary transcript is, however, almost always unstable and is either degraded or cleaved to give the mature products, viz. mRNA, rRNA or tRNA.

(A) Structure of RNA polymerase

E. coli RNA Pol is a large multisubunit enzyme with a molecular weight of ~500 kD, one of the largest enzymes in the bacterial cell. Its dimensions are 90 × 95 × 160 Å. The core RNA Pol of E. coli contains four types of subunits, with the composition α2ββ’ω. The properties and functions of the various subunits are summarized in Table 1. Another subunit, called σ, binds only transiently to the core enzyme, forming the holoenzyme α2ββ’ωσ. E. coli has several σ factors, which are summarized in Table 2. σ70 is used for general transcription, while the other σ factors are activated by specific environmental conditions. Thus, σ32, σ54 and σ28 (also called σF) direct transcription of genes needed during heat shock, during nitrogen starvation and for flagellar synthesis, respectively; the functions associated with σH are listed in Table 2. E. coli RNA Pol has the overall shape of a ‘crab claw’, in which the two ‘pincers’ are made up predominantly of the two large subunits, β and β’ (Fig. 2). Further structural analysis shows that RNA Pol contains ‘channels’ or ‘grooves’ that allow DNA, RNA and ribonucleotides into and out of the enzyme’s active center cleft (Fig. 3). The channel for DNA lies at the interface of the β and β’ subunits. The NTP-uptake channel allows ribonucleotides to enter the active center. The RNA exit channel allows the growing RNA chain to leave the enzyme as it is synthesized during elongation. The downstream DNA (i.e. DNA ahead of the enzyme, yet to be transcribed) enters the active center cleft in double-stranded form through the downstream DNA channel (between the pincers). Within the active center cleft, the DNA strands separate, starting at position +3. The non-template strand exits the active center cleft through the non-template strand (NT) channel and travels across the surface of the enzyme. The template strand, in contrast, follows a path through the active center cleft and exits through the template strand (T) channel. RNA Pol thus surrounds the DNA.
The groove is long enough to hold about 16 bp of DNA in the bacterial enzyme and ~25 bp in the eukaryotic enzyme.


Table 1: Properties and functions of subunits of E. coli RNA Pol

S. No. | Gene coding for subunit | Product (subunit) | Size (kD) | Number of amino acid residues | Number of subunits per holoenzyme | Function | Phases of transcription during which the subunit is required
1 | rpoA | α | 40 | 329 | 2 | Function uncertain; probably involved in enzyme assembly, promoter recognition at the UP element and binding of some activators | All stages
2 | rpoB | β | 155 | 1342 | 1 | Catalytic center (phosphodiester bond formation) | All stages
3 | rpoC | β’ | 160 | 1407 | 1 | Catalytic center (DNA template binding) | All stages
4 | rpoZ | ω | ~10 | 91 | 1 | Unknown | All stages
5 | rpoD (rpsD) | σ70 | 70 | 613 | 1 | Promoter specificity, recognition and binding; initiation of RNA synthesis (increases binding efficiency at the promoter, decreases non-specific binding, converts the closed promoter complex to the open promoter complex) | Only during initiation
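As a rough consistency check on the ~500 kD figure quoted for the enzyme, the Table 1 values can simply be summed with their stoichiometries. This Python sketch uses only the sizes, copy numbers and residue counts from Table 1 (taking ω as 10 kD); it is back-of-the-envelope arithmetic, not a measured mass.

```python
# (size in kD, copies per holoenzyme, residues per chain), values from Table 1.
SUBUNITS = {
    "alpha": (40, 2, 329),
    "beta": (155, 1, 1342),
    "beta_prime": (160, 1, 1407),
    "omega": (10, 1, 91),      # Table 1 lists ~10 kD
    "sigma70": (70, 1, 613),
}

CORE = ["alpha", "beta", "beta_prime", "omega"]   # alpha2 beta beta' omega
HOLOENZYME = CORE + ["sigma70"]                   # alpha2 beta beta' omega sigma


def total_mass_kd(names):
    """Sum of subunit sizes weighted by copy number."""
    return sum(SUBUNITS[n][0] * SUBUNITS[n][1] for n in names)


def total_residues(names):
    """Total amino acid residues weighted by copy number."""
    return sum(SUBUNITS[n][2] * SUBUNITS[n][1] for n in names)


if __name__ == "__main__":
    print(total_mass_kd(CORE), total_residues(CORE))              # 405 3498
    print(total_mass_kd(HOLOENZYME), total_residues(HOLOENZYME))  # 475 4111
```

The summed holoenzyme (about 475 kD) is in line with the approximate value given in the text.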


Table 2: Types of E. coli σ factors

S. No. | Gene | σ factor | Promoter -35 sequence | Distance between -10 and -35 regions (bp) | Promoter -10 sequence | Functions
1 | rpoD | σ70 | TTGACA | 16-18 | TATAAT | General
2 | rpoH | σ32 | CCCTTGAA | 13-15 | CCCGATNT | Heat shock
3 | rpoN | σ54 | CTGGNA | 6 | TTGCA | Nitrogen starvation
4 | fliA | σ28 or σF | CTAAA | 15 | GCCGATAA | Flagellar
5 | sigH | σH | AGGANPuPu | 11-12 | GCTGAATCA | Cytochrome biogenesis; generation of potential nutrient sources; transport; cell wall metabolism important for competence and sporulation initiation
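To make the σ70 row of Table 2 concrete, the Python sketch below scans a coding-strand sequence for a TTGACA-like -35 box followed, after a 16-18 bp spacer, by a TATAAT-like -10 box. The function name, the tolerance of two mismatches per box and the plain mismatch count are assumptions for illustration; the spacer is taken to be the bases between the end of the -35 hexamer and the start of the -10 hexamer. Practical promoter prediction usually relies on weighted scoring rather than exact consensus matching.

```python
# sigma70 consensus boxes from Table 2 (coding strand, 5'->3').
MINUS35_CONSENSUS = "TTGACA"
MINUS10_CONSENSUS = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like_promoters(seq: str, max_mismatches: int = 2, spacers=range(16, 19)):
    """Hypothetical helper returning (start of -35 box, spacer bp, start of -10 box, total mismatches)."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):              # every 6-bp window
        m35 = mismatches(seq[i:i + 6], MINUS35_CONSENSUS)
        if m35 > max_mismatches:
            continue
        for spacer in spacers:                 # spacers are tried in increasing order
            j = i + 6 + spacer
            box10 = seq[j:j + 6]
            if len(box10) < 6:
                break                          # ran off the end of the sequence
            m10 = mismatches(box10, MINUS10_CONSENSUS)
            if m10 <= max_mismatches:
                hits.append((i, spacer, j, m35 + m10))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: exact -35 box, 17 bp spacer, exact -10 box.
    demo = "GGGGG" + "TTGACA" + "GCGCGCGCGCGCGCGCG" + "TATAAT" + "GGGGGGGG"
    print(find_sigma70_like_promoters(demo))   # [(5, 17, 28, 0)]
```

Changing the `spacers` range to the values in the other rows of Table 2 would give the corresponding toy scanners for those σ factors, with their own consensus boxes substituted.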

[Figure labels: upstream DNA; DNA enters jaws; direction of RNA Pol movement; RNA exit; rudder; wall; bridge; nucleotides]

Fig. 2: Crab claw structure of RNA Pol


[Figure labels: upstream DNA; -35, -10 and +1 positions; downstream DNA; RNA exit channel; NT channel; T channel; active site; β’ pincer; β pincer; β flap]

Fig. 3: Channel structure of RNA Pol

The map of the E. coli σ70 factor identifies four conserved regions, numbered 1-4, which are further subdivided into sub-regions (Fig. 4). These sub-regions have different functions. Sub-region 2.4 (also called the -10 region or unwinding domain) confers specificity by recognizing the -10 region of the promoter, while sub-region 4.2 (also called the -35 region or recognition domain) provides binding energy by recognizing the -35 region of the promoter. The details of the other sub-regions are tabulated in Table 3.

[Figure: regions 1-4 of the σ factor, with labels: recognizes -35 region; responsible for melting; recognizes -10 region; recognizes ‘extended -10’ region]

Fig. 4: Regions of σ factor and their functions


Table 3: Functions of regions and sub-regions of σ factor

S. No. | Region | Function / properties of some sub-regions
1 | 1 | Region 1, comprising sub-regions 1.1 and 1.2, is present at the N-terminal end of the σ factor; this region is negatively charged and has a regulatory function; in the free form of σ factor, sub-region 1.1 plays an autoinhibitory role by occluding the DNA-binding domains (i.e., 2.4 and 4.2); association of σ factor with the core enzyme changes the conformation of σ factor, releasing the autoinhibition; in the holoenzyme, the negatively charged sub-region 1.1 occupies a positively charged region in the active center cleft of RNA Pol, so this sub-region acts as a DNA mimic; upon melting of DNA, sub-region 1.1 shifts by 20-50 Å and hence clears the DNA entry channel, allowing DNA entry
2 | 2 | Sub-regions 2.1 and 2.2 are a highly conserved part of σ factor and are involved in interaction with the core enzyme; sub-region 2.3 resembles proteins that bind single-stranded nucleic acid and is involved in the melting reaction; sub-region 2.4 has an α-helical structure that specifically recognizes the -10 region (i.e., it determines specificity); it is also called the -10 region or unwinding domain of σ factor
3 | 3 | Sub-region 3.1 binds the intervening DNA sequence (the distance between the -10 and -35 regions, i.e. ~75 Å); when σ factor binds the core enzyme, the N-terminal part of sub-region 3.2 blocks the RNA exit channel, thus acting as a molecular mimic of RNA, and it must be removed from the RNA exit channel for elongation to occur; ejection of this sub-region from the RNA exit channel takes several attempts, which leads to abortive initiation
4 | 4 | Sub-region 4.2 has an α-helical structure that specifically recognizes the -35 region of the promoter; it is also called the -35 region or recognition domain of σ factor

(B) Synthesis of RNA in 5’ → 3’ direction

The results of labeling experiments with γ-32P-labeled substrates confirmed that RNA chains, like DNA chains, grow in the 5’ → 3’ direction. This requires RNA Pol to move in the 3’ → 5’ direction along the antisense (template) DNA strand. Thus, the template strand is read 3’ → 5’, and the 3’-OH group of the growing RNA chain attacks the α-phosphate of the incoming rNTP.

For transcription, RNA Pol requires a DNA template, ribonucleoside triphosphates (rNTPs, viz. ATP, GTP, CTP and UTP) and Mg++. No primer is required. The enzyme is most active when bound to double-stranded DNA, but only one of the two strands serves as a template. The 3’-OH group of the growing RNA chain attacks the α-phosphate of the incoming NTP, releasing pyrophosphate. This reaction is thermodynamically favorable, and the subsequent hydrolysis of pyrophosphate to orthophosphate locks the reaction in the direction of RNA synthesis. The 5’-triphosphate group of the first residue in a nascent (newly formed) RNA molecule is not cleaved to release PPi but remains intact throughout transcription. Thus, the reaction is driven by the release and subsequent hydrolysis of PPi, as summarized in Scheme 1.

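$$\underbrace{(\mathrm{NMP})_n}_{\text{RNA}} + \underbrace{\mathrm{NTP}}_{\text{incoming ribonucleotide}} \longrightarrow \underbrace{(\mathrm{NMP})_{n+1}}_{\text{lengthened RNA}} + \underbrace{\mathrm{PP_i}}_{\text{pyrophosphate}}$$

$$\underbrace{\mathrm{PP_i}}_{\text{pyrophosphate}} \xrightarrow{\text{pyrophosphatase}} \underbrace{2\,\mathrm{P_i}}_{\text{orthophosphate}}$$

where NMP signifies ribonucleoside monophosphate and n represents the number of NMPs.

Scheme 1: Reaction catalyzed by RNA Pol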
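The bookkeeping implied by Scheme 1 can be sketched in Python: a transcript of n nucleotides needs n - 1 phosphodiester bonds, so n - 1 PPi are released and, after pyrophosphatase action, 2(n - 1) Pi are formed, while the first residue keeps its 5’-triphosphate (written here as a "ppp" prefix). The function name and the string representation are assumptions made for illustration.

```python
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_with_bookkeeping(template_3to5: str):
    """Return (RNA with 5'-triphosphate marked 'ppp', PPi released, Pi after hydrolysis)."""
    if not template_3to5:
        raise ValueError("template must contain at least one nucleotide")
    rna = []
    ppi_released = 0
    for base in template_3to5.upper():
        rna.append(TEMPLATE_TO_RNA[base])
        if len(rna) > 1:
            # every NTP after the initiating one forms one phosphodiester bond, releasing one PPi
            ppi_released += 1
    pi_formed = 2 * ppi_released  # pyrophosphatase: PPi -> 2 Pi
    return "ppp" + "".join(rna), ppi_released, pi_formed


if __name__ == "__main__":
    print(transcribe_with_bookkeeping("TACG"))  # ('pppAUGC', 3, 6)
```

Note that the initiating nucleotide contributes no PPi, which is why its 5’-triphosphate survives in the finished chain.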

RNA Pol requires that the initiating NTP be brought into its active site and held stably on the template while the next NTP is presented with the correct geometry for the chemistry of polymerization to occur. This is particularly difficult because RNA Pol starts most transcripts with ‘A’, and that ribonucleotide binds the template nucleotide ‘T’ through only two hydrogen bonds. Thus, the enzyme has to make specific interactions with the initiating NTP, holding it rigidly in the correct orientation to allow chemical attack on the incoming NTP. The requirement for such specific interactions between the enzyme and the initiating NTP probably explains why most transcripts start with the same nucleotide: the interactions are specific for that nucleotide (A), and thus only chains beginning with ‘A’ are held in a manner suitable for efficient initiation. It is believed that these interactions are provided by various parts of the RNA Pol holoenzyme, including part of σ. Consistent with this, in experiments using an RNA Pol containing a σ70 derivative lacking this part of σ, initiation requires much higher than normal concentrations of the initiating nucleotide.

(C) Requirement of Mg++

The active site of the core enzyme is made up of regions from the β and β’ subunits. It lies at the base of the pincers, within a region called the ‘active center cleft’, and in its active form contains two metal ions (Mg++), consistent with the ‘two-metal-ion catalytic mechanism for nucleotide addition’ proposed for all types of polymerases. One metal ion remains bound to the enzyme, whereas the other appears to come in with the nucleoside triphosphate and leave with the pyrophosphate. The β and β’ subunits interact extensively with one another, particularly at the base of the channel where the active-site Mg++ ion is located. The β’ subunit binds a Zn++ ion via four cysteine residues that are invariant in prokaryotes but not in eukaryotes. Three conserved aspartate (Asp) residues of the enzyme participate in binding the catalytic Mg++ ions.

(D) Significance of σ subunit of RNA Pol

The transient binding of σ factor to the core enzyme is concerned specifically with promoter recognition. σ factor has domains that recognize the promoter sequence. σ alone does not bind to DNA because the N-terminal region of σ behaves as an autoinhibitory domain: when σ is free, this region occludes the DNA-binding domains, suppressing their activity. When σ binds to the core enzyme (α2ββ’ω), its conformation changes so that the inhibition is released and the DNA-binding domains can contact DNA.

Comparisons of the crystal structures of core enzyme and holoenzyme show that σ factor lies largely on the surface of the core enzyme. It has an elongated structure that extends past the DNA-binding site. The σ subunit binds transiently to the core enzyme and directs the RNA Pol holoenzyme to specific binding sites on DNA where transcription begins. σ factor participates in the initiation of RNA synthesis by promoting formation of the open complex. In contrast to the -35 region, which simply provides binding energy to secure the polymerase to the promoter, the -10 region has a more elaborate role in transcription initiation, because DNA melting is initiated within that element during the transition from the closed to the open complex. Thus, sub-region 2.4 (unwinding domain or -10 domain) of σ, which interacts specifically with the -10 region of the promoter, does more than simply bind DNA, whereas the specific interactions of sub-region 4.2 (recognition domain or -35 domain) with the -35 sequence of the promoter just provide binding energy. In keeping with this expectation, the α-helix involved in recognition of the -10 region contains several aromatic amino acids that can interact with bases on the non-template strand in a manner that stabilizes the melted DNA. Unwinding increases negative supercoiling of DNA. When the holoenzyme forms an open complex on DNA, the N-terminal σ domain is displaced from the active site: it swings 20-50 Å away, and the two DNA-binding regions separate by 15 Å, presumably to acquire a more elongated conformation appropriate for contacting DNA. The σ factor dissociates from the rest of the RNA Pol when the RNA chain reaches 8-9 nucleotides in length; it is not needed for the elongation phase. When σ factor is released, the core enzyme reverts to a general affinity for all DNA, irrespective of sequence, which suits it to continue transcription. The released σ factor becomes immediately available for use by another core enzyme.
A change in the association between σ and the core enzyme changes the binding affinity for DNA so that the core enzyme can move along DNA. RNA Pol faces a dilemma in reconciling its needs for initiation with those for elongation. Initiation requires tight binding only to particular sequences (promoters), while elongation requires close association with all sequences that the enzyme encounters during transcription. This dilemma is solved by the reversible association between σ factor and core enzyme: σ factor is either released following initiation or changes its association with the core enzyme so that it no longer participates in DNA binding. The cell contains only about 30% as much σ factor as core enzyme; therefore, only about one-third of the polymerase complexes can exist as holoenzyme at any one time. Because there are fewer molecules of σ than of core enzyme, utilization of the core enzyme requires that σ be recycled. Release occurs immediately after initiation in about one-third of cases; presumably σ and core dissociate at some later point in the other cases. Irrespective of the exact timing of its release from the core enzyme, σ factor is involved only in initiation. After the release of σ factor from RNA Pol, the core enzyme moves along the DNA synthesizing the growing RNA strand, and the σ factor can then complex with another core enzyme and reinitiate transcription.

(E) Functions of RNA polymerase

RNA Pol performs multiple functions in the process of transcription:

Binding to DNA and recognition of promoters: All sequence-specific contacts that the holoenzyme makes with the DNA (with the -10 and -35 regions, as well as the so-called ‘extended -10’ region just upstream of the -10 region) are mediated by the σ subunit via conserved residues. The binding of σ causes the core enzyme’s pincers to come together, narrowing the channel between them by ~10 Å. The outer surface of the holoenzyme is almost uniformly negatively charged, whereas the surfaces presumed to interact with nucleic acids, particularly the inner walls of the main channel, are positively charged.

Melting of DNA: This ‘melting’ occurs between positions -11 and +3, relative to the transcription start site. The double helix reforms at -11 in the upstream DNA behind the enzyme. The β and β’ subunits contact DNA at many points downstream of the active site. They make several contacts with the coding strand in the region of the transcription bubble, thus stabilizing the separated single strands. The RNA is contacted largely in the region of the transcription bubble. As the enzyme moves along DNA, the template-strand base at the point where the DNA turns is flipped to face the nucleotide entry site. The RNA-DNA hybrid is about 9 bp long; the 5’ end of the RNA is forced to leave the DNA when it hits a structural element of the enzyme called the rudder.

Once DNA has been melted, the individual strands have a flexible structure in the transcription bubble, which allows the DNA to make its turn in the active site. Before transcription starts, however, the DNA double helix is a relatively rigid, straight structure. This straight structure enters the polymerase without being blocked by the wall because of a conformational shift in the enzyme. Adjacent to the wall is a clamp. In the free form of RNA Pol, this clamp swings away from the wall to allow DNA to follow a straight path through the enzyme. After DNA has been melted to create the transcription bubble, the clamp must swing back into position against the wall.


Selection of correct ribonucleotides: RNA Pol selects the correct ribonucleoside triphosphate and catalyzes the formation of a phosphodiester bond. This process is repeated many times as the enzyme moves unidirectionally along the DNA template. RNA Pol is completely processive, i.e., a transcript is synthesized from start to end by a single RNA Pol molecule.

Stabilization of single-stranded regions: RNA Pol itself stabilizes the single-stranded regions of the transcription bubble.

Elongation: RNA Pol carries out elongation. When RNA Pol forms the initial elongation complex after the first ~10 nucleotides have been synthesized, it may lose the σ factor and lose its contacts at -35 and -55. When the RNA is 15-20 nucleotides long, the general elongation complex is formed; it covers 30-40 bp of DNA. The elongating RNA Pol is a processive machine that synthesizes and proofreads RNA. DNA passes through the elongating enzyme in much the same way as through the open complex: double-stranded DNA enters the front of the enzyme between the pincers, and at the opening of the catalytic cleft the strands separate to follow different paths through the enzyme before exiting via their respective channels and reforming a double helix behind the elongating polymerase. Ribonucleotides enter the active site through their defined channel and are added to the growing RNA chain under the guidance of the template DNA strand. Only eight or nine nucleotides of the growing RNA chain remain base-paired to the DNA template at any given time; the remainder of the RNA chain is peeled off and directed out of the enzyme through the RNA exit channel. RNA chain elongation requires that the double-stranded DNA template be opened up at the point of RNA synthesis so that the template strand can be transcribed into its complementary RNA strand. In doing so, the RNA chain only transiently forms a short RNA-DNA hybrid duplex, as indicated by the observation that transcription leaves the template duplex intact and yields single-stranded RNA. The unpaired ‘bubble’ of DNA in the open initiation complex apparently travels along the DNA with RNA Pol. There are two ways this might occur:

(i) If RNA Pol followed the template strand in its helical path around the DNA, the DNA would build up little supercoiling, because the DNA duplex would never be unwound by more than about a turn. However, the RNA transcript would then wrap around the DNA once per duplex turn. This model is implausible, since it is unlikely that the DNA and RNA could be readily untangled: the RNA would not spontaneously unwind from the long and often circular DNA in any reasonable time, and no known topoisomerase can accelerate this process.

(ii) If RNA Pol moves in a straight line while the DNA rotates, the RNA and DNA will not become entangled. Rather, the DNA’s helical turns are pushed ahead of the advancing transcription bubble, so that the DNA ahead of the bubble becomes more tightly wound (which promotes positive supercoiling), while the DNA behind the bubble becomes correspondingly unwound (which promotes negative supercoiling); the linking number of the entire DNA remains unchanged. This model is supported by the observations that transcription of plasmids in E. coli causes their positive supercoiling in gyrase mutants (which cannot relax positive supercoils) and their negative supercoiling in topoisomerase I mutants (which cannot relax negative supercoils). In fact, by tethering RNA Pol to a glass surface and allowing it to transcribe DNA that had been fluorescently labeled at one end, Kazuhiko Kinosita demonstrated through fluorescence microscopy (using techniques similar to those showing that the F1F0-ATPase is a rotary engine) that single DNA molecules rotated in the expected direction during transcription.
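Model (ii) can be put in rough numbers. The Python sketch below assumes an idealized situation in which RNA Pol cannot rotate, no topoisomerase acts, and relaxed B-DNA has about 10.5 bp per helical turn: each turn's worth of DNA transcribed then adds about one positive supercoil ahead of the polymerase and one negative supercoil behind it, so the net change in linking number is zero. The function name and the 10.5 bp/turn value are assumptions, not figures from the text.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-DNA


def twin_domain_supercoils(bp_transcribed: float, bp_per_turn: float = BP_PER_TURN):
    """Idealized supercoils if the polymerase cannot rotate and no topoisomerase acts.

    Returns (positive supercoils ahead, negative supercoils behind, net change in Lk).
    """
    turns = bp_transcribed / bp_per_turn
    ahead = turns      # over-wound DNA in front of the bubble
    behind = -turns    # under-wound DNA behind the bubble
    return ahead, behind, ahead + behind


if __name__ == "__main__":
    print(twin_domain_supercoils(1050))  # (100.0, -100.0, 0.0)
```

This is why a gyrase mutant accumulates positive supercoils and a topoisomerase I mutant accumulates negative ones: each lacks the enzyme that relaxes one of the two domains.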

Proofreading: In addition, RNA Pol carries out two proofreading functions. The first is called pyrophosphorolytic editing. Here the enzyme uses its active site, in a simple back reaction, to catalyze the removal of an incorrectly inserted ribonucleotide by reincorporation of PPi; the enzyme can then incorporate another ribonucleotide in its place in the growing RNA chain. Note that the enzyme can remove either correct or incorrect bases in this manner, but it spends longer hovering over mismatches than over matches and so removes mismatches more frequently. In the second proofreading mechanism, called hydrolytic editing, the polymerase backtracks by one or more nucleotides and cleaves the RNA product, removing the error-containing sequence. Hydrolytic editing is stimulated by Gre factors, which, as well as enhancing hydrolytic editing, also serve as elongation-stimulating factors; that is, they ensure that the polymerase elongates efficiently and help it overcome ‘arrest’ at sequences that are difficult to transcribe. This combination of functions is comparable to that conferred on eukaryotic RNA Pol II by the transcription factor TFIIS. Another group of proteins, the Nus proteins, joins the polymerase in the elongation phase and promotes, in still rather undefined ways, both elongation and termination.

Termination: RNA Pol detects termination signals that specify where a transcript ends. The length of the RNA-DNA hybrid is determined by a structure within the enzyme that forces the hybrid to separate, allowing the RNA chain to exit from the enzyme and the template DNA strand to rejoin its DNA partner. The RNA product therefore does not remain base-paired to the template DNA strand; rather, the enzyme displaces the growing chain only a few nucleotides behind the point where each ribonucleotide is added. Because this release follows so closely behind the site of polymerization, multiple RNA Pol molecules can transcribe the same gene at the same time, each following closely behind another. Thus, a cell can synthesize large numbers of transcripts from a single gene (or other DNA sequence) in a short time.

Thus, RNA Pol can unwind and rewind DNA, hold the separated DNA strands and the RNA product, catalyze the addition of ribonucleotides to the growing RNA chain, and overcome difficulties in progressing by cleaving the RNA product and restarting RNA synthesis (with the assistance of some accessory factors).

(F) Fidelity of RNA synthesis

Unlike DNA Pol, RNA Pol lacks a separate 3’ → 5’ exonuclease proofreading active site, and hence its error rate is high. Although the editing mechanisms described above remove some errors, RNA Pol corrects the nascent polynucleotide chain far less effectively than DNA Pol does. Consequently, the fidelity of transcription is much lower than that of replication: the error rate of RNA synthesis is of the order of one mistake per 10⁴ or 10⁵ nucleotides, about 10⁵ times as high as that of DNA synthesis. This much lower fidelity can be tolerated because transcription errors are not transmitted to progeny. Moreover, for most genes, many RNA transcripts are synthesized from a single gene, and all RNAs are eventually degraded and replaced. A few defective transcripts are therefore much less likely to harm the cell than a mistake in the permanent information stored in DNA.
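What an error rate of 10⁻⁴ to 10⁻⁵ per nucleotide means for an individual transcript can be estimated with a short Python sketch. It assumes that misincorporations occur independently at each position and uses a hypothetical 1000-nucleotide transcript; the function names are illustrative.

```python
def expected_errors(length_nt: int, error_rate: float) -> float:
    """Mean number of misincorporations in a transcript of the given length."""
    return length_nt * error_rate


def fraction_error_free(length_nt: int, error_rate: float) -> float:
    """Probability that a transcript carries no error, assuming independent errors."""
    return (1.0 - error_rate) ** length_nt


if __name__ == "__main__":
    for rate in (1e-4, 1e-5):
        print(rate, expected_errors(1000, rate), round(fraction_error_free(1000, rate), 3))
```

At 10⁻⁴ roughly 90% of such transcripts are error-free (about 0.1 errors per transcript on average), and at 10⁻⁵ roughly 99% are, which is why a small fraction of faulty copies among many transcripts is tolerable.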


(2) Promoters

The promoter is the region of DNA where RNA polymerase binds to initiate transcription. The information for promoter function is provided directly by the DNA sequence: its structure is the signal for transcription. The promoter surrounds the start point, the first base pair that is transcribed into RNA. Because promoters lie on the same DNA molecule as the genes being transcribed or regulated, they are called cis-acting elements. E. coli has about 2000 promoter sites in its 4.6 × 10⁶ bp genome. There are different types of promoters in E. coli, but the most prevalent is the σ70 promoter (standard promoter), which is dealt with in detail in the following discussion.

(A) Consensus sequences

A comparison of many prokaryotic promoter sequences reveals the RNA Pol binding sites. An essential nucleotide sequence, called a conserved sequence, is expected to be present in all the promoters. However, a conserved sequence need not be conserved at every single position; some variation is permitted. Putative DNA recognition sites can therefore be defined in terms of an idealized sequence that represents the base most often present at each position. Such a consensus sequence is defined by aligning all known examples so as to maximize their homology. For a sequence to be accepted as a consensus, each particular base must be reasonably predominant at its position, and most of the actual examples must be related to the consensus by rather few (1-2) substitutions.
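The column-by-column procedure described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the sites are already aligned without gaps; the five hexamers are invented for the example and are not real E. coli promoters.

```python
from collections import Counter


def consensus(aligned: list[str]) -> list[tuple[str, int]]:
    """Return (most frequent base, % occurrence) for each column of an alignment."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        result.append((base, round(100 * count / len(column))))
    return result


def mismatches(seq: str, cons: str) -> int:
    """Number of substitutions separating a site from the consensus."""
    return sum(a != b for a, b in zip(seq, cons))


# Hypothetical aligned hexamers, invented for illustration only.
sites = ["TATAAT", "TAAAAT", "TATGAT", "GATACT", "TACAAT"]
cons = consensus(sites)
cons_seq = "".join(base for base, _ in cons)
print(cons_seq, cons)
print([mismatches(s, cons_seq) for s in sites])
```

The per-column percentages play the same role as the subscripts used below for the -10 and -35 sequences, and the mismatch counts show whether each example lies within the 1-2 substitutions that the consensus criterion requires.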

Promoter sequences in E. coli lack any extensive conservation over the ~60 bp associated with RNA Pol, and the sequence of much of the binding site is irrelevant. However, some short stretches within the promoter are conserved, and they are critical for its function. Bacterial promoters have the following features:

(i) Start point (+1 position): The initiating (+1) nucleotide is usually (>90% of the time) a purine (A or G; A occurs more often than G). The start point is commonly the central base of the poorly conserved sequence CAT or CGT, but the conservation of this triplet is not great enough to regard it as an obligatory signal.

(ii) -10 sequence or Pribnow box: The most conserved sequence, recognizable in almost all promoters, is a 6 bp AT-rich motif centered ~10 nucleotides upstream of the start site. Because of its position it is called the -10 sequence; it is also known as the Pribnow box (after David Pribnow, who pointed out its existence in 1975). The center of the hexamer is generally close to 10 bp upstream of the start point, although in known promoters it varies from -18 to -9. Its consensus is 5’TATAAT, and the variation can be summarized as 5’ T₈₀A₉₅T₄₅A₆₀A₅₀T₉₆ 3’, where each subscript denotes the % occurrence of the most frequently found base at that position (here 45-96%). If the frequency of occurrence indicates importance in binding RNA Pol, the initial highly conserved TA and the final, almost completely conserved T should be the most important bases. Because the region is AT rich, little energy is required for strand separation there, and mutations in this region affect the melting reaction.

(iii) -35 sequence: A 6 bp sequence centered ~35 nucleotides upstream of the start site. Its consensus is 5’TTGACA; in more detailed form the conservation is 5’ T₈₂T₈₄G₇₈A₆₅C₅₄A₄₅ 3’, where the subscripts again denote the % occurrence of the most frequently found base (here 45-84%).

(iv) Distance between the -10 and -35 sequences: The spacing between these two conserved regions is also critical. It is 16-19 bp in 90% of promoters (a separation of 17 nucleotides is optimal); in the exceptions it is as little as 15 or as large as 20 nucleotides. The actual sequence of this intervening DNA, however, is unimportant. A 17 bp spacer corresponds to about 1.6 turns of the B-DNA helix, which places the two motifs appropriately for simultaneous interaction with the σ factor.
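Features (ii) to (iv) suggest a simple way to look for σ70-like promoters: find a near-match to TTGACA, then a near-match to TATAAT after a 15-20 bp spacer. The sketch below assumes that counting mismatches to the consensus hexamers is an adequate score and allows at most two mismatches per hexamer; both are illustrative choices, not a validated promoter predictor, and only the given strand is scanned.

```python
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"
HELICAL_REPEAT_BP = 10.5  # approximate bp per turn of B-DNA


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like_sites(dna: str, max_mm: int = 2, spacers=range(15, 21)):
    """Return (pos35, spacer, pos10, total_mismatches) tuples, best first.

    pos35 and pos10 are 0-based starts of the two hexamers on the given strand;
    ties in mismatches are broken by closeness of the spacer to the optimal 17 bp.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 5):
        mm35 = mismatches(dna[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(dna):
                break
            mm10 = mismatches(dna[j:j + 6], MINUS10)
            if mm10 <= max_mm:
                hits.append((i, spacer, j, mm35 + mm10))
    return sorted(hits, key=lambda h: (h[3], abs(h[1] - 17)))


# Synthetic test sequence: consensus boxes separated by a 17-bp spacer of G's.
dna = "GC" * 5 + "TTGACA" + "G" * 17 + "TATAAT" + "GC" * 5
best = find_sigma70_like_sites(dna)[0]
print(best, f"spacer = {best[1] / HELICAL_REPEAT_BP:.1f} helical turns")
```

Because the spacer sequence is ignored and only its length is checked, the sketch mirrors the statement that the intervening DNA is unimportant while its length is critical.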

The promoters with the -10 and -35 sequences as 5’TATAAT and 5’TTGACA respectively are called standard promoters. These are recognized by σ70 subunit of RNA Pol. Individual promoters usually differ from the consensus at one or more positions. A typical bacterial promoter is represented in Fig. 5.

5’……TTGACA……16-18 bp……TATAAT……Purine……3’
(-35 region: recognition domain; -10 region: Pribnow box, unwinding domain; purine: +1 start site)

Fig. 5: Constitution of a typical bacterial promoter

(v) Other conserved sequences of σ70 promoters: σ70 promoters of some genes have additional consensus sequences, such as:

(a) Upstream promoter elements (UP elements): Richard Gourse discovered that the promoters of certain highly expressed genes (for example, the rrn genes encoding rRNA) contain a third AT-rich recognition element, called the UP (upstream promoter) element, which lies between positions -40 and -60. The UP element binds the C-terminal domain (CTD) of the α subunit of RNA Pol. UP elements stimulate transcription at the promoters that contain them by providing an additional specific interaction site between RNA Pol and DNA. The efficiency with which RNA Pol binds to a promoter and initiates transcription is determined in large measure by these sequences, the spacing between them and their distance from the transcription start site. The sequence of the UP element is 5’ NNAAAA/TA/TTA/TTTTTNNAAAANN.

(b) "Extended -10" element: Another class of σ70 promoters lacks a -35 region and instead has a so-called "extended -10" element. This comprises a standard -10 region with an additional short sequence element at its upstream end, which is recognized by region 3 of σ. Extra contacts made between the polymerase and this additional element compensate for the absence of a -35 region; the gal genes of E. coli, for example, use such a promoter.

Various combinations of bacterial promoter elements are shown in Fig. 6.

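Arrangements shown: (1) -35 and -10 elements separated by ~17 bp, upstream of +1; (2) UP element upstream of the -35 and -10 elements; (3) "extended -10" element upstream of +1, with no -35 element.

Fig. 6: Combinations of bacterial promoter elements

(B) Promoter efficiency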

Promoters differ markedly in their efficiency. Depending on how closely their -10 and -35 sequences match the consensus, promoters are classified as strong or weak: promoters whose sequences are closer to the consensus are generally stronger than those that match less well. The strength of a promoter signifies the number of transcripts it can initiate in a given time. Genes with strong promoters are transcribed frequently, as often as every 2 seconds in E. coli, whereas genes with very weak promoters are transcribed about once every 10 minutes.
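The idea that promoters closer to the consensus are generally stronger can be illustrated by counting mismatches to TTGACA and TATAAT. This is only a rough proxy (spacer length, UP elements and regulatory proteins also matter), and the wild-type and mutant hexamers below are hypothetical examples rather than real promoters.

```python
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def distance_from_consensus(box35: str, box10: str) -> int:
    """Total mismatches of the -35 and -10 hexamers to the consensus (lower = closer)."""
    return (sum(a != b for a, b in zip(box35, CONSENSUS_35))
            + sum(a != b for a, b in zip(box10, CONSENSUS_10)))


def point_mutation(seq: str, pos: int, base: str) -> str:
    """Return seq with a single base substituted at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical promoter hexamers, chosen only to illustrate the comparison.
wild_type = ("TTGACT", "TATGAT")
variants = {
    "wild type": wild_type,
    "away from consensus": (wild_type[0], point_mutation(wild_type[1], 5, "G")),  # final T -> G
    "toward consensus": (wild_type[0], point_mutation(wild_type[1], 3, "A")),     # G -> A
}
for name, (b35, b10) in variants.items():
    print(f"{name:20s} {b35} {b10} mismatches={distance_from_consensus(b35, b10)}")
```

Under this assumption a single substitution moves the promoter one step further from or closer to the consensus; whether it actually weakens or strengthens transcription has to be measured.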

Mutation of a single base in either the -10 or the -35 sequence can alter promoter activity. Mutations in the -35 region usually affect the initial binding of RNA Pol, whereas mutations in the -10 region usually affect the melting reaction.

(C) Supercoiling is an important feature regulating the efficiency of promoters

The efficiency of some promoters is influenced by supercoiling. Negative supercoiling increases the efficiency of some promoters by assisting the melting reaction, for both prokaryotic and eukaryotic RNA Pol. As RNA Pol transcribes, the DNA is unwound ahead of it and rewound behind it. This requires either that the entire transcription complex rotate about the DNA or that the DNA itself rotate about its helical axis. The twin-domain model of transcription illustrates the consequences of rotation of the DNA. As RNA Pol pushes forward along the double helix, it generates positive supercoils (more tightly wound DNA) ahead of it and leaves negative supercoils (partially unwound DNA) behind it. For each helical turn traversed by RNA Pol, +1 turn is generated ahead and -1 turn behind. Transcription therefore has a significant effect on the local structure of DNA. As a result, gyrase, which introduces negative supercoils, and topoisomerase I, which removes negative supercoils, are required to correct the situation in front of and behind the polymerase, respectively. Inappropriate superhelicity in the DNA being transcribed halts transcription. Quite possibly the torsional tension generated by negative superhelicity behind the transcription bubble is required to help drive transcription, whereas too much such tension prevents the opening and maintenance of the transcription bubble.
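The bookkeeping of the twin-domain model can be sketched as simple arithmetic. The sketch assumes that neither the polymerase nor the DNA can rotate, that no topoisomerase acts, and that B-DNA has ~10.5 bp per turn; it only converts the number of base pairs transcribed into turns of supercoiling ahead of and behind RNA Pol.

```python
HELICAL_REPEAT_BP = 10.5  # approximate bp per turn of B-DNA


def twin_domain_supercoils(bp_transcribed: float) -> tuple[float, float]:
    """Turns of supercoiling ahead of (+) and behind (-) a transcribing RNA Pol,
    assuming no rotation of the polymerase or DNA and no topoisomerase activity."""
    turns = bp_transcribed / HELICAL_REPEAT_BP
    return +turns, -turns


ahead, behind = twin_domain_supercoils(1050)  # hypothetical 1050 bp transcribed
print(f"ahead: {ahead:+.0f} turns (relieved by gyrase)")
print(f"behind: {behind:+.0f} turns (relaxed by topoisomerase I)")
```

Even a modest transcription unit therefore generates on the order of a hundred turns of opposite supercoiling on either side, which is why both topoisomerase activities are needed.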

One possibility is that the dependence of a promoter on supercoiling is determined by its sequence. This would predict that some promoters have sequences that are easier to melt and are therefore less dependent on supercoiling, while others have sequences that are more difficult to melt and have a greater need to be supercoiled. An alternative is that the location of the promoter is important, if different regions of the bacterial chromosome have different degrees of supercoiling.

(D) Functions of promoter regions

The -35 sequence provides the signal for recognition by RNA polymerase, while the -10 sequence allows the promoter-polymerase complex to convert from the ‘closed’ to the ‘open’ form. Thus, the -35 sequence comprises a ‘recognition domain’ and the -10 sequence an ‘unwinding domain’ of the promoter. The consensus sequence of the -10 site consists exclusively of AT base pairs, which assists the initial melting of DNA into single strands: because less energy is needed to disrupt AT base pairs than GC base pairs, a stretch of AT pairs demands the minimum amount of energy for strand separation.
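The AT-versus-GC contrast can be made concrete by counting Watson-Crick hydrogen bonds (two per AT pair, three per GC pair). This is a crude proxy chosen for illustration: base stacking contributes substantially to duplex stability, and real melting calculations use nearest-neighbor energy parameters. The GC-rich hexamer is hypothetical.

```python
# Watson-Crick hydrogen bonds per base pair; a crude proxy for duplex stability.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(seq: str) -> int:
    """Hydrogen bonds holding a duplex together, given the sequence of one strand."""
    return sum(H_BONDS[base] for base in seq.upper())


for label, hexamer in [("-10 consensus", "TATAAT"), ("hypothetical GC-rich", "GCGGCC")]:
    print(f"{label:22s} {hexamer} {hydrogen_bonds(hexamer)} H-bonds")
```

By this count the -10 consensus is held together by a third fewer hydrogen bonds than an all-GC hexamer, in line with its role as the unwinding domain.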

A typical promoter relies on its -35 and -10 sequences to be recognized by RNA Pol, but one or the other of these sequences can be absent from some exceptional promoters. In at least some of these cases RNA Pol alone cannot recognize the promoter, and the reaction also requires ancillary proteins that overcome the deficiency in the intrinsic interaction between RNA Pol and the promoter.

(E) Alternative promoter sequences

There are several alternative promoter sequences that are recognized by different σ subunits. These promoters have sequences that differ from the consensus sequence of a conventional or standard promoter. Some examples are listed in Fig. 7.

Heat shock: 5’……CCCTTGAA……13-15 bp……CCCGATNT……3’
Nitrogen starvation: 5’……CTGGNA……6 bp……TTGCA……3’
Flagella: 5’……CTAAA……15 bp……GCCGATAA……3’

where N can be any nucleotide

Fig. 7: Alternative promoter sequences
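The patterns in Fig. 7 can be written as regular expressions, with N as a wildcard and the spacer as a bounded run of any bases. The sketch below uses Python's standard `re` module and, as a simplifying assumption, accepts only exact matches to each box, whereas real alternative promoters, like σ70 promoters, deviate from their consensus. The dictionary and function names are hypothetical.

```python
import re


def to_regex(box: str) -> str:
    """Translate a consensus box into a regex fragment, N matching any nucleotide."""
    return box.upper().replace("N", "[ACGT]")


# (upstream box, min spacer, max spacer, downstream box), taken from Fig. 7.
ALTERNATIVE_PROMOTERS = {
    "heat shock": ("CCCTTGAA", 13, 15, "CCCGATNT"),
    "nitrogen starvation": ("CTGGNA", 6, 6, "TTGCA"),
    "flagella": ("CTAAA", 15, 15, "GCCGATAA"),
}


def classify(dna: str) -> list[tuple[str, int]]:
    """Return (promoter class, 0-based start) for every exact match to each pattern."""
    dna = dna.upper()
    hits = []
    for name, (up_box, lo, hi, down_box) in ALTERNATIVE_PROMOTERS.items():
        # Zero-width lookahead so that overlapping matches are all reported.
        pattern = re.compile(f"(?={to_regex(up_box)}[ACGT]{{{lo},{hi}}}{to_regex(down_box)})")
        hits.extend((name, m.start()) for m in pattern.finditer(dna))
    return hits


print(classify("AAAA" + "CTGGCA" + "G" * 6 + "TTGCA" + "AAAA"))
print(classify("CCCTTGAA" + "A" * 14 + "CCCGATGT"))
```

Each pattern corresponds to promoters recognized by a different σ subunit, so a match suggests which σ might use the site, subject to the exact-match limitation noted above.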

(3) Overall process of prokaryotic transcription

The process of transcription can be divided into three steps, namely initiation, elongation and termination (Fig. 8).

Stages shown: Initiation (promoter recognition; promoter binding, closed complex; promoter melting, open complex; initial transcription), Elongation (elongation after abortive initiations and promoter clearance), Termination (release of RNA and RNA Pol).

Fig. 8: The overall process of transcription in prokaryotes

(A) Initiation

Transcription begins with the insertion of the first ribonucleotide (usually a purine). The end of initiation is signified by promoter clearance, in which RNA Pol moves ahead along the DNA template from the promoter without dissociating, freeing the promoter for further initiation events. Promoter clearance occurs only if the open promoter complex is stable, and it usually follows a number of abortive initiations in which short transcripts are generated. This is a general property of RNA Pol and appears to be required for de novo strand synthesis. Initiation is usually the rate-limiting step in transcription and is the primary level of gene regulation in both prokaryotes and eukaryotes. The pathway of transcription initiation consists of two major parts, binding and initiation, each with multiple steps, which are summarized below. RNA Pol recognizes the promoter region, locally unwinds the DNA at the site it binds and goes through some abortive initiations. During this phase RNA Pol remains stationary at the promoter, its conformation remains essentially the same, and the first ~8-9 nucleotides are added. The initiation phase ends when the enzyme succeeds in extending the RNA chain and clears the promoter. Regulatory proteins that bind to specific sequences near promoter sites and interact with RNA polymerase also markedly influence the frequency of transcription of many genes.

The initiating reaction is simply the coupling of two NTPs in the reaction given below:

pppA + pppN → pppApN + PPi

Bacterial RNAs have 5’-triphosphate groups, as was demonstrated by the incorporation of radioactive label into RNA synthesized with [γ-³²P]ATP. In such a case only the 5’ terminus of the RNA can retain the label, because the internal phosphodiester groups of RNA are derived from the α-phosphate groups of NTPs.
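The logic of the [γ-³²P]ATP experiment is bookkeeping of which phosphates stay in the chain. The toy model below assumes only what the text states: the first nucleotide keeps its full 5’-triphosphate, while each later nucleotide contributes only its α-phosphate and releases its β and γ phosphates as PPi. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NTP:
    base: str
    gamma_labeled: bool = False  # True if supplied as [gamma-32P]NTP


def polymerize(ntps: list[NTP]) -> tuple[int, int]:
    """Join NTPs 5'->3' and return (gamma labels retained in RNA, gamma labels released in PPi).

    The first NTP keeps its alpha, beta and gamma phosphates (the 5'-triphosphate end);
    every later NTP adds only its alpha phosphate to the backbone.
    """
    retained = int(ntps[0].gamma_labeled)
    released = sum(n.gamma_labeled for n in ntps[1:])
    return retained, released


# Hypothetical transcript in which every ATP carries a gamma label.
seq = "AUGACAAUA"
ntps = [NTP(b, gamma_labeled=(b == "A")) for b in seq]
print(polymerize(ntps))
```

A chain starting with A keeps exactly one label however many internal A residues it contains, while a chain starting with G keeps none, so γ-label incorporation reports only the 5’ end.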

Initiation in transcription is further divided into discrete phases of DNA binding and initiation of RNA synthesis, which are described below:

(i) Template and promoter recognition and formation of the closed binary complex: The holoenzyme-promoter reaction starts by forming a closed binary complex; "closed" means that the DNA remains duplex. Initially, the RNA Pol holoenzyme, whose σ subunit is responsible for promoter selection, binds loosely and reversibly to duplex DNA and searches for the promoter sequence. The resulting complex is called the ‘closed binary complex’, ‘closed promoter complex’ or ‘closed promoter-polymerase complex’. In E. coli, RNA Pol binding covers a region stretching from ~50 bp before the transcription start site to ~20 bp beyond it. Because formation of the closed complex is readily reversible, it is described by an equilibrium constant (K_B), whose value varies widely between promoters. RNA Pol can as easily dissociate from the promoter as make the transition to the open complex.

(ii) Formation of the open binary complex, or isomerization: The transition from the ‘closed promoter complex’ (in which the DNA is double helical) to the ‘open promoter complex’ (in which a DNA segment is unwound) is an essential event in transcription. In the bacterial enzyme bearing σ70, this transition, often termed isomerization, does not require energy derived from ATP hydrolysis; instead it results from a spontaneous conformational change in the DNA-enzyme complex to a more energetically favorable form. Isomerization is essentially irreversible and, once complete, typically guarantees that transcription will subsequently initiate (though regulation can still be imposed after this point in some cases).

Although RNA Pol can search for promoter sites while bound to double-helical DNA, a segment of the helix must be unwound before synthesis can begin, so that the nucleotides on one strand become accessible for base pairing with incoming ribonucleotides. When RNA Pol holoenzyme recognizes the correct sequence, the DNA at the promoter site, without being broken, is locally unwound (DNA melting). The series of events leading to formation of an open complex is called ‘tight binding’. As a result of tight binding, the interaction between RNA Pol holoenzyme and DNA becomes irreversible, and the closed complex is converted into an open complex by ‘melting’ of a short region of DNA within the sequence bound by the enzyme. This is the ‘open binary complex’, ‘open promoter complex’ or ‘open promoter-polymerase complex’. Here the DNA strands separate locally over ~17 bp (from within the -10 region to position +2 or +3), which corresponds to 1.6 turns of the B-DNA helix. This opening makes the template strand available for base pairing with ribonucleotides. Unwinding increases the negative supercoiling of DNA. Negative supercoiling of circular DNA favors transcription of genes because it facilitates unwinding.

For strong promoters, conversion into the open binary complex is irreversible, so this reaction is described by a rate constant (k2). This reaction is fast, and the σ factor is involved in the DNA melting reaction.

(iii) Formation of the (unstable) ternary complex and abortive initiations: The next step is to incorporate the first two nucleotides and catalyze formation of a phosphodiester bond between them. This generates a ternary complex that contains RNA as well as DNA and enzyme. The ribonucleotides are aligned on the template strand and joined together. The initiating ribonucleotide is usually a purine (A or G). RNA Pol makes specific interactions with the initiating purine, holding it rigidly in the correct orientation to allow chemical attack on the incoming NTP. The requirement for such specific interactions between the enzyme and the initiating NTP probably explains why most transcripts start with the same nucleotide: the interactions are specific for that nucleotide (usually A), so only chains beginning with A are held in a manner suitable for efficient initiation. These interactions are believed to be provided by various parts of the RNA Pol holoenzyme, including part of σ. Consistent with this, with an RNA Pol containing a σ70 derivative lacking this part of σ, initiation requires much higher than normal concentrations of the initiating NTP. The region containing RNA Pol, DNA and nascent RNA is called a transcription bubble (because it contains a locally melted ‘bubble’ of DNA) or transcription complex. Formation of the ternary complex is described by the rate constant k_i, which is even larger than k2.
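The closed-complex equilibrium constant (K_B) and the isomerization rate constant (k2) defined above can be combined in the two-step scheme R + P ⇌ RPc → RPo (R, RNA Pol; P, promoter; RPc and RPo, closed and open complexes). The sketch assumes that the closed complex is in rapid equilibrium and that isomerization is irreversible, which gives an apparent rate constant k_obs = k2·K_B[R]/(1 + K_B[R]); the parameter values are hypothetical and chosen only to show how the rate saturates as the RNA Pol concentration rises.

```python
import math


def k_obs(K_B: float, k2: float, rnap_molar: float) -> float:
    """Apparent first-order rate constant (s^-1) for open-complex formation,
    assuming rapid closed-complex equilibrium followed by irreversible isomerization."""
    fraction_closed = K_B * rnap_molar / (1.0 + K_B * rnap_molar)
    return k2 * fraction_closed


# Hypothetical parameters, for illustration only.
K_B = 1e8   # M^-1, closed-complex equilibrium constant
k2 = 0.05   # s^-1, isomerization rate constant

for rnap in (1e-9, 1e-8, 1e-7, 1e-6):
    k = k_obs(K_B, k2, rnap)
    print(f"[RNA Pol] = {rnap:.0e} M: k_obs = {k:.4f} s^-1, half-time = {math.log(2) / k:.0f} s")
```

At low RNA Pol concentration the rate depends on both K_B and k2, but once the promoter is saturated (K_B[R] much greater than 1) it is limited by k2 alone, which is why promoters can differ in either binding or isomerization.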

Further nucleotides can be added without any movement of the enzyme, generating an RNA chain of up to 9 bases. Thus, RNA Pol forms an unstable ternary complex comprising a DNA-RNA hybrid helix (the DNA template and short RNA) and the RNA Pol holoenzyme. This RNA-DNA helix is ~8 bp long, which corresponds to about one complete turn of the double helix. The RNA-DNA hybrid also rotates each time a nucleotide is added, so that the 3’-OH end of the RNA stays at the catalytic site of RNA Pol. Incorporation of the first 9-10 ribonucleotides is a rather inefficient process: after each base is added, there is a certain probability that the enzyme will release the chain. At this stage the enzyme often releases short transcripts (each of fewer than ~10 ribonucleotides) and then restarts RNA synthesis. Abortive initiation (i.e. synthesis of short RNAs) probably involves synthesizing an RNA chain that fills the active site. If the RNA is released, initiation is aborted and must start again. A cycle of abortive initiation usually occurs, generating a series of very short oligonucleotides.
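The statement that the enzyme may release the chain after each addition can be expressed as a simple probability model. The sketch assumes a constant, independent release probability at each length from 2 nt up to one short of the escape length (10 nt here); the value 0.15 is hypothetical. The closed-form escape probability is accompanied by a small simulation that lists the abortive products released before one successful promoter escape.

```python
import random


def escape_probability(p_release: float, escape_length: int = 10) -> float:
    """Chance of reaching escape_length nt without release, assuming release can
    occur independently at each length from 2 to escape_length - 1."""
    return (1.0 - p_release) ** (escape_length - 2)


def abortive_products(p_release: float, escape_length: int = 10, seed: int = 0) -> list[int]:
    """Simulate one round of initiation; return the lengths of the abortive
    transcripts released before the polymerase escapes the promoter."""
    if not 0.0 <= p_release < 1.0:
        raise ValueError("p_release must be in [0, 1)")
    rng = random.Random(seed)
    aborted: list[int] = []
    while True:
        for length in range(2, escape_length):
            if rng.random() < p_release:
                aborted.append(length)  # chain released: start again from scratch
                break
        else:
            return aborted  # reached escape_length: promoter clearance


p = 0.15  # hypothetical release probability per step
p_escape = escape_probability(p)
print(f"escape probability per attempt: {p_escape:.2f}")
print(f"expected abortive transcripts per productive one: {(1 - p_escape) / p_escape:.1f}")
print("one simulated round:", abortive_products(p, seed=1))
```

Because the survival probability compounds over every step, even a modest per-step release chance yields several abortive oligonucleotides for each productive chain.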

Initiation is accomplished when the enzyme manages to move along the template, bringing the next region of the DNA into the active site. The occurrence of a cycle of abortive initiations before the enzyme moves to the next phase is a general property of RNA Pol and appears to be required for de novo strand synthesis.

(iv) Formation of the (stable) ternary complex and promoter clearance: Once an RNA Pol holoenzyme succeeds in synthesizing a nascent RNA chain of ~9-10 bases, i.e. when initiation succeeds, σ is no longer necessary. The enzyme makes the transition to the elongation ternary complex of core polymerase, DNA and nascent RNA. This involves a conformational change in the polymerase that helps it grip the template more firmly, converting the ternary complex to the elongation form. This conformational change is followed by movement of RNA Pol away from the promoter without dissociating, thereby freeing the promoter (promoter clearance) for further initiation events. Thus, promoter clearance occurs only if the open complex is stable (a stable ternary complex) and usually follows a number of abortive initiations. This marks the end of the initiation phase and the transition to the elongation phase, in which the RNA chain is extended beyond 10 bases. The efficiency of promoter clearance is modulated by the nature of the first fifty or so bases of the transcribed region. The minimum promoter clearance time (the time taken by RNA Pol to leave the promoter so that another RNA Pol can initiate) is 1-2 s, which limits the maximum frequency of initiation to <1 event per second.

(B) Elongation

When the first ~9 nucleotides have been added, the transcribed template strand is ‘scrunched’ in the active site, which can hold a transcript of 6-9 nucleotides. The transcription bubble then moves along the DNA and the RNA chain is extended in the 5’ → 3’ direction (Fig. 9). As the RNA Pol holoenzyme clears the initiation site and enters the elongation phase, the σ subunit may either dissociate or remain associated with the core enzyme. It was originally concluded that σ factor is released after initiation, but this may not be strictly true: direct measurements of elongating RNA Pol complexes show that ~70% of them retain σ factor. Because about a third of elongating polymerases lack σ, the original conclusion that σ is not necessary for elongation is certainly correct. In those cases where σ remains associated with the core enzyme, the nature of the association has almost certainly changed. The core enzyme without σ binds more strongly to the DNA template, and from this point onwards it undertakes RNA chain elongation beyond 10 bases. The core enzyme moves along the template strand, opening (unwinding) the DNA helix ahead of the site of polymerization (the front or leading edge) so as to expose a new single-stranded segment of the template. Meanwhile, ribonucleotides are added to the 3’ end of the growing RNA chain. Elongation involves movement of the transcription bubble (170 Å per second, corresponding to an elongation rate of ~50 nucleotides per second) through a disruption of DNA structure in which the template strand of the transiently unwound region is paired with the nascent RNA at the growing point. As in the initiation phase, about 17 bp of DNA are unwound at a time throughout elongation. The RNA-DNA hybrid and the unwound region of DNA stay rather constant in size as RNA Pol moves along the template, indicating that the unwound DNA rewinds behind the polymerase (the rear or trailing edge) at the same rate as it is unwound in front. The RNA-DNA hybrid must also rotate each time a nucleotide is added so that the 3’-OH end of the RNA stays at the catalytic site. When the RNA chain extends to 15-20 bases, the enzyme makes a further transition to form the complex that undertakes elongation, which now covers 30-40 bp (depending on the stage of the elongation cycle).
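The two figures quoted for bubble movement are mutually consistent, as a quick calculation shows. The sketch assumes the standard 3.4 Å rise per base pair of B-DNA and a constant rate of 50 nucleotides per second; the 3-kb gene length is hypothetical, and pausing, which is important at terminators, is ignored.

```python
RISE_PER_BP_ANGSTROM = 3.4  # axial rise per base pair of B-DNA


def nt_per_second(bubble_speed_angstrom_per_s: float) -> float:
    """Convert the linear speed of the transcription bubble into nucleotides per second."""
    return bubble_speed_angstrom_per_s / RISE_PER_BP_ANGSTROM


def transcription_time_s(gene_length_nt: int, rate_nt_per_s: float = 50.0) -> float:
    """Time to transcribe a gene at a constant elongation rate, ignoring pauses."""
    return gene_length_nt / rate_nt_per_s


print(f"{nt_per_second(170):.0f} nt/s")  # bubble speed quoted in the text
print(f"{transcription_time_s(3000):.0f} s for a hypothetical 3-kb gene")
```

So 170 Å per second is the same statement as ~50 nucleotides per second, and a few-kilobase gene takes on the order of a minute to transcribe.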

Fig. 9 labels: double-helical DNA; unwound DNA (17 bp opened); RNA polymerase, with unwinding at the front and rewinding at the rear; nascent RNA (5’ppp end); RNA-DNA hybrid; coding strand; template strand; 3’ elongation site; direction of movement of RNA polymerase.

Fig. 9: Transcription bubble

(C) Termination

Termination involves the following steps:
(a) cessation of phosphodiester bond formation;
(b) dissociation of the RNA-DNA hybrid;
(c) rewinding of the melted region of DNA;
(d) release of RNA Pol from the DNA.

Sequences called ‘terminators’ trigger the elongating polymerase to dissociate from the DNA and release the RNA chain it has made. E. coli has at least two classes of termination signals: one class relies on a protein factor called ρ (rho), and the other is ρ-independent. Both ρ-dependent and ρ-independent terminators respond to a signal that lies within the newly synthesized RNA rather than in the template DNA. In both types of termination, pausing by RNA Pol is important because it allows time for the actual termination event to occur.

(i) ρ-independent (intrinsic) termination: Many terminators require a hairpin to form in the secondary structure of the RNA being transcribed. This indicates that termination depends on the RNA product and is not determined simply by scrutiny of the DNA sequence during transcription. ρ-independent (intrinsic) terminators have two structural features:

A hairpin in the secondary structure

The first feature is a region that produces an RNA transcript with self-complementary sequences, permitting the formation of a hairpin structure centered 15-20 nucleotides before the projected end of the RNA strand. Formation of the hairpin in the RNA disrupts several AU base pairs in the RNA-DNA hybrid segment. It also pauses RNA Pol immediately after the polymerase has synthesized the stretch of RNA that folds into the hairpin, and it disrupts important interactions between the RNA and RNA Pol, thereby facilitating dissociation of the transcript. The hairpin usually contains a GC-rich region near the base of the stem. The typical distance between the hairpin and the U-rich region is 7-9 bases. There are ~1100 sequences in the E. coli genome that fit these criteria, suggesting that about half of the genes have an intrinsic terminator.

A region that is rich in U residues at the very end of the unit

The hairpin works as an efficient terminator only when it is followed by a stretch of four or more AU base pairs (U residues in the RNA paired with A residues in the template). Under these circumstances, at the time the hairpin forms, the growing RNA chain is held on the template at the active site only by AU base pairs. Because rU·dA base pairs are the weakest of all base pairs (weaker even than dT·dA pairs), they are easily disrupted by the effects of the stem-loop on the transcribing polymerase, and so the RNA readily dissociates (Fig. 10).
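The two features of an intrinsic terminator suggest a naive search: an exact inverted repeat with a GC-rich stem, followed closely by a run of U residues. The sketch below is a deliberately simplified, hypothetical scanner; the stem length, loop range, GC threshold, minimum U-run and allowed gap are all illustrative assumptions (how the hairpin-to-U-run distance is measured varies between descriptions, so `max_gap` is adjustable), and practical terminator-prediction tools instead score hairpin free energy and tolerate G·U pairs and mismatches.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def find_intrinsic_terminators(rna: str, stem: int = 6, loops=range(3, 9),
                               min_gc: int = 4, min_u: int = 4, max_gap: int = 2):
    """Return (stem_start, loop_length, u_run_start) for each candidate site.

    A candidate needs a perfectly paired stem of `stem` bp with at least `min_gc`
    G/C in its 5' arm, a loop length taken from `loops`, and at least `min_u`
    consecutive U residues starting no more than `max_gap` nt after the 3' arm.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - stem + 1):
        arm5 = rna[i:i + stem]
        if sum(b in "GC" for b in arm5) < min_gc:
            continue
        target = revcomp(arm5)
        for loop in loops:
            j = i + stem + loop
            if rna[j:j + stem] != target:
                continue
            end = j + stem
            for gap in range(max_gap + 1):
                if rna[end + gap:end + gap + min_u] == "U" * min_u:
                    hits.append((i, loop, end + gap))
                    break
    return hits


# Synthetic example: GC-rich stem, GAAA loop, then a U tract.
stem5 = "GCCGCC"
rna = "AAAA" + stem5 + "GAAA" + revcomp(stem5) + "UUUUUUU" + "AAAA"
print(find_intrinsic_terminators(rna))
```

The scanner may report the same hairpin several times with slightly shifted stems (extra A·U pairs at the base can extend the stem), so in practice overlapping candidates would be merged; replacing the U tract with A residues removes every hit, reflecting the requirement for the U-rich region.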

Fig. 10 labels: GC-rich region of the mRNA forming a hairpin; UUUUUUU tract; double-stranded DNA (template and coding strands); release of the mRNA (rho-independent termination).

Fig. 10: Rho (ρ) independent (intrinsic) termination

(ii) ρ-dependent termination: As discussed above, RNA Pol needs no help to terminate transcription at a hairpin followed by several U residues. At other sites, however, termination requires the participation of an additional factor. This discovery was prompted by the observation that some RNA molecules synthesized in vitro by RNA Pol acting alone are longer than those made in vivo. The missing factor, a protein that caused correct termination, was isolated and named rho (ρ), also called the rho transcription termination factor. Further information about the action of rho was obtained by adding it to an incubation mixture at various times after the initiation of RNA synthesis. RNAs with sedimentation coefficients of 10S, 13S and 17S were obtained when rho was added at initiation, a few seconds after initiation and 2 minutes after initiation, respectively. If no rho was added, transcription yielded a 23S RNA product. Evidently the template contains at least three termination sites that respond to rho (yielding 10S, 13S and 17S RNA) and one termination site that does not (yielding 23S RNA). Thus, specific termination at the site producing 23S RNA can occur in the absence of rho, whereas ρ detects additional termination signals that are not recognized by RNA Pol alone (Fig. 11).

Fig. 11 labels: DNA template with the initiation site, ρ (rho) sites (arrows) and the site of termination in the absence of ρ; RNA transcripts obtained with rho present at the start of synthesis (10S species), rho added 30 sec later (13S species), rho added 2 min later (17S species) and no rho (23S species).

Fig. 11: Effect of Rho protein on the size of the transcript

ρ-dependent terminators lack the run of A residues in the template strand but usually include a CA-rich sequence called a ‘rut’ (rho utilization) element. Optimally these sites consist of stretches of about 40 nucleotides that do not fold into a secondary structure, i.e. they remain largely single stranded; they are also C rich. A second level of specificity is that rho fails to bind a transcript that is being translated, i.e. one bound by ribosomes. In bacteria transcription and translation are tightly coupled: translation initiates on growing RNA transcripts as soon as they begin to emerge from the polymerase, while they are still being synthesized. Thus, rho typically terminates only those transcripts still being transcribed beyond the end of a gene or operon.

ρ is a homohexameric termination protein of ~275 kDa (each subunit has 419 residues). The X-ray structure of ρ reveals that the six monomers form an open ring that is not flat: the sixth subunit is displaced along the ring axis relative to the first. The first and sixth subunits are separated by a gap of 12 Å, and the helical pitch (rise along the helix axis) between them is 45 Å. The RNA transcript on which ρ acts is believed to bind along the bottom of each subunit and then thread through the middle of the ring. Each ρ subunit consists of two domains that can be separated by proteolysis: the N-terminal (RNA-binding) domain binds single-stranded polynucleotides, and the C-terminal (ATP-hydrolysis) domain, which is homologous to the α and β subunits of the F1-ATPase, binds an NTP. ρ hydrolyzes ATP in the presence of single-stranded RNA, probably through recognition of a specific structural feature rather than a consensus sequence. The RNA, which is only partially visible in the structure, binds to the so-called primary RNA-binding sites on the N-terminal domains that face the interior of the helix and to the so-called secondary RNA-binding sites on the C-terminal domains that have been implicated in mRNA translocation and unwinding. The ρ protein has an ATP-dependent RNA-DNA helicase activity. It binds to nascent RNA at specific binding sites or recognition sequences (Fig. 12) and then uses its RNA-dependent ATPase activity to provide the energy to translocate along the RNA in the 5’ → 3’ direction to a sequence that is rich in C and poor in G residues preceding the actual termination site. In such sequences C is by far the most common base (41%) and G the least common (14%). As a general rule, the efficiency of ρ-dependent terminators increases with the length of the C-rich, G-poor region.
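Two numbers in this paragraph can be checked or applied with a little code. The sketch below first confirms that six 419-residue subunits are consistent with a ~275 kDa hexamer, assuming an average residue mass of ~110 Da, and then flags C-rich, G-poor stretches of RNA; the 30% C and 20% G thresholds and the example sequences are hypothetical, and real rut sites are defined by binding and termination assays, not by composition alone.

```python
AVG_RESIDUE_MASS_DA = 110  # common rule-of-thumb average mass of an amino acid residue
hexamer_kda = 6 * 419 * AVG_RESIDUE_MASS_DA / 1000
print(f"estimated rho hexamer mass: {hexamer_kda:.0f} kDa")


def base_composition(rna: str) -> dict[str, float]:
    """Percentage of each base in an RNA sequence."""
    rna = rna.upper()
    return {base: 100 * rna.count(base) / len(rna) for base in "ACGU"}


def looks_c_rich_g_poor(rna: str, min_c: float = 30.0, max_g: float = 20.0) -> bool:
    """Hypothetical composition filter for rut-like stretches."""
    comp = base_composition(rna)
    return comp["C"] >= min_c and comp["G"] <= max_g


for seq in ("CUCACCAUCCAACUCA", "GGGAGGUGGA"):  # invented example sequences
    print(seq, base_composition(seq), looks_c_rich_g_poor(seq))
```

The mass estimate lands close to the quoted ~275 kDa, and the composition filter captures the C-rich, G-poor character described above without claiming to identify genuine rho sites.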

Fig. 12: Mechanism of Rho-dependent termination of transcription (diagram labels: RNA polymerase; nascent RNA with its 5’ppp end; RNA-DNA hybrid; coding strand; template strand; Rho protein; ATP + H₂O → ADP + Pi)
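The composition rule above can be illustrated with a short Python sketch. It scans an RNA for C-rich, G-poor windows and reports the longest such stretch. The window size (20 nt) and the cut-offs (at least 35% C, at most 20% G) are hypothetical values chosen only for illustration. They are loosely guided by the 41% C and 14% G figures quoted above. As the text notes, ρ probably recognizes a structural feature rather than a simple sequence, so this is a composition heuristic and not a terminator predictor.

```python
def base_composition(rna: str) -> dict[str, float]:
    """Fraction of A, C, G and U in an RNA sequence."""
    rna = rna.upper()
    return {base: rna.count(base) / len(rna) for base in "ACGU"}


def c_rich_g_poor_windows(rna: str, window: int = 20,
                          min_c: float = 0.35, max_g: float = 0.20) -> list[int]:
    """0-based start positions of windows meeting the (hypothetical) C/G cut-offs."""
    rna = rna.upper()
    hits = []
    for start in range(len(rna) - window + 1):
        comp = base_composition(rna[start:start + window])
        if comp["C"] >= min_c and comp["G"] <= max_g:
            hits.append(start)
    return hits


def longest_c_rich_stretch(rna: str, window: int = 20,
                           min_c: float = 0.35, max_g: float = 0.20) -> int:
    """Length in nt of the longest run of overlapping qualifying windows."""
    best = 0
    run_start = prev = None
    for start in c_rich_g_poor_windows(rna, window, min_c, max_g):
        if prev is None or start != prev + 1:
            run_start = start  # a new run of consecutive windows begins here
        prev = start
        best = max(best, start - run_start + window)
    return best


print(base_composition("CCCCAAGU"))
print(longest_c_rich_stretch("CCAU" * 10))  # every 20-nt window is 50% C, 0% G
print(longest_c_rich_stretch("AAAU" * 10))  # no C at all
```

Under these assumptions the second call returns 40 (the whole sequence) and the third returns 0. Lengthening the C-rich stretch raises the returned value, which mirrors the length trend described above.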


Proteins other than ρ also mediate and modulate termination. For example, the NusA protein enables E. coli RNA Pol to recognize a characteristic class of termination sites. In E. coli, specialized termination signals called attenuators are regulated to meet the nutritional needs of the cell.

Transcription in eukaryotes

Robert Roeder and William Rutter discovered that the eukaryotic transcription machinery is much more complex than that of prokaryotes, with a large number of polypeptides associated with it. The basic mechanism of eukaryotic transcription is, however, similar to that in prokaryotes.

Unlike the bacterial genome, the eukaryotic genome is packaged into chromatin (nucleosomal structure) and is therefore largely inaccessible to the transcription machinery. Before a specific gene is transcribed, its chromatin structure is modified to make it more accessible to the transcription apparatus. The two best-understood mechanisms of chromatin modification are:

(i) Specific modifying complexes: Many eukaryotic gene activator proteins modify chromatin structures by recruiting histone acetyltransferases.

(ii) Nucleosome remodeling by chromatin-remodeling complexes.

Acetylation and remodeling prepare the gene promoter for the assembly of RNA Pol, other accessory proteins and gene-specific transcription factors, which together initiate transcription. Only some regions of the genome are transcribed, and the regions chosen vary between cells and within the same cell at different times. Anywhere from one to several thousand transcripts can be made from a given region in a single cell.

Eukaryotic transcription apparatus

The eukaryotic transcription machinery involves three RNA polymerases, a number of general transcription factors, several elongation factors and a large repertoire of gene-specific transcription factors and activators. Furthermore, the entire transcription machinery is coupled to an enormously complex signal transduction cascade that integrates external stimuli with the transcription machinery.

(1) RNA polymerase or DNA dependent RNA polymerase (RNA Pol)

Eukaryotic cells have three kinds of nuclear RNA polymerases: RNA Pol I, II and III. These are distinct complexes, but they share certain subunits. Each RNA Pol is large and has 12 or more different subunits; in S. cerevisiae, RNA Pol I, II and III have 14, 12 and 17 subunits, respectively. Some of these subunits are exclusive to one RNA Pol, whereas others are identical or structurally related. Each polymerase has a specific function and is recruited to a specific type of promoter sequence. The three polymerases also differ in their template specificity and in their location in the nucleus. All eukaryotic RNA Pols are homologous to one another and to prokaryotic RNA Pol. However, RNA Pol II alone contains a carboxyl-terminal domain called the ‘tail’. Another major distinction among the polymerases lies in their responses to the fungal toxin α-amanitin, a cyclic octapeptide that contains several modified amino acids. The activities of the different RNA Pols can be distinguished by their different sensitivities to the toxin. Properties of the different eukaryotic RNA polymerases are summarized in Table 4. In addition to these three nuclear RNA Pols, eukaryotic cells contain separate polymerases in mitochondria and chloroplasts. These small (~100 kD) single-subunit RNA Pols resemble those encoded by certain bacteriophages. They are much simpler than the nuclear RNA Pols, although they catalyze the same reaction.

Table 4: Properties of different eukaryotic RNA polymerases

S. No. | Property | RNA Pol I | RNA Pol II | RNA Pol III
1 | Location | Nucleolus | Nucleoplasm | Nucleoplasm
2 | Function (cellular transcripts) | Synthesis of precursors of most rRNAs (5.8S, 18S and 28S) | Synthesis of precursors of mRNA and some small nuclear RNAs (snRNAs) | Synthesis of precursors of 5S rRNA, tRNA and other small nuclear and cytosolic RNAs
3 | Sensitivity to the fungal toxin α-amanitin (cyclic octapeptide) | Insensitive | Very sensitive (strongly inhibited); the toxin binds tightly and inhibits the elongation phase | Moderately sensitive (inhibited by high concentrations)
4 | Number of subunits (S. cerevisiae) | 14 | 12 | 17
5 | Share of cellular polymerase activity | 50-70% | 20-40% | ~10%
6 | Class of genes transcribed | Class I | Class II | Class III
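Row 3 of Table 4 is used to tell the three activities apart, and that logic can be sketched as a simple decision rule. The sketch assumes a hypothetical assay in which each polymerase activity is tested at one low and one high α-amanitin concentration. The actual concentrations are not specified here, and the function name is illustrative only.

```python
from enum import Enum


class Sensitivity(Enum):
    INSENSITIVE = "not inhibited"
    VERY_SENSITIVE = "inhibited at low alpha-amanitin"
    MODERATELY_SENSITIVE = "inhibited only at high alpha-amanitin"


# Qualitative alpha-amanitin sensitivities from Table 4.
TABLE_4_SENSITIVITY = {
    "RNA Pol I": Sensitivity.INSENSITIVE,
    "RNA Pol II": Sensitivity.VERY_SENSITIVE,
    "RNA Pol III": Sensitivity.MODERATELY_SENSITIVE,
}


def classify_activity(inhibited_low: bool, inhibited_high: bool) -> str:
    """Assign a polymerase activity from a hypothetical two-concentration test."""
    if inhibited_low and not inhibited_high:
        # Inhibition should only increase with toxin concentration.
        raise ValueError("inconsistent result: inhibited at low but not high dose")
    if inhibited_low:
        observed = Sensitivity.VERY_SENSITIVE
    elif inhibited_high:
        observed = Sensitivity.MODERATELY_SENSITIVE
    else:
        observed = Sensitivity.INSENSITIVE
    for polymerase, sensitivity in TABLE_4_SENSITIVITY.items():
        if sensitivity is observed:
            return polymerase
    raise LookupError(observed)


print(classify_activity(inhibited_low=False, inhibited_high=True))  # RNA Pol III
```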

(A) RNA Pol I (RNA Pol A)

RNA Pol I (Pol I or Pol A) is located in the nucleolus. It is responsible for the continuous synthesis of rRNA during interphase. Continuous transcription of the multiple rRNA gene copies is essential for producing enough processed rRNA to be packaged into ribosomes. Human cells contain 5 clusters of around 40 copies of the rRNA gene, situated on different chromosomes. Each rRNA cluster is known as a nucleolar organizer region, since the nucleolus contains large loops of DNA corresponding to the gene clusters. After a cell emerges from mitosis, rRNA synthesis restarts and tiny nucleoli appear at the chromosomal locations of the rRNA genes. Each rRNA gene produces a 45S transcript of ~13000 nucleotides, called the pretranscript, preribosomal RNA or pre-rRNA. During active rRNA synthesis, the pre-rRNA transcripts are packed along the rRNA genes and can be visualized in the electron microscope as ‘Christmas tree structures’. In these structures the RNA transcripts are densely packed along the DNA and stick out perpendicularly from it. The 45S pretranscript is cleaved to give one copy each of the 28S, 18S and 5.8S rRNAs, which are about 5000, 2000 and 160 nucleotides long, respectively.

(B) RNA Pol II (RNA Pol B)

Among the three RNA polymerases, RNA Pol II (Pol II or Pol B) is functionally the most versatile. It transcribes the mRNAs and some specialized RNAs, such as most of the small nuclear RNAs (snRNAs). RNA Pol II is central to eukaryotic gene expression and has been studied extensively. It is located in the nucleoplasm and can recognize thousands of promoters that vary greatly in sequence. Although RNA Pol II is strikingly more complex than its bacterial counterpart, this complexity masks a remarkable conservation of structure, function and mechanism.

(i) Structure: RNA Pol II is somewhat larger than bacterial RNA Pol (e.g., that of Thermus aquaticus) and has several subunits with no bacterial counterpart. Pol II is a huge enzyme with a molecular mass of up to 600 kD. It contains two nonidentical ‘large’ (>120 kD) subunits, which make up ~65% of its mass and are homologs of the prokaryotic RNA Pol β and β’ subunits. It also contains up to 12 additional ‘small’ (<50 kD) subunits. Two of these are homologs of the prokaryotic RNA Pol α subunit, and one is a homolog of the prokaryotic ω subunit. Of the small subunits, five are identical in all three eukaryotic RNA Pols, and two others (the α homologs) are identical in RNA Pol I and III. Thus, 10 of the 12 RNA Pol II subunits are either identical or closely similar to subunits of RNA Pol I and III. Moreover, the sequences of these subunits are highly conserved (~50% identical) across species from yeast to humans, and to a lesser extent between eukaryotes and bacteria. In fact, in all ten cases tested, a human RNA Pol II subunit could replace its counterpart in yeast without loss of cell viability.

Roger Kornberg determined the X-ray crystallographic structure of yeast RNA Pol II. Overall, the yeast enzyme is shaped like a ‘crab claw’, similar to bacterial Taq RNA Pol. Its subunits have positions and core folds similar to those of their homologous subunits in the bacterial enzyme. The two ‘pincers’ of the crab claw are made up predominantly of RPB1 and RPB2. The active site is formed by regions of both subunits and lies at the base of the pincers, within a region called the ‘active center cleft’. A highly conserved helical segment of RPB1, called the ‘bridge’, spans the two pincers across the cleft. This helix is straight in all X-ray structures of RNA Pol II determined so far, but it is bent in that of Taq RNA Pol. A massive (~59 kD) portion of RPB1 and RPB2, named the ‘clamp’, swings down over the DNA to trap it in the cleft. A portion of RPB2 called the ‘wall’ directs the template strand out of the cleft in a ~90° turn. A loop called the ‘rudder’ extends from the clamp. Various ‘channels’ allow DNA, RNA and ribonucleotides into and out of the active center cleft.

The various subunits of RNA Pol II are summarized below (Table 5).

(a) RPB1, with its C-terminal domain (CTD), and RPB2: RPB1 is the largest subunit and is highly homologous to the β’ subunit of bacterial RNA Pol. It contains the active site of RNA Pol II.

RPB1 has an unusual feature: a long carboxyl-terminal domain (CTD) called the ‘tail’. The tail consists of many highly conserved repeats of the heptad amino acid sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser (YSPTSPS). There are 27 repeats in the yeast enzyme (18 exactly matching the consensus), 52 (21 exact) in the mouse enzyme and 53 in the human enzyme. The CTD is separated from the main body of the enzyme by an unstructured linker sequence. These repeats are essential for viability. The CTD can be phosphorylated at Ser and Tyr residues. Five of the 7 residues in these highly hydrophilic repeats bear OH groups. At least 50 of these hydroxyl groups, predominantly those on Ser residues, are subject to reversible phosphorylation by CTD kinases and CTD phosphatases. In vitro studies have shown that RNA Pol II initiates transcription only when the CTD is unphosphorylated. The CTD becomes phosphorylated during elongation, as RNA Pol leaves the promoter. Charge-charge repulsions between nearby phosphate groups probably cause a highly phosphorylated CTD to project as far as 500 Å from the globular portion of RNA Pol II. The phosphorylated CTD provides binding sites for numerous auxiliary factors that have essential roles in the transcription process. It has also been shown to be an important target for differential activation of transcription elongation. No such tail is present in the bacterial enzyme.

RPB2 is structurally similar to the bacterial β subunit.

(b) RPB3 and RPB11: These two subunits show some structural homology to the bacterial α subunits.

(c) RPB4 and RPB7: Genetic studies have demonstrated that some of the Pol II-specific subunits are dispensable. RPB4 and RPB7 are not essential for activity and are present in RNA Pol II in less than stoichiometric amounts. RPB7 has a 102-residue segment that is 30% identical to a portion of E. coli σ70. These two subunits are absent from the 10-subunit core of yeast (Saccharomyces cerevisiae) RNA Pol II.

(d) RPB6: RPB6 is homologous to the ω subunit of bacterial RNA Pol.
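The CTD arithmetic described above can be checked with a small Python sketch. It covers the heptad repeats, the number that match the consensus exactly, and the five hydroxyl-bearing residues per heptad. The sketch assumes the input is already trimmed to start at a heptad boundary. Real CTDs contain degenerate repeats, and the toy sequence used below is hypothetical, not a real CTD.

```python
CONSENSUS = "YSPTSPS"           # Tyr-Ser-Pro-Thr-Ser-Pro-Ser
HYDROXYL_RESIDUES = set("STY")  # Ser, Thr and Tyr side chains carry an OH group


def split_heptads(ctd: str) -> list[str]:
    """Cut a CTD sequence into consecutive 7-residue repeats (trailing residues dropped)."""
    usable = len(ctd) - len(ctd) % 7
    return [ctd[i:i + 7] for i in range(0, usable, 7)]


def count_exact_repeats(ctd: str) -> int:
    """Number of heptads that match the YSPTSPS consensus exactly."""
    return sum(1 for heptad in split_heptads(ctd) if heptad == CONSENSUS)


def hydroxyl_positions(heptad: str = CONSENSUS) -> list[int]:
    """1-based positions within a heptad whose residue bears an OH group."""
    return [i + 1 for i, aa in enumerate(heptad) if aa in HYDROXYL_RESIDUES]


toy_ctd = CONSENSUS * 3 + "YSPTSPK"  # hypothetical: three exact repeats, one variant
print(len(split_heptads(toy_ctd)), count_exact_repeats(toy_ctd))  # 4 3
print(hydroxyl_positions())                                       # [1, 2, 4, 5, 7]
```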

Although Pol II has the smallest number of subunits, it transcribes the largest and most diverse array of promoters. RNA Pol II also uses a number of other proteins that are not part of the Pol II complex as subsidiary proteins, which adds to its functional diversity.

(ii) Nucleotide addition and RNA Pol II translocation: RNA Pol II binds two Mg++ ions at its active site, in the vicinity of 5 conserved acidic residues. This suggests that it catalyzes RNA elongation via a two-metal-ion mechanism for nucleotide addition, similar to that proposed for all types of polymerase. As with Taq RNA Pol, the surface of RNA Pol II is almost entirely negatively charged. The exceptions are the DNA-binding cleft and the region around the active site, which are positively charged.

(C) RNA Pol III (RNA Pol C)

RNA Pol III occurs in the nucleoplasm. It synthesizes the precursors of 5S rRNA, tRNA, U6 snRNA and a variety of other small nuclear and cytosolic RNAs. In S. cerevisiae it has 17 subunits.


RNA Pol III transcribes the 5S rRNA component of the large ribosomal subunit, which is the only rRNA to be transcribed separately. Like the other rRNA genes, which are transcribed by RNA Pol I, the 5S rRNA genes are tandemly arranged in a gene cluster. In humans there is a single cluster of around 2000 genes. Less is known about the signals and ancillary factors involved in termination by eukaryotic polymerases, and each class of polymerase uses a different mechanism.

Genetic studies have demonstrated that in contrast to Pol I and Pol II, all subunits of Pol III are essential. Table 5 summarizes various prokaryotic and eukaryotic RNA polymerase subunits.

Table 5: Comparison of prokaryotic and eukaryotic RNA polymerase subunits

S. No. | Prokaryotic (bacterial) | Prokaryotic (archaeal) | Eukaryotic RNA Pol I | Eukaryotic RNA Pol II | Eukaryotic RNA Pol III
1 | β’ | A’ / A’’ | RPA1 | RPB1 | RPC1
2 | β | B | RPA2 | RPB2 | RPC2
3 | α’ | D | RPC5 | RPB3 | RPC5
4 | α’’ | L | RPC9 | RPB11 | RPC9
5 | ω | K | RPB6 | RPB6 | RPB6
Additional subunits | (none listed) | +6 others | +9 others | +7 others | +12 others

Note: The subunits in each column are listed in order of decreasing molecular weight.
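The two tables can be cross-checked against each other with a few lines of Python. The five subunits listed for each nuclear polymerase in Table 5, plus its additional subunits, should reproduce the S. cerevisiae totals in Table 4. The sketch assumes that the bracketed ‘+N others’ counts belong, in order, to the archaeal, Pol I, Pol II and Pol III columns.

```python
# Subunits named in rows 1-5 of Table 5 for each nuclear RNA polymerase.
LISTED = {
    "RNA Pol I": ["RPA1", "RPA2", "RPC5", "RPC9", "RPB6"],
    "RNA Pol II": ["RPB1", "RPB2", "RPB3", "RPB11", "RPB6"],
    "RNA Pol III": ["RPC1", "RPC2", "RPC5", "RPC9", "RPB6"],
}
# '+N others' counts from the last row of Table 5 (column assignment assumed).
OTHERS = {"RNA Pol I": 9, "RNA Pol II": 7, "RNA Pol III": 12}
# Subunit totals in S. cerevisiae from Table 4.
TABLE_4_TOTALS = {"RNA Pol I": 14, "RNA Pol II": 12, "RNA Pol III": 17}

for pol, subunits in LISTED.items():
    total = len(subunits) + OTHERS[pol]
    print(pol, total, total == TABLE_4_TOTALS[pol])

# Listed subunits shared by all three polymerases, and by Pol I and Pol III only.
shared = set(LISTED["RNA Pol I"]) & set(LISTED["RNA Pol II"]) & set(LISTED["RNA Pol III"])
print(sorted(shared))                                                  # ['RPB6']
print(sorted(set(LISTED["RNA Pol I"]) & set(LISTED["RNA Pol III"])))  # ['RPB6', 'RPC5', 'RPC9']
```

The second intersection recovers the two α-homolog subunits that the text describes as shared by RNA Pol I and III.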

(2) Eukaryotic promoters

Unlike bacterial promoters, which have relatively simple structures, eukaryotic promoters are highly complex. The various promoters are described in the following sections.

(A) RNA Pol I promoter

Since the numerous rRNA genes in a given eukaryotic cell have essentially identical sequences, its RNA Pol I recognizes only one promoter. Yet, in contrast to RNA Pol II and III promoters, RNA Pol I promoters are species-specific. An RNA Pol I recognizes only its own promoter and those of closely related species, and Pol I promoters vary greatly in sequence from one species to another. For example, mammalian RNA Pol I has a bipartite promoter consisting of two transcription control regions:

(i) Core promoter element: This is the minimal set of sequence elements required for accurate transcription initiation. It spans positions -31 to +6, so it includes the transcription start site and overlaps the transcribed region. It contains a short conserved sequence element: a short AT-rich sequence around the start point called the initiator sequence (Inr), which is essential for transcription (Fig. 13).


(ii) Upstream control element (UCE) or upstream promoter element (UPE): It is located between positions -187 and -107, upstream of the start site (Fig. 13). The element is GC-rich and ~50-80 bp long, and UCEs are ~85% identical in sequence. The UCE is bound by specific transcription factors, which then recruit RNA Pol I to the transcription start site. It thus increases the efficiency of transcription 10- to 100-fold compared with that from the core element alone.

Fig. 13: RNA Pol I promoter. The pre-rRNA gene carries a GC-rich upstream control element (UCE, ~50-80 bp long) at about -100 and a core promoter spanning -31 to +6 around the transcription start site (+1).

(B) RNA Pol II Promoter

The promoters recognized by RNA Pol II are considerably longer, more complex and more diverse than those of prokaryotic genes. Like the RNA Pol I promoter, the RNA Pol II promoter consists of a core promoter and regulatory regions, which are described below (Fig. 14).

(i) Core promoter (basal elements): The eukaryotic core promoter is the minimal set of sequence elements required for accurate transcription initiation by the Pol II machinery. A core promoter is ~40 nucleotides long and may extend upstream and/or downstream of the transcription start site. Four elements are found in Pol II core promoters: the TATA box, BRE, Inr and DPE. Typically, a promoter includes only two or three of these four elements. Many Pol II promoters share a few sequence features. These include a TATA box (eukaryotic consensus sequence TATAAA) near base pair -30 and an Inr (initiator) sequence near the RNA start site at +1. However, some Pol II promoters lack a TATA box, a consensus Inr element or both. These sequence elements are more variable among the Pol II promoters of eukaryotes than among E. coli promoters.

(a) TATA box or Hogness box: An A/T-rich sequence, TATA(A/T)A(A/T), called the TATA box is located at about positions -25 to -30, upstream of the transcription start site. Its consensus sequence is T₈₂A₉₇T₉₃A₈₅(A₆₃/T₃₇)A₈₃(A₅₀/T₃₇), where the subscripts give the percentage occurrence of the corresponding base. The TATA box resembles the -10 region of prokaryotic promoters (TATAAT), although the two differ in their locations relative to the transcription start site (-27 vs -10). This conserved region was first identified by Michael Goldberg and David Hogness. It is therefore also called the Goldberg-Hogness (GH) box or Hogness box.

The TATA box is the major assembly point for the proteins of the Pol II preinitiation complex. Deletion of the TATA box does not necessarily eliminate transcription. Instead, it makes the transcription start site heterogeneous, which indicates that the TATA box helps select this site.

(b) TFIIB recognition element (BRE): The TFIIB recognition element lies immediately upstream of the TATA box and is bound by TFIIB. Its consensus sequence is (G/C)(G/C)(G/A)CGCCC.

(c) Initiator sequence (Inr): The initiator element (Inr) is located around the transcription start site (+1). Its consensus sequence is (C/T)(C/T)AN(T/A)(C/T)(C/T), where N is any nucleotide. Many initiator elements have a C at position -1 and an A at +1. The DNA is unwound at the initiator sequence, and the transcription start site usually lies within or very near it.

(d) Downstream promoter element (DPE): The downstream promoter element lies further downstream, within the transcribed region. Its consensus sequence is (A/G)G(A/T)CGTG.

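Fig. 14: RNA Pol II promoter. The figure shows the core promoter elements with their approximate positions, the factors that bind them and their consensus sequences (N = any nucleotide; Y = pyrimidine):
BRE (about -37 to -31), bound by TFIIB: G/CG/CG/ACGCCC
TATA box (about -30 to -26), bound by TBP of TFIID: TATAA/TAA/T
Inr (about -2 to +4), bound by TFIID: YYANT/AYY
DPE (about +28 to +32), bound by TFIID: A/GGA/TCGTG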
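The percentage-weighted TATA box consensus given above is in effect a position weight matrix, and it can be used to score candidate sites. The Python sketch below converts the quoted percentages into log-odds scores against a uniform (25%) background. Where the text gives no percentage for a base, the sketch assumes the remaining probability is shared equally among the unlisted bases. It also applies a small floor so that no base has zero probability. Both choices are modelling assumptions, not data from the text.

```python
import math

# Percent occurrence at the 7 TATA-box positions: T82 A97 T93 A85 (A63/T37) A83 (A50/T37).
TATA_PERCENT = [
    {"T": 82}, {"A": 97}, {"T": 93}, {"A": 85},
    {"A": 63, "T": 37}, {"A": 83}, {"A": 50, "T": 37},
]
BACKGROUND = 0.25  # assumed uniform base composition


def position_probs(percent: dict[str, int]) -> dict[str, float]:
    """Turn listed percentages into probabilities over A, C, G and T."""
    unlisted = [b for b in "ACGT" if b not in percent]  # never empty here
    leftover = max(100 - sum(percent.values()), 1)       # 1% floor avoids log(0)
    weights = {b: float(percent.get(b, leftover / len(unlisted))) for b in "ACGT"}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


PWM = [position_probs(p) for p in TATA_PERCENT]


def tata_score(site: str) -> float:
    """Log2-odds score of a 7-nt DNA site (A/C/G/T only) against the background."""
    if len(site) != len(PWM):
        raise ValueError("site must be 7 nt long")
    return sum(math.log2(probs[base] / BACKGROUND)
               for probs, base in zip(PWM, site.upper()))


def best_tata_site(dna: str) -> tuple[float, int]:
    """Highest-scoring 7-nt window as (score, 0-based start index)."""
    dna = dna.upper()
    return max((tata_score(dna[i:i + 7]), i) for i in range(len(dna) - 6))


print(best_tata_site("GCGCGCTATAAAAGCGC"))  # best window starts at index 6
```

TATAAAA takes the most frequent base at every position, so it is the highest-scoring 7-mer under this matrix. In the example it is found at index 6.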

Promoters contain different combinations of conserved elements. No element is common to all the promoters. The elements found in any individual promoter differ in number, location and orientation. Some eukaryotic genes contain an initiator element instead of a TATA box. Other promoters have neither a TATA box nor an initiator element. These genes are generally transcribed at low rates and initiation may occur at different start sites over a length of up to 200 bp. These genes often contain a GC rich 20-50 bp region within the first 100-200 bp upstream from start site (described below).
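Promoters carry different subsets of these elements, so one way to picture a core promoter is to scan a sequence for each consensus and report where each element occurs relative to +1. The Python sketch below rewrites the consensus sequences from the text in IUPAC code (R = A/G, Y = C/T, S = G/C, W = A/T, N = any base). It uses regular-expression lookaheads so that overlapping matches are also found. This is a plain consensus match, not a validated promoter predictor, and the function names and example sequence are hypothetical.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "N": "[ACGT]"}

# Core promoter consensus sequences from the text, in IUPAC notation.
CORE_ELEMENTS = {
    "BRE": "SSRCGCCC",  # G/C G/C G/A C G C C C
    "TATA": "TATAWAW",  # T A T A A/T A A/T
    "Inr": "YYANWYY",   # C/T C/T A N T/A C/T C/T
    "DPE": "RGWCGTG",   # A/G G A/T C G T G
}


def to_regex(consensus: str) -> re.Pattern:
    """Compile a lookahead pattern so overlapping matches are all reported."""
    return re.compile("(?=" + "".join(IUPAC[c] for c in consensus) + ")")


def promoter_coordinate(index: int, tss_index: int) -> int:
    """Convert a 0-based index to promoter numbering (+1 = start site, no position 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def find_core_elements(dna: str, tss_index: int) -> dict[str, list[int]]:
    """Start coordinate of every match to each core element."""
    dna = dna.upper()
    return {name: [promoter_coordinate(m.start(), tss_index)
                   for m in to_regex(cons).finditer(dna)]
            for name, cons in CORE_ELEMENTS.items()}


example = "GGG" + "TATAAAA" + "G" * 20 + "TCAGTCC"  # hypothetical toy promoter
print(find_core_elements(example, tss_index=32))   # {'BRE': [], 'TATA': [-29], 'Inr': [-2], 'DPE': []}
```

In this toy example the TATA match starts at -29 and the Inr match at -2, while BRE and DPE are absent. It is a promoter that carries only two of the four elements.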

(ii) Upstream regulatory elements (URE): The basal elements primarily determine the location of the start point. By themselves, however, they support initiation only at a rather low level and are therefore not sufficient for strong promoter activity. Additional elements called upstream regulatory elements are needed to raise the low activity of basal promoters. They are located between about -40 and -200 bp upstream of the transcription start site, and they can function even when present on the template strand. These sequences are important in regulating Pol II promoters and vary greatly in type and number. They serve as binding sites for a wide variety of proteins that affect the activity of Pol II. These elements are found in many genes whose levels of expression vary widely among tissues. Examples are:

(a) GC box: Genes expressed in all tissues have one or more copies of the sequence 5’-GGGCGG-3’ (the GC box) upstream of their transcription start sites. Examples are housekeeping or constitutive genes, which are continuously expressed rather than regulated. GC boxes are often located at about position -90, but their positions vary from one promoter to another. Multiple copies are often present in a promoter, and they occur in either orientation. Genes that are selectively expressed in one or a few cell types often lack these GC-rich sequences.

(b) CAAT box: The region extending from about -50 to -110 also contains promoter elements, which can occur in either orientation. For instance, many eukaryotic structural genes, including those encoding the various globins, have a conserved sequence with the consensus 5’-GGNCAATCT-3’ (the CAAT box) located between about -70 and -90. Alteration of this sequence greatly reduces the transcription rate of the gene. Globin genes also have a conserved CACCC box upstream of the CAAT box that has been implicated in transcriptional initiation.

The CAAT and GC boxes of eukaryotes differ from the comparable upstream regions of prokaryotic promoters. The positions of these upstream sequences vary from one promoter to another, in contrast with the nearly constant location of the -35 region in prokaryotes. The CAAT and GC boxes can be effective when present on the template strand, unlike the -35 region, which must be present on the coding strand. These differences reflect fundamentally different mechanisms for recognizing cis-acting elements. The -10 and -35 sequences in prokaryotic promoters are binding sites for RNA Pol and its associated σ factor. In contrast, the TATA, CAAT and GC boxes and other cis-acting elements in eukaryotic promoters are recognized by proteins other than RNA Pol itself. The promoter conveys directional information, since transcription proceeds only in the downstream direction. Even so, the GC and CAAT boxes appear to function in either orientation and at distances from the start point that vary considerably. This implies that these elements act solely as DNA-binding sites that bring transcription factors into the vicinity of the start point. The structure of a factor must therefore be flexible enough to let it contact the basal apparatus regardless of how its DNA-binding domain is oriented and of its exact distance from the start point. GC and CAAT boxes thus strongly influence the efficiency of a promoter but do not influence its specificity.
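Because the GC and CAAT boxes work in either orientation, a search for them has to examine both DNA strands. A minimal Python sketch, assuming a plain consensus match (N = any base) and using hypothetical function names, searches the given strand for the box and for its reverse complement. Each hit is labelled with the strand it lies on.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def box_pattern(box: str) -> re.Pattern:
    """Lookahead regex for a consensus box; N matches any base."""
    return re.compile("(?=" + box.upper().replace("N", "[ACGT]") + ")")


def find_box_both_strands(dna: str, box: str, tss_index: int) -> list[tuple[int, str]]:
    """(start coordinate, strand) for each hit; +1 = start site, no position 0.

    The coordinate is that of the hit's leftmost base on the given (top) strand.
    """
    dna = dna.upper()
    hits = []
    for pattern, strand in ((box.upper(), "+"), (reverse_complement(box), "-")):
        if strand == "-" and pattern == box.upper():
            break  # palindromic box: the top-strand search already found it
        for m in box_pattern(pattern).finditer(dna):
            offset = m.start() - tss_index
            hits.append((offset + 1 if offset >= 0 else offset, strand))
    return sorted(hits)


GC_BOX = "GGGCGG"
CAAT_BOX = "GGNCAATCT"
toy = "A" * 10 + "CCGCCC" + "A" * 10 + "GGGCGG" + "A" * 20  # hypothetical
print(find_box_both_strands(toy, GC_BOX, tss_index=len(toy)))  # [(-42, '-'), (-26, '+')]
print(find_box_both_strands("TT" + "AGATTGCCC" + "TT", CAAT_BOX, tss_index=13))  # [(-11, '-')]
```

Searching the top strand for the reverse-complemented box avoids re-indexing the bottom strand, and every hit keeps a single coordinate system.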


(C) RNA Pol III promoter

The promoters recognized by RNA Pol III are well characterized. Interestingly, some of the sequences required for the regulated initiation of Pol III are located within the gene itself, whereas others are in more conventional locations upstream of the RNA start site (Fig. 15).

(i) 5S rRNA genes: The genes for 5S rRNA are organized in a tandem cluster. The promoters of genes transcribed by RNA Pol III can be located entirely within the transcribed region of the gene (internal promoters). These promoter sequences are therefore conserved in both the 5S rRNA and its gene.

Donald Brown established this by constructing a series of deletion mutants of a Xenopus borealis 5S rRNA gene. The 5S rRNA promoter contains the following conserved sequences, which are depicted in Fig. 15.

(a) C box: It is located 81-99 bases downstream from the transcription start site.

(b) A box: It is located at around 50-65 bases downstream of the transcription start site.

The sequence of the Box A is: 5’-TGGCNNAGTGG-3’.

Fig. 15: RNA Pol III promoter for 5S rRNA. The promoter is internal: Box A (about +55, conserved sequence TGGCNNAGTGG) and Box C (about +81) both lie downstream of the transcription start site (+1).

(ii) tRNA genes: RNA Pol III promoters of tRNA genes contain two highly conserved sequences, Box A and Box B, within the DNA encoding the tRNA (internal transcription control regions). Both lie downstream of the transcription start site, within the transcription unit (Fig. 16).

(a) Box A: It lies near the 5’ end of the transcribed region. Its sequence is 5’-TGGCNNAGTGG-3’.

(b) Box B: It lies further downstream, in the 3’ half of the gene. Its sequence is 5’-GGTTCGANNCC-3’.

Because both sequences lie within the gene, they are conserved in both the tRNA and the DNA. These promoter sequences thus also encode important parts of the tRNA itself, the D-loop and the TψC loop.


Fig. 16: RNA Pol III promoter for tRNA. The internal Box A (TGGCNNAGTGG) and Box B (GGTTCGANNCC) both lie downstream of the transcription start site (+1).

(iii) Alternative RNA Pol III promoters: A number of RNA Pol III promoters are regulated by upstream as well as downstream promoter sequences.

Further studies have shown, however, that the promoters of other RNA Pol III-transcribed genes lie entirely upstream of their start sites. These upstream sites also bind transcription factors that recruit RNA Pol III. Such promoters require only upstream sequences, including the TATA box and other sequences found in RNA Pol II promoters. Examples are the genes for U6 small nuclear RNA (U6 snRNA) and for small RNAs of the Epstein-Barr virus, which use only regulatory sequences upstream of their transcription start sites. The coding region of the U6 snRNA gene has a characteristic A box, but this sequence is not required for transcription. The U6 snRNA upstream region contains sequences typical of RNA Pol II promoters, including a TATA box at bases -30 to -23. These promoters also share several other upstream transcription factor binding sequences with many U-RNA (snRNA) genes that are transcribed by RNA Pol II. These observations suggest that common transcription factors can regulate both RNA Pol II and RNA Pol III genes.
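The three Pol III promoter arrangements described above can be summarized as a small decision rule over element positions. The Python sketch below is a hypothetical illustration. It assumes that the positions of Box A, Box B, Box C and a TATA box have already been found, with +1 as the start site and positive numbers inside the transcribed region. It then reports which arrangement they fit: internal Box A + Box C for 5S rRNA genes, internal Box A + Box B for tRNA genes, or upstream elements including a TATA box for genes such as U6 snRNA. It ignores the many other elements that real Pol III promoters use.

```python
def pol3_promoter_type(elements: dict[str, list[int]]) -> str:
    """Classify a Pol III promoter from element positions (+1 = start site)."""

    def internal(name: str) -> bool:
        return any(pos > 0 for pos in elements.get(name, []))

    def upstream(name: str) -> bool:
        return any(pos < 0 for pos in elements.get(name, []))

    if internal("A") and internal("C"):
        return "internal: 5S rRNA type (Box A + Box C)"
    if internal("A") and internal("B"):
        return "internal: tRNA type (Box A + Box B)"
    if upstream("TATA"):
        # U6-like genes may carry an internal Box A, but it is not required.
        return "upstream: U6 snRNA type (TATA box and other upstream elements)"
    return "unclassified"


print(pol3_promoter_type({"A": [55], "C": [81]}))      # positions from Fig. 15
print(pol3_promoter_type({"A": [10], "TATA": [-30]}))  # hypothetical U6-like gene
```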

(3) Enhancers

Promoters are not the only type of cis-acting sequence. Transcription from many eukaryotic promoters can be stimulated by control elements located many thousands of base pairs away from the transcription start site. This was first observed in the genome of the DNA virus SV40. There, a sequence of around 100 bp can significantly increase transcription from a basal promoter even when it is placed far upstream or downstream. Such distal sequences are called ‘enhancers’. Enhancer elements thus constitute a distal part of the promoter and can be located either upstream or downstream of the transcription start site. Enhancers are common in eukaryotes and rare in prokaryotes. One exception is the enhancer-like sites that act on σ54-dependent promoters. Enhancers have the following general characteristics and functions:

Enhancers are short sequence elements, generally 100-200 bp long. They contain multiple sequence elements that contribute to the total activity of the enhancer. These elements are similar to those of upstream promoter regions, but they are more densely packed, i.e., more compactly organized than in upstream promoter regions.

Like promoters, they are cis-acting regulatory elements. They can function over long distances of more than 1000 bp, from either an upstream or a downstream position relative to the start site. They are therefore also called ‘long-range regulatory elements’. In contrast, promoters are short-range elements.

They can activate transcription of their cognate genes when placed in either orientation with respect to the linked gene, even when inverted. They are thus bidirectional and orientation-independent (Fig. 17).

Fig. 17: Activation of transcription by an enhancer is independent of orientation and direction. An enhancer (E) placed either upstream or downstream of a promoter (P) activates transcription.

Interestingly, the positions of enhancers relative to promoters are not fixed and can vary substantially. Enhancers can activate transcription of their cognate genes even when moved away from their original location, either upstream or downstream of the coding sequence. In natural genomes they can even lie within genes. They are thus position-independent.

Enhancers contain the same sequence elements that are found in promoters, but the density of these sequence components is greater in the enhancer than in the promoter.

They may be ubiquitous or tissue / cell type-specific. They may be active in only certain cells. Enhancers play key roles in regulating gene expression in a specific tissue or developmental stage.

A given enhancer binds regulators at a given time and place. Alternative enhancers bind different groups of regulators and control expression of the same gene at different times and places in response to different signals.

They strongly activate transcription of a linked gene from the correct start site, and they preferentially stimulate the closer of two tandem promoters. Although these DNA sequences are not promoters themselves, they can enormously increase the effectiveness of promoters.


Enhancer sequences are targeted by a number of sequence-specific DNA-binding proteins called gene-specific transcription factors and activators. The assembly of clustered activators on an enhancer is called an enhanceosome. Enhancers are believed to regulate transcription of a specific gene from a distant location by ‘bending’ or ‘looping out’ the intervening DNA between the enhancer and promoter regions. The transcription factors bound to the enhancer can then interact directly with the RNA Pol II machinery bound at the promoter and influence its action.

Activation at a distance raises a problem. When an activator binds at an enhancer, there may be several genes within its range, yet a given enhancer typically regulates only one gene. Other regulatory sequences called insulators or boundary elements are found between enhancers and some promoters. Insulators block activation of the promoter by activators bound at the enhancer. These elements, although still poorly understood, ensure activators do not work indiscriminately.
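The distance preference and insulator blocking can be illustrated with a deliberately simplified one-dimensional model. In the Python sketch below, elements are points on a DNA coordinate axis measured in bp. The sketch is a hypothetical toy, not a mechanistic model of DNA looping. An enhancer activates the closest promoter that is not separated from it by an insulator, following the preference for the closer of two tandem promoters noted in the text. Orientation is ignored, since enhancers act in either orientation.

```python
from typing import Optional


def activated_promoter(enhancer: int, promoters: dict[str, int],
                       insulators: list[int]) -> Optional[str]:
    """Name of the promoter the enhancer activates, or None if all are insulated."""

    def insulated(promoter_pos: int) -> bool:
        left, right = sorted((enhancer, promoter_pos))
        return any(left < ins < right for ins in insulators)

    reachable = [(abs(pos - enhancer), name)
                 for name, pos in promoters.items() if not insulated(pos)]
    return min(reachable)[1] if reachable else None


promoters = {"gene_A": 1000, "gene_B": 5000}               # hypothetical positions (bp)
print(activated_promoter(3500, promoters, []))            # gene_B (closer)
print(activated_promoter(3500, promoters, [4000]))        # gene_A (gene_B insulated)
print(activated_promoter(3500, promoters, [2000, 4000]))  # None
```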

Elements analogous to enhancers in yeast are called ‘upstream activator sequences’ (UASs). Unlike enhancers, however, UASs work only upstream of the promoter and cannot function when located downstream.

(4) Transcription factors

RNA Pol II requires an array of other proteins, called transcription factors, to form an active transcription complex. Prokaryotic RNA Pol holoenzymes, which are somewhat smaller, bind their target DNAs on their own; eukaryotic RNA Pols do not. Instead, they are recruited to their target promoters through very large and complicated complexes of transcription factors and their ancillary proteins. The eukaryotic system requires two types of transcription factors:
(A) General transcription factors
(B) Gene-specific transcription factors

(A) General transcription factors (GTFs)

These are a set of proteins that bind to RNA Pol II promoters and together initiate transcription. They are collectively known as general transcription factors. These multisubunit factors are named TFIIA, TFIIB, TFIID, etc. (TF stands for transcription factor and II refers to RNA Pol II).

The general transcription factors collectively perform the functions similar to that performed by σ in bacterial transcription. However, these factors do not show any significant sequence homology to σ factor. They have been shown to assemble on basal promoters in a specific order and they may be subject to multiple levels of regulation. They help polymerase to bind to the promoter.

The binding of a transcription factor to its cognate DNA sequence enables RNA Pol to locate the proper initiation site. Such a highly complex assembly of RNA Pol and associated proteins is absent in prokaryotes. The binding of the TFs to the promoter leads to melting of the DNA, comparable to the transition from the closed to the open complex in bacteria. The TFs also help the polymerase escape from the promoter and enter the elongation phase.

The general transcription factors (TFIIs) required at every Pol II promoter are highly conserved in all eukaryotes. The properties of the various GTFs required by RNA Pol II are summarized in Table 6.

Table 6: Properties of the general transcription factors associated with RNA Pol II promoters (yeast)

S. No. | Transcription protein | Number of subunits | Subunit Mr (Da) | Properties / Function(s)

1 | TBP (TFIID) | 1 | 38000 | TBP (38 kD) is part of TFIID (700 kD); TFIID also contains TBP associated factors (TAFs); TBP has a saddle-like structure and its concave surface recognizes the TATA box in the minor groove; TBP is regulated by TAFII230, which binds to its concave surface, thereby preventing the binding of TBP to DNA

2 | TFIIA | 3 | 12000, 19000, 35000 | Stabilizes binding of TFIIB and enhances transcription; allows binding of TBP (as TFIID) to the promoter; prevents binding of the DR1 and DR2 inhibitors to TFIID; removes inhibition of TBP by TAFII230

3 | TFIIB | 1 | 35000 | Binds to TBP; interacts with DNA upstream of the TATA box in the major groove (at the BRE) and downstream of the TATA box in the minor groove, allowing asymmetric assembly of the complex and hence unidirectional transcription; recruits the Pol II-TFIIF complex

4 | TFIIE | 4 | 34000, 57000 | Heterotetramer of two kinds of subunit; recruits TFIIH; has ATPase and helicase activities; stimulates the kinase activity of TFIIH

5 | TFIIF | 2 | 30000, 74000 | Binds tightly to Pol II; binds to TFIIB and prevents binding of Pol II to nonspecific DNA sequences; later acts as an elongation factor

6 | TFIIH | 12 | 35000-89000 | Largest GTF; two subunits have ATPase activity; one subunit has protein kinase activity; unwinds DNA at the promoter (helicase activity); phosphorylates Pol II (within the CTD); recruits nucleotide excision repair proteins for DNA repair

7 | TFIIJ | Not characterized | Not characterized | Required for transcription (at least in vitro); probably plays a role in promoter clearance and elongation

Many RNA Pol II promoters that lack a TATA box have an initiator element overlapping their start site. At these promoters, TBP seems to be recruited by a further DNA binding protein that binds to the initiator element. TBP then recruits the other transcription factors and RNA Pol II in the same way as at TATA box promoters.

Similarly, transcription factors TFI and TFIII are required to stimulate transcription by RNA Pol I and RNA Pol III, respectively.

(B) Gene specific transcription factors

Although RNA Pol II and its associated factors (TFIIs) play a major role in initiating transcription of the various mRNA encoding genes, the extent of their transcription is modulated by another set of transcription factors called gene specific transcription factors. The term “gene specific” is used because a particular combination of such factors may direct the transcription of one gene as opposed to others. Gene specific transcription factors play a major role in tissue specific gene expression. They also elicit responses such as the immune response, apoptosis and cell differentiation. They are characterized by a DNA binding domain, which recognizes specific cis-regulatory sequences located in the proximal and distal regions of the promoter. After binding to the cognate sequence, gene specific transcription factors act on RNA Pol II through another domain called the transactivation domain. The transactivation domain communicates with the Pol II machinery through a group of proteins called mediators or coactivators. Coactivators do not bind DNA directly but act as bridging molecules between Pol II and the gene specific transcription factors. Many eukaryotic gene specific transcription factors have been characterized to date, and a few of them are listed in Table 7.

Table 7: Gene specific transcription factors and their functions

S. No. | Name | Species | Function

1 | MyoD | Human, mouse etc. | Skeletal muscle specific gene expression

2 | NF kappa B | Human, mouse etc. | Immune response, cytokine gene expression

3 | Glucocorticoid receptor | Human, mouse etc. | Activation of glucocorticoid-responsive genes

Gene specific transcription factors possess distinct structural motifs that are essential for DNA recognition and transactivation. They are often classified on the basis of these structural features, e.g. homeodomain, helix-turn-helix, helix-loop-helix and zinc finger. Quite often, two gene specific transcription factors belonging to the same structural family dimerize and bind to the target sequence in a bipartite manner. One such example is the transcription factor AP-1, which is a dimer of the Jun (39 kD) and Fos (65 kD) proteins.

Jun and Fos belong to the leucine zipper family and target the sequence TGACTCA. Gene specific transcription factors are often targeted by signal-transducing kinases such as MAP kinase, which phosphorylate them to induce their activity. Some gene specific transcription factors are held in the cytoplasm in an inactive form and move to the nucleus only upon activation. For example, the transcription factor NF kappa B remains bound to an inhibitory protein called I kappa B, which retains it in the cytoplasm. When an appropriate signal is received, I kappa B is ubiquitylated and degraded. This releases NF kappa B, which then moves to the nucleus.
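A simple scan makes the idea of a factor's target sequence concrete. The sketch below looks for exact matches to the AP-1 site TGACTCA on both strands of a DNA string. It assumes that binding requires the exact 7-bp site, which ignores the variant sites real factors tolerate. The example sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_site(seq: str, site: str = "TGACTCA") -> list[tuple[int, str]]:
    """Return (0-based start on the top strand, strand) for each exact match of site.

    A '-' hit means the bottom strand carries the site, i.e. the top strand
    reads the reverse complement of the site at that position.
    """
    seq = seq.upper()
    rc_site = reverse_complement(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        if window == rc_site and rc_site != site:  # avoid double-counting palindromes
            hits.append((i, "-"))
    return hits

if __name__ == "__main__":
    print(find_site("GGTGACTCAGGTGAGTCAAA"))  # [(2, '+'), (11, '-')]
```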

(5) Elongation factors

During elongation, the activity of Pol II is greatly enhanced by proteins called elongation factors. The transition from the initiation to the elongation phase involves shedding most of the initiation factors and mediator, and another set of factors is recruited in their place. This exchange of initiation factors for the factors required for elongation and RNA processing involves phosphorylation of the CTD of RNA Pol II. The properties of various elongation factors are summarized in Table 8.

(6) Overall process of eukaryotic transcription

(A) Synthesis of precursor of mRNA by RNA Pol II

The process of transcription by Pol II can be described in terms of several phases (assembly and initiation, elongation, and termination), each associated with characteristic proteins (Fig. 18).

(i) Assembly and Initiation: Eukaryotic transcription involves the assembly of RNA Pol II and transcription factors at a promoter. The step-by-step pathway described below leads to active transcription in vitro. In the cell, many of the proteins may be present in larger, preassembled complexes, which simplifies the pathways for assembly on promoters. The initiation phase in eukaryotes differs from that in prokaryotes in two major respects. First, promoter melting requires ATP hydrolysis. Second, promoter escape occurs only after the polymerase is phosphorylated. The formation of the preinitiation complex, or basal transcription apparatus, involves the following steps:

Binding of TBP: In the first step, TBP, a component of the TFIID transcription factor, binds 10⁵ times as tightly to the TATA box as to noncognate sequences. The DPE and initiator sequences are also targeted by TFIID. TBP bound to the TATA box is the center point of the initiation complex. This binding induces large conformational changes in the bound DNA, because TBP distorts it by inserting a β-sheet into the minor groove. The distortion generates a binding site for TFIIB, which in turn provides a platform for recruiting Pol II and TFIIF. The resulting complex is distinctly asymmetric. This asymmetry is crucial for specifying a unique start site and for ensuring that transcription proceeds in one direction only.
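A 10⁵-fold preference can be converted into a free-energy difference to show what it means energetically. This worked example is only an illustration. It assumes the ratio refers to equilibrium dissociation constants (K_d) and takes a temperature of 25 °C (298 K), and the text states neither assumption.

```latex
\Delta\Delta G = RT \ln\frac{K_d^{\mathrm{noncognate}}}{K_d^{\mathrm{TATA}}}
  = (8.314\ \mathrm{J\,mol^{-1}\,K^{-1}})(298\ \mathrm{K})\,\ln\!\left(10^{5}\right)
  \approx (2.48\ \mathrm{kJ\,mol^{-1}})(11.5)
  \approx 28.5\ \mathrm{kJ\,mol^{-1}}\ (\approx 6.8\ \mathrm{kcal\,mol^{-1}})
```

Under these assumptions, TBP binds the TATA box about 28 kJ/mol more favorably than a noncognate sequence.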

Table 8: Elongation factors involved in eukaryotic transcription

S. No. | Transcription protein | No. of subunits | Subunit Mr (Da) | Properties / Function(s)

Elongation factors required for the elongation stage

1 | Elongin (SIII) | 3 | 15000, 18000, 110000 | Involved in elongation; enhances the elongation rate (2000 nucleotides per minute); suppresses the pausing of RNA Pol II

2 | ELL | 1 | 80000 | Name derived from Eleven-nineteen lysine-rich leukemia; the gene for ELL is the site of chromosomal recombination events frequently associated with acute myeloid leukemia; involved in elongation; enhances the elongation rate (2000 nucleotides per minute); suppresses the pausing of RNA Pol II

3 | TFIIS (SII) | 1 | 38000 | Involved in elongation; reduces the time for which Pol II pauses at sequences that slow its progress (Pol II does not transcribe all regions at a constant rate); stimulates the proofreading activity of RNA Pol II

4 | p-TEFb | 2 | 43000, 124000 | Positive Transcription Elongation Factor b; involved in elongation; phosphorylates Pol II (within the CTD) at Ser 2; contains the CDK9 protein kinase, which also helps in phosphorylation; recruits the elongation factor TAT-SF1; phosphorylates and activates the elongation factor hSPT5; also involved in RNA processing; recruits the capping enzyme and splicing machinery

5 | TFIIF | 4 (2 of each type) | 30000, 74000 | Binds tightly to Pol II

Elongation factors used in RNA processing

6 | hSPT5 | - | - | Involved in RNA processing; recruits and stimulates the 5’ capping enzyme

7 | TAT-SF1 | - | - | Involved in RNA processing; recruits the components of the splicing machinery
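The elongation rate in Table 8 gives a quick estimate of how long Pol II takes to copy a gene. The sketch assumes a constant rate of 2000 nucleotides per minute and ignores initiation, pausing and termination. The gene lengths are hypothetical examples.

```python
def transcription_time_minutes(length_nt: int, rate_nt_per_min: float = 2000.0) -> float:
    """Minutes for Pol II to traverse length_nt at a constant elongation rate."""
    if length_nt < 0 or rate_nt_per_min <= 0:
        raise ValueError("length must be non-negative and rate must be positive")
    return length_nt / rate_nt_per_min

if __name__ == "__main__":
    for length in (2_000, 20_000, 200_000):  # hypothetical gene lengths (nt)
        print(f"{length} nt: {transcription_time_minutes(length):.0f} min")
```

At this rate, a 20,000-nt transcription unit takes about 10 minutes to copy. Pausing, which Elongin, ELL and TFIIS counteract, would lengthen the real time.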

Fig. 18: Transcription by eukaryotic RNA Pol II (figure: initiation, comprising promoter recognition, binding, melting and clearance, in which TBP/TFIID, TFIIA, TFIIB, TFIIF with RNA Pol II and its CTD tail, TFIIE and TFIIH assemble at the TATA box near +1 and the CTD becomes phosphorylated; then elongation, with RNA synthesis as RNA Pol II moves along the DNA; then termination)

Binding of TFIIA: In the next step, TFIIA binds directly to TBP and stabilizes its interaction with DNA and thereby enhances transcription. TFIIA binding, although not always essential, can be important at non-consensus promoters where TBP binding is relatively weak.

Binding of TFIIB: The formation of a closed complex begins when the TBP binds to the factor TFIIB, which also binds to DNA on either side of TBP.

Recruitment of TFIIF-Pol II: The TFIIB-TBP complex is next bound by another complex consisting of TFIIF and Pol II. TFIIF helps target Pol II to its promoters, both by interacting with TFIIB and by reducing the binding of the polymerase to nonspecific sites on the DNA.

Binding of TFIIE and TFIIH: After Pol II-TFIIF is recruited, two more transcription factors, TFIIE and TFIIH, join to complete the assembly of the closed preinitiation complex. They bind upstream of Pol II.

TFIIH is a complex factor with multiple enzymatic activities, including ATPase, helicase, kinase and DNA repair activities. The DNA helicase activity of TFIIH unwinds the DNA near the RNA start site (i.e. Inr) to create an open complex, and this process requires ATP hydrolysis. The DNA repair activity presumably couples transcription with DNA repair to avoid transcribing a faulty gene.

TFIIH has an additional function during the initiation phase. A kinase activity in one of its subunits phosphorylates Pol II at many places in the CTD. Several other protein kinases also phosphorylate the CTD, including CDK9, which is part of the complex p-TEFb. In the preinitiation complex, TFIIE stimulates the kinase activity of TFIIH, resulting in hyperphosphorylation of the carboxyl terminal domain (CTD) of Pol II. At some point during the formation of this complex, the CTD is phosphorylated on serine and threonine residues, and Pol II then escapes the promoter to begin transcription. Yeast cells whose mutant Pol II has fewer than 10 CTD repeats are not viable, which highlights the importance of the CTD. Phosphorylation of the CTD causes a conformational change in the overall complex that weakens the interaction of Pol II with TBP and so aids initiation. Most of the factors are released before the polymerase leaves the promoter and can then take part in another round of initiation.
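The stepwise pathway described above (TBP/TFIID, TFIIA, TFIIB, TFIIF-Pol II, then TFIIE and TFIIH) can be summarized as a dependency check. This is a toy sketch of the in vitro order only. The prerequisite table is a simplification of the text, and TFIIA is marked optional because the text notes it is not always essential. The sketch does not capture preassembled complexes, mediator or chromatin.

```python
# Toy model of stepwise preinitiation complex (PIC) assembly at a Pol II promoter.
# Each factor maps to the factors that must already be bound (simplified from the text).
PREREQUISITES: dict[str, set[str]] = {
    "TFIID/TBP": set(),
    "TFIIA": {"TFIID/TBP"},     # stabilizes TBP on DNA; not always essential
    "TFIIB": {"TFIID/TBP"},     # binds TBP and the flanking DNA
    "TFIIF-Pol II": {"TFIIB"},  # recruited via TFIIB
    "TFIIE": {"TFIIF-Pol II"},  # joins after Pol II-TFIIF
    "TFIIH": {"TFIIE"},         # recruited by TFIIE
}
OPTIONAL = {"TFIIA"}

def assemble(order: list[str]) -> set[str]:
    """Add factors one at a time; raise ValueError if a prerequisite is missing."""
    bound: set[str] = set()
    for factor in order:
        missing = PREREQUISITES[factor] - bound
        if missing:
            raise ValueError(f"{factor} cannot bind before {sorted(missing)}")
        bound.add(factor)
    return bound

def is_closed_pic(bound: set[str]) -> bool:
    """True when every non-optional factor is present."""
    return set(PREREQUISITES) - OPTIONAL <= bound

if __name__ == "__main__":
    pic = assemble(["TFIID/TBP", "TFIIA", "TFIIB", "TFIIF-Pol II", "TFIIE", "TFIIH"])
    print(is_closed_pic(pic))  # True
    try:
        assemble(["TFIID/TBP", "TFIIF-Pol II"])
    except ValueError as err:
        print(err)  # TFIIF-Pol II cannot bind before ['TFIIB']
```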

Requirement of additional proteins including mediator complex, nucleosome modifiers and remodellers: One reason for the additional requirements of mediator complex, nucleosome modifiers and remodellers is that the DNA template in vivo is packaged into nucleosomes and chromatin. This condition complicates binding of polymerase and its associated factors to the promoter.

Transcription regulatory proteins called activators help recruit polymerase to the promoter and stabilize its binding there. This recruitment is mediated through interactions between DNA-bound activators and parts of the transcription machinery. Often the interaction goes through mediator, which binds the CTD tail of the large polymerase subunit with one surface while presenting other surfaces for interaction with DNA-bound activators. This explains why mediator is needed to achieve significant transcription in vivo. Mediator is made up of many subunits. Despite its central role in transcriptional activation, deleting an individual mediator subunit often abolishes expression of only a small subset of genes, and the subset differs for each subunit. This likely reflects the fact that different activators interact with different mediator subunits to bring polymerase to different genes. In addition, mediator aids initiation by regulating the CTD kinase in TFIIH.

The need for nucleosome modifiers and remodellers also differs from promoter to promoter, and even at the same promoter under different circumstances. When and where required, these complexes are also recruited by DNA-bound activators. Nucleosome modifying enzymes include histone acetyltransferases, histone deacetylases and histone methylases.

Promoter melting, abortive initiation, synthesis of nascent RNA, phosphorylation of CTD and promoter clearance or escape: After promoter melting, synthesis of nascent RNA begins. As in bacteria, there is a period of abortive initiation before Pol II escapes the promoter and enters the elongation phase, and during this period Pol II synthesizes a series of short transcripts. As Pol II continues elongation, TFIIB, TFIIE and TFIIH are released from the promoter in a step called promoter clearance. TFIIF, however, remains associated with Pol II and helps elongation by suppressing pausing.

Two features distinguish eukaryotes from bacteria here. Promoter melting requires ATP hydrolysis and is mediated by TFIIH, and promoter escape involves phosphorylation of the polymerase. The Pol II recruited to the promoter initially has a largely unphosphorylated tail, whereas the species found in the elongation complex carries multiple phosphoryl groups on its tail. Adding these phosphate groups helps the polymerase shed most of the general transcription factors used for initiation, which the enzyme leaves behind as it escapes the promoter. In addition to TFIIH, a number of other kinases that act on the CTD have been identified (e.g. p-TEFb), as well as a phosphatase that removes the phosphates added by those kinases. Regulating the phosphorylation state of the Pol II CTD also controls later steps, such as those involving processing of the RNA.
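The phosphorylation switch described above can be pictured with a toy model: the tail is unphosphorylated at the promoter and heavily phosphorylated in the elongation complex. The sketch assumes the consensus heptad Tyr-Ser-Pro-Thr-Ser-Pro-Ser (YSPTSPS), which this text does not spell out, and uses 26 repeats, the number usually given for budding yeast. The 10-repeat viability threshold and the Ser 2 site for p-TEFb come from the text. The class and method names are hypothetical. Reducing factor preference to a yes/no test on phosphorylation is a deliberate simplification.

```python
from dataclasses import dataclass, field

HEPTAD = "YSPTSPS"       # consensus CTD repeat (assumed; not given in the text)
MIN_VIABLE_REPEATS = 10  # from the text: yeast Pol II with fewer repeats is not viable

@dataclass
class CTD:
    """Toy model of the Pol II CTD as a run of identical heptad repeats."""
    repeats: int
    # Phosphorylated residues as (repeat index, position 1-7 within the heptad).
    phospho: set[tuple[int, int]] = field(default_factory=set)

    def sequence(self) -> str:
        return HEPTAD * self.repeats

    def is_viable(self) -> bool:
        return self.repeats >= MIN_VIABLE_REPEATS

    def phosphorylate(self, position: int) -> None:
        """Kinase step (e.g. TFIIH or p-TEFb): modify one heptad position in every repeat."""
        if not 1 <= position <= len(HEPTAD):
            raise ValueError("position must be between 1 and 7")
        if HEPTAD[position - 1] not in "ST":
            raise ValueError("only serine or threonine positions can be phosphorylated")
        self.phospho |= {(i, position) for i in range(self.repeats)}

    def dephosphorylate(self) -> None:
        """CTD phosphatase step, e.g. before Pol II is recycled for another round."""
        self.phospho.clear()

    def preferred_partners(self) -> str:
        """Unphosphorylated tail: initiation; phosphorylated tail: elongation/processing."""
        if self.phospho:
            return "elongation and RNA-processing factors"
        return "initiation factors and mediator"

if __name__ == "__main__":
    ctd = CTD(repeats=26)
    print(ctd.preferred_partners())    # initiation factors and mediator
    ctd.phosphorylate(2)               # Ser 2, the site the text assigns to p-TEFb
    print(ctd.preferred_partners())    # elongation and RNA-processing factors
    print(CTD(repeats=8).is_viable())  # False
```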

(ii) Elongation: Once RNA Pol II has initiated transcription, it shifts into the elongation phase. This transition involves Pol II shedding most of its initiation factors, e.g. general transcription factors and mediator. TFIIE is released during synthesis of the first 60-70 nucleotides of RNA, and TFIIH is released after it. TFIIF, however, remains associated with Pol II throughout elongation. Another set of factors is recruited in place of the initiation factors and mediator, and it stimulates Pol II elongation and RNA proofreading. These proteins greatly enhance the activity of Pol II and are called elongation factors. Examples include TFIIS, p-TEFb, hSPT5, Elongin and ELL. The elongation factors suppress pausing or arrest of transcription by the Pol II-TFIIF complex. They also coordinate interactions between the protein complexes involved in posttranscriptional processing of mRNAs.

Like several of the initiation factors, the enzymes involved in all these processes are recruited to the C-terminal tail (CTD) of the large subunit of Pol II. In this case, however, the factors favor the phosphorylated form of the CTD. Phosphorylation of the CTD therefore causes the initiation factors to be exchanged for the factors required for elongation and RNA processing. The crystal structure of yeast Pol II shows that the CTD lies directly next to the channel through which newly synthesized RNA exits the enzyme. This position, together with the tail's length (it can extend some 800 Å from the body of the enzyme), allows the tail to bind several components of the elongation and processing machinery and deliver them to the emerging RNA. Some other elongation factors are required for RNA processing.

(iii) Termination and release: Once the RNA transcript is completed, transcription is terminated. RNA Pol II does not stop immediately, however. It continues to move along the template and generates a second RNA molecule, which can grow to several hundred nucleotides before termination. Termination of mRNA synthesis is thus coupled with polyadenylation, so the details of the termination step are described after polyadenylation. Pol II is then dephosphorylated, dissociates from the template and is recycled, ready to initiate another transcript. The extra RNA released in the process may be degraded without ever leaving the nucleus.

(B) Synthesis of precursors of rRNA by RNA Pol I

The pre-rRNA transcription units contain three sequences that encode the 5.8S, 18S and 28S rRNAs (Fig. 19). Pre-rRNA transcription units are arranged in clusters in the genome as long tandem arrays separated by non-transcribed spacer sequences. RNA Pol I in nucleolus synthesizes pre-rRNA. The arrays of rRNA genes loop together to form the nucleolus and are known as nucleolar organizer regions.

Fig. 19: Pre-rRNA transcript in eukaryotes (45S, ~13000 nt), containing the 18S, 5.8S and 28S rRNA sequences; the 5S rRNA is transcribed separately

The synthesis of rRNA (5.8S, 18S and 28S) involves transcription factors and complexes such as upstream binding factor (UBF) and a transcription complex called selectivity factor (SL-1). Similar complexes in other species are called TIF-IB and Rib1. UBF is a sequence-specific DNA binding protein that binds to the upstream control element (UCE). It greatly stimulates the transcription rate, and in its absence only a low rate of basal transcription is seen. SL-1 contains four subunits: one TBP (TATA binding protein) and three TAFIs (TBP associated factors for RNA Pol I).

The process of transcription of rRNA (5.8S, 18S and 28S) is outlined below and depicted in Fig. 20.

UBF binding: UBF binds to the upstream control element (UCE), a sequence of the RNA Pol I promoter lying upstream of the core element (core promoter). A second UBF binds to the upstream part of the core element. The sequences of the two UBF binding sites have no obvious similarity. One UBF molecule is thought to bind to each sequence element. The two UBF molecules then interact through protein-protein contacts, causing the intervening DNA to loop out between the two binding sites. (Some are of the view that a single UBF binds to both sites, i.e. the UCE and the upstream part of the core element.)

Fig. 20: rRNA transcription initiation (figure: UBF binds the UCE and the core element; SL1, made of TBP plus TAFIs, joins the UBF-DNA complex; RNA Pol I is recruited at +1)

Selectivity factor binding: Selectivity factor (SL-1) binds to and stabilizes the UBF-DNA complex. It interacts with the free downstream part of the core element. Binding of UBF increases the transcription initiation activity of SL-1. Acanthamoeba has a simpler transcription control system. It has a single control element and a single factor, TIF-1, and both are required for RNA Pol I binding and initiation at the rRNA promoter.

RNA Pol I binding: SL-1 binding allows RNA Pol I to bind the complex and initiate transcription and is essential for rRNA transcription.

(C) Synthesis of precursors of tRNA and 5S rRNA by RNA Pol III

(i) tRNA: As described in an earlier section, the promoter of tRNA genes has two consensus sequences downstream of the transcription start site, Box ‘A’ and Box ‘B’. Two complex DNA binding factors required for transcription initiation by RNA Pol III have been identified, TFIIIC and TFIIIB (Fig. 21). TFIIIC is a large protein complex of six subunits with a size of >500 kD. TFIIIB is the true initiation factor for Pol III. It has three subunits: TBP, B’’ and BRF. BRF stands for TFIIB related factor, and it is homologous to TFIIB, the RNA Pol II initiation factor. B’’ is comparable to the sigma factor of prokaryotes and functions to initiate the transcription bubble. TFIIIB has no sequence specificity, so its binding site appears to be set by where TFIIIC binds to DNA. Once TFIIIB has bound, TFIIIC can be removed without affecting transcription. TFIIIC is therefore an assembly factor that positions the initiation factor TFIIIB at the right location.

The process, which involves the following steps, is outlined in Fig. 21.

Fig. 21: Transcription initiation at a eukaryotic tRNA promoter (figure: TFIIIC binds Box A and Box B downstream of +1; TFIIIB, made of TBP, BRF and B'', binds upstream; RNA Pol III is then recruited)

TFIIIC binding: TFIIIC binds to both Box ‘A’ and Box ‘B’ of the tRNA promoter.

TFIIIB binding: TFIIIB binds the TFIIIC-DNA complex and interacts with DNA upstream of the TFIIIC binding site (50 bp upstream of Box A).

RNA Pol III binding: TFIIIB helps in recruitment of RNA Pol III. The enzyme RNA Pol III then initiates transcription, presumably displacing TFIIIC from DNA template as it goes.

Termination of transcription occurs without accessory factors. A cluster of dA residues is often sufficient for termination and the termination efficiency depends on surrounding sequence. An example of an efficient termination signal in somatic 5S rRNA genes of Xenopus borealis is 5’-GCAAAAGC-3’.
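The text notes that a cluster of dA residues is often enough for Pol III termination, and this can be turned into a simple scan. The minimum run of four A residues is an assumption chosen to match the AAAA in the Xenopus borealis example. The sketch ignores the effects of surrounding sequence on efficiency that the text mentions, so a hit marks only a candidate terminator.

```python
import re

def a_clusters(seq: str, min_run: int = 4) -> list[tuple[int, int]]:
    """Return 0-based (start, end) spans, end exclusive, of runs of >= min_run A residues."""
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    pattern = re.compile(f"A{{{min_run},}}")  # e.g. "A{4,}" for min_run=4
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]

if __name__ == "__main__":
    print(a_clusters("GCAAAAGC"))  # [(2, 6)], the X. borealis 5S rRNA example
    print(a_clusters("GCAAAGC"))   # [], only three A residues
```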

(ii) 5S rRNA: As described in an earlier section, the promoter of 5S rRNA genes has two consensus sequences downstream of the transcription start site, Box ‘A’ and Box ‘C’. Transcription of 5S rRNA genes involves the transcription factors TFIIIA, TFIIIC and TFIIIB. TFIIIA acts, together with TFIIIC, as an assembly factor that positions TFIIIB at the right location, while TFIIIB is the true initiation factor for Pol III. TFIIIB has no sequence specificity, so its binding site appears to be set by where TFIIIC binds to DNA.

The process of transcription involves following steps.

TFIIIA binding: TFIIIA binds strongly to the Box ‘C’ promoter sequence.

TFIIIC binding: TFIIIC then binds to the TFIIIA-DNA complex, interacting also with the Box ‘A’ sequence.

TFIIIB binding: Once TFIIIC has bound, TFIIIB can interact with the complex.

RNA Pol III binding: TFIIIB then recruits RNA Pol III to initiate transcription.
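A short sketch can compare the two Pol III pathways side by side. It encodes three points from the text. The first is the order of binding for each gene type. The second is that TFIIIB is positioned by TFIIIC rather than by a sequence of its own, using the 50-bp offset upstream of Box A stated for tRNA genes. The third is that Pol III needs only TFIIIB once assembly is complete. Names and coordinates are hypothetical, and positions are plain integers along a DNA axis rather than transcription numbering.

```python
# Toy summary of factor assembly at Pol III promoters (simplified from the text).
ASSEMBLY_ORDER: dict[str, list[str]] = {
    "tRNA": ["TFIIIC", "TFIIIB", "RNA Pol III"],
    "5S rRNA": ["TFIIIA", "TFIIIC", "TFIIIB", "RNA Pol III"],
}
TFIIIB_OFFSET_BP = 50  # text: TFIIIB binds 50 bp upstream of Box A (tRNA genes)

def tfiiib_site(box_a_start: int) -> int:
    """TFIIIB has no sequence specificity; its site is fixed relative to TFIIIC on Box A."""
    return box_a_start - TFIIIB_OFFSET_BP

def can_recruit_pol_iii(bound: set[str]) -> bool:
    """Only TFIIIB must be present when Pol III is recruited; TFIIIC may already be gone."""
    return "TFIIIB" in bound

if __name__ == "__main__":
    for gene, steps in ASSEMBLY_ORDER.items():
        print(f"{gene}: " + " -> ".join(steps))
    print(tfiiib_site(box_a_start=100))     # 50, for a hypothetical Box A at position 100
    print(can_recruit_pol_iii({"TFIIIB"}))  # True
```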

Fig. 22 depicts the schematic representation of the process of transcription of 5S rRNA.

Post transcriptional RNA processing

Transcription products of all three eukaryotic RNA polymerases undergo various alterations to yield the mature product. RNA processing is the collective term for these alterations to the primary transcript. The various post-transcriptional processing events of RNA are summarized in Table 9.

(A) Post transcriptional processing of mRNA

In prokaryotes, there is little or no processing of mRNA after synthesis by RNA Pol. Indeed, many mRNA molecules are translated while they are still being transcribed, i.e. before they are completely synthesized. Prokaryotic mRNA is degraded rapidly from the 5’ end, so the first cistron (protein-coding region) can be translated only for a limited time. In eukaryotes, RNA Pol II synthesizes mRNA as longer precursors (pre-mRNA), and the population of different pre-mRNAs is called heterogeneous nuclear RNA (hnRNA). Once transcribed, eukaryotic precursor mRNA has to be processed in various ways (Table 9) before it is exported from the nucleus to the cytoplasm, where it can be translated.

Fig. 22: Transcription initiation at a eukaryotic 5S rRNA promoter (figure: TFIIIA binds Box C; TFIIIC binds the TFIIIA-DNA complex and Box A; TFIIIB, made of TBP, BRF and B'', binds upstream of +1; RNA Pol III is then recruited)

Table 9: Various post-transcriptional processing events of RNA

1. End modification: Occurs during the synthesis of eukaryotic and archaeal mRNAs. It involves the addition of nucleotides to the 5’ or 3’ ends of the primary transcripts or their cleavage products. Such events do not occur in prokaryotes. They include: (i) capping of the 5’ end of mRNA; (ii) polyadenylation of the 3’ end of mRNA.

2. Splicing: Removal of introns (non-coding sequences in genes) from precursor RNAs (i.e. eukaryotic mRNAs and some eukaryotic rRNAs and tRNAs). It changes the length of the transcript.

3. Cutting events: Cutting of primary transcripts of rRNA and tRNA (or removal of nucleotides from them) by endonucleases or exonucleases to produce mature transcripts, in both prokaryotes and eukaryotes. It changes the length of the transcript.

4. Chemical modifications: These modifications are made within rRNAs, tRNAs and mRNAs. The rRNAs and tRNAs of all organisms are modified by adding new chemical groups to either the base or the sugar moiety of specific nucleotides. Such modification occurs to a much lesser extent in eukaryotic pre-mRNA. Equivalent events in archaea are poorly understood. A chemical modification of mRNA called RNA editing is seen in a diverse group of eukaryotes.

Strikingly, there is an overlap in proteins involved in elongation and those required for RNA processing. As mentioned earlier, the transcription elongation factor, pTEFb activates another elongation factor hSPT5, which helps in recruitment and stimulation of the 5’ capping enzyme. Another example is the pTEFb-induced recruitment of the elongation factor TAT-SF1, which further recruits the components of the splicing machinery. Thus, it seems that transcription and RNA processing are interconnected, presumably to ensure their proper coordination, allowing cotranscriptional processing of primary transcript.

(a) Capping of 5’ end: Capping involves the addition of a modified G base (m7G) to the 5’ end of mRNA. Specifically it is a methylated G and it is joined to the RNA transcript by an unusual 5’ → 5’ linkage involving three phosphates. The cap is added in reverse polarity (5’ to 5’), thus acting as a barrier to 5’ exonuclease attack, but it also promotes splicing, transport and translation. The 5’ cap is created in three enzymatic steps:

Removal of a phosphate group from 5’ end of transcript: RNA triphosphatase removes the γ-phosphate at the 5’ end of RNA by hydrolysis leading to the formation of a diphosphate at the 5’ end (the initiating nucleotide of a transcript initially retains its α-, β- and γ-phosphates).

Addition of GTP: In the next step, the enzyme guanylyl transferase catalyzes the nucleophilic attack of the resulting terminal β-phosphate on the α-phosphoryl group of a molecule of GTP, with β- and γ-phosphates of the GTP serving as a pyrophosphate-leaving group. The reaction leads to the formation of an unusual 5’-5’ triphosphate linkage. This distinctive terminus is called a Cap.

Modification of the terminal G residue by addition of a methyl group: Once this linkage is made, the newly added G and the purine at the original 5’ end of the mRNA are further modified. A methyltransferase methylates the N-7 nitrogen of the terminal guanine, using S-adenosylmethionine as the methyl group donor, to form Cap 0. The adjacent riboses may also be methylated to form Cap 1 or Cap 2.
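A toy bookkeeping model of the 5’ end can trace the three capping steps. The class and function names are hypothetical. A transcript end is reduced to a phosphate count, a cap state and a count of ribose-methylated nucleotides, and no chemistry is simulated. Each function refuses to act unless the previous step has already happened, which mirrors the order described above.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class FivePrimeEnd:
    """Toy record of an mRNA 5' end (hypothetical model, not real chemistry)."""
    phosphates: int               # phosphates at the 5' end (3 on a fresh transcript)
    cap: Optional[str] = None     # None, "G" (unmethylated cap) or "m7G"
    methylated_riboses: int = 0   # ribose-methylated nucleotides at the 5' end (0, 1 or 2)

    def label(self) -> str:
        if self.cap is None:
            return f"uncapped ({'p' * self.phosphates}N...)"
        if self.cap == "G":
            return "GpppN... (unmethylated cap)"
        return f"Cap {self.methylated_riboses}"

def rna_triphosphatase(end: FivePrimeEnd) -> FivePrimeEnd:
    """Step 1: remove the gamma-phosphate, pppN... -> ppN..."""
    if end.cap is not None or end.phosphates != 3:
        raise ValueError("expected an uncapped 5' triphosphate")
    return replace(end, phosphates=2)

def guanylyl_transferase(end: FivePrimeEnd) -> FivePrimeEnd:
    """Step 2: GMP from GTP is joined to ppN... (PPi leaves), giving a 5'-5' triphosphate."""
    if end.cap is not None or end.phosphates != 2:
        raise ValueError("expected an uncapped 5' diphosphate")
    return replace(end, phosphates=3, cap="G")

def n7_methyltransferase(end: FivePrimeEnd) -> FivePrimeEnd:
    """Step 3: methylate N-7 of the cap guanine (donor: S-adenosylmethionine) -> Cap 0."""
    if end.cap != "G":
        raise ValueError("expected an unmethylated GpppN cap")
    return replace(end, cap="m7G")

def ribose_methyltransferase(end: FivePrimeEnd) -> FivePrimeEnd:
    """Optional: methylate the next ribose, Cap 0 -> Cap 1 -> Cap 2."""
    if end.cap != "m7G" or end.methylated_riboses >= 2:
        raise ValueError("expected Cap 0 or Cap 1")
    return replace(end, methylated_riboses=end.methylated_riboses + 1)

if __name__ == "__main__":
    end = FivePrimeEnd(phosphates=3)
    for step in (rna_triphosphatase, guanylyl_transferase, n7_methyltransferase):
        end = step(end)
        print(step.__name__, "->", end.label())
    print(ribose_methyltransferase(end).label())  # Cap 1
```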

(b) Splicing: In eukaryotes, genes are characterized by non-coding sequences interspersed between stretches of coding sequences. These intervening non-coding sequences are called introns and the coding sequences are called exons. Introns can be as small as 10 nucleotides and as long as hundreds of kilobases. The introns are copied when the gene is transcribed and are removed from the precursor mRNA by cutting and joining reactions. This removal of introns occurs in the nucleus and is called splicing (Scheme 2). Thus, the precursor mRNAs (unspliced transcripts), which form the nuclear RNA fraction called heterogeneous nuclear RNA (hnRNA), become mature mRNAs after splicing. The splicing of nuclear pre-mRNA is the most complicated post transcriptional processing event.

In eukaryotes, each intron in nuclear pre-mRNA, also called a GU-AG intron, is characterized by signature sequences at both its 5’ and 3’ ends that are recognized by the spliceosome. The borders between introns and exons are thus marked by specific nucleotide sequences within the pre-mRNAs. For splicing, an intron should therefore have a 5’-GU, an AG-3’ and a branch point sequence; these sequences within the RNA delineate where splicing will occur (Scheme 3).
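The GU-AG rule can be illustrated with a short Python sketch. The function splice and the example sequence are hypothetical; the sketch assumes the intron boundaries are already known and checks only the terminal GU and AG dinucleotides, ignoring the branch point and the other signals that the spliceosome actually reads.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove GU-AG introns given as half-open, 0-based (start, end) coordinates.

    Toy model: only the terminal GU and AG dinucleotides are checked; the branch
    point and polypyrimidine tract are not.
    """
    exons: list[str] = []
    previous_end = 0
    for start, end in sorted(introns):
        if start < previous_end:
            raise ValueError(f"intron {start}-{end} overlaps the previous intron")
        intron = pre_mrna[start:end]
        if len(intron) < 4 or not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} breaks the GU-AG rule: {intron!r}")
        exons.append(pre_mrna[previous_end:start])  # exon upstream of this intron
        previous_end = end
    exons.append(pre_mrna[previous_end:])           # final exon
    return "".join(exons)


if __name__ == "__main__":
    exon1, intron, exon2 = "AUGCAG", "GUAAGUAUACUAACUUUUUCAG", "GCUUAA"
    pre = exon1 + intron + exon2
    print(splice(pre, [(len(exon1), len(exon1) + len(intron))]))  # AUGCAGGCUUAA
```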


Scheme 2: Schematic representation of splicing in eukaryotes (a 5’ → 3’ pre-mRNA containing Exons 1 to 4 separated by Introns 1 to 3 is spliced to give the mature mRNA)

Scheme 3: Splice sites in most of the vertebrates (upstream exon …AG | GUAAGU at the 5’ splice site; UACUAAC at the branch site; PynNCAG | G… downstream exon at the 3’ splice site)

Splicing requires the cooperation of several small nuclear ribonucleoproteins (snRNPs, pronounced ‘snurps’). The snRNPs are formed when uracil-rich snRNAs (made by RNA Pol II, except U6 snRNA, which is made by RNA Pol III) are complexed with specific proteins. The snRNPs interact with one another, forming a complex called the spliceosome, which holds the upstream and downstream exons close together while looping out the intron. This folding of the pre-mRNA into the correct conformation is essential for splicing. After the spliceosome forms, a rearrangement takes place before the two-step splicing reaction (transesterification reactions) can occur, with release of the intron as a lariat. The spliceosome is made up of 5 snRNPs, but its exact makeup differs at different stages of the splicing reaction: different snRNPs come and go at different times, each carrying out particular functions in the reaction. There are also many proteins within the spliceosome that are not part of the snRNPs. The spliceosome thus comprises about 150 proteins and 5 snRNAs and is similar in size to a ribosome.


Various snRNPs involved in the process and their sizes and functions are summarized in Table 10.

Table 10: Small nuclear ribonucleoprotein particles (snRNPs) in the splicing of nuclear mRNA precursors

S. No. | snRNP | Size of snRNA (nucleotides) | Role
1 | U1 snRNP | 165 | Binds to 5’ splice site and then the 3’ splice site; promotes binding of U2 snRNP
2 | U2 snRNP | 185 | Binds the branch site (aided by U2AF); forms part of catalytic center after binding with U6
3 | U5 snRNP | 116 | Binds 5’ splice site (a loop in U5 snRNA is immediately adjacent to the first base positions in both exons)
4 | U4 snRNP | 145 | Masks the catalytic activity of U6
5 | U6 snRNP | 106 | Catalyzes splicing

Splicing of nuclear pre-mRNA consists of two successive transesterification reactions, in which phosphodiester bonds within the pre-mRNA are broken and new ones are formed. Thus, the number of phosphodiester bonds remains constant during reactions. The chemistry of the splicing process is simple. The intron is removed in a form called a lariat as the flanking exons are joined (Fig. 23).
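The claim that the number of phosphodiester bonds stays constant can be checked by counting bonds at each stage. This Python sketch assumes one intron flanked by two exons (the segment lengths are arbitrary, hypothetical values) and counts a linear chain of n nucleotides as n - 1 bonds, plus one 2’-5’ bond wherever a lariat branch is present.

```python
def phosphodiester_bonds(exon1: int, intron: int, exon2: int) -> dict[str, int]:
    """Count phosphodiester bonds at each stage of splicing one intron (lengths in nt).

    A linear chain of n nucleotides has n - 1 bonds; a lariat adds one 2'-5' branch bond.
    """
    if min(exon1, intron, exon2) < 1:
        raise ValueError("every segment needs at least one nucleotide")
    pre_mrna = exon1 + intron + exon2 - 1
    # Step 1: branch-site A attacks the 5' splice site -> free exon 1 + lariat intermediate
    after_step1 = (exon1 - 1) + (intron + exon2 - 1) + 1
    # Step 2: 3'-OH of exon 1 attacks the 3' splice site -> spliced exons + lariat intron
    after_step2 = (exon1 + exon2 - 1) + (intron - 1) + 1
    return {"pre-mRNA": pre_mrna, "after step 1": after_step1, "after step 2": after_step2}


print(phosphodiester_bonds(50, 200, 80))
# {'pre-mRNA': 329, 'after step 1': 329, 'after step 2': 329}
```

Each transesterification breaks one bond and makes one, so the three totals agree for any segment lengths.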

Thus, the snRNPs have three roles in the splicing reaction: (i) to recognize the 5’ splice site and the branch site; (ii) to bring these sites together as required; and (iii) to catalyze (or help to catalyze) the RNA cleavage and joining reactions.

Various steps of splicing pathway can be summarized as follows:

Formation of Early (E) complex: Initially, the 5’ splice site is recognized by the U1 snRNP (using base pairing between its snRNA and the pre-mRNA). One subunit of U2 auxiliary factor (U2AF65) binds to the pyrimidine tract and the other (U2AF35) to the 3’ splice site. The former subunit interacts with branch point binding protein (BBP) and helps that protein to bind to the branch site. This arrangement of proteins and RNA is called the Early (E) complex.

Formation of A complex: U2 snRNP then binds to the branch site, aided by U2AF and displacing BBP. This arrangement is called the A complex. The base pairing between the U2 snRNA and the branch site is such that the branch site ‘A’ residue is extruded from the resulting stretch of double helical RNA as a single nucleotide bulge. This ‘A’ residue is thus unpaired and available to react with the 5’ splice site.


Fig. 23: Splicing of nuclear pre-mRNA (precursor mRNA: exon 1 …AG | GUAAGU… intron containing the branch-site A …YnNCAG | G… exon 2; in the first transesterification reaction the 2’-OH of the branch-site A attacks the 5’ splice site, giving exon 1 with a free 3’-OH and the lariat intermediate; in the second transesterification reaction exon 1 is joined to exon 2, giving the spliced product plus the lariat form of the intron)

Formation of B complex: The next step is a rearrangement of the A complex to bring together all three splice sites. This is achieved as follows: the U4 snRNP and U6 snRNP, along with the U5 snRNP, join the complex. Together these three snRNPs are called the tri-snRNP particle, within which the U4 snRNP and U6 snRNP are held together by complementary base pairing between their RNA components and the U5 snRNP is more loosely associated through protein-protein interactions. With the entry of the tri-snRNP, the A complex is converted into the B complex.

Formation of C complex: In the next step, U1 snRNP leaves the complex and U6 snRNP replaces it at the 5’ splice site. This requires that the base pairing between the U1 snRNA and the pre-mRNA be broken, allowing the U6 snRNA to anneal with the same region (in fact, to an overlapping sequence). These steps complete the assembly pathway. The next rearrangement triggers catalysis and occurs as follows: U4 snRNP is released from the complex, allowing U6 snRNP to interact with U2 snRNP (through RNA:RNA base pairing). This rearrangement is called the C complex. This rearrangement has the following consequences:

It produces the active site, bringing together the components within the spliceosome, believed to be chiefly regions of the U2 snRNA and U6 snRNA, that form the active site.

The same rearrangement also ensures the proper positioning of the substrate RNA to be acted upon.

Formation of the active site juxtaposes the 5’ splice site of the pre-mRNA and the branch site. It is striking that the active site is formed primarily of RNA, but also that it is formed only at this stage of spliceosome assembly. Presumably this strategy lessens the chance of aberrant splicing: linking the formation of the active site to the successful completion of the earlier steps in spliceosome assembly makes it highly likely that the active site is available only at legitimate splice sites.

Joining of exons and release of mature mRNA: The juxtaposition of the 5’ splice site of the pre-mRNA and the branch site facilitates the first transesterification reaction. The second reaction, between the 5’ and 3’ splice sites, is aided by the U5 snRNP, which helps to bring the two exons together. The final steps involve release of the mRNA product and the snRNPs. The snRNPs are initially bound to the lariat, but are recycled after that piece of RNA is rapidly degraded.
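The order of snRNP entry and exit described above can be written as a small state model. In this hypothetical Python sketch only the five snRNPs are tracked (U2AF and BBP are left out), the steps follow the order given in the text even though the relative timing of some changes is uncertain, and "active" is a simplification meaning that U2 and U6 are both present and U4 has left.

```python
# snRNPs that join or leave at each assembly step, in the order described in the text.
ASSEMBLY_STEPS: list[tuple[str, set[str], set[str]]] = [
    ("E complex", {"U1"}, set()),              # U1 base-pairs with the 5' splice site
    ("A complex", {"U2"}, set()),              # U2 binds the branch site (displacing BBP)
    ("B complex", {"U4", "U5", "U6"}, set()),  # tri-snRNP particle joins
    ("C complex", set(), {"U1", "U4"}),        # U6 replaces U1; U4 release lets U6 pair with U2
]


def is_catalytically_active(snrnps: frozenset[str]) -> bool:
    """Simplified rule: U2 and U6 present, with U4 no longer masking U6."""
    return {"U2", "U6"} <= snrnps and "U4" not in snrnps


def assemble() -> list[tuple[str, frozenset[str]]]:
    """Apply each step in turn and record which snRNPs are bound afterwards."""
    present: set[str] = set()
    history = []
    for name, joining, leaving in ASSEMBLY_STEPS:
        if not leaving <= present:
            raise ValueError(f"{name}: {leaving - present} cannot leave before joining")
        present = (present | joining) - leaving
        history.append((name, frozenset(present)))
    return history


for name, snrnps in assemble():
    print(name, sorted(snrnps), "active" if is_catalytically_active(snrnps) else "inactive")
```

Only the C complex is reported as active, because in the B complex U6 is still paired with U4.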

Components of the splicing machinery arrive at or leave the complex at each step owing to the structural rearrangements necessary for the splicing reaction to proceed. There is evidence to suggest that some of the components do not arrive or leave precisely when indicated in the figure; they may, for example, remain present but weaken their association with the complex rather than dissociating completely. It is also not possible to be sure of the order of some changes shown, particularly the two steps involving changes in U6 pairing: when it takes over from U1 snRNP at the 5’ splice site, compared with when it takes over from U4 snRNP in binding U2 snRNP. Despite these uncertainties, Fig. 24 illustrates the critical involvement of different components of the machinery at different stages of the splicing reaction and the general dynamic nature of the spliceosome.

Some eukaryotic pre-mRNAs do not fall into the GU-AG intron category. They have different consensus sequences at their splice sites. These are AU-AC introns, which have been found in approximately 20 genes in organisms as diverse as humans, plants and Drosophila. These introns require U11 / U12 snRNPs.

(c) Polyadenylation of 3’ end (followed by termination of transcription): The final RNA processing event, polyadenylation of the 3’ end of pre-mRNA, is intimately linked with the termination of transcription. Just as with capping and splicing, the polymerase CTD tail is involved in recruiting the enzymes necessary for polyadenylation. Once the polymerase has reached the end of a gene, it encounters specific sequences that, after being transcribed into RNA, trigger the transfer of polyadenylation enzymes to that RNA, leading to three events (Fig. 25):

(i) Cleavage of the message; (ii) addition of many adenine residues to its 3’ end by poly (A) polymerase; and (iii) termination of transcription by the polymerase.

Eukaryotic primary transcripts extend beyond the 3’ end of the mature mRNA. Indeed, the nucleotide preceding the poly (A) is not the last nucleotide to be transcribed. Polyadenylation was once looked on as a ‘post transcriptional’ event, but it is now recognized that the process is an inherent part of the mechanism for termination of transcription by RNA Pol II.


Fig. 24: Spliceosome-mediated splicing (schematic of the splicing reaction: U1 snRNP, BBP and U2AF (65 and 35 subunits) bind the pre-mRNA; U2 snRNP binds at the branch-site A; the U4/U6/U5 tri-snRNP particle joins; U1 and U4 snRNPs leave; the spliced exons and the lariat form of the intron are released)

In mammals, polyadenylation is directed by a signal sequence in the mRNA, almost invariably 5’-AAUAAA-3’. The precursors are cleaved by a specific endonuclease that recognizes the sequence AAUAAA. Cleavage does not occur if this sequence or a segment of some 20 nucleotides on its 3’ side is deleted. The presence of internal AAUAAA sequences in some mature mRNAs indicates that AAUAAA is only part of the cleavage signal. This sequence is located between 10 and 30 nucleotides upstream of the dinucleotide 5’-CA-3’ at which cleavage occurs, and the cleavage site is followed 10-20 nucleotides later by a GU rich region. Both the poly (A) signal sequence and the GU rich region are binding sites for multisubunit protein complexes.

Cleavage and polyadenylation specificity factor (CPSF) binds the poly (A) signal sequence. Cleavage stimulation factor (CstF) binds the GU rich region.
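The spacing rules for the poly (A) signal can be turned into a rough scanner. This Python sketch is illustrative only: it assumes the 10-30 nt distance is measured from the end of AAUAAA to the CA, places the cut just after the CA, and calls a 10-nt window "GU rich" if at least 70% of its bases are G or U. These thresholds and all function names are hypothetical choices, not published parameters.

```python
import re

SIGNAL = "AAUAAA"


def gu_fraction(seq: str) -> float:
    """Fraction of G + U in seq (0.0 for an empty string)."""
    return sum(base in "GU" for base in seq) / len(seq) if seq else 0.0


def has_gu_element(rna: str, cut: int, window: int, min_gu: float) -> bool:
    """Look for a GU-rich window starting 10-20 nt downstream of the cleavage site."""
    for offset in range(10, 21):
        chunk = rna[cut + offset: cut + offset + window]
        if len(chunk) == window and gu_fraction(chunk) >= min_gu:
            return True
    return False


def predict_cleavage_sites(rna: str, window: int = 10, min_gu: float = 0.7) -> list[int]:
    """Return 0-based cut positions; the new 3' end would be rna[:cut], ending in CA."""
    sites = []
    for match in re.finditer(f"(?={SIGNAL})", rna):  # lookahead also finds overlapping signals
        signal_end = match.start() + len(SIGNAL)
        for gap in range(10, 31):                    # CA assumed 10-30 nt after the signal
            ca = signal_end + gap
            if rna[ca:ca + 2] == "CA" and has_gu_element(rna, ca + 2, window, min_gu):
                sites.append(ca + 2)
                break                                # first qualifying CA per signal
    return sites


if __name__ == "__main__":
    rna = "GCGCGC" + SIGNAL + "U" * 15 + "CA" + "C" * 12 + "UGUGUUGUGU" + "CCCC"
    print(predict_cleavage_sites(rna))  # [29]
```

A real signal is recognized by CPSF and CstF with far more tolerance than these fixed windows allow, so such a scanner can only flag candidate sites.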

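Fig. 25: Termination step involves cleavage followed by polyadenylation of transcript (the transcript is cut at the cleavage site while transcription continues downstream, and poly (A) polymerase (Poly A Pol) adds a tail of A residues, AAAA…, to the new 3’ end)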

In addition, poly (A) polymerase and at least two other protein factors must associate with the bound CPSF and CstF for polyadenylation to occur.

After cleavage by the endonuclease, a template-independent RNA polymerase called poly (A) polymerase adds about 250 adenylate residues to the 3’ end of the transcript. Virtually all eukaryotic mRNAs have a series of up to 250 adenosines at their 3’ ends. This enzyme uses ATP as a precursor and adds ‘A’ residues using the same chemistry as RNA polymerase. These ‘A’ residues are not specified by the DNA sequence, i.e. they are added without a template; thus, the long tail of A(s) is found in the RNA but not in the DNA. It is not clear what determines the length of the poly (A) tail, but the process involves other proteins that bind specifically to the poly (A) sequence (described below). The polymerase does not act at the extreme 3’ end of the primary transcript, but at an internal site, which is cleaved to create a new 3’ end to which the poly (A) tail is added. The reaction catalyzed is as follows:

RNA + n ATP → RNA-(AMP)n + n PPi
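The stoichiometry of the poly (A) polymerase reaction can be made concrete: each AMP residue added consumes one ATP and releases one pyrophosphate, so adding n residues releases n PPi. A minimal Python sketch, with a hypothetical function name and the default tail length of 250 taken from the text:

```python
def polyadenylate(cleaved_rna: str, tail_length: int = 250) -> tuple[str, int, int]:
    """Template-independent tail addition: RNA + n ATP -> RNA-(AMP)n + n PPi.

    Returns (product, ATP consumed, PPi released).
    """
    if tail_length < 0:
        raise ValueError("tail_length must be non-negative")
    atp_used = pp_released = tail_length  # one ATP in and one PPi out per AMP added
    return cleaved_rna + "A" * tail_length, atp_used, pp_released


product, atp, ppi = polyadenylate("GCCA", 5)
print(product, atp, ppi)  # GCCAAAAAA 5 5
```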

The additional factors required include polyadenylate-binding protein (PABP). PABPs perform the following functions: they help the polymerase to add the adenosines; possibly influence the length of the poly (A) tail that is synthesized; appear to play a role in maintenance of the tail after synthesis; and also play a role in translation.

In yeast, the signal sequences in the transcript are slightly different, but the protein complexes are similar to those in mammals, and polyadenylation is thought to occur by more or less the same mechanism.

CPSF is known to interact with TFIID and is recruited into the polymerase complex during the initiation stage. By riding along the template with RNA Pol II, CPSF is able to bind to the poly (A) signal sequence as soon as it is transcribed, initiating the polyadenylation reaction. Both CPSF and CstF contact the CTD of the polymerase. It has been suggested that the nature of these contacts changes when the poly (A) signal sequence is located and that this change alters the properties of the elongation complex so that termination becomes favored over continued RNA synthesis. As a result, transcription stops soon after the poly (A) signal sequence has been transcribed. The details of the termination step linking cleavage and polyadenylation to termination of transcription are outlined in Fig. 26. It is noteworthy that the long tail of A(s) is unique to transcripts made by RNA Pol II, a feature that allows experimental isolation of protein coding mRNAs by affinity chromatography. The mature mRNA is then transported from the nucleus.

It is not known what links polyadenylation to termination, but it is clear that the polyadenylation signal is required for termination (interestingly, RNA cleavage is not). Two basic models have been proposed to explain the link between polyadenylation and termination:

The first model proposes that the transfer of 3’ processing enzymes from the polymerase CTD tail to the RNA triggers a conformational change in the polymerase that reduces the processivity of the enzyme, leading to spontaneous termination soon afterward.

The second model proposes that the absence of a 5’ cap on the second RNA molecule (the RNA that continues to be synthesized downstream of the cleavage site) is sensed by the polymerase, which, as a result, recognizes the transcript as improper and terminates. The absence of the cap reflects the absence of the capping enzymes on the CTD at this stage of the transcription cycle (these enzymes are loaded onto the CTD at the point where initiation turns to elongation and are then displaced in favor of the splicing machinery).

The role of the poly (A) tail is still not firmly established despite much effort. Even though polyadenylation can be identified as an inherent part of the termination process, this does not explain the necessity of adding a poly (A) tail to the transcript. Evidence that the tail enhances translation efficiency and the stability of mRNA is accumulating. The poly (A) tail on pre-mRNA is thought to help stabilize the molecule because a poly (A)-binding protein binds to it, which should act to resist 3’ exonuclease action. In addition, the poly (A) tail may help in the translation of the mature mRNA in the cytoplasm. Blocking the synthesis of the poly (A) tail by exposure to 3’-deoxyadenosine (cordycepin) does not interfere with the synthesis of the primary transcript, and mRNA devoid of a poly (A) tail can be transported out of the nucleus. However, an mRNA molecule devoid of a poly (A) tail is usually a much less effective template for protein synthesis than one with a poly (A) tail. Thus, the poly (A) tail has a role in initiation of translation. This is further supported by research showing that poly (A) polymerase is repressed during those periods of the cell cycle when relatively little protein synthesis occurs. Indeed, some mRNAs are stored in an unadenylated form and receive the poly (A) tail only when translation is imminent. The half-life of an mRNA molecule may also be determined in part by the rate of degradation of its poly (A) tail. Histone pre-mRNAs are not polyadenylated, but are cleaved at a special sequence to generate their mature 3’ ends.


Fig. 26: Termination signal and the link between polyadenylation and termination of transcription by RNA Pol II. The pre-mRNA reads 5’…AAUAAA…(10-30 nt)…CA…(10-20 nt)…GU rich region…3’ and the polyadenylated mRNA reads 5’…AAUAAA…CAAAAAAAAAAAAAA 3’. CPSF is shown attached to the RNA Pol II elongation complex that is synthesizing RNA. CPSF binds to the polyadenylation signal sequence AAUAAA as soon as it is transcribed. This changes the interaction between CPSF and the CTD of RNA Pol II so that termination of transcription is now favored over continued elongation. CstF probably attaches to the GU rich region downstream of AAUAAA. CPSF is shown leaving the complex in order to bind to the polyadenylation signal, when in reality it may maintain its attachment to RNA Pol II during the polyadenylation process. Cleavage proteins attach to the signal sequence, and polyadenylate binding protein binds the poly (A) tail.


(d) Pre-mRNA methylation: The final modification or processing event that many pre-mRNAs undergo is specific methylation of certain bases. In vertebrates, the most common methylation event is on the N6 position of A residues, particularly when these A residues occur in the sequence 5’-RRACX-3’, where X is rarely G. Up to 0.1% of pre-mRNA A residues are methylated, and the methylations seem to be largely conserved in the mature mRNA, though their function is unknown.
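The RRACX consensus lends itself to a simple pattern search. This Python sketch (the function name is hypothetical) reports the A in each RRACX match as a candidate methylation site, using a regular-expression lookahead so that overlapping motifs are found. It skips matches where X is G by default, which simplifies the text's "rarely G" to "never G". A motif match marks only a candidate, not a site known to be methylated.

```python
import re

# R = purine (A or G); the methylated A is the third position of 5'-RRACX-3'.
RRACX = re.compile(r"(?=[AG][AG]AC([ACGU]))")


def candidate_m6a_sites(rna: str, allow_g_at_x: bool = False) -> list[int]:
    """Return 0-based positions of the A in each RRACX match (overlaps allowed)."""
    sites = []
    for match in RRACX.finditer(rna):
        if match.group(1) == "G" and not allow_g_at_x:
            continue                     # X is rarely G, so such matches are skipped
        sites.append(match.start() + 2)  # R R [A] C X: the A is two bases into the motif
    return sites


print(candidate_m6a_sites("AAACAGACU"))  # [2, 6]
print(candidate_m6a_sites("GGACG"))      # []
```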

Alternative mRNA processing

Alternative mRNA processing is the conversion of pre-mRNA species into more than one type of mature mRNA. Alternative processing can be achieved in four different ways:

(i) By using different poly (A) sites; (ii) by using different promoters; (iii) by retaining certain introns, or by retaining or removing certain exons; and (iv) by RNA editing.

(a) Alternative poly (A) sites: Some pre-mRNAs contain more than one poly (A) site, and these may be used under different circumstances (e.g. in different cell types) to generate different mature mRNAs; the cell or organism has a choice of which one to use. If the upstream site is used, sequences that control mRNA stability or location may be removed in the portion that is cleaved off. Thus, mature mRNAs with the same coding region but differing stabilities or locations could be produced in the same cell at frequencies that reflect the relative efficiencies (strengths) of the sites, and the cell would contain both types of mRNA. The efficiency of a poly (A) site may reflect how well it matches the consensus sequences. In other situations, one cell may exclusively use one poly (A) site, while a different cell uses another. The most likely explanation is that in one cell the stronger site is used by default, whereas in the other cell a factor is present that activates the weaker site so that it is used exclusively, or that prevents the stronger site from being used. In some cases, the use of alternative poly (A) sites causes different patterns of splicing to occur. In other cases, factors bind near to a particular site and activate or repress it.

(b) Alternative promoters: The use of different promoters in different cell types and at different developmental stages leads to the generation of different mature mRNAs.

(c) Alternative splicing: In many cases, different mature mRNAs can be generated from a particular type of gene transcript by varying the use of 5’- and 3’-splice sites. This is called alternative splicing. Hence, a single transcript can be spliced in multiple ways, resulting in a number of protein coding sequences. The alternative splicing events depicted in Fig. 27 are:

Exon skipped; intron retained; exon extended using cryptic splice sites; alternative exons.
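Exon skipping and mutually exclusive alternative exons can be enumerated combinatorially. In this Python sketch the function and exon names are hypothetical: each internal position lists its options, with None meaning the exon may be skipped. Intron retention and exon extension through cryptic splice sites are not modeled.

```python
from itertools import product
from typing import Optional


def enumerate_isoforms(first: str, internal: list[list[Optional[str]]], last: str) -> list[list[str]]:
    """List exon combinations for a transcript with constitutive first and last exons.

    ["E2", None] is a cassette exon that may be skipped; ["E3a", "E3b"] is a pair
    of mutually exclusive alternative exons.
    """
    isoforms = []
    for choice in product(*internal):  # one option per internal position
        isoforms.append([first, *(exon for exon in choice if exon is not None), last])
    return isoforms


for isoform in enumerate_isoforms("E1", [["E2", None], ["E3a", "E3b"]], "E4"):
    print("-".join(isoform))
# E1-E2-E3a-E4
# E1-E2-E3b-E4
# E1-E3a-E4
# E1-E3b-E4
```

The number of possible isoforms is the product of the number of options at each position, which is why a few alternative events can yield many mRNAs.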


Fig. 27: Types of splicing (a gene with Exon 1, Exon 2 and Exon 3 separated by Introns 1 and 2 is transcribed into a 5’ → 3’ primary transcript, which is spliced to give the normal mRNA or alternative products in which an exon is skipped, an exon is extended, an intron is retained or alternative exons are used)

By this strategy, a gene can give rise to more than one polypeptide product with partially overlapping sequences; the strategy is more common in higher eukaryotes. Some pre-mRNAs can be spliced in more than one way, generating alternative mRNAs. It is estimated that 30% of the genes in the human genome are spliced in alternative ways to generate more than one protein per gene. Some examples of alternatively spliced pre-mRNAs are those of troponin, tropomyosin, myosin, actin, fibronectin, fibrinogen, nerve growth factor, aldolase, alcohol dehydrogenase, calcitonin and SV40 T-antigen, and the Drosophila sxl, tra and dsx pre-mRNAs involved in sex determination.

As these splicing events occur differently in different cell types, it is likely that cell type-specific factors are responsible for activating or repressing the use of processing sites near to where they bind. Thus, SR proteins (serine-arginine rich proteins) and hnRNPs have been suggested to guide alternative splicing.

(d) RNA editing: An unusual form of RNA processing in which the sequence of the primary transcript is altered is called RNA editing. RNA editing, like RNA splicing, is a process in which the sequence of an RNA changes after or during its transcription, i.e. at the level of mRNA. In this form of RNA processing, the nucleotide sequence of the primary transcript is altered by changing, inserting or deleting residues at specific points along the molecule. Thus, the protein produced upon translation is different from that predicted from the gene sequence, i.e. the coding sequence in the RNA differs from the sequence of the DNA from which it was transcribed. This is thus a method for increasing protein diversity, similar to alternative splicing. RNA editing occurs in two different situations, with different causes.

There are two major mechanisms that mediate editing:

Site-specific deamination: In mammalian cells, there are cases in which a substitution occurs in an individual base in mRNA, causing a change in the sequence of the protein that is coded. Examples include apolipoprotein B mRNA in mammalian intestine and liver, and glutamate receptor mRNAs in rat brain.


The mammalian genome contains a single (interrupted) apolipoprotein B gene whose sequence is identical in all tissues, with a coding region of 4563 codons. This gene is transcribed into an mRNA that is translated into a protein of 512 kD, called apo B100, representing the full coding sequence, in the liver. A shorter form of the protein, called apo B48, of ~250 kD, is synthesized in the intestine. This protein consists of the N-terminal half of the full-length protein. It is translated from an mRNA whose sequence is identical with that of the liver mRNA except for a change (deamination by cytidine deaminase) from C to U at codon 2153 in the 26th exon. This substitution changes the codon CAA for glutamine into the ochre codon UAA for termination. The two proteins, though translated from the same gene, have different functions. Apo B48, which is formed only in the small intestine, functions in chylomicrons to transport dietary triacylglycerols from the intestine. On the other hand, apo B100, which is formed only in the liver, functions in VLDL, IDL and LDL to transport cholesterol from the liver to peripheral tissues.

Another example is provided by glutamate receptors in rat brain. Editing at one position changes a glutamine codon in DNA into a codon for arginine in RNA; the change affects the conductivity of the channel and therefore has an important effect on controlling ion flow through the receptor. At another position in the receptor, an arginine codon is converted to a glycine codon.

Guide RNA-directed (gRNA-directed) uridine insertion or deletion: In mitochondria of trypanosomes and leishmania, more widespread changes occur in transcripts of several genes, when bases are systematically added or deleted. Additions or deletions (most usually of uridine) occur in trypanosome and leishmania mitochondria and in paramyxovirus. Extensive editing reactions occur in trypanosomes, in which as many as half of the bases in an mRNA are derived from editing. The editing reaction uses a template consisting of a guide RNA that is complementary to the mRNA sequence. An enzyme complex including an endonuclease, a terminal uridylyltransferase and an RNA ligase catalyzes the reaction. The free nucleotide (UTP) is used as the source of the added residues. Such extensive editing is also called pan editing.

Besides the above-mentioned types, two other terms are associated with RNA editing. These are:

Insertional editing: This type of editing occurs with some RNAs, for example, the paramyxovirus P gene, which gives rise to at least two different proteins because of the insertion of Gs at specific positions in the mRNA. Guide RNAs do not specify these insertions; instead, they are added by the RNA Pol as the mRNA is being synthesized.

Polyadenylation editing: This type of editing is seen in many animal mitochondrial mRNAs. Five of the mRNAs transcribed from the human mitochondrial genome end with just a U or UA, rather than with one of the three termination codons. Polyadenylation converts the terminal U or UA into UAAAA…, thereby creating a UAA termination codon. This is one of several features that appear to have evolved in order to make the vertebrate mitochondrial genome as small as possible.
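The effect of the apo B and mitochondrial editing events on the reading frame can be checked with a few lines of Python. The sequences below are short toy ORFs, not the real apo B or mitochondrial sequences, and the function names are hypothetical: a single C to U change turns a CAA glutamine codon into a UAA stop, and adding A residues after a terminal UA completes a UAA stop codon.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons(orf: str) -> list[str]:
    """Split a reading frame (starting at its first base) into complete codons."""
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def first_stop(orf: str) -> Optional[int]:
    """1-based number of the first stop codon, or None if there is none."""
    for number, codon in enumerate(codons(orf), start=1):
        if codon in STOP_CODONS:
            return number
    return None


def deaminate_c(mrna: str, position: int) -> str:
    """Site-specific C -> U editing (cytidine deamination) at a 0-based position."""
    if mrna[position] != "C":
        raise ValueError(f"position {position} is {mrna[position]!r}, not C")
    return mrna[:position] + "U" + mrna[position + 1:]


orf = "AUGGCUCAAGGCUGA"  # toy ORF: Met-Ala-Gln-Gly-stop
print(first_stop(orf), first_stop(deaminate_c(orf, 6)))  # 5 3

truncated = "AUGGCUUA"  # toy mitochondrial mRNA ending in UA, no stop codon
print(first_stop(truncated), first_stop(truncated + "A" * 5))  # None 3
```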


(B) Post transcriptional processing of tRNA and rRNA (maturation of tRNA and rRNA)

tRNAs and rRNAs are synthesized as precursors that are processed, largely by nuclease cleavage; unlike mRNAs, they are not translated. In E. coli, three kinds of rRNA molecules and a tRNA molecule are excised from a single primary RNA transcript that also contains spacer regions. Other transcripts contain arrays of several kinds of tRNA or of several copies of the same tRNA. Mature rRNAs and tRNAs are generated by cleavage and other modifications of nascent RNA chains.

In eukaryotes, rRNA and tRNA molecules, in contrast with mRNAs and small RNAs that participate in splicing, do not have caps. Because rRNAs and tRNAs are non-coding, chemical modifications to their nucleotides affect only the structural features and possibly, catalytic activities of the molecules.

Post transcriptional processing of tRNA

The rRNA operons of E. coli contain coding sequences for tRNAs. In addition, there are other operons in E. coli that contain up to seven tRNA genes separated by spacer sequences. Mature tRNA molecules are processed from precursor transcripts of both of these types of operon by nucleases. The nucleases that cleave and trim the precursors of tRNA are highly precise. In prokaryotes, the generation of mature tRNA involves cleavage of a 5’ leader sequence, removal of 3’ terminal extra residues and chemical modifications of several bases and ribose units.

Similar to prokaryotic pre-tRNA, eukaryotic pre-tRNA contains extra nucleotides at the 5’ and 3’ ends and also modified bases. Besides, some eukaryotic pre-tRNAs and archaeal transcripts also contain introns, which are different from pre-mRNA introns. Such introns are rare in bacteria. The primary transcript forms a secondary structure with characteristic stems and loops, which allow endonucleases to recognize and cleave off the 5’ leader and the two 3’ nucleotides. In eukaryotes, unlike prokaryotes, the 5’-CCA-3’ at the 3’ end of the mature tRNAs is added by separate enzymatic reactions and is not encoded by the genes.

Various pre-tRNA processing events are summarized below:

(a) Removal of the 3’ terminal residues: Once the primary transcript has folded, it has characteristic stems and loops and extra nucleotides at the 3’ end in both prokaryotes and eukaryotes. The extra flanking nucleotides at the 3’ end are cleaved by an endonuclease, RNase E or F, near the base of the stem, so that the precursor tRNA still retains a few extra nucleotides. The exonuclease RNase D then removes the remaining extra nucleotides at the 3’ end, one at a time.

(b) Cleavage of a 5’-leader sequence: The primary transcript of tRNA contains extra nucleotides at the 5’ end in both prokaryotes and eukaryotes. These nucleotides are cleaved off by an endonuclease, RNase P, which generates the correct 5’ terminus of all tRNA molecules in E. coli. This enzyme is a ribozyme: its RNA molecule is catalytically active and capable of catalyzing the reaction in the absence of protein. It is therefore a very simple ribonucleoprotein (RNP). RNase P enzymes are found in both prokaryotes and eukaryotes, being located in the nucleus of the latter; they are therefore small nuclear RNPs (snRNPs). In E. coli, the endonuclease is composed of a 377 nucleotide RNA and a small basic protein of 13.7 kD. The in vitro RNase P ribozyme reaction requires a higher Mg++ concentration than occurs in vivo, so the protein component probably helps to catalyze the reaction in cells.

(c) Attachment of CCA at 3’ end: The enzyme tRNA nucleotidyl transferase then adds CCA at the 3’ end in eukaryotes. tRNA nucleotidyl transferase is an unusual enzyme that binds three ribonucleoside triphosphate precursors in separate active sites and catalyzes the formation of phosphodiester bonds to produce the CCA (3’) sequence. The addition is therefore neither DNA- nor RNA-dependent; the binding sites of the enzyme act as the template. A major difference between prokaryotes and eukaryotes is that, in the former, the 5’-CCA-3’ at the 3’ end of the mature tRNAs is encoded by the genes, whereas in eukaryotic nuclear-encoded tRNAs it is not.

(d) Chemical modifications of several bases and ribose units: Another processing event is the modification of bases and ribose units of tRNAs in both prokaryotes and eukaryotes. Such unusual bases are found in all tRNA molecules. They are formed by the enzymatic modification of a standard ribonucleotide in a tRNA precursor. Modifications include methylation, acetylation, deamination, reduction, rearrangement, and attachment of isopentenyl or SH groups to bases. Many of these modifications were first identified in tRNAs, in which approximately one in ten nucleotides becomes altered. For example, uridylate residues are modified after transcription to form ribothymidylate and pseudouridylate. These modifications generate diversity, allowing greater structural and functional versatility. They are thought to mediate the recognition of individual tRNAs by the enzymes that attach amino acids to these molecules and to increase the range of interactions that can occur between tRNAs and codons during translation, enabling a single tRNA to recognize more than one codon.

Most of these modifications are carried out directly on an existing nucleotide within the transcript, but two modified nucleotides, queuosine and wyosine, are put in place by cutting out an entire nucleotide and replacing it with the modified version.

Different pre-tRNAs are processed in a similar way, but the base modifications are unique to each particular tRNA type.

(e) Removal of introns: Some eukaryotic and archaeal pre-tRNAs also contain introns. In eukaryotes and archaea, therefore, the next step in tRNA processing is removal of the intron, which occurs by endonucleolytic cleavage at each end of the intron followed by ligation of the two tRNA halves. Introns of yeast pre-tRNAs can be processed in vertebrates, so the eukaryotic tRNA processing machinery seems to have been highly conserved during evolution. Fig. 28 shows various processing events of pre-tRNA in E. coli.
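The trimming steps (a) to (c) and the intron removal in step (e) can be illustrated as operations on a sequence written 5’→3’. This is only a toy sketch: the leader length, mature-body length, intron coordinates and the number of extra nucleotides left by the endonuclease (`overhang`) are hypothetical inputs here, whereas the real enzymes locate their sites from tRNA structure. The function names and the toy sequences are invented for illustration, and the order of the steps follows the list above rather than claiming the order used in cells.

```python
def process_pre_trna(pre_trna: str, leader_len: int, body_len: int,
                     overhang: int = 3, cca_encoded: bool = True) -> str:
    """Toy maturation of a pre-tRNA written 5'->3' (steps a to c).

    leader_len  -- length of the 5' leader (hypothetical input)
    body_len    -- length of the mature tRNA, including CCA if gene-encoded
    overhang    -- extra 3' nucleotides left by the endonuclease (assumed value)
    cca_encoded -- True for E. coli-like genes, False for eukaryotic nuclear genes
    """
    mature_end = leader_len + body_len  # index just past the mature 3' end

    # (a) Endonuclease (RNase E or F) cuts downstream, leaving a short overhang.
    rna = pre_trna[:mature_end + overhang]
    # RNase D, an exonuclease, removes the extra 3' nucleotides one at a time.
    while len(rna) > mature_end:
        rna = rna[:-1]

    # (b) RNase P cuts once to release the whole 5' leader.
    rna = rna[leader_len:]

    # (c) If CCA is not gene-encoded, tRNA nucleotidyl transferase adds it
    # without a nucleic acid template.
    if not cca_encoded:
        rna += "CCA"
    return rna


def remove_trna_intron(pre_trna: str, intron_start: int,
                       intron_end: int) -> tuple[str, str]:
    """Toy step (e): cut at each end of the intron, then ligate the halves.

    intron_start and intron_end are 0-based, half-open coordinates
    (hypothetical inputs). Returns (ligated tRNA, released intron).
    """
    if not 0 < intron_start < intron_end < len(pre_trna):
        raise ValueError("intron must lie strictly inside the pre-tRNA")
    five_half = pre_trna[:intron_start]         # piece 5' of the first cut
    intron = pre_trna[intron_start:intron_end]  # piece between the two cuts
    three_half = pre_trna[intron_end:]          # piece 3' of the second cut
    return five_half + three_half, intron       # ligation joins the halves


body = "GCGGAUUUAGCUCAG"  # arbitrary toy sequence, not a real tRNA
print(process_pre_trna("AAAAA" + body + "CCA" + "UUUUUUU", 5, len(body) + 3))
print(process_pre_trna("AAAAA" + body + "UUUUUUU", 5, len(body), cca_encoded=False))
print(remove_trna_intron("GGGAAUUUCCC", 3, 8))  # ('GGGCCC', 'AAUUU')
```

Both calls to `process_pre_trna` give the same mature sequence. The E. coli-like precursor already carries CCA in its gene, so it only needs trimming. The eukaryote-like precursor receives CCA after trimming.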

Post transcriptional processing of rRNA

In the prokaryote E. coli, there are seven different rRNA operons, dispersed throughout the genome and named rrnH, rrnE, etc. Each operon contains one copy each of the 5S, 16S and 23S rRNA sequences (Fig. 29). These operons also contain coding sequences for tRNA molecules, and their primary transcripts are processed to give both rRNA and tRNA molecules. The initial transcript has a sedimentation coefficient of 30S (~6500 nucleotides) and is normally quite short-lived.

[Schematic: pre-tRNA with D loop, T loop, anticodon loop and variable arm; the 5’ leader is removed by RNase P, and the 3’ trailer is cut by an endonuclease (RNase E / F) and trimmed by RNase D, leaving the 3’ CCA end]

Fig. 28: Pre-tRNA processing in E. coli

Fig. 29: Pre-rRNA transcript in prokaryotes (30S, ~6500 nt), containing 16S, 23S and 5S rRNA sequences and 4S tRNA sequences

In many eukaryotes, the precursor rRNA contains one copy each of the 18S, 5.8S and 28S coding regions; the 5.8S and 28S rRNAs together are the equivalent of the prokaryotic 23S rRNA (Fig. 30). The eukaryotic 5S rRNA is transcribed by RNA Pol III from unlinked genes to give a 121-nucleotide transcript, which undergoes little or no processing. The post-transcriptional processing of rRNA takes place in a defined series of steps:

(a) Modification of bases and ribose units: The primary rRNA transcript folds into a number of stem-loop structures by base pairing between complementary sequences in the transcript. This secondary structure of stems and loops allows some proteins to bind, forming a ribonucleoprotein (RNP) complex. Many of these proteins remain attached to the RNA and become part of the ribosome. The same modifications occur at the same positions in all copies of an rRNA, and these modified positions are, to a certain extent, the same in different species. Functions for the modifications have not been identified, although most occur within
those parts of rRNAs thought to be most critical for the activity of these molecules in ribosomes. Modified nucleotides might, for example, be involved in rRNA-catalyzed reactions such as peptide bond synthesis.

Fig. 30: Pre-rRNA transcript in eukaryotes (45S, ~13000 nt), containing 18S, 5.8S and 28S rRNA sequences; the 5S rRNA is transcribed separately

After the binding of proteins, modifications such as methylation of bases (usually adenine) and of ribose take place, using S-adenosylmethionine (SAM) as the methyl donor. Bacterial rRNAs are modified by enzymes that directly recognize the sequences and / or structures of the RNA regions containing the nucleotides to be modified; in contrast, methylation in eukaryotes requires small nucleolar RNPs (snoRNPs). Bacterial rRNAs are also less heavily modified than eukaryotic ones. The snoRNAs are 70-100 nucleotides long and are located in the nucleolus. They contain segments of 10-21 nucleotides that are precisely complementary to segments of mature rRNA containing 2’-O-methylation sites. These guide segments lie between the conserved sequence motifs box C (RUGAUGA) and box D (CUGA), which are located on the 5’ and 3’ sides of the complementary segments, respectively. The methylated rRNA nucleotide is the one that base pairs with the snoRNA nucleotide located five nucleotides upstream of box D. Methylation is mediated by a complex of nucleolar proteins that includes a methyltransferase. Conversion of uridine to pseudouridine involves a second class of snoRNAs carrying the conserved box H / ACA motifs: the ACA box (ACANNN) at the 3’ end and box H (consensus ANANNA) on the 5’ side of it. These snoRNAs form a specific base-paired interaction with the target site containing the U, which is then recognized by the modifying enzyme.
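The box C / D rule for locating a 2’-O-methylation site can be sketched in a few lines of Python. This is a toy under stated assumptions. The guide is taken to be exactly 12 nucleotides long (an arbitrary choice within the 10-21 nucleotide range above). It is assumed to sit immediately 5’ of the last box D and to pair perfectly with the rRNA. The sequences and function names are invented. Because guide and rRNA pair antiparallel, the snoRNA nucleotide five positions upstream of box D pairs with the fifth nucleotide (counting 5’→3’) of the rRNA target segment.

```python
import re

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
BOX_C = re.compile(r"[AG]UGAUGA")  # RUGAUGA, where R = A or G
BOX_D = "CUGA"


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def predict_methylation_site(snorna: str, rrna: str, guide_len: int = 12) -> int:
    """0-based index of the rRNA nucleotide predicted to be 2'-O-methylated.

    Illustrative assumptions: the guide is exactly `guide_len` nucleotides
    immediately 5' of the last box D and pairs perfectly with the rRNA.
    """
    box_c = BOX_C.search(snorna)
    box_d = snorna.rfind(BOX_D)
    if box_c is None or box_d == -1:
        raise ValueError("box C or box D not found")
    guide_start = box_d - guide_len
    if guide_start < box_c.end():
        raise ValueError("guide would overlap box C")
    guide = snorna[guide_start:box_d]
    # The rRNA target is the antiparallel complement of the guide.
    target_start = rrna.find(reverse_complement(guide))
    if target_start == -1:
        raise ValueError("no perfectly complementary site in the rRNA")
    # snoRNA position box_d - 5 pairs with rRNA position target_start + 4.
    return target_start + 4


# Invented toy sequences: box C + spacer + guide + box D + tail.
snorna = "GUGAUGAAA" + "UGCUAGCUACGU" + "CUGA" + "CC"
rrna = "GGAA" + "ACGUAGCUAGCA" + "UUCC"
site = predict_methylation_site(snorna, rrna)
print(site, rrna[site])  # 8 A
```

Real guides vary in length and may tolerate mismatches outside the central region, so a practical search would score imperfect duplexes instead of using an exact `find`.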

The chemical modifications occurring during maturation of rRNA and tRNA are listed in Table 11.

(b) Cleavage of precursor rRNA by nucleases: In E. coli, cleavage occurs in two steps. The primary cleavage event, mainly carried out by RNase III, releases precursors of the 5S, 16S and 23S molecules. In the secondary step, RNases M5, M16 and M23, respectively, cleave at the 5’ and 3’ ends of each of these precursors, releasing the mature rRNAs (Fig. 31). In mammals, the 47S pre-rRNA (13500 nucleotides) undergoes a number of cleavages, first in the external transcribed spacers (ETSs) 1 and 2. Cleavages in the internal transcribed spacers (ITSs) then release the 20S pre-rRNA from the 32S pre-rRNA (Fig. 32). Both of these precursors must be trimmed further, and the 5.8S region must base pair with the 28S rRNA, before the mature molecules are produced. As with prokaryotic pre-rRNA, the precursor folds and complexes with proteins as it is being transcribed; in eukaryotes this takes place in the nucleolus.
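The two processing pathways just described can be written down as a small precursor-to-products map and followed recursively to the end products. This is a simplified sketch. The intermediates and the 5’→3’ order come from the text and the labels of Figs. 31 and 32. Enzyme names appear only as comments, and "pre-tRNA → tRNA" stands for the separate pre-tRNA processing steps described earlier. The dictionary layout and the function name are illustrative choices, not a standard representation.

```python
# Precursor -> products, following the pathways described above.
ECOLI_PATHWAY = {
    # Primary cleavage (mainly RNase III; RNases P, F and E act on the tRNAs)
    "30S primary transcript": ["pre-16S rRNA", "pre-tRNA", "pre-23S rRNA",
                               "pre-5S rRNA", "pre-tRNA"],
    # Secondary cleavage at both ends of each precursor
    "pre-16S rRNA": ["16S rRNA"],  # RNase M16
    "pre-23S rRNA": ["23S rRNA"],  # RNase M23
    "pre-5S rRNA": ["5S rRNA"],    # RNase M5
    "pre-tRNA": ["tRNA"],          # pre-tRNA processing steps
}

MAMMALIAN_PATHWAY = {
    "47S pre-rRNA": ["45S pre-rRNA"],                  # first cleavages, in the ETSs
    "45S pre-rRNA": ["41S pre-rRNA"],
    "41S pre-rRNA": ["20S pre-rRNA", "32S pre-rRNA"],  # cleavage in the ITSs
    "20S pre-rRNA": ["18S rRNA"],
    "32S pre-rRNA": ["5.8S rRNA", "28S rRNA"],         # 5.8S base pairs with 28S
}


def mature_products(pathway: dict[str, list[str]], rna: str) -> list[str]:
    """Follow cleavages recursively and return the end products in order."""
    products = pathway.get(rna)
    if products is None:  # no further cleavage listed: an end product
        return [rna]
    result: list[str] = []
    for product in products:
        result.extend(mature_products(pathway, product))
    return result


print(mature_products(ECOLI_PATHWAY, "30S primary transcript"))
print(mature_products(MAMMALIAN_PATHWAY, "47S pre-rRNA"))
```

Keeping the pathway as data rather than code makes it easy to compare the prokaryotic and mammalian routes with the same traversal function.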

Table 11: Examples of chemical modifications of nucleotides during rRNA and tRNA processing

S. No. | Modification | Details | Examples
1 | Methylation | Addition of one or more methyl (-CH3) groups to the base or sugar | Methylation of guanosine gives 1-methylguanosine; 1-methyladenosine; N2,N2-dimethylguanosine; N7-methylguanosine; 3-methylcytidine; ribothymidine
2 | Deamination | Removal of an amino (-NH2) group from the base | Deamination of adenosine gives inosine
3 | Sulphur substitution | Replacement of oxygen with sulphur | Formation of 4-thiouridine
4 | Base isomerization (rearrangement) | Changing the positions of atoms in the ring component of the base | Isomerization of uridine gives pseudouridine
5 | Double bond saturation (reduction) | Converting a double bond to a single bond | Double bond saturation converts uridine to dihydrouridine
6 | Acetylation | Addition of an acetyl group to the base | N4-acetylcytidine
7 | Nucleotide replacement | Replacement of an existing nucleotide with a new one | Incorporation of queuosine and wyosine

[Schematic: the primary transcript (pre-16S rRNA, pre-tRNA, pre-23S rRNA, pre-5S rRNA, pre-tRNA) is cut by RNases III, P, F and E into precursors, which RNases M16, M23 and M5 trim to the mature 16S, 23S and 5S rRNAs]

Fig. 31: Processing of E. coli primary rRNA transcript

[Schematic: the 47S pre-rRNA primary transcript (ETS1, pre-18S rRNA, ITS1, pre-5.8S rRNA, ITS2, pre-28S rRNA, ETS2) is cleaved by RNases via 45S and 41S intermediates to the 20S and 32S pre-rRNA precursors, which give the mature 18S, 5.8S and 28S rRNAs]

Fig. 32: Processing of mammalian primary rRNA transcript

(c) Removal of introns: Some eukaryotic and archaeal rRNA precursors, for example that of Tetrahymena thermophila, contain an intron in the precursor for the largest rRNA. Such introns in pre-rRNA are extremely rare in bacteria. These pre-rRNAs undergo an unusual form of processing before they can function: the RNA folds into an enzymatically active form, or ribozyme, and splices out the intron. Although this process occurs in vivo in the presence of protein, it has been shown that the intron can excise itself in the test tube in the complete absence of protein.

Inhibitors of transcription

There are two types of inhibitors of transcription: (i) RNA Pol binding inhibitors and (ii) DNA specific inhibitors.

RNA Pol binding inhibitors

Inhibitors that bind noncovalently to RNA Pol and thereby inhibit it are called RNA Pol binding inhibitors, for example rifamycin, rifampicin, streptolydigin and α-amanitin (Fig. 33).

Two related antibiotics, rifamycin B, which is produced by Streptomyces mediterranei, and its semisynthetic derivative rifampicin, specifically inhibit transcription by prokaryotic but not eukaryotic RNA polymerases. This selectivity and their high potency (bacterial RNA Pol is 50% inhibited by 2 × 10⁻⁸ M rifamycin) have made them medically useful bactericidal agents against Gram-positive bacteria and tuberculosis. Rifamycins inhibit neither the binding of RNA Pol to the promoter nor the formation of the first phosphodiester bond, but they prevent further chain
elongation. The inactivated RNA Pol remains bound to the promoter, thereby blocking initiation by uninhibited enzyme molecules.
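To see what a half-inhibition concentration of 2 × 10⁻⁸ M implies, one can assume the simplest model, in which the inhibited fraction is hyperbolic in inhibitor concentration: fraction inhibited = [I] / ([I] + IC50). This single-site model is an assumption made for illustration. The text gives only the 50% value, and the constant and function names below are hypothetical.

```python
IC50_RIFAMYCIN_M = 2e-8  # molar concentration giving 50% inhibition (from the text)


def fraction_inhibited(inhibitor_m: float, ic50_m: float = IC50_RIFAMYCIN_M) -> float:
    """Fraction of RNA Pol inhibited, assuming simple hyperbolic inhibition."""
    if inhibitor_m < 0:
        raise ValueError("concentration must be non-negative")
    return inhibitor_m / (inhibitor_m + ic50_m)


for conc in (2e-9, 2e-8, 2e-7):
    print(f"{conc:.0e} M -> {fraction_inhibited(conc):.2f}")
```

Under this model, a tenfold lower concentration inhibits about 9% of the enzyme, and a tenfold higher one inhibits about 91%.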

[Structures of rifampicin, rifamycin B and amanitin]

Fig. 33: RNA Pol binding inhibitors

The poisonous mushroom Amanita phalloides contains a series of unusual bicyclic octapeptides, such as α-amanitin, which disrupts mRNA formation in animal cells by blocking Pol II and, at higher concentrations, Pol III. Neither Pol I nor bacterial RNA Pol is sensitive to α-amanitin, nor is the RNA Pol II of Amanita phalloides itself.
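The selective sensitivity described above can be summarized as a small lookup. This is a qualitative sketch only: "low" and "high" are coarse concentration categories assumed for illustration, since the text gives no numerical concentrations. The dictionary and function names are hypothetical.

```python
# Qualitative sensitivity to alpha-amanitin, as described in the text:
# "low"  -> blocked at low concentrations
# "high" -> blocked only at higher concentrations
# None   -> not sensitive
AMANITIN_SENSITIVITY = {
    "RNA Pol I": None,
    "RNA Pol II": "low",
    "RNA Pol III": "high",
    "bacterial RNA Pol": None,
    "RNA Pol II of Amanita phalloides": None,
}


def blocked_polymerases(level: str) -> list[str]:
    """Polymerases expected to be blocked at a 'low' or 'high' alpha-amanitin level."""
    blocked_at = {"low": {"low"}, "high": {"low", "high"}}
    if level not in blocked_at:
        raise ValueError("level must be 'low' or 'high'")
    return [pol for pol, s in AMANITIN_SENSITIVITY.items() if s in blocked_at[level]]


print(blocked_polymerases("low"))   # ['RNA Pol II']
print(blocked_polymerases("high"))  # ['RNA Pol II', 'RNA Pol III']
```

A higher concentration still blocks everything a lower one does, which is why the "high" set includes the "low" category.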

DNA specific inhibitors

Inhibitors that bind to the DNA template, rather than to RNA Pol, are called DNA specific inhibitors, for example actinomycin D, ethidium bromide, acridine, aflatoxin and 2-acetylaminofluorene (Fig. 34).

Actinomycin D, produced by Streptomyces antibioticus, is a DNA specific inhibitor. It binds tightly and specifically to duplex DNA and, in doing so, strongly inhibits both transcription and DNA replication in prokaryotes and eukaryotes, presumably by interfering with the passage of RNA and DNA polymerases. At low concentrations it inhibits transcription without significantly affecting DNA replication. It does not prevent RNA Pol from binding to DNA.

Several other intercalating agents, including ethidium bromide and proflavin, also inhibit nucleic acid synthesis, presumably by similar mechanisms. Acridine inhibits RNA synthesis in a fashion similar to actinomycin D, i.e. by intercalation and deformation of DNA. Ethidium bromide is a DNA specific dye that intercalates between the base pairs of DNA and binds preferentially to supercoiled DNA. Aflatoxin (Fig. 34), obtained from the fungus Aspergillus flavus, inhibits both replication and transcription. 2-Acetylaminofluorene is a synthetic carcinogen that also inhibits both replication and transcription.

Reverse transcriptase (RT) (RNA directed DNA polymerase)

The existence of RT in RNA viruses was predicted by Howard Temin in 1962, and the enzymes were ultimately detected by Temin and, independently, by David Baltimore in 1970.

Source: The genes of all cellular organisms are made of DNA. The same is true for some viruses, but in others the genetic material is RNA. Viruses are genetic elements enclosed in protein coats that can move from one cell to another but are not capable of independent growth. One well-studied example of an RNA virus is tobacco mosaic virus (TMV), which infects the leaves of tobacco plants. This virus consists of a single strand of RNA (6390 nucleotides) surrounded by a protein coat of 2130 identical subunits. An RNA-directed RNA polymerase catalyzes the replication of this viral RNA. Another important class of RNA viruses comprises the retroviruses, so called because their genetic information flows from RNA to DNA rather than from DNA to RNA. This class includes HIV-1 as well as a number of RNA viruses that produce tumors in susceptible animals. Retrovirus particles contain two copies of a single-stranded RNA genome.

[Structures of aflatoxin B1, ethidium bromide, acridine and actinomycin D; actinomycin D is shown as a phenoxazone ring system carrying two cyclic peptides of Thr, D-Val, Pro, sarcosine and methyl-Val]

Fig. 34: DNA specific inhibitors

Reverse transcriptases have been isolated and purified from several different RNA tumor viruses; they have molecular weights ranging from 70000 to 160000. RNA viruses containing RTs are known as retroviruses (retro is the Latin prefix for "backward"). RTs closely resembling the reverse transcriptase of some RNA tumor viruses have also been isolated from malignant cells of some animals and from human patients with leukemia. However, RTs have also been found in cells of animals and people thought to be normal and not infected by tumor viruses, and in wild type E. coli. Telomerase is also a specialized RT.

Reaction catalyzed and properties: On infection by a retrovirus, the single-stranded RNA viral genome (~10000 nucleotides) and the enzyme enter the host cell. RT first catalyzes the synthesis of a DNA strand complementary to the viral RNA, then degrades the RNA strand of the resulting RNA-DNA hybrid and replaces it with DNA. The resulting duplex DNA often becomes incorporated into the genome of the eukaryotic host cell. These integrated (and dormant) viral genes can be activated and transcribed, and the gene products, namely viral proteins and the viral RNA genome itself, are packaged as new viruses (Scheme 4).

RNA
↓ RT (RNA dependent DNA Pol)
Formation of RNA-DNA hybrid
↓ RT (RNaseH)
Degradation of RNA strand
↓ RT (DNA dependent DNA Pol)
Synthesis of second strand of DNA
↓ Integrase
Integration into host genome
↓ Transcription and translation
Formation of viral proteins and viral RNA
↓
Packaging of new virus

Scheme 4: Flow diagram indicating role of reverse transcriptase in retroviral cycle

Like many DNA and RNA Pols, RT contains Zn++. Each RT is most active with the RNA of its own virus, but each can be used experimentally to make DNA complementary to a variety of RNAs. The DNA synthesis and RNA degradation activities use separate active sites on the protein. Reverse transcriptases closely resemble the DNA-directed DNA polymerases in that they make DNA in the 5’ → 3’ direction, use deoxyribonucleotides as precursors and require both a template and a primer strand with a free 3’-OH terminus. RTs preferentially use RNA templates; they can also use DNA templates, although these are less effective. RTs are very active on natural RNA templates, including the very large RNAs present in viral particles, and the DNAs produced hybridize with their RNA templates. RTs, like RNA Pols, lack a 3’ → 5’ proofreading exonuclease and generally have an error rate of about 1 per 20000 nucleotides added. An error rate this high is extremely unusual in DNA replication but appears to be a feature of most enzymes that replicate the genomes of RNA viruses. A consequence is a high mutation rate and faster viral evolution, which is a factor in the frequent appearance of new strains of disease-causing retroviruses.
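A quick calculation shows what an error rate of 1 per 20000 nucleotides means for a retroviral genome of ~10000 nucleotides (both figures from the text). The sketch assumes that errors occur independently at each position with the same probability. That is a simplifying assumption, not a property stated in the source.

```python
import math

ERROR_RATE = 1 / 20000   # errors per nucleotide added (from the text)
GENOME_LENGTH = 10000    # approximate retroviral RNA genome length (from the text)

# Expected number of misincorporations per complete copy of the genome.
expected_errors = ERROR_RATE * GENOME_LENGTH

# Probability that a copy is error-free, assuming independent errors.
p_error_free = (1 - ERROR_RATE) ** GENOME_LENGTH

print(f"expected errors per copy:          {expected_errors:.2f}")              # 0.50
print(f"probability of an error-free copy: {p_error_free:.3f}")                 # 0.607
print(f"Poisson approximation exp(-m):     {math.exp(-expected_errors):.3f}")   # 0.607
```

On average there is one error in every two genome copies, and about 39% of copies carry at least one error, which is consistent with the high mutation rate described above.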

RT catalyzes three different reactions:

(i) RNA dependent DNA synthesis, or reverse transcriptase activity
(ii) RNA degradation, or RNaseH activity
(iii) DNA dependent DNA synthesis, or DNA polymerase activity
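How the three activities turn an RNA genome into duplex DNA (cDNA) can be shown at the sequence level. This is a toy sketch: it ignores the primer requirement and copying errors, represents the RNA-DNA hybrid simply as a pair of strings, and uses hypothetical function names. All strands are written 5’→3’.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}      # template RNA base -> new DNA base
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}  # template DNA base -> new DNA base


def rna_dependent_dna_synthesis(rna: str) -> str:
    """Activity (i): first DNA strand, antiparallel and complementary to the RNA."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def rnase_h(hybrid: tuple[str, str]) -> str:
    """Activity (ii): degrade the RNA strand of an RNA-DNA hybrid, keeping the DNA."""
    _rna, dna = hybrid
    return dna


def dna_dependent_dna_synthesis(dna: str) -> str:
    """Activity (iii): second DNA strand, complementary to the first."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def reverse_transcribe(rna: str) -> tuple[str, str]:
    """Return the two strands of the duplex DNA made from an RNA template."""
    first = rna_dependent_dna_synthesis(rna)   # forms the RNA-DNA hybrid
    single = rnase_h((rna, first))             # leaves single-stranded DNA
    second = dna_dependent_dna_synthesis(single)
    return first, second


first, second = reverse_transcribe("AUGGCU")
print(first, second)  # AGCCAT ATGGCT
```

The second strand has the same sequence as the original RNA, with T in place of U, which is why cDNA made this way can stand in for the gene that the mRNA came from.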

Functions and applications: The function of RTs in normal cells is not understood; their presence suggests that the transcription of messages from RNA into DNA is a normal process, for example in the synthesis of multiple copies of certain genes. The recognition of RTs has thus opened some new avenues of research in biochemical genetics. RTs have become important reagents in the study of DNA-RNA relationships and in DNA cloning techniques. They make possible the synthesis of DNA complementary to an mRNA template, and synthetic DNA prepared in this manner, called complementary DNA (cDNA), can be used to clone cellular genes.