kyc/teaching/files/543-05/2010 project/8b.doc  · web viewthe completed assembly of transcription...


Upload: truongnhi

Post on 16-Dec-2018


Chapter 8 part 2 Transcription

Outline

1. Transcription initiation
   a. Transcription factors
   b. Cis-regulatory elements
      i. TATA box
      ii. TBP - TATA binding protein
      iii. TFIIA-TBP-DNA ternary complex
      iv. GC & CAAT box
   c. Trans-activator-DNA binding
   d. DNA binding domains
2. RNA posttranscriptional processing
   a. 5’ capping
   b. Poly-A Tail
   c. t-RNA processing
3. Splicing
   a. Overview
   b. Mechanism
      i. Spliceosome
      ii. Branch point
      iii. Catalytic center
   c. Alternative splicing
   d. Self-Splicing

1. Transcription initiation

In bacteria, a domain of prokaryotes, transcription begins with the binding of RNA polymerase to the promoter in DNA. Transcription initiation is more complex in eukaryotes. Eukaryotic RNA polymerase does not directly recognize the core promoter sequences. Instead, a collection of proteins called transcription factors mediates the binding of RNA polymerase and the initiation of transcription. Only after certain transcription factors are attached to the promoter does the RNA polymerase bind to it. The completed assembly of transcription factors and RNA polymerase bound to the promoter is called a transcription initiation complex. Transcription in the archaea domain is similar to transcription in eukaryotes.

1.a Transcription Factors

In the field of molecular biology, a transcription factor (sometimes called a sequence-specific DNA binding factor) is a protein that binds to specific DNA sequences and thereby controls the transfer (or transcription) of genetic information from DNA to mRNA. Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes. A defining feature of transcription factors is that they contain one or more DNA-binding domains (DBDs), which attach to specific sequences of DNA adjacent to the genes that they regulate.

Note of interest: There are approximately 2600 proteins in the human genome that contain DNA-binding domains, and most of these are presumed to function as transcription factors. Therefore, approximately 10% of genes in the genome code for transcription factors, which makes this family the single largest family of human proteins.

In eukaryotes, an important class of transcription factors called general transcription factors (GTFs) is necessary for transcription to occur. Many of these GTFs do not actually bind DNA but are part of the large transcription preinitiation complex that interacts with RNA polymerase directly. The most common GTFs are TFIIA, TFIIB, TFIID (see also TATA binding protein), TFIIE, TFIIF, and TFIIH. The preinitiation complex binds to promoter regions of DNA upstream of the gene that it regulates.

1.b Cis-regulatory elements

A cis-regulatory element or cis-element is a region of DNA or RNA that regulates the expression of genes located on that same molecule of DNA. This term is constructed from the Latin word cis, which means "on the same side as". These cis-regulatory elements are often binding sites for one or more trans-acting factors. A cis-element may be located 5' to the coding sequence of the gene it controls (in the promoter region or further upstream), in an intron, or 3' to the gene's coding sequence, either in the untranslated or untranscribed region. In diagram 1.b.1 we see the general pattern for cis-acting control elements.

Diagram 1.b.1

1.b.i TATA Box

The TATA box (also called the Goldberg-Hogness box) is a DNA sequence (cis-regulatory element) found in the promoter region of genes in archaea and eukaryotes; approximately 24% of human genes contain a TATA box within the core promoter. The TATA box has the core DNA sequence 5'-TATAAA-3' or a variant, which is usually followed by three or more adenine bases. It is usually located 25 base pairs upstream of the transcription start site. Figure 1.b.2 shows the base frequency at each position of the TATA box.
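The positional rule above can be illustrated with a minimal sketch that scans a promoter for the TATAAA core and reports each match relative to the transcription start site. The function name `find_tata_boxes`, the toy promoter, and the choice to match only the exact core (no variants) are assumptions made for illustration.

```python
def find_tata_boxes(promoter: str, tss_index: int, core: str = "TATAAA"):
    """Return the position of each core match relative to the transcription start site.

    promoter  -- coding-strand DNA written 5'->3'
    tss_index -- 0-based index of the first transcribed base (+1) in `promoter`
    """
    seq = promoter.upper()
    positions = []
    start = seq.find(core)
    while start != -1:
        # Upstream positions are negative; the base just before +1 is -1.
        positions.append(start - tss_index)
        start = seq.find(core, start + 1)
    return positions


# Hypothetical promoter: 10 filler bases, the TATAAA core, 19 filler bases,
# then the first transcribed base.
example = "GCGCGCGCGC" + "TATAAA" + "GCGCGCGCGCGCGCGCGCG" + "ATGGCC"
tss = 10 + 6 + 19
print(find_tata_boxes(example, tss))  # prints [-25]
```

Real promoters carry variants of the core, so a position weight matrix built from the base frequencies would be a more realistic matcher than an exact string search.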


Figure 1.b.2

1.b.ii TBP (TATA binding protein)

The TATA box is normally bound by the TATA binding protein (TBP) during transcription; TBP unwinds the DNA and bends it through 80°. The AT-rich sequence facilitates easy unwinding (due to weaker base-stacking interactions among A and T than among G and C). When TBP binds to a TATA box within the DNA, it distorts the DNA by inserting amino acid side chains between base pairs, partially unwinding the helix, and doubly kinking it. The distortion is accomplished through a great amount of surface contact between the protein and DNA (shown in figure 1.b.3). TBP binds the negatively charged phosphates in the DNA backbone through positively charged lysine and arginine residues. The sharp bend in the DNA is produced through projection of four bulky phenylalanine residues into the minor groove. TBP is an unusual protein in that it binds to the minor groove and does so with a β sheet.

Separation of the two strands exposes the bases and allows RNA polymerase II to begin transcription of the gene.

Figure 1.b.3


1.b.iii TFIIA-TBP-TATA

TFIIA interacts with the TBP subunit of TFIID and aids in the binding of TBP to TATA-box-containing promoter DNA. Although TFIIA does not recognize DNA itself, its interactions with TBP allow it to stabilize and facilitate formation of the preinitiation complex (PIC). Binding of TFIIA to TBP also results in the exclusion of negative (repressive) factors that might otherwise bind to TBP and interfere with PIC formation. TFIIA also acts as a coactivator for some transcriptional activators, assisting with their ability to increase, or activate, transcription. This complex can be seen in figure 1.b.4.

Figure 1.b.4

1.b.iv CCAAT & GC Box

In molecular biology, a CCAAT box (also sometimes abbreviated as CAAT box or CAT box) is a distinct pattern of nucleotides with the consensus sequence GGNCAATCT that occurs 75-80 bases upstream of the transcription start site. The CAAT box signals the binding site for the RNA transcription factor, and is typically accompanied by a conserved consensus sequence. It is an invariant DNA sequence at about minus 70 base pairs from the origin of transcription in many eukaryotic promoters. Genes that have this element seem to require it for the gene to be transcribed in sufficient quantities. It is frequently absent from genes that encode proteins used in virtually all cells. This box, along with the GC box, is known for binding general transcription factors. CAAT and GC boxes are primarily located in the region 100-150 bp upstream of the TATA box. Binding of specific proteins is required for CCAAT box activation. These proteins are known as CCAAT box binding proteins or CCAAT box binding factors.
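The GGNCAATCT consensus maps directly onto a regular expression, with N standing for any of the four bases. The sketch below shows one way to search a sequence for it; the function name `find_ccaat_boxes` and the test sequence are made up for illustration.

```python
import re

# N in the consensus GGNCAATCT stands for any base.
CCAAT_CONSENSUS = re.compile(r"GG[ACGT]CAATCT")


def find_ccaat_boxes(dna: str):
    """Return (start_index, matched_text) for every consensus match in `dna`."""
    return [(m.start(), m.group()) for m in CCAAT_CONSENSUS.finditer(dna.upper())]


print(find_ccaat_boxes("ttaGGCCAATCTgcaGGTCAATCTaa"))
# prints [(3, 'GGCCAATCT'), (15, 'GGTCAATCT')]
```

An exact consensus match is a simplification; real CCAAT boxes tolerate some deviation from the consensus.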


1.c Trans-activator-DNA binding

Trans-activating domains (TADs) contain binding sites for other proteins such as transcription coregulators. These binding sites are frequently referred to as activation functions (AFs). TADs are named after their amino acid composition. These amino acids are either essential for the activity or simply the most abundant in the TAD. Transactivation by the Gal4 transcription factor is mediated by acidic amino acids, whereas hydrophobic residues in Gcn4 play a similar role. Hence the TADs in Gal4 and Gcn4 are referred to as acidic and hydrophobic activation domains, respectively.

The proteins that attach to these domains are DNA-binding proteins and thus have a specific or general affinity for either single or double stranded DNA. Sequence-specific DNA-binding proteins generally interact with the major groove of B-DNA, because it exposes more functional groups that identify a base pair.

1.d Various types of DNA-binding domains

Helix-turn-helix - In proteins, the helix-turn-helix (HTH) is a major structural motif capable of binding DNA. It is composed of two α helices joined by a short strand of amino acids and is found in many proteins that regulate gene expression. It should not be confused with the helix-loop-helix domain.

Helix-loop-helix - This domain is found in some transcription factors and is characterized by two α helices connected by a loop. One helix is typically smaller and, owing to the flexibility of the loop, allows dimerization by folding and packing against another helix. The larger helix typically contains the DNA binding regions. This motif can be seen in the image on the right.

Zinc Finger - Zinc fingers are small protein structural motifs that can coordinate one or more zinc ions to help stabilize their folds. They can be classified into several different structural families and typically function as interaction modules that bind DNA, RNA, proteins or small molecules.

Leucine Zipper - The basic leucine zipper (bZIP) domain contains an alpha helix with a leucine at every 7th amino acid. If two such helices find one another, the leucines can interact like the teeth of a zipper, allowing dimerization of two proteins. When binding to DNA, basic amino acid residues bind to the sugar-phosphate backbone while the helices sit in the major groove. The bZIP domain is found in proteins that regulate gene expression.
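The "leucine at every 7th amino acid" rule can be checked mechanically on a one-letter protein sequence. The sketch below is a rough illustration, not a validated motif detector; the function name `has_leucine_heptad` and the default of four consecutive repeats are assumptions.

```python
def has_leucine_heptad(helix: str, min_repeats: int = 4) -> bool:
    """True if `min_repeats` leucines occur spaced exactly 7 residues apart."""
    seq = helix.upper()
    for start in range(len(seq)):
        # Candidate leucine positions: start, start + 7, start + 14, ...
        positions = [start + 7 * k for k in range(min_repeats)]
        if positions[-1] < len(seq) and all(seq[p] == "L" for p in positions):
            return True
    return False


# Toy helix: a leucine followed by six alanines, repeated four times.
print(has_leucine_heptad("LAAAAAA" * 4))  # prints True
print(has_leucine_heptad("AAAAAAAAAAAAAA"))  # prints False
```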


2. Post-transcriptional modification

Post-transcriptional modification is a process by which primary transcript RNA is converted into mature RNA. A notable example is the conversion of precursor messenger RNA into mature messenger RNA (mRNA), which includes splicing and occurs prior to protein synthesis. This process is vital for the correct translation of eukaryotic genomes, because the primary RNA transcript produced by transcription contains both exons, which are the coding sections of the transcript, and introns, which are the noncoding sections.

2.a 5’ capping

The process of 5' capping is vital to creating mature messenger RNA, which is then able to undergo translation. Capping ensures the messenger RNA's stability while it undergoes translation in the process of protein synthesis, and is a highly regulated process that occurs in the cell nucleus.

The 5' cap is found on the 5' end of an mRNA molecule and consists of a guanine nucleotide connected to the mRNA via an unusual 5' to 5' triphosphate linkage. This guanosine is methylated on the 7 position directly after capping in vivo by a methyltransferase. It is referred to as a 7-methylguanosine cap, abbreviated m7G. Further modifications include the possible methylation of the 2' hydroxy-groups of the first 2 ribose sugars of the 5' end of the mRNA. The methylation of both 2' hydroxy-groups is shown on the diagram. The 5' cap looks like the 3' end of an RNA molecule (the 5' carbon of the cap ribose is bonded, and the 3' unbonded). This provides significant resistance to 5' exonucleases.


To sum up, the 5' cap has 4 main functions:

1. Regulation of nuclear export.

2. Prevention of degradation by exonucleases.

3. Promotion of translation.

4. Promotion of 5' proximal intron excision.

2.b Poly-A Tail

Polyadenylation is the addition of a poly(A) tail to an RNA molecule. The poly(A) tail consists of multiple adenosine monophosphates; in other words, it is a stretch of RNA that has only adenine bases. In eukaryotes, polyadenylation is part of the process that produces mature messenger RNA (mRNA) for translation. It therefore forms part of the larger process of gene expression.

The process of polyadenylation begins as the transcription of a gene finishes. The 3'-most segment of the newly made RNA is first cleaved off by a set of proteins; these proteins then synthesise the poly(A) tail at the RNA's 3' end. In some genes these proteins may add a poly(A) tail at any one of several possible sites; polyadenylation can therefore produce more than one transcript from a single gene, similar to alternative splicing.

Figure 2.b.1

The poly(A) tail is important for the nuclear export, translation and stability of mRNA. The tail is shortened over time and when it is short enough, the mRNA is enzymatically degraded.[2]


However, in a few cell types, mRNAs with short poly(A) tails are stored for later activation by re-polyadenylation in the cytosol.
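Because the tail is simply a run of adenines at the 3' end, its length can be read directly off a sequence. The sketch below is a minimal illustration; the function name `poly_a_tail_length` and the toy transcript are hypothetical, and any A that belongs to the transcript body but directly precedes the tail would be counted as part of it.

```python
def poly_a_tail_length(mrna: str) -> int:
    """Count the uninterrupted run of A residues at the 3' end of `mrna`."""
    seq = mrna.upper()
    return len(seq) - len(seq.rstrip("A"))


# Toy transcript: a short body ending in C, followed by an 8-residue tail.
print(poly_a_tail_length("AUGGCCUAGC" + "A" * 8))  # prints 8
```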

2.c t-RNA processing

tRNA processing converts the primary tRNA transcript into a functional tRNA. Transcription of tRNA genes results in a large precursor molecule, which may even contain sequences for several tRNA molecules. This primary transcript is subsequently processed by cleavage and by modification of the appropriate bases.

Figure 2.c.1

Pre-tRNAs undergo extensive modifications inside the nucleus. Some pre-tRNAs contain introns; in bacteria these self-splice, whereas in eukaryotes and archaea they are removed by tRNA splicing endonuclease. The 5' sequence is removed by RNase P, whereas the 3' end is removed by the tRNase Z enzyme. The non-templated 3' CCA tail is added by a nucleotidyl transferase. Before tRNAs are exported into the cytoplasm, they are aminoacylated. Base modifications in tRNA also increase diversity in structure and hence functional versatility.
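The end-processing steps (5' leader removal, 3' trailer removal, CCA addition) can be sketched as plain string operations. This is a toy model: the function name `process_pre_trna` is hypothetical, the leader and trailer lengths are taken as known inputs, and intron removal and base modification are deliberately left out.

```python
def process_pre_trna(pre_trna: str, leader_len: int, trailer_len: int) -> str:
    """Trim the 5' leader and 3' trailer, then append the non-templated CCA tail."""
    body_end = len(pre_trna) - trailer_len
    if leader_len < 0 or trailer_len < 0 or leader_len > body_end:
        raise ValueError("leader and trailer do not fit inside the precursor")
    # 5' leader removal (the step carried out by RNase P) and
    # 3' trailer removal (the step carried out by tRNase Z).
    body = pre_trna[leader_len:body_end]
    # CCA addition (the step carried out by a nucleotidyl transferase).
    return body + "CCA"


# Toy precursor: 3-base leader, 4-base body, 4-base trailer.
print(process_pre_trna("AAAGGGGUUUU", 3, 4))  # prints GGGGCCA
```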

3. Splicing

RNA splicing is the process by which introns, regions of RNA that do not code for protein, are removed from the pre-mRNA and the remaining exons are connected to re-form a single continuous molecule. Although most RNA splicing occurs after the complete synthesis and end-capping of the pre-mRNA, transcripts with many exons can be spliced co-transcriptionally. The splicing reaction is catalyzed by a large complex called the spliceosome, assembled from proteins and small nuclear RNA molecules that recognize splice sites in the pre-mRNA sequence. Many pre-mRNAs, including those encoding antibodies, can be spliced in multiple ways to produce different mature mRNAs that encode different protein sequences. This process is known as alternative splicing, and it allows production of a large variety of proteins from a limited amount of DNA.

3.a Splicing Mechanism

Within the intron, a 3' splice site, a 5' splice site, and a branch site are required for splicing. The 5' splice site or splice donor site includes an almost invariant sequence GU at the 5' end of the intron, within a larger, less highly conserved consensus region. The 3' splice site or splice acceptor site terminates the intron with an almost invariant AG sequence. Upstream (5'-ward) from the AG there is a region high in pyrimidines (C and U), the polypyrimidine tract. Upstream from the polypyrimidine tract is the branch point, which includes an adenine nucleotide. The reaction is catalyzed by a spliceosome, a complex of specialized RNA and protein subunits that removes introns from a transcribed pre-mRNA (hnRNA) segment.
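The sequence features listed above (GU at the 5' end, AG at the 3' end, a pyrimidine-rich stretch just upstream of the AG) are easy to test on a candidate intron. The sketch below is an illustration under stated assumptions: both function names, the 15-nucleotide window, and the example intron are made up, and real splice-site prediction uses much richer models.

```python
def looks_like_gu_ag_intron(intron: str) -> bool:
    """True if the intron starts with GU and ends with AG (DNA T is read as U)."""
    rna = intron.upper().replace("T", "U")
    return len(rna) >= 4 and rna.startswith("GU") and rna.endswith("AG")


def pyrimidine_fraction(intron: str, window: int = 15) -> float:
    """Fraction of C/U in the `window` nucleotides just upstream of the terminal AG."""
    rna = intron.upper().replace("T", "U")
    tract = rna[-(window + 2):-2]
    if not tract:
        return 0.0
    return sum(base in "CU" for base in tract) / len(tract)


# Toy intron: 5' splice site region, branch region, pyrimidine tract, terminal AG.
intron = "GUAAGU" + "ACUAAC" + "UCUUUCCUUUCUCUC" + "AG"
print(looks_like_gu_ag_intron(intron))  # prints True
print(pyrimidine_fraction(intron))      # prints 1.0
```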

Spliceosomal splicing and self-splicing both proceed via two sequential transesterification reactions. First, the 2'OH of a specific branch-point nucleotide within the intron, which is defined during spliceosome assembly, performs a nucleophilic attack on the first nucleotide of the intron at the 5' splice site, forming the lariat intermediate. Second, the 3'OH of the released 5' exon performs a nucleophilic attack at the last nucleotide of the intron at the 3' splice site, thus joining the exons and releasing the intron lariat.
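The two steps can be traced on a toy pre-mRNA by treating each transesterification as a cut-and-join on a string. This sketch models only the bookkeeping, not the chemistry; the function name `splice`, the coordinate convention (start inclusive, end exclusive), and the example sequence are assumptions made for illustration.

```python
def splice(pre_mrna: str, intron_start: int, intron_end: int, branch_index: int):
    """Return (mature_mrna, excised_intron, branch_offset_in_intron).

    intron_start is inclusive, intron_end is exclusive, and branch_index is the
    absolute position of the branch-point adenosine in `pre_mrna`.
    """
    if not (intron_start <= branch_index < intron_end):
        raise ValueError("branch point must lie inside the intron")
    if pre_mrna[branch_index] != "A":
        raise ValueError("branch point must be an adenosine")

    # Step 1: the branch-point 2'OH attacks the 5' splice site. The 5' exon is
    # released with a free 3'OH; the intron, now looped onto the branch point,
    # remains attached to the 3' exon (the lariat intermediate).
    exon_5 = pre_mrna[:intron_start]
    lariat_intermediate = pre_mrna[intron_start:]

    # Step 2: the 3'OH of the 5' exon attacks the 3' splice site, joining the
    # exons and releasing the intron lariat.
    intron_len = intron_end - intron_start
    intron = lariat_intermediate[:intron_len]
    exon_3 = lariat_intermediate[intron_len:]
    return exon_5 + exon_3, intron, branch_index - intron_start


pre = "AUGGCC" + "GUAAGUACUAACUCUUUCAG" + "GGAUAA"
mature, lariat, branch_offset = splice(pre, 6, 26, 16)
print(mature)         # prints AUGGCCGGAUAA
print(lariat)         # prints GUAAGUACUAACUCUUUCAG
print(branch_offset)  # prints 10
```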

3.b Spliceosome

Splicing is catalyzed by the spliceosome, a large RNA-protein complex composed of five small nuclear ribonucleoproteins (snRNPs, pronounced 'snurps'). The RNA components of snRNPs interact with the intron and may be involved in catalysis. Two types of spliceosomes have been identified (the major and the minor), which contain different snRNPs.

3.b.i Major spliceosome

The major spliceosome splices introns containing GU at the 5' splice site and AG at the 3' splice site. It is composed of the U1, U2, U4, U5, and U6 snRNPs and is active in the nucleus. In addition, a number of proteins including U2AF and SF1 are required for the assembly of the spliceosome. U1 binds the 5' splice site and U2 binds the branch point; the U4-U5-U6 complex then joins.


This type of splicing is termed canonical splicing, or the lariat pathway, and accounts for more than 99% of splicing. By contrast, when the intronic flanking sequences do not follow the GU-AG rule, noncanonical splicing is said to occur (see the minor spliceosome).

3.b.ii Minor Spliceosome

The minor spliceosome is very similar to the major spliceosome; however, it splices out rare introns with different splice site sequences. While the minor and major spliceosomes contain the same U5 snRNP, the minor spliceosome has different but functionally analogous snRNPs for U1, U2, U4, and U6, which are respectively called U11, U12, U4atac, and U6atac. Like the major spliceosome, it is only found in the nucleus.

3.b.iii Catalytic center

3.c Alternative splicing

Alternative splicing is a process by which the exons of the RNA produced by transcription of a gene (a primary gene transcript or pre-mRNA) are reconnected in multiple ways during RNA splicing. The resulting different mRNAs may be translated into different protein isoforms; thus, a single gene may code for multiple proteins.
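One simple mode of alternative splicing, exon skipping, can be enumerated directly. The sketch below assumes the first and last exons are always retained and every internal exon may be independently kept or skipped; that rule, the function name `exon_skipping_isoforms`, and the toy exons are illustrative assumptions, and real genes use only a regulated subset of these combinations.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """List mRNAs formed by keeping the first and last exon and any subset of the rest."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, *middle, last = exons
    isoforms = []
    # Go from all internal exons kept down to none kept; order is preserved.
    for kept_count in range(len(middle), -1, -1):
        for kept in combinations(middle, kept_count):
            isoforms.append("".join([first, *kept, last]))
    return isoforms


print(exon_skipping_isoforms(["AUG", "GCC", "UUU", "UAA"]))
# prints ['AUGGCCUUUUAA', 'AUGGCCUAA', 'AUGUUUUAA', 'AUGUAA']
```

With n internal exons this rule yields 2^n candidate isoforms, which illustrates how a single gene can code for multiple proteins.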

The production of alternatively spliced mRNAs is regulated by a system of trans-acting proteins that bind to cis-acting sites on the pre-mRNA itself. Such proteins include splicing activators that promote the usage of a particular splice site, and splicing repressors that reduce the usage of a particular site. Mechanisms of alternative splicing are highly variable, and new examples are constantly being found, particularly through the use of high-throughput techniques. Researchers hope to fully elucidate the regulatory systems involved in splicing, so that alternative splicing products from a given gene under particular conditions could be predicted by a "splicing code".

3.d Self-Splicing

Self-splicing occurs for rare introns that form a ribozyme, performing the functions of the spliceosome by RNA alone. There are three kinds of self-splicing introns: group I, group II, and group III. Group I and II introns perform splicing similar to the spliceosome without requiring any protein. This similarity suggests that group I and II introns may be evolutionarily related to the spliceosome. Although the two self-splicing mechanisms described below do not require any proteins, spliceosomal splicing uses 5 additional RNA molecules and over 50 proteins and hydrolyzes many ATP molecules.

Two transesterifications characterize the mechanism in which group I introns are spliced:

1. 3'OH of a free guanine nucleoside (or one located in the intron) or a nucleotide cofactor (GMP, GDP, GTP) attacks phosphate at the 5' splice site.

2. 3'OH of the 5'exon becomes a nucleophile and the second transesterification results in the joining of the two exons.

The mechanism by which group II introns are spliced (two transesterification reactions, as for group I introns) is as follows:

1. The 2'OH of a specific adenosine in the intron attacks the 5' splice site, thereby forming the lariat


2. The 3'OH of the 5' exon triggers the second transesterification at the 3' splice site thereby joining the exons together.