a zero-knowledge based introduction to biology · 2013. 1. 10. · translation gug"val...

Post on 22-Dec-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

A Zero-Knowledge Based Introduction to Biology

Jim Notwell

09 January 2013

Q: What is your genome?

A:

Q: What is your genome?

A:The sum of your hereditary information.

From DNA to Organism

You are composed of ~ 10 trillion cells

From DNA to Organism Cell

From DNA to Organism Cell Protein

Proteins do most of the work in biology

Central Dogma of Biology

DNA: “Blueprints” for a cell

•Genetic information encoded in long strings

•Deoxyribonucleic acid comes in four flavors: adenine, thymine, guanine, and cytosine

Phosphate-deoxyribose Backbone

O O

C C

CC

H

H

HHH

H

H

COP

O-

O

to next nucleotide

to previous nucleotide

to base

3’

5’

Nucleobase Complementary Pairing

Adenine (A)

Cytosine (C)

Guanine (G)

Thymine (T)

pyrimidines

purines

DNA Double Helix

DNA Packaging

Q: What is your genome?

A:The sum of your hereditary information.

Q: What is your genome?

A:The sum of your hereditary information. Humans bundle two copies of the genome into 46 chromosomes in every cell

Central Dogma of Biology

DNA vs RNA

RNA Nucleobases

Adenine (A)

Cytosine (C)

Guanine (G)

Uracil (U)

pyrimidines

purines

Gene Transcription

3’5’

5’3’

G A T T A C A . . .

C T A A T G T . . .

Gene Transcription

3’5’

5’3’

G A T T A C A . . .

C T A A T G T . . .

Gene Transcription

3’5’

5’3’

G A T T A C A . . .

C T A A T G T . . .

Strands are separated (DNA helicase)

Gene Transcription

3’5’

5’3’

G A T T A C

A . . .

C T A A T G T . . .

G A U U A C A

An RNA copy of the 5’→3’ sequence is created from the 3’→5’ template

Gene Transcription

3’5’

5’3’

G A U U A C A . . .

G A T T A C A . . .

C T A A T G T . . .

pre-mRNA 5’ 3’

RNA Processing

5’ cap poly(A) tail

intronexon

mRNA

5’ UTR 3’ UTR

Gene Structure

5’ 3’

promoter

5’ UTR exons 3’ UTR

introns

coding

non-coding

Central Dogma of Biology

From RNA to Protein•Proteins are long strings of amino acids joined by peptide bonds

•Translation from RNA sequence to amino acid sequence performed by ribosomes

•20 amino acids → 3 RNA letters required to specify a single amino acid

Amino AcidAlanine

Arginine

Asparagine

Aspartate

Cysteine

Glutamate

Glutamine

Glycine

Histidine

Isoleucine

Leucine

Lysine

Methionine

Phenylalanine

Proline

Serine

Threonine

Tryptophan

Tyrosine

Valine

C

O

N

H

C

H

H OH

R

There are 20 standard amino acids

Proteins

C

O

N

H

C

H

R

to previous aa to next aa

N-terminus

(start)

H OH

C-terminus

(end)from 5’ 3’ mRNA

Translation

The ribosome (a complex of protein and RNA) synthesizes a protein by reading the mRNA in triplets (codons). Each codon is translated to an amino acid.

Translation

GGGG"GlyGAG"GluGCG"AlaGUG"Val

A#GGA"GlyGAA"Glutamic#acid"(Glu)GCA"AlaGUA"Val

C#GGC"GlyGAC"AspGCC"AlaGUC"Val

U#GGU"Glycine"(Gly)GAU"Aspar4c#acid"(Asp)GCU"Alanine"(Ala)GUU"Valine"(Val)

G#

GAGG"Arg"AAG"LysACG"ThrAUG"Methionine"(Met)"or"START

A#AGA"Arginine"(Arg)AAA"Lysine"(Lys)ACA"Thr"AUA"Ile

C#AGC"Ser"AAC"AsnACC"ThrAUC"Ile

U#AGU"Serine"(Ser)AAU"Asparagine"(Asn)ACU"Threonine"(Thr)AUU"Isoleucine"(Ile)

A#

GCGG"Arg"CAG"GlnCCG"ProCUG"Leu

A#CGA"Arg"CAA"Glutamine"(Gln)CCA"ProCUA"Leu

C#CGC"Arg"CAC"HisCCC"ProCUC"Leu

U#CGU"Arginine"(Arg)CAU"His4dine"(His)CCU"Proline"(Pro)CUU"Leucine"(Leu)

C#

GUGG"Tryptophan"(Trp)UAG"STOPUCG"Ser"UUG"Leu

A#UGA"STOPUAA"STOPUCA"Ser"UUA"Leucine"(Leu)

C#UGC"CysUAC"TyrUCC"SerUUC"Phe

U#UGU"Cysteine"(Cys)UAU"Tyrosine"(Tyr)UCU"Serine"(Ser)UUU"Phenylalanine"(Phe)

U#

GACU

Translation

5’ . . . A U U A U G G C C U G G A C U U G A . . . 3’

Translation

5’ . . . A U U A U G G C C U G G A C U U G A . . . 3’

UTR Met

Start Codon

Ala Trp Thr

Stop Codon

Translation

Central Dogma of Biology

Protein coding

1%

Other Stuff 99%

Protein coding

1%

Non-coding exons

2%

Introns/ promoters/ polyA sites

37%

Intergenic transcribed

RNA 19% Regulatory

elements 9%

??? 32%

The ENCODE Project Consortium (2012) Nature 489:57-74

Non-coding RNAs•RNAs transcribed from DNA but not translated into protein

•Structural ncRNAs: Conserved secondary structure

•Involved in gene regulation

microRNA

Protein coding

1%

Non-coding exons

2%

Introns/ promoters/ polyA sites

37%

Intergenic transcribed

RNA 19% Regulatory

elements 9%

??? 32%

The ENCODE Project Consortium (2012) Nature 489:57-74

Different Cell Types

Subsets of the DNA sequence determine the identity and function of different cells

Gene Expression Regulation•When should each gene be expressed?

•Why? Every cell has same DNA but each cell expresses different proteins.

•Signal transduction: One signal converted to another: cascade has “master regulators” turning on many proteins, which in turn each turn on many proteins

Central Dogma of Biology

Transcription Regulation•Transcription factors link to binding sites

•Complex of transcription factors forms

•Complex assists or inhibits formation of the RNA polymerase machinery

Gene Transcription

3’5’

5’3’

G A T T A C A . . .

C T A A T G T . . .

Transcription Factor Binding Sites•Short, degenerate DNA sequences recognized by particular transcription factors

•For complex organisms, cooperative binding of multiple transcription factors required to initiate transcription

Binding Sequence Logo

Transcription Regulation

Transcription Factor A

TF A Binding Site

Gene B

Gene Regulatory RegionANRV285-GG07-02 ARI 8 August 2006 1:29

understand how different permutations of thesame regulatory elements alter gene expres-sion. An understanding of how the combina-torial organization of a promoter encodes reg-ulatory information first requires an overviewof the proteins that constitute the transcrip-tional machinery.

THE EUKARYOTICTRANSCRIPTIONALMACHINERYFactors involved in the accurate transcrip-tion of eukaryotic protein-coding genes byRNA polymerase II can be classified into threegroups: general (or basic) transcription fac-tors (GTFs), promoter-specific activator pro-teins (activators), and coactivators (Figure 2).GTFs are necessary and can be sufficient foraccurate transcription initiation in vitro (re-viewed in 141). Such factors include RNApolymerase II itself and a variety of auxil-iary components, including TFIIA, TFIIB,TFIID, TFIIE, TFIIF, and TFIIH. In addi-tion to these “classic” GTFs, it is apparent thatin vivo transcription also requires Mediator,a highly conserved, large multisubunit com-plex that was originally identified in yeast (re-viewed in 38, 119).

GTFs assemble on the core promoter inan ordered fashion to form a transcriptionpreinitiation complex (PIC), which directsRNA polymerase II to the transcription startsite (TSS). The first step in PIC assemblyis binding of TFIID, a multisubunit com-plex consisting of TATA-box-binding pro-tein (TBP) and a set of tightly bound TBP-associated factors (TAFs). Transcription thenproceeds through a series of steps, includingpromoter melting, clearance, and escape, be-fore a fully functional RNA polymerase IIelongation complex is formed. The currentmodel of transcription regulation views thisas a cycle, in which complete PIC assembly isstimulated only once. After RNA polymeraseII escapes from the promoter, a scaffold struc-ture, composed of TFIID, TFIIE, TFIIH,and Mediator, remains on the core promoter

Distal regulatory elements

Proximalpromoterelements

Promoter ( 1 kb)

Corepromoter

EnhancerSilencer

Locus controlregion Insulator

Figure 1Schematic of a typical gene regulatory region. The promoter, which iscomposed of a core promoter and proximal promoter elements, typicallyspans less than 1 kb pairs. Distal (upstream) regulatory elements, which caninclude enhancers, silencers, insulators, and locus control regions, can belocated up to 1 Mb pairs from the promoter. These distal elements maycontact the core promoter or proximal promoter through a mechanism thatinvolves looping out the intervening DNA.

Generaltranscription factor(GTF): a factor thatassembles on thecore promoter toform a preinitiationcomplex and isrequired fortranscription of all(or almost all) genes

Coactivators:adaptor proteins thattypically lackintrinsicsequence-specificDNA binding butprovide a linkbetween activatorsand the generaltranscriptionalmachinery

PIC: preinitiationcomplex

TSS: transcriptionstart site

(73); subsequent reinitiation of transcriptionthen only requires rerecruitment of RNApolymerase II-TFIIF and TFIIB.

The assembly of a PIC on the core pro-moter is sufficient to direct only low levels ofaccurately initiated transcription from DNAtemplates in vitro, a process generally referredto as basal transcription. Transcriptional ac-tivity is greatly stimulated by a second classof factors, termed activators. In general, ac-tivators are sequence-specific DNA-bindingproteins whose recognition sites are usuallypresent in sequences upstream of the corepromoter (reviewed in 149). Many classes ofactivators, discriminated by different DNA-binding domains, have been described, eachassociating with their own class of specificDNA sequences. Examples of activator fam-ilies include those containing a cysteine-rich zinc finger, homeobox, helix-loop-helix(HLH), basic leucine zipper (bZIP), fork-head, ETS, or Pit-Oct-Unc (POU) DNA-binding domain (reviewed in 142). In additionto a sequence-specific DNA-binding domain,a typical activator also contains a separableactivation domain that is required for the ac-tivator to stimulate transcription (149). An

www.annualreviews.org • Transcriptional Regulatory Elements 31

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

6.7:

29-5

9. D

ownl

oade

d fr

om a

rjour

nals

.ann

ualre

view

s.org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

04/0

3/07

. For

per

sona

l use

onl

y.

Gene Regulatory Region

ANRV285-GG07-02 ARI 8 August 2006 1:29

TBP:TATA-box-bindingprotein

TAF:TBP-associatedfactor

TFBS: transcriptionfactor-binding site

PIC

TFIIDTFIIA

TFIIB

TFIIF

TFIIH

RNApolymerase II

TFIIE

?

?

?

Activator

Mediator

DBD

AD

Corepromoter

TATA TSS

Co-activator

Figure 2The eukaryotic transcriptional machinery. Factors involved in eukaryotictranscription by RNA polymerase II can be classified into three groups:general transcription factors (GTFs), activators, and coactivators. GTFs,which include RNA polymerase II itself and TFIIA, TFIIB, TFIID,TFIIE, TFIIF, and TFIIH, assemble on the core promoter in an orderedfashion to form a preinitiation complex (PIC), which directs RNApolymerase II to the transcription start site (TSS). Transcriptional activityis greatly stimulated by activators, which bind to upstream regulatoryelements and work, at least in part, by stimulating PIC formation througha mechanism thought to involve direct interactions with one or morecomponents of the transcriptional machinery. Activators consist of aDNA-binding domain (DBD) and a separable activation domain (AD)that is required for the activator to stimulate transcription. The directtargets of activators are largely unknown.

extensive discussion of the properties of acti-vators is beyond the scope of this review; read-ers are referred to several excellent reviews onthe subject (87 and references therein).

The DNA-binding sites for activators[also called transcription factor-binding sites(TFBSs)] are generally small, in the rangeof 6–12 bp, although binding specificity isusually dictated by no more than 4–6 po-sitions within the site. The TFBSs for a

specific activator are typically degenerate,and are therefore described by a consen-sus sequence in which certain positions arerelatively constrained and others are morevariable. Many activators form heterodimersand/or homodimers, and thus their bindingsites are generally composed of two half-sites.Notably, the precise subunit composition ofan activator can also dictate its binding speci-ficity and regulatory action (37).

Although an activator can bind to a widevariety of sequence variants that conform tothe consensus, in certain instances the precisesequence of a TFBS can impact the regulatoryoutput. For example, TFBS sequence vari-ations can affect activator binding strength(reviewed in 30), which may be biologicallyimportant in situations such as in early devel-opment, in which activators are distributed ina concentration gradient (84, 144). TFBS se-quence variations may also direct a preferencefor certain dimerization partners over others(37, 124, 142). Finally, the particular sequenceof a TFBS can affect the structure of a boundactivator in a way that alters its activity (69,104, 108, 154, 163). The best-studied exam-ples are nuclear hormone receptors, a largeclass of ligand-dependent activators. Variousstudies have shown that the relative orienta-tion of the half-sites, as well as the spacing be-tween them, play a major role in directing theregulatory action of the bound nuclear hor-mone receptor dimer (37).

Activators work, at least in part, by in-creasing PIC formation through a mechanismthought to involve direct interactions withone or more components of the transcrip-tional machinery, termed the “target” (141,149). Activators may also act by promoting astep in the transcription process subsequent toPIC assembly, such as initiation, elongation,or reinitiation (103). Finally, activators havealso been proposed to function by recruit-ing activities that modify chromatin structure(47, 106). Chromatin often poses a barrierto transcription because it prevents the tran-scriptional machinery from interacting di-rectly with promoter DNA, and thus can be

32 Maston · Evans · Green

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

6.7:

29-5

9. D

ownl

oade

d fr

om a

rjour

nals

.ann

ualre

view

s.org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

04/0

3/07

. For

per

sona

l use

onl

y.

Protein coding

1%

Non-coding exons

2%

Introns/ promoters/ polyA sites

37%

Intergenic transcribed

RNA 19% Regulatory

elements 9%

??? 32%

The ENCODE Project Consortium (2012) Nature 489:57-74

Q: What if the transcription/translation machinery makes mistakes?

Q:What is the effect in coding regions?

Evolution = Mutation + Selection

Structural AbnormalitiesB. Structural Abnormalities Normal

Insertion

Reciprocal Translocation

Duplication

Deletion

Inversion

Single Nucleotide ChangesII. Single Nucleotide Changes

A A A A T A C G T G C A U U U U A U G C A C G U

Phe Tyr Ala Arg

DNA

mRNA

Protein

Normal

A A G A T A C G T G C A U U C U A U G C A C G U

Phe Tyr Ala Arg

DNA

mRNA

Protein

Silent Mutation

A A A A T A C C T G C A U U U U A U G G A C G U

Phe Tyr Gly Arg

DNA

mRNA

Protein

Missense Mutation

A A A A T T C G T G C A U U U U A A G C A C G U

Phe

DNA

mRNA

Protein

Nonsense Mutation

STOP

Single Nucleotide ChangesII. Single Nucleotide Changes

A A A A T A C G T G C A U U U U A U G C A C G U

Phe Tyr Ala Arg

DNA

mRNA

Protein

Normal

A A A T A T A C G T G C U U U A U A U G C A C G

Phe Ile Cys Thr

DNA

mRNA

Protein

Frameshift (Insertion) A A A A A C C T G C A U U U U U G G A C G U

Phe Leu His Val

DNA

mRNA

Protein

Frameshift (Deletion)

T

Evolution = Mutation + Selection

Selection

time

Harmful mutation Beneficial mutation

Evolution = Mutation + Selection

Summary

Evolution = Mutation + Selection

Summary•All hereditary information encoded in double-stranded DNA

•Each cell in an organism has same DNA

•DNA → RNA → protein

•Proteins have many diverse roles in cell

•Gene regulation diversifies protein products within different cells

Further Reading•See website: cs173.stanford.edu

top related