CS173 Lecture 6: NON protein coding genes
MW 11:00-12:15 in Beckman B302
Prof: Gill Bejerano
TAs: Jim Notwell & Harendra Guturu
http://cs173.stanford.edu [BejeranoWinter12/13]



Announcements
• HW1 due in one week.


TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAA
TAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG


“non coding” RNAs (ncRNA)


Central Dogma of Biology:


Active forms of “non coding” RNA

reverse transcription

long non-coding RNA

microRNA

rRNA, snRNA, snoRNA


What is ncRNA?
• Non-coding RNA (ncRNA) is an RNA that functions without being translated to a protein.
• Known roles for ncRNAs:
– RNA catalyzes excision/ligation in introns.
– RNA catalyzes the maturation of tRNA.
– RNA catalyzes peptide bond formation.
– RNA is a required subunit in telomerase.
– RNA plays roles in immunity and development (RNAi).
– RNA plays a role in dosage compensation.
– RNA plays a role in carbon storage.
– RNA is a major subunit in the SRP, which is important in protein trafficking.
– RNA guides RNA modification.
– RNA can perform so many different functions that it is thought that in the beginning there was an RNA World, where RNA was both the information carrier and the active molecule.


“non coding” RNAs (ncRNA)

Small structural RNAs (ssRNA)


AAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUCA

ssRNA Folds into Secondary and 3D Structures

[Figure: secondary structure drawing of the sequence above, with paired regions labeled P4, P5, P5a, P5b, P5c, P6, P6a and P6b, and nucleotide positions marked from 120 to 260]

Cate et al. (Cech & Doudna). (1996) Science 273: 1678.

Waring & Davies. (1984) Gene 28: 277.

We would like to predict them from sequence.


For example, tRNA


tRNA Activity


ssRNA structure rules
• Canonical basepairs:
– Watson-Crick basepairs:
  • G – C
  • A – U
– Wobble basepair:
  • G – U
• Stacks: continuous nested basepairs. (energetically favorable)
• Non-basepaired loops:
– Hairpin loop
– Bulge
– Internal loop
– Multiloop
• Pseudo-knots
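The rules above can be sketched in a few lines of Python. This is only an illustration: it assumes structures are written in dot-bracket notation (a convention not used on the slides), and the function names are hypothetical.

```python
# Canonical pairs from the slide: Watson-Crick (G-C, A-U) plus the G-U wobble.
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """True if bases a and b form a canonical (Watson-Crick or wobble) pair."""
    return (a.upper(), b.upper()) in CANONICAL


def pairs_from_dot_bracket(structure):
    """Convert a dot-bracket string (assumed notation) to sorted (i, j) index pairs."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


def hairpin_loops(structure):
    """Pairs (i, j) that close a hairpin loop: nothing between i and j is paired."""
    return [(i, j) for i, j in pairs_from_dot_bracket(structure)
            if set(structure[i + 1:j]) <= {"."}]
```

For a toy sequence GGGAAAUCC folded as (((...))), the three pairs form one stack and the innermost pair (a G-U wobble) closes the hairpin loop.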


Ab initio RNA structure prediction: lots of Dynamic Programming

• Objective: Maximizing the number of base pairs (Nussinov et al, 1978)

simple model: δ(i, j) = 1 if allowed
fancier model: GC > AU > GU
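A minimal sketch of the base-pair maximization recursion, written from the textbook description of the Nussinov algorithm rather than from the slides. The weights in `FANCY` are hypothetical values chosen only to respect GC > AU > GU, and the `min_loop` parameter (minimum number of unpaired bases inside a hairpin) is an added assumption.

```python
# Pair scores keyed by the unordered pair of bases.
SIMPLE = {frozenset("GC"): 1, frozenset("AU"): 1, frozenset("GU"): 1}
FANCY = {frozenset("GC"): 3, frozenset("AU"): 2, frozenset("GU"): 1}  # hypothetical weights


def nussinov(seq, delta=SIMPLE, min_loop=3):
    """Best total pair score over nested (non-crossing) structures of seq."""
    seq = seq.upper()
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    # Fill by increasing distance between i and j.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            candidates = [best[i + 1][j],   # i unpaired
                          best[i][j - 1]]   # j unpaired
            d = delta.get(frozenset((seq[i], seq[j])), 0)
            if d:
                candidates.append(best[i + 1][j - 1] + d)  # i pairs with j
            # Bifurcation: two independent substructures side by side.
            candidates.extend(best[i][k] + best[k + 1][j] for k in range(i + 1, j))
            best[i][j] = max(candidates)
    return best[0][n - 1] if n else 0
```

The triple loop makes this O(n³) time and O(n²) space. Recovering the structure itself would need a traceback over `best`, which is omitted here.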


Pseudoknots drastically increase computational complexity
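Dynamic programs over nested structures work because base pairs never cross. A pseudoknot is exactly a pair of crossing base pairs, which the following sketch detects (the function name is hypothetical; pairs are 0-based index tuples):

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """True if any two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    norm = [tuple(sorted(p)) for p in pairs]
    return any(i < k < j < l or k < i < l < j
               for (i, j), (k, l) in combinations(norm, 2))
```

Nested pairs such as (0, 8) and (1, 7) pass the check, while (0, 5) and (3, 8) cross and therefore cannot be represented by a nested recursion.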


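Instead of δ(i, j), measure and sum energies:

Objective: Minimize Secondary Structure Free Energy at 37 °C:

[Figure: RNA hairpin drawn with strands CGUUUGGGUU and CACAAACG, annotated with stacking energies -2.0, -2.1, -0.9, -0.9 and -1.8, terminal mismatch energy -1.6, and loop initiation energy +5.0 (kcal/mol)]

$$\Delta G_{\text{helix}} = \Delta G\begin{pmatrix} C & G \\ G & C \end{pmatrix} + \Delta G\begin{pmatrix} G & U \\ C & A \end{pmatrix} + 2\,\Delta G\begin{pmatrix} U & U \\ A & A \end{pmatrix} + \Delta G\begin{pmatrix} U & G \\ A & C \end{pmatrix}$$

$$= -2.0 - 2.1 + 2 \times (-0.9) - 1.8 = -7.7 \text{ kcal/mol}$$

$$\Delta G_{\text{hairpin loop}} = \Delta G_{\text{initiation (6 nucleotides)}} + \Delta G_{\text{mismatch}}\begin{pmatrix} G & G \\ C & A \end{pmatrix} = 5.0 - 1.6 = 3.4 \text{ kcal/mol}$$

$$\Delta G_{\text{total}} = \Delta G_{\text{hairpin}} + \Delta G_{\text{helix}} = 3.4 - 7.7 = -4.3 \text{ kcal/mol}$$

Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287.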
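The arithmetic on this slide can be checked with a short script. The energy values are the ones shown on the slide; the stack labels (top strand over bottom strand) are my reading of the garbled figure and should be treated as an assumption, as should the function name.

```python
# Stacking energies in kcal/mol, as listed on the slide.
# Labels such as "CG/GC" are an assumed reading of the figure.
HELIX_STACKS = [
    ("CG/GC", -2.0),
    ("GU/CA", -2.1),
    ("UU/AA", -0.9),
    ("UU/AA", -0.9),
    ("UG/AC", -1.8),
]
HAIRPIN_INITIATION_6NT = 5.0   # loop initiation for a 6-nucleotide hairpin
TERMINAL_MISMATCH = -1.6       # mismatch next to the closing pair


def hairpin_free_energy(stacks, initiation, mismatch):
    """Return (helix, loop, total) free energies, rounded to 0.1 kcal/mol."""
    helix = sum(energy for _, energy in stacks)
    loop = initiation + mismatch
    return round(helix, 1), round(loop, 1), round(helix + loop, 1)


helix, loop, total = hairpin_free_energy(
    HELIX_STACKS, HAIRPIN_INITIATION_6NT, TERMINAL_MISMATCH)
print(helix, loop, total)  # -7.7 3.4 -4.3
```

Stabilizing stacks contribute negative terms and the loop contributes a positive penalty, so the fold is favorable only because the helix outweighs the loop.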


Zuker’s algorithm MFOLD: computing loop dependent energies


RNA structure

• Base-pairing defines a secondary structure. The base-pairing is usually non-crossing.


Stochastic context-free grammar

1. A CFG
S → aSu
S → cSg
S → gSc
S → uSa
S → a
S → c
S → g
S → u
S → SS

2. A derivation of “accuggacccuuagaggu”
S ⇒ aSu
⇒ acSgu
⇒ accSggu
⇒ accuSaggu
⇒ accuSSaggu
⇒ accugScSaggu
⇒ accuggSccSaggu
⇒ accuggaccSaggu
⇒ accuggacccSgaggu
⇒ accuggacccuSagaggu
⇒ accuggacccuuagaggu

3. Corresponding structure
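The derivation on this slide can be replayed mechanically by always rewriting the leftmost S. The sketch below does only that; it does not attach probabilities to the productions, which a stochastic grammar would require and which the slide does not give. The names `RULES`, `STEPS` and `derive` are hypothetical.

```python
# Right-hand sides of the productions S -> ... listed on the slide.
RULES = {"aSu", "cSg", "gSc", "uSa", "a", "c", "g", "u", "SS"}

# Productions applied at each step of the slide's derivation, in order.
STEPS = ["aSu", "cSg", "cSg", "uSa", "SS", "gSc", "gSc", "a", "cSg", "uSa", "u"]


def derive(steps, start="S"):
    """Apply each production to the leftmost S and return every sentential form."""
    form = start
    history = [form]
    for rhs in steps:
        if rhs not in RULES:
            raise ValueError("not a production of the grammar: S -> " + rhs)
        if "S" not in form:
            raise ValueError("no nonterminal left to rewrite")
        form = form.replace("S", rhs, 1)  # rewrite the leftmost S only
        history.append(form)
    return history


for form in derive(STEPS):
    print(form)
```

Each paired production (aSu, cSg, gSc, uSa) emits two bases that pair with each other, which is why a parse tree of this grammar corresponds to a nested secondary structure.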


Cool algorithmics. Unfortunately…

– Random DNA (with high GC content) often folds into low-energy structures.

– We will mention powerful newer methods later on.


ssRNA transcription
• ssRNAs like tRNAs are usually encoded by short “non coding” genes that are transcribed independently.
• Found both in the UCSC “known genes” track and as a subtrack of the RepeatMasker track.



“non coding” RNAs (ncRNA)

microRNAs (miRNA/miR)


MicroRNA (miR)

[Figure labels: ~70nt, ~22nt, mRNA]

miR match to target mRNA is quite loose.

A single miR can regulate the expression of hundreds of genes.


MicroRNA Transcription

mRNA



MicroRNA (miR)


Computational challenges:
• Predict miRs.
• Predict miR targets.
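As a rough illustration of why target prediction is hard when matching is loose, here is a toy scan that looks for mRNA windows complementary to a miR while tolerating a chosen number of mismatches. This is not a method from the lecture and not how real target predictors work; the function names, the `max_mismatches` parameter and the example sequences are all hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def candidate_sites(mir, mrna, max_mismatches=0):
    """Return (start, mismatches) for every mRNA window that could pair with mir."""
    site = reverse_complement(mir)
    mrna = mrna.upper()
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy example with made-up sequences.
print(candidate_sites("UGAGGUAG", "AAACUACCUCAGGG"))  # [(3, 0)]
```

Raising `max_mismatches` quickly increases the number of candidate sites, which is one way to see how a single miR could plausibly match hundreds of transcripts and why predictions need additional evidence.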


MicroRNA Therapeutics


Idea: bolster/inhibit miR production to broadly modulate protein production
Hope: “right” the good guys and/or “wrong” the bad guys
Challenge: and not vice versa.


Other Non Coding Transcripts


lncRNAs (long non coding RNAs)


Don’t seem to fold into clear structures (or only a sub-region does).

Diverse roles only now starting to be understood.

Hard to detect or predict function computationally (currently).


lncRNAs come in many flavors


X chromosome inactivation in mammals

[Figure labels: XX, XY, X]

Dosage compensation


Xist – X inactive-specific transcript

Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67


Transcripts, transcripts everywhere

Human Genome

Transcribed* (Tx)

Tx from both strands*

* True size of set unknown


Or are they?


The million dollar question

Human Genome

Transcribed* (Tx)

Tx from both strands*

* True size of set unknown

Leaky tx?

Functional?


Coding and non-coding gene production


The cell is constantly making new proteins and ncRNAs.

These perform their function for a while, and are then degraded.

Newly made coding and non coding gene products take their place.

The picture within a cell is constantly “refreshing”.

To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.


Cell differentiation


To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.

That is exactly what happens when cells differentiate during development from stem cells to their different final fates.


Human manipulation of cell fate


To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.

We have learned (in a dish) to:
1 control differentiation
2 reverse differentiation
3 hop between different states


Cell replacement therapies


We want to use this knowledge to provide a patient with healthy self cells of a needed type.

We have learned (in a dish) to:
1 control differentiation
2 reverse differentiation
3 hop between different states

(iPS = induced pluripotent stem cells)


How does this happen?


Different cells in our body hold copies of (essentially) the same genome.

Yet they express very different repertoires of proteins and non-coding RNAs.

How do cells do it?

A: like they do everything else: using their proteins & ncRNAs…


Gene Regulation


Some proteins and non coding RNAs go “back” to bind DNA near genes, turning these genes on and off.

[Figure labels: Gene, DNA, Proteins]

To be continued…


Review
Lecture 6
• Central dogma recap
– Genes, proteins and non coding RNAs
• RNA world hypothesis
• Small structural RNAs
– Sequence, structure, function
– Structure prediction
– Transcription mode
• MicroRNAs
– Functions
– Modes of transcription
• lncRNAs
– Xist
• Genome wide (and context wide) transcription
– How much?
– To what goals?
• Gene transcription and cell identity
– Cell differentiation
– Human manipulation of cell fates
• Gene regulation control


(On Mondays) ask students to stack the chairs without wheels at the back of the room at the end of class.