CS173. Lecture 6: NON protein coding genes. MW 11:00-12:15 in Beckman B302. Prof: Gill Bejerano. TAs: Jim Notwell & Harendra Guturu. Announcements: HW1 due in one week.
http://cs173.stanford.edu [BejeranoWinter12/13] 1
MW 11:00-12:15 in Beckman B302
Prof: Gill Bejerano
TAs: Jim Notwell & Harendra Guturu
CS173
Lecture 6: NON protein coding genes
Announcements
• HW1 due in one week.
TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAA
TAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG
“non coding” RNAs (ncRNA)
Central Dogma of Biology:
Active forms of “non coding” RNA
reverse transcription
long non-coding RNA
microRNA
rRNA, snRNA, snoRNA
What is ncRNA?
• Non-coding RNA (ncRNA) is an RNA that functions without being translated to a protein.
• Known roles for ncRNAs:
– RNA catalyzes excision/ligation in introns.
– RNA catalyzes the maturation of tRNA.
– RNA catalyzes peptide bond formation.
– RNA is a required subunit in telomerase.
– RNA plays roles in immunity and development (RNAi).
– RNA plays a role in dosage compensation.
– RNA plays a role in carbon storage.
– RNA is a major subunit in the SRP, which is important in protein trafficking.
– RNA guides RNA modification.
– RNA performs so many different functions that it is thought that in the beginning there was an RNA World, where RNA was both the information carrier and the active molecule.
“non coding” RNAs (ncRNA)
Small structural RNAs (ssRNA)
AAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUCA
ssRNA Folds into Secondary and 3D Structures
[Figure: secondary structure of the sequence above, with paired regions labeled P4, P5, P5a, P5b, P5c, P6, P6a and P6b, and nucleotide positions marked from 120 to 260.]
Cate, et al. (Cech & Doudna).(1996) Science 273:1678.
Waring & Davies. (1984) Gene 28: 277.
We would like to predict them from sequence.
For example, tRNA
tRNA Activity
ssRNA structure rules
• Canonical basepairs:
– Watson-Crick basepairs:
  • G – C
  • A – U
– Wobble basepair:
  • G – U
• Stacks: continuous nested basepairs (energetically favorable).
• Non-basepaired loops:
– Hairpin loop
– Bulge
– Internal loop
– Multiloop
• Pseudo-knots
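The pairing rules above can be checked mechanically. The sketch below is an illustration only: it assumes a structure written in dot-bracket notation (a convention not used on the slides, where "(" and ")" mark a nested basepair and "." an unpaired base), and the function names are hypothetical.

```python
# Canonical basepairs from the slide: G-C, A-U (Watson-Crick) and G-U (wobble).
CANONICAL = {frozenset("GC"), frozenset("AU"), frozenset("GU")}


def pairs_from_dot_bracket(structure):
    """Return sorted (i, j) index pairs for a nested dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def noncanonical_pairs(seq, structure):
    """List the pairs in the structure that are not G-C, A-U or G-U."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure differ in length")
    seq = seq.upper()
    return [
        (i, j)
        for i, j in pairs_from_dot_bracket(structure)
        if frozenset((seq[i], seq[j])) not in CANONICAL
    ]
```

Plain dot-bracket strings can only express nested basepairs, so pseudo-knots are outside what this sketch can represent.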
Ab initio RNA structure prediction: lots of Dynamic Programming
• Objective: Maximizing the number of base pairs (Nussinov et al, 1978)
simple model: (i, j) = 1 if allowed
fancier model: GC > AU > GU
Pseudoknots drastically increase computational complexity
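A minimal sketch of the base-pair maximization idea, using the simple model where every allowed pair scores 1. The function name `nussinov` and the `min_loop` parameter (a minimum number of unpaired bases inside a hairpin) are assumptions added for illustration and are not taken from the slides. Only nested pairs are considered, so pseudoknots are not handled.

```python
def nussinov(seq, min_loop=3):
    """Maximum number of nested canonical basepairs in seq (simple model)."""
    seq = seq.upper()
    n = len(seq)
    allowed = {("G", "C"), ("C", "G"), ("A", "U"),
               ("U", "A"), ("G", "U"), ("U", "G")}
    # best[i][j] holds the best score for the subsequence seq[i..j].
    # Cells with i > j stay 0 and stand for the empty subsequence.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            # case 2: base j pairs with some k, leaving > min_loop - 1 bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


print(nussinov("GGGAAACCC"))
```

The table has O(n^2) cells and each cell scans O(n) partners, which gives the cubic running time usually quoted for this kind of dynamic program.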
http://cs273a.stanford.edu [Bejerano Fall11/12] 15
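Objective: Minimize secondary structure free energy at 37 °C:
[Figure: RNA hairpin annotated with stacking energies of -2.0, -2.1, -0.9, -0.9 and -1.8 kcal/mol, a terminal mismatch of -1.6 kcal/mol, and a hairpin loop initiation of +5.0 kcal/mol.]
\[
\Delta G_{\text{helix}}
= \Delta G\!\left(\begin{smallmatrix} C & G \\ G & C \end{smallmatrix}\right)
+ \Delta G\!\left(\begin{smallmatrix} G & U \\ C & A \end{smallmatrix}\right)
+ 2\,\Delta G\!\left(\begin{smallmatrix} U & U \\ A & A \end{smallmatrix}\right)
+ \Delta G\!\left(\begin{smallmatrix} U & G \\ A & C \end{smallmatrix}\right)
= -2.0 - 2.1 + 2 \times (-0.9) - 1.8 = -7.7\ \text{kcal/mol}
\]
\[
\Delta G_{\text{hairpin loop}}
= \Delta G_{\text{initiation (6 nucleotides)}}
+ \Delta G_{\text{mismatch}}\!\left(\begin{smallmatrix} G & G \\ C & A \end{smallmatrix}\right)
= 5.0 - 1.6 = 3.4\ \text{kcal/mol}
\]
\[
\Delta G_{\text{total}} = \Delta G_{\text{hairpin loop}} + \Delta G_{\text{helix}}
= 3.4 - 7.7 = -4.3\ \text{kcal/mol}
\]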
Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287.
Instead of (i, j), measure and sum energies:
Zuker’s algorithm MFOLD: computing loop dependent energies
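The energy bookkeeping for the hairpin example can be reproduced in a few lines. This is only a sketch of the summation step: the numbers are the ones quoted on the slide, the stack labels are read off the slide's figure, and the names `STACKS` and `hairpin_free_energy` are hypothetical. It does not implement MFOLD or any loop-dependent energy lookup.

```python
# Nearest-neighbor stacking energies (kcal/mol) as quoted on the slide.
# Labels give the two stacked basepairs, top strand / bottom strand.
STACKS = [
    ("CG/GC", -2.0),
    ("GU/CA", -2.1),
    ("UU/AA", -0.9),
    ("UU/AA", -0.9),
    ("UG/AC", -1.8),
]
HAIRPIN_INITIATION_6NT = 5.0   # penalty for closing a 6-nucleotide loop
TERMINAL_MISMATCH = -1.6       # bonus for the first mismatch in the loop


def hairpin_free_energy():
    """Return (helix, loop, total) free energies in kcal/mol."""
    helix = sum(energy for _, energy in STACKS)
    loop = HAIRPIN_INITIATION_6NT + TERMINAL_MISMATCH
    return helix, loop, helix + loop


helix, loop, total = hairpin_free_energy()
print("helix %.1f, loop %.1f, total %.1f kcal/mol" % (helix, loop, total))
```

A negative total means the folded hairpin is predicted to be more stable than the unfolded strand under this model.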
Bafna 1
RNA structure
• Base-pairing defines a secondary structure. The base-pairing is usually non-crossing.
1. A CFG
S → aSu | cSg | gSc | uSa | a | c | g | u | SS
2. A derivation of “accuggacccuuagaggu”
S ⇒ aSu
⇒ acSgu
⇒ accSggu
⇒ accuSaggu
⇒ accuSSaggu
⇒ accugScSaggu
⇒ accuggSccSaggu
⇒ accuggaccSaggu
⇒ accuggacccSgaggu
⇒ accuggacccuSagaggu
⇒ accuggacccuuagaggu
3. Corresponding structure
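To see how a derivation fixes a structure, the steps can be replayed in code. The sketch below assumes a leftmost derivation (each rule rewrites the leftmost S) and reports the structure in dot-bracket notation; both the notation and the names `replay_leftmost` and `SLIDE_RULES` are assumptions made for illustration.

```python
PAIR_RULES = {"aSu", "cSg", "gSc", "uSa"}   # S -> x S y, x and y basepaired
SINGLE_RULES = {"a", "c", "g", "u"}         # S -> x, an unpaired base


def replay_leftmost(rules):
    """Apply rules to the leftmost S; return (sequence, dot-bracket)."""
    form = [("S", None)]  # sentential form as (symbol, structure mark)
    for rule in rules:
        idx = next((i for i, (sym, _) in enumerate(form) if sym == "S"), None)
        if idx is None:
            raise ValueError("no S left to rewrite")
        if rule == "SS":
            new = [("S", None), ("S", None)]
        elif rule in PAIR_RULES:
            new = [(rule[0], "("), ("S", None), (rule[2], ")")]
        elif rule in SINGLE_RULES:
            new = [(rule, ".")]
        else:
            raise ValueError("unknown rule %r" % rule)
        form[idx:idx + 1] = new
    if any(sym == "S" for sym, _ in form):
        raise ValueError("derivation is incomplete")
    seq = "".join(sym for sym, _ in form)
    structure = "".join(mark for _, mark in form)
    return seq, structure


# The right-hand sides used, in order, by the derivation on the slide.
SLIDE_RULES = ["aSu", "cSg", "cSg", "uSa", "SS",
               "gSc", "gSc", "a", "cSg", "uSa", "u"]
print(replay_leftmost(SLIDE_RULES))
```

Each pairing rule emits its two bases at once, which is why structures derived this way are always non-crossing.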
Stochastic context-free grammar
Cool algorithmics. Unfortunately…
– Random DNA (with high GC content) often folds into low-energy structures.
– We will mention powerful newer methods later on.
ssRNA transcription
• ssRNAs like tRNAs are usually encoded by short “non coding” genes that transcribe independently.
• Found in both the UCSC “known genes” track and as a subtrack of the RepeatMasker track.
“non coding” RNAs (ncRNA)
microRNAs (miRNA/miR)
MicroRNA (miR)
mRNA
~70nt ~22nt
miR match to target mRNA is quite loose.
A single miR can regulate the expression of hundreds of genes.
MicroRNA Transcription
mRNA
MicroRNA Transcription
mRNA
MicroRNA (miR)
mRNA
~70nt ~22nt
miR match to target mRNA is quite loose.
A single miR can regulate the expression of hundreds of genes.
Computational challenges:
• Predict miRs.
• Predict miR targets.
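One common first step in miR target prediction is to scan an mRNA for sites complementary to the miR "seed". The sketch below assumes the seed is nucleotides 2 to 8 of the mature miR, which is a widely used convention but is not stated on the slides; the sequences in the usage line are made-up toy strings, and `seed_sites` is a hypothetical name. Real predictors add many further signals, since the match is loose.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_sites(mirna, mrna, seed_start=2, seed_end=8):
    """Return 0-based mRNA positions that perfectly match the miR seed.

    The seed is mirna[seed_start..seed_end] (1-based, inclusive). A site is
    an occurrence of the seed's reverse complement in the mRNA.
    """
    seed = mirna.upper()[seed_start - 1:seed_end]
    target = seed.translate(COMPLEMENT)[::-1]  # reverse complement
    mrna = mrna.upper()
    sites = []
    pos = mrna.find(target)
    while pos != -1:
        sites.append(pos)
        pos = mrna.find(target, pos + 1)  # allow overlapping hits
    return sites


# Toy sequences, invented for this example only.
print(seed_sites("UACGGAUCUAGCAUUCGAGCUA", "AAAGAUCCGUAAACCCGAUCCGUGG"))
```

A seven-base perfect match occurs by chance fairly often in long transcripts, which is one reason a single miR can have hundreds of candidate targets.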
MicroRNA Therapeutics
mRNA
~70nt ~22nt
miR match to target mRNA is quite loose.
A single miR can regulate the expression of hundreds of genes.
Idea: bolster/inhibit miR production to broadly modulate protein production.
Hope: “right” the good guys and/or “wrong” the bad guys.
Challenge: and not vice versa.
Other Non Coding Transcripts
lncRNAs (long non coding RNAs)
Don’t seem to fold into clear structures (or only a sub-region does).
Diverse roles only now starting to be understood.
Hard to detect or predict function computationally (currently).
lncRNAs come in many flavors
X chromosome inactivation in mammals
X X X Y
X
Dosage compensation
Xist – X inactive-specific transcript
Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67
Transcripts, transcripts everywhere
Human Genome
Transcribed* (Tx)
Tx from both strands*
* True size of set unknown
Or are they?
The million dollar question
Human Genome
Transcribed* (Tx)
Tx from both strands*
* True size of set unknown
Leaky tx?
Functional?
Coding and non-coding gene production
The cell is constantly making new proteins and ncRNAs.
These perform their function for a while,
And are then degraded.
Newly made coding and non coding gene products take their place.
The picture within a cell is constantly “refreshing”.
To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.
Cell differentiation
To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.
That is exactly what happens when cells differentiate during development from stem cells to their different final fates.
Human manipulation of cell fate
To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.
We have learned (in a dish) to:
1. control differentiation
2. reverse differentiation
3. hop between different states
Cell replacement therapies
We want to use this knowledge to provide a patient with healthy self cells of a needed type.
We have learned (in a dish) to:
1. control differentiation
2. reverse differentiation
3. hop between different states
(iPS = induced pluripotent stem cells)
How does this happen?
Different cells in our body hold copies of (essentially) the same genome.
Yet they express very different repertoires of proteins and non-coding RNAs.
How do cells do it?
A: like they do everything else: using their proteins & ncRNAs…
Gene Regulation
Some proteins and non coding RNAs go “back” to bind DNA near genes, turning these genes on and off.
Gene
DNA
Proteins
To be continued…
Review
Lecture 6
• Central dogma recap
– Genes, proteins and non coding RNAs
• RNA world hypothesis
• Small structural RNAs
– Sequence, structure, function
– Structure prediction
– Transcription mode
• MicroRNAs
– Functions
– Modes of transcription
• lncRNAs
– Xist
• Genome wide (and context wide) transcription
– How much?
– To what goals?
• Gene transcription and cell identity
– Cell differentiation
– Human manipulation of cell fates
• Gene regulation control
(On Mondays) ask students to stack the chairs without wheels at the back of the room at the end of class.