A computational screen detected thousands of new A-to-I RNA hyper-editing sites
Shai Carmi
Together with: Itamar Borokhov (Compugen), Erez Levanon
Thanks to: Gilad Finkelstein, Khen Khermesh, Nurit Paz-Yaacov
Lab meeting, January 2011
A-to-I RNA editing
• RNA editing is a post-transcriptional change in the pre-RNA.
• Alters the RNA sequence encoded by the DNA in a single-nucleotide, site-specific manner.
• Adenosine is converted to inosine.
• Inosine is read as guanosine during translation and sequencing.
• The ADAR (Adenosine Deaminase Acting on RNA) dsRNA-binding protein family catalyzes A-to-I editing.
• Loss of ADAR is embryonically lethal; editing is related to brain diseases.
Gommans, Mullen & Maas. Bioessays 31, 1137 (2009)
Hyper-editing
Nishikura. Annu. Rev. Biochem. 79, 321 (2010)
Global effect: “Inosine-containing dsRNA binds a stress-granule-like complex and downregulates gene expression in trans” (Scadden, Mol. Cell 28, 491 (2007)).
“Double-stranded RNAs containing multiple IU pairs are sufficient to suppress interferon induction and apoptosis” (Vitali & Scadden. Nat. Struct. Mol. Biol 17, 1043 (2010)).
• Editing occurs mostly near Alu inverted repeats.
• Many editing sites in each repeat.
• Hyper-editing also observed.
• Only one known target (biochemical screen)
(Morse, Aruscavage & Bass. PNAS 99, 7906 (2002)).
TCCCCACCCTGAGTAGCTGGGACTACAGGCATGTGCCACCACACCACCATGCTAGGCTAATGGTTTGTATTTGTTTGA : DNA
||||||||||||||*|||||||||||||||*|||||||||||||||||||||||||||||||||||||*|||||||||
TCCCCACCCTGAGTGGCTGGGACTACAGGCGTGTGCCACCACACCACCATGCTAGGCTAATGGTTTGTGTTTGTTTGA : RNA

TCCCCACCCTGAGTAGCTGGGACTACAGGCATGTGCCACCACACCACCATGCTAGGCTAATGGTTTGTATTTGTTTGA : DNA
|||||*||||||||*||||||*||||*|||*||||||*||||*||*|||||||*||||**||||||||*||||||||*
TCCCCGCCCTGAGTGGCTGGGGCTACGGGCGTGTGCCGCCACGCCGCCATGCTGGGCTGGTGGTTTGTGTTTGTTTGG : RNA
Typical editing is detected by aligning the RNA to the genome and searching for A→G mismatches
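The typical detection step can be sketched as follows, assuming the RNA has already been aligned to the genome without gaps. The function name `count_mismatches` and the 30-base toy fragment are illustrative and not part of the original pipeline.

```python
from collections import Counter

def count_mismatches(dna: str, rna: str) -> Counter:
    """Tally mismatches by type between two equal-length, gap-free aligned strings."""
    if len(dna) != len(rna):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for d, r in zip(dna.upper(), rna.upper()):
        if d != r:
            counts[f"{d}>{r}"] += 1  # e.g. "A>G" for a genomic A read as G
    return counts

# Toy aligned pair (30 bases); every mismatch is a genomic A read as G.
dna = "TCCCCACCCTGAGTAGCTGGGACTACAGGC"
rna = "TCCCCGCCCTGAGTGGCTGGGGCTACGGGC"
counts = count_mismatches(dna, rna)
print(counts)  # Counter({'A>G': 4})
```

A read with only a few such mismatches still aligns with a standard aligner; the mismatch density is what changes in the hyper-edited case.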
“Too edited” RNAs will not align to the genome at all!
How to detect such editing events?
Detecting hyper-editing
• Collect all sequences that were rejected from alignment to the genome (by UCSC genome browser).
• Transform the sequences: change every “A” to “G” in the human genome and in the RNA sequences.
• Re-align to the genome.
TCCCCGCCCTGGGTGGCTGGGGCTGCGGGCGTGTGCCGCCGCGCCGCCGTGCTGGGCTGGTGGTTTGTGTTTGTTTGG : DNA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TCCCCGCCCTGGGTGGCTGGGGCTGCGGGCGTGTGCCGCCGCGCCGCCGTGCTGGGCTGGTGGTTTGTGTTTGTTTGG : RNA

TCCCCACCCTGAGTAGCTGGGACTACAGGCATGTGCCACCACACCACCATGCTAGGCTAATGGTTTGTATTTGTTTGA : DNA
|||||*||||||||*||||||*||||*|||*||||||*||||*||*|||||||*||||**||||||||*||||||||*
TCCCCGCCCTGAGTGGCTGGGGCTACGGGCGTGTGCCGCCACGCCGCCATGCTGGGCTGGTGGTTTGTGTTTGTTTGG : RNA
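The transformation step can be illustrated with a minimal sketch: once every A is replaced by G in both the genome and the read, A-to-G editing no longer produces mismatches. The helper name `to_three_letter` and the 30-base fragments are illustrative assumptions; the original screen used BLAST for the actual re-alignment.

```python
def to_three_letter(seq: str, src: str = "A", dst: str = "G") -> str:
    """Replace every `src` base with `dst`, collapsing the alphabet to three letters."""
    return seq.upper().replace(src, dst)

genome = "TCCCCACCCTGAGTAGCTGGGACTACAGGC"  # toy genomic fragment
read = "TCCCCGCCCTGAGTGGCTGGGGCTACGGGC"    # same fragment with four A-to-G changes

# The original sequences differ, but their three-letter versions are identical.
assert genome != read
assert to_three_letter(genome) == to_three_letter(read)

# Keep the originals keyed by the transformed read, so that the mismatches
# can be examined on the untransformed sequences after re-alignment.
originals = {to_three_letter(read): read}
```

The same helper covers the control substitutions by passing a different `src` and `dst`.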
An algorithm
How hard is it?
• There are about 500k ESTs not aligning to the genome, of length about 500bp each. The genome is 3Gbp.
• The 3-letter genome has lower complexity.
• Need to perform all 4 different strand combinations before the 3-letter transformation (DNA[+/-] x RNA[+/-]).
• Need to check also A→C, G→C, A→T for control.
• Two years on a new laptop.
Solution: use cloud and parallelize.
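One way to picture the workload is to enumerate the independent search jobs. The sketch below assumes that each of the four substitutions is run on all four strand combinations, which gives 16 jobs; the names `prepare` and `jobs` are hypothetical, and the actual job layout of the screen is not given on the slides.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def prepare(seq: str, strand: str, src: str, dst: str) -> str:
    """Orient the sequence first, then apply the three-letter transformation."""
    oriented = seq.upper() if strand == "+" else reverse_complement(seq)
    return oriented.replace(src, dst)

STRANDS = ("+", "-")
SUBSTITUTIONS = (("A", "G"), ("A", "C"), ("G", "C"), ("A", "T"))

# Every (DNA strand, RNA strand, substitution) triple is an independent job,
# so the jobs can be distributed over separate machines.
jobs = [
    {"dna_strand": d, "rna_strand": r, "src": s, "dst": t}
    for d, r, (s, t) in product(STRANDS, STRANDS, SUBSTITUTIONS)
]
print(len(jobs))  # 16

# Rough query volume per job: about 500k ESTs of about 500 bp each.
total_query_bp = 500_000 * 500
print(total_query_bp)  # 250000000
```

Orientation has to come before the substitution because the reverse complement of an A-to-G transformed sequence is not the same as the A-to-G transform of the reverse complement.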
Overview of the procedure
• EST downloading and filtering
• EST and genome preprocessing
• Blast
• Original sequence reconstruction
• Examination of mismatches
• (Filtering results)
• Properties of hyper-edited RNAs
Computing times
2-3 weeks on local server
1 day
2-3 months on Amazon cloud and partly on local server; ~2000 paid computer hours; $500.
2-3 days
2-3 hours
Minutes for each operation
Results
• Final editing criteria: >12 A-to-G, >6% of sites edited, >90% of mismatches are A-to-G.
• Number of clusters of each type.
• Quality scores not available for these ESTs to explain the non A-to-G clusters.
• We have some understanding of G-to-A clusters (APOBEC, sequencing errors).
A-to-T run not finished yet.
Note that:
A-to-G represents also T-to-C.
G-to-A represents also C-to-T.
A-to-C represents also T-to-G.
…
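The final criteria and the strand note above can be sketched as a small filter. This is an illustration under stated assumptions: the slides do not say which denominator the ">6% of sites" threshold uses, so it is passed in as the hypothetical parameter `n_sites`; the function names are also hypothetical.

```python
from collections import Counter

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def canonical_type(ref: str, alt: str) -> str:
    """Collapse a mismatch and its complement into one label (T>C becomes A>G)."""
    if ref in "CT":
        ref, alt = _COMPLEMENT[ref], _COMPLEMENT[alt]
    return f"{ref}>{alt}"

def is_hyper_edited(mismatches: Counter, n_sites: int,
                    min_ag: int = 12,
                    min_frac_sites: float = 0.06,
                    min_frac_ag: float = 0.90) -> bool:
    """Apply the three thresholds; counts are assumed to be oriented so that
    editing appears as A>G. `n_sites` is an assumed denominator."""
    ag = mismatches.get("A>G", 0)
    total = sum(mismatches.values())
    if total == 0 or n_sites <= 0:
        return False
    return (ag > min_ag
            and ag / n_sites > min_frac_sites
            and ag / total > min_frac_ag)

print(canonical_type("T", "C"))                         # A>G
print(is_hyper_edited(Counter({"A>G": 13}), 200))       # True
print(is_hyper_edited(Counter({"A>G": 20, "C>T": 5}), 200))  # False
```

In the last call only 80% of the mismatches are A-to-G, so the cluster fails the third threshold even though it passes the first two.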
Examples
Query 1   AGATATTTTTAGGCTTGGCATTGTGGATCACACTTGTAATCCCAGCATTTTGGGAGGCCT 60
Sbjct 1   .............................G.G.....G...................... 60
Query 61  AGCCAGGCAGGTCCCTTGAGCCCAGGAGTTTGAGACCAGCCTGGGCAACATGGTGAAACT 120
Sbjct 61  G.................G......................................... 120
Query 121 GTCTCTGCAAAATATATAAAAATTATTCAGTCCTGGTGGTGTGTGCCTGTAGTCCCACCT 180
Sbjct 121 .........G........................................G......... 180
Query 181 ACTTGAGAGGCTGAGGTGGGAGGATCACCTGAGACCAGGAGGTTGAGGTTGCAGTGAGCT 240
Sbjct 181 G....G....................G.............................G... 240
Query 241 GTGATTTCACCACTGCACTCCAGTCTGGGCAACCGAGTGAGACCCTGTCTCAAAAATAAT 300
Sbjct 241 ........G..G...................G............................ 300
Query 301 TTTAAAATAGGCCGGGCCTGGTGGCTCATGCCTGTATTCCCAGCACTTTGGGAGCCCAAG 360
Sbjct 301 .........................................G...............G.. 360
Query 361 GCGGGTGGATCACCTGAGGTCAGGGGTTCAAGACCAGCCTGGCCAACATGGTGAAACCCC 420
Sbjct 361 .....................G...................................... 420
Query 421 GTCTCTACTGAAAATACAAAAAATTAGCCAGGCGGGTGGCGGGCGCCTATAAAACCAGCT 480
Sbjct 421 ................................................G.GG....G... 480
Query 481 ACTCAGGAGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCC 540
Sbjct 481 G...G..G.....G...G........G.....G..........G........G....... 540
Query 541 GAGATT 546
Sbjct 541 ...... 546
chr16:23457836-23458381, DA103871 (-+)
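The example listings appear to use the BLAST layout in which a dot in the Sbjct row marks identity with the Query row and a letter marks a mismatch. Assuming that reading, and assuming the Query row is the genomic sequence and the Sbjct row the EST, a minimal sketch for decoding one row pair is given below. The function names and the toy rows are hypothetical.

```python
def expand_subject(query_row: str, sbjct_row: str) -> str:
    """Rebuild the full subject row by copying the query base wherever a dot appears."""
    return "".join(q if s == "." else s for q, s in zip(query_row, sbjct_row))

def mismatch_columns(query_row: str, sbjct_row: str, start: int = 1):
    """List (column, query_base, subject_base) for every non-dot subject column.

    Columns are counted from `start`; in rows that contain gaps ('-') the
    column number is not the same as the sequence coordinate.
    """
    return [(start + i, q, s)
            for i, (q, s) in enumerate(zip(query_row, sbjct_row))
            if s != "."]

query = "ACATAC"   # toy rows, not taken from the listings
sbjct = "..G.G."
print(expand_subject(query, sbjct))    # ACGTGC
print(mismatch_columns(query, sbjct))  # [(3, 'A', 'G'), (5, 'A', 'G')]
```

Applied row by row to a listing, the second function gives the candidate editing positions, which can then be tallied by mismatch type.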
Examples
Query 1   ATGTGTATTCCACACACAAATGGCTGAGTTATAGTCATAAAACAATTTGCAATAAAAAAA 60
Sbjct 1   ............................................................ 60
Query 61  AAACCAAAACAGATTGTCAGTTAACCAGGAAACAGTTAATGTTTTTTAATGAATCTGGCA 120
Sbjct 61  .....................................G..............G....... 120
Query 121 TTATAGTGAGCAAATGTCGTATTAATTTAGGCTAATTTCTAATAC-TACCATAATTTGTG 179
Sbjct 121 ..G.G......G................G.....G.....G....N.G....GG...... 180
Query 180 TCTAAATTTCTGTTGGGGTAGAAATTACTAAAATTGTGGGGAGTTTTTTCTGATTTTTAC 239
Sbjct 181 .......................G..G..GG.....................G.....G. 240
Query 240 ATTGCTTTAGGAAACATTTTTACTAATTCAGCTGTCTTAGGTAAAATGAATAGTTTTCTT 299
Sbjct 241 ........G...GG...........G............G...GG.G...G.G........ 300
Query 300 CCTGTTTTTTTATGTGTCATTGTTAGTGGTCTCAGAATTCTGATCAGTAACTTTGTGTAT 359
Sbjct 301 ...........G............G........G..G...........GG........G. 360
Query 360 GATGCTGAATTACAAACCGTTTGAATGATCCAGTTGAAAACGTATCCCTCTACTTTCTTC 419
Sbjct 361 ...........G..G............G........GG.G...G.......G........ 420
Query 420 AGTTGTAGAAAAGGTTAATTTCCCTCAGTGTCCCACATTATACCAACCTAAGAGAAGAAC 479
Sbjct 421 ......G.........G.........G............G..........G......... 480
Query 480 AGGTAATAGGGAGAA 494
Sbjct 481 ....GG......... 495
chr15:68482051-68482545, DA105809 (++)
No Alu
Examples
Query 1   AGAACTAATGAGCACAGAACTAAGAAAGCCCAGGCACAGTGGCTCACATCAGTAATTCTA 60
Sbjct 1   ............................................................ 60
Query 61  GGGCCTTGGGAGGCAAGACAAGAGAATCACTTGAGGCCATGAGTTCAAGGGCAGCCTAGG 120
Sbjct 61  ....................G............G............G.....G....G.. 120
Query 121 CAACATAGTGGGACCCTATCTCCACAAAAATAATAATATTATTATTATTAAATAAAATAA 180
Sbjct 121 ............................................................ 180
Query 181 AAGGAAGAGACAGCCATGAAGATAACTAGCTGAGGCCAGGTACAGTGGCTCATGCCTATA 240
Sbjct 181 ...........................................G.............G.. 240
Query 241 ATCCCAACACTTTGGGAGGTTGAGGTGGACAGATTGCTTGAGGTCAGAAGTTCCAGACCA 300
Sbjct 241 G....G.......................................G.GG.......G..G 300
Query 301 GACTGAACAACATAGCAAAACCCCATCCCTACTAAAAATACAAAAATTAGCTGGGCGTGG 360
Sbjct 301 .............G..GG......G.....G..G...G..........G........... 360
Query 361 TGGCAGGCACCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCACCTGAACCT 420
Sbjct 361 ....G...G.....G.........G...................G.....G.....G... 420
Query 421 GGGAGGCGGAGGATGCAGTGCGCTGAGATCATGCCACT 458
Sbjct 421 ............G...G.............G....... 458
chr16:29383242-29383699, BM703103 (++)
Examples
Query 1   TTGCTCTGTCACCCAGGCTGGAGTGCAGTGGCGCAATCTCGGCTCACTGCAAGCTCCACC 60
Sbjct 1   ..........G...G...................GG..............GG.....G.. 60
Query 61  TCCTGGGTTCACGCCATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGACTATAGGTGCCC 120
Sbjct 61  ...............G............G.....................G......... 120
Query 121 ACCACCACGCCTGGCTAATTTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATGTTAGC 180
Sbjct 121 ...G......................G.....G..G.....................G.. 180
Query 181 CAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTACTGG 240
Sbjct 181 .G...................G.......G....................GGG..G.... 240
Query 241 GATTACAGGTGTGAGCCACTGATACCTGGCCAATTTTTATATTTGTTGTAGAGATGAGGT 300
Sbjct 241 ....G....................................................... 300
Query 301 TTTGCCATATTGTCCAGGCTGGTCTCAAACTCCTGGTCTCAAGGGATCACCCGCCTCAGC 360
Sbjct 301 ........G................................................G.. 360
Query 361 CTCCCAAAGTGCTGGGACTACAGGAGTGAGCCACTGTGCCTGGCCTTGTTTGTTTGTTTT 420
Sbjct 361 .......G.................................................... 420
Query 421 TTGAGATGGGGTCTCACTATGTTGGCCAGGCTGGTCTCGAACTCCTGGGTTTGAGCAATC 480
Sbjct 421 ..................G......................................... 480
Query 481 CTCCTGCCATGTAGCTGGGATTATAGAGGCTACCATGTCCGTCTAGTTTTAAATT 535
Sbjct 481 ....................................................... 535
chr12:56344120-56344654, DB160834 (++)
Examples
Query 2   TTAGCCAGGCATGGTGGCAGATGCCTGTAGTCCCAGCTACTCAGGAGGCTGAAGTGGGAG 61
Sbjct 2   ............................G.....G...G...G........GG....... 61
Query 62  GATCCCTTGAGCCTGGGAGTTCAAGGCTGCCATGAGCCAAGATGGCACTACCACACTCCA 121
Sbjct 62  .G.......G............GG.......G..G....G............G....... 121
Query 122 TCCTGGATGACAGAGCAAGACCCTGTCTC-AAAAAAAAAAAAAGAATCTACAAACGATTA 180
Sbjct 122 ................G............A.............................. 181
Query 181 AATTAATAAGTGAGTTCAGCAAGATATTTTAAAAAATTATTAAAATTAACAAGTAAATTT 240
Sbjct 182 ............................................................ 241
Query 241 GTGGGGACCAAGGTAAATATATAAAAATCTATTATGGTTTTTTTTTCTTTCTTTCTTTCT 300
Sbjct 242 ............................................................ 301
Query 301 TTTTTTTCTGAGATGGAGTTTCACTCTTGTCACCCAGGCTGGAGTGCAATGGTGCGATCT 360
Sbjct 302 ................G.....G............G...........GG.......G... 361
Query 361 TGTTTCACCGCCACCTCTGCCTCCGGGTTCAAGGGATTTTCCTGCCTCAGCCTCCTGAGT 420
Sbjct 362 ...............................G................G........... 421
Query 421 AGCTGGGATTACAAGCGCCCCCCACCACACCTGGCTAATTTTTGTATTTTTAGCAGAGAC 480
Sbjct 422 G.........G..G..............G........G.......G.....G..G..... 481
Query 481 GGGGTTTTACCATGTTGACCAGCCTGGTCCTCGAACTCCTGAGCTCAGGTGATCCACCCG 540
Sbjct 482 ...........G......................G...........G........G.... 541
Query 541 CCTC 544
Sbjct 542 .... 545
chr11:65608125-65608668, DA221841 (++)
Examples
Query 1   TTCCCTGGAGGTGCTGGGAGCTGGGAAATGTATGCGGCTGTGAATTATTAATATTTTGGA 60
Sbjct 405 ...................N.............N.......................... 346
Query 61  GACCCTCACTAGGGCAGGGAGTGGCTTCAGGATAGGAAAGGGGACGCAAGGAAGACACCA 120
Sbjct 345 ............................................................ 286
Query 121 GGAATGGCCGGGCGCGATGGCTTACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGG 180
Sbjct 285 .............N..G......G.......GG........................... 226
Query 181 TCAGATCACCTGAGGTCGGGAGTTTGAGACCAGCCTGACCAACATGGAGAAACCCTGTCT 240
Sbjct 225 ..G....G....G.......G.....G....G.....G..GG.....G.G.G........ 166
Query 241 CTACTGAAAATACAAAATTAGCTGGGCGTGGTGGCGGGTGCCTGTAATCCCAGCTACTCA 300
Sbjct 165 ...............GG..G.........................G.........G...G 106
Query 301 GGAGGCTGAGGCAGGAGAATCTCTTGAACCCAGGAGGCAGAGGTTGCGGTGAGCTGAGAT 360
Sbjct 105 ........G...G.....G........G...G......G..................... 46
Query 361 GGTGCCATTGCACTCCAGCCTGGGCAACAAGAGTGAAACTGTCTC 405
Sbjct 45  .........................G................... 1
chr17:73090375-73090779, DB352453 (+-)
Novelty and more
• Total number of edited ESTs: 807.
• Total number of edited sites: 16184.
• Number of novel sites: 15362 (more than any previous screen).
• Number of novel ESTs: 700.
• Number of novel hyper-edited ESTs (known<=5): 749.
• Number not in Alu: 76.
• Number of edited regions supported by multiple ESTs: 74 (169 ESTs).
• Number of sites overlapping with a (transition) SNP: 250 (99 non-cDNA).
60% cDNA SNPs vs. only 0.3% expected.
Editing signature
Li et al. Science 324, 1210 (2009)
Hyper editing
ADAR2 motifs absent in our data
Tissues and health states

Top tissues:
Tissue                            #ESTs   Fraction edited (x10^-4)
liver                             316     12.0038
brain                             121     0.907091
lung                              39      0.964747
thymus                            29      3.19151
prostate                          28      0.830392
eye                               26      1.0526
muscle                            22      1.63777
uncharacterized tissue            22      0.54209
uterus                            16      0.572324
kidney                            15      0.598217
intestine                         15      0.462488
testis                            14      0.369041
spleen                            12      1.9836
bone                              10      1.22326
pancreas                          10      0.413158

Top health states:
State                             #ESTs   Fraction edited (x10^-4)
normal                            599     1.50345
lung tumor                        12      0.605776
head and neck tumor               12      0.504392
colorectal tumor                  11      0.496144
glioma                            9       0.717595
soft tissue/muscle tissue tumor   9       0.618
gastrointestinal tumor            8       0.496873
kidney tumor                      7       0.638465
germ cell tumor                   7       0.241261
uterine tumor                     6       0.508863
chondrosarcoma                    5       0.583676
pancreatic tumor                  5       0.424452
Human liver regeneration after partial hepatectomy.
About 40% of the new sites.
Secondary structure
Are hyper-edited RNAs double-stranded?
Consider the genomic sequence flanking the edited EST by 10kbp.
Three measures of “double-strandedness”:
1. The maximal length of dsRNA according to RNAfold (2kbp region).
2. The total number of aligned bases when blasting against the reverse complement.
3. The number of (+) and (-) Alus in the region.
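The following sketch gives crude stand-ins for two of these measures. It does not run RNAfold or BLAST: instead of a BLAST alignment it reports the longest exact substring shared by a sequence and its reverse complement, and it counts Alu elements by strand from an annotation list. The function names are hypothetical and the Alu list in the usage lines is only an example input.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def longest_inverted_repeat(seq: str) -> int:
    """Length of the longest exact substring shared by seq and its reverse complement.

    Simple O(n^2) dynamic programming (longest common substring). Note that
    a self-complementary palindrome is also counted, which a real folding
    program would treat differently.
    """
    s = seq.upper()
    rc = reverse_complement(s)
    best = 0
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if s[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def alu_strand_counts(alus):
    """Count annotated Alu elements per strand from (name, strand) pairs."""
    return Counter(strand for _name, strand in alus)

print(longest_inverted_repeat("AAAAACCCCCTTTTT"))  # 5
print(longest_inverted_repeat("ACAC"))             # 0
print(alu_strand_counts([("AluSq", "+"), ("AluSz", "-"), ("AluY", "-")]))
```

A region that has Alus on both strands and a long shared substring with its own reverse complement is a candidate for forming dsRNA; the real measures from RNAfold and BLAST also tolerate mismatches and bulges, which this exact-match proxy does not.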
Chr4:373100-375422
Chr16:68283493-68285759
Function
• 14 hyper-edited coding sequences.
• 186 UCSC genes.
• 120 RefSeq genes.
• Functional annotation of UCSC genes:

USP1, MARCKSL1, ILF2, GSK3B, TNXB, E2F5, AS3MT, STARD10, OLR1, LDHB, ATF1, LIPC, CALML4, ASB16

generation of precursor metabolites and energy; cellular lipid catabolic process; hexose metabolic process; secondary metabolic process; monosaccharide metabolic process; immune response; mutagenesis site; carbohydrate catabolic process; domain:Leucine-zipper; zinc finger region:RanBP2-type; hdl; low-density lipoprotein binding; response to xenobiotic stimulus; Zinc finger, RanBP2-type; high-density lipoprotein particle; NAD metabolic process; ZnF_RBZ; coiled coil; immune response; lipid transport; lysosome; calmodulin; glucose metabolic process; re-entry into mitotic cell cycle; lipid metabolism

http://david.abcc.ncifcrf.gov/
Evolution
• 28 human-specific elements in the hyper-edited regions.
• Conservation scores (primate, 2kbp region):

Validation
Chr 1   GTTTCCAAGTTTCCCTCTCCCTTCTTTGACTTCTGACAGCTTCCGAAGTGTGCACACAGC 60
RNA 1   ............................................................ 60
Chr 61  CTCTTGTCAGCACTGTTTGGTACCTGCATCTAAAAATGAGATCACAGTCCTTCCGCTCCG 120
RNA 61  ............................................................ 120
Chr 121 CAAACCCTGACAGAGACAGAATACAGAGTGGGCTTGTAGACTTGAAGTATAAAACTTTTG 180
RNA 121 ............................................................ 180
Chr 181 GCCAGTCCTGGTGGCTCACACCTGTAATCCCAGCACTTTGAGAGGCCGAGGTGGGCGGAT 240
        X X X
RNA 181 .................G.G....................G.......G........... 240
Chr 241 CACCTGAGGTCAGGAGTTCGAGACAAGCCTGGCCAACCTTGTGAAACCCCGTCTCAACTA 300
        X X X X X X
RNA 241 ......G....G.............G..............................G..G 300
Chr 301 AAAATACAAAAACTAGCCGGGCATGGTGGCATGTGCCTGTAATCCCAGCTACTCAGGAGG 360
        XX X X XXXX X X X
RNA 301 ..G..G.GG.....G.......G...............................G..... 360
Chr 361 CGGAGGCGTGAGAATCACTTGAACCTGGGAGGTGTAGGTTGCAGTGAGCCAAGATCGCAC 420
        X XX X XX X X
RNA 361 ..........G..G..G....G.............G..........G...G......... 420
Chr 421 CACTGCACTCCAGCCTGGGCAACAAGAGTGAAACTCCATCTCAAAAAAAAAAGACAGAAA 480
        X X XX
RNA 421 ............N...........G......G............................ 480
Chr 481 ACCTTTGGAGG 491
RNA 481 ........... 491
EST: DA364252 (normal brain)
Genome: chr2:242643522-242644012
G - editing according to the EST
X - experimentally validated editing
6 candidates:
1 - not amplified.
2 - not sequenced.
2 - not edited (beyond known).
1 - edited!
Validation
RNA editing not previously known in this gene.
ING5 (a tumor suppressor protein that can interact with TP53)
AluSq (+); AluSz (-) 385bp upstream; AluY (-) 593bp downstream
None of the sites is a SNP
Validation
CGACAAGAGTGTACGATGACGTC
|||||*||||||*|||||*||||
CGACCGGAGTGTGCGCTGGCGTC
Thank you