Page 1: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Shai Carmi

Together with: Itamar Borokhov (Compugen), Erez Levanon

Thanks to: Gilad Finkelstein, Khen Khermesh, Nurit Paz-Yaacov

Lab meeting, January 2011

Page 2: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

A-to-I RNA editing

• RNA editing is a post-transcriptional change in the pre-RNA.
• Alters the RNA sequence encoded by the DNA in a single-nucleotide, site-specific manner.
• Adenosine is converted to inosine.
• Inosine is read as guanosine during translation and sequencing.
• The ADAR (Adenosine Deaminase Acting on RNA) family of dsRNA-binding proteins catalyzes A-to-I editing.
• ADAR deficiency is embryonically lethal; editing is also linked to brain diseases.

Gommans, Mullen & Maas. Bioessays 31, 1137 (2009)

Page 3: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Hyper-editing

• Editing occurs mostly near Alu inverted repeats.
• Many editing sites in each repeat.
• Hyper-editing also observed.
• Only one known target (biochemical screen) (Morse, Aruscavage & Bass. PNAS 99, 7906 (2002)).

Global effect:

“Inosine-containing dsRNA binds a stress-granule-like complex and downregulates gene expression in trans” (Scadden, Mol. Cell 28, 491 (2007)).

“Double-stranded RNAs containing multiple IU pairs are sufficient to suppress interferon induction and apoptosis” (Vitali & Scadden. Nat. Struct. Mol. Biol 17, 1043 (2010)).

Nishikura. Annu. Rev. Biochem. 79, 321 (2010)

Page 4: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Detecting hyper-editing

Typical editing is detected by aligning the RNA to the genome and searching for A→G mismatches:

TCCCCACCCTGAGTAGCTGGGACTACAGGCATGTGCCACCACACCACCATGCTAGGCTAATGGTTTGTATTTGTTTGA : DNA
||||||||||||||*|||||||||||||||*|||||||||||||||||||||||||||||||||||||*|||||||||
TCCCCACCCTGAGTGGCTGGGACTACAGGCGTGTGCCACCACACCACCATGCTAGGCTAATGGTTTGTGTTTGTTTGA : RNA

“Too edited” RNAs will not align to the genome at all!

TCCCCACCCTGAGTAGCTGGGACTACAGGCATGTGCCACCACACCACCATGCTAGGCTAATGGTTTGTATTTGTTTGA : DNA
|||||*||||||||*||||||*||||*|||*||||||*||||*||*|||||||*||||**||||||||*||||||||*
TCCCCGCCCTGAGTGGCTGGGGCTACGGGCGTGTGCCGCCACGCCGCCATGCTGGGCTGGTGGTTTGTGTTTGTTTGG : RNA

How to detect such editing events?

Page 5: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

An algorithm

• Collect all sequences that were rejected from alignment to the genome (by the UCSC genome browser).
• Transform the sequences: change every “A” to “G” in the human genome and in the RNA sequences.
• Re-align to the genome.

TCCCCGCCCTGGGTGGCTGGGGCTGCGGGCGTGTGCCGCCGCGCCGCCGTGCTGGGCTGGTGGTTTGTGTTTGTTTGG : DNA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TCCCCGCCCTGGGTGGCTGGGGCTGCGGGCGTGTGCCGCCGCGCCGCCGTGCTGGGCTGGTGGTTTGTGTTTGTTTGG : RNA

TCCCCACCCTGAGTAGCTGGGACTACAGGCATGTGCCACCACACCACCATGCTAGGCTAATGGTTTGTATTTGTTTGA : DNA
|||||*||||||||*||||||*||||*|||*||||||*||||*||*|||||||*||||**||||||||*||||||||*
TCCCCGCCCTGAGTGGCTGGGGCTACGGGCGTGTGCCGCCACGCCGCCATGCTGGGCTGGTGGTTTGTGTTTGTTTGG : RNA
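The transformation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the screen: the function names `three_letter` and `a_to_g_sites` are hypothetical, the two sequences are the hyper-edited DNA/RNA pair shown in the slides, and a plain position-by-position comparison stands in for the real re-alignment step against the whole genome.

```python
# Example pair from the slides: genomic DNA and a hyper-edited RNA.
DNA = "TCCCCACCCTGAGTAGCTGGGACTACAGGCATGTGCCACCACACCACCATGCTAGGCTAATGGTTTGTATTTGTTTGA"
RNA = "TCCCCGCCCTGAGTGGCTGGGGCTACGGGCGTGTGCCGCCACGCCGCCATGCTGGGCTGGTGGTTTGTGTTTGTTTGG"


def three_letter(seq: str) -> str:
    """Replace every A with G, leaving a sequence over C, G and T only."""
    return seq.upper().replace("A", "G")


def a_to_g_sites(dna: str, rna: str) -> list[int]:
    """Return 0-based positions where the genome has A and the RNA has G.

    Assumes the two sequences are already aligned without gaps.
    """
    if len(dna) != len(rna):
        raise ValueError("sequences must have equal length")
    return [
        i
        for i, (d, r) in enumerate(zip(dna.upper(), rna.upper()))
        if d == "A" and r == "G"
    ]


if __name__ == "__main__":
    # After the transformation, A-to-G edits no longer count as mismatches.
    print(three_letter(DNA) == three_letter(RNA))
    # The edited positions are recovered from the original sequences.
    print(a_to_g_sites(DNA, RNA))
```

Once a transformed read aligns, the original (untransformed) sequences are needed again to tell edited adenosines apart from genomic guanosines, which is why `a_to_g_sites` works on the originals.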

Page 6: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

How hard is it?

• There are about 500k ESTs that do not align to the genome, each about 500 bp long. The genome is 3 Gbp.
• The 3-letter genome has lower complexity.
• All 4 strand combinations (DNA[+/-] x RNA[+/-]) must be generated before the 3-letter transformation.
• A→C, G→C, and A→T must also be checked as controls.
• Estimated running time: two years on a new laptop.

Solution: use the cloud and parallelize.
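The bookkeeping behind the strand combinations and control runs can be sketched as follows. This is an illustrative Python sketch, not the actual pipeline: `reverse_complement`, `transform`, and `make_jobs` are hypothetical names, and treating each substitution type and strand combination as one independent job is an assumption about how the work could be split for parallel execution.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# A-to-G is the editing signal; the other three are the controls from the slide.
SUBSTITUTIONS = [("A", "G"), ("A", "C"), ("G", "C"), ("A", "T")]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def transform(seq: str, src: str, dst: str) -> str:
    """Collapse base `src` into `dst`, giving a 3-letter sequence."""
    return seq.upper().replace(src, dst)


def make_jobs(dna: str, rna: str):
    """Yield (dna_strand, rna_strand, src, dst, dna_3letter, rna_3letter).

    The strand is chosen first and the 3-letter transformation is applied
    afterwards, once per substitution type.
    """
    strands = {"+": lambda s: s.upper(), "-": reverse_complement}
    for (d_sign, d_fn), (r_sign, r_fn) in product(strands.items(), repeat=2):
        for src, dst in SUBSTITUTIONS:
            yield (
                d_sign,
                r_sign,
                src,
                dst,
                transform(d_fn(dna), src, dst),
                transform(r_fn(rna), src, dst),
            )
```

The strand has to be fixed before the transformation because the two operations do not commute: reverse-complementing a 3-letter sequence does not give the 3-letter version of the reverse complement. Each yielded tuple is independent of the others, so the jobs can be distributed across machines.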

Page 7: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Overview of the procedure

• EST downloading and filtering
• EST and genome preprocessing
• Blast
• Original sequence reconstruction
• Examination of mismatches
• (Filtering results)
• Properties of hyper-edited RNAs

Computing times

• 2-3 weeks on local server
• 1 day
• 2-3 months on Amazon cloud and partly on local server; ~2000 paid computer hours; $500.
• 2-3 days
• 2-3 hours
• Minutes for each operation

Page 8: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Results

• Final editing criteria: >12 A-to-G mismatches, >6% of sites edited, >90% of mismatches are A-to-G.
• Number of clusters of each type.
• Quality scores are not available for these ESTs to explain the non-A-to-G clusters.
• We have some understanding of G-to-A clusters (APOBEC, sequencing errors).

A-to-T run not finished yet.

Note that:
A-to-G also represents T-to-C.
G-to-A also represents C-to-T.
A-to-C also represents T-to-G.
…
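The final editing criteria can be expressed as a small filter. The sketch below is an assumption-laden illustration in Python rather than the code used for the screen: `mismatch_counts` and `passes_editing_criteria` are hypothetical names, the input is taken to be a gap-free pairwise alignment, and "% of sites edited" is interpreted here as A-to-G mismatches divided by the aligned length, which the slide does not spell out.

```python
from collections import Counter


def mismatch_counts(dna: str, rna: str) -> Counter:
    """Count mismatches by type, e.g. ("A", "G"), over a gap-free alignment."""
    if len(dna) != len(rna):
        raise ValueError("sequences must have equal length")
    counts = Counter()
    for d, r in zip(dna.upper(), rna.upper()):
        # Ambiguous bases such as N are ignored.
        if d != r and d in "ACGT" and r in "ACGT":
            counts[(d, r)] += 1
    return counts


def passes_editing_criteria(
    dna: str,
    rna: str,
    min_ag: int = 12,
    min_fraction: float = 0.06,
    min_purity: float = 0.90,
) -> bool:
    """Apply the three thresholds from the slide (all strict inequalities)."""
    counts = mismatch_counts(dna, rna)
    n_ag = counts[("A", "G")]
    total = sum(counts.values())
    if total == 0:
        return False
    return (
        n_ag > min_ag
        and n_ag / len(dna) > min_fraction  # assumed denominator: aligned length
        and n_ag / total > min_purity
    )
```

Running the same filter with a control substitution in place of ("A", "G") would give the counts of non-A-to-G clusters that the slide compares against.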

Page 9: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Examples

Query 1    AGATATTTTTAGGCTTGGCATTGTGGATCACACTTGTAATCCCAGCATTTTGGGAGGCCT 60
Sbjct 1    .............................G.G.....G...................... 60

Query 61   AGCCAGGCAGGTCCCTTGAGCCCAGGAGTTTGAGACCAGCCTGGGCAACATGGTGAAACT 120
Sbjct 61   G.................G......................................... 120

Query 121  GTCTCTGCAAAATATATAAAAATTATTCAGTCCTGGTGGTGTGTGCCTGTAGTCCCACCT 180
Sbjct 121  .........G........................................G......... 180

Query 181  ACTTGAGAGGCTGAGGTGGGAGGATCACCTGAGACCAGGAGGTTGAGGTTGCAGTGAGCT 240
Sbjct 181  G....G....................G.............................G... 240

Query 241  GTGATTTCACCACTGCACTCCAGTCTGGGCAACCGAGTGAGACCCTGTCTCAAAAATAAT 300
Sbjct 241  ........G..G...................G............................ 300

Query 301  TTTAAAATAGGCCGGGCCTGGTGGCTCATGCCTGTATTCCCAGCACTTTGGGAGCCCAAG 360
Sbjct 301  .........................................G...............G.. 360

Query 361  GCGGGTGGATCACCTGAGGTCAGGGGTTCAAGACCAGCCTGGCCAACATGGTGAAACCCC 420
Sbjct 361  .....................G...................................... 420

Query 421  GTCTCTACTGAAAATACAAAAAATTAGCCAGGCGGGTGGCGGGCGCCTATAAAACCAGCT 480
Sbjct 421  ................................................G.GG....G... 480

Query 481  ACTCAGGAGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCC 540
Sbjct 481  G...G..G.....G...G........G.....G..........G........G....... 540

Query 541  GAGATT 546
Sbjct 541  ...... 546

chr16:23457836-23458381, DA103871 (-+)
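Alignments in this dot notation can be turned into a list of candidate sites with a short parser. The sketch below is a hedged Python illustration: `dot_mismatches` is a hypothetical helper, it assumes that the Query row is the genomic sequence and the Sbjct row is the EST (with "." meaning identity), and it reports positions in Query coordinates only.

```python
import re

# One alignment row: label, start coordinate, sequence, end coordinate.
_ROW = re.compile(r"(Query|Sbjct)\s+(\d+)\s+([A-Za-z.\-]+)\s+(\d+)")


def dot_mismatches(text: str):
    """Return (query_position, query_base, sbjct_base) for each non-dot column."""
    rows = _ROW.findall(text)
    sites = []
    # Rows are expected in Query/Sbjct pairs, one pair per alignment block.
    for query_row, sbjct_row in zip(rows[0::2], rows[1::2]):
        q_label, q_start, q_seq, _ = query_row
        s_label, _, s_seq, _ = sbjct_row
        if (q_label, s_label) != ("Query", "Sbjct") or len(q_seq) != len(s_seq):
            raise ValueError("malformed alignment block")
        pos = int(q_start)
        for q, s in zip(q_seq, s_seq):
            if s != ".":
                sites.append((pos, q.upper(), s.upper()))
            if q != "-":  # a gap in the query does not consume a coordinate
                pos += 1
    return sites
```

For example, `dot_mismatches("Query 1 ACAGA 5\nSbjct 1 ..G.G 5")` reports the two A-to-G columns at Query positions 3 and 5. Filtering the result for `("A", "G")` pairs gives the putative editing sites of one EST.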

Page 10: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Examples

Query 1    ATGTGTATTCCACACACAAATGGCTGAGTTATAGTCATAAAACAATTTGCAATAAAAAAA 60
Sbjct 1    ............................................................ 60

Query 61   AAACCAAAACAGATTGTCAGTTAACCAGGAAACAGTTAATGTTTTTTAATGAATCTGGCA 120
Sbjct 61   .....................................G..............G....... 120

Query 121  TTATAGTGAGCAAATGTCGTATTAATTTAGGCTAATTTCTAATAC-TACCATAATTTGTG 179
Sbjct 121  ..G.G......G................G.....G.....G....N.G....GG...... 180

Query 180  TCTAAATTTCTGTTGGGGTAGAAATTACTAAAATTGTGGGGAGTTTTTTCTGATTTTTAC 239
Sbjct 181  .......................G..G..GG.....................G.....G. 240

Query 240  ATTGCTTTAGGAAACATTTTTACTAATTCAGCTGTCTTAGGTAAAATGAATAGTTTTCTT 299
Sbjct 241  ........G...GG...........G............G...GG.G...G.G........ 300

Query 300  CCTGTTTTTTTATGTGTCATTGTTAGTGGTCTCAGAATTCTGATCAGTAACTTTGTGTAT 359
Sbjct 301  ...........G............G........G..G...........GG........G. 360

Query 360  GATGCTGAATTACAAACCGTTTGAATGATCCAGTTGAAAACGTATCCCTCTACTTTCTTC 419
Sbjct 361  ...........G..G............G........GG.G...G.......G........ 420

Query 420  AGTTGTAGAAAAGGTTAATTTCCCTCAGTGTCCCACATTATACCAACCTAAGAGAAGAAC 479
Sbjct 421  ......G.........G.........G............G..........G......... 480

Query 480  AGGTAATAGGGAGAA 494
Sbjct 481  ....GG......... 495

chr15:68482051-68482545, DA105809 (++)

No Alu

Page 11: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Examples

Query 1    AGAACTAATGAGCACAGAACTAAGAAAGCCCAGGCACAGTGGCTCACATCAGTAATTCTA 60
Sbjct 1    ............................................................ 60

Query 61   GGGCCTTGGGAGGCAAGACAAGAGAATCACTTGAGGCCATGAGTTCAAGGGCAGCCTAGG 120
Sbjct 61   ....................G............G............G.....G....G.. 120

Query 121  CAACATAGTGGGACCCTATCTCCACAAAAATAATAATATTATTATTATTAAATAAAATAA 180
Sbjct 121  ............................................................ 180

Query 181  AAGGAAGAGACAGCCATGAAGATAACTAGCTGAGGCCAGGTACAGTGGCTCATGCCTATA 240
Sbjct 181  ...........................................G.............G.. 240

Query 241  ATCCCAACACTTTGGGAGGTTGAGGTGGACAGATTGCTTGAGGTCAGAAGTTCCAGACCA 300
Sbjct 241  G....G.......................................G.GG.......G..G 300

Query 301  GACTGAACAACATAGCAAAACCCCATCCCTACTAAAAATACAAAAATTAGCTGGGCGTGG 360
Sbjct 301  .............G..GG......G.....G..G...G..........G........... 360

Query 361  TGGCAGGCACCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCACCTGAACCT 420
Sbjct 361  ....G...G.....G.........G...................G.....G.....G... 420

Query 421  GGGAGGCGGAGGATGCAGTGCGCTGAGATCATGCCACT 458
Sbjct 421  ............G...G.............G....... 458

chr16:29383242-29383699, BM703103 (++)

Page 12: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Examples

Query 1    TTGCTCTGTCACCCAGGCTGGAGTGCAGTGGCGCAATCTCGGCTCACTGCAAGCTCCACC 60
Sbjct 1    ..........G...G...................GG..............GG.....G.. 60

Query 61   TCCTGGGTTCACGCCATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGACTATAGGTGCCC 120
Sbjct 61   ...............G............G.....................G......... 120

Query 121  ACCACCACGCCTGGCTAATTTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATGTTAGC 180
Sbjct 121  ...G......................G.....G..G.....................G.. 180

Query 181  CAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTACTGG 240
Sbjct 181  .G...................G.......G....................GGG..G.... 240

Query 241  GATTACAGGTGTGAGCCACTGATACCTGGCCAATTTTTATATTTGTTGTAGAGATGAGGT 300
Sbjct 241  ....G....................................................... 300

Query 301  TTTGCCATATTGTCCAGGCTGGTCTCAAACTCCTGGTCTCAAGGGATCACCCGCCTCAGC 360
Sbjct 301  ........G................................................G.. 360

Query 361  CTCCCAAAGTGCTGGGACTACAGGAGTGAGCCACTGTGCCTGGCCTTGTTTGTTTGTTTT 420
Sbjct 361  .......G.................................................... 420

Query 421  TTGAGATGGGGTCTCACTATGTTGGCCAGGCTGGTCTCGAACTCCTGGGTTTGAGCAATC 480
Sbjct 421  ..................G......................................... 480

Query 481  CTCCTGCCATGTAGCTGGGATTATAGAGGCTACCATGTCCGTCTAGTTTTAAATT 535
Sbjct 481  ....................................................... 535

chr12:56344120-56344654, DB160834 (++)

Page 13: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Examples

Query 2    TTAGCCAGGCATGGTGGCAGATGCCTGTAGTCCCAGCTACTCAGGAGGCTGAAGTGGGAG 61
Sbjct 2    ............................G.....G...G...G........GG....... 61

Query 62   GATCCCTTGAGCCTGGGAGTTCAAGGCTGCCATGAGCCAAGATGGCACTACCACACTCCA 121
Sbjct 62   .G.......G............GG.......G..G....G............G....... 121

Query 122  TCCTGGATGACAGAGCAAGACCCTGTCTC-AAAAAAAAAAAAAGAATCTACAAACGATTA 180
Sbjct 122  ................G............A.............................. 181

Query 181  AATTAATAAGTGAGTTCAGCAAGATATTTTAAAAAATTATTAAAATTAACAAGTAAATTT 240
Sbjct 182  ............................................................ 241

Query 241  GTGGGGACCAAGGTAAATATATAAAAATCTATTATGGTTTTTTTTTCTTTCTTTCTTTCT 300
Sbjct 242  ............................................................ 301

Query 301  TTTTTTTCTGAGATGGAGTTTCACTCTTGTCACCCAGGCTGGAGTGCAATGGTGCGATCT 360
Sbjct 302  ................G.....G............G...........GG.......G... 361

Query 361  TGTTTCACCGCCACCTCTGCCTCCGGGTTCAAGGGATTTTCCTGCCTCAGCCTCCTGAGT 420
Sbjct 362  ...............................G................G........... 421

Query 421  AGCTGGGATTACAAGCGCCCCCCACCACACCTGGCTAATTTTTGTATTTTTAGCAGAGAC 480
Sbjct 422  G.........G..G..............G........G.......G.....G..G..... 481

Query 481  GGGGTTTTACCATGTTGACCAGCCTGGTCCTCGAACTCCTGAGCTCAGGTGATCCACCCG 540
Sbjct 482  ...........G......................G...........G........G.... 541

Query 541  CCTC 544
Sbjct 542  .... 545

chr11:65608125-65608668, DA221841 (++)

Page 14: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Examples

Query 1    TTCCCTGGAGGTGCTGGGAGCTGGGAAATGTATGCGGCTGTGAATTATTAATATTTTGGA 60
Sbjct 405  ...................N.............N.......................... 346

Query 61   GACCCTCACTAGGGCAGGGAGTGGCTTCAGGATAGGAAAGGGGACGCAAGGAAGACACCA 120
Sbjct 345  ............................................................ 286

Query 121  GGAATGGCCGGGCGCGATGGCTTACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGG 180
Sbjct 285  .............N..G......G.......GG........................... 226

Query 181  TCAGATCACCTGAGGTCGGGAGTTTGAGACCAGCCTGACCAACATGGAGAAACCCTGTCT 240
Sbjct 225  ..G....G....G.......G.....G....G.....G..GG.....G.G.G........ 166

Query 241  CTACTGAAAATACAAAATTAGCTGGGCGTGGTGGCGGGTGCCTGTAATCCCAGCTACTCA 300
Sbjct 165  ...............GG..G.........................G.........G...G 106

Query 301  GGAGGCTGAGGCAGGAGAATCTCTTGAACCCAGGAGGCAGAGGTTGCGGTGAGCTGAGAT 360
Sbjct 105  ........G...G.....G........G...G......G..................... 46

Query 361  GGTGCCATTGCACTCCAGCCTGGGCAACAAGAGTGAAACTGTCTC 405
Sbjct 45   .........................G................... 1

chr17:73090375-73090779, DB352453 (+-)

Page 15: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Novelty and more

• Total number of edited ESTs: 807.
• Total number of edited sites: 16184.
• Number of novel sites: 15362 (more than any previous screen).
• Number of novel ESTs: 700.
• Number of novel hyper-edited ESTs (known<=5): 749.
• Number not in Alu: 76.
• Number of edited regions supported by multiple ESTs: 74 (169 ESTs).
• Number of sites overlapping with a (transition) SNP: 250 (99 non-cDNA).

60% cDNA SNPs vs. only 0.3% expected.
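The proportions implied by these counts can be checked with a few lines of arithmetic. This is only a worked calculation in Python over the numbers listed above; the variable names are illustrative, and reading "99 non-cDNA" as 99 of the 250 overlapping SNPs not being cDNA-derived is an assumption (it is the reading that reproduces the quoted 60%).

```python
# Counts taken from the slide.
edited_ests = 807
edited_sites = 16184
novel_sites = 15362
snp_overlap = 250
snp_overlap_non_cdna = 99  # assumed: overlapping SNPs not derived from cDNA

novel_fraction = novel_sites / edited_sites
sites_per_est = edited_sites / edited_ests
cdna_snp_fraction = (snp_overlap - snp_overlap_non_cdna) / snp_overlap

if __name__ == "__main__":
    print(f"novel sites: {novel_fraction:.1%}")
    print(f"edited sites per edited EST: {sites_per_est:.1f}")
    print(f"cDNA-derived among overlapping SNPs: {cdna_snp_fraction:.1%}")
```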

Page 16: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Editing signature

Li et al. Science 324, 1210 (2009)

Hyper editing

ADAR2 motifs absent in our data

Page 17: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Tissues and health states

Top tissues:

Tissue                   #ESTs   Fraction edited (x10^-4)
liver                    316     12.0038
brain                    121     0.907091
lung                     39      0.964747
thymus                   29      3.19151
prostate                 28      0.830392
eye                      26      1.0526
muscle                   22      1.63777
uncharacterized tissue   22      0.54209
uterus                   16      0.572324
kidney                   15      0.598217
intestine                15      0.462488
testis                   14      0.369041
spleen                   12      1.9836
bone                     10      1.22326
pancreas                 10      0.413158

Top health states:

State                             #ESTs   Fraction edited (x10^-4)
normal                            599     1.50345
lung tumor                        12      0.605776
head and neck tumor               12      0.504392
colorectal tumor                  11      0.496144
glioma                            9       0.717595
soft tissue/muscle tissue tumor   9       0.618
gastrointestinal tumor            8       0.496873
kidney tumor                      7       0.638465
germ cell tumor                   7       0.241261
uterine tumor                     6       0.508863
chondrosarcoma                    5       0.583676
pancreatic tumor                  5       0.424452

Human liver regeneration after partial hepatectomy.

About 40% of the new sites.

Page 18: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Secondary structure

Are hyper-edited RNAs double-stranded?

Consider the 10 kbp of genomic sequence flanking the edited EST.

Three measures of “double-strandedness”:

• The maximal length of dsRNA according to RNAFold (2 kbp region).
• The total number of aligned bases when blasting against the reverse complement.
• The number of (+) and (-) Alus in the region.
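The Alu-counting measure is the simplest of the three to sketch in code. The Python below is a minimal illustration under stated assumptions: the `Alu` record, `alu_strand_counts`, and `has_inverted_pair` are hypothetical names, the repeat annotations are assumed to be available as (start, end, strand) intervals, and any overlap with the region counts as a hit.

```python
from typing import Iterable, NamedTuple, Tuple


class Alu(NamedTuple):
    """Hypothetical repeat annotation: half-open interval plus strand."""
    start: int
    end: int
    strand: str  # "+" or "-"


def alu_strand_counts(
    alus: Iterable[Alu], region_start: int, region_end: int
) -> Tuple[int, int]:
    """Count (+) and (-) Alus overlapping [region_start, region_end)."""
    plus = minus = 0
    for alu in alus:
        if alu.end > region_start and alu.start < region_end:
            if alu.strand == "+":
                plus += 1
            elif alu.strand == "-":
                minus += 1
    return plus, minus


def has_inverted_pair(alus: Iterable[Alu], region_start: int, region_end: int) -> bool:
    """True if the region holds Alus on both strands, so a dsRNA could form."""
    plus, minus = alu_strand_counts(alus, region_start, region_end)
    return plus > 0 and minus > 0
```

A region with Alus on both strands contains at least one inverted pair, which is the configuration that can fold back into double-stranded RNA; a region with Alus on only one strand does not.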

Chr4:373100-375422

Chr16:68283493-68285759

Page 19: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Function

• 14 hyper-edited coding sequences: USP1, MARCKSL1, ILF2, GSK3B, TNXB, E2F5, AS3MT, STARD10, OLR1, LDHB, ATF1, LIPC, CALML4, ASB16.
• 186 UCSC genes.
• 120 RefSeq genes.
• Functional annotation of UCSC genes (http://david.abcc.ncifcrf.gov/):

generation of precursor metabolites and energy
cellular lipid catabolic process
hexose metabolic process
secondary metabolic process
monosaccharide metabolic process
immune response
mutagenesis site
carbohydrate catabolic process
domain:Leucine-zipper
zinc finger region:RanBP2-type
hdl
low-density lipoprotein binding
response to xenobiotic stimulus
Zinc finger, RanBP2-type
high-density lipoprotein particle
NAD metabolic process
ZnF_RBZ
coiled coil
immune response
lipid transport
lysosome
calmodulin
glucose metabolic process
re-entry into mitotic cell cycle
lipid metabolism

Page 20: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Evolution

• 28 human-specific elements in the hyper-edited regions.
• Conservation scores (primate, 2 kbp region):

Page 21: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Validation

Chr 1    GTTTCCAAGTTTCCCTCTCCCTTCTTTGACTTCTGACAGCTTCCGAAGTGTGCACACAGC 60
RNA 1    ............................................................ 60

Chr 61   CTCTTGTCAGCACTGTTTGGTACCTGCATCTAAAAATGAGATCACAGTCCTTCCGCTCCG 120
RNA 61   ............................................................ 120

Chr 121  CAAACCCTGACAGAGACAGAATACAGAGTGGGCTTGTAGACTTGAAGTATAAAACTTTTG 180
RNA 121  ............................................................ 180

Chr 181  GCCAGTCCTGGTGGCTCACACCTGTAATCCCAGCACTTTGAGAGGCCGAGGTGGGCGGAT 240
X X X
RNA 181  .................G.G....................G.......G........... 240

Chr 241  CACCTGAGGTCAGGAGTTCGAGACAAGCCTGGCCAACCTTGTGAAACCCCGTCTCAACTA 300
X X X X X X
RNA 241  ......G....G.............G..............................G..G 300

Chr 301  AAAATACAAAAACTAGCCGGGCATGGTGGCATGTGCCTGTAATCCCAGCTACTCAGGAGG 360
XX X X XXXX X X X
RNA 301  ..G..G.GG.....G.......G...............................G..... 360

Chr 361  CGGAGGCGTGAGAATCACTTGAACCTGGGAGGTGTAGGTTGCAGTGAGCCAAGATCGCAC 420
X XX X XX X X
RNA 361  ..........G..G..G....G.............G..........G...G......... 420

Chr 421  CACTGCACTCCAGCCTGGGCAACAAGAGTGAAACTCCATCTCAAAAAAAAAAGACAGAAA 480
X X XX
RNA 421  ............N...........G......G............................ 480

Chr 481  ACCTTTGGAGG 491
RNA 481  ........... 491

EST: DA364252 (normal brain)

Genome: chr2:242643522-242644012

G - editing according to the EST
X - experimentally validated editing

6 candidates:
1 - not amplified.
2 - not sequenced.
2 - not edited (beyond known).
1 - edited!

Page 22: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Validation

RNA editing not previously known in this gene.

ING5 (a tumor suppressor protein that can interact with TP53)

AluSq (+)
AluSz (-), 385 bp upstream
AluY (-), 593 bp downstream

None of the sites is a SNP

Page 23: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Validation

Page 24: A computational screen detected thousands of new A-to-I RNA hyper-editing sites

CGACAAGAGTGTACGATGACGTC

|||||*||||||*|||||*||||

CGACCGGAGTGTGCGCTGGCGTC

Thank you