Introduction to RNA Bioinformatics Craig L. Zirbel October 5, 2010 Based on a talk originally given by Anton Petrov.


Introduction to RNA Bioinformatics

Craig L. Zirbel, October 5, 2010

Based on a talk originally given by Anton Petrov.

Outline

Lecture 1
• Importance of RNA, examples (miRNA, riboswitches).
• RNA 2D and 3D structure.
• RNA structure prediction.

Lecture 2
• RNA basepairs and 3D motifs.
• Predicting secondary structure from sequence (mfold).

Lecture 3
• Statistical variability of protein and RNA sequences.

ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.

Of the approximately 3 billion nucleotides in the human genome, only about 1.5% code for proteins, although up to 93% are transcribed into RNA. What is this “non-coding” RNA doing?

Mattick, J.S. (2004) The hidden genetic program of complex organisms. Scientific American 291 (4): 60-67.

Diagram: DNA → RNA (Transcription); RNA → Protein (Translation). RNAs shown: tRNA, ribosomal RNA.

Diagram labels: DNA, RNA, Protein. Processes: Transcription, Reverse Transcription, Splicing, Translation of exons. RNAs shown: tRNA, ribosomal RNA, many other types of ncRNA, introns (RNA).

micro RNA

Mattick, J.S. (2004) The hidden genetic program of complex organisms. Scientific American 291 (4): 60-67.

Kim VN, MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol. 2005 May;6(5):376-85

microRNA

Mattick, J.S. (2004) The hidden genetic program of complex organisms. Scientific American 291 (4): 60-67.

Bioinformatic challenge: given a DNA sequence, predict microRNA genes and their respective targets.
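One small piece of the target-prediction challenge can be sketched in code. The example below assumes a common simplification that is not taken from the slides: a candidate target site is a stretch of the transcript that is perfectly complementary to the miRNA "seed" (nucleotides 2 to 8). The function names and both sequences are hypothetical toy values, and real predictors weigh many more features (conservation, site context, folding energy).

```python
# Toy seed-match scan: an illustrative simplification, not a real target predictor.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, target, seed_start=1, seed_end=8):
    """Return 0-based positions in `target` that pair perfectly with the miRNA seed.

    The seed is taken as nucleotides 2..8 of the miRNA (slice [1:8]),
    which is an assumption of this sketch.
    """
    seed = mirna[seed_start:seed_end]
    site = reverse_complement(seed)  # what the target must contain, read 5'->3'
    width = len(site)
    return [i for i in range(len(target) - width + 1) if target[i:i + width] == site]


if __name__ == "__main__":
    toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # hypothetical example sequence
    toy_target = "AAAACUACCUCAAAA"          # hypothetical transcript fragment
    print(seed_sites(toy_mirna, toy_target))  # [4]
```

Exact string matching keeps the sketch short; allowing G-U wobble pairs or mismatches would require scoring each alignment instead.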

miRNAs in a transcript, waiting to be diced out

Peterson, K.J., Dietrich, M.R. and McPeek, M.A. (2009) MicroRNAs and metazoan macroevolution: insights into canalization, complexity, and the Cambrian explosion. BioEssays 31:736–747.

Acquisition of novel microRNAs (shown in white boxes) may be a driving force of recent evolution. Also a factor in cancers?

There are 84 mammal-specific microRNAs, and 84 more that are found exclusively in apes.

Montange, R. K., & Batey, R. T. (2008). Riboswitches: emerging themes in RNA structure and function. Annu Rev Biophys 37:117-133.

RIBOSWITCHES

Bioinformatic challenges: find riboswitches in genomic sequences, design novel riboswitches.

Riboswitches are RNAs that bind to specific other molecules when those molecules are present; binding alters the shape and function of the RNA.

http://en.wikipedia.org/wiki/List_of_RNAs

Types of RNA

Bioinformatic challenges: Is this list final? Could there be more types of non-coding RNA (ncRNA) that we don’t know yet? How can we search for novel ncRNAs in genomes?

Goals of RNA bioinformatics

• Find and classify RNA genes in genomic sequences (using both experimental and computational methods).

• Predict secondary and 3D structure from RNA sequence.

• Infer function from structure.

• Rationally design RNA molecules for biotechnology.

• Find diseases associated with RNAs (e.g., cancer and miRNA).

Why RNA is unique

• Similar to DNA in chemical composition, primary and secondary structure, and information content, but with more complicated structure than helices.

• Similar to proteins in tertiary (3D) structure and function, but also very different: mostly base-base interactions, fewer backbone-backbone interactions.

• Binds substrates and catalyzes reactions, just as proteins do.

• Participates in all stages of gene expression and information transfer: transcription, splicing, translation. Frequent target of antibiotics.

Similarities Between Protein and RNA 3D Structures

• Compact folding
• Hierarchical organization
• Modular domains
• Specific tertiary interactions
• Molecular “mimicry”: proteins that “mimic” RNA

Liang, H., & Landweber, L. F. (2005). Molecular mimicry: Quantitative methods to study structural similarity between protein and RNA. RNA, 11(8), 1167-1172.

The tertiary structures of tRNA-mimic translation factors and tRNA. (a) Thermus thermophilus EF-G:GDP (PDB accession code 1DAR). (b) Thermus aquaticus EF-Tu:GDPNP:Phe-tRNAPhe (1TTT). (c) Thermus thermophilus RRF (1EH1). (d) Yeast Phe-tRNAPhe.

RNAs do not stay extended as linear chains: they fold back on themselves so that complementary segments pair up.
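The folding-back idea can be illustrated with a minimal sketch that checks how far the two ends of a single strand can zip together into a hairpin stem. The function name, the inclusion of G-U wobble pairs, and the minimum loop size of three unpaired nucleotides are assumptions of this example, not rules stated in the talk.

```python
# Pairs allowed in this sketch: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_length(rna, min_loop=3):
    """Count consecutive basepairs formed by pairing the 5' end with the 3' end.

    Pairing proceeds inward from both ends and stops at the first mismatch
    or when fewer than `min_loop` unpaired nucleotides would remain in the loop.
    """
    i, j = 0, len(rna) - 1
    count = 0
    while j - i > min_loop and (rna[i], rna[j]) in PAIRS:
        count += 1
        i += 1
        j -= 1
    return count


if __name__ == "__main__":
    print(stem_length("GGGAAAUCCC"))  # 3: G-C, G-C, G-C around the loop AAAU
    print(stem_length("AAAAAAAA"))    # 0: no complementary ends
```

This only tests one very specific fold (ends pairing with ends); general secondary structures contain internal loops, bulges, and multiple helices.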

RNA 2D Structure Elements

Mount, D. W. Bioinformatics: Sequence and Genome Analysis.

Bioinformatic challenges: predict most stable 2D structures, resolve pseudoknotted regions etc.
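As a rough illustration of 2D structure prediction, here is a minimal sketch of the Nussinov dynamic-programming scheme, which maximizes the number of nested basepairs. This is a deliberately simplified stand-in: it counts pairs rather than minimizing free energy (as mfold-style methods do), and it cannot produce pseudoknots. The function names, the minimum loop size, and the dot-bracket output notation are assumptions of this example.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(rna, min_loop=3):
    """Return (max number of nested pairs, one optimal structure in dot-bracket)."""
    n = len(rna)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of nested pairs within rna[i..j]
    dp = [[0] * n for _ in range(n)]

    def left_of(i, k):
        return dp[i][k - 1] if k > i else 0

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (rna[k], rna[j]) in PAIRS:
                    best = max(best, left_of(i, k) + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one structure that achieves dp[0][n-1].
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (rna[k], rna[j]) in PAIRS and \
                    dp[i][j] == left_of(i, k) + 1 + dp[k + 1][j - 1]:
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAACCC"))  # (3, '(((....)))')
```

The cubic running time is fine for short sequences. When several structures tie for the maximum, the traceback returns just one of them, which is one reason energy-based scoring is preferred in practice.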

Basepairs are the basic units of secondary structure.
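To make the basepair view of secondary structure concrete, the sketch below converts a structure written in dot-bracket notation into an explicit list of basepairs and labels each pair by its two bases. Dot-bracket notation is an assumption here (a widely used text encoding, not something the slides introduce), and the function names are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs for a dot-bracket string.

    '(' opens a pair, ')' closes the most recently opened one, '.' is unpaired.
    """
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def label_pairs(rna, structure):
    """Return a list of (i, j, 'XY') where X and Y are the paired bases."""
    if len(rna) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [(i, j, rna[i] + rna[j]) for i, j in parse_dot_bracket(structure)]


if __name__ == "__main__":
    for i, j, bases in label_pairs("GGGAAAUCCC", "(((....)))"):
        print(i, j, bases)
```

Plain parentheses can only describe nested pairs, so pseudoknotted structures need extra bracket types or an explicit pair list.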

2D to 3D structure of RNA