central dogma of molecular biology -...

Transcriptomics Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona

Post on 30-Aug-2019

Page 1: Central dogma of molecular biology - UABbioinformatica.uab.es/base/documents/masterGP/Transcriptomics2013610-1354.pdf · Genome Proteome Transcriptome Complete DNA content of an organism

Transcriptomics

Marta Puig Institut de Biotecnologia i Biomedicina

Universitat Autònoma de Barcelona

Central dogma of molecular biology

Genome → (Transcription) → Transcriptome → (Translation) → Proteome → Phenotype

Genome: Complete DNA content of an organism with all its genes and regulatory sequences

Transcriptome: Complete set of transcripts and their relative levels of expression in a particular cell or tissue under defined conditions at a given time

Proteome: Complete collection of proteins and their relative levels in each cell

RNA profiling provides information about:

Expressed sequences and genes of a genome

Gene regulation and regulatory sequences

Function and interaction between genes

Functional differences between tissues and cell types

Identification of candidate genes for any given process or disease

Why is the study of RNA so important?

Methods

Alternative splicing

Types of transcripts

Regulatory sequences

ENCODE project

Overview

SINGLE GENES

Northern

RT-PCR

5’ and 3’ RACE

Quantitative RT-PCR (Real-Time RT-PCR)

WHOLE TRANSCRIPTOME

EST sequencing

Microarrays

RNA-Seq

Transcriptome analysis methods

Transcriptome analysis using microarrays

Gene expression arrays

- Quantification of transcript abundance

- Single/multiple 3’ probes

Genome tiling arrays

- Identification of transcribed sequences

- Multiple probes covering the genome

Alternative splicing arrays

- Quantification of different RNA isoforms

- Probes in exons and exon-exon junctions

Figure labels: Gene, Probes, Inclusion form, Exclusion form

Brent (2008) Nature Reviews Genetics 9: 62-73

ESTs

Alignment with genome

cDNA synthesis

cDNA library

Sanger sequencing of insert ends

Expressed Sequence Tags (EST)

RNA-seq

Figure 1. Wang et al. (2009) Nature Reviews Genetics 10: 57-63

Sequencing of all the transcripts in a sample using NGS technologies

RNA-seq mapping of short reads in exon-exon junctions

CCGAAAATCAAGTCATCCCTAAAGACTAAGTAAGTAACCATATTACATTAAGGAAGGCACTTTAAAAGTTTATAATCATTTGTAGACTCCCACCAAAGCCACTGACTCGCAAGG

Exon Exon Intron

RNA-seq
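
A read that spans an exon-exon junction does not match the genome contiguously, because the intron sits between its two halves. The following is a minimal Python sketch of that idea, not the algorithm of any real aligner. The exon and intron strings are toy sequences, `map_junction_read` is a hypothetical helper, and the GT...AG check relies on the canonical splice-site dinucleotides, which the slide does not state explicitly.

```python
def map_junction_read(read, genome, min_anchor=4):
    """Map a read contiguously if possible; otherwise try to split it
    across an intron flanked by the canonical GT...AG dinucleotides."""
    pos = genome.find(read)
    if pos != -1:
        return ("contiguous", pos, None)
    # Try every split point that leaves at least min_anchor bases per side.
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        left_pos = genome.find(left)
        if left_pos == -1:
            continue
        donor = left_pos + cut  # first intron base if the split is real
        if genome[donor:donor + 2] != "GT":
            continue
        right_pos = genome.find(right, donor + 2)
        if right_pos != -1 and genome[right_pos - 2:right_pos] == "AG":
            return ("spliced", left_pos, right_pos)
    return ("unmapped", None, None)


# Toy gene model (hypothetical sequences): exon, intron, exon.
exon1 = "CCGAAAATCAAG"
intron = "GTAAGTAACCATATTACATTTGTAG"
exon2 = "ACTCCCACCAAA"
genome = exon1 + intron + exon2

# A read taken from the spliced transcript, covering the junction.
read = exon1[-6:] + exon2[:6]
result = map_junction_read(read, genome)
print(result)
```

The returned tuple gives the genomic start of each half of the read; the gap between them corresponds to the intron.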

Figures 1 and 2. Graveley et al. (2011) Nature 471: 473-479

Discovery of new transcripts by RNA-seq in D. melanogaster

RNA-seq examples

Expression profile by RNA-seq of the D. melanogaster gene eve in different developmental stages

Quantification and determination of expression profiles

D. melanogaster RNA-seq data as shown in GBrowse (FlyBase)

RNA-seq examples
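
To compare expression between samples, read counts are usually normalized by transcript length and sequencing depth. The slides do not say which measure was used, so the sketch below assumes RPKM (reads per kilobase of transcript per million mapped reads), one widely used option. The stage names, read counts and transcript length are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical data: (reads on the gene, total mapped reads) per stage.
stages = {
    "early_embryo": (500, 20_000_000),
    "adult": (5, 25_000_000),
}
transcript_length_bp = 1500  # assumed length, for illustration only

profile = {
    stage: rpkm(count, transcript_length_bp, total)
    for stage, (count, total) in stages.items()
}
for stage, value in profile.items():
    print(f"{stage}: {value:.2f} RPKM")
```

Plotting such values across developmental stages gives an expression profile like the one described for eve.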

No dependence on an available genomic sequence

Detection of new transcripts

Single-nucleotide precision

Detection of splicing variants and alternative transcription starts and ends

Detection of SNPs in transcribed regions

Detection of allele-specific transcription

Accurate quantification of expression levels (wide range of measurements)

Great reproducibility

Small amount of initial RNA needed

RNA-seq advantages

Table 1. Wang et al. (2009) Nature Reviews Genetics 10: 57-63

RNA-seq advantages

CTGAATAAATCCA

Polyadenylation signal

Methionine TRANSLATION INITIATION

Regulatory elements

Promoters

ACTGATGTCCA

TATA

TRANSCRIPTION START SITE

TRANSCRIPTION TERMINATION SITE

CCGATAAATCC STOP codon

TRANSLATION TERMINATION

5’ UTR 3’ UTR

ORF

DNA

mRNA AAAAAAAAA

polyA tail

Splicing

mRNAs
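
The landmarks on this slide (translation initiation at a methionine codon, a STOP codon, the polyadenylation signal) can be located by simple string searches. This is a rough sketch under stated assumptions: the toy transcript is hypothetical, assembled from short fragments like those shown on the slide, and `find_orf` and `find_polya_signals` are illustrative helpers, not part of any library.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(seq):
    """Return (start, end) of the first ATG-initiated open reading frame,
    with end exclusive and including the stop codon, or None."""
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame set by this ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return (start, i + 3)
        start = seq.find("ATG", start + 1)
    return None


def find_polya_signals(seq):
    """Return start positions of the AATAAA polyadenylation signal."""
    return [m.start() for m in re.finditer("AATAAA", seq)]


# Hypothetical transcript, written in the DNA alphabet.
five_utr = "ACTG"
orf = "ATGTCCAAACCGTAA"
three_utr = "ATCCCTGAATAAATCCA"
mrna = five_utr + orf + three_utr

print(find_orf(mrna))            # ORF coordinates
print(find_polya_signals(mrna))  # polyadenylation signal positions
```

Everything before the ORF start is the 5’ UTR and everything after its end is the 3’ UTR, where the polyadenylation signal lies.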

Figure 1. Nielsen and Graveley (2010) Nature 463: 457-463

Figure 1. Li et al. (2007) Nature Reviews Neuroscience 8: 819-831

Internal exons

Initial/final exons

Exon inclusion/skipping

Alternative 5’ splice site selection

Alternative 3’ splice site selection

Intron retention

Alternative splicing

Figure 8.22. Evolution. Barton et al. (2007) Cold Spring Harbor Laboratory Press

Alternative promoters

Exon inclusion/skipping

Alternative polyA sites

Alternative splicing example: α-tropomyosin

Alternative 3’ splice site selection

Figure 2. Nielsen and Graveley (2010) Nature 463: 457-463

Extreme alternative splicing examples

>500

38016

28

Number of isoforms
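
The figure of 38016 matches the isoform count usually quoted for the Drosophila Dscam gene, which has four clusters of mutually exclusive exons. The slide text does not name the gene, so that identification and the cluster sizes below come from general knowledge rather than from the slide. The sketch only shows how independent choices multiply.

```python
from math import prod

# Assumed sizes of the four mutually exclusive exon clusters of Dscam;
# one alternative exon from each cluster is included per transcript.
cluster_sizes = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

n_isoforms = prod(cluster_sizes.values())
print(n_isoforms)
```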

Prevalence of alternative splicing in Drosophila

7473 genes are alternatively spliced

60.7% out of 12295 expressed genes with multiple exons

Table 1. Graveley et al. (2011) Nature 471: 473-479

Figure 2. Wang et al. (2008) Nature 456: 470-476

92-94% of human genes show alternative splicing

86% of human genes generate two different transcripts in significant amounts (minor isoform frequency of 15%)

Many alternative isoforms are produced in different tissues as a result of a specific regulation

Prevalence of alternative splicing in humans
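
The 15% minor isoform frequency cutoff can be made concrete with a small calculation. This sketch assumes the simplest case of a splicing event with exactly two isoforms, and the read counts are hypothetical; the cited study's actual estimation procedure is not reproduced here.

```python
def minor_isoform_frequency(reads_a, reads_b):
    """Fraction of reads supporting the less abundant of two isoforms."""
    total = reads_a + reads_b
    if total == 0:
        raise ValueError("no reads support either isoform")
    return min(reads_a, reads_b) / total


def has_significant_minor_isoform(reads_a, reads_b, threshold=0.15):
    """True if the minor isoform reaches the frequency threshold."""
    return minor_isoform_frequency(reads_a, reads_b) >= threshold


# Hypothetical read counts for two events.
print(has_significant_minor_isoform(80, 20))  # minor isoform at 20%
print(has_significant_minor_isoform(95, 5))   # minor isoform at 5%
```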

Figure 1. Wang et al. (2008) Nature 456: 470-476

Tissue-regulated splicing variants in humans

Figure 2. Nielsen and Graveley (2010) Nature 463: 457-463

Not all possible isoforms exist

Regulation of alternative splicing

Figure 4. Graveley et al. (2011) Nature 471: 473-479

Developmentally regulated splicing variants in D. melanogaster

Genes tend to express many isoforms simultaneously

One isoform dominates in a given condition

¾ of the protein-coding genes have at least two different major isoforms

Variability of gene expression contributes more than variability in splicing ratios to the variability of transcript abundance across cell lines

Figure 4. Djebali et al. (2012) Nature 489: 101-108

Regulation of alternative splicing in humans

Unanswered questions

How many of the observed isoforms are functionally relevant?

Can alternative splicing account for the higher complexity of some organisms?

Table 2. Nielsen and Graveley (2010) Nature 463: 457-463

Type      Name                   Size          Transcripts   Function

Small non-coding RNAs

rRNAs     ribosomal RNAs         114-5000 nt   531           Component of ribosome

tRNAs     transfer RNAs          73-93 nt      624*          Translation

snRNAs    small nuclear RNAs     100-300 nt    1923          Splicing

snoRNAs   small nucleolar RNAs   60-300 nt     1529          RNA modification

miRNAs    micro RNAs             21-23 nt      3116          Gene expression regulation

Long non-coding RNAs

lncRNAs   long non-coding RNAs   >200 nt       21271         Regulation, imprinting…

* Number of transcripts from GENCODE v7 data

Unmarked counts: number of transcripts from GENCODE v14 data

Types of transcripts

rRNAs and tRNAs

rRNAs are transcribed as a polycistronic precursor that is modified and processed to generate the mature 18S, 5.8S and 28S rRNAs

rRNAs assemble with proteins to form the two subunits of the ribosome

tRNAs carry an amino acid to the protein synthetic machinery of a cell (ribosome) as directed by a three-nucleotide sequence (codon) in the mRNA

Essential components of the protein translation process

tRNAs

rRNAs
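
The codon-directed delivery of amino acids by tRNAs can be mimicked with a lookup table. This is a minimal sketch: `CODON_TABLE` is deliberately a small subset of the standard genetic code, the mRNA string is hypothetical, and codons missing from the subset are reported as "Xaa".

```python
# Partial codon table (subset of the standard genetic code).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna):
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "Xaa")
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


peptide = translate("GGAUGUUUGGCAAAUAAGC")
print("-".join(peptide))
```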

snRNAs snoRNAs

Dredge et al. (2001) Nature Reviews Neuroscience 2: 43-50

Eddy (2001) Nature Reviews Genetics 2: 919-929

snRNAs and snoRNAs

snRNAs: Part of the splicing machinery

snoRNAs: Guide chemical modifications of other RNAs

Figure 2. He and Hannon (2004) Nature Reviews Genetics 5: 522-531.

Small non-coding RNAs (21-23 nt) involved in the post-transcriptional regulation of gene expression by binding to the 3’ UTR of target mRNAs

Identified in the early 1990s, but recognized as a distinct class of regulators in the early 2000s

Detected in multiple species ranging from humans to mice, Drosophila, C. elegans or even plants (Arabidopsis)

Abundant in many cell types and may be involved in many different processes

Target around 60% of mammalian genes

microRNAs
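
Binding of a microRNA to a 3’ UTR is often approximated by searching the UTR for the reverse complement of the miRNA "seed" (nucleotides 2-8). The seed rule is general background, not something stated on the slide, and real target prediction uses more criteria. The let-7a sequence is the commonly listed mature sequence; the UTR is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Return the mRNA sequence that pairs with miRNA nucleotides 2-8."""
    seed = mirna[1:8]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_seed_matches(mirna, utr):
    """Return every start position of a seed match in the 3' UTR."""
    site = seed_site(mirna)
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"      # 22 nt mature miRNA
utr = "AAGCUACCUCAUUUGGCUACCUCAA"     # hypothetical 3' UTR fragment

print(seed_site(let7a))
print(find_seed_matches(let7a, utr))
```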

Rinn and Chang (2012) Annual Review of Biochemistry 81: 145–166

Genomic organization

Definition: Non-coding RNAs longer than 200 nucleotides

Long non-coding RNAs (lncRNAs)

Figure 5. Derrien et al. (2012) Genome Research 22: 1775-1789

Lower expression levels in all tissues compared to protein-coding genes

More tissue-specific expression patterns compared to mRNAs

Expression of long non-coding RNAs

Distribution of the number of Human Body Map tissues in which lncRNA and protein-coding transcripts are detected
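
The distribution described above comes from counting, for each transcript, the tissues in which it is detected. A minimal sketch follows; the expression values, tissue names, transcript names and the detection threshold are all hypothetical.

```python
def tissues_detected(expression, threshold=1.0):
    """Count, per transcript, the tissues with expression >= threshold."""
    return {
        transcript: sum(level >= threshold for level in levels.values())
        for transcript, levels in expression.items()
    }


# Hypothetical expression values (arbitrary units) in three tissues.
expression = {
    "lncRNA_1": {"brain": 3.2, "liver": 0.0, "testis": 0.4},
    "mRNA_1": {"brain": 25.0, "liver": 40.1, "testis": 18.7},
}

counts = tissues_detected(expression)
print(counts)
```

A transcript detected in few tissues is more tissue-specific than one detected in all of them.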

Currently 21,271 annotated transcripts transcribed from 12,933 loci in the human genome

Significantly more conserved than neutrally evolving sequences, but less conserved than protein-coding genes

Are lncRNAs functional?

Baker (2011) Nature Methods 8: 379–383

Byproduct

Guide

Scaffold

Long non-coding RNAs

Figure 2. Huarte and Rinn (2010) Hum. Mol. Genet. 19: R152-R161

Examples of long non-coding RNAs

lincRNA-p21 represses many genes, which results in apoptosis

GAS5 is induced under starvation and growth arrest. It competes with the glucocorticoid receptor for DNA binding sites, which results in reduced metabolism

A lncRNA transcribed from the promoter region of CCND1 (cyclin D1) is induced by DNA damage. It recruits the TLS protein to CCND1 and represses its expression, interrupting the cell cycle

Figure 3. Harrow et al. (2012) Genome Research 22: 1760-1774

≈10,000

≈3,000

≈29

≈175

Definition: Genes that have lost their coding ability

Types

Pseudogenes

863 pseudogenes are transcribed and associated with active chromatin in the human genome

Can pseudogenes have a function, or are they just the remains of inactivated genes?

PTENP1 pseudogene protects PTEN from miRNA silencing, and therefore has a tumor suppressive function

Figure 1. Poliseno et al. (2010) Nature 465: 1033-1038

Pseudogenes

Mouse

Data from Su et al. (2004) PNAS 101: 6062-6067

Human

http://biogps.org

Transcript profiling across tissues

Transcript profiling across individuals

Figure 1. Cheung and Spielman (2009) Nature Reviews Genetics 10: 595-604

Different expression levels of a given gene are detected in different individuals

Regulatory changes have unique properties that could make them especially important in phenotypic evolution

Reduced pleiotropic effects

Fine-tuning of gene function

Co-dominance and more efficient selection

Coding vs. Regulatory changes

Lactase production in adults shows large variability among human populations and seems to be related to pastoralism

In most mammals, the ability to digest milk disappears with age and is related to the production of the lactase enzyme

Figure 1. Itan et al. (2010) BMC Evolutionary Biology 10:36

Persistence of lactase expression

Figure 1. Tishkoff et al. (2007) Nature Genetics 39: 31 - 40

Persistence of lactase expression

Regulatory elements are difficult to predict:

Small (<50 bp)

Variable sequence motifs

Few nucleotide positions are really important

Poorly conserved and without well-defined locations

Regulatory elements:

Core promoter

Proximal elements

Distal enhancers (upstream / downstream)

Figure 1. Ong and Corces (2011) Nature Reviews Genetics 12: 283-293

Regulatory elements
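
Because regulatory motifs are short and tolerate variation at some positions, they are often written with IUPAC ambiguity codes and searched as patterns. The sketch below assumes the commonly cited TATA-box consensus TATAWAW (W = A or T); the promoter fragment is hypothetical and `scan` is an illustrative helper. It also shows why prediction is hard: such short patterns match by chance in many places.

```python
import re

# Subset of the IUPAC nucleotide codes, expressed as regex fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def scan(seq, motif):
    """Return (position, matched sequence) for every motif occurrence,
    including overlapping ones."""
    body = "".join(IUPAC[base] for base in motif)
    pattern = re.compile("(?=(" + body + "))")  # lookahead allows overlaps
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


promoter = "GGCGCTATAAAAGGCTGCTATATATGC"  # hypothetical sequence
hits = scan(promoter, "TATAWAW")
print(hits)
```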

ChIP-seq

Figure 1. Massie and Mills (2008) EMBO reports 9: 337-343

Figure 2. Park (2009) Nature Reviews Genetics 10: 669-680

Chromatin immunoprecipitation (ChIP) + Sequencing

Detection of transcription factor binding sites and other DNA-protein interactions
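
After sequencing, binding sites appear as regions where many reads pile up. The following sketch illustrates only that pile-up idea with a fixed depth threshold; real peak callers also model background signal with a control sample. Read coordinates and the threshold are hypothetical.

```python
def coverage(reads, length):
    """Per-base read depth for reads given as (start, end), end exclusive."""
    depth = [0] * length
    for start, end in reads:
        for i in range(max(start, 0), min(end, length)):
            depth[i] += 1
    return depth


def enriched_regions(depth, min_depth):
    """Return (start, end) intervals where depth >= min_depth."""
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = i
        elif d < min_depth and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


# Hypothetical mapped reads on a 30 bp region.
reads = [(2, 8), (4, 10), (5, 11), (20, 26)]
depth = coverage(reads, 30)
peaks = enriched_regions(depth, min_depth=3)
print(peaks)
```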

PHASES

• Pilot phase (2003-2007) 1% of human genome (44 regions, a total of ≈30 Mb)

• Production phase (2007-) Whole genome

ENCyclopedia Of DNA Elements

International project funded by the National Human Genome Research Institute (NHGRI) with the goal of identifying all functional elements in the human genome.

ENCODE project

Figure 1. Ecker et al. (2012) Nature 489: 52-55

ENCODE project – Functional elements

Maher (2012) Nature 489: 46-48

1,640 genome-wide data sets prepared from 147 cell types

ENCODE project data

A total of 62.1% and 74.7% of the human genome is covered by either processed or primary transcripts, respectively

No cell line expresses more than 56.7% of the union of the expressed transcriptomes across all cell lines

A large number of previously unknown transcription start sites and new transcript isoforms have been identified

Thousands of new non-coding transcripts have been detected (22,531 long non-coding RNAs)

An initial set of 399,124 regions with enhancer-like features and 70,292 regions with promoter-like features have been described

80% of the genome has been annotated with potentially functional elements

ENCODE project main results
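
The statement about the union of expressed transcriptomes can be read as a set calculation: pool the transcripts expressed in any cell line, then ask what fraction of that pool each line expresses. A minimal sketch with hypothetical cell lines and transcript identifiers:

```python
def fraction_of_union(expressed_by_line):
    """For each cell line, the fraction of the union of all expressed
    transcripts that this line expresses."""
    union = set().union(*expressed_by_line.values())
    return {
        line: len(transcripts) / len(union)
        for line, transcripts in expressed_by_line.items()
    }


# Hypothetical expressed transcript sets for three cell lines.
expressed_by_line = {
    "line_A": {"t1", "t2", "t3"},
    "line_B": {"t3", "t4"},
    "line_C": {"t5"},
}

fractions = fraction_of_union(expressed_by_line)
print(fractions)
print(max(fractions.values()))  # the ENCODE analogue of this value is 56.7%
```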

http://genome.ucsc.edu/

ENCODE project data
