RNA Sequencing - Netherlands Bioinformatics Centre

RNA Sequencing

05-06-2013, Elio Schijlen

Next gen insight into transcriptomes

Transcriptome

The complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological condition. Understanding the transcriptome is essential for interpreting the functional elements of the genome.

The key aims of transcriptomics are:

• to catalogue all species of transcripts, including mRNAs, non-coding RNAs and small RNAs;

• to determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications;

• to quantify the changing expression levels of each transcript during development and under different conditions.

Recently, the development of novel high-throughput DNA sequencing methods has provided a new method for determining, mapping and quantifying transcriptomes. This method, termed RNA-Seq (RNA sequencing), has clear advantages over previous approaches and is revolutionizing the manner in which eukaryotic transcriptomes are analysed.

454

Illumina HiSeq2000

PacBio RS

Ion Proton

SOLiD 5500

From next to 3rd generation sequencing

Illumina HiSeq: fluorescent nucleotide scanning

SOLiD: ligation of fluorescent oligos

454: pyrosequencing

Ion Proton: hydrogen ion detection

PacBio: real-time fluorescent detection

From next to 3rd generation sequencing

Illumina HiSeq ssDNA sequence template

• Clonally amplified into clusters on a glass slide (flow cell)

SOLiD 5500 idem

454 ssDNA sequence template

• Clonally amplified on beads (emPCR)

Ion proton idem

Pacbio dsDNA sequence template

• Single molecule/polymerase molecule complex

Illumina HiSeq2000

[Figure: HiSeq2000 instrument, with labels: syringe pumps, reagents compartment, optics, flow cell access door; flow cell with 8 channels]

Illumina HiSeq2000

[Figure: Illumina workflow: library preparation from DNA (0.1-5.0 μg), cluster growth on a single molecule array, sequencing, image acquisition and base calling]

EUsol BACs: 177.14 M PF clusters; 33.8 Gb > Q30

Lane  Sample ID  Sample Ref       Index         Yield (Mbases)  % PF   # Reads     % of raw clusters per lane
1     lane1      unknown          Undetermined  3,234           87.47  36,608,108   9.74
1     plate10    EUsol_fill_gaps  TAGCTT        3,359           94.77  35,088,534   9.34
1     plate1     EUsol_fill_gaps  ATCACG        4,150           95.35  43,091,246  11.47
1     plate2     EUsol_fill_gaps  CGATGT        3,480           95.66  36,020,422   9.59
1     plate3     EUsol_fill_gaps  TTAGGC        3,496           95.27  36,331,200   9.67
1     plate4     EUsol_fill_gaps  TGACCA        4,674           95.4   48,508,022  12.91
1     plate5     EUsol_fill_gaps  ACAGTG        2,305           93.65  24,365,574   6.49
1     plate6     EUsol_fill_gaps  GCCAAT        1,895           94.83  19,783,144   5.27
1     plate7     EUsol_fill_gaps  CAGATC        3,366           94.9   35,115,836   9.35
1     plate8     EUsol_fill_gaps  ACTTGA        2,592           95.29  26,934,126   7.17
1     plate9     EUsol_fill_gaps  GATCAG        3,232           94.59  33,829,830   9.01

Description for lane1 (Undetermined): clusters with unmatched barcodes for lane 1.
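The last column of this run summary can be recomputed from the read counts. The sketch below is only an illustration: the dictionary keys are the sample IDs from the table (with "Undetermined" standing for the unmatched-barcode row), and the variable names are made up for this example.

```python
# Read counts per sample, copied from the lane 1 summary above.
reads_per_sample = {
    "Undetermined": 36_608_108,
    "plate10": 35_088_534,
    "plate1": 43_091_246,
    "plate2": 36_020_422,
    "plate3": 36_331_200,
    "plate4": 48_508_022,
    "plate5": 24_365_574,
    "plate6": 19_783_144,
    "plate7": 35_115_836,
    "plate8": 26_934_126,
    "plate9": 33_829_830,
}

total_reads = sum(reads_per_sample.values())

# Share of the lane taken by each sample, in percent.
share_percent = {
    sample: round(100 * n / total_reads, 2)
    for sample, n in reads_per_sample.items()
}

for sample, pct in share_percent.items():
    print(f"{sample:<13} {pct:6.2f} %")
```

A spread like this (roughly 5 % to 13 % per barcode) shows how evenly the libraries were pooled; the unmatched-barcode share indicates how many clusters could not be assigned to any sample.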

SOLiD 5500

454 sequencing technology & workflow

NGS - 454 pyrosequencing raw read

GCTAAG

Ion semiconductor sequencing

Ion Torrent PGM & Proton

3rd Gen Sequencing: PacBio

SMRT sequencing

kb read length

<50,000 reads

<100 Mb

Pacbio sequencing

Phospholinked

Cleavage by DNA polymerase

• Fluorophore clipped off by polymerase

• DNA synthesized is natural

• No steric hindrance or accumulation of background signal

ZMW: Zero Mode Waveguide

Sequence read length (raw), quality

Illumina HiSeq: fixed 50 or 100 nt, single-read (SR) and paired-end (PE)

SOLiD 5500: fixed 75 nt

454: range 50-1,000 nt (average ~750)

Ion Torrent: range 50-200 nt (average ~170)

PacBio: range 50-20,000 nt (average ~3-4 kb)

Sequence read quality

Illumina HiSeq: HQ reads, systematic errors

• Lower quality 3′ ends

• Low GC coverage

SOLiD: very HQ reads

• Lower quality 3′ ends

454: HQ reads, systematic errors

• Homopolymer problems

• Clonality

• Lower quality 3′ ends

Ion Torrent: idem, but lower overall quality

PacBio: low quality (0.8-0.85)

• Random errors

• No decrease in read quality at the 3′ end
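Quality figures in this deck appear in two forms: a Phred score (the "> Q30" on the run summary) and a fraction (0.8-0.85 for PacBio). Assuming the PacBio figure is a per-base accuracy, which the slide does not state explicitly, the two can be put on the same scale with the standard Phred definition Q = -10 · log10(P_error). The function names below are illustrative.

```python
import math


def error_to_phred(p_error):
    """Phred quality score for a given per-base error probability."""
    return -10 * math.log10(p_error)


def phred_to_error(q):
    """Per-base error probability for a given Phred quality score."""
    return 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Phred score for a per-base accuracy such as 0.85."""
    return error_to_phred(1 - accuracy)


# Q30 corresponds to 1 error in 1,000 bases.
print(phred_to_error(30))
# Assumed per-base accuracies of 0.80 and 0.85 correspond to roughly Q7 and Q8.
print(round(accuracy_to_phred(0.80), 1), round(accuracy_to_phred(0.85), 1))
```

On this scale the gap between platforms is large: Q30 means one error per 1,000 bases, whereas an accuracy of 0.85 means roughly one error per 7 bases.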

Sequence reads & throughput/run

Illumina HiSeq: 1.5E+09 reads per full flow cell, 12 days/run

• Up to 550 Gb (2 flow cells)

SOLiD 5500XL: 1.5E+09 reads per full flow chip, 6 days/run

• Up to 240 Gb (2 flow chips)

454: 1E+06 reads per full PTP, 1 day/run

• Up to 1 Gb

Ion Torrent: 60-80E+06 reads per Ion PI chip, 4 hours/run

• Up to 10 Gb

PacBio: 300,000 reads (8-cell strip), 1 day/run

• Up to 0.75 Gb
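As a rough cross-check, the yield of a run is approximately the number of reads times the read length. The sketch below combines the read counts on this slide with the average read lengths from the read-length slide. These are nominal values, so the estimates are expected to agree with the "up to" figures only in order of magnitude. The function name is made up for this example.

```python
def estimated_yield_gb(n_reads, read_length_nt, reads_per_fragment=1):
    """Back-of-the-envelope yield in Gb: reads x length (x 2 for paired-end)."""
    return n_reads * read_length_nt * reads_per_fragment / 1e9


# 454: about 1E+06 reads of ~750 nt on average
print("454:   ", estimated_yield_gb(1_000_000, 750), "Gb")

# PacBio: about 300,000 reads of ~3.5 kb on average
print("PacBio:", estimated_yield_gb(300_000, 3_500), "Gb")

# Illumina HiSeq: 1.5E+09 fragments per flow cell, 100 nt paired-end
print("HiSeq: ", estimated_yield_gb(1_500_000_000, 100, reads_per_fragment=2),
      "Gb per flow cell")
```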

Transcript coverage

DNA Samples for sequencing

Sample types: genomic DNA, mRNA, small RNA; other applications such as ChIP-sequencing (active chromatin)

Library preparation: ligate adapters to both ends of the fragmented nucleic acid

RNA input requirements

RNA: DNA-free, RNase-free, non-degraded, no contaminants (proteins, polysaccharides)

Protocol variations

Fragmentation methods

• RNA: nebulization, hydrolysis

• cDNA: sonication, DNase I treatment

Depletion of highly abundant transcripts

• Positive selection of mRNA: poly(A) selection or target-specific

• Negative selection: RiboMinus, RNase H

Strand specificity

• Most RNA sequencing is not strand-specific

Single-end or paired-end sequencing

(Illumina) RNA seq workflow

Aligning the millions of reads to a "reference genome"

Many tools are available for aligning genomic reads to a reference genome (sequence alignment tools). However, special attention is needed when aligning a transcriptome to a genome, mainly when dealing with genes that contain introns. As discussed above, the sequence libraries are created by extracting mRNA via its poly(A) tail, which is added to the mRNA molecule post-transcriptionally, so splicing has already taken place. The resulting library and short reads therefore do not contain intronic sequences. When these short reads are aligned to a reference genome with a standard aligner, only reads lying entirely inside an exon will be matched, while reads spanning an exon-exon junction will not. Several software packages exist for short read alignment, and recently specialized tools for transcriptome data have been developed, e.g. TopHat (spliced read alignment) and Cufflinks (transcript assembly and quantification).
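The junction problem can be illustrated with a toy example. This is a minimal sketch using made-up sequences and exact string matching; it is not how TopHat or any real aligner is implemented. Reads are taken from a spliced mRNA and searched in a genome that still contains the intron. Reads that fail are then retried as two pieces separated by a gap that starts with GT and ends with AG (the canonical splice signals, used here as an assumed filter).

```python
# Toy gene: two exons separated by one intron (all sequences are made up).
EXON1 = "ATGGCTAGTTCACGATTGCC"
INTRON = "GTAAGTTCTAATGTTTTCAG"   # starts with GT, ends with AG
EXON2 = "CCTGAAGATCGTGACTTAGA"

GENOME = EXON1 + INTRON + EXON2   # what the reference contains
MRNA = EXON1 + EXON2              # what is sequenced after splicing
READ_LEN = 10


def occurrences(pattern, text):
    """Yield every start position of pattern in text (overlaps allowed)."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def spliced_candidates(read, genome):
    """Return half-open (intron_start, intron_end) pairs that explain the read."""
    introns = set()
    for split in range(1, len(read)):
        left, right = read[:split], read[split:]
        for left_pos in occurrences(left, genome):
            donor = left_pos + split              # first base after the left part
            if genome[donor:donor + 2] != "GT":
                continue
            for right_pos in occurrences(right, genome):
                # The right part must lie downstream, directly after an AG.
                if right_pos - donor >= 4 and genome[right_pos - 2:right_pos] == "AG":
                    introns.add((donor, right_pos))
    return introns


# All overlapping reads of READ_LEN from the spliced transcript.
reads = [MRNA[i:i + READ_LEN] for i in range(len(MRNA) - READ_LEN + 1)]

# Contiguous (unspliced) alignment: exact substring search in the genome.
unmapped = [r for r in reads if GENOME.find(r) == -1]

# Second pass: allow one GT..AG gap inside the read.
rescued = [r for r in unmapped if spliced_candidates(r, GENOME)]

print(len(reads), "reads;", len(unmapped), "fail contiguous alignment;",
      len(rescued), "are explained by a GT..AG split")
```

With these sequences, the reads that fail contiguous alignment are exactly those that overlap the exon-exon junction, and the split search places the gap at the intron (positions 20 to 40 of the toy genome). Real data additionally require mismatch tolerance, minimum anchor lengths and scoring of competing candidate junctions.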

Sequence coverage

A. thaliana: approx. 60E+06 mapped reads result in a plateau of unique gene models expressed (approx. 20,000)
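A plateau like this is usually found by counting how many distinct gene models are detected at increasing sequencing depth. The sketch below shows the idea on a tiny made-up list of read-to-gene assignments; the function name and the data are hypothetical. In practice the reads would be randomly subsampled rather than taken in file order.

```python
def genes_detected(read_gene_ids, depth):
    """Number of distinct genes hit by the first `depth` reads."""
    return len(set(read_gene_ids[:depth]))


# Made-up example: the gene each mapped read was assigned to.
read_gene_ids = ["g1", "g1", "g2", "g1", "g3", "g2", "g1", "g1", "g3", "g2"]

depths = [2, 4, 6, 8, 10]
curve = [(d, genes_detected(read_gene_ids, d)) for d in depths]

for depth, n_genes in curve:
    print(f"{depth:>3} reads -> {n_genes} genes detected")
```

Once additional reads stop revealing new genes, the curve flattens; further depth then mainly improves quantification of genes that have already been detected.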

Multi-mapped 50 nt SR reads (A. thaliana ~5%) can cause inaccurate expression estimates

Tubulin B chain

• Reads mapped to reference genome (gray)

• Blue lines: intron-spanning reads

• Histograms: read coverage

• Blue: contribution of multi-mapped reads

• Green: contribution of uniquely mapped reads

Including multi-mapped reads artificially increases the expression value.

Read mapping of 2 genes sharing a genome region by their 3′ ends on opposite strands

Multi-mapped reads derived from the + strand would severely overestimate expression of the minus (-) strand gene.
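The effect of multi-mapped reads on counts can be shown with a small made-up example. The read names, gene names and the `count_reads` helper are hypothetical. The "fractional" mode, which splits a read equally over its hits, is one common compromise and is not taken from the slides.

```python
from collections import Counter

# Made-up alignments: read name -> genes the read maps to.
alignments = {
    "r1": ["geneA"],
    "r2": ["geneA"],
    "r3": ["geneA", "geneB"],   # multi-mapped
    "r4": ["geneA", "geneB"],   # multi-mapped
    "r5": ["geneB"],
}


def count_reads(alignments, mode="unique"):
    """Count reads per gene.

    mode="unique":     ignore multi-mapped reads
    mode="all":        count a multi-mapped read once for every gene it hits
    mode="fractional": split a multi-mapped read equally over its genes
    """
    counts = Counter()
    for genes in alignments.values():
        if len(genes) == 1:
            counts[genes[0]] += 1
        elif mode == "all":
            for gene in genes:
                counts[gene] += 1
        elif mode == "fractional":
            for gene in genes:
                counts[gene] += 1 / len(genes)
    return dict(counts)


for mode in ("unique", "all", "fractional"):
    print(mode, count_reads(alignments, mode))
```

Here geneB has a single uniquely mapped read, but counting every hit triples its apparent expression. This is the kind of inflation described above.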

Ekblom et al., 2012 Comparative and Functional Genomics doi:10.1155/2012/281693

Wenger and Galliot BMC Genomics 2013, 14:204 doi:10.1186/1471-2164-14-204

Some considerations

The information gathered by RNA-Seq has limitations similar to those of other RNA expression analysis pipelines.

RNA status dependent

• Biological variable: tissue specific; time dependent. Triplicates!

• During a cell's lifetime and context, its gene expression levels change.

• Strongly RNA quality dependent

Library prep method dependent

Sequencing technology dependent

Analysis method dependent

Because of this, care must be taken when drawing conclusions from the sequencing experiment. Results must be verified using independent technology.