rnaseq short intro - Göteborgs universitet · bio.lundberg.gu.se/courses/vt13/rnaseq_intro.pdf

RNA-Seq practical

Basic processing: UNIX tools and IGV

Erik Larsson

RNA-Seq practical

•  TopHat
    – Alignment
•  IGV
    – Visualization
•  Cufflinks
    – Gene discovery
    – Find differentially expressed genes

[Figure: human genomic DNA sequence shown as a backdrop, with three labels: <3% coding sequence, ~40% coding genes, ~60% transcribed]

The human transcriptome (according to GENCODE v11)

•  19,999 protein-coding
•  12,534 pseudogene
•  10,419 lncRNA
•  1,944 snRNA
•  1,756 microRNA
•  1,521 snoRNA
•  1,190 misc. RNA

Shahrouki, Larsson, Frontiers in Genetics 2012
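The counts above can be turned into shares of the total with a few lines of Python. This is only a sketch to put the numbers in proportion: the dictionary keys and the function name are illustrative choices, and the counts are copied from the slide.

```python
# Counts per biotype as listed on the slide (GENCODE v11).
GENCODE_V11_COUNTS = {
    "protein-coding": 19999,
    "pseudogene": 12534,
    "lncRNA": 10419,
    "snRNA": 1944,
    "microRNA": 1756,
    "snoRNA": 1521,
    "misc. RNA": 1190,
}


def biotype_fractions(counts):
    """Return each biotype's share of the summed counts."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_entries = sum(GENCODE_V11_COUNTS.values())
print(f"Total annotated entries: {total_entries:,}")
for name, fraction in biotype_fractions(GENCODE_V11_COUNTS).items():
    print(f"{name:15s} {fraction:6.1%}")
```

Protein-coding entries come out at roughly 40% of the total, so most of the annotated entries in this list are non-coding.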

RNA-seq, RNA sequencing, transcriptome sequencing, total RNA-seq, mRNA-seq, miRNA-seq…

•  Many names, sometimes meaning the same thing
•  All are about characterizing RNA with next-generation sequencing (NGS) in one way or another

Microarrays vs. RNA-seq

Microarrays:
•  Simultaneously quantify most known genes

RNA-seq:
•  Simultaneously quantify all known genes at high accuracy
•  Identify new genes
•  Study splicing patterns
•  Discover mutations
•  Fusion transcripts
•  Find viruses
•  Allele-specific expression
•  …

New toys

•  Applied Biosystems 3730 (2002): 50,000-100,000 bp per run
•  Illumina HiSeq 2000 (2010): ~200,000,000,000 bp per run
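To make the jump in scale concrete, the two per-run figures can be divided directly. A minimal sketch, using only the numbers quoted on the slide; the constant and function names are illustrative.

```python
# Per-run yields as quoted on the slide.
ABI_3730_BP_PER_RUN = (50_000, 100_000)   # Applied Biosystems 3730 (2002), low and high
HISEQ_2000_BP_PER_RUN = 200_000_000_000   # Illumina HiSeq 2000 (2010), approximate


def fold_increase(new_yield, old_yield):
    """How many times more bases per run the newer instrument produces."""
    return new_yield // old_yield


low_estimate = fold_increase(HISEQ_2000_BP_PER_RUN, ABI_3730_BP_PER_RUN[1])
high_estimate = fold_increase(HISEQ_2000_BP_PER_RUN, ABI_3730_BP_PER_RUN[0])
print(f"Roughly {low_estimate:,}x to {high_estimate:,}x more bases per run")
```

On these figures a single HiSeq 2000 run yields about two to four million times as many bases as a 3730 run.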

NGS principle (Illumina/Solexa)

•  Add labeled nucleotides, primers, polymerase
•  Take picture to figure out first base in each cluster
•  Remove terminators and repeat everything many times

Standard RNA-seq workflow (polyA+)

•  Isolate polyA+
•  Fragmentation
•  Add random primers
•  cDNA synthesis (first and second strand)
•  Ligate adapters
•  Sequencing

Source: Illumina

Directional/strand-specific RNA-seq: dUTP method

[Figure: RNA → dsDNA with a uracil-marked strand (U U U U U) → adapters → UNG treatment]

Levin et al, Nature Methods 2010

RNA-seq data analysis

•  Alignment
•  Gene discovery
•  Expression quantification
•  Testing for differential expression
•  Variant discovery

Pairwise alignment

•  Figure out where one sequence belongs within another sequence

•  Trivial if not for substitutions, insertions, deletions

Genome: TGCGTACGCTCGATAGCTCGCATCGCTAGCCTCGCATAGCTAGCGATCGT
                         ||||||||||||||||||| |||||||
RNA:                     TCGCATCGCTAGCCTCGCAGAGCTAGC
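The example on this slide can be reproduced with a brute-force search: slide the RNA sequence along the genome and count mismatches at every offset. The sketch below handles substitutions only (no insertions or deletions) and is not how real aligners work internally; the function names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_ungapped_hit(genome, read):
    """Try every offset and return (offset, mismatches) of the best placement."""
    best = None
    for offset in range(len(genome) - len(read) + 1):
        mismatches = hamming(genome[offset:offset + len(read)], read)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


# Sequences copied from the slide.
GENOME = "TGCGTACGCTCGATAGCTCGCATCGCTAGCCTCGCATAGCTAGCGATCGT"
RNA = "TCGCATCGCTAGCCTCGCAGAGCTAGC"

offset, mismatches = best_ungapped_hit(GENOME, RNA)
print(f"Best offset: {offset} (0-based), mismatches: {mismatches}")
```

For these two sequences the best placement starts at 0-based offset 17 with a single substitution, matching the one gap in the row of bars.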

Aligning RNA-seq reads

•  Why? Figure out from where the reads were transcribed
•  Required prior to most analyses

Two main options:
•  Align to transcriptome
    – Fast, simple
    – Avoids problems with “spliced”/junction-spanning reads
•  Align to genome
    – Requires specialized RNA-seq aligner (can handle junction-spanning reads)

Gapped alignments

•  Aligners for RNA-seq will need to handle gapped alignments

•  Junction-spanning reads will otherwise be lost

[Figure: the genome, the spliced mRNA with its poly(A) tail (AAA), and NGS reads, some of which span exon-exon junctions]
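A toy example shows why a junction-spanning read is lost without gapped alignment. The sequences below are invented for illustration, and the search is a simplification that uses exact matching and requires the canonical GT...AG intron boundaries; it is a sketch of the idea, not the method of any particular aligner.

```python
def find_all(text, pattern, start=0):
    """Yield every start position of pattern in text, from 'start' onward."""
    pos = text.find(pattern, start)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def spliced_hit(genome, read, min_anchor=3):
    """Try each split of the read; return (read_start, intron_start, intron_end).

    The left part must match exactly, the right part must match further
    downstream, and the skipped stretch must start with GT and end with AG.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        for left_pos in find_all(genome, left):
            donor = left_pos + split
            for right_pos in find_all(genome, right, donor + 2):
                starts_gt = genome[donor:donor + 2] == "GT"
                ends_ag = genome[right_pos - 2:right_pos] == "AG"
                if starts_gt and ends_ag:
                    return left_pos, donor, right_pos
    return None


# Hypothetical toy gene: two exons separated by one intron.
EXON1 = "ATGGCCAAGTTGACC"
INTRON = "GTAAGTCTTCTCACAG"
EXON2 = "GGTCCTGAACTGTAA"
GENOME = EXON1 + INTRON + EXON2
MRNA = EXON1 + EXON2
READ = MRNA[8:22]  # 7 bases from exon 1 followed by 7 bases from exon 2

print("Contiguous match in genome:", READ in GENOME)
print("Spliced placement:", spliced_hit(GENOME, READ))
```

The read has no contiguous match in the genome, but once a gap is allowed it is placed at position 8 with the intron at genome[15:31].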

Splice-junction aware aligners

•  TopHat
    – Popular option, big online user community
    – Finds new junctions but can be guided by known annotation
    – Cuts up reads into smaller pieces and calls the Bowtie short-read aligner
•  SOAPsplice
•  SpliceMap
•  …
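The idea of cutting reads into smaller pieces can be sketched as follows. This is a heavily simplified illustration, not TopHat's actual algorithm: Python's `str.find` stands in for a short-read aligner such as Bowtie, the segment length of 8 and all sequences are made up, and only exact matches are considered.

```python
def split_into_segments(read, segment_length):
    """Cut a read into consecutive pieces of segment_length bases."""
    return [read[i:i + segment_length] for i in range(0, len(read), segment_length)]


def map_segments(genome, read, segment_length):
    """Map each segment independently; return (read_offset, genome_pos) pairs.

    genome_pos is -1 when the segment has no exact match.
    """
    segments = split_into_segments(read, segment_length)
    return [(i * segment_length, genome.find(seg)) for i, seg in enumerate(segments)]


def implied_gaps(hits):
    """Compare neighbouring mapped segments.

    If two segments lie further apart in the genome than in the read,
    the extra distance suggests an intron between them.
    """
    mapped = [(r, g) for r, g in hits if g != -1]
    gaps = []
    for (r1, g1), (r2, g2) in zip(mapped, mapped[1:]):
        extra = (g2 - g1) - (r2 - r1)
        if extra > 0:
            gaps.append((r1, r2, extra))
    return gaps


# Hypothetical toy gene with a 20-base intron.
EXON1 = "CATGGCTAAGCTTGAC"
INTRON = "GTAAGTATCTCTTTCTACAG"
EXON2 = "GATCCGTTAACGGTCA"
GENOME = EXON1 + INTRON + EXON2
READ = EXON1[-12:] + EXON2[:12]  # junction falls inside the middle segment

hits = map_segments(GENOME, READ, segment_length=8)
print("Segment hits:", hits)
print("Implied gaps:", implied_gaps(hits))
```

Here the middle segment straddles the junction and fails to map, while its two neighbours map 20 bases further apart than they sit in the read, which is exactly the intron length.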

TopHat output visualized using IGV (human ACTB locus)

RNA-seq data analysis

•  Alignment
•  Gene discovery
•  Expression quantification
•  Testing for differential expression
•  Variant discovery

Transcriptome assembly/gene discovery

•  Task:
    – Use aligned reads to discover genes and figure out transcript structures
•  Tools:
    – Cufflinks
        •  Most popular choice
        •  Lots of online support, actively developed
    – Scripture
    – Trans-ABySS

Cufflinks discovers new transcripts/genes from aligned reads

[Figure: aligned reads → discovered transcript isoforms → abundance estimates]

RNA-seq data analysis

•  Alignment
•  Gene discovery
•  Expression quantification
•  Testing for differential expression
•  Variant discovery

Testing for differential expression

•  Normal t-test not optimal
    – RNA-seq data are “digital” (read counts) rather than continuous
•  Negative binomial distribution is better
    – edgeR, DESeq
        •  Run in the R environment
    – Cuffdiff (Cufflinks package)
        •  + Easy: uses alignments without prior quantification
        •  + Can test for differential splicing
        •  - Very conservative
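One way to see why the negative binomial suits count data is to compare its variance with that of a Poisson model at the same mean. The sketch below uses the common parameterization variance = mean + dispersion × mean²; the mean of 100 reads and the dispersion of 0.1 are made-up illustrative values, not figures from the slides or from any dataset.

```python
import math


def poisson_variance(mean):
    """Under a Poisson model the variance equals the mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial variance: mean + dispersion * mean**2.

    A dispersion of 0 reduces this to the Poisson case.
    """
    return mean + dispersion * mean ** 2


def coefficient_of_variation(mean, variance):
    """Standard deviation relative to the mean."""
    return math.sqrt(variance) / mean


mean_count = 100   # hypothetical average read count for one gene
dispersion = 0.1   # hypothetical dispersion value

for label, variance in [
    ("Poisson", poisson_variance(mean_count)),
    ("Negative binomial", negative_binomial_variance(mean_count, dispersion)),
]:
    cv = coefficient_of_variation(mean_count, variance)
    print(f"{label:18s} variance = {variance:8.1f}  CV = {cv:.2f}")
```

With these values the negative binomial allows eleven times the Poisson variance, which leaves room for variation between samples that exceeds pure counting noise.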

http://bio.lundberg.gu.se/courses/vt13/rnaseq.html

Read the intro carefully.

Good luck!