![Page 1: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/1.jpg)
Jason WilliamsCold Spring Harbor Laboratory, DNA Learning Center
[email protected]@JasonWilliamsNY
DNALC Live
Intro to RNA-Seq with JupyterPart II
![Page 2: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/2.jpg)
DNALC LiveThis is an experiment, give us feedback
on what you would like to see!
![Page 3: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/3.jpg)
dnalc.cshl.edu
DNALC Website and Social Media
dnalc.cshl.edu/dnalc-live
![Page 4: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/4.jpg)
youtube.com/DNALearningCenter
DNALC Website and Social Media
facebook.com/cshldnalc
@dnalc
@dna_learning_center
![Page 5: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/5.jpg)
Who is this course for?
• Audience(s): • Undergraduate biology 200 level and up • (advanced AP Bio/graduate)
• Format: 2 sessions (1 per week); ~ 45 minutes each
• Exercises: Follow along through CyVerse
• Learning resources: Slides and online lesson available
![Page 6: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/6.jpg)
Course Learning Goals• Understand the rationale of an RNA-Seq experiment and its design
• Learn about the Linux command line
• Use Jupyter (SRA Toolkit) to import sequence data
• Use Jupyter (FastQC/Trimmomatic) to quality check/trim sequence data
• Use Jupyter (Kallisto) to (pseudo)align reads
• Use Jupyter (genomeview/UCSC) to explore RNA-Seq results
![Page 7: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/7.jpg)
Lab Setup• We will be using CyVerse VICE – You can get a free account at cyverse.org
(required)
![Page 8: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/8.jpg)
Intro to RNA-Seq with JupyterPart II
(alignment and visualization)
![Page 9: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/9.jpg)
Steps for today’s session
• Review data set and data cleaning
• Learn about alignment of reads
• See how to visualize data in a genome browser
![Page 10: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/10.jpg)
RNA-Seq with DNA Subwaydnalc.cshl.edu/dnalc-live
![Page 11: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/11.jpg)
Introduction to RNA-Seq
![Page 12: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/12.jpg)
What is RNA-Seq? - measuring the transcriptome
• To understand what genes are active, and under what circumstances, we must know what genes are being transcribed into messenger RNA
![Page 13: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/13.jpg)
What is RNA-Seq? - measuring the transcriptome
• To understand what genes are active, and under what circumstances, we must know what genes are being transcribed into messenger RNA
• A cell in the liver has the same DNA instructions as a neuron in the brain. However the genes being expressed differ greatly between these cells
![Page 14: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/14.jpg)
What is RNA-Seq? - measuring the transcriptome
• RNA-Seq allows us to measure the transcriptome – take an account of all transcription occurring in a cell/tissue
![Page 15: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/15.jpg)
What is RNA-Seq? - measuring the transcriptome
• RNA-Seq allows us to measure the transcriptome – take an account of all transcription occurring in a cell/tissue
• We use the abundance of an RNA transcript as a proxy for the activity of some cellular process (e.g. protein synthesis, regulatory activity)
![Page 16: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/16.jpg)
What is RNA-Seq? - measuring the transcriptome
• RNA-Seq allows us to measure the transcriptome – take an account of all transcription occurring in a cell/tissue
• We use the abundance of an RNA transcript as a proxy for the activity of some cellular process (e.g. protein synthesis, regulatory activity)
• We analyze these data to compare samples (e.g. cancerous vs. non-cancerous)
![Page 17: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/17.jpg)
What is RNA-Seq?
![Page 18: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/18.jpg)
Introduction to our data set
![Page 19: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/19.jpg)
An NSF Research Collaboration Network
![Page 20: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/20.jpg)
Leptin expression vs. diet – RNA-Seq pilot lesson
Colon tissue/tumors in mice raise on Regular (RD) or High-fat (HFD) diet
![Page 21: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/21.jpg)
Leptin expression vs. diet – RNA-Seq pilot lesson
Colon tissue/tumors in mice raise on Regular (RD) or High-fat (HFD) diet
![Page 22: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/22.jpg)
Sequence data from NCBI
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA353374
![Page 23: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/23.jpg)
Access lessons and sign in on CyVerse
![Page 24: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/24.jpg)
Phred scores…
Phred Score Error (bases miscalled) Accuracy10 1 in 10 90%20 1 in 100 99%30 1 in 1,000 99.9%40 1 in 10,000 99.99%50 1 in 100,000 99.999%
![Page 25: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/25.jpg)
Lab – Sequence alignment
![Page 26: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/26.jpg)
There are many roads to RNA-Seq
Photo credit:Conesa et al. Genome Biology (2016) 17:13DOI 10.1186/s13059-016-0881-8
![Page 27: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/27.jpg)
RNA-Seq with Kallisto
![Page 28: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/28.jpg)
RNA-Seq with Kallisto
Kallisto (pseudo)aligns reads to a reference transcriptome
1. An index is built of the reference transcriptome
2. Sequence reads are (pseudo)aligned to transcripts
![Page 29: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/29.jpg)
Reference transcriptome
A collection of “all” the transcripts in an organism
Ensembl tour: https://useast.ensembl.org/Homo_sapiens/Info/Index
![Page 30: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/30.jpg)
Kallisto – Pseudoalignment
Photo credit:http://mcb112.org/w02/w02-lecture.html
![Page 31: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/31.jpg)
Kallisto results
![Page 32: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/32.jpg)
Kallisto results
• target_id: Identifier for the transcript (from Ensembl)
![Page 33: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/33.jpg)
Kallisto results
![Page 34: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/34.jpg)
Kallisto results
• target_id: Identifier for the transcript (from Ensembl)
• length: length (nucleotides) of transcript exons
![Page 35: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/35.jpg)
Kallisto results
• target_id: Identifier for the transcript (from Ensembl)
• length: length (nucleotides) of transcript exons
• eff_length: length of transcript that was sampled*
*In the original sequencing library, we rarely sample whole entire transcripts, this number accounts for the fragment length of the library
![Page 36: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/36.jpg)
Kallisto results
• target_id: Identifier for the transcript (from Ensembl)
• length: length (nucleotides) of transcript exons
• eff_length: length of transcript that was sampled*
• est_counts: The estimated number of reads that have mapped to the transcript
*In the original sequencing library, we rarely sample whole entire transcripts, this number accounts for the fragment length of the library
![Page 37: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/37.jpg)
Normalization – gene length
Gene A Gene B
Which is longer (bp)?
![Page 38: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/38.jpg)
Normalization – gene length
Gene A(300bp)
Gene B (100bp)
Which has more reads?
![Page 39: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/39.jpg)
Normalization – gene lengthWhich has more reads?
Gene A(300bp)
Gene B (100bp)
![Page 40: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/40.jpg)
Normalization – gene length
1627Gene A(300bp)
Gene B (100bp)
![Page 41: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/41.jpg)
Normalization – gene length
Gene A(300bp)
27/300 = 0.09
Gene B (100bp)
16/100 = 0.16
1627
![Page 42: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/42.jpg)
Goal recap• Understand the rationale of an RNA-Seq experiment and its design
• Learn about the Linux command line
• Use Jupyter (SRA Toolkit) to import sequence data
• Use Jupyter (FastQC/Trimmomatic) to quality check/trim sequence data
• Use Jupyter (Kallisto) to (pseudo)align reads
• Use Jupyter (genomeview/UCSC) to explore RNA-Seq results
![Page 43: Intro to RNA-Seq with Jupyter Part II · • Understand the rationale of an RNA-Seq experiment and its design • Learn about the Linux command line • Use Jupyter (SRA Toolkit)](https://reader036.vdocument.in/reader036/viewer/2022071218/6050525761772b4f052c43d6/html5/thumbnails/43.jpg)
dnalc.cshl.edu
DNALC Website and Social Media
dnalc.cshl.edu/dnalc-live