Single-cell RNA-seq analysis

Winter School on Mathematical and Computational Biology 2019, UQ, 2 July 2019

Dr Joshua W. K. Ho, Associate Professor, School of Biomedical Sciences, The University of Hong Kong

Dr Kitty Lo, Dr Pengyi Yang, Prof Jean Yang, School of Mathematics and Statistics, University of Sydney, Sydney, Australia




Group minions based on their similarity of physical appearance: clustering

Identifying distinguishing features between different groups of minions: differential expression analysis


Example – diverse cell types in the mouse nervous system

Zeisel et al. (2018), Cell

Exponential growth in single-cell RNA-seq technologies

Svensson et al., Nature Protocols (2018)

Droplet-based technologies are now dominating

Macosko et al. (2015), Cell

10X Genomics is a commercial provider of a droplet-based scRNA-seq platform


scRNAseq experiments approaching 1 million cells

Saunders et al. (2018), Cell

690,000 individual cells from 9 regions of the adult mouse brain


Number of scRNAseq tools also increasing rapidly

Downloaded from www.scrna-tools.org


Steps in scRNA-seq analysis

Zappia et al. (2018)

Software
•  Cell Ranger for 10X Genomics data
   •  https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome
•  Seurat (all-purpose single-cell R package)
   •  https://satijalab.org/seurat/
•  Scanpy (a Python package)
   •  https://scanpy.readthedocs.io/
•  Follow their online tutorials; they are easy to use

Batch effect removal

•  Seurat (all-purpose single-cell R package) for very basic normalization
•  Batch effect correction
   •  mnnCorrect
   •  ZINB-WaVE
   •  scMerge
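As a rough illustration of what batch effect correction is trying to achieve, the sketch below subtracts each batch's per-gene mean so that all batches share a common centre. This is a deliberately naive stand-in and not the algorithm used by mnnCorrect, ZINB-WaVE or scMerge; the function name `center_by_batch` and the toy values are made up for illustration. It also assumes that every batch has the same cell type composition, which real datasets rarely satisfy and which is one reason the dedicated methods exist.

```python
from collections import defaultdict

def center_by_batch(expr, batches):
    """Subtract the per-batch, per-gene mean from each cell.

    expr:    list of cells, each a list of log-expression values (one per gene)
    batches: list of batch labels, one per cell
    """
    n_genes = len(expr[0])
    sums = defaultdict(lambda: [0.0] * n_genes)
    counts = defaultdict(int)
    for cell, b in zip(expr, batches):
        counts[b] += 1
        for g, v in enumerate(cell):
            sums[b][g] += v
    means = {b: [s / counts[b] for s in sums[b]] for b in sums}
    return [[v - means[b][g] for g, v in enumerate(cell)]
            for cell, b in zip(expr, batches)]

# Toy data (hypothetical): two genes, two batches.
# Batch "B" is batch "A" shifted by +2 on every gene.
expr = [[1.0, 3.0], [3.0, 5.0], [3.0, 5.0], [5.0, 7.0]]
batches = ["A", "A", "B", "B"]
corrected = center_by_batch(expr, batches)
print(corrected)
```

After centring, the matching cells from the two toy batches have identical values, which is the behaviour one would hope to see in the "highlighted by batches" view of a t-SNE plot.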

Liver fetal development time course datasets

Time points: E9.5, E10.5, E11.5, E12.5, E13.5, E14.5, E15.5, E16.5, E17.5

Datasets: GSE87795 (Su et al.), GSE90047 (Yang et al.), GSE87038 (Dong et al.), GSE96981 (Camp et al.)

Cell counts shown: N = 320 cells, N = 389 cells, N = 79 cells, N = 448 cells

t-SNE of liver fetal development time course datasets

[Figure: two t-SNE panels, one highlighted by cell types and one highlighted by batches]

Challenge: strong “batch effect”

scMerge

scMerge R package and website: https://sydneybiox.github.io/scMerge/

PNAS: https://doi.org/10.1073/pnas.1820006116

Coming back to our motivational data: liver fetal development time course datasets

[Figure: t-SNE plots (tSNE1 vs tSNE2) before scMerge (logcounts) and after scMerge (scMerge_scSEG), coloured by cell type and by batch]

Cell types: cholangiocyte, Endothelial Cell, Epithelial Cell, Hematopoietic, hepatoblast/hepatocyte, Immune cell, Mesenchymal Cell, Stellate Cell

Batches: GSE87038, GSE87795, GSE90047, GSE96981

Annotated populations: E10.5 hepatoblasts, E17.5 cholangiocytes, E17.5 hepatocytes

Cell assignment

Science questions

•  What cell types are present in the dataset?
•  Can we identify the cell types?
•  What is the cell type composition?
•  Are the cells transitioning from one state to another?

Analysis techniques

•  Clustering (unsupervised learning)
•  Classification (supervised learning)
•  Dimension reduction

Dimension reduced plot of our data (tSNE plot)

[Figure: t-SNE plot with axes tsne1 and tsne2]

How many cell types are there? What are the cell types?

k-means clustering

[Figure: t-SNE plot with axes tsne1 and tsne2]

How many cell types are there? What are the cell types?
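To make the k-means step concrete, here is a minimal sketch of Lloyd's algorithm on 2D coordinates. It is written in plain Python for illustration only; the points are invented stand-ins for t-SNE coordinates, and the starting centroids are picked by hand rather than by any initialisation scheme a real package would use. Note that the number of clusters is fixed by the number of starting centroids, so the algorithm does not by itself answer "how many cell types are there?".

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:     # converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2D embedding with two well-separated groups of cells.
points = [(-10.0, -9.0), (-11.0, -10.0), (-9.0, -11.0),
          (10.0, 9.0), (11.0, 10.0), (9.0, 11.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

In practice clustering is usually run on the expression matrix or its principal components rather than on the t-SNE coordinates themselves; the 2D input here only keeps the example small.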


Clustering algorithms

k-means

Hierarchical

RaceID

SC3

CIDR

countClust

RCA

SIMLR

Luke Zappia, et al. PLoS Comp. Bio. 2018

25%+

Clustering algorithms in single cell research

Phase 4: Gene identification

Science questions

•  Which genes are differentially expressed between cell types?
•  What are the marker genes for each cell type?

Differences between single cell and bulk RNAseq

•  Single-cell gene expression shows a bimodal expression pattern: abundant genes are either highly expressed or undetected.
•  This can be technical (drop-outs).
   •  Drop-outs lead to technical zeroes in the data.
   •  Technical zeroes are due to low capture efficiency in scRNA-seq experiments.
•  Many methods have been proposed to deal with drop-outs
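The link between capture efficiency and technical zeroes can be sketched with a small simulation. The model below keeps each mRNA molecule independently with a fixed probability (binomial thinning), which is a common simplification and an assumption here, not something stated on the slide. The 10% efficiency, the 5 molecules per cell and the helper names are all illustrative choices.

```python
import random

def simulate_capture(true_counts, efficiency, rng):
    """Keep each mRNA molecule independently with probability `efficiency`."""
    return [sum(1 for _ in range(n) if rng.random() < efficiency)
            for n in true_counts]

def zero_fraction(counts):
    """Fraction of cells in which the gene is not detected."""
    return sum(1 for c in counts if c == 0) / len(counts)

rng = random.Random(0)             # fixed seed so the run is repeatable
true_counts = [5] * 1000           # one gene, 5 molecules in each of 1000 cells
observed = simulate_capture(true_counts, efficiency=0.1, rng=rng)

# Every cell truly expresses the gene, yet many observed counts are zero.
print(zero_fraction(true_counts), zero_fraction(observed))
```

Under this model the expected share of zeroes is (1 - 0.1)^5, roughly 0.59, even though the gene is expressed in every cell.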

Differential expression analysis

•  Simple statistical test
   •  Wilcoxon rank test, t-test
•  Methods developed for bulk RNA-seq DE
   •  DESeq2
   •  edgeR
   •  voom-limma
•  scRNA-seq specific
   •  Seurat
   •  MAST
   •  DECENT
   •  D3E
   •  … many more!
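The simplest option in the list, the Wilcoxon rank test, can be sketched in a few lines. The version below is a two-sided rank-sum (Mann-Whitney U) test using the normal approximation; ties receive mid-ranks, but no tie or continuity correction is applied, so it is a teaching sketch rather than a replacement for a library routine such as `scipy.stats.mannwhitneyu`. The expression values for the two clusters are invented.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for x, z score, p-value).
    """
    # Pool both groups, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2      # average of ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, z, p

# Hypothetical counts of one gene in two clusters of six cells each.
cluster_a = [0, 0, 1, 2, 0, 1]
cluster_b = [5, 7, 6, 9, 8, 4]
u, z, p = rank_sum_test(cluster_a, cluster_b)
print(u, round(z, 2), p)
```

Because scRNA-seq data contain many tied zeroes, the missing tie correction makes this sketch slightly conservative; the tools listed above model such features of the data more carefully.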

DE methods comparisons for scRNAseq

Soneson and Robinson (2018), Nature Methods


LKS Faculty of Medicine

Making scRNA-seq analysis more scalable

Cloud computing to enable scalability

•  Cloud computing + Big Data framework
•  Cloud computing
   •  A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources
   •  Key characteristics: elasticity + pay-as-you-go model
   •  Advantages: low entry cost + scalability
•  Big Data framework
   •  Hadoop: a software framework for distributed processing of big data in large-scale clusters (YARN for resource management, HDFS for big data storage, and MapReduce as the analytics engine)
   •  Spark: a general-purpose data-analytics engine for analysis of big data using in-memory computation (allows a speed-up of up to 100x compared to MapReduce)
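The map and reduce steps behind these frameworks can be illustrated without a cluster. The sketch below uses only the Python standard library and is not the Hadoop or Spark API: each chunk of (read, gene) pairs is counted independently in a map step, and the partial tables are merged in a reduce step. The read IDs, gene names and function names are made up for illustration.

```python
from collections import Counter
from functools import reduce

def map_chunk(alignments):
    """Map step: count reads per gene within one chunk of alignments."""
    return Counter(gene for _read_id, gene in alignments)

def merge_counts(a, b):
    """Reduce step: combine two partial count tables."""
    return a + b

# Hypothetical input, already split into chunks as a framework would do.
chunks = [
    [("r1", "Actb"), ("r2", "Gapdh"), ("r3", "Actb")],
    [("r4", "Actb"), ("r5", "Sox2")],
    [("r6", "Gapdh")],
]

partial = map(map_chunk, chunks)   # on a cluster, chunks run in parallel
gene_counts = reduce(merge_counts, partial, Counter())
print(dict(gene_counts))
```

Because each chunk is processed without reference to the others, adding nodes lets more chunks run at once, which is the property that makes this style of computation scale.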

Falco framework

[Figure: Falco framework diagram with MapReduce and Spark components]

Andrian Yang, Michael Troup

Yang et al. (2017), Bioinformatics

Falco framework features

•  Ease of use
   •  Falco provides a helper script to launch an EMR cluster and submit jobs to the cluster
   •  Users can easily configure the cluster and jobs by modifying the configuration file passed to the helper script
•  Customisation
   •  Falco allows users to add custom alignment and/or quantification tools
   •  Users will need to implement a custom function to call the aligner/quantification tool
   •  Custom tools must be compatible with the divide-and-conquer approach

[job_config]
name = mESC analysis job
action_on_failure = CONTINUE
analysis_script = run_pipeline_multiple_files.py
analysis_script_s3_location = s3://[YOUR-BUCKET]/scripts
analysis_script_local_location = source/spark_runner
upload_analysis_script = True

[spark_config]
driver_memory = 30g
executor_memory = 30g

[script_arguments]
input_location = s3://[YOUR-BUCKET]/mESC_clean
output_location = s3://[YOUR-BUCKET]/mESC_gene_counts
annotation_file = vM9_ERCC.gtf
strand_specificity = NONE
run_picard = True
aligner_tool = STAR
aligner_extra_args =
counter_tool = featureCount
counter_extra_args = -t exon -g gene_name
picard_extra_args =
region = us-west-2

Sample configuration for running analysis job
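The sample configuration follows an INI-style layout of sections and key = value pairs, so a file like it can be inspected with Python's standard `configparser` module. Whether Falco's helper script reads the file this way is an assumption; the sketch only shows how such a file could be loaded and checked before a job is submitted, using a shortened copy of the sample above.

```python
import configparser

# Shortened copy of the sample configuration.
SAMPLE = """
[job_config]
name = mESC analysis job
action_on_failure = CONTINUE

[spark_config]
driver_memory = 30g
executor_memory = 30g

[script_arguments]
input_location = s3://[YOUR-BUCKET]/mESC_clean
run_picard = True
aligner_tool = STAR
aligner_extra_args =
counter_tool = featureCount
counter_extra_args = -t exon -g gene_name
"""

# interpolation=None keeps values literal, with no special handling of '%'.
config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE)

args = config["script_arguments"]
run_picard = args.getboolean("run_picard")     # "True" -> True
extra = args["counter_extra_args"].split()     # split into separate flags
print(config["spark_config"]["driver_memory"], args["aligner_tool"],
      run_picard, extra)
```

An empty entry such as `aligner_extra_args =` is read as an empty string, so a script can tell "no extra arguments" apart from a missing key.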

Benchmarking

•  Single-cell RNA-seq data sets
   •  Mouse embryonic stem cell (mESC) data (869 samples)
      •  200 bp paired-end reads, 1.28×10^12 bases, 1.02 Tb FASTQ.gz files
   •  Human brain data (466 samples)
      •  100 bp paired-end reads, 2.95×10^11 bases, 213.66 Gb FASTQ.gz files
•  Performance comparison of Falco against single-node
   •  STAR+featureCount (S+F)
      •  Mouse: speedup of 2.6x to 33.4x
      •  Brain: speedup of 5.1x to 145.4x
   •  HISAT2+HTSeq (H+H)
      •  Mouse: speedup of 2.5x to 58.4x
      •  Brain: speedup of 4.0x to 132.5x

System       Nodes               Mouse ESC, S+F (hours)   Human brain, S+F (hours)
Standalone   1 (1 process)       93.7                     154.7
Standalone   1 (5 processes)     29.3                     33.8
Standalone   1 (12 processes)    21.1                     16.4
Standalone   1 (16 processes)    18.5                     13.6
Falco        10                  7.0                      2.7
Falco        20                  4.1                      1.6
Falco        30                  3.3                      1.4
Falco        40                  2.8                      1.1

Table 1. Runtime analysis of single cell datasets
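The speedup range quoted for STAR+featureCount on the mouse data can be roughly reproduced from the first runtime column of Table 1. The sketch assumes that the lower bound compares the fastest standalone run with the slowest Falco run, and the upper bound compares the slowest standalone run with the fastest Falco run; that pairing is an assumption about how the range was derived.

```python
# Runtimes in hours for STAR+featureCount on the mouse ESC data (Table 1).
standalone_hours = {"1 process": 93.7, "5 processes": 29.3,
                    "12 processes": 21.1, "16 processes": 18.5}
falco_hours = {10: 7.0, 20: 4.1, 30: 3.3, 40: 2.8}  # keyed by number of nodes

def speedup(baseline_hours, new_hours):
    """How many times faster the new run is than the baseline."""
    return baseline_hours / new_hours

# Least favourable and most favourable comparisons for Falco.
smallest = speedup(min(standalone_hours.values()), max(falco_hours.values()))
largest = speedup(max(standalone_hours.values()), min(falco_hours.values()))
print(f"{smallest:.1f}x to {largest:.1f}x")
```

The result is close to the quoted 2.6x to 33.4x; any small difference at the upper end presumably comes from the table showing runtimes rounded to one decimal place.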

Cost effectiveness using AWS spot instances

•  Utilising spot instances
   •  AWS allows utilisation of unused Amazon computing capacity, known as Spot instances
   •  Typically cheaper compared to ‘on-demand’ cost
   •  To use a spot instance, the user needs to bid for the resource
•  Use of spot instances for analysis provides a saving of ~65% compared to using ‘on-demand’ instances
   •  Alternative use: decrease runtime by utilising more instances for a given ‘on-demand’ price

Figure 3. Spot instance price history for September to October

Table 2. Falco cost analysis - on-demand vs spot instances for STAR+featureCount

Dataset         Nodes   Time (hours)   On-demand cost (USD)   Spot cost (USD)   % Savings
Mouse - ESC     10      8              247.20                 85.67             65.34
Mouse - ESC     20      5              301.00                 99.09             67.08
Mouse - ESC     30      4              258.00                 115.71            55.15
Mouse - ESC     40      3              356.40                 114.11            67.98
Human - brain   10      3              92.70                  32.13             65.34
Human - brain   20      2              120.40                 39.64             67.08
Human - brain   30      2              179.00                 57.68             67.68
Human - brain   40      2              237.60                 76.08             67.98
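The savings column in Table 2 is consistent with the usual definition of a percentage saving, one minus the ratio of spot cost to on-demand cost. The sketch below recomputes it for three rows copied from the table; the function name is illustrative.

```python
def percent_savings(on_demand_usd, spot_usd):
    """Saving from using spot instead of on-demand pricing, in percent."""
    return 100.0 * (1.0 - spot_usd / on_demand_usd)

# (dataset, nodes, on-demand USD, spot USD) copied from Table 2.
rows = [
    ("Mouse - ESC", 10, 247.20, 85.67),
    ("Mouse - ESC", 40, 356.40, 114.11),
    ("Human - brain", 10, 92.70, 32.13),
]
for dataset, nodes, on_demand, spot in rows:
    print(dataset, nodes, round(percent_savings(on_demand, spot), 2))
```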

Scaling up to a larger data set

•  Data set (for Standalone + Falco)
   •  Single-cell mouse oligodendrocytes from the central nervous system (SRP066613)
   •  6,283 samples of 50 bp single-end reads, totalling 231.02 Gbp, stored in 200 Gb of fastq.gz files
•  Standalone + Falco
   •  Preprocessing with Trimmomatic
   •  Alignment with STAR
   •  Quantification with featureCount
   •  Clustering with CIDR
•  Cell Ranger: custom pipeline designed by 10X Genomics for Chromium data
   •  Alignment with STAR
   •  Timing is approximated from the runtime of a different mouse scRNA-seq dataset

[Figure: bar chart of the number of cells processed per second (y-axis from 0.0 to 1.5) for Standalone (1 process, 12 processes, 16 processes), Cell Ranger, and Falco (10 nodes, 40 nodes)]

Falco software

•  Source code
   •  Falco is available to download from GitHub
   •  Our work on Falco has been featured in a Nature Toolbox article

Check out Falco at github.com/VCCRI/Falco

starmap: Immersive 3D visualisation of single cell data using smartphone-enabled virtual reality

•  Enabling widespread use of VR visualisation using low-cost ($10) VR headsets and a person’s own smartphone (with a web browser)
•  Support interaction using head movement, keyboard, remote gamepad, and voice control

Jianfu Li, Yu Yao

Using starmap to visualise a data set of 68,000 cells from scRNA-seq data

starmap

•  starmap demo: https://vccri.github.io/starmap/
•  starmap source code: https://github.com/VCCRI/starmap
•  bioRxiv preprint: https://www.biorxiv.org/content/early/2018/05/17/324855

https://www.abacbs.org/giw/

Full-paper submission (for oral presentation and journal publication): This week!
Abstract submission (for oral or poster presentation): 1 September 2019?

THANK YOU

We are recruiting:
-  PhD students ($57K pa scholarship)
-  Research assistants
-  Postdoctoral fellows
-  Bioinformaticians (staff)
-  Faculty

[email protected]
https://holab-hku.github.io/
@joshuawkho

HKU-USydney Strategic Partnership Fund: ‘Single Cell Plus’