methods, applications & analysis of scrna-seq• project success depends on recognition and good...
TRANSCRIPT
Methods, applications & analysis of scRNA-Seq:How an integrated understanding of every step makes for a better single cell
experiment
Michael Kelly, PhDTeam Lead, NCI Single Cell Analysis Facility
2018 ABRF Meeting – Satellite Workshop 4Bridging the Gap: Isolation to Translation (Single Cell RNA-Seq)Sunday, April 22
End-to-End Support for Single Cell RNA-Seq
• Project success depends on recognition and good integration of all aspects of the single cell genomics workflow
• Failures in one aspect can lead to problems downstream
• Lack of integration makes troubleshooting especially difficult
• Relatively “easy” to collect data; potential to get “stuck” at last step
Experimental Design
Sample Prep &
Handling
Capture & Library Prep
SequencingBioinformatic Analysis & Data Interpretation
Successful Validation &
Project Completion
Lessons learned from various scRNA-Seq projects• Selecting the right approach for the project goals• Importance of numbers and capture efficiency• Full-length transcript versus 3’ (or 5’) end-counting
• Sample prep matters• Single nuclei RNA-Seq for specific applications• Effects of low viability sample preps on downstream data quality
• Data wrangling• What single cell RNA-Seq looks like and why the reference is important
• A proposed model for integrated informatics support
Selecting the right approach for the project goals
Importance of numbers and capture efficiency
Single Cell RNA-Seq to Identify Cell-Type Specific Transcriptional Programs in Mammalian Cochlea
Modified from Waddington 1957
Prosensory Domain
Embr
yoni
cPo
stna
tal
Organ of Corti Goals of our single cell RNA-Seq study• Transcriptional programs• Understand changes in cellular plasticity• Novel cell type markers
Modified from Waddington 1957
Theoretical principle of trajectory analysis with single cell RNA-Seq data
Undifferentiated Precursors
Differentiated Cell Types
Single cell profiles at multiple profiles provide snapshots of
transcriptional expression across many cells
Modified from Waddington 1957
Theoretical principle of trajectory analysis with single cell RNA-Seq data
Undifferentiated Precursors
Differentiated Cell Types
Individual cell types can be identified from each time-point
Modified from Waddington 1957
Theoretical principle of trajectory analysis with single cell RNA-Seq data
Asynchrony in development allows for generation of a temporal model of expression across ”Pseudotime”
Undifferentiated Precursors
Differentiated Cell Types
Pseu
dotim
e
Transcriptional trajectory associated with differentiation of each unique cell type
can be analyzed
Single cell profiling with Fluidigm C1 was a good start, but was too low throughput
Original single cell dataset could only distinguish major cell type differences
From Burn & Kelly et al 2015
Drop-Seq scRNA-Seq provided better resolution than Fluidigm C1
Kelly et al 2012
HCs 1
HCs 2
Non−epithelial
Interdental med NSCs 1
med NSCs 2
lat SCs
med SCs
−80
−40
0
40
80
−100 −50 0 50tSNE_1
tSN
E_2
HCs 1HCs 2Non−epithelialInterdentalmed NSCs 1med NSCs 2lat SCsmed SCs
Drop-Seq data generated with Joey Mays (post-bac).
Outer Hair Cells
Inner Hair Cells
-Relatively low cost, once established and working consistently (not guaranteed…)-Considerable troubleshooting required (not plug & play)-Lower sensitivity (1-2,000 genes detected per cell) than Fluidigm C1 or FAC-Seq-Limited experimental design (captures back to back)
~1700 cells
6
SingleCellPartitioning,LysisandBarcoding
10
•Standardsequencing configurations
•Easiertomultiplexwithnon-SClibraries
•HighqualityUMIandCellBarcodereads
•Highperformanceonpatternedflowcells
3’AssayScheme- GelBeadRTPrimersforInlineBarcoding• Cell partitioned with barcoded gel bead
• Cell Barcode & Unique Molecular Identifiers (UMI) on 3’-end of cDNA
• ~2000 median genes detected on average with ~50,000 mean sequencing reads per cells
• Gene-level counts data
10X Genomics droplet-based single cell RNA-Seq provided high-throughput method to profile large population of cells
in unbiased manner
• High efficiency of capture in 10X platform is amenable to smaller input numbers of cells (small tissue or rare population)
• Much better suited to limited samples like mouse cochlea…
Workflow of droplet-based single cell RNA-Seqprofiling of the mammalian cochlea
• Novel Cell Type Markers
• Transcriptional Programs
Modified from Cantos et al 2000
Data Analysis.And More Data Analysis.
Extract cochlear epithelium Dissociate cells
Single cell cDNA library generation & sequencing
• ~5000 P1 cochlear epithelia cells• 99 Inner Hair Cells• 294 Outer Hair Cells• 88 Inner Phalangeal Cells• 54 Inner Pillar Cells• 229 Lateral Supporting Cells• 3753 Medial Non-sensory Cells• 74 Lateral Non-Sensory Cells
Single cell RNA-Seq of postnatal day 1 cochlea identifies expected cell subtypes
Clusters defined in unbiased manner.
Cluster identities determined from marker gene expression.
Differential expression analysis can reveal genes uniquely enriched in each cell subtype (or groups of cells)
Cochlear duct cross sectionSensory cells in Red and Green
Modified from Cantos et al 2000 (Wu Lab)
0
1 2
3
4
5
6
7
−20
0
20
−30 −20 −10 0 10 20tSNE_1
tSN
E_2
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
8
9
1011
12
13
14
15
−50
−25
0
25
−25 0 25tSNE_1
tSN
E_2
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0
1
2
3
4
5
6
7
8
9
10
11
12
13
−60
−40
−20
0
20
40
−25 0 25tSNE_1
tSN
E_2
0
1
2
3
4
5
6
7
8
9
10
11
12
13
0
12
3
4
5
6
7
8
9
10
11
12
13
14
−25
0
25
−20 0 20 40tSNE_1
tSN
E_2
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
E12 Cochlea E14 Cochlea E16 Cochlea P7 CochleaP1 Cochlea
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
−25
0
25
50
−25 0 25 50tSNE_1
tSN
E_2
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
4,153 Cells 10,489 Cells 9,741 Cells 9,541 Cells 3,812 Cells
As cells move from undifferentiated precursors to differentiated cell types, they become more transcriptionally distinct
Greater number of scRNA-Seq datapoints allows better modeling of dynamic expression changes during differentiation
Selecting the right approach for the project goals
Full-length transcript versus 3’ (or 5’) end-counting
Full-length scRNA-Seq should allow isoform discrimination / quantitation – requires sufficient coverage
0.50
0.75
1.00
1.25
0 25 50 75 100Position
valu
e
variableS37_A02
S37_A03
S37_A04
S37_A05
S37_A06
S37_B01
S37_B02
S37_B03
S37_B05
S37_C01
S37_C02
S37_C03
S37_C04
S37_C05
S37_C06
S37_D01
S37_D02
S37_D03
S37_D04
S37_D05
S37_D06
S37_E01
S37_E02
S37_E03
S37_E04
S37_E05
S37_E06
S37_F01
S37_F02
S37_F03
S37_F04
S37_F05
S37_F06
S37_G01
S37_G02
S37_G03
S37_G04
S37_G05
S37_G06
1
2
3
0 25 50 75 100Position
valu
e
variableS13_A01
S13_A02
S13_A03
S13_A04
S13_B01
S13_B02
S13_B03
S13_B04
S13_C01
S13_C02
S13_C03
S13_C04
S13_D01
S13_D02
S13_D03
S13_D04
S13_E01
S13_E02
S13_E03
S13_E04
S13_F01
S13_F02
S13_F03
S13_F04
S13_G01
S13_G02
S13_G04
S13_H01
S13_H02
S13_H03
S27_A03
S27_A04
S27_A06
S27_B03
S27_B04
S27_B05
S27_B06
S27_C03
S27_C04
S27_C05
S27_C06
S27_D04
S27_D05
S27_D06
S27_E01
S27_E02
S27_E04
S27_E05
S27_E06
S27_F01
S27_F03
S27_F04
S27_F05
S27_F06
S27_G01
S27_G02
S27_G03
S27_G04
S27_G05
S27_G06
S27_H03
S27_H04
S27_H05
S27_H06
5’ to 3’ Coverage Plots
Fluidigm C1 (v1 Chemistry) Clontech SMARTer v4 and SMART-Seq2
Single cell gene specific target amplification & qPCR to study crucial differential isoform usage in developing cochlea
Whole transcriptome scRNA-Seq data was too sparse and had 3’ bias – not reliable enough. Moved to STA -> qPCR approach
Mutation in mice and humans in intron of gene leads to mis-splicing and deafness. Cell type specific phenotype. C1 Biomark HD
Collaboration with Banfi Lab at Univ of Iowa – Nakano et al paper in final review
Sample prep matters
Single nuclei RNA-Seq for specific applications
Single nucleus RNA-Seq provides a useful alternative to whole single cell RNA-Seq
• Allows scRNA-Seq profiling of difficult to dissociate tissues• Neuronal tissues• Biobanked samples
• Avoid dissociation-associated transcriptional changes• Nuclear transcriptome
representative of whole cell transcriptome• Lower sensitivity & more
intronic readsLake et al 2016 Science
Habib et al 2017 Nature Methods
Single nuclei DropSeq to study spinal cord neurons and cell type specific activity in behavior
Sathyamurthy & Johnson et al (2018) Cell Reports
Modified detergent concentration and adjusted flow rates to optimize single nuclei encapsulation efficiency
Time post-treatment of target signal upregulation
Habib et al 2017 Nature Methods “DroNc-seq“
0
25000
50000
75000
Fres
h
Nuc
lei
Tim
e
Actin
oD
Identity
nUMI
0
2500
5000
7500
Fres
h
Nuc
lei
Tim
e
Actin
oD
Identity
nGene
0.00
0.05
0.10
0.15
Fres
h
Nuc
lei
Tim
e
Actin
oD
Identity
percent.mito
0
1
2
3
4
5
Fres
h
Nuc
lei
Tim
e
Actin
oD
Identity
Fos
0
1
2
3
4
Fres
h
Nuc
lei
Tim
e
Actin
oD
Identity
Jun
Single nuclei RNA-Seq has limited sensitivity – still good for identifying major cell types
Hair Cells
Supp Cells
GER1GER2
GER3GER4
InterdentalLER
Oc90+
Mes
Igfbp7+−20
0
20
−20 0 20tSNE_1
tSN
E_2
Hair Cells
Supp Cells
GER1
GER2
GER3
GER4
Interdental
LER
Oc90+
Mes
Igfbp7+
IHC OHC
IPhC
IPC
OPCDC1
DC2
DC3Hensen
Border GER1
GER2GER3
GER4Interdental
LERBmp4+
Oc90+
−20
0
20
40
−30 0 30tSNE_1
tSN
E_2
OHC
IPhC
IPC
OPC
DC1
DC2
DC3
Hensen
Border
GER1
GER2
GER3
GER4
Interdental
LER
Bmp4+
Oc90+
Evaluating sensitivity and effectiveness with cochlear epithelial cells
Sample prep matters
Effects of low viability sample preps on downstream data quality
scRNA-Seq capture input of a high viability cell preparation is important for
• 10X Genomics recommends loading 90% viable cells or higher
• Often the underlying causes of below target # of cells
• Can contribute a “background signal” –ambient RNA
Effects of low cell viability on downstream and options to enrich for viable cells
~10% cell viability & no dead cell removal
Dead cell removal -> 50% cell viability
No clear inflection point
Viable cells with good transcript counts; low background
Any selection or manipulation can have the effect of biasing cell populations
• Selection / enrichment is trade—off between ”cleaner” data and a potential to bias the data
• Alignment of cell rations with histology or flow cytometry can provide insight and confidence
CD45
Exp
ress
ion
Data wrangling What single cell RNA-Seq looks like and why the reference is important
Type of downstream data depends on scRNA-Seqmethod - data defined by cDNA library type
• ”Full-length” scRNA-Seq methods generate reads that can span entire transcript length –might not cover very 5’ or 3’ end
• 3’ or 5’ transcript end enriched scRNA-Seqfor gene-level counts (5’ for VDJ methods)
What does ”full-length” single cell RNA-Seq data look like?
1
2
3
0 25 50 75 100Position
valu
e
variableS13_A01
S13_A02
S13_A03
S13_A04
S13_B01
S13_B02
S13_B03
S13_B04
S13_C01
S13_C02
S13_C03
S13_C04
S13_D01
S13_D02
S13_D03
S13_D04
S13_E01
S13_E02
S13_E03
S13_E04
S13_F01
S13_F02
S13_F03
S13_F04
S13_G01
S13_G02
S13_G04
S13_H01
S13_H02
S13_H03
S27_A03
S27_A04
S27_A06
S27_B03
S27_B04
S27_B05
S27_B06
S27_C03
S27_C04
S27_C05
S27_C06
S27_D04
S27_D05
S27_D06
S27_E01
S27_E02
S27_E04
S27_E05
S27_E06
S27_F01
S27_F03
S27_F04
S27_F05
S27_F06
S27_G01
S27_G02
S27_G03
S27_G04
S27_G05
S27_G06
S27_H03
S27_H04
S27_H05
S27_H06
0.50
0.75
1.00
1.25
0 25 50 75 100Position
valu
e
variableS37_A02
S37_A03
S37_A04
S37_A05
S37_A06
S37_B01
S37_B02
S37_B03
S37_B05
S37_C01
S37_C02
S37_C03
S37_C04
S37_C05
S37_C06
S37_D01
S37_D02
S37_D03
S37_D04
S37_D05
S37_D06
S37_E01
S37_E02
S37_E03
S37_E04
S37_E05
S37_E06
S37_F01
S37_F02
S37_F03
S37_F04
S37_F05
S37_F06
S37_G01
S37_G02
S37_G03
S37_G04
S37_G05
S37_G06
5’ to 3’ Coverage Plots
Public reference annotations don’t cover all the possible transcript isoforms and polyA sites
• Be aware that you might be missing some of your reads for less mature reference annotations• 3’ (or 5’) end counting could exacerbate this? • Cell-type specific rare exon usage of alternative polyA signal?
Bulk RNA-Seq Data 3’ Only scRNA-Seq
A complex genome adds some additional challenges – 3’ ends of genes can overlap
• “Unstranded” RNA-Seq reads aligning to overlapping 3’ end can’t be reliably disambiguated
• Interestingly, 3’-end only (or 5’-only) libraries are inherently “stranded”
A proposed model for integrated informatics support
Guidance, Training & Directed Informatics Support
Experimental Design & Planning
Core Facility Embedded Bioinformatics
Two-way communication at wet-bench steps
Data QC and primary data processing
Subject matter expert query of data & identify specific testing
Basic level informatics-trained end-user
Advanced informatic analysis
Bioinformatic support: Lab-embedded, third-party, or
core-embedded
Single Cell Analysis Facility
SequencingFacility & Genomics
Tech Laboratory
NCI Genomics
CoreProtein Analysis
Core
NCI Investigator Labs (clinical & basic science)
& NIH Intramural CommunityBroader Single Cell Community
Bioinformatic Analyst will be embedded in our Single Cell Analysis Facility and will work closely with SCAF team & NCI Investigator
NIH Bethesda Campus
Frederick National Lab Campus
• Sequencing Production
• Project Tracking • R&D Innovation• Analysis Workflows
NIH Clinical Center on Main Bethesda Campus
NCI Single Cell Analysis Facility
Collaborative Bioinformatics
& NCI Data Science
Laboratory
NCI Single Cell Analysis Facility as a collaborative team (we are hiring a bioinformatic analyst!)
NCI Flow Core
Acknowledgements
Cochlear scRNA-Seq work was funded by NIDCD Intramural Research Program - DC000021 (MWK) & Other Sources of
Funding for Collaborators
Laboratory of Cochlear DevelopmentMatt KelleyJoseph MaysAlejandro AnayaMahmoud KhalilBetsy DriverWeise ChangKathryn EllisHanna SherrillTessa SandersKuni Iwasa
NIDCD Genomics & Computational Biology Core
Robert MorellErich Boger
Decibel TherapeuticsJoe BurnsAdam PalermoKathy So
CollaboratorsBanfi Lab (Univ of Iowa)Levine Lab (NINDS)Friedman Lab (NIDCD)Klein Lab (NICHD)
Developers of open-source analysis packages
NIH High-performance computing staff
NCI Center for Cancer Research Office of Science Technology ResourcesNCI Genomics Core
Val Bliskovsky, PhD & Liz Conner, PhD
Frederick National Lab - Cancer Technology Research ProgramSequencing Facility & Genomic Technology Lab
NIDCR Flow Cytometry CoreMehrnoosh Abshari