A primer on single-cell RNA-seq analysis
Dr Matthew Ritchie @mritchieau
UQ Winter School in Computational Biology, 4th July 2017
A brief history of gene expression technology
1. The microarray era. Long live microarrays…
2. Dawn of the RNA-sequencing era. Long live sequencing…
3. Single-cell RNA-sequencing protocols emerge and proliferate…
Timeline of technology (1995 to 2015)
• Lockhart et al., 1996
• DeRisi et al., 1996
• Gunderson et al., 2004
• Mortazavi et al., 2008
• Marioni et al., 2008
• Cloonan et al., 2008
• Tang et al., 2009
• Islam et al., 2011 (STRT-seq)
• Hashimshony et al., 2012 (CEL-seq)
• Macosko et al., 2015 (Drop-seq)
[Figure: tiling array probes placed along a gene, 5' to 3']
Outline
1. Pre-processing scRNA-seq data
   a) Aligning reads, dealing with barcodes
   b) Quality control (samples & genes)
   c) Normalization
   d) Dimension reduction
2. Downstream analysis
   a) Differential expression (DE) analysis
scRNA-seq workflow
• Raw data, e.g. FASTQ files
• Number of reads mapped to each gene; unique molecular identifiers (UMIs)
• Filter cells; filter genes
• Remove cell-specific biases
• Clustering, hypervariability, DE analysis, GO and pathway analysis, trajectory analysis
Raw data is typically available as a FASTQ file…
… 100s of millions more rows of data
• FastQC is often useful for assessing sequence quality: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
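To make the FASTQ layout concrete, here is a minimal sketch (not a replacement for FastQC) that reads four-line FASTQ records and computes a mean base quality per read. It assumes Phred+33 quality encoding and well-formed records; the function names and the two example reads are made up for illustration.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file (or a blank line)
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:], seq, qual  # drop the leading '@'


def mean_phred(qual, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


# Two hypothetical reads: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
records = list(read_fastq(example))
scores = {read_id: mean_phred(qual) for read_id, _, qual in records}
print(scores)
```

On real data the same loop would run over an opened (often gzip-compressed) file rather than an in-memory string.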
Lots of barcodes to deal with in scRNA-seq data…
- Cell (sample) specific barcodes (often known sequences)
- Molecule barcodes (UMIs, random sequences)
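As a rough illustration of handling the two barcode types, the sketch below splits a barcode read into a cell barcode and a UMI, then matches the cell barcode against a list of known sequences while tolerating one mismatch. The read layout (8-base cell barcode followed by a 6-base UMI), the whitelist and the mismatch tolerance are all assumptions chosen for the example; real protocols differ.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_barcodes(read, cell_len=8, umi_len=6):
    """Split a barcode read into (cell_barcode, umi); the layout is an assumption."""
    return read[:cell_len], read[cell_len:cell_len + umi_len]


def assign_cell(cell_bc, whitelist, max_mismatch=1):
    """Return the single known barcode within max_mismatch, or None if ambiguous/unmatched."""
    hits = [known for known in whitelist if hamming(cell_bc, known) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


# Hypothetical known cell barcodes and one read with a sequencing error in the cell barcode.
whitelist = ["AAAACCCC", "GGGGTTTT"]
cell_bc, umi = split_barcodes("AAAACCCTACGTACTTTT")
cell = assign_cell(cell_bc, whitelist)
unmatched = assign_cell("ACACACAC", whitelist)
print(cell_bc, umi, cell, unmatched)
```

Because the UMI is a random sequence there is no whitelist for it, so it is only extracted here and not corrected.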
Unique Molecular Identifiers (UMIs)
Islam et al. Nat Methods. 2014
[Figure: mRNA molecules (HPRT, HPRT, GAPDH) from cell 1 converted to cDNA, each tagged with a cell index and a UMI (UMI1, UMI2, UMI3); panels: molecule counting, PCR bias]
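The idea of counting molecules rather than reads can be sketched in a few lines: reads sharing the same cell, gene and UMI are treated as PCR copies of one molecule and counted once. The read tuples below are invented for illustration, and UMI sequencing errors are ignored in this sketch.

```python
from collections import Counter, defaultdict


def count_reads(reads):
    """Reads per (cell, gene), with PCR duplicates included."""
    return Counter((cell, gene) for cell, gene, _ in reads)


def count_molecules(reads):
    """Distinct UMIs per (cell, gene): each UMI is counted once however often it was amplified."""
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Hypothetical reads: two HPRT molecules and one heavily amplified GAPDH molecule in cell 1.
reads = (
    [("cell1", "HPRT", "UMI1")] * 3
    + [("cell1", "HPRT", "UMI2")]
    + [("cell1", "GAPDH", "UMI3")] * 5
)
read_counts = count_reads(reads)
molecule_counts = count_molecules(reads)
print(read_counts)
print(molecule_counts)
```

By read count GAPDH looks more abundant than HPRT, while by molecule count HPRT has two molecules and GAPDH one, which is the PCR bias the UMIs are meant to remove.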
UMI handling software
• umitools (http://brwnj.github.io/umitools/)
  • FASTQ reformatting and BAM de-duping
• UMI-tools (Smith et al., Genome Research 2017, https://github.com/CGATOxford/UMI-tools)
  • Models sequencing errors in UMIs using a network-based method
• umis (Svensson et al., Nature Methods 2017, https://github.com/vals/umis/)
  • Handles both cellular and molecular UMIs
• scPipe (https://github.com/LuyiTian/scPipe)
  • Simple gene-centric approach, with error correction (R-based)
Modelling errors in UMIs
Smith et al. Genome Res. 2017;27:491-499 (Figure 1E)
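A simplified sketch of a network-based approach to UMI errors is given below. UMIs at one gene in one cell are nodes; a high-count UMI absorbs a neighbour that differs by one base when its count is at least twice the neighbour's count minus one. That count rule is my reading of the "directional" idea in Smith et al. and should be treated as an assumption; this is not the UMI-tools implementation, and the example counts are invented.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_counts):
    """Group UMIs into putative molecules; each group is led by its most abundant UMI."""
    ordered = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    assigned = set()
    groups = []
    for root in ordered:
        if root in assigned:
            continue
        group, stack = [root], [root]
        assigned.add(root)
        while stack:
            parent = stack.pop()
            for child in ordered:
                if child in assigned:
                    continue
                # Assumed edge rule: one mismatch and a sufficiently lower count.
                close = hamming(parent, child) == 1
                if close and umi_counts[parent] >= 2 * umi_counts[child] - 1:
                    assigned.add(child)
                    group.append(child)
                    stack.append(child)
        groups.append(group)
    return groups


# Hypothetical UMI counts: ACGA looks like an error copy of ACGT,
# while TTTT and TTTA have similar counts and are kept as separate molecules.
umi_counts = {"ACGT": 100, "ACGA": 3, "TTTT": 50, "TTTA": 40}
groups = collapse_umis(umi_counts)
print(groups)
```

The number of groups, rather than the number of distinct UMI sequences, is then used as the molecule count.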
Read alignment and gene counting
Many well established methods for aligning short reads to a reference genome/transcriptome
• BWA (Li & Durbin. Bioinformatics 2009;25:1754–60.)
• STAR (Dobin et al. Bioinformatics. 2013;29:15–21)
• Rsubread (Liao et al. Nucleic Acids Res. 2013;41:e108)
Once aligned, obtain gene/transcript counts in a UMI-aware fashion
Pseudoaligner options
• Sailfish (Patro et al. Nat Biotechnol. 2014;32:462–4)
• Salmon (Patro et al. bioRxiv. 2015. doi:10.1101/021592.)
• kallisto (Bray et al. Nat Biotechnol. 2016. doi:10.1038/nbt.3519.)
FASTQ → BAM → Table of counts
Characteristics of scRNA-seq data
• High-resolution, high-dimensional and high levels of noise
• 60~70% of counts are zero
• Each cell expresses 1,000~8,000 genes
• Up to 100-fold difference in total coverage
• Quality control is essential!

Gene     S1  S2  S3  S4  S5  S6  S7  …
Rp1       0   0   0   0   0   0   0  …
Sox17     0   1   6  11   2   0   0  …
Mrpl15    0   0   0   0   0   0   0  …
Lypla1   12   0   0   0   0   0  18  …
Tcea1     7   0   0  21   0   2   0  …
Rgs2      0   0   0   0   0   0   0  …
Cldn4     0   0   0   0   0   0   0  …
…         …   …   …   …   …   …   …  …

Columns: ~100–1,000 cells; rows: ~10,000–40,000 genes
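The characteristics listed above (sparsity, genes detected per cell, differences in coverage) can be computed directly from a count table. The sketch below does this for the seven genes and seven cells shown in the example table; the variable names are illustrative, and a table this small is of course not representative of a full data set.

```python
# Toy count matrix taken from the example table: rows are genes, columns are cells S1..S7.
counts = {
    "Rp1":    [0, 0, 0, 0, 0, 0, 0],
    "Sox17":  [0, 1, 6, 11, 2, 0, 0],
    "Mrpl15": [0, 0, 0, 0, 0, 0, 0],
    "Lypla1": [12, 0, 0, 0, 0, 0, 18],
    "Tcea1":  [7, 0, 0, 21, 0, 2, 0],
    "Rgs2":   [0, 0, 0, 0, 0, 0, 0],
    "Cldn4":  [0, 0, 0, 0, 0, 0, 0],
}
cells = ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]
n_cells = len(cells)

# Total counts per cell (library size) and number of genes with a non-zero count per cell.
library_size = [sum(row[j] for row in counts.values()) for j in range(n_cells)]
genes_detected = [sum(row[j] > 0 for row in counts.values()) for j in range(n_cells)]

# Fraction of all entries in the table that are zero.
n_entries = len(counts) * n_cells
zero_fraction = sum(v == 0 for row in counts.values() for v in row) / n_entries

# Ratio between the deepest and the shallowest cell.
coverage_fold = max(library_size) / min(library_size)

print(dict(zip(cells, library_size)))
print(dict(zip(cells, genes_detected)))
print(round(zero_fraction, 3), coverage_fold)
```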
Quality control (QC)
• Remove low-quality cells (subset by column)
  • Filter by library size
  • Filter by number of expressed genes in each cell
  • Examine mitochondrial, ribosomal or spike-in proportions
• Remove low-abundance genes (subset by row)
  • Filter by average expression level
  • Filter by expressed in at least n cells
  • Use a less aggressive approach for studies involving rare cells
scPipe workflow: quality control metrics collected at each step
1. FASTQ reformat (bc_trim_barcode): number of removed reads
2. Read alignment (Rsubread::align): alignment rate
3. Exon mapping (bc_exon_mapping): number of reads mapped to intron/exon
4. Barcode demultiplex (sc_demultiplex): reads per cell; unmatched barcodes
5. Gene count (sc_gene_counting): number of corrected UMIs & filtered genes
Outputs: gene counting matrix; quality control information matrix; an SCData object for quality control and further downstream analysis
scPipe quality control metrics
Pairs plots of QC metrics can be useful
Typically 5-15% of samples are discarded in this process (50% in extreme cases).
Filter samples with care!
Data from Dr James Ryall (UniMelb)
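The cell and gene filters described in the QC slides can be sketched as two small functions: one subsets columns (cells) by library size and number of expressed genes, the other subsets rows (genes) by the number of retained cells in which they are expressed. The thresholds and the toy counts are arbitrary values chosen for illustration, not recommendations.

```python
def filter_cells(counts, min_library_size, min_genes):
    """Return column indices of cells passing both the library-size and gene-number filters."""
    n_cells = len(next(iter(counts.values())))
    keep = []
    for j in range(n_cells):
        library_size = sum(row[j] for row in counts.values())
        genes_detected = sum(row[j] > 0 for row in counts.values())
        if library_size >= min_library_size and genes_detected >= min_genes:
            keep.append(j)
    return keep


def filter_genes(counts, kept_cells, min_cells):
    """Return genes expressed (count > 0) in at least min_cells of the retained cells."""
    return [
        gene
        for gene, row in counts.items()
        if sum(row[j] > 0 for j in kept_cells) >= min_cells
    ]


# Toy counts: rows are genes, columns are seven cells.
counts = {
    "Sox17":  [0, 1, 6, 11, 2, 0, 0],
    "Lypla1": [12, 0, 0, 0, 0, 0, 18],
    "Tcea1":  [7, 0, 0, 21, 0, 2, 0],
    "Rgs2":   [0, 0, 0, 0, 0, 0, 0],
}
kept_cells = filter_cells(counts, min_library_size=5, min_genes=1)
kept_genes = filter_genes(counts, kept_cells, min_cells=2)
print(kept_cells, kept_genes)
```

Filtering cells first and genes second means a gene is judged only on cells that survived QC; for studies involving rare cell types a lower min_cells would be the less aggressive choice.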
Other approaches to QC
• Ilicic et al. Genome Biology 2016;17:29.
  • Train an SVM to determine low quality samples (Fluidigm C1)
• scater (McCarthy et al. Bioinformatics 2017;33(8):1179–86, http://bioconductor.org/packages/scater)
  • Exploratory data analysis approach: plot QC metrics and determine outliers using selected metrics
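One common way to "determine outliers using selected metrics" is to flag cells that lie several median absolute deviations (MADs) from the median of a QC metric. The sketch below applies that rule to made-up log10 library sizes; the choice of 3 MADs, the lower-tail-only test and the 1.4826 scaling constant (which makes the MAD comparable to a standard deviation under normality) are assumptions of this example rather than settings taken from the slides.

```python
from statistics import median


def low_outliers(values, nmads=3.0):
    """Flag values more than nmads scaled MADs below the median."""
    centre = median(values)
    mad = 1.4826 * median([abs(v - centre) for v in values])
    threshold = centre - nmads * mad
    return [v < threshold for v in values]


# Hypothetical log10 library sizes for six cells; the last cell has very few reads.
log_library_size = [5.0, 5.1, 4.9, 5.05, 4.95, 3.0]
flags = low_outliers(log_library_size)
print(flags)
```

Working on the log scale keeps the rule from being dominated by a few very deep cells; the same function could be applied to the number of expressed genes.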
Normalization
• Scaling normalization for bulk RNA-seq
  • Compute a scaling (size) factor per sample;
  • Popular methods: TMM, DESeq;
  • Assumes most genes are not differentially expressed between samples.
• Methods for scRNA-seq (still an open question)
  • By library size
  • BASiCS (Vallejos et al. 2015)
  • scran (Lun et al. 2016)
  • ComBat in sva (Leek et al. 2012)
  • Use spike-in RNAs
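The simplest of the options listed, scaling by library size, can be sketched as follows: each cell gets a size factor proportional to its total count, and counts are divided by that factor. Centring the size factors so that they average to one is a convention assumed here; TMM, DESeq and scran compute their factors differently and are not reproduced by this sketch.

```python
def library_size_factors(cells):
    """One size factor per cell: library size divided by the mean library size."""
    library_sizes = [sum(cell) for cell in cells]
    mean_size = sum(library_sizes) / len(library_sizes)
    return [size / mean_size for size in library_sizes]


def normalize(cells):
    """Divide each cell's counts by its size factor."""
    factors = library_size_factors(cells)
    return [[count / f for count in cell] for cell, f in zip(cells, factors)]


# Two hypothetical cells with the same composition, the second sequenced twice as deeply.
cells = [[10, 0, 30], [20, 0, 60]]
factors = library_size_factors(cells)
normalized = normalize(cells)
print(factors)
print(normalized)
```

After scaling, the two cells have the same normalized profile, which is the cell-specific bias (sequencing depth) this step is meant to remove.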
Dimension reduction
• Helps visualise relationships between samples
• Popular methods: MDS, PCA, t-SNE (t-Distributed Stochastic Neighbor Embedding), etc.
MDS plot: distance matrix
PCA plot: covariance matrix
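To show what "PCA works from the covariance matrix" means in practice, the sketch below centres a small data set, builds the covariance matrix of its columns, and finds the leading principal component by power iteration. Power iteration is used only to keep the example dependency-free; in practice a library SVD or eigendecomposition would be used, and the data points are invented.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, scores) of the leading principal component.

    rows: list of samples, each a list of feature values.
    Assumes the data are not constant and the start vector is not
    orthogonal to the leading eigenvector.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the features (p x p).
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    v = [1.0] * p
    for _ in range(n_iter):
        # Multiply by the covariance matrix and renormalise to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores


# Hypothetical samples lying exactly on the line y = 2x.
rows = [[float(x), 2.0 * x] for x in range(5)]
direction, scores = first_principal_component(rows)
print(direction)
print(scores)
```

The scores are the coordinates that would be plotted on the first axis of a PCA plot; MDS instead starts from a matrix of pairwise distances between samples.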
Dimension reduction with t-SNE
[Figure: t-SNE applied to mouse blood cell scRNA-seq data. Data from Dr Christine Biben (WEHI)]
[Figure: MDS applied to mouse blood cell scRNA-seq data]
Dimension reduction with t-SNE
Free public 10X Genomics data: 2,700 peripheral blood mononuclear cells (PBMCs)
• Useful for high-dimensional data:
  • a large number of cells
  • more diverse populations
• May over-interpret results for less heterogeneous data
Notes on using t-SNE
Wattenberg, et al., "How to Use t-SNE Effectively", Distill, 2016. http://distill.pub/2016/misread-tsne/
‘Although impressive, these images can be tempting to misread.’
1. Those hyperparameters really matter.
2. Cluster sizes in a t-SNE plot mean nothing.
3. Distances between clusters might not mean anything.
4. Random noise doesn’t always look random.
5. You can see some shapes, sometimes.
6. For topology, you may need more than one plot.
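The main t-SNE hyperparameter referred to in note 1 is the perplexity, which can be read as an effective number of neighbours per point. The sketch below computes the Gaussian neighbour probabilities for one point at a given bandwidth and the resulting perplexity (two to the power of the Shannon entropy). It only illustrates the definition; it is not a t-SNE implementation, and the distances and bandwidths are made up.

```python
import math


def neighbour_probs(sq_dists, sigma):
    """Conditional probabilities of picking each neighbour under a Gaussian kernel."""
    weights = [math.exp(-d / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """2 ** (Shannon entropy in bits): the effective number of neighbours."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy


# Squared distances from one point to four others: one close neighbour, three far ones.
sq_dists = [1.0, 100.0, 100.0, 100.0]
narrow = perplexity(neighbour_probs(sq_dists, sigma=1.0))
wide = perplexity(neighbour_probs(sq_dists, sigma=1000.0))
equal = perplexity(neighbour_probs([4.0, 4.0, 4.0, 4.0], sigma=1.0))
print(narrow, wide, equal)
```

With a narrow bandwidth only the closest neighbour matters (perplexity near 1); with a wide one all four count almost equally (perplexity near 4). Choosing the perplexity therefore changes which neighbourhoods the embedding tries to preserve, which is one reason plots can look so different across settings.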
Differential expression
• Detect DE genes or markers.
• Methods designed for bulk data: edgeR, voom, DESeq2, etc.
• Methods developed for scRNA-seq: monocle, MAST, SCDE, etc.
Differential expression
• Differential expression analysis by edgeR
  • Quasi-likelihood (QL) pipeline is not appropriate;
  • Likelihood ratio test (LRT) is recommended.
‘Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq.’
Soneson and Robinson, bioRxiv 2017 http://www.biorxiv.org/content/early/2017/05/28/143289
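To illustrate the general shape of a likelihood ratio test for one gene between two groups of cells, here is a deliberately simplified sketch. It uses a Poisson model with library sizes as exposures, not the negative binomial generalized linear model that edgeR fits, so it ignores overdispersion and would be anti-conservative on real scRNA-seq data. All counts and library sizes are invented.

```python
import math


def poisson_lrt(counts_a, libs_a, counts_b, libs_b):
    """Likelihood ratio test of equal expression rate in two groups (Poisson model).

    Returns (statistic, p_value); the p-value uses a chi-square
    distribution with 1 degree of freedom.
    """
    y_a, y_b = sum(counts_a), sum(counts_b)
    n_a, n_b = sum(libs_a), sum(libs_b)
    rate_null = (y_a + y_b) / (n_a + n_b)  # shared rate under the null
    stat = 0.0
    for y, n in ((y_a, n_a), (y_b, n_b)):
        if y > 0:  # a group with zero counts contributes nothing (0 * log 0 = 0)
            stat += 2.0 * y * math.log((y / n) / rate_null)
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, p_value


libs = [1000, 1000, 1000]
same_stat, same_p = poisson_lrt([10, 12, 8], libs, [10, 12, 8], libs)
diff_stat, diff_p = poisson_lrt([0, 1, 0], libs, [50, 60, 40], libs)
print(same_stat, same_p)
print(diff_stat, diff_p)
```

The test compares the fit of a model with one rate per group against a model with a single shared rate; the negative binomial version has the same structure with an added dispersion parameter.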
The value of control data sets
• Benchmarking efforts for scRNA-seq are in their infancy
• Lack of good control data sets for comparing analysis methods
• The wide range of scRNA-seq protocols makes this challenging!
Summary
• scRNA-seq is a powerful technique to study gene regulation cell by cell;
• The counts contain high levels of noise with many dropouts;
• Quality control is essential to remove problematic cells as well as low-abundance genes;
• Gold standards for data analysis of scRNA-seq data have yet to emerge;
• Methods developed for bulk data can be applied to scRNA-seq data (with due care).
Further reading
• Svensson et al. Moore’s Law in Single Cell Transcriptomics https://arxiv.org/abs/1704.01379
• Lun et al. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor, F1000Research, 2016 https://f1000research.com/articles/5-2122/v2
• Wattenberg, et al., "How to Use t-SNE Effectively", Distill, 2016. http://distill.pub/2016/misread-tsne/
R-based analysis pipeline for scRNA-seq data
• Rsubread (www.bioconductor.org)
• scran (www.bioconductor.org)
• scPipe (www.bioconductor.org)
Acknowledgements
Yunshun (Andy) Chen, Luyi Tian, Shian Su, Stuart Lee, Shalin Naik, Daniela Zalcenstein, Christine Biben, Roberto Bonelli
Davis McCarthy, Aaron Lun, John Marioni, James Ryall, Ernst Wolvertang
AMSI BioInfoSummer 2017
http://bis.amsi.org.au
4-8 December 2017 Monash University