
A brief introduction to transcriptomics: from sampling to data analysis

Leeds-omics introduction series

Outline

1. Introduction to transcriptomes
2. Sample collection
3. RNA extraction methods and RNA quality assessment and quantification
4. RNA sequencing techniques
5. Bioinformatic analyses - typical pipeline: quality assessment, trimming
6. Special types of analyses: mapping onto genome, quantification of expression, variant calling (SNPs)


Transcriptomes give us information about gene expression

Identify differentially expressed genes, identify functional changes…

Why use transcriptomes in biological research?

Pros

• Easy, accessible way to see and quantify gene expression
• Immediate access to the protein-coding portion of the genome
• Identify alternative splicing
• Identify single nucleotide polymorphisms (SNPs) in coding regions

Cons

• Snapshot in time (different times, different expression patterns)
• Absence of a gene's transcripts does not mean the gene is absent from the genome
• Difficult to ensure that you have sampled a single cell type
• Statistical analysis is highly dependent on experimental design


The stage of gene expression we capture

RNA-seq captures the mature messenger RNA (mRNA)

Poly(A) selection targets the characteristic poly(A) tail of the mRNA

The assumption is that the amount of mRNA for any gene reflects its impact on cell function

Sampling design

VERY IMPORTANT: what is your research question? Will you have enough to address your question?

Things to bear in mind:
• What tissues to target: relevant to your research question
• Homogeneous sampling of tissues, to the extent you can manage
• Replicates: account for variation and are important to validate results
• Developmental stage of the studied individuals
• Consult sequencing specialists (Ian Carr and Steve Moss) for advice on sampling


Some techniques commonly used to stabilise RNA

• Snap freezing (liquid nitrogen), followed by immediate storage at -80 °C
• RNAlater (Ambion): small tissue pieces (<0.5 cm in length) placed in 5 volumes of the reagent. Long-term storage: -20 °C or -80 °C
• NAP buffer ("homemade"): similar to RNAlater
• Other commercial products customised to sample types (e.g. blood)

[Figures: snap freezing in liquid nitrogen; preserving tissue with RNAlater (image: Thermo Fisher Scientific)]

Considerations when preserving samples

• mRNA is fragile and unstable, so it is susceptible to degradation: act fast.
• Ensure aseptic conditions: use tubes and tools that are RNase-free.
• Amount of tissue that you need: some tissues give high yields (e.g. liver), while others tend to give low yields (e.g. adipose tissue, brain).
• Storage: ideally at -80 °C.


Comparison between preserving methods and samples

Camacho-Sanchez et al. 2013. Molecular Ecology Resources 13, 663–673

Snap frozen: best results, followed by RNAlater and NAP buffer

Obtaining the mRNA

ORGANIC EXTRACTION PROTOCOL

Tissue → Lyse and homogenise → Add gDNA eliminator and chloroform → Separate phases → Add ethanol to aqueous phase → Bind total RNA → Wash → Elute → Total RNA

IMPORTANT CONSIDERATIONS: Extraction of RNA is complicated by the presence of ribonucleases (RNases) in tissues
• RNases are difficult to inactivate


Other RNA extraction methods

Filter-based, spin basket formats
• Benefits: convenient and easy; amenable to single-sample and 96-well processing; can be automated
• Drawbacks: can become clogged with particulates; gDNA and other large nucleic acids are often retained; automation requires complex vacuum systems/centrifugation

Magnetic particle methods
• Benefits: can be automated; rapid sample collection/concentration; no risk of filter clogging
• Drawbacks: magnetic particles can be carried through; less efficient in viscous solutions; laborious when performed manually

Direct lysis methods
• Benefits: works well with small samples; can be automated; scalable; potential for most accurate RNA representation
• Drawbacks: dilution-based; spectrophotometric measurement of yield is not possible; possible residual RNase activity; performance can be suboptimal

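RNA quality assessment and quantification

It is important to establish both the purity and the concentration of the RNA that has been extracted.

UV spectroscopy
• Measures the absorbance of a diluted RNA sample at 260 and 280 nm
• Nucleic acid concentration is calculated using the Beer-Lambert law:

$A = \varepsilon c l$

where $A$ is the absorbance at a particular wavelength, $c$ is the concentration of nucleic acid, $l$ is the path length of the spectrophotometer cuvette (typically 1 cm) and $\varepsilon$ is the extinction coefficient; for RNA at 260 nm, $\varepsilon_{RNA} = 0.025\ (\mu g/mL)^{-1}\,cm^{-1}$.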
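As a worked check, rearranging the Beer-Lambert law for a 1 cm cuvette, with the RNA extinction coefficient expressed per µg/mL, recovers the familiar rule of thumb that an A260 of 1.0 corresponds to about 40 µg/mL RNA:

```latex
c = \frac{A_{260}}{\varepsilon_{RNA}\, l}
  = \frac{1.0}{0.025\ (\mu\mathrm{g/mL})^{-1}\,\mathrm{cm}^{-1} \times 1\ \mathrm{cm}}
  = 40\ \mu\mathrm{g/mL}
```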


• For example, A260 = 1.0 is equivalent to ~40 μg/mL RNA
• The A260/A280 ratio indicates RNA purity: a ratio of 1.8–2.1 indicates highly purified RNA

IMPORTANT CONSIDERATIONS:
• pH
• Cuvette
• RNA dilution range
• Does not discriminate between DNA and RNA (use RNase-free DNase to remove contaminating DNA)
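The calculations on this slide can be sketched in a few lines of Python. This is an illustration only, assuming blank-corrected readings, a sample diluted by a known factor before measurement, and the ~40 µg/mL per A260 unit conversion for a 1 cm path; the function names are hypothetical, not part of any instrument software.

```python
# Sketch: RNA concentration and purity from UV absorbance readings.
# Assumptions: blank-corrected readings, known dilution factor,
# 40 ug/mL RNA per A260 unit at 1 cm (i.e. epsilon = 0.025 (ug/mL)^-1 cm^-1).

RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0,
                                path_length_cm: float = 1.0) -> float:
    """Concentration of the undiluted sample via the Beer-Lambert law."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor / path_length_cm


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio, used as an indicator of protein contamination."""
    return a260 / a280


def looks_pure(a260: float, a280: float,
               low: float = 1.8, high: float = 2.1) -> bool:
    """True when the A260/A280 ratio falls within the 1.8-2.1 range."""
    return low <= purity_ratio(a260, a280) <= high


if __name__ == "__main__":
    a260, a280 = 0.25, 0.125  # hypothetical readings of a 1:10 dilution
    print(rna_concentration_ug_per_ml(a260, dilution_factor=10))  # 100.0
    print(purity_ratio(a260, a280))  # 2.0
    print(looks_pure(a260, a280))  # True
```

Note that this ratio cannot flag DNA contamination, since DNA also absorbs at 260 nm, which is why DNase treatment is recommended above.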

RNA quality assessment and quantification: Agilent® 2100 Bioanalyzer

• Combination of microfluidics, capillary electrophoresis and fluorescent dye
• Evaluates both RNA concentration and integrity
• Nano (ng/μL) and Pico (50-5000 pg/μL) systems available
• Size and mass are determined as RNA molecules fluoresce in the chip channels
• The system produces a gel-like image and an electropherogram
• Compares unknown concentrations to the Agilent® RNA 6000 Ladder
• The RNA Integrity Number (RIN) is determined by an analysis algorithm (maximum value 10)

[Figures: Bioanalyzer lab chip; example electropherograms with RIN ~10 and RIN ~6]


RNA Sequencing

• Whole transcriptome shotgun sequencing (WTSS)
• Reveals the presence and quantity of RNA in a biological sample at a given moment in time

Workflow: RNA isolation → RNA selection/depletion → cDNA synthesis

RNA selection/depletion options:
• Poly(A) selection
• rRNA depletion
• RNA capture

[Figure: isolated RNA, selection via poly(T) magnetic beads; poly(A) RNA molecules bind to the poly(T) beads]


RNA sequencing

IMPORTANT CONSIDERATIONS:
• Cost
• Single vs paired-end reads
  - SE: for expression analysis of well-annotated genomes
  - PE: better for characterisation of poorly annotated transcriptomes
• Read length
• Depth of coverage
  - Determined by the number of samples (libraries) in one lane
• Replicates, randomisation and multiplexing

[Figure: raw reads → data analysis]

| Sample type | Reads needed for differential expression (millions) | Reads needed for rare transcripts or de novo assembly (millions) | Read length |
|---|---|---|---|
| Small genomes (bacteria/fungi) | 5 | 30-65 | 50 SE, or PE for positional info |
| Intermediate genomes (Drosophila, C. elegans) | 10 | 70-130 | 50-100 SE, or PE for positional info |
| Large genomes (human/mouse) | 15-25 | 100-200 | >100 SE, or PE for positional info |

E.g. (Liu et al., 2014):
• 12 samples in one lane of Illumina HiSeq = 10 million reads per sample
• 4 samples in one lane of Illumina HiSeq = 30 million reads per sample
• 3× more reads per sample = 1.5× cost increase = ~25% more differentially expressed genes detected

Liu, Y., Zhou, J., and White, K.P. (2014) RNA-seq differential expression studies: more sequence or more replication? Bioinformatics Feb 1; 30(3): 301-4
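The read-budget arithmetic behind this example can be sketched as follows. The ~120 million reads per lane is an assumption back-calculated from the slide (12 samples × 10 million reads), not an instrument specification, and the even split ignores the uneven library yields seen in practice.

```python
# Sketch of multiplexing arithmetic: how many reads each library gets
# when a lane is shared. LANE_READS is a hypothetical lane yield inferred
# from the example above (12 samples x 10 million reads).

LANE_READS = 120_000_000


def reads_per_sample(n_samples: int, lane_reads: int = LANE_READS) -> int:
    """Reads per library if the lane output is split evenly."""
    return lane_reads // n_samples


def lanes_needed(n_samples: int, target_reads: int,
                 lane_reads: int = LANE_READS) -> int:
    """Lanes required so every sample reaches target_reads (ceiling division)."""
    total = n_samples * target_reads
    return -(-total // lane_reads)


if __name__ == "__main__":
    print(reads_per_sample(12))  # 10000000
    print(reads_per_sample(4))   # 30000000
    print(lanes_needed(12, 30_000_000))  # 3
```

This makes the trade-off explicit: for a fixed sequencing budget, more reads per sample means fewer samples (replicates) per lane.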

Bioinformatics - Analysis of transcriptomic data


Pasteurella in saiga antelope host

Mass mortality hit saiga antelope in spring 2015 → Pasteurella infection?
4 samples of different tissues:
- 3 antelopes died from infection
- 1 antelope died from another cause

2 objectives:
1) Get the expression level of virulent Pasteurella genes (counting reads)
2) Identify other possible mutations (variant calling)


Transcriptomic pipeline


NGS data: what it looks like

Example size for one sample of the saiga transcriptome: 12 Gb

Common file formats: .fastq, .sff, .fa, .csfasta/.qual
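To make the FASTQ layout concrete, here is a minimal reader sketch. It assumes uncompressed, non-wrapped FASTQ (one line each for header, sequence, "+" separator and quality string, the usual Illumina layout); real pipelines would normally use an established parser rather than this hand-written one.

```python
# Minimal FASTQ reader sketch: each record is four lines
# (@header, sequence, '+' separator, quality string).
# Assumption: uncompressed, non-wrapped FASTQ.

import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n#5?I\n"
    for rec in read_fastq(io.StringIO(example)):
        print(rec.name, rec.sequence, rec.quality)
```

Because the reader is a generator, a multi-gigabyte file is processed one record at a time rather than loaded into memory.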


Sequencing quality check

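FASTQ quality score: $Q = -10 \log_{10} P$, where $P$ is the probability that a base was identified incorrectly

| Quality score | Probability of incorrect identification | Accuracy of base identification |
|---|---|---|
| 40 | 1 in 10,000 | 99.99% |
| 30 | 1 in 1,000 | 99.9% |
| 20 | 1 in 100 | 99% |
| 10 | 1 in 10 | 90% |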
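In a FASTQ file each quality score is stored as a single ASCII character. The sketch below converts between scores and error probabilities and decodes a quality string, assuming the Phred+33 offset used by Sanger and Illumina 1.8+ files (some older Illumina files used an offset of 64).

```python
# Phred quality helpers. Assumption: Phred+33 encoding.

import math

PHRED_OFFSET = 33


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 log10(P) to get the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Q = -10 log10(P)."""
    return -10 * math.log10(p)


def decode_quality_string(quality: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Turn a FASTQ quality string into per-base Phred scores."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    print(decode_quality_string("I5+#"))  # [40, 20, 10, 2]
    for q in (40, 30, 20, 10):
        p = phred_to_error_probability(q)
        print(q, p, f"{100 * (1 - p):.2f}% accurate")
```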

[Figure: FastQC interface]

• FastQC: visualisation of read quality
• Trimmomatic: trim reads
• Cutadapt: remove adaptors
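The two trimming ideas named above can be sketched in plain Python. This is a simplified illustration, not the algorithm of either tool: the quality step follows the general sliding-window idea (cut the read where the mean quality in a window first drops below a threshold), and the adapter step uses exact matching only, whereas Cutadapt tolerates mismatches. The window size, threshold and adapter sequence are illustrative assumptions; Phred+33 encoding is assumed.

```python
# Simplified read trimming sketch (not Trimmomatic's or Cutadapt's exact logic).
# Assumptions: Phred+33 qualities; window, threshold and adapter are illustrative.


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_mean_q: float = 20.0,
                        offset: int = 33) -> tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality is too low."""
    scores = [ord(c) - offset for c in qual]
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


def trim_3prime_adapter(seq: str, qual: str, adapter: str,
                        min_overlap: int = 3) -> tuple[str, str]:
    """Remove a full adapter match, or a partial adapter at the 3' end (exact match)."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # The read may run into only the first k bases of the adapter.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


if __name__ == "__main__":
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
    adapter = "AGATCGGAAGAGC"  # illustrative adapter sequence
    print(trim_3prime_adapter("ACGTACGTAGATCGGAAG", "I" * 18, adapter))
```

Requiring a minimum overlap for partial matches avoids trimming reads that merely end in one or two bases that happen to match the adapter start.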


Mapping reads to Pasteurella genome

• Extract Pasteurella reads from samples
• Cases where there is no reference genome

[Figure: reference Pasteurella (FASTA file from NCBI) compared with sample Pasteurella from saiga antelope]

Mapping to reference genome

Output file: BAM (Binary Alignment Map), a compressed binary version of SAM (Sequence Alignment Map), which is plain text.

Common software examples:
For DNA:
- BWA (Burrows-Wheeler Aligner)
- Bowtie
For RNA:
- TopHat
- STAR


Picard Tools, Samtools


SAM format and alignment statistics

Statistics: Samtools 'flagstat'

[Figure: SAM format]
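The kind of summary that 'flagstat' reports comes from the bitwise FLAG field (column 2) of each SAM alignment line. The sketch below tallies a few of these categories from plain-text SAM lines; it is similar in spirit to flagstat but is not its implementation, and flagstat's own counting rules (e.g. for QC-failed reads) differ. The example records are made up.

```python
# Sketch: tally alignment categories from SAM FLAG bits.
# Not samtools flagstat; a simplified illustration on hypothetical records.

from collections import Counter
from typing import Iterable

FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def sam_stats(lines: Iterable[str]) -> Counter:
    stats = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, @PG, ...)
        flag = int(line.rstrip("\n").split("\t")[1])
        stats["total"] += 1
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            stats["secondary_or_supplementary"] += 1
            continue  # the remaining counts use primary alignments only
        stats["primary"] += 1
        if not flag & FLAG_UNMAPPED:
            stats["primary_mapped"] += 1
        if flag & FLAG_PAIRED and flag & FLAG_PROPER_PAIR:
            stats["properly_paired"] += 1
        if flag & FLAG_DUPLICATE:
            stats["duplicates"] += 1
    return stats


EXAMPLE_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t256\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

if __name__ == "__main__":
    s = sam_stats(EXAMPLE_SAM)
    print(dict(s))
    print(f"{100 * s['primary_mapped'] / s['primary']:.1f}% of primary reads mapped")
```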


Mpileup file

Samtools 'mpileup': SAM file → mpileup file
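Each mpileup line summarises all reads covering one reference position: chromosome, position, reference base, depth, a string of read bases and a string of base qualities. The sketch below decodes the read-base string into allele counts, following the standard pileup conventions ('.'/',' match the reference, letters are mismatches, '^' plus one character marks a read start, '$' a read end, '+N'/'-N' an indel, '*' a deleted base). The example line is hypothetical, and the parser assumes well-formed input.

```python
# Sketch: decode the read-base column of an mpileup line into allele counts.

import re
from collections import Counter

INDEL_RE = re.compile(r"[+-](\d+)")


def count_pileup_bases(ref: str, bases: str) -> Counter:
    counts = Counter()
    ref = ref.upper()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":  # read start; the next character is its mapping quality
            i += 2
            continue
        if ch == "$":  # read end
            i += 1
            continue
        if ch in "+-":  # indel: sign, length, then the inserted/deleted bases
            m = INDEL_RE.match(bases, i)
            length = int(m.group(1))
            start = m.end()
            counts[ch + bases[start:start + length].upper()] += 1
            i = start + length
            continue
        if ch in ".,":  # match to the reference (forward/reverse strand)
            counts[ref] += 1
        elif ch.upper() in "ACGTN":  # mismatch (upper/lower case = strand)
            counts[ch.upper()] += 1
        elif ch in "*#":  # placeholder for a deleted base
            counts["*"] += 1
        # '>' and '<' (reference skips, e.g. introns in spliced reads) are ignored
        i += 1
    return counts


def parse_mpileup_line(line: str):
    chrom, pos, ref, depth, bases, _quals = line.rstrip("\n").split("\t")[:6]
    return chrom, int(pos), ref.upper(), int(depth), count_pileup_bases(ref, bases)


if __name__ == "__main__":
    line = "chr1\t1500\ta\t6\t..,Gg+2CT^~.$\tIIIIII"  # hypothetical line
    print(parse_mpileup_line(line))
```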


Count reads mapping to a region

Common software: htseq-count
Compare gene expression → differential expression

Sample 1: total of 1350 reads mapping to gene A

Sample 2: total of 10 reads mapping to gene A
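The counting idea can be sketched with reads and genes reduced to coordinate intervals on one chromosome: each read is assigned to a gene only if it overlaps exactly one gene. This is a simplified illustration, not htseq-count's algorithm (which works on exons from an annotation file and offers several overlap modes). Because raw counts also depend on how deeply each library was sequenced, the sketch adds counts-per-million (CPM) normalisation; the gene coordinates and library sizes in the example are hypothetical.

```python
# Sketch: count reads overlapping gene intervals, then normalise to CPM.
# Simplified illustration; coordinates and library sizes are hypothetical.

from collections import Counter
from typing import Dict, Iterable, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def count_reads_per_gene(reads: Iterable[Interval],
                         genes: Dict[str, Interval]) -> Counter:
    counts = Counter()
    for r_start, r_end in reads:
        hits = [name for name, (g_start, g_end) in genes.items()
                if r_start <= g_end and r_end >= g_start]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1  # read overlaps more than one gene
        else:
            counts["__no_feature"] += 1  # read overlaps no gene
    return counts


def counts_per_million(counts: Dict[str, int], library_size: int) -> Dict[str, float]:
    """Scale raw counts by the total number of reads in the library."""
    return {gene: n * 1e6 / library_size for gene, n in counts.items()}


if __name__ == "__main__":
    genes = {"geneA": (100, 500), "geneB": (450, 900)}
    reads = [(120, 170), (460, 510), (600, 650), (1000, 1050)]
    print(count_reads_per_gene(reads, genes))
    # With hypothetical library sizes, 1350 vs 10 raw reads can be the same CPM:
    print(counts_per_million({"geneA": 1350}, 27_000_000))  # {'geneA': 50.0}
    print(counts_per_million({"geneA": 10}, 200_000))  # {'geneA': 50.0}
```

Formal differential expression tools go further than CPM (e.g. modelling count dispersion across replicates), but the example shows why raw counts alone should not be compared between samples.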


Compare reference to 'sample' Pasteurella

Common software: VarScan (Java)

Variant calling:

- SNPs (single nucleotide polymorphisms)

- Indels
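A very simple SNP-calling heuristic can be sketched on top of per-position allele counts (such as those decoded from an mpileup line): require a minimum read depth, then report the most frequent non-reference base if it has enough supporting reads and a high enough frequency. The thresholds below are hypothetical illustrations, not VarScan's defaults, and VarScan applies additional filters and statistics that this sketch does not attempt to reproduce.

```python
# Sketch: depth/frequency-based SNP heuristic on allele counts at one position.
# Thresholds are hypothetical; this is not VarScan's algorithm.

from typing import Dict, NamedTuple, Optional

BASES = {"A", "C", "G", "T"}


class VariantCall(NamedTuple):
    position: int
    ref: str
    alt: str
    depth: int
    alt_frequency: float


def call_snp(position: int, ref: str, base_counts: Dict[str, int],
             min_depth: int = 8, min_alt_reads: int = 2,
             min_alt_freq: float = 0.2) -> Optional[VariantCall]:
    ref = ref.upper()
    depth = sum(n for b, n in base_counts.items() if b in BASES)
    if depth < min_depth:
        return None  # too little coverage to judge
    alts = {b: n for b, n in base_counts.items() if b in BASES and b != ref}
    if not alts:
        return None
    alt, alt_reads = max(alts.items(), key=lambda kv: kv[1])
    freq = alt_reads / depth
    if alt_reads < min_alt_reads or freq < min_alt_freq:
        return None
    return VariantCall(position, ref, alt, depth, freq)


if __name__ == "__main__":
    print(call_snp(1500, "A", {"A": 4, "G": 6}))  # candidate A>G SNP
    print(call_snp(1500, "A", {"A": 9, "G": 1}))  # None: too few alt reads
```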

[Figure: IGV (Integrative Genomics Viewer)]

Summary

• FastQC, Trimmomatic/Cutadapt
• BWA, Bowtie, STAR, TopHat
• Samtools
  - Flagstat
  - Mpileup
• HTSeq-count, VarScan


Need help?

Advice on appropriate pipeline:
• Ian Carr: [email protected]
• Stephen Moss: [email protected]

Unix commands, scripts, software parameters:
• Natacha Chenevoy: [email protected]

Upcoming 2-day workshop in the new year: "Introduction to standard transcriptome analysis"

Steve Moss


Acknowledgements

M. O'Connell

Members of the O'Connell Lab

Members of the Creevey Lab at Aberystwyth University

Sequencing advice: Ian Carr, Steve Moss

Simon Goodman