Fall, 2017 GCBA/MGCB/BMI 815
Tools and Algorithms in Bioinformatics
GCBA815/MCGB815/BMI815, Fall 2017
Week-8: Next-Gen Sequencing
RNA-seq Data Analysis
Babu Guda, Ph.D.
Professor, Genetics, Cell Biology & Anatomy
Director, Bioinformatics and Systems Biology Core
University of Nebraska Medical Center
Sanger vs Next-Gen Sequencing
Source: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwj356Gaj-zWAhXEzFQKHZrlCh0QjRwIBw&url=http%3A%2F%2Fslideplayer.com%2Fslide%2F11674461%2F&psig=AOvVaw3BHyDsG4jHY9z4Y3Jc11IY&ust=1507933065294289
Next-Gen Sequencing
Source: https://bloggenohub.files.wordpress.com/2015/01/slide1.jpg
• No in vivo cloning
Cost of Human Genome Sequencing
Source: http://blog.dnanexus.com/wp-content/uploads/2017/04/Screen-Shot-2017-04-24-at-11.40.38-AM.png
Next-Gen Sequencing Workflow
Source: Lu and Shen, 2016, Biochemistry, Genetics and Molecular Biology. DOI: 10.5772/61657
Applications of NGS
• Genome
  • Whole genome sequencing
  • Whole exome sequencing
  • Targeted gene panels (cancer, newborns, autism, etc.)
• Transcriptome
  • Whole RNA sequencing
  • mRNA transcriptome (poly-A selection)
  • Small RNA analysis (siRNA, snoRNA, lincRNA, etc.)
  • Gene expression profiling for selected target genes
• Metagenome
  • Bulk sequencing of many types of bacteria
  • Examples: human gut microbiome, soil samples, food contamination, extremophiles, etc.
• Epigenome
  • Chromatin Immunoprecipitation Sequencing (ChIP-Seq)
  • Methylation Sequencing (Methyl-Seq)
Different Sequencing Libraries
Source: http://slideplayer.com/7847747/25/images/7/Types+of+Sequencing+Libraries.jpg
Paired-end Sequencing
Source: https://assets.illumina.com/content/dam/illumina-marketing/images/science/v2/web-graphic/paired-end-vs-single-read-seq-web-graphic.jpg
FASTQ Files from Paired-end Sequencing
Source: https://bioinf-galaxian.erasmusmc.nl/galaxy/
Demultiplexing Mixed Samples
Source: https://www.illumina.com/content/dam/illumina-marketing/images/technology/multiplexing-overview-figure.gif
Different File Types in NGS analysis
• FASTQ file – generated by the sequencer; contains the NGS reads
• SAM file – Sequence Alignment/Map (generated by aligning the NGS reads to the reference genome)
• BAM file – binary version of the SAM file (SAMtools is used to manipulate SAM/BAM files)
• GFF file – General Feature Format, used to hold genome annotation (chromosome, strand, frame, exon, CDS, etc.)
• GTF file – Gene Transfer Format (contains all the information in GFF plus additional gene annotation information)
• VCF file – Variant Call Format (used to store variant data such as SNPs, InDels, and short structural rearrangements)
Row 1: Information from the sequencer about the location of this read on the plate
Row 2: The sequence
Row 3: Metadata provided by the sequencing team
Row 4: Quality scores pertaining to each nucleotide in the sequence
[email protected]/1
GAGGCTATAGCATGGTCAAGGCACAAGAAGATCACTGGACTGCCCTCGCTCAGCCCTCAGCTACTG
+
>>?>?@>?>@@>?@@=@@@@@??>??@??@?@A?>@@@?>@@???A@:@A@@A@@@A@@AAB@@BB
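The four-row layout described above can be read programmatically. The following is a minimal sketch, assuming an uncompressed FASTQ stream with exactly four lines per read; the names `FastqRecord` and `read_fastq` and the read ID `read_1/1` are hypothetical and used only for illustration.

```python
from io import StringIO
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str    # row 1, without the leading "@"
    sequence: str   # row 2
    comment: str    # row 3, without the leading "+"
    quality: str    # row 4, one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input (or a blank line) stops the reader
            return
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


# Hypothetical record: the read ID is invented; the bases and qualities are
# the first ten characters of the example shown above.
example = StringIO("@read_1/1\nGAGGCTATAG\n+\n>>?>?@>?>@\n")
records = list(read_fastq(example))
```

In practice a maintained parser (for example, the one in Biopython) is usually preferable; the sketch is only meant to make the row structure concrete.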
FASTQ format:
FASTQ is based on the popular FASTA format for sequences
FASTA format
>sequence_ID; header in one line
AGTTGTAGTCCGTGATAGTCGGATCGG
FASTQ format provides additional information that includes the quality score
@20FUKAAXX100202:1:64:10634:114560/1
TTGTATTTTTAGTAGAGACGGAGTTTCGCCATGTTGGTCAGGCTGGCCTCGAATTCCTGACCTCAAGTGATCCGCCCGCCTCGGCCTCCCAACGTTTTGG
+
?=@7=>B==;;BB?<B?=8539<6?6>8>=BB<<B=08:9@5;:A@@?@9:BAAA<?;8;@AC@BBBBBA?<9-@B@;CAA77<:BEB<BB@07?@=<?84
Quality scores are encoded as ASCII characters (Phred score, ranges from 0-50)
ASCII characters for quality scores, in increasing order: ! is the worst and ~ is the best
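As an illustration of the ASCII encoding, the sketch below converts quality characters to numeric Phred scores. It assumes the Phred+33 offset, which is consistent with `!` (ASCII 33) being the lowest score; older Illumina pipelines used other offsets, so this is an assumption about the file rather than a given. The function name `decode_phred33` is hypothetical.

```python
from typing import List

PHRED_OFFSET = 33  # assumption: Phred+33 encoding, where "!" means Q = 0


def decode_phred33(quality: str) -> List[int]:
    """Convert each quality character to its numeric Phred score."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


# First ten quality characters of the FASTQ example above.
scores = decode_phred33("?=@7=>B==;")
mean_q = sum(scores) / len(scores)
```

Here `?` (ASCII 63) decodes to Q30 and `7` (ASCII 55) to Q22, so the lowest-quality base in this stretch is the fourth one.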
Sequence Alignment/Map (SAM/BAM)
Similar to the FASTQ file in that it contains the raw sequence and its quality scores.
It also tells you where the sequence aligned to the genome, and how well (this score is also Phred-scaled).
In this case, this read aligned to chromosome 22, position 17445857, and has a mapping quality score of 60 (or a 1 in 1,000,000 chance of being placed incorrectly).
SRR098401.104031357 83 chr22 17445857 60 76M = 17445512 -421 ACTGTTACCAGATCAAGAACTGATAGGGACAGGGATCATTATTCCCCCTTTACAGATGAGAAGGCCGTCACGCCTC @@>>B@@@BBAAAB9A@@>:@@?=A@?@?@A???>?@??=???@@@@@>@>>@@@><??@>@>@@8?>?=:@>?>> BD:Z:NOJKPQQQQMONOMKKKLNOMNLLLJLMINLJLMLMLKKKKJLJJJMKCKLINJMMLJKKKMOOMNNOLPQSNMKK PG:Z:MarkDuplicates RG:Z:NA12878 BI:Z:OOMLRRPPRPPQQONOLOPOONOOOKLNMONJKMNONMMMMLMKKKMLGMNLNMMNNJMJLNOMLNMPNONONNMM NM:i:0 MQ:i:60 AS:i:76 XS:i:0
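To make the fields of the record above concrete, here is a minimal sketch that splits the eleven mandatory SAM columns, decodes the bitwise FLAG, and converts the mapping quality to an error probability. The function names are hypothetical. The first nine fields are copied from the record above, while SEQ and QUAL are replaced with `*` (which SAM permits) to keep the line short. Real SAM files are tab-separated; whitespace splitting is used here only because the slide shows spaces.

```python
from typing import Dict, List

# FLAG bit meanings as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def parse_sam_line(line: str) -> Dict[str, object]:
    """Parse the mandatory columns of one alignment line (optional tags ignored)."""
    f = line.split()  # assumption: no field contains spaces
    mapq = int(f[4])
    return {
        "qname": f[0],
        "flags": decode_flag(int(f[1])),
        "rname": f[2],
        "pos": int(f[3]),
        "mapq": mapq,
        "mapq_error_probability": 10 ** (-mapq / 10),  # Phred-scaled
        "cigar": f[5],
        "rnext": f[6],
        "pnext": int(f[7]),
        "tlen": int(f[8]),
        "seq": f[9],
        "qual": f[10],
    }


aln = parse_sam_line(
    "SRR098401.104031357 83 chr22 17445857 60 76M = 17445512 -421 * *"
)
```

For this record, FLAG 83 = 1 + 2 + 16 + 64, so the read is paired, properly paired, aligned to the reverse strand, and the first read of its pair.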
Variant Call Format (VCF)
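As a rough illustration of how a VCF data line stores a variant, the sketch below splits the eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and labels each alternate allele as a SNP or InDel by comparing allele lengths. The two records and the function names are hypothetical; the length-based labelling is a simplification, not how a variant caller or annotator would classify variants.

```python
from typing import Dict, List


def classify_allele(ref: str, alt: str) -> str:
    """Label one REF/ALT pair using allele lengths only (a simplification)."""
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "symbolic/structural"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"
    return "other"  # e.g. a multi-nucleotide substitution


def parse_vcf_record(line: str) -> Dict[str, object]:
    """Split the eight fixed columns of one VCF data line."""
    chrom, pos, var_id, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    alts: List[str] = alt.split(",")  # ALT may list several alleles
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alts": alts,
        "qual": qual,
        "filter": flt,
        "info": info,
        "types": [classify_allele(ref, a) for a in alts],
    }


# Hypothetical records, invented for illustration.
snp = parse_vcf_record("chr22\t17445857\t.\tA\tG\t50\tPASS\t.")
indel = parse_vcf_record("chr22\t17445900\t.\tAT\tA\t50\tPASS\t.")
```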
RNA-Seq Data Analysis
Computational Analysis of RNA-Seq Data
Source: Conesa et al., Genome Biology, 2016, 17:13
RNA-Seq Data Analysis Workflow
FastQC, FQTrim
STAR, HISAT, TopHat, Sailfish, Salmon
Cufflinks, edgeR, DESeq
CuffDiff, DESeq, edgeR, Limma
GSEA, IPA, DAVID, GO, etc.
Illumina, Ion Torrent, PacBio
Input Files for RNA-seq Analysis
Download Test Data file from the Course Page and unzip the folder
Galaxy Server
https://usegalaxy.org/
• A large compilation of open-source NGS data analysis tools that
are accessible to users on web-based platforms
• Data can be uploaded from a PC/Mac and computing can be done
on the cloud
• No need to install tools and maintain servers locally
• In-depth tutorials are available to use Galaxy services
• A list of Public Galaxy Servers can be found at
• https://galaxyproject.org/public-galaxy-servers/
• Today’s RNA-seq analysis will be performed from the following link
• https://bioinf-galaxian.erasmusmc.nl/galaxy/
Phred Score (Q) explained
Phred score (Q) vs Error probability (P)
$Q = -10 \log_{10} P$
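The relationship between Phred score and error probability, Q = -10 log10(P), can be checked numerically. The sketch below converts in both directions; the function names are hypothetical.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """P = 10^(-Q/10): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Q = -10 * log10(P), defined for 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# Each 10-point increase in Q lowers the error probability tenfold.
table = {q: phred_to_error_probability(q) for q in (10, 20, 30, 40)}
```

For example, Q20 corresponds to a 1 in 100 chance of an incorrect base call, and Q30 to 1 in 1,000.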
Base Sequence Quality Interpretation
Bad Quality
Bad Quality
Quality drops at the tail end
Excellent Quality
Read Mapping and Assembly
Source: https://home.cc.umanitoba.ca/~frist/PLNT7690/lec12/lec12.3.html
Downstream Analysis of RNA-seq Results
IPA: Ingenuity Pathway Analysis
Source: Yoo et al., Nature Genetics, 2014
Hierarchical Clustering
GSEA: Gene Set Enrichment Analysis
Source: Graner et al, Front. Oncology, 2015
Source: Li et al, Scientific Reports, 2015
Source: Bee et al., PLoS ONE, 2011