Avoiding Nonsense Results in Your NGS Variant Studies
DESCRIPTION
Presented at the 2014 Bio-IT World Expo in Boston, this slideshow covers the use of Lyons-Weiler's entropy-based measures of genotypic signal to improve concordance among alternative variant-calling algorithms and to evaluate individual steps in the GATK best practices pipeline. The second part of the talk presents data showing a demarcation bias in the widely used fold-change measure when selecting differentially expressed genes, transcripts, or proteins from microarray and RNASeq data. http://www.bio-itworldexpo.com/Next-Gen-Sequencing-Informatics/
TRANSCRIPT
![Page 1: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/1.jpg)
Avoiding Nonsense Results in your NGS Variant Studies
James Lyons-Weiler, PhD
Scientific Director / Senior Research Scientist
Bioinformatics Analysis Core
Genomics & Proteomics Core Laboratories
University of Pittsburgh
Pittsburgh, PA
May 1, 2014
![Page 2: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/2.jpg)
Two Parts
• Identifying sites with low genotypic signal increases concordance among variant callers
• Hazards in finding differentially expressed genes in RNASeq – how to do it more robustly.
![Page 3: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/3.jpg)
23andMe: High risk of RA and psoriasis
GTL: Low risk of RA and psoriasis
![Page 4: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/4.jpg)
NYTimes Article, etc.
![Page 5: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/5.jpg)
Data were from an Illumina HiSeq 2000
![Page 6: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/6.jpg)
Among-method average concordance: 57.5% overall; 32.7% at high coverage
O’Rawe et al.
![Page 7: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/7.jpg)
TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)
SEQUENCER
MAPPER
VARIANT CALLERS
LOW CONCORDANCE (O’Rawe et al., 2013)
Consensus Analysis (e.g., 2/3, 3/4, set analysis)
Information Theory (-> modeling)
Improve Callers (fix errors, modeling)
Bake Offs
Simulations
Spiked Ins
![Page 8: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/8.jpg)
Entropy of Base Distributions
[Figure: three base-frequency distributions over A, T, C, G. Two panels are labeled "Low entropy, High enthalpy"; one panel is labeled "High entropy, Low enthalpy".]
![Page 9: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/9.jpg)
Boltzmann Entropy
• s = k ln w (Planck)
• w = antiln(s/k)
http://schneider.ncifcrf.gov/images/boltzmann/boltzmann-tomb-4.html
![Page 10: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/10.jpg)
Rank-Sorted Distribution of w (O’Rawe et al. data)
Homozygotes w = 1
Heterozygotes w = 2
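The slides do not spell out how w is computed from the reads at a site. One reading that matches the anchor values shown (w = 1 for a clean homozygote, w = 2 for a balanced heterozygote) is to take s as the Shannon entropy of the base frequencies with k = 1, so that w = exp(s). The sketch below follows that assumption; it is not the authors' Gconf code, and the function name `effective_states` is hypothetical.

```python
import math

def effective_states(counts):
    """Return w = exp(s), where s is the Shannon entropy (natural log)
    of the base frequencies at one site.

    Assumption: k = 1 and s is Shannon entropy. The slides state only
    s = k ln w and w = antiln(s/k).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads at this site")
    entropy = 0.0
    for count in counts:
        if count > 0:  # the term 0 * ln(0) is taken as 0
            p = count / total
            entropy -= p * math.log(p)
    return math.exp(entropy)

# Counts are ordered A, T, C, G.
print(effective_states([200, 0, 0, 0]))    # clean homozygote
print(effective_states([100, 100, 0, 0]))  # balanced heterozygote
print(effective_states([50, 50, 50, 50]))  # all four bases equally frequent
```

Under this reading, w acts as an effective number of bases present at the site. It reproduces the exact values 1, 2, and 4 and gives values close to, though not identical with, the intermediate w values in the "w and FBVC" table, so the authors' formula may differ in detail.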
![Page 11: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/11.jpg)
Example w Density Distribution
![Page 12: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/12.jpg)
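w and FBVC

| A | T | C | G | w | pw | Zygosity | Genotype |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 200 | 0 | 0 | 0 | 1 | 0 | Homozygote | AA |
| 16 | 158 | 13 | 13 | 2.102558 | 0 | Homozygote | TT |
| 100 | 100 | 0 | 0 | 2 | 0 | Heterozygote | AT |
| 58 | 30 | 1 | 111 | 2.768507 | 0 | Heterozygote | AG |
| 28 | 80 | 14 | 78 | 3.303636 | 0 | Heterozygote | TG |
| 76 | 38 | 29 | 57 | 3.758733 | 0 | Heterozygote | AG |
| 33 | 49 | 60 | 58 | 3.895496 | 0.0126 | Heterozygote? | CG? |
| 50 | 50 | 50 | 50 | 4 | 1 | noise | unknown |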
![Page 13: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/13.jpg)
Operational* Equiprobable Null Distribution
{f(A) = f(T) = f(G) = f(C)}
![Page 14: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/14.jpg)
Convergence of significance (pw)
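The slides name an equiprobable null distribution and a significance value pw but do not give the procedure. A minimal Monte Carlo sketch, assuming that pw is the fraction of simulated null sites (same read depth, all four bases equally likely) whose w is at most the observed w, and that w = exp(Shannon entropy). Both assumptions and the names `effective_states` and `simulate_pw` are hypothetical, not taken from the talk.

```python
import math
import random

def effective_states(counts):
    """w = exp(Shannon entropy of base frequencies); an assumed definition."""
    total = sum(counts)
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return math.exp(entropy)

def simulate_pw(counts, n_sim=2000, seed=0):
    """Estimate pw as the share of equiprobable-null sites with w <= observed w."""
    rng = random.Random(seed)
    depth = sum(counts)
    observed = effective_states(counts)
    hits = 0
    for _ in range(n_sim):
        # Draw `depth` reads with f(A) = f(T) = f(G) = f(C) = 0.25.
        null_counts = [0, 0, 0, 0]
        for _ in range(depth):
            null_counts[rng.randrange(4)] += 1
        # Small tolerance guards against floating-point ties.
        if effective_states(null_counts) <= observed + 1e-12:
            hits += 1
    return hits / n_sim

# The estimate settles as the number of simulations grows.
for n in (100, 500, 2000):
    print(n, simulate_pw([33, 49, 60, 58], n_sim=n))
```

With this definition a site with equal counts of all four bases gets pw = 1 and a clean homozygote gets pw = 0, which agrees with the end points of the "w and FBVC" table. Intermediate values depend on the assumed formula and on the number of simulations.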
![Page 15: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/15.jpg)
What We Expect
TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)
SEQUENCER
MAPPER
VARIANT/BASE CALLERS
Genotypic Signal Filtering
INCREASED CONCORDANCE
![Page 16: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/16.jpg)
![Page 17: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/17.jpg)
pHom Function
![Page 18: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/18.jpg)
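| Caller | Sites | Concordance w/ FBVC | Hom | Het |
| --- | --- | --- | --- | --- |
| gatk | ALL | 0.5762 | 11868 | 17670 |
| gatk | pw<=0.05 | 0.9976 | 11282 | 5676 |
| gatk | pw>0.05 | 0.0074 | 586 | 11994 |
| samtools | ALL | 0.5649 | 11541 | 18799 |
| samtools | pw<=0.05 | 0.9917 | 11489 | 5761 |
| samtools | pw>0.05 | 0.0002 | 52 | 13038 |
| snver | ALL | 0.6006 | 11904 | 16729 |
| snver | pw<=0.05 | 0.9934 | 11812 | 5470 |
| snver | pw>0.05 | 0.0007 | 92 | 11259 |

From the O’Rawe et al. generated results
FBVC = frequency-based variant caller (Lyons-Weiler et al.)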
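The stratified comparison reported here (all sites, pw<=0.05, pw>0.05) can be sketched as follows. The slides do not state how concordance was computed, so this assumes the simplest definition: the fraction of shared sites where two callers report the same genotype. The site names, genotypes, and pw values are made-up toy data, and `concordance` is a hypothetical helper.

```python
def concordance(calls_a, calls_b, sites):
    """Fraction of the given sites where both callers report the same genotype."""
    sites = list(sites)
    if not sites:
        return float("nan")
    agree = sum(1 for site in sites if calls_a[site] == calls_b[site])
    return agree / len(sites)

# Hypothetical toy data: site -> genotype for each caller, and site -> pw.
caller_a = {"chr1:100": "AA", "chr1:200": "AT", "chr1:300": "CG", "chr1:400": "TT"}
caller_b = {"chr1:100": "AA", "chr1:200": "AT", "chr1:300": "CC", "chr1:400": "TG"}
pw = {"chr1:100": 0.0, "chr1:200": 0.0, "chr1:300": 0.31, "chr1:400": 0.0126}

shared = sorted(caller_a.keys() & caller_b.keys())
high_signal = [site for site in shared if pw[site] <= 0.05]
low_signal = [site for site in shared if pw[site] > 0.05]

print("ALL     ", concordance(caller_a, caller_b, shared))
print("pw<=0.05", concordance(caller_a, caller_b, high_signal))
print("pw>0.05 ", concordance(caller_a, caller_b, low_signal))
```

The 0.05 cutoff is the one used in the table; any other threshold could be substituted in the two list comprehensions.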
![Page 19: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/19.jpg)
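| Comparison | Tx | Signal | %Concordance |
| --- | --- | --- | --- |
| FBVC_vs_FBVC | Marked | ALL | 85.64 |
| FBVC_vs_FBVC | Marked | pw<=0.05 | 91.08 |
| FBVC_vs_FBVC | Marked | pw>0.05 | 35.66 |
| FBVC_vs_FBVC | Realigned | ALL | 83.82 |
| FBVC_vs_FBVC | Realigned | pw<=0.05 | 91.69 |
| FBVC_vs_FBVC | Realigned | pw>0.05 | 28.21 |
| FBVC_vs_FBVC | Recalibrated | ALL | 93.14 |
| FBVC_vs_FBVC | Recalibrated | pw<=0.05 | ***99.39 |
| FBVC_vs_FBVC | Recalibrated | pw>0.05 | 48.53 |
| FBVC_vs_FBVC | Reduced | ALL | 21.54 |
| FBVC_vs_FBVC | Reduced | pw<=0.05 | 24.57 |
| FBVC_vs_FBVC | Reduced | pw>0.05 | 4.25 |
| FBVC_vs_FBVC | Marked-Realigned | ALL | 76.91 |
| FBVC_vs_FBVC | Marked-Realigned | pw<=0.05 | 86.11 |
| FBVC_vs_FBVC | Marked-Realigned | pw>0.05 | 15.44 |
| FBVC_vs_FBVC | Marked-Realigned-Recalibrated | ALL | 76.73 |
| FBVC_vs_FBVC | Marked-Realigned-Recalibrated | pw<=0.05 | 85.99 |
| FBVC_vs_FBVC | Marked-Realigned-Recalibrated | pw>0.05 | 15.34 |
| FBVC_vs_FBVC | Marked-Realigned-Recalibrated-Reduced | ALL | 19.98 |
| FBVC_vs_FBVC | Marked-Realigned-Recalibrated-Reduced | pw<=0.05 | 22.9 |
| FBVC_vs_FBVC | Marked-Realigned-Recalibrated-Reduced | pw>0.05 | 2.66 |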
![Page 20: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/20.jpg)
![Page 21: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/21.jpg)
TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)
SEQUENCER
MAPPER
VARIANT CALLERS
LOW CONCORDANCE (O’Rawe et al., 2013)
Consensus Analysis (e.g., 2/3, 3/4, set analysis)
Information Theory (-> modeling)
Improve Callers (fix errors, modeling)
Bake Offs
Simulations
Spiked Ins
![Page 22: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/22.jpg)
Lifescope reads (red)
Shrimp2 reads (blue)
Mappers must be systematically evaluated
![Page 23: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/23.jpg)
Part 2: Good and Bad News for RNASeq (and everything else):
The Bad News:
Fold Change is Biased.
The Good News:
We have identified a much less biased method.
![Page 24: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/24.jpg)
T-test is not appropriate for small N, large P data (such as RNASeq)
![Page 25: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/25.jpg)
Fold Change > 2.0
Delta > 25
![Page 26: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/26.jpg)
FC(A/B) is Blind to Large Portions of Your Data
FC(A/B)
Delta (and J5: Patel & Lyons-Weiler, 2004)
![Page 27: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/27.jpg)
Ratios Are Hard to Interpret as Biological Differences
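| Gene | A | B | delta (A-B) | FC(A/B) |
| --- | --- | --- | --- | --- |
| gene1 | 5 | 3 | 2 | 1.667 |
| gene2 | 50 | 30 | 20 | 1.667 |
| gene3 | 500 | 300 | 200 | 1.667 |
| gene4 | 5000 | 3000 | 2000 | 1.667 |
| gene5 | 50000 | 30000 | 20000 | 1.667 |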
![Page 28: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/28.jpg)
A-B is a difference.
A/B is a quotient.
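The contrast between the two measures can be checked directly on the five example genes. The sketch below applies the two cutoffs named on the earlier slide (Fold Change > 2.0, Delta > 25) to the table values. Pairing those cutoffs with this table is an illustration only, not a result from the talk.

```python
# Expression values (A, B) for the five example genes.
genes = {
    "gene1": (5, 3),
    "gene2": (50, 30),
    "gene3": (500, 300),
    "gene4": (5000, 3000),
    "gene5": (50000, 30000),
}

FC_CUTOFF = 2.0    # "Fold Change > 2.0"
DELTA_CUTOFF = 25  # "Delta > 25"

def fold_change(a, b):
    """Quotient A/B."""
    return a / b

def delta(a, b):
    """Difference A-B."""
    return a - b

for name, (a, b) in genes.items():
    print(f"{name}: FC={fold_change(a, b):.3f} delta={delta(a, b)}")

selected_by_fc = [g for g, (a, b) in genes.items() if fold_change(a, b) > FC_CUTOFF]
selected_by_delta = [g for g, (a, b) in genes.items() if delta(a, b) > DELTA_CUTOFF]

print("pass FC cutoff:   ", selected_by_fc)
print("pass delta cutoff:", selected_by_delta)
```

Every gene has the same fold change, so a ratio cutoff treats a difference of 2 units and a difference of 20000 units identically, while the delta cutoff separates them.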
![Page 29: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/29.jpg)
Log2 Transformation Does Not Help
Reveals Minor Delta (& J5) Bias
Pink = FC(A/B)
Black = Delta
![Page 30: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/30.jpg)
G-Thresholding J5
![Page 31: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/31.jpg)
FC Bias in Amyotrophic Lateral Sclerosis
[Figure: scatter plot of ALS versus Control expression. One axis runs from 0 to 350000, the other from 0 to 200000. Labels: Control, ALS DEGy, FCDEGy.]
Black circles = FC(A/B). Pink = Gthr-J5 genes
![Page 32: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/32.jpg)
![Page 33: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/33.jpg)
![Page 34: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/34.jpg)
Black circles = FC(A/B). Pink = Gthr-J5 genes
FC(A/B) Bias in Alcohol-Induced Hepatitis
![Page 35: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/35.jpg)
Conclusions
• Not all NGS/HTS sites have sufficient genotypic signal to warrant a base call. High coverage alone does not provide a solution.
• By measuring genotypic signal, we can determine which sites we can call with confidence.
• Fold change (FC(A/B)) is blind to highly expressed genes and should be abandoned as a measure of differential expression altogether, even for single-gene or single-protein studies!
• Published microarray data sets analyzed to date using only FC(A/B) are a gold mine for re-analysis using less biased methods.
![Page 36: Avoiding Nonsense Results in your NGS Variant Studies](https://reader033.vdocument.in/reader033/viewer/2022061203/547cf496b4af9fcf338b4ea1/html5/thumbnails/36.jpg)
Credits and Contact
• pw, pHom, etc.: James Lyons-Weiler, Alan Twaddle, Rahil Sethi
– (MS in preparation)
– Our software is called Gconf (not yet available)
• Fold-Change Bias: James Lyons-Weiler, Tamanna Sultana, Rick Jordan, Rahil Sethi
– (Paper in review)
– For now, read:
• Mariani TJ, Budhraja V, Mecham BH, Gu CC, Watson MA, Sadovsky Y. 2003. A variable fold change threshold determines significance for expression microarrays. FASEB J. 17:321-3. doi: 10.1096/fj.02-0351fje
• Pearson, K. 1897. On a form of spurious correlation that may arise when indices are used for the measurement of organs. Proc Roy Soc Lond 60:489-498 doi: 10.1098/rspl.1896.0076