aug2014 use cases combined


Uploaded by genomeinabottle on 24-Jun-2015


Description: Example Use Cases


Page 1: Aug2014 use cases combined

• Performance assessment
  − Complement to known-pathogenic control samples (e.g., Coriell/GeT-RM, NIBSC). These control samples are the most relevant to our product, but they carry only ~1 variant per sample, and only a limited number of such samples are available.
  − GIAB greatly increases n, though its variants aren't generally clinically relevant.
    o We also use Mike Eberle's NA12878 calls and an internally constructed truth set for the CEPH 1463 family and NA19240.
  − Validation docs, performance assessment of poorly covered genes with control samples, upcoming publications


Use of GIAB NA12878 at Invitae

Page 2: Aug2014 use cases combined

Integrating NIST Call Sets into a Validation Workflow

Validation report
• False positive rate: FPR = FP/(FP + TN)
• False discovery rate: FDR = FP/(FP + TP)
• Sensitivity: Sens. = TP/(TP + FN)
• Specificity: Spec. = TN/(FP + TN)
• Balanced accuracy: Bal. Accuracy = (Sens. + Spec.)/2
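These definitions can be checked with a short sketch. It is a minimal Python illustration, assuming the four counts come from comparing a call set against a truth set such as the NIST NA12878 calls; the `Confusion` class and its method names are hypothetical. Plugging in the first row of the Nephropathology Associates summary grid (depth ≥ 10, PNotRef 0.5) reproduces the percentages reported there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for confusion-matrix counts from a truth-set comparison."""
    tp: int
    fp: int
    tn: int
    fn: int

    def fpr(self) -> float:
        # False positive rate: FP / (FP + TN)
        return self.fp / (self.fp + self.tn)

    def fdr(self) -> float:
        # False discovery rate: FP / (FP + TP)
        return self.fp / (self.fp + self.tp)

    def sensitivity(self) -> float:
        # TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # TN / (FP + TN), which equals 1 - FPR
        return self.tn / (self.fp + self.tn)

    def balanced_accuracy(self) -> float:
        # Mean of sensitivity and specificity
        return (self.sensitivity() + self.specificity()) / 2


# First row of the Nephropathology Associates summary grid (depth >= 10, PNotRef 0.5)
row = Confusion(tp=592, fp=14, tn=789743, fn=6)
print(f"FPR {row.fpr():.3%}  FDR {row.fdr():.3%}  Sens. {row.sensitivity():.3%}  "
      f"Spec. {row.specificity():.3%}  Bal. accuracy {row.balanced_accuracy():.2%}")
```

Balanced accuracy is useful here because the TN count (hundreds of thousands of reference positions) dwarfs the variant counts, so plain accuracy would sit near 100% almost regardless of how well variants are called.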

Page 3: Aug2014 use cases combined

Nephropathology Associates' Kidney Disease Gene Panel: Excerpts from an NA12878 Validation Report

• Data provided by Marjorie Beggs (Nephropathology Associates)
• 301 genes from 13 renal disease categories
• Agilent oligo capture followed by MiSeq 2x150 sequencing
• Genotypes/probabilities determined with a modified version of the MAQ variant caller (Li et al., 2008)

Summary of all targeted positions:
  In Standard VCF        614
  Not in Standard VCF    803980
  Total                  804594

Summary of targeted zero-coverage positions in the experiment:
  In Standard VCF        3
  Not in Standard VCF    5100
  Total                  5103

Summary grid
Depth*  PNotRef**  TP   FP  TN      FN  Total   FPR     FDR     Sens.    Spec.    Bal. Accuracy
10      0.5        592  14  789743  6   790355  0.002%  2.310%  98.997%  99.998%  99.50%
10      0.75       591  14  789743  7   790355  0.002%  2.314%  98.829%  99.998%  99.41%
10      0.9        591  14  789743  7   790355  0.002%  2.314%  98.829%  99.998%  99.41%
20      0.5        540  11  740860  3   741414  0.001%  1.996%  99.448%  99.999%  99.72%
20      0.75       539  11  740860  4   741414  0.001%  2.000%  99.263%  99.999%  99.63%
20      0.9        539  11  740860  4   741414  0.001%  2.000%  99.263%  99.999%  99.63%
30      0.5        408  7   611453  3   611871  0.001%  1.687%  99.270%  99.999%  99.63%
30      0.75       408  7   611453  3   611871  0.001%  1.687%  99.270%  99.999%  99.63%
30      0.9        408  7   611453  3   611871  0.001%  1.687%  99.270%  99.999%  99.63%

* Only positions with a depth greater than or equal to this value are included in the calculation.
** The minimum PNotRef value for a position to be counted as a variant.
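A grid like this can be produced by re-tallying the same positions under each depth and PNotRef cutoff. The sketch below is a minimal illustration, assuming each targeted position carries a read depth, the caller's PNotRef value (read here as the probability that the site is non-reference), and a flag saying whether it is a variant in the standard VCF; the `Site` type and the `tally`/`grid` functions are hypothetical, not the report's actual code. Positions below the depth cutoff are dropped entirely, which is why the Total column shrinks as the depth threshold rises.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Site(NamedTuple):
    depth: int        # read depth at the targeted position
    p_not_ref: float  # assumed meaning of PNotRef: probability the position is non-reference
    in_truth: bool    # True if the position is a variant in the standard VCF


def tally(sites: Iterable[Site], min_depth: int, min_p_not_ref: float) -> Dict[str, int]:
    """Count TP/FP/TN/FN over positions that pass the depth cutoff."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for s in sites:
        if s.depth < min_depth:
            continue  # excluded from the calculation (footnote *)
        called = s.p_not_ref >= min_p_not_ref  # minimum PNotRef to count as a variant (footnote **)
        if called and s.in_truth:
            counts["TP"] += 1
        elif called:
            counts["FP"] += 1
        elif s.in_truth:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def grid(sites: Iterable[Site],
         depths: Tuple[int, ...] = (10, 20, 30),
         cutoffs: Tuple[float, ...] = (0.5, 0.75, 0.9)) -> Dict[Tuple[int, float], Dict[str, int]]:
    site_list: List[Site] = list(sites)  # materialize so every grid cell can re-scan the sites
    return {(d, p): tally(site_list, d, p) for d in depths for p in cutoffs}


# Toy data for illustration only, not taken from the report
sites = [Site(35, 0.95, True), Site(25, 0.60, True), Site(12, 0.80, False),
         Site(40, 0.10, False), Site(8, 0.99, True)]
for key, counts in grid(sites).items():
    print(key, counts)
```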

Page 4: Aug2014 use cases combined

Ion Benchmarking I

Page 5: Aug2014 use cases combined

Ion Benchmarking II

Page 6: Aug2014 use cases combined
Page 7: Aug2014 use cases combined

Ion Benchmarking III

Page 8: Aug2014 use cases combined

Background
• Clinical laboratory: Division of Genomic Diagnostics, certified by regulatory agencies (CAP).
• The CWES test requires stringent validation per CAP criteria to establish its performance metrics.

Utilizing NIST data in the validation of the CWES test

• Sequence NA12878 and call its variants at CHOP
• CHOP ROI: Agilent SureSelect V5+ (SSV5+) baits file
• Compare the CHOP data set to the NIST data set for concordance

NIST* data set details:
• High-quality reference data set for NA12878 (Dec. 2013)
• NIST's high-confidence regions of interest (ROI)
• Variants called in 219,222 regions on the hg19 assembly

* National Institute of Standards and Technology
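The concordance comparison can be sketched as follows, assuming regions are 0-based, half-open (BED-style) intervals and that variants are matched on exact (chrom, position, ref, alt). All names are hypothetical. Restricting both call sets to the intersection of the CHOP ROI and the NIST high-confidence regions keeps calls that fall outside either region from being scored as FPs or FNs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[str, int, int]      # (chrom, start, end), 0-based half-open as in BED
Variant = Tuple[str, int, str, str]  # (chrom, 0-based position, ref, alt)


def intersect(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """Intersect two interval sets; intervals within each set are assumed non-overlapping."""
    xa: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    xb: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, s, e in a:
        xa[chrom].append((s, e))
    for chrom, s, e in b:
        xb[chrom].append((s, e))
    out: List[Interval] = []
    for chrom in sorted(xa.keys() & xb.keys()):
        xs, ys = sorted(xa[chrom]), sorted(xb[chrom])
        i = j = 0
        while i < len(xs) and j < len(ys):  # two-pointer sweep over sorted intervals
            start, end = max(xs[i][0], ys[j][0]), min(xs[i][1], ys[j][1])
            if start < end:
                out.append((chrom, start, end))
            if xs[i][1] < ys[j][1]:  # advance whichever interval ends first
                i += 1
            else:
                j += 1
    return out


def inside(v: Variant, regions: List[Interval]) -> bool:
    # Linear scan for clarity; a real pipeline would use an interval index
    return any(v[0] == c and s <= v[1] < e for c, s, e in regions)


def concordance(test: Iterable[Variant], truth: Iterable[Variant],
                roi: Iterable[Interval], high_conf: Iterable[Interval]) -> Dict[str, int]:
    regions = intersect(roi, high_conf)
    t: Set[Variant] = {v for v in test if inside(v, regions)}
    r: Set[Variant] = {v for v in truth if inside(v, regions)}
    return {"TP": len(t & r), "FP": len(t - r), "FN": len(r - t)}


roi = [("chr1", 100, 200), ("chr1", 300, 400)]
high_conf = [("chr1", 150, 350)]
truth = [("chr1", 160, "A", "G"), ("chr1", 310, "C", "T"), ("chr1", 120, "G", "A")]
test = [("chr1", 160, "A", "G"), ("chr1", 320, "T", "C"), ("chr1", 390, "A", "T")]
print(concordance(test, truth, roi, high_conf))
```

Exact allele matching will not recognize an indel that the two call sets represent differently (for example, placed at different positions within a homopolymer), so dedicated comparison tools normalize variants or compare haplotypes before counting.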

Analytical Validation of Clinical Whole-Exome Sequencing (CWES) Test

Page 9: Aug2014 use cases combined

Sensitivity/Specificity: RefGene ±15 bp (SSV5+)

CHOP NIST

      SNVs    INDELs
TP    18480   396
FP    26      3
FN    63      30

FP: false positive; TP: true positive; FN: false negative; TN: true negative

                                      SNVs     INDELs
Sensitivity (TP/(TP+FN))              99.66%   92.96%
Specificity (TN/(TN+FP))              ~100%    ~100%
FPR (FP/(FP+TN))                      0.02%    0.08%
Accuracy ((TP+TN)/(TP+TN+FP+FN))      ~100%    ~100%

TN = NIST highly confident regions – CHOP ROIs
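The separate SNV and INDEL columns come from splitting the comparison by variant type. The sketch below is a minimal illustration, assuming a simple length-based rule (SNV when REF and ALT are both one base, INDEL when their lengths differ); the function names are hypothetical and the rule is not necessarily the one CHOP used. Applied to the CHOP counts, it reproduces the reported sensitivities.

```python
from typing import Dict, Iterable, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, position, ref, alt)


def variant_type(ref: str, alt: str) -> str:
    """Length-based classification (an assumption, not the CHOP pipeline's rule)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitutions


def stratify(tp: Iterable[Variant], fn: Iterable[Variant]) -> Dict[str, Dict[str, int]]:
    """Count true positives and false negatives separately for each variant type."""
    counts: Dict[str, Dict[str, int]] = {}
    for label, variants in (("TP", tp), ("FN", fn)):
        for _chrom, _pos, ref, alt in variants:
            bucket = counts.setdefault(variant_type(ref, alt), {"TP": 0, "FN": 0})
            bucket[label] += 1
    return counts


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


# Counts reported for RefGene +/- 15 bp (SSV5+)
print(f"SNVs   {sensitivity(18480, 63):.2%}")
print(f"INDELs {sensitivity(396, 30):.2%}")
```

Because there are far fewer INDELs (426 in the truth set versus 18,543 SNVs), each missed INDEL moves sensitivity much more, which is why the two types are reported separately.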

Page 10: Aug2014 use cases combined

Further analysis of the 93 presumptive FNs and 29 presumptive FPs

• 93 FNs: 63 SNVs, 30 INDELs
• 29 FPs: 26 SNVs, 3 INDELs

Page 11: Aug2014 use cases combined

• Director
  – Avni Santani
  – Mehdi Sarmady

• Clinical WES Team
  – Zhenming Yu
  – Kristin McDonald Gibson
  – Tanya Tischler
  – Addie I Nesbitt
  – Elizabeth H Denenberg

Acknowledgment

Page 12: Aug2014 use cases combined

Chr6:151669820 Chr6:151669828

Difficult sites in a homopolymer in an intron of the gene AKAP12

Page 13: Aug2014 use cases combined

Chr1:1666303

SNP in the gene SLC35E2, at a site that also lies in a pseudogene and a segmental duplication

Page 14: Aug2014 use cases combined
Page 15: Aug2014 use cases combined

Using Genome in a Bottle calls to benchmark clinical exome sequencing at Mount Sinai School of Medicine

“We evaluate a set of NA12878 technical replicates against GIAB for each new pipeline version.”

Page 16: Aug2014 use cases combined

Benchmarking somatic variant calling at Qiagen

Page 17: Aug2014 use cases combined

NextSeq: New Chemistry – Does it work?

Whole-genome metrics                                 NextSeq500         HiSeq2500
% Genome covered (≥10X in Q20 bases)                 96%                96%
Mean coverage in Q20 bases                           28.3X              31.8X
SNPs called (% dbSNP 129)                            3,643,998 (89%)    3,664,014 (88%)
InDels called (% dbSNP 129)                          646,907 (65.7%)    686,547 (64.5%)
Genome in a Bottle SNP sensitivity / precision       99.07% / 99.04%    99.25% / 99.90%
Genome in a Bottle indel sensitivity / precision     86.90% / 98.85%    93.29% / 97.54%
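The first two whole-genome rows count only Q20 bases. A minimal sketch of that calculation, assuming "Q20 bases" means bases with a Phred base quality of at least 20 and that per-position base qualities are already available; other filters a real pipeline might apply (such as mapping quality) are omitted, and the function name is hypothetical:

```python
from typing import Dict, Sequence


def coverage_summary(per_position_quals: Sequence[Sequence[int]],
                     min_depth: int = 10, min_bq: int = 20) -> Dict[str, float]:
    """Mean depth and % of positions with depth >= min_depth, counting only bases with quality >= min_bq."""
    depths = [sum(1 for q in quals if q >= min_bq) for quals in per_position_quals]
    n = len(depths)
    return {
        "mean_coverage": sum(depths) / n,
        "pct_covered": 100.0 * sum(1 for d in depths if d >= min_depth) / n,
    }


# Toy example: four genome positions, each with the Phred qualities of the bases covering it
positions = [[30] * 12, [30] * 5 + [10] * 10, [25] * 10, []]
print(coverage_summary(positions))
```

The second toy position is covered by 15 bases but only 5 of them are Q20 bases, so it falls below 10X; quality-filtered coverage can therefore be noticeably lower than raw read depth.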

Page 18: Aug2014 use cases combined

NextSeq: Exomes

Compare 12-plex Rapid Capture Exome data from HiSeq 2500 Rapid to NextSeq500

• 12-plex capture containing:
  – NA12878
  – 2 cell line tumor/normal pairs
  – TCGA samples

• 2 runs with 2x76
  – PF yield: 72 Gb & 75 Gb
  – Run time: 18 hours
  – Cluster density: 227–238k/mm2

• High-level metrics
  – Error rate: 0.58%
  – %Q30: 83.6% (72.2% post-BQSR)

Hybrid selection metrics    NextSeq500   HiSeq2500
% Selected                  75.4%        74.5%
Penalty 20x                 4.83         4.67
Mean target coverage        112X         165X
% Target bases ≥ 20x        92.9%        95.1%
% Target bases ≥ 50x        79.1%        87.8%
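The target-coverage rows can be derived from per-base depth over the capture targets. A minimal sketch, assuming a flat list of depths for every targeted base; the function name is hypothetical, and the % Selected and Penalty 20x rows, which need bait and alignment information, are not reproduced here.

```python
from typing import Dict, Sequence, Tuple


def target_coverage(depths: Sequence[int],
                    thresholds: Tuple[int, ...] = (20, 50)) -> Dict[str, float]:
    """Mean target coverage and % of target bases at or above each depth threshold."""
    n = len(depths)
    summary = {"mean_target_coverage": sum(depths) / n}
    for t in thresholds:
        summary[f"pct_target_bases_ge_{t}x"] = 100.0 * sum(1 for d in depths if d >= t) / n
    return summary


# Toy per-base depths over five target bases
print(target_coverage([10, 20, 30, 60, 80]))
```

Reporting several thresholds alongside the mean captures uniformity: two runs with similar mean coverage can still differ in the fraction of bases reaching 50x.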

Variant calling metrics     NextSeq500      HiSeq2500
SNPs (% dbSNP 129)          22786 (94.7%)   22953 (94.6%)
  GIAB sensitivity          96.53%          96.79%
  GIAB precision            99.87%          99.96%
InDels (% dbSNP 129)        816 (64.2%)     813 (65.4%)
  GIAB sensitivity          83.16%          83.92%
  GIAB precision            88.43%          92.31%

Page 19: Aug2014 use cases combined

Other use cases

LabCorp (Kyle Hart)
• We are using this data to validate our variant identification pipelines, which are based on the Qiagen/CLC software and Illumina sequence data.
• We are seeking high clinical sensitivity to minimize false negatives, and we have a variety of strategies to rescue uncallable segments and to confirm called variants before reporting, to increase specificity.

NHGRI (Nancy Hansen)
• We have a variant analysis pipeline that analyzes whole-exome sequence data (Illumina HiSeq2000/2500) for SNPs and small indels.
• We are using the GIAB variant data set to assess the accuracy of our pipeline and to compare it with other publicly available pipelines.