sept2016 sv dnanexus_benchmarking

The Global Network For Genomics®

Mix and Match: Assessing Structural Variation Calling with Varying Coverages and Algorithms

Andrew Carroll, Head of Science, DNAnexus

Upload: genomeinabottle

Post on 17-Jan-2017

TRANSCRIPT

Page 1: Sept2016 sv dnanexus_benchmarking

Mix and Match: Assessing Structural Variation Calling with Varying Coverages and Algorithms

Andrew Carroll, Head of Science, DNAnexus

Page 2: Sept2016 sv dnanexus_benchmarking

Global Cloud-based Platform: Secure, Scalable, and Collaborative Solution for Genomics

• SECURE and compliant platform

• SHARE safely within the global network

• SCALE your analysis to any size necessary

Page 3: Sept2016 sv dnanexus_benchmarking
Page 4: Sept2016 sv dnanexus_benchmarking
Page 5: Sept2016 sv dnanexus_benchmarking

Overview of Structural Variation

Page 6: Sept2016 sv dnanexus_benchmarking

Calling with short reads is challenging

Alkan, Coe, and Eichler (2011)

• Difficult for reads to span events

• Mapping is hard in low complexity regions

• GC bias

• Rely on other signals:
  – Insert size
  – Clipping
  – Read orientation
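
The three indirect signals listed on this slide can be illustrated with a small sketch. This is not how any particular caller works; the `ReadPair` type, the `sv_signals` function, and every threshold (mean insert size, standard deviation, minimum clip length) are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical summary of one aligned Illumina read pair."""
    insert_size: int   # observed distance between the outer ends of the mates
    soft_clipped: int  # largest number of soft-clipped bases on either mate
    orientation: str   # "FR" is the expected paired-end orientation


def sv_signals(pair, mean_insert=400, sd_insert=50, min_clip=20):
    """Return the indirect SV signals a read pair shows.

    All default thresholds are illustrative assumptions, not values
    taken from the talk or from any specific tool.
    """
    signals = []
    # Insert size far from the library mean hints at a deletion or insertion.
    if abs(pair.insert_size - mean_insert) > 3 * sd_insert:
        signals.append("insert_size")
    # A long soft clip hints that the read crosses a breakpoint.
    if pair.soft_clipped >= min_clip:
        signals.append("clipping")
    # Unexpected mate orientation hints at an inversion or duplication.
    if pair.orientation != "FR":
        signals.append("orientation")
    return signals
```

A pair such as `ReadPair(insert_size=2000, soft_clipped=35, orientation="RF")` would raise all three signals under these assumed thresholds, while a pair close to the library mean with no clipping and "FR" orientation would raise none.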

Page 7: Sept2016 sv dnanexus_benchmarking

Short-Read Structural Variant Tools

• DELLY

• CREST

• Pindel

• BreakDancer

• LUMPY

• CNVnator

• Manta

• Breakseq2

Page 8: Sept2016 sv dnanexus_benchmarking

Calling SV with PacBio Data

Page 9: Sept2016 sv dnanexus_benchmarking

Tools for PacBio Data

• PBHoney (Adam English)

• Sniffles (Fritz Sedlazeck)

• Parliament (Adam English)

Page 10: Sept2016 sv dnanexus_benchmarking

Apps

Page 11: Sept2016 sv dnanexus_benchmarking

Benchmarks – Part I: Only PacBio Data

Page 12: Sept2016 sv dnanexus_benchmarking

Creation of Multi-Technology Truth Set

• SV calls were contributed for a variety of technologies (Illumina, PacBio, BioNano, 10X Genomics, Complete Genomics)

• Split confident call lists into deletions occurring in regions with tandem repeats and those not in regions with tandem repeats
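
The split described in the second bullet could be sketched as follows. This assumes deletions and tandem-repeat regions are given as half-open `(chrom, start, end)` tuples and that any overlap places a deletion in the tandem-repeat group; both the representation and the overlap rule are assumptions, since the slide does not specify them. The function names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def split_by_tandem_repeats(deletions, repeats):
    """Partition deletions by whether they touch a tandem-repeat region.

    Both arguments are lists of (chrom, start, end) tuples.
    Assumption: any overlap counts as "in a tandem repeat".
    """
    in_tr, not_in_tr = [], []
    for chrom, start, end in deletions:
        hit = any(
            chrom == r_chrom and overlaps(start, end, r_start, r_end)
            for r_chrom, r_start, r_end in repeats
        )
        (in_tr if hit else not_in_tr).append((chrom, start, end))
    return in_tr, not_in_tr
```

The linear scan over `repeats` keeps the sketch short; for genome-scale inputs an interval tree or a sorted sweep would be the usual choice.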

Page 13: Sept2016 sv dnanexus_benchmarking

Recall – PBHoney and Sniffles – Deletions
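
For reference, recall against a truth set can be computed along these lines. The transcript does not say how calls were matched to truth events, so the 50% reciprocal-overlap rule below is an assumption, and the function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between (chrom, start, end) events."""
    (a_chrom, a_start, a_end), (b_chrom, b_start, b_end) = a, b
    if a_chrom != b_chrom:
        return 0.0
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


def recall(truth, calls, min_ro=0.5):
    """Fraction of truth events matched by at least one call.

    min_ro is an assumed matching threshold, not one stated in the talk.
    """
    if not truth:
        return 0.0
    found = sum(
        1 for t in truth
        if any(reciprocal_overlap(t, c) >= min_ro for c in calls)
    )
    return found / len(truth)
```

Running `recall` separately on the tandem-repeat and non-tandem-repeat truth lists would give one number per category for each caller and coverage level.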

Page 14: Sept2016 sv dnanexus_benchmarking

Complementarity at 10-Fold Coverage

Page 15: Sept2016 sv dnanexus_benchmarking

Benchmarks – Part II: Illumina + PacBio Data

Page 16: Sept2016 sv dnanexus_benchmarking

Parliament Pipeline

Page 17: Sept2016 sv dnanexus_benchmarking

Recall – Parliament (Assembly vs PacBio)

Page 18: Sept2016 sv dnanexus_benchmarking

Ensemble Strategies

Page 19: Sept2016 sv dnanexus_benchmarking

Call Overlap at 10-Fold Coverage

Page 20: Sept2016 sv dnanexus_benchmarking

Full Combination Ensemble Strategies
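
One way to enumerate every combination of callers is sketched below. It takes the union of call sets for each non-empty subset of tools. Two simplifications are assumed: calls are treated as hashable identifiers that are equal only when identical (a real comparison would merge nearby breakpoints), and union is the combining rule, which the slide does not state. The name `ensemble_unions` is hypothetical.

```python
from itertools import combinations


def ensemble_unions(calls_by_tool):
    """Map each non-empty combination of tools to the union of their calls.

    calls_by_tool: dict of tool name -> iterable of hashable call identifiers.
    Assumption: calls match only when their identifiers are equal.
    """
    results = {}
    tools = sorted(calls_by_tool)
    for size in range(1, len(tools) + 1):
        for combo in combinations(tools, size):
            merged = set()
            for tool in combo:
                merged |= set(calls_by_tool[tool])
            results[combo] = merged
    return results
```

With n tools this produces 2^n − 1 ensembles, each of which could then be scored against the truth set to compare strategies.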

Page 21: Sept2016 sv dnanexus_benchmarking

Conclusions

1. With Illumina data at 30-fold coverage, SV calling can be effective at PacBio data coverages as low as 3–5 fold

2. More PacBio data coverage seems to be always better over investigated ranges (mostly thanks to PBHoney)

3. With only PacBio data, 10–15 fold gives good SV calling results with reasonable sequencing investment

4. Running both Sniffles and PBHoney gives best results, especially at lower (5–15 fold) PacBio data coverages

Page 22: Sept2016 sv dnanexus_benchmarking

Thank you!

Genome in a Bottle: Justin Zook

Baylor College of Medicine: Adam English, Will Salerno

Schatz Lab: Fritz Sedlazeck

DNAnexus: Andrew Carroll, Singer Ma, Brett Hannigan, Yih-Chii Hwang, Marcus Kinsella, Abhiram Das, Samantha Zarate

Page 23: Sept2016 sv dnanexus_benchmarking

QUESTIONS? CONTACT ME:

Andrew Carroll, PhD

[email protected]