aug2014 spiral genetics anchored assembly

SV Detection via Anchored Assembly: How can we best call structural variants? Becky Drees, Jeremy Bruestle, Cheinan Marks

Upload: genomeinabottle

Post on 24-Jun-2015

TRANSCRIPT

Page 1: Aug2014 spiral genetics anchored assembly

SV Detection via Anchored Assembly

How can we best call structural variants?

Becky Drees, Jeremy Bruestle, Cheinan Marks

Page 2: Aug2014 spiral genetics anchored assembly

Please do not distribute without permission.

• Brief Description of Anchored Assembly Method

• Testing vs GIAB Variant Set & Validated SV Sets

• How Do We Describe SVs from Detected Breakpoints?

SV Detection via Anchored Assembly

Page 3: Aug2014 spiral genetics anchored assembly

Input data

• Any species with a draft genome

• Existing NGS data

• No special library prep

• ~20x per ploidy

Page 4: Aug2014 spiral genetics anchored assembly

[Figure: K-mer Quality Score Distribution. x-axis: Total K-mer Quality Score (0 to 1200); y-axis: K-mer Count (0 to 5000). Panel label: A* error correction]

Step 1: Read Correction

• Similar to Euler or Quake

• Corrects the read without using reference information

• Reduces error from 1% to 0.01%
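The idea of reference-free correction can be illustrated with a k-mer spectrum, in the spirit of Euler or Quake. The sketch below is a deliberately simplified assumption: it tries single-base substitutions greedily rather than running the A* search named on the slide, and the function names, `k`, and `min_count` are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_spectrum(reads, k):
    """Count how often each k-mer occurs across all reads (no reference used)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def untrusted(read, counts, k, min_count):
    """Number of k-mers in the read that occur fewer than min_count times."""
    return sum(1 for km in kmers(read, k) if counts[km] < min_count)


def correct_read(read, counts, k=5, min_count=2):
    """Greedy single-substitution correction (hypothetical simplification).

    Tries every single-base substitution and keeps the one that leaves the
    fewest untrusted k-mers; returns the read unchanged if nothing improves.
    """
    best, best_bad = read, untrusted(read, counts, k, min_count)
    if best_bad == 0:
        return read
    for i, base in enumerate(read):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = read[:i] + alt + read[i + 1:]
            bad = untrusted(candidate, counts, k, min_count)
            if bad < best_bad:
                best, best_bad = candidate, bad
    return best


# Toy data: four error-free copies of a read plus one copy with a G>A error.
true_read = "ACGTACGGTCAT"
error_read = "ACGTACAGTCAT"
spectrum = build_spectrum([true_read] * 4 + [error_read], k=5)
corrected = correct_read(error_read, spectrum, k=5, min_count=2)
```

A production corrector would also weight k-mers by base quality, as the "Total K-mer Quality Score" axis in the figure suggests; that part is not modeled here.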

Page 5: Aug2014 spiral genetics anchored assembly

Step 2: Remove Reference Matches

• Remove reads that are an exact match to reference

• Significantly reduces the complexity of the graph

• Reduces required memory usage (40GB for whole human genome)
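A minimal sketch of this filtering step, assuming that "exact match" means the read occurs verbatim on either strand of the reference. The function names are hypothetical, and the plain substring search stands in for whatever index the real pipeline uses.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string over the alphabet ACGTN."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def remove_reference_matches(reads, reference):
    """Keep only reads that do not occur verbatim in the reference.

    Both strands are checked. Substring search is used for clarity only;
    it would be far too slow for a whole genome.
    """
    kept = []
    for read in reads:
        if read in reference or reverse_complement(read) in reference:
            continue  # exact reference match: carries no variant signal
        kept.append(read)
    return kept


reference = "ACGTTGCAAGGCTTAACG"
reads = [
    "GTTGCA",  # forward-strand exact match
    "AGCCTT",  # reverse complement of AAGGCT, which is in the reference
    "GTTGGA",  # one mismatch: retained for assembly
]
remaining = remove_reference_matches(reads, reference)
```

Only the reads that differ from the reference survive, which is what keeps the downstream graph small.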

Page 6: Aug2014 spiral genetics anchored assembly

• Construct a read overlap graph with the remaining reads

• Provides more context than a kmer-based de Bruijn graph
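To make the contrast with a k-mer graph concrete, here is a small sketch of a read overlap graph in which nodes are whole reads and edges record suffix-to-prefix overlaps. The quadratic all-pairs comparison, the function names, and the `min_overlap` threshold are assumptions chosen for readability, not the method used by Anchored Assembly.

```python
def overlap_length(a, b, min_overlap):
    """Longest suffix of a that equals a prefix of b, or 0 if shorter than min_overlap."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads, min_overlap=3):
    """Return {read_index: [(next_read_index, overlap_length), ...]}."""
    graph = {i: [] for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            n = overlap_length(a, b, min_overlap)
            if n:
                graph[i].append((j, n))
    return graph


reads = ["ACGTTGCA", "TTGCAAGG", "CAAGGCTT"]
graph = build_overlap_graph(reads, min_overlap=3)
```

Because each node keeps the full read, a path through the graph preserves read-length context that a de Bruijn graph discards when it splits reads into k-mers.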

[Figure: Read overlap assembly. Nodes are reads (R1, R2, R3, R5, R6, R7, R8, R9); edges carry numeric labels (7, 8, 9).]

Step 3: Read Overlap Graph

Page 7: Aug2014 spiral genetics anchored assembly

• Anchor assemblies to reference coordinates

• Provide breakpoint information while keeping reference bias low
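The slides do not say how anchors are located, so the following is only one plausible sketch: take the two ends of an assembled contig, require each end to match the reference exactly once, and report the reference interval that the contig's interior replaces. `anchor_len`, the function names, and the returned fields are hypothetical.

```python
def find_all(text, pattern):
    """Return every 0-based start position of pattern in text (overlaps allowed)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def anchor_contig(contig, reference, anchor_len=6):
    """Anchor both contig ends to the reference; return None if ambiguous."""
    if len(contig) < 2 * anchor_len:
        return None
    left_hits = find_all(reference, contig[:anchor_len])
    right_hits = find_all(reference, contig[-anchor_len:])
    if len(left_hits) != 1 or len(right_hits) != 1:
        return None  # missing or multi-copy anchor
    ref_start = left_hits[0] + anchor_len  # first reference base after the left anchor
    ref_end = right_hits[0]                # first reference base of the right anchor
    if ref_end < ref_start:
        return None
    return {
        "ref_start": ref_start,
        "ref_end": ref_end,
        "ref_seq": reference[ref_start:ref_end],
        "alt_seq": contig[anchor_len:-anchor_len],
    }


reference = "CCATGACTTAGGATCGTACCAGTTGC"
contig = "GACTTACGTACC"  # assembled sequence missing the reference bases GGAT
breakpoint_info = anchor_contig(contig, reference)
```

Only the anchors are compared against the reference; the interior of the contig comes from the reads alone, which is one way to keep reference bias low.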

Step 4: Anchoring

Page 8: Aug2014 spiral genetics anchored assembly

• Assemble variant sequence from read overlap graph

• Computes minimal cost variation (similar to Smith-Waterman)

• Calls variants and QC to remove likely false positives
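As an illustration of a minimal-cost comparison between a reference segment and an assembled sequence, the sketch below uses a global dynamic-programming alignment with unit costs and a traceback that lists the differences. This is an assumption for illustration: Smith-Waterman proper is a local alignment, and the actual cost model used by Anchored Assembly is not given in the slides.

```python
def minimal_cost_edits(ref, alt, mismatch=1, gap=1):
    """Return (cost, edits) for a global unit-cost alignment of ref and alt.

    Each edit is (kind, ref_index, ref_base, alt_base) with kind in
    {"SNP", "DEL", "INS"}; ref_index is 0-based.
    """
    n, m = len(ref), len(alt)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if ref[i - 1] == alt[j - 1] else mismatch)
            cost[i][j] = min(diag, cost[i - 1][j] + gap, cost[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring the diagonal.
    edits = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = 0 if ref[i - 1] == alt[j - 1] else mismatch
            if cost[i][j] == cost[i - 1][j - 1] + step:
                if step:
                    edits.append(("SNP", i - 1, ref[i - 1], alt[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            edits.append(("DEL", i - 1, ref[i - 1], ""))
            i -= 1
        else:
            edits.append(("INS", i, "", alt[j - 1]))
            j -= 1
    edits.reverse()
    return cost[n][m], edits


snp_cost, snp_edits = minimal_cost_edits("GACTTAGATA", "GACTAAGATA")
del_cost, del_edits = minimal_cost_edits("ACGTACGT", "ACGACGT")
```

The edit list is the raw material for variant calls; the QC step mentioned on the slide would then filter it, for example by read support, which is not modeled here.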

[Figure: Variant validation. Reads R2 through R6 aligned against the Reference and Assembled sequences.]

Step 5: Variant Validation

Page 9: Aug2014 spiral genetics anchored assembly

[Figure: Venn diagram of SNP calls]

• Anchored Assembly only: 13,307

• Genome in a Bottle only: 144,463

• Both: 2,596,897

• Sensitivity: 95%

• Precision: 99.5%

NA12878 SNP Detection vs GIAB
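The reported sensitivity and precision can be checked from the three counts on this slide, assuming that 2,596,897 is the number of calls shared by both sets and that Genome in a Bottle is treated as the truth set. The function name is hypothetical.

```python
def sensitivity_precision(shared, truth_only, caller_only):
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    sensitivity = shared / (shared + truth_only)
    precision = shared / (shared + caller_only)
    return sensitivity, precision


sens, prec = sensitivity_precision(
    shared=2_596_897,    # called by both (assumed to be the Venn overlap)
    truth_only=144_463,  # Genome in a Bottle only
    caller_only=13_307,  # Anchored Assembly only
)
print(f"sensitivity={sens:.1%} precision={prec:.1%}")
```

Under those assumptions the counts are consistent with the slide's rounded figures of 95% and 99.5%.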

Page 10: Aug2014 spiral genetics anchored assembly

NA12878 Indel Detection vs GIAB

Page 11: Aug2014 spiral genetics anchored assembly

Chr.  Mills       Marks (Pindel 50x / AA 50x / AA 200x; column positions lost)
1     247579917
2     2576951     n n
2     78558069    n n n
2     187143096   n
2     191002548   n n n
3     43972635    n n n
3     100737223   n n n
3     100868475   n n n
3     195823764   n n n
5     78035993    n n n
7     1528948     n n n
7     2089876
8     22717662    n n n
9     97387403    n
9     137361862   n
12    103954170   n n
13    76345722    n n n
13    113760939
13    114103496   n n
15    26060663    n n
15    92686723    n
17    39240782
17    77134774    n
18    74794821    n n
18    76182038    n n n
19    1278240     n n n
19    2247173     n n n
20    55992535    n n
21    39080014    n n
X     94894756    n n

NA12878 SV Insertions

Mills et al. Eichler Lab, U. Washington, Sanger validated

Page 12: Aug2014 spiral genetics anchored assembly

NA12878 SV Deletions

Page 13: Aug2014 spiral genetics anchored assembly

How to describe SVs from breakpoints?

As breakend records:

#CHROM  POS      ID     REF  ALT           QUAL  FILTER  INFO                                         FORMAT    SAMPLE
1       1500000  bnd_A  T    T[1:1501108[  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_B;AID=1234  DP:ED:OV  26:72:89
1       1501108  bnd_B  G    ]1:1500000]G  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_A;AID=1234  DP:ED:OV  26:72:89

As SV events:
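One possible reading of the bnd_A/bnd_B pair as a single event is a deletion of the bases between the two breakends. In VCF breakend notation, `t[p[` joins the sequence to the right of `p` after `t`, and `]p]t` joins the sequence to the left of `p` before `t`. The sketch below applies that rule; the function names and the output fields are hypothetical, and it only recognizes this one simple pattern.

```python
import re

# ALT forms handled: t[p[  t]p]  ]p]t  [p[t  (single mate position, no inserted bases required)
BND_ALT = re.compile(r"^([ACGTN]*)([\[\]])([^:\[\]]+):(\d+)([\[\]])([ACGTN]*)$")


def parse_bnd_alt(alt):
    """Split a breakend ALT string into mate position and orientation."""
    m = BND_ALT.match(alt)
    if m is None or m.group(2) != m.group(5):
        raise ValueError(f"not a breakend ALT: {alt!r}")
    lead, bracket, chrom, pos, _, _trail = m.groups()
    return {
        "mate_chrom": chrom,
        "mate_pos": int(pos),
        "bracket": bracket,
        "ref_base_first": bool(lead),
    }


def pair_as_deletion(left, right):
    """Return a DEL event if the two mated breakends describe a simple deletion."""
    a, b = parse_bnd_alt(left["alt"]), parse_bnd_alt(right["alt"])
    is_deletion = (
        left["chrom"] == right["chrom"] == a["mate_chrom"] == b["mate_chrom"]
        and left["pos"] < right["pos"]
        and a["bracket"] == "[" and a["ref_base_first"] and a["mate_pos"] == right["pos"]
        and b["bracket"] == "]" and not b["ref_base_first"] and b["mate_pos"] == left["pos"]
    )
    if not is_deletion:
        return None
    deleted = right["pos"] - left["pos"] - 1  # bases strictly between the breakends
    return {
        "chrom": left["chrom"],
        "pos": left["pos"],
        "end": right["pos"] - 1,
        "svtype": "DEL",
        "svlen": -deleted,
    }


bnd_a = {"chrom": "1", "pos": 1500000, "id": "bnd_A", "alt": "T[1:1501108["}
bnd_b = {"chrom": "1", "pos": 1501108, "id": "bnd_B", "alt": "]1:1500000]G"}
event = pair_as_deletion(bnd_a, bnd_b)
```

For the two records above this gives a deletion of 1,107 bases; pairs with other bracket orientations (for example inversions) return `None` and would need their own rules.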

Page 14: Aug2014 spiral genetics anchored assembly

How to describe SVs from breakpoints?

[Figure: Chr 1 with breakends bnd_K, bnd_L, bnd_M, bnd_N; coordinate labels 190000, 200000, 200231, 197000]

• Different events can produce similar breakpoints

• Multiple breakpoints can represent a single rearrangement event

Assembled breakpoints can reveal variation that is hard to categorize

Page 15: Aug2014 spiral genetics anchored assembly

How to describe SVs from breakpoints?

A single breakpoint can contain multiple sequence changes:

• Inserted sequence at deletion breakpoints

• Deleted or duplicated sequence at insert breakpoints

• Deleted or duplicated sequence at inversion breakpoints

[Figure: Chr 1 inversion; coordinate labels 1700000, 1704100, 1700100, 1704250; region labels: inverted sequence, deleted sequence, duplicated sequence]

Page 16: Aug2014 spiral genetics anchored assembly

How to describe SVs from breakpoints?

Many assemblies anchor to multiple genome locations

• Variation in duplicated genome regions

• Variation in repetitive elements

• Transposons

[Figure: Chr 1 with an Alu element; one end of the assembly anchors to multiple places, the other is a unique anchor]

Page 17: Aug2014 spiral genetics anchored assembly

Contact

• More information

• Trial on own data

[email protected] [email protected]

[email protected]

Page 18: Aug2014 spiral genetics anchored assembly

Questions?

Page 19: Aug2014 spiral genetics anchored assembly

Anchored Assembly SNP Distribution

Page 20: Aug2014 spiral genetics anchored assembly

Anchored Assembly SV Distribution