Aug 2014: Spiral Genetics Anchored Assembly
SV Detection via Anchored Assembly
How can we best call structural variants?
Becky Drees, Jeremy Bruestle, Cheinan Marks
Please do not distribute without permission.
• Brief Description of Anchored Assembly Method
• Testing vs GIAB Variant Set & Validated SV Sets
• How Do We Describe SVs from Detected Breakpoints?
Input data
• Any species with a draft genome
• Existing NGS data
• No special library prep
• ~20x coverage per ploidy
Step 1: Read Correction
[Figure: K-mer Quality Score Distribution. x-axis: Total K-mer Quality Score (0 to 1200); y-axis: K-mer Count (0 to 5000). Label: A* error correction]
• Similar to Euler or Quake
• Corrects the read without using reference information
• Reduces error from 1% to 0.01%
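A minimal sketch of reference-free correction in the general spirit of k-mer-spectrum tools such as Quake. It is not the A* correction named on the slide; the function names, `k`, and the `min_score` threshold are assumptions for illustration. Each k-mer accumulates a quality-weighted score, and a read is repaired by the first single-base substitution that makes all of its k-mers trusted.

```python
from collections import defaultdict

def kmer_scores(reads, k):
    """Sum, per k-mer, the lowest base quality of each of its occurrences."""
    scores = defaultdict(int)
    for seq, quals in reads:
        for i in range(len(seq) - k + 1):
            scores[seq[i:i + k]] += min(quals[i:i + k])
    return scores

def correct_read(seq, scores, k, min_score):
    """Return seq, or the first single-base substitution whose k-mers are all trusted."""
    def trusted(s):
        return all(scores.get(s[i:i + k], 0) >= min_score
                   for i in range(len(s) - k + 1))

    if trusted(seq):
        return seq
    for pos in range(len(seq)):
        for base in "ACGT":
            if base != seq[pos]:
                candidate = seq[:pos] + base + seq[pos + 1:]
                if trusted(candidate):
                    return candidate
    return seq  # no single-base fix found; leave the read unchanged

# Toy data: five clean reads and one read with a low-quality error at index 4.
reads = [("ACGTACGT", [30] * 8)] * 5 + [("ACGTTCGT", [30, 30, 30, 30, 5, 30, 30, 30])]
scores = kmer_scores(reads, k=4)
print(correct_read("ACGTTCGT", scores, k=4, min_score=60))  # prints ACGTACGT
```

No reference sequence is consulted at any point; trust comes only from the reads themselves.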
Step 2: Remove Reference Matches
• Remove reads that are an exact match to reference
• Significantly reduces the complexity of the graph
• Reduces required memory usage (40GB for whole human genome)
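The filtering step can be sketched as follows. This is a toy version that assumes the reference fits in one in-memory string and uses plain substring search; a whole-genome implementation would need an index instead. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string over the alphabet ACGT."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def remove_reference_matches(reads, reference):
    """Keep only reads that do not match the reference exactly on either strand."""
    return [
        read for read in reads
        if read not in reference and reverse_complement(read) not in reference
    ]

reference = "ACGTTGCAAGGCT"
reads = ["GTTGCA", "TGCAAC", "GTTACA"]
# "GTTGCA" matches the forward strand, "TGCAAC" matches the reverse strand,
# so only the read carrying a difference from the reference survives.
print(remove_reference_matches(reads, reference))  # prints ['GTTACA']
```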
Step 3: Read Overlap Graph
• Construct a read overlap graph with the remaining reads
• Provides more context than a k-mer-based de Bruijn graph
[Figure: Read overlap assembly (reads R1, R2, R3, R5, R6, R7, R8, R9 with edge labels 7, 8, 9)]
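A minimal sketch of a read overlap graph, assuming exact suffix-to-prefix overlaps and an all-against-all comparison (quadratic, so suitable only for a toy example). The names `overlap_length`, `build_overlap_graph`, and `min_overlap` are illustrative, not taken from the slides.

```python
def overlap_length(a, b, min_overlap):
    """Longest proper suffix of a that equals a prefix of b, or 0 if below min_overlap."""
    for length in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def build_overlap_graph(reads, min_overlap):
    """Map each read name to {successor name: overlap length}."""
    graph = {name: {} for name in reads}
    for a_name, a_seq in reads.items():
        for b_name, b_seq in reads.items():
            if a_name != b_name:
                n = overlap_length(a_seq, b_seq, min_overlap)
                if n:
                    graph[a_name][b_name] = n
    return graph

reads = {"R1": "ACGTTGCA", "R2": "TTGCAAGG", "R3": "CAAGGCTA"}
print(build_overlap_graph(reads, min_overlap=4))
# prints {'R1': {'R2': 5}, 'R2': {'R3': 5}, 'R3': {}}
```

Each node is a whole read rather than a fixed-length k-mer, which is where the extra context relative to a de Bruijn graph comes from.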
Step 4: Anchoring
• Anchor assemblies to reference coordinates
• Provide breakpoint information while keeping reference bias low
[Figure: Anchoring]
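One way to picture anchoring, as a rough sketch only: look up the two ends of an assembled sequence on the reference by exact match and record every hit. The seed length, the exact-match lookup, and the function names are assumptions; the slides do not describe how anchors are actually found.

```python
def find_all(reference, seed):
    """Return every 0-based start position of seed in reference."""
    hits, start = [], reference.find(seed)
    while start != -1:
        hits.append(start)
        start = reference.find(seed, start + 1)
    return hits

def anchor_assembly(reference, assembly, seed_len):
    """Locate the left and right ends of an assembly on the reference."""
    left_hits = find_all(reference, assembly[:seed_len])
    right_hits = find_all(reference, assembly[-seed_len:])
    return {
        "left": left_hits,
        "right": right_hits,
        "unique": len(left_hits) == 1 and len(right_hits) == 1,
    }

reference = "AACCGGTTACGATCGATTAGGCCAT"
assembly = "CCGGTTTAGGCC"  # left flank joined directly to right flank
print(anchor_assembly(reference, assembly, seed_len=6))
# prints {'left': [2], 'right': [17], 'unique': True}
```

Here the left anchor covers reference positions 2 to 7 and the right anchor starts at 17, so the assembly skips positions 8 to 16. Only the two ends touch the reference; the sequence between them comes from the reads.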
Step 5: Variant Validation
• Assemble variant sequence from the read overlap graph
• Compute the minimal-cost variation (similar to Smith-Waterman)
• Call variants and apply QC to remove likely false positives
[Figure: Variant validation. Reads R2 to R6 shown against the Reference and Assembled sequences.]
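To illustrate the "minimal cost variation" idea, here is a sketch using a plain global edit-distance dynamic program with traceback. It belongs to the same family as Smith-Waterman but is a simpler, swapped-in technique, and the unit costs and function name are assumptions.

```python
def minimal_cost_variation(ref, alt, mismatch=1, gap=1):
    """Return (cost, edits) for the cheapest way to turn ref into alt."""
    rows, cols = len(ref) + 1, len(alt) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * gap
    for j in range(1, cols):
        cost[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = cost[i - 1][j - 1] + (0 if ref[i - 1] == alt[j - 1] else mismatch)
            cost[i][j] = min(sub, cost[i - 1][j] + gap, cost[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves.
    edits, i, j = [], len(ref), len(alt)
    while i > 0 or j > 0:
        diag = (cost[i - 1][j - 1] + (0 if ref[i - 1] == alt[j - 1] else mismatch)
                if i > 0 and j > 0 else None)
        if diag is not None and cost[i][j] == diag:
            if ref[i - 1] != alt[j - 1]:
                edits.append(("SNP", i - 1, ref[i - 1], alt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            edits.append(("DEL", i - 1, ref[i - 1], ""))
            i -= 1
        else:
            edits.append(("INS", i, "", alt[j - 1]))
            j -= 1
    edits.reverse()
    return cost[len(ref)][len(alt)], edits

print(minimal_cost_variation("GACTTAGATA", "GACTAAGATA"))
# prints (1, [('SNP', 4, 'T', 'A')])
```

Positions in the edit list are 0-based offsets into `ref`.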
NA12878 SNP Detection vs GIAB
[Figure: Venn diagram of SNP calls]
• Anchored Assembly only: 13,307
• Genome in a Bottle only: 144,463
• Shared: 2,596,897
• Sensitivity: 95%
• Precision: 99.5%
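The sensitivity and precision figures can be checked from the three counts on this slide, assuming that 2,596,897 is the number of SNPs found in both call sets. The function and parameter names are illustrative.

```python
def sensitivity_precision(shared, truth_only, calls_only):
    """Sensitivity = shared / all truth calls; precision = shared / all test calls."""
    sensitivity = shared / (shared + truth_only)
    precision = shared / (shared + calls_only)
    return sensitivity, precision

sens, prec = sensitivity_precision(
    shared=2_596_897,    # called by both Anchored Assembly and GIAB (assumed)
    truth_only=144_463,  # GIAB only
    calls_only=13_307,   # Anchored Assembly only
)
print(f"sensitivity={sens:.1%} precision={prec:.1%}")
# prints sensitivity=94.7% precision=99.5%
```

The computed sensitivity of 94.7% rounds to the 95% quoted on the slide.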
NA12878 Indel Detection vs GIAB
Chr. Mills Pindel 50x
AA 50x AA 200x
1 2475799172 2576951 n n2 78558069 n n n2 187143096 n2 191002548 n n n3 43972635 n n n3 100737223 n n n3 100868475 n n n3 195823764 n n n5 78035993 n n n7 1528948 n n n7 20898768 22717662 n n n9 97387403 n9 137361862 n12 103954170 n n13 76345722 n n n13 11376093913 114103496 n n15 26060663 n n15 92686723 n17 3924078217 77134774 n18 74794821 n n18 76182038 n n n19 1278240 n n n19 2247173 n n n20 55992535 n n21 39080014 n n
X 94894756 n n
NA12878 SV Insertions
Mills et al. Eichler Lab, U. Washington, Sanger validated
NA12878 SV Deletions
How to describe SVs from breakpoints?
As breakend records:
#CHROM  POS      ID     REF  ALT           QUAL  FILTER  INFO                                         FORMAT    SAMPLE
1       1500000  bnd_A  T    T[1:1501108[  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_B;AID=1234  DP:ED:OV  26:72:89
1       1501108  bnd_B  G    ]1:1500000]G  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_A;AID=1234  DP:ED:OV  26:72:89
As SV events:
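As one possible reading of the bnd_A / bnd_B pair, the sketch below parses the two breakend ALT strings and, when they have the shape of a simple deletion, reports a single DEL event. The output field layout and function names are assumptions; the slide's own SV-event rendering is not in the transcript, and a pair like this could also be part of a more complex rearrangement.

```python
import re

BND_ALT = re.compile(
    r"^(?P<lead>[ACGTN]*)(?P<b1>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[ACGTN]*)$"
)

def parse_bnd_alt(alt):
    """Split a VCF breakend ALT such as 'T[1:1501108[' into its parts."""
    m = BND_ALT.match(alt)
    if m is None or m["b1"] != m["b2"]:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    return {
        "mate_chrom": m["chrom"],
        "mate_pos": int(m["pos"]),
        "bracket": m["b1"],
        "ref_first": bool(m["lead"]),  # True when the local base precedes the bracket
    }

def pair_to_deletion(chrom, pos_a, alt_a, pos_b, alt_b):
    """Return a DEL event if the mated breakends look like a simple deletion, else None."""
    a, b = parse_bnd_alt(alt_a), parse_bnd_alt(alt_b)
    same_chrom = a["mate_chrom"] == chrom and b["mate_chrom"] == chrom
    mates_agree = a["mate_pos"] == pos_b and b["mate_pos"] == pos_a
    deletion_shape = (a["bracket"] == "[" and a["ref_first"]
                      and b["bracket"] == "]" and not b["ref_first"])
    if not (same_chrom and mates_agree and deletion_shape and pos_a < pos_b):
        return None
    return {
        "CHROM": chrom,
        "POS": pos_a,                   # last base kept before the deletion
        "SVTYPE": "DEL",
        "END": pos_b - 1,               # last deleted base
        "SVLEN": -(pos_b - pos_a - 1),  # negative length, as is usual for deletions
    }

print(pair_to_deletion("1", 1500000, "T[1:1501108[", 1501108, "]1:1500000]G"))
# prints {'CHROM': '1', 'POS': 1500000, 'SVTYPE': 'DEL', 'END': 1501107, 'SVLEN': -1107}
```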
How to describe SVs from breakpoints?
[Figure: CHR 1 with breakends bnd_K, bnd_L, bnd_M, bnd_N; coordinate labels 190000, 200000, 200231, 197000]
• Different events can produce similar breakpoints
• Multiple breakpoints can represent a single rearrangement event
Assembled breakpoints can reveal variation that is hard to categorize
How to describe SVs from breakpoints?
A single breakpoint can contain multiple sequence changes:
• Inserted sequence at deletion breakpoints
• Deleted or duplicated sequence at insertion breakpoints
• Deleted or duplicated sequence at inversion breakpoints
[Figure: CHR 1 with coordinate labels 1700000, 1704100, 1700100, 1704250; regions labeled inverted sequence, deleted sequence, duplicated sequence]
How to describe SVs from breakpoints?
Many assemblies anchor to multiple genome locations:
• Variation in duplicated genome regions
• Variation in repetitive elements
• Transposons
[Figure: CHR 1 with an Alu element; labels: anchors to multiple places, unique anchor]
Contact
• More information
• Trial on own data
[email protected] [email protected]
Questions?
Anchored Assembly SNP Distribution
Anchored Assembly SV Distribution