jan2016 fritz sedlazeck mapping and sv calling from pac bio

GIAB workshop, Fritz Sedlazeck (CSHL, JHU)

Uploaded by genomeinabottle on 17-Jan-2017


Previous meetings: Utilizing long reads

1. How to predict the breakpoints?

2. How to assess the genotype?

3. Complex SVs?


1. Breakpoint prediction

• Over BWA-MEM alignments
  – First version had a bug…

• Redesigning Sniffles
  – Improved speed
  – Improved accuracy on noisy alignments
  – Improved read filtering, reducing the FDR
  – Optional realignment step

• Improved breakpoint accuracy
• Improving genotyping
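A minimal sketch of the two steps named above, assuming a breakpoint is summarized as the median of per-read estimates and a genotype is called from the fraction of reads supporting the SV. The function names and the 0.3/0.8 allele-fraction cut-offs are hypothetical illustrations, not Sniffles' actual implementation.

```python
from statistics import median_low

# Hypothetical allele-fraction cut-offs (illustrative, not Sniffles' settings).
MIN_HOM_AF = 0.8
MIN_HET_AF = 0.3


def consensus_breakpoint(read_positions):
    """Summarize per-read breakpoint estimates by their low median.

    The median ignores a few outlier reads whose noisy alignment shifted
    the breakpoint; median_low always returns an observed position.
    """
    if not read_positions:
        raise ValueError("need at least one supporting read")
    return median_low(read_positions)


def genotype(support, coverage, min_hom=MIN_HOM_AF, min_het=MIN_HET_AF):
    """Call a diploid genotype from SV-supporting reads vs. total reads at the locus."""
    if coverage <= 0:
        return "./."  # no reads: genotype unknown
    allele_fraction = support / coverage
    if allele_fraction >= min_hom:
        return "1/1"
    if allele_fraction >= min_het:
        return "0/1"
    return "0/0"


# Four reads place the breakpoint near 10,011; one read is off by about 140 bp.
print(consensus_breakpoint([10012, 10010, 10015, 10011, 9870]))  # 10011
print(genotype(4, 10))  # 0/1
```

The median is used here only because it is a simple outlier-robust choice; a real caller may weight reads by alignment quality instead.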


Sniffles v01 error


Sniffles v02


Current limitations

• Linear: every gap base costs the same
• Affine: separate penalties for opening and extending a gap
• Using a single gap-cost model is considered state of the art

• Problem with PacBio/ONT: two different gap models are required
  – Sequencing errors: a high number of 1 bp indels
  – Real indels: extending a gap is more likely than opening a new one
  – Sequencing errors plus repeats cause a single gap cost to fail, even for real indels

AAAGAATTCAA-A-A-T-CA

vs.

AAAGAATTCAAAA----TCA
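Both rows place the same 16 read bases (AAAGAATTCAAAATCA) against a reference row that is not shown: once as four 1 bp gaps, once as a single 4 bp gap. A minimal sketch, with hypothetical function names and penalty values (affine cost = open + extend × (length − 1), open 4, extend 1), shows that a linear model scores both placements identically, while an affine model strongly prefers the single gap.

```python
def gap_runs(row):
    """Return the lengths of consecutive '-' runs in a gapped alignment row."""
    runs, current = [], 0
    for ch in row:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # row ends inside a gap
        runs.append(current)
    return runs


def linear_cost(runs, per_base=1.0):
    """Linear model: every gap base costs the same."""
    return sum(per_base * k for k in runs)


def affine_cost(runs, gap_open=4.0, gap_extend=1.0):
    """Affine model: one opening penalty per gap plus an extension penalty per extra base."""
    return sum(gap_open + gap_extend * (k - 1) for k in runs)


for name, row in [("scattered", "AAAGAATTCAA-A-A-T-CA"),
                  ("single", "AAAGAATTCAAAA----TCA")]:
    runs = gap_runs(row)
    print(name, runs, linear_cost(runs), affine_cost(runs))
# scattered [1, 1, 1, 1] 4.0 16.0
# single [4] 4.0 7.0
```

Neither model fits both cases on its own: sequencing errors really do produce scattered 1 bp gaps, whereas a real indel is one long gap.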


Convex gap costs
• The cost of a gap follows a convex function of its length
• Close to linear gap costs for 1-2 bp gaps
• As a gap gets longer, the penalty for "splitting" it increases
• Problem: the optimal approach runs in O(nm² + n²m)
• Heuristic implementation: O(nm)
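The slide gives no formula for NGM-LR's gap function, so this sketch assumes a hypothetical one: each additional gap base costs `decay` times the previous one (floored at `min_extend`), which keeps 1-2 bp gaps close to linear and makes splitting a long gap increasingly expensive. It pairs that with the textbook dynamic program for an arbitrary gap-cost function (Waterman, Smith and Beyer), whose per-cell scan over every gap length is where the O(nm² + n²m) cost comes from. NGM-LR's O(nm) heuristic is not reproduced here.

```python
def convex_gap_cost(k, extend=1.0, decay=0.8, min_extend=0.1):
    """Hypothetical gap cost: base i (0-based) of a gap costs extend * decay**i, floored."""
    return sum(max(min_extend, extend * decay ** i) for i in range(k))


def align_score(a, b, gap_cost=convex_gap_cost, match=2.0, mismatch=-3.0):
    """Optimal global alignment score under an arbitrary gap-cost function.

    Each cell tries every gap length ending there, so the run time is
    O(n*m*(n + m)) = O(nm^2 + n^2 m).
    """
    n, m = len(a), len(b)
    g = [0.0] + [gap_cost(k) for k in range(1, max(n, m) + 1)]  # precomputed costs
    S = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    S[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = float("-inf")
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                best = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            for k in range(1, i + 1):  # a[i-k:i] against a gap of length k
                best = max(best, S[i - k][j] - g[k])
            for k in range(1, j + 1):  # b[j-k:j] against a gap of length k
                best = max(best, S[i][j - k] - g[k])
            S[i][j] = best
    return S[n][m]


# Splitting a gap in half costs more, the longer the gap:
print(2 * convex_gap_cost(2) - convex_gap_cost(4))  # about 0.648
print(2 * convex_gap_cost(4) - convex_gap_cost(8))  # about 1.743
# One 2 bp gap (score about 2.2) beats two separate 1 bp gaps (score 2.0):
print(align_score("AACC", "AC"))
```

The match/mismatch scores and decay parameters are illustrative assumptions; the point is only the shape of the gap function and the cost of exact optimization under it.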


NGM-LR workflow


NGM-LR reconcile
Read within inversion
Read within duplication


Deletion


Deletion


Insertions


Inversions


Translocations
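The deletion, insertion, inversion and translocation slides survive here only as titles. As a rough illustration of how a long read split into two local alignments can point to each SV type, here is a minimal sketch; the `Segment` record, `classify_split` and the 50 bp minimum size are hypothetical, and this is a generic split-read heuristic, not the Sniffles or NGM-LR implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One local alignment of a split long read (hypothetical record)."""
    chrom: str
    ref_start: int   # reference start, 0-based inclusive
    ref_end: int     # reference end, exclusive
    read_start: int  # start within the read, in the read's original orientation
    read_end: int    # end within the read, exclusive
    reverse: bool    # True if this segment aligns to the reverse strand


def classify_split(first, second, min_size=50):
    """Name the SV implied by two consecutive segments of one read (in read order)."""
    if first.chrom != second.chrom:
        return "translocation"
    if first.reverse != second.reverse:
        return "inversion"  # strand flips between the two segments
    # Same chromosome and strand: compare the gap on the reference with the gap in the read.
    # On the reverse strand, the next read segment lies to the left on the reference.
    if first.reverse:
        ref_gap = first.ref_start - second.ref_end
    else:
        ref_gap = second.ref_start - first.ref_end
    read_gap = second.read_start - first.read_end
    diff = ref_gap - read_gap
    if diff >= min_size:
        return "deletion"  # reference sequence missing from the read
    if -diff >= min_size:
        # Extra read sequence; if the segments overlap on the reference, the
        # read revisits the same region, as expected within a tandem duplication.
        return "duplication" if ref_gap < 0 else "insertion"
    return "none"  # below min_size: treat as alignment noise


left = Segment("chr1", 1000, 2000, 0, 1000, False)
print(classify_split(left, Segment("chr1", 3000, 4000, 1000, 2000, False)))  # deletion
print(classify_split(left, Segment("chr1", 2000, 3000, 1500, 2500, False)))  # insertion
print(classify_split(left, Segment("chr1", 1500, 2500, 1000, 2000, False)))  # duplication
print(classify_split(left, Segment("chr1", 2000, 3000, 1000, 2000, True)))   # inversion
print(classify_split(left, Segment("chr2", 5000, 6000, 1000, 2000, False)))  # translocation
```

Real callers aggregate such signals over many reads before reporting an SV, which is why breakpoint accuracy and read filtering matter.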


Nested SV (SKBR3)


Outlook

• Finish new version of Sniffles
  – Assessment of noisy alignments

• NGM-LR
  – MQ (mapping quality) calculation
  – Runtime

• Visual inspection and comparison of SV calls