Learning to love de Bruijn graphs

Learning to love de Bruijn graphs Ben Woodcroft, Australian Centre for Ecogenomics (ACE) Winter School in Bioinformatics, 2015

Upload: benjwoodcroft

Post on 17-Jan-2017

TRANSCRIPT

Page 1: Learning to Love De Bruijn Graphs

Learning to love de Bruijn graphs

Ben Woodcroft,

Australian Centre for Ecogenomics (ACE)

Winter School in Bioinformatics, 2015

Page 2: Learning to Love De Bruijn Graphs

A slide from Torsten Seemann

Page 3: Learning to Love De Bruijn Graphs

K-mers and assembly

• For next-generation sequencing, comparison of each read with each other read is impossible.

– E.g. 10 million reads -> 10^7 x 10^7 read-read comparisons. Slowww..

• K-mers and de Bruijn graphs help make things tractable
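A minimal sketch of why k-mers help, assuming toy reads made up for illustration: every read is cut into k-mers, and each k-mer becomes an edge between its first and last (k-1) bases. Reads that overlap end up on the same edges, so overlaps are found by dictionary lookup rather than by comparing every read with every other read. The function names are hypothetical, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every k-mer in a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # The k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


reads = ["ACGTAC", "CGTACG", "GTACGT"]  # toy overlapping reads (assumption)
graph = build_de_bruijn(reads, k=4)
for node, successors in sorted(graph.items()):
    print(node, "->", sorted(successors))
```

Building the graph takes one pass over the reads, so the work grows with the total number of bases rather than with the square of the number of reads.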

Page 4: Learning to Love De Bruijn Graphs

K-mers and assembly

Page 5: Learning to Love De Bruijn Graphs

Forks

Page 6: Learning to Love De Bruijn Graphs

K-mer too small

Page 7: Learning to Love De Bruijn Graphs

K-mer too large

Page 8: Learning to Love De Bruijn Graphs

My favourite k-mer size

Page 9: Learning to Love De Bruijn Graphs

My favourite k-mer size

With a 100bp read, this can never happen with a k-mer size of 51

Page 10: Learning to Love De Bruijn Graphs

Less tips, more bubbles

As read lengths get longer, assemblers must move from handling dead ends in the graph to handling bubbles.
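The two structures can be told apart from graph topology alone. The sketch below is an illustration under simplifying assumptions, not how any particular assembler does it: nodes are abstract labels rather than k-mers, coverage is ignored, and `walk_unbranched` and `classify_fork` are hypothetical helper names. A branch that rejoins another branch is reported as part of a bubble; a branch that simply runs out is reported as a tip.

```python
def walk_unbranched(graph, node, max_steps):
    """Follow nodes that have exactly one successor; return the path taken."""
    path = [node]
    while len(path) <= max_steps:
        successors = graph.get(path[-1], set())
        if len(successors) != 1:
            break  # dead end or another fork
        path.append(next(iter(successors)))
    return path


def classify_fork(graph, fork, max_steps=10):
    """Split the branches leaving a fork into tips and bubble branches."""
    branches = [walk_unbranched(graph, n, max_steps) for n in sorted(graph[fork])]
    tips, bubbles = [], []
    for i, branch in enumerate(branches):
        others = branches[:i] + branches[i + 1:]
        if any(set(branch) & set(other) for other in others):
            bubbles.append(branch)   # shares a node with another branch
        elif not graph.get(branch[-1]):
            tips.append(branch)      # ends with no successors
    return tips, bubbles


# Toy graph (assumption): A forks into a dead end T and a real path B;
# C forks into D and E, which rejoin at F.
graph = {
    "A": {"B", "T"}, "T": set(), "B": {"C"},
    "C": {"D", "E"}, "D": {"F"}, "E": {"F"},
    "F": {"G"}, "G": set(),
}
print(classify_fork(graph, "A"))
print(classify_fork(graph, "C"))
```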

Page 11: Learning to Love De Bruijn Graphs

Tips and bubbles

Page 12: Learning to Love De Bruijn Graphs

Metagenome assembly

Me: “I know, why don’t I just assemble all my data together?”

Run assembly

Wait 4 days

Out of memory allocating 18.4 million terabytes of RAM.

Page 13: Learning to Love De Bruijn Graphs

Solutions to RAM issues

• Quality trimming

• Hard trimming

• Throwing away a proportion of reads randomly

• Sequencing something else

Page 14: Learning to Love De Bruijn Graphs

Lossy de Bruijn graphs

The number of k-mers observed is vanishingly small relative to the total number of possible k-mers

The human genome: ~3Gbp = ~3×10^9 k-mers

Total possible 51-mers: 4^51 = ~10^30

0.00000000000000000002%

When making a list of k-mers, counting extra ones probably has little effect on assembly.
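The arithmetic behind this can be checked directly. The snippet below is a rough sketch that treats the ~3 Gbp genome as contributing at most about 3×10^9 distinct k-mers, as the slide does; the variable names are illustrative only.

```python
genome_kmers = 3 * 10**9      # ~3 Gbp, so roughly 3e9 k-mers at most
possible_51mers = 4 ** 51     # four bases at each of 51 positions

fraction = genome_kmers / possible_51mers
print(f"possible 51-mers: {possible_51mers:.1e}")
print(f"percentage observed: {fraction * 100:.0e} %")
```

The percentage comes out on the order of 10^-20, so a store that occasionally reports a k-mer that was never seen is very unlikely to invent one that matters.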

Page 15: Learning to Love De Bruijn Graphs

Bloom filters

A low memory k-mer “store”

Page 16: Learning to Love De Bruijn Graphs

Is my k-mer in these reads?

From a Bloom filter, the answer is either “no” or “probably”
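A minimal sketch of the idea, assuming a small fixed-size filter and SHA-256 as a stand-in hash; the class, its parameters, and its sizes are illustrative and not taken from any assembler. Each k-mer sets a few bit positions. A lookup that finds any of those positions unset means “no”; finding all of them set means “probably”, because other k-mers may have set the same bits.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit for readability; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, kmer):
        """Derive num_hashes bit positions from seeded SHA-256 digests."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos] = 1

    def __contains__(self, kmer):
        # False means definitely absent; True means probably present.
        return all(self.bits[pos] for pos in self._positions(kmer))


bf = BloomFilter()
for kmer in ["ACGT", "CGTA", "GTAC"]:
    bf.add(kmer)

print("ACGT" in bf)  # always True once added: there are no false negatives
print("TTTT" in bf)  # usually False, but a false positive is possible
```

The k-mers themselves are never stored, only the bits, which is where the memory saving comes from.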

Page 17: Learning to Love De Bruijn Graphs

A finishing approach to assembly

A central assumption of this method is that the genome is “mostly” complete

Page 18: Learning to Love De Bruijn Graphs

Scaffolding without mate pair data

Page 19: Learning to Love De Bruijn Graphs

Gap filling vs. assembly

• Regular assembly ain’t easy

• Re-assembly is more straightforward because you are trying to get to somewhere

Page 20: Learning to Love De Bruijn Graphs

Gap filling can correct assembly errors

• Contigs often contain errors right at their ends

• By starting the search a bit back from the end of the contig (e.g. 200bp), these errors can be overcome
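A toy sketch of the set-back idea, assuming a plain breadth-first search through a de Bruijn graph built from the reads. This is an illustration only and not FinishM's implementation; the sequences, the 3 bp set-back standing in for ~200bp, and the function names are all made up. The last base of `contig_a` is deliberately wrong, so a search from its very end finds nothing, while a search from a little further back reaches `contig_b`.

```python
from collections import defaultdict, deque


def build_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def fill_gap(graph, start, goal, max_nodes=1000):
    """Breadth-first search from start to goal; return the spelled sequence or None."""
    queue = deque([[start]])
    seen = {start}
    while queue and len(seen) <= max_nodes:
        path = queue.popleft()
        if path[-1] == goal:
            # Each step along the path contributes one new base.
            return path[0] + "".join(node[-1] for node in path[1:])
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


k, back = 4, 3
reads = ["ATGGCGTGC", "GTGCAATCC", "AATCCGA"]   # toy reads spanning the gap
contig_a = "ATGGCGC"                            # final base is an error
contig_b = "ATCCGA"
graph = build_graph(reads, k)

goal = contig_b[back:back + k - 1]
from_end = fill_gap(graph, contig_a[-(k - 1):], goal)
from_back = fill_gap(graph, contig_a[-(back + k - 1):-back], goal)
print(from_end)   # None: the erroneous end is not in the graph
print(from_back)  # TGGCGTGCAATCCGA
```

Because the search has a known destination, it can stop as soon as the goal is reached, which is part of what makes gap filling easier than assembling from scratch.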

Page 21: Learning to Love De Bruijn Graphs

Gap-filling can account for strain variation

github.com/wwood/finishm

Page 22: Learning to Love De Bruijn Graphs

Thanks!

• Slideshare.com/benjwoodcroft

• Github.com/wwood

• Ecogenomic.org
