Reads must be processed to turn data into ... · Greedy Reconstruction: It was the best...
TRANSCRIPT
![Page 1: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/1.jpg)
![Page 2: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/2.jpg)
Reads must be processed to turn data into information
Assembly
In bioinformatics, sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence.
A contig (from contiguous) is a set of overlapping DNA segments that together represent a consensus region of DNA. In bottom-up sequencing projects, a contig refers to overlapping sequence data (reads).
![Page 3: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/3.jpg)
Generalized NGS analysis
Question → Raw reads → Pre-processing → Assembly: Alignment / de novo → Application specific: Variant calling, count matrix, ... → Compare samples / methods → Answer?
[Figure axis label: Data size]
![Page 4: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/4.jpg)
What is de novo assembly?
• Merge small DNA fragments together so they form a previously unknown sequence
• Merge millions of reads together so they form previously unknown sequences
![Page 5: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/5.jpg)
de novo assembly
• Assemble reads into longer fragments
• Find overlap between reads
• Many approaches
reads → contigs → scaffolds
![Page 7: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/7.jpg)
Let's try to assemble some reads!
Rules:
• a minimum of 7-bp overlap
• overlap must not include any N bases
• same orientation so that the sequence can be read left to right
• there may be 1-bp differences
• simplified - no double-stranded DNA

Valid assemblies

..NNNNGGACTATGATTCG
            |||||||
            TGATTCGAGGCTAANN..

..NNNNNNNNCGATTCTGATCCGA
          |||||||
     GTCCTCGATTCTNNNNNNNN..

Invalid assemblies

..NNNNCGGACTATGATT
            ||||||
            ATGATTCGAGGCTAANN..
(overlap of only 6 bp)

..NNNNNNNNCGCTACTGATCCGA
          || | |||
     GTCCTCGATTCTGNNNNNNN..
(two mismatches in the overlap)
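The rules above can be checked mechanically. The following is a minimal sketch, assuming that "1-bp differences" means at most one mismatch inside the overlap. The function name `is_valid_overlap` and its parameters are illustrative and not from the slides, and the flanking N bases are trimmed by hand before the call.

```python
def is_valid_overlap(left, right, length, min_len=7, max_mismatches=1):
    """Check whether the last `length` bases of `left` can be merged
    with the first `length` bases of `right` under the slide's rules."""
    if length < min_len or length > min(len(left), len(right)):
        return False
    suffix, prefix = left[-length:], right[:length]
    # Rule: the overlap must not include any N bases.
    if "N" in suffix or "N" in prefix:
        return False
    # Assumption: "1-bp differences" = at most one mismatching position.
    mismatches = sum(a != b for a, b in zip(suffix, prefix))
    return mismatches <= max_mismatches


# The four examples from the slide, with the N flanks removed.
print(is_valid_overlap("GGACTATGATTCG", "TGATTCGAGGCTAA", 7))   # valid
print(is_valid_overlap("GTCCTCGATTCT", "CGATTCTGATCCGA", 7))    # valid
print(is_valid_overlap("CGGACTATGATT", "ATGATTCGAGGCTAA", 6))   # too short
print(is_valid_overlap("GTCCTCGATTCTG", "CGCTACTGATCCGA", 8))   # 2 mismatches
```

The overlap length is passed in explicitly to keep the check simple; a real assembler would search for it.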
![Page 8: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/8.jpg)
NGS de novo assembly
• Success is a factor of:
  • Genome size, genomic repeats (!), ploidy
  • High coverage, long read lengths, PE/MP (paired-end / mate-pair) libraries
Repeats in E. coli
Tomorrow we will see a success story of a genome assembly.
![Page 9: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/9.jpg)
Two bacterial genomes: de Bruijn graphs
Few repeats vs. “more” repeats
By the end of today you will see not two scribbles, but much more.
![Page 10: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/10.jpg)
Which approaches?
• Greedy (“simple” approach)
• Overlap-Layout-Consensus (fewer, longer reads)
• de Bruijn graphs (many short reads)
![Page 11: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/11.jpg)
Simple approach - Greedy
• Pseudo code:
  1. Pairwise alignment of all reads
  2. Identify fragments that have largest overlap
  3. Merge these
  4. Repeat until all overlaps are used
• Can only resolve repeats smaller than read length
• High computational cost with increasing no. of reads
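The four pseudo-code steps can be sketched as follows. This is a simplified illustration, not any particular assembler: it assumes error-free reads, so "pairwise alignment" is reduced to exact suffix/prefix matching. The names `overlap_len` and `greedy_assemble` and the `min_len` threshold are illustrative.

```python
def overlap_len(a, b, min_len):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    reads = list(dict.fromkeys(reads))        # drop exact duplicates
    while True:
        best = (0, None, None)
        # Steps 1-2: compare all ordered pairs, remember the largest overlap.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:                            # Step 4: no overlaps left
            return reads
        # Step 3: merge the best pair into one longer fragment.
        merged = reads[i] + reads[j][n:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)


contigs = greedy_assemble(["GTCGAGG", "CGAGGCT", "AGGCTTT"])
print(contigs)
```

Every round compares all pairs again, which illustrates why the cost grows quickly with the number of reads.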
![Page 12: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/12.jpg)
Shredded Book Reconstruction
• Dickens accidentally shreds the first printing of A Tale of Two Cities
  – Text printed on 5 long spools
• How can he reconstruct the text?
  – 5 copies x 138,656 words / 5 words per fragment = 138k fragments
  – The short fragments from every copy are mixed together
  – Some fragments are identical
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, …
[Figure: the five copies of this sentence, each cut into five-word fragments at a different offset]
![Page 13: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/13.jpg)
Greedy Reconstruction
Fragments:
It was the best of
of times, it was the
best of times, it was
times, it was the worst
was the best of times,
the best of times, it
it was the worst of
was the worst of times,
worst of times, it was
of times, it was the
times, it was the age
it was the age of
was the age of wisdom,
the age of wisdom, it
age of wisdom, it was
of wisdom, it was the
wisdom, it was the age
it was the age of
was the age of foolishness,
the worst of times, it
• The repeated sequence makes the correct reconstruction ambiguous: It was the best of times, it was the [worst/age]
• Model sequence reconstruction as a graph problem.
![Page 14: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/14.jpg)
Graph Theory
Graph theory is a way of looking at things
Graph theory studies graphs, discrete objects that make it possible to model a wide variety of situations and processes, and often to analyze them in quantitative and algorithmic terms.
A graph is a structure made up of:
• simple objects, called vertices or nodes,
• links between the vertices. The links can be:
  • directed, in which case they are called arcs or paths, and the graph is called directed
  • undirected, in which case they are called edges, and the graph is called undirected
• optionally, data associated with the nodes and/or links.
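As a minimal sketch of this definition, a graph can be stored as an adjacency list. The class name `Graph` and its methods are illustrative choices, not taken from the slides.

```python
from collections import defaultdict


class Graph:
    """Nodes, links between them, and optional data attached to each link."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = defaultdict(list)      # node -> list of (neighbor, data)

    def add_link(self, u, v, data=None):
        self.adj[u].append((v, data))
        if self.directed:
            self.adj.setdefault(v, [])    # register v as a node, no back link
        else:
            self.adj[v].append((u, data)) # an undirected edge works both ways

    def degree(self, node):
        """Number of links leaving `node` (out-degree if directed)."""
        return len(self.adj[node])


undirected = Graph()
undirected.add_link("A", "B")
directed = Graph(directed=True)
directed.add_link("A", "B", data="arc 1")
print(undirected.degree("B"), directed.degree("B"))
```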
![Page 15: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/15.jpg)
The information structure of Wikipedia
![Page 16: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/16.jpg)
The Königsberg bridges problem
Königsberg is crossed by the river Pregel and its tributaries and has two large islands, which are connected to each other and to the two main areas of the city by seven bridges.
Over the centuries the question was raised repeatedly whether a walk could follow a route that crosses every bridge once and only once and returns to the starting point.
In 1736 Leonhard Euler tackled the problem and proved that such a walk was not possible.
![Page 17: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/17.jpg)
The Königsberg bridges problem
Euler deserves credit for having formulated the problem in terms of graph theory, abstracting away from the specific situation of Königsberg. First, he eliminated all incidental aspects except the urban areas delimited by the river branches and the bridges connecting them. Second, he replaced each urban area with a point, now called a vertex or node, and each bridge with a line segment, called an edge, arc, or link.
Euler represented the layout of the seven bridges by joining the four large areas of the city with as many lines, as in the first image. Note that three bridges leave (and arrive at) each of the nodes A, B, and D, while five bridges leave node C. These are the degrees of the nodes: 3, 3, 5, 3 for A, B, C, D respectively. Before reaching a conclusion, Euler considered different arrangements of areas and bridges (nodes and links): with four nodes and four bridges it is possible to start from, say, A and return to it having crossed every bridge once and only once. The degree of each node is then an even number. If instead one starts from A and ends at D, every node has even degree except two nodes of odd degree (one). On the basis of these observations, Euler stated the following theorem: Any (connected) graph is traversable if and only if all its nodes have even degree, or exactly two of them have odd degree; to traverse a "possible" graph with two odd-degree nodes, one must start from one of them, and the walk will end at the other odd-degree node.
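Euler's degree condition is easy to test in code. The sketch below assumes a connected graph (connectivity is not checked). The bridge list is an assumption chosen to reproduce the degrees 3, 3, 5, 3 quoted above, with C as the central island, and the function names are illustrative.

```python
from collections import Counter


def degrees(edges):
    """Count how many links touch each node of an undirected multigraph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def is_traversable(edges):
    """Euler's condition: zero or exactly two nodes of odd degree."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)


# Seven bridges between the four areas A, B, C, D (assumed layout).
konigsberg = [("A", "C"), ("A", "C"), ("B", "C"), ("B", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
# Four nodes and four bridges forming a ring, as in Euler's thought experiment.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

print(dict(degrees(konigsberg)), is_traversable(konigsberg))
print(dict(degrees(ring)), is_traversable(ring))
```

All four Königsberg nodes have odd degree, so the walk is impossible, while the ring has only even degrees and can be walked as a closed tour.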
![Page 18: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/18.jpg)
Overlap-Layout-Consensus
de Bruijn
Graph Theory!!!
![Page 19: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/19.jpg)
Overlap-Layout-Consensus
• Create overlap graph by all-vs-all alignment
• In the graph each node is a read, edges are overlaps between reads
• Contigs created based on overlap
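A minimal sketch of such an overlap graph, assuming error-free reads so that the all-vs-all "alignment" reduces to exact suffix/prefix matching. The names `overlap_len`, `overlap_graph` and the `min_len` threshold are illustrative.

```python
from itertools import permutations


def overlap_len(a, b, min_len):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def overlap_graph(reads, min_len=3):
    """Each read is a node; a directed edge a -> b stores the overlap length."""
    graph = {read: {} for read in reads}
    for a, b in permutations(reads, 2):       # all-vs-all, both directions
        n = overlap_len(a, b, min_len)
        if n:
            graph[a][b] = n
    return graph


graph = overlap_graph(["GTCGAGG", "CGAGGCT", "AGGCTTT"])
for read, targets in graph.items():
    print(read, "->", targets)
```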
![Page 20: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/20.jpg)
Overlap-Layout-Consensus
• Consensus: Hamiltonian path (visit each node exactly once)
• Computationally hard problem
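To see why this is hard, here is a brute-force sketch that tries every ordering of the nodes. The small graph is a hypothetical overlap graph written out by hand, and `hamiltonian_paths` is an illustrative name. The search visits n! orderings, which is only feasible for a handful of reads.

```python
from itertools import permutations


def hamiltonian_paths(graph):
    """Yield every ordering of the nodes that follows an edge at each step."""
    for order in permutations(graph):
        if all(b in graph[a] for a, b in zip(order, order[1:])):
            yield order


# Hypothetical overlap graph: read -> {overlapping read: overlap length}.
graph = {
    "GTCGAGG": {"CGAGGCT": 5, "AGGCTTT": 3},
    "CGAGGCT": {"AGGCTTT": 5},
    "AGGCTTT": {},
}
paths = list(hamiltonian_paths(graph))
print(paths)
```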
![Page 21: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/21.jpg)
Overlap-Layout-Consensus
Assemblers: ARACHNE, PHRAP, CAP, TIGR, CELERA
Overlap: find potentially overlapping reads
Layout: merge reads into contigs and contigs into supercontigs
Consensus: derive the DNA sequence and correct read errors
..ACGATTACAATAGGTT..
![Page 22: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/22.jpg)
Overlap
• Find the best match between the suffix of one read and the prefix of another
• Due to sequencing errors, need to use dynamic programming to find the optimal overlap alignment
• Apply a filtration method to filter out pairs of fragments that do not share a significantly long common substring
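One way to sketch the dynamic-programming step is a semi-global alignment in which skipping the start of the first read and the end of the second read is free. The scoring values (match +1, mismatch -1, gap -2) and the name `overlap_alignment` are assumptions for illustration; the slides do not specify a scoring scheme.

```python
def overlap_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Best alignment of a suffix of `a` with a prefix of `b`.

    Returns (score, j), where j is the number of bases of `b` in the overlap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][0] = 0: the overlap may start anywhere in `a` at no cost.
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap                 # `b` must be used from its start
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # `a` must be used up to its end; `b` may stop anywhere.
    best_j = max(range(cols), key=lambda j: score[-1][j])
    return score[-1][best_j], best_j


print(overlap_alignment("AAAACGT", "CGTTTTT"))        # exact 3-base overlap
print(overlap_alignment("GGGGACGTAC", "ACCTACTTTT"))  # 6 bases, one mismatch
```

The table has len(a) × len(b) cells per pair of reads, which is why the slide's filtration step matters before running it on all pairs.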
![Page 23: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/23.jpg)
Overlapping Reads
• Sort all k-mers in reads (k ~ 24)
• Find pairs of reads sharing a k-mer
• Extend to full alignment, throw away if not >95% similar
TAGATTACACAGATTAC
|||||||||||||||||
TAGATTACACAGATTAC
What is a k-mer?
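A k-mer is simply a substring of length k. The sketch below finds candidate read pairs that share at least one k-mer. It uses a hash index rather than the sorting mentioned on the slide, which is a substitution made here for brevity. The name `candidate_pairs` and the small k are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(reads, k):
    """Return pairs of read indices that share at least one k-mer."""
    index = defaultdict(set)                  # k-mer -> ids of reads containing it
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(read_id)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


reads = ["TAGATTACACAG", "TTACACAGATTAC", "GGGGGGGGGGGG"]
print(candidate_pairs(reads, k=6))
```

Only the pairs returned here would go on to the full alignment step.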
![Page 24: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/24.jpg)
Overlapping Reads and Repeats
• A k-mer that appears N times initiates N^2 comparisons
• For an Alu that appears 10^6 times → 10^12 comparisons: too much
• Solution: discard all k-mers that appear more than t × Coverage (t ~ 10)
Alu elements are the most abundant transposable elements in the human genome
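The filter can be sketched as a count followed by a threshold. The function name `usable_kmers` and the toy reads are illustrative, and `coverage` is passed in as a known number rather than estimated.

```python
from collections import Counter


def usable_kmers(reads, k, coverage, t=10):
    """Keep only k-mers that occur at most t * coverage times."""
    counts = Counter(read[i:i + k]
                     for read in reads
                     for i in range(len(read) - k + 1))
    limit = t * coverage
    return {kmer for kmer, n in counts.items() if n <= limit}


# Toy data: one k-mer seen 3 times, one repeat-like k-mer seen 50 times.
reads = ["ACGT"] * 3 + ["AAAA"] * 50
print(usable_kmers(reads, k=4, coverage=2))   # limit = 10 * 2 = 20

# The quadratic blow-up from the slide: 10^6 occurrences -> 10^12 comparisons.
print((10 ** 6) ** 2 == 10 ** 12)
```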
![Page 25: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/25.jpg)
Finding Overlapping Reads
Create local multiple alignments from the overlapping reads
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA
![Page 26: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/26.jpg)
Finding Overlapping Reads (cont'd)
• Correct errors using multiple alignment
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
Column with a substitution:
  before: C: 20  C: 35  T: 30  C: 35  C: 40
  after:  C: 20  C: 35  C: 0   C: 35  C: 40
Column with a deletion:
  before: A: 15  A: 25  A: 40  A: 25  -
  after:  A: 15  A: 25  A: 40  A: 25  A: 0
• Score alignments
• Accept alignments with good scores
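One plausible reading of the numbers above is a score-weighted vote per alignment column. The sketch below assumes the numbers are per-read base scores (the slide does not say what they are) and treats a gap as a base with score 0. `consensus_base` is an illustrative name.

```python
from collections import defaultdict


def consensus_base(column):
    """Pick the base with the highest total score in one alignment column.

    `column` is a list of (base, score) pairs, one per read.
    """
    weight = defaultdict(int)
    for base, score in column:
        weight[base] += score
    return max(weight, key=weight.get)


substitution = [("C", 20), ("C", 35), ("T", 30), ("C", 35), ("C", 40)]
deletion = [("A", 15), ("A", 25), ("A", 40), ("A", 25), ("-", 0)]
print(consensus_base(substitution), consensus_base(deletion))
```

In both columns the minority call is outvoted, which matches the corrected rows on the slide.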
![Page 27: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/27.jpg)
Layout
• Repeats are a major challenge
• Do two aligned fragments really overlap, or are they from two copies of a repeat?
• Solution: repeat masking – hide the repeats!!!
• Masking results in high rate of misassembly (up to 20%)
• Misassembly means a lot more work at the finishing step
![Page 28: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/28.jpg)
Repeats, Errors, and Contig Lengths
• Repeats shorter than read length are OK
• Repeats with more base pair differences than sequencing error rate are OK
• To make a smaller portion of the genome appear repetitive, try to:
  – Increase read length
  – Decrease sequencing error rate
![Page 29: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/29.jpg)
De Bruijn graph assembly
![Page 32: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/32.jpg)
de Bruijn Graph Construction
• D_k = (V, E)
  • V = all length-k subfragments
  • E = directed edges between consecutive subfragments
  • Nodes overlap by k-1 words
• Locally constructed graph reveals the global sequence structure
  • Overlaps between sequences implicitly computed
Original fragment: It was the best of
Directed edge: It was the best → was the best of
de Bruijn, 1946; Idury and Waterman, 1995; Pevzner, Tang, Waterman, 2001
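The construction can be sketched directly from this definition, with words playing the role of bases. The function name `word_de_bruijn` is illustrative. Nodes are tuples of k consecutive words, and edges link consecutive windows of the same fragment.

```python
from collections import defaultdict


def word_de_bruijn(fragments, k):
    """Build D_k: nodes are k-word windows, edges join consecutive windows."""
    edges = defaultdict(set)
    for fragment in fragments:
        words = fragment.split()
        windows = [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]
        for a, b in zip(windows, windows[1:]):
            edges[a].add(b)                   # a and b overlap by k-1 words
    return edges


graph = word_de_bruijn(["It was the best of", "was the best of times,"], k=4)
for node, targets in graph.items():
    for target in targets:
        print(" ".join(node), "->", " ".join(target))
```

No pairwise comparison of fragments is needed: two fragments become connected just by producing the same node.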
![Page 33: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/33.jpg)
No need to compute overlaps!
• Can this really work?
• How do we choose a value for k?
  – Needs to be big enough to be unique
  – But repeats make it impossible to use such a large k, because entire reads are not unique
  – So pick k to be “big enough”
![Page 35: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/35.jpg)
de Bruijn Graph Assembly
Nodes:
It was the best
was the best of
the best of times,
best of times, it
of times, it was
times, it was the
it was the worst
was the worst of
the worst of times,
worst of times, it
it was the age
was the age of
the age of wisdom,
age of wisdom, it
of wisdom, it was
wisdom, it was the
the age of foolishness
• A unique Eulerian tour of the graph reconstructs the original text
• If a unique tour does not exist, try to simplify the graph as much as possible
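A minimal sketch of reconstruction by Eulerian path, using Hierholzer's algorithm. Note the assumption: this uses the common formulation in which each k-mer is an edge between its (k-1)-mer prefix and suffix, which differs slightly from the node-centric definition on the construction slide. The code does not verify that an Eulerian path exists, and DNA letters are used instead of words for brevity. All function names are illustrative.

```python
from collections import defaultdict


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    balance = defaultdict(int)                # out-degree minus in-degree
    for u, targets in graph.items():
        balance[u] += len(targets)
        for v in targets:
            balance[v] -= 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n in sorted(balance) if balance[n] == 1), min(balance))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: node is finished
    return path[::-1]


def assemble(kmers):
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])     # edge: prefix -> suffix
    path = eulerian_path(graph)
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(["GAGG", "GTCG", "GGCT", "TCGA", "AGGC", "CGAG"]))
```

In this toy input every (k-1)-mer is unique, so the tour is unique; with repeats, several tours may exist, which is the situation the slide's simplification step addresses.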
![Page 36: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/36.jpg)
de Bruijn Graph Assembly
Simplified nodes:
It was the best of times, it
of times, it was the
it was the worst of times, it
it was the age of
the age of wisdom, it was the
the age of foolishness
• A unique Eulerian tour of the graph reconstructs the original text
• If a unique tour does not exist, try to simplify the graph as much as possible
![Page 37: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/37.jpg)
Example
TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG
AGTCGAG CTTTAGA CGATGAG CTTTAGA GTCGAGG TTAGATC ATGAGGC GAGACAG GAGGCTC ATCCGAT AGGCTTT GAGACAG AGTCGAG TAGATCC ATGAGGC TAGAGAA TAGTCGA CTTTAGA CCGATGA TTAGAGA CGAGGCT AGATCCG TGAGGCT AGAGACA TAGTCGA GCTTTAG TCCGATG GCTCTAG TCGACGC GATCCGA GAGGCTT AGAGACA TAGTCGA TTAGATC GATGAGG TTTAGAG GTCGAGG TCTAGAT ATGAGGC TAGAGAC AGGCTTT ATCCGAT AGGCTTT GAGACAG AGTCGAG TTAGATT ATGAGGC AGAGACA GGCTTTA TCCGATG TTTAGAG CGAGGCT TAGATCC TGAGGCT GAGACAG AGTCGAG TTTAGATC ATGAGGC TTAGAGA GAGGCTT GATCCGA GAGGCTT GAGACAG
Velvet / Curtain
![Page 38: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/38.jpg)
Velvet / Curtain 09.03.12
GTCG (1x)
Example
Read: GTCGAGG
![Page 39: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/39.jpg)
GTCG (1x)
TCGA (1x)
Example
Read: GTCGAGG
![Page 40: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/40.jpg)
GTCG (1x)
TCGA (1x)
CGAG (1x)
Example
Read: GTCGAGG
![Page 41: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/41.jpg)
GTCG (1x)
TCGA (1x)
CGAG (1x)
GAGG (1x)
Example
Read: GTCGAGG
![Page 42: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/42.jpg)
Example New read: CGAGGCT
GTCG (1x)
TCGA (1x)
CGAG (2x)
GAGG (1x)
![Page 43: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/43.jpg)
GTCG (1x)
TCGA (1x)
CGAG (2x)
GAGG (2x)
Example
Read: CGAGGCT
![Page 44: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/44.jpg)
GTCG (1x)
TCGA (1x)
CGAG (2x)
GAGG (2x)
AGGC (1x)
Example
Read: CGAGGCT
![Page 45: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/45.jpg)
GTCG (1x)
TCGA (1x)
GGCT (1x)
CGAG (2x)
GAGG (2x)
AGGC (1x)
Example
Read: CGAGGCT
![Page 46: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/46.jpg)
Example
New read: TCGACGC
GTCG (1x)
TCGA (2x)
CGAG (2x)
GAGG (2x)
AGGC (1x)
![Page 47: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/47.jpg)
GTCG (1x)
TCGA (2x)
CGAG (2x) CGAC (1x)
GAGG (2x) GACG (1x)
AGGC (1x) ACGC (1x)
Example
Read: TCGACGC
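The counting shown on the slides above can be sketched in a few lines of Python. This is a minimal illustration, not Velvet's actual implementation: the function name `count_kmers` is hypothetical, k = 4 is taken from the example, and reverse-complement handling (which a real assembler performs) is deliberately ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read, one base at a time.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# The three reads used in the example slides.
reads = ["GTCGAGG", "CGAGGCT", "TCGACGC"]
kmer_counts = count_kmers(reads, k=4)

for kmer, n in kmer_counts.items():
    print(f"{kmer} ({n}x)")
```

A k-mer seen in several reads simply gets a higher count, which is the coverage figure written next to each k-mer on the slides.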
![Page 48: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/48.jpg)
AGAT (8x)
ATCC (7x)
TCCG (7x)
CCGA (7x)
CGAT (6x)
GATG (5x)
ATGA (8x)
TGAG (9x)
GATC (8x)
GATT (1x)
TAGT (3x)
AGTC (7x)
GTCG (9x)
TCGA (10x)
GGCT (11x)
TAGA (16x)
AGAG (9x)
GAGA (12x)
GACA (8x)
ACAG (5x)
GCTT (8x)
GCTC (2x)
CTTT (8x)
CTCT (1x)
TTTA (8x)
TCTA (2x)
TTAG (12x)
CTAG (2x)
AGAC (9x)
AGAA (1x)
CGAG (8x)
CGAC (1x)
GAGG (16x)
GACG (1x)
AGGC (16x)
ACGC (1x)
Example
etc…
![Page 49: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/49.jpg)
TAGTCGA
AGAGA TAGA
AGAT
GCTTTAG
GCTCTAG
AGACAG
AGAA
CGAG
CGACGC
GAGGCT
GATCCGATGAG
GATT
Example
After simplification…
GGCT
![Page 50: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/50.jpg)
Example
Tips removed…
TAGTCGA
AGAGA TAGA
AGAT
GCTTTAG
GCTCTAG
AGACAG
CGAG
GAGGCT
GATCCGATGAG
GGCT
![Page 51: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/51.jpg)
TAGTCGA
AGAGA TAGA
AGAT
GCTTTAG AGACAG
CGAG
GAGGCT
GATCCGATGAG
GGCT
Example
Bubbles removed… by TourBus
![Page 52: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/52.jpg)
TAGTCGAG AGAGACAG
AGATCCGATGAG
GAGGCTTTAGA
Example
Final simplification…
![Page 53: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/53.jpg)
Example
Final simplification…
Graph nodes: TAGTCGAG, AGAGACAG, AGATCCGATGAG, GAGGCTTTAGA
One possible walk through the graph ...
TAGTCGAG GAGGCTTTAGA AGATCCGATGAG GAGGCTTTAGA AGAGACAG
TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG
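To see how a walk through the simplified graph spells out a sequence, here is a small Python sketch. It assumes that consecutive nodes overlap by k-1 = 3 bases (k = 4, as in the example); the helper name `spell_walk` is hypothetical and not part of Velvet.

```python
def spell_walk(nodes, k):
    """Merge node sequences along a walk; consecutive nodes share k-1 bases."""
    overlap = k - 1
    sequence = nodes[0]
    for node in nodes[1:]:
        # The end of what we have so far must match the start of the next node.
        if sequence[-overlap:] != node[:overlap]:
            raise ValueError(f"{node} does not overlap the walk so far")
        sequence += node[overlap:]
    return sequence


# The walk from the slide: GAGGCTTTAGA is visited twice.
walk = ["TAGTCGAG", "GAGGCTTTAGA", "AGATCCGATGAG", "GAGGCTTTAGA", "AGAGACAG"]
assembled = spell_walk(walk, k=4)
print(assembled)
```

Because the node GAGGCTTTAGA is traversed twice, it appears twice in the assembled sequence. Other walks through the same graph are possible, which is why this is only one candidate reconstruction.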
![Page 54: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/54.jpg)
Now we have created a draft assembly made of contigs
But this is not sufficient to understand the characteristics of a genome
Contigs
Scaffolds
Reads
‘De Bruijn’ assembly
To go further, we have to talk about paired-end sequencing technology
![Page 55: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/55.jpg)
Paired-end Sequencing
![Page 56: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/56.jpg)
Scaffolding
Contigs
Scaffolds
(An assembly)
Reads ‘De Bruijn’ assembly
“Captured” gaps caused by repeats. Represented by “NNN” in assembly
Join contigs using evidence from paired end data
Align reads to DeBruijn contigs
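The idea of representing captured gaps with "N" can be sketched as follows. This is only an illustration under strong assumptions: the contig order and the gap sizes are taken as already inferred from the paired-end evidence (a real scaffolder estimates them from read-pair alignments and the library insert size), and the function name `build_scaffold`, the contigs, and the gap size are hypothetical.

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered contigs, filling each estimated gap with a run of 'N'."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)  # unknown bases between the two contigs
        parts.append(contig)
    return "".join(parts)


# Two contigs assumed to be linked by read pairs, with an estimated 3-base gap.
scaffold = build_scaffold(["TAGTCGAG", "AGATCCGATGAG"], [3])
print(scaffold)
```

The scaffold is longer than either contig, but the bases inside the gap remain unknown until they are filled in a finishing step.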
![Page 57: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/57.jpg)
Scaffolding
![Page 58: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/58.jpg)
SUPERSCAFFOLDING!!!
![Page 59: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/59.jpg)
A “real” protocol
1. Retrieve reads
2. Quality check of reads
3. Trimming and filtering
4. Assembly
5. Using paired-end for scaffolding
6. Check the genome quality
Reads
Overlap
Local Multiple Alignment
Contigs
Scaffolding
Alignment Scoring
Finishing
Assembly Problems:
- Repeats
- Chimerism
- Gaps
![Page 60: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/60.jpg)
Important Assembler Metrics
How can we assess the quality of a genome?
• Number of large contigs
• Total size
• Coverage
• Average length
• N50
• Longest contig
• % genome assembled
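Of these metrics, N50 is the least intuitive: it is the length of the contig at which, going from the longest contig downwards, the running total first reaches half of the total assembly size. A minimal Python sketch follows; the function name `n50` and the contig lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths: total 300, so half is 150.
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
print("N50:", n50(contig_lengths))           # 80 + 70 = 150 reaches half
print("Longest contig:", max(contig_lengths))
print("Total size:", sum(contig_lengths))
```

A higher N50 means that more of the assembly sits in long contigs, but it says nothing about correctness, which is why it is read together with the other metrics in the list.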
![Page 61: Le#Reads#devo#essere#traate per# trasformare#i#da in ... · Greedy#Reconstruc1on# It was the best of of times, it was the best of times, it was times, it was the worst was the best](https://reader036.vdocument.in/reader036/viewer/2022070917/5fb74d130d00ee1fed466b06/html5/thumbnails/61.jpg)
How can we understand if we performed a good assembly?
| Species | Genome size (Mb) | N50 scaffold index | N50 scaffold size (Mb) | # scaffolds | N50 contig size (Kb) | Sequencing technology | Reference |
|---|---|---|---|---|---|---|---|
| Melon | 450 | 26 | 4.678 | 1,594 | 18.2 | 454, Sanger | this report |
| Potato | 844 | 121 | 1.782 | 2,043 | 31.4 | Illumina, 454, Sanger | The Potato Genome Sequencing Consortium 2011 |
| Apple | 743 | 102 | 1.542 | 1,629 | 13.4 | Sanger, 454 | Velasco et al 2010 |
| Fragaria | 240 | n.a. | 1.361 | 3,263 | n.a. | 454, Illumina, SOLiD | Shulaev et al 2011 |
| Cucumber | 367 | 59 | 1.144 | 47,837 | 19.8 | Illumina, Sanger | Huang et al 2009 |
| Brassica rapa | 529 | n.a. | 1.97 | n.a. | 27.3 | Illumina | The Brassica rapa Genome Sequencing Project Consortium 2011 |
| Cacao | 430 | 178 | 0.47 | 4,792 | 19.8 | 454 | Argout et al 2011 |
| Date palm | 658 | n.a. | 0.03 | 57,277 | 6.4 | Illumina | Al-Dous et al 2011 |