exact alignment with fm-index on the intel xeon phi...
TRANSCRIPT
FM-Index Search Algorithm Hardware resources Results Conclusions
Exact Alignment with FM-Index on the Intel XeonPhi Knights Landing Processor
Jose M. Herruzo1 Sonia Gonzalez-Navarro1 Pablo Ibanez2
Vıctor Vinals2 Jesus Alastruey-Benede2 Oscar Plata1
1Departamento de Arquitectura de ComputadoresUniversidad de Malaga
2Grupo de Arquitectura de ComputadoresUniversidad de Zaragoza
Accelerator Architecture in Computational Biology andBioinformatics, 2018
FM-Index Search Algorithm Hardware resources Results Conclusions
Motivation
Genomic Sequencing
New SequencingTechnologies
Sequence alignment
FM-Index Search Algorithm Hardware resources Results Conclusions
MotivationIndices
Suffix Tree
Hash Tables
FM-Index
FM-Index Search Algorithm Hardware resources Results Conclusions
MotivationBWT & FM-Index
Burrows Wheeler Transform(BWT)
FM-Index
BowtieBWASOAP2
banana$
0 banana$
1 anana$b
2 nana$ba
3 ana$ban
4 na$bana
5 a$banan
6 $banana
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
FM-Index
Data structures
C arrayOcc table
Random memoryaccesses
Memory boundalgorithm
Backward Searchalgorithm
Algorithm: Backward Search Based on FM-index
Input: FM-index of T text (C & Occ), Q query, n:|T|, p:|Q|
Ouput: (sp,ep): Interval pointers of Q in T
begin
1: sp = C[Q[p]]
2: ep = C[Q[p]+1]
3: for i from p-1 to 1 step -1
4: sp = LF(Q[i],sp)
5: ep = LF(Q[i],ep)
6: end for
7: return (sp+1,ep)
end
2 LFop-chains
FM-Index Search Algorithm Hardware resources Results Conclusions
FM-Index
Lmem Lop
LLFM
MEMLF(Q{p-1},sp) OP
MEM OPLF(Q{p-1},ep)
MEM OP
MEM OP
LF(Q{p-2},sp)
LF(Q{p-2},ep)
idle time
Liter
FM-Index Search Algorithm Hardware resources Results Conclusions
FM-Index
Lmem Lop
LLFM
MEMLF(Q[1]{p-1},sp) OP
MEM OP
MEM OP
MEM OP
idle time
LF(Q[1]{p-1},ep)
LF(Q[1]{p-2},sp)
LF(Q[1]{p-2},ep)
MEMLF(Q[2]{p-1},sp) OP
MEM OPLF(Q[2]{p-1},ep)
OPLF(Q[Nq]{p-1},ep)
Liter
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
Sampled FM-Index
Occ
rOcc
d=4
SFM
d=4
d=4
Reduce Memory Footprint↓
Increase computingrequirements
From Ferragina, P. and Manzini, G.: ”Opportunistic DataStructures with Applications” (2000)
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
K-Step FM-Index
Searching severalsymbols per
iteration
Increasedmemoryfootprint
Reducednumber of LFoperations
C1 4=X
A C G T
2-C1
AA AC AG AT CA CC TTGT
1
1 2 3
1
4=X
rOccA
C
G
T
16=X2
1 2 3
2-rOccAA
AC
AG
AT
CA
CC
TT
1 2 3 n+1
1
2
3
n+1
M
BWT
G C
T C
A A
A G
G T
1
2
3
n+1
GC
TC
AA
AG
GT
2-BWT
From Chacon et al. (2015)
FM-Index Search Algorithm Hardware resources Results Conclusions
FM-Index Comparison
K1-32S
FM
K1-192
SFM
K1-448
SFM
K2-16S
FM
K2-128
SFM
K2-256
SFM
Implementation
10.0
12.5
15.0
17.5
20.0
22.5
25.0
27.5
30.0
SI (L
FMs/
KB
yte)
0.0 0.0 0.0
26.728.4
20.8
27.6 28.4
15.9
0.0 0.0 0.0
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
Bit-Vector FM-Index
1 Q{i} X 1 d
SFM row
m%d
F A R R A M S M…………C A F L G A O M D
1 1 d
bvSFM row
m%d
1 0 0 0 0 0 0 0……0 0 1 0 0 0 0 0 0
(m-1)/d +1
F A R M S D
F
Q{i} 1 dm%d
0 0 0 0 0 1 0 1……0 0 0 0 0 0 0 1 0
M
X 1 dm%d
0 0 0 0 0 0 0 0……0 0 0 0 0 0 0 0 1
D
FM-Index Search Algorithm Hardware resources Results Conclusions
Bit-Vector FM-Index
Advantages
Reduced data movement
Reduced computing requirements
Disadvantages
Increased memory footprint
FM-Index Search Algorithm Hardware resources Results Conclusions
FM-Index Comparison
K1-32S
FM
K1-192
SFM
K1-448
SFM
K2-16S
FM
K2-128
SFM
K2-256
SFM
K2-64b
vSFM
K2-96b
vSFM
Implementation
10
20
30
40
50
60
SI (L
FMs/
KB
yte)
27.6 28.4
15.9
26.7 28.4
20.8
56.3 56.3
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
Hardware Resources
Xeon Phi 7210 Xeon E5-2630V4(KNL) (Broadwell)
64 cores @ 1.3 GHz 10 cores @ 2.2 GHz4 threads per core 2 threads per core
400 GB/s (MCDRAM) 68 GB/s (DDR4)
Intel Xeon Phi
AVX 512 VectorialProcessing Units
High Bandwidth Memory(HBM)
Cache / Hybrid / Flat
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
ResultsXeon Broadwell RANDOM Benchmark
4 8 12 16 20 24 28 32Dependency chains per thread
30
35
40
45
50
55
60
Ban
dwid
th (G
B/s
)
37.93
43.67 43.46 43.51 43.22 43.49 43.16 43.4
44.31 44.77 45.24 45.74 46.58 46.35 45.19 45.76
1 Cache Block2 Cache Block
FM-Index Search Algorithm Hardware resources Results Conclusions
ResultsXeon Phi KNL RANDOM Benchmark
2 3 4 5 6 8 10 12 14 16Dependency chains per thread
100
150
200
250
300
Ban
dwid
th (G
B/s
)
147.8
197.7209.9 211.3 219.6 217.2 217.6 217.8 216.9 215.5
82.3
108.0 109.6 111.4 112.4 114.5 115.3 117.2 115.0 113.6
1 Cache Block MCDRAM1 Cache Block DDR
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
ResultsExperimental Setup
Human Genome (Around 3GBases).
20 million input queries generated by Mason.
200 symbols per Sequence.
Measurements started after loading the sequences into mainmemory.
LFM/s as performance evaluation metric
FM-Index Search Algorithm Hardware resources Results Conclusions
Results
k1-19
2SFM
k1-32
SFM
k2-12
8SFM
k2-16
SFM
k2-96
bvSFM
k2-64
bvSFM
Implementation
0
2
4
6
8
10
12
Perf
orm
ance
(GLF
M/s
)
1.7
4.3
3.2
5.15.9
11.6
0.61.1 1.0 1.2
1.8 2.0
Theoretical limitsIntel Xeon Phi 7210Intel Xeon E5-2630v4
FM-Index Search Algorithm Hardware resources Results Conclusions
ResultsRoofline (Broadwell)
0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14Search Intensity (LFM/Byte)
0
2
4
6
8
10
GLF
M/s
STREAM Peak Bandwidth
RANDOM Peak Bandwidth
Peak LF Performance (8 GLFM/s)
Previous CPU Performance (0.5 GLFM/s)
Previous GPU Performance (3.8 GLFM/s)
Tesla P100 NVBIO Match (2.7 GLFM/s)
K2-16SFM (1.23 GLFM/s)K2-64bvSFM (1.97 GLFM/s)
FM-Index Search Algorithm Hardware resources Results Conclusions
ResultsRoofline (Knights Landing)
0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14Search Intensity (LFM/Byte)
0.0
2.5
5.0
7.5
10.0
12.5
15.0
17.5
GLF
M/s
STRE
AM
Pea
k Ba
ndw
idth
RANDOM Pe
ak B
andw
idth
Peak LF Performance (15.10 GLFM/s)
Previous CPU Performance (0.5 GLFM/s)
Previous GPU Performance (3.8 GLFM/s)Tesla P100 NVBIO Match (2.7 GLFM/s)
K2-16SFM (5.1G GLFM/s)K2-64bvSFM (11.6 GLFM/s)
FM-Index Search Algorithm Hardware resources Results Conclusions
Outline
1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index
2 Hardware resources
3 ResultsRANDOM BenchmarkThroughput results
4 Conclusions
FM-Index Search Algorithm Hardware resources Results Conclusions
Conclusions
Reduced data movement
Efficiently used memory bandwidth
Great performance improvement
Up to 11.7 G LFM/s3x faster than previous GPU versions
FM-Index Search Algorithm Hardware resources Results Conclusions
Exact Alignment with FM-Index on the Intel XeonPhi Knights Landing Processor
Jose M. Herruzo1 Sonia Gonzalez-Navarro1 Pablo Ibanez2
Vıctor Vinals2 Jesus Alastruey-Benede2 Oscar Plata1
1Departamento de Arquitectura de ComputadoresUniversidad de Malaga
2Grupo de Arquitectura de ComputadoresUniversidad de Zaragoza
Accelerator Architecture in Computational Biology andBioinformatics, 2018