exact alignment with fm-index on the intel xeon phi...

32
FM-Index Search Algorithm Hardware resources Results Conclusions Exact Alignment with FM-Index on the Intel Xeon Phi Knights Landing Processor Jose M. Herruzo 1 Sonia Gonz´ alez-Navarro 1 Pablo Ib´ nez 2 ıctor Vi˜ nals 2 Jes´ us Alastruey-Bened´ e 2 Oscar Plata 1 1 Departamento de Arquitectura de Computadores Universidad de M´ alaga 2 Grupo de Arquitectura de Computadores Universidad de Zaragoza Accelerator Architecture in Computational Biology and Bioinformatics, 2018

Upload: others

Post on 02-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Exact Alignment with FM-Index on the Intel XeonPhi Knights Landing Processor

Jose M. Herruzo1 Sonia Gonzalez-Navarro1 Pablo Ibanez2

Vıctor Vinals2 Jesus Alastruey-Benede2 Oscar Plata1

1Departamento de Arquitectura de ComputadoresUniversidad de Malaga

2Grupo de Arquitectura de ComputadoresUniversidad de Zaragoza

Accelerator Architecture in Computational Biology andBioinformatics, 2018

Page 2: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Motivation

Genomic Sequencing

New SequencingTechnologies

Sequence alignment

Page 3: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

MotivationIndices

Suffix Tree

Hash Tables

FM-Index

Page 4: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

MotivationBWT & FM-Index

Burrows Wheeler Transform(BWT)

FM-Index

BowtieBWASOAP2

banana$

0 banana$

1 anana$b

2 nana$ba

3 ana$ban

4 na$bana

5 a$banan

6 $banana

Page 5: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 6: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 7: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index

Data structures

C arrayOcc table

Random memoryaccesses

Memory boundalgorithm

Backward Searchalgorithm

Algorithm: Backward Search Based on FM-index

Input: FM-index of T text (C & Occ), Q query, n:|T|, p:|Q|

Ouput: (sp,ep): Interval pointers of Q in T

begin

1: sp = C[Q[p]]

2: ep = C[Q[p]+1]

3: for i from p-1 to 1 step -1

4: sp = LF(Q[i],sp)

5: ep = LF(Q[i],ep)

6: end for

7: return (sp+1,ep)

end

2 LFop-chains

Page 8: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index

Lmem Lop

LLFM

MEMLF(Q{p-1},sp) OP

MEM OPLF(Q{p-1},ep)

MEM OP

MEM OP

LF(Q{p-2},sp)

LF(Q{p-2},ep)

idle time

Liter

Page 9: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index

Lmem Lop

LLFM

MEMLF(Q[1]{p-1},sp) OP

MEM OP

MEM OP

MEM OP

idle time

LF(Q[1]{p-1},ep)

LF(Q[1]{p-2},sp)

LF(Q[1]{p-2},ep)

MEMLF(Q[2]{p-1},sp) OP

MEM OPLF(Q[2]{p-1},ep)

OPLF(Q[Nq]{p-1},ep)

Liter

Page 10: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 11: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Sampled FM-Index

Occ

rOcc

d=4

SFM

d=4

d=4

Reduce Memory Footprint↓

Increase computingrequirements

From Ferragina, P. and Manzini, G.: ”Opportunistic DataStructures with Applications” (2000)

Page 12: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 13: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

K-Step FM-Index

Searching severalsymbols per

iteration

Increasedmemoryfootprint

Reducednumber of LFoperations

C1 4=X

A C G T

2-C1

AA AC AG AT CA CC TTGT

1

1 2 3

1

4=X

rOccA

C

G

T

16=X2

1 2 3

2-rOccAA

AC

AG

AT

CA

CC

TT

1 2 3 n+1

1

2

3

n+1

M

BWT

G C

T C

A A

A G

G T

1

2

3

n+1

GC

TC

AA

AG

GT

2-BWT

From Chacon et al. (2015)

Page 14: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Comparison

K1-32S

FM

K1-192

SFM

K1-448

SFM

K2-16S

FM

K2-128

SFM

K2-256

SFM

Implementation

10.0

12.5

15.0

17.5

20.0

22.5

25.0

27.5

30.0

SI (L

FMs/

KB

yte)

0.0 0.0 0.0

26.728.4

20.8

27.6 28.4

15.9

0.0 0.0 0.0

Page 15: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 16: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Bit-Vector FM-Index

1 Q{i} X 1 d

SFM row

m%d

F A R R A M S M…………C A F L G A O M D

1 1 d

bvSFM row

m%d

1 0 0 0 0 0 0 0……0 0 1 0 0 0 0 0 0

(m-1)/d +1

F A R M S D

F

Q{i} 1 dm%d

0 0 0 0 0 1 0 1……0 0 0 0 0 0 0 1 0

M

X 1 dm%d

0 0 0 0 0 0 0 0……0 0 0 0 0 0 0 0 1

D

Page 17: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Bit-Vector FM-Index

Advantages

Reduced data movement

Reduced computing requirements

Disadvantages

Increased memory footprint

Page 18: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Comparison

K1-32S

FM

K1-192

SFM

K1-448

SFM

K2-16S

FM

K2-128

SFM

K2-256

SFM

K2-64b

vSFM

K2-96b

vSFM

Implementation

10

20

30

40

50

60

SI (L

FMs/

KB

yte)

27.6 28.4

15.9

26.7 28.4

20.8

56.3 56.3

Page 19: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 20: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Hardware Resources

Xeon Phi 7210 Xeon E5-2630V4(KNL) (Broadwell)

64 cores @ 1.3 GHz 10 cores @ 2.2 GHz4 threads per core 2 threads per core

400 GB/s (MCDRAM) 68 GB/s (DDR4)

Intel Xeon Phi

AVX 512 VectorialProcessing Units

High Bandwidth Memory(HBM)

Cache / Hybrid / Flat

Page 21: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 22: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 23: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsXeon Broadwell RANDOM Benchmark

4 8 12 16 20 24 28 32Dependency chains per thread

30

35

40

45

50

55

60

Ban

dwid

th (G

B/s

)

37.93

43.67 43.46 43.51 43.22 43.49 43.16 43.4

44.31 44.77 45.24 45.74 46.58 46.35 45.19 45.76

1 Cache Block2 Cache Block

Page 24: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsXeon Phi KNL RANDOM Benchmark

2 3 4 5 6 8 10 12 14 16Dependency chains per thread

100

150

200

250

300

Ban

dwid

th (G

B/s

)

147.8

197.7209.9 211.3 219.6 217.2 217.6 217.8 216.9 215.5

82.3

108.0 109.6 111.4 112.4 114.5 115.3 117.2 115.0 113.6

1 Cache Block MCDRAM1 Cache Block DDR

Page 25: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 26: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsExperimental Setup

Human Genome (Around 3GBases).

20 million input queries generated by Mason.

200 symbols per Sequence.

Measurements started after loading the sequences into mainmemory.

LFM/s as performance evaluation metric

Page 27: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Results

k1-19

2SFM

k1-32

SFM

k2-12

8SFM

k2-16

SFM

k2-96

bvSFM

k2-64

bvSFM

Implementation

0

2

4

6

8

10

12

Perf

orm

ance

(GLF

M/s

)

1.7

4.3

3.2

5.15.9

11.6

0.61.1 1.0 1.2

1.8 2.0

Theoretical limitsIntel Xeon Phi 7210Intel Xeon E5-2630v4

Page 28: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsRoofline (Broadwell)

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14Search Intensity (LFM/Byte)

0

2

4

6

8

10

GLF

M/s

STREAM Peak Bandwidth

RANDOM Peak Bandwidth

Peak LF Performance (8 GLFM/s)

Previous CPU Performance (0.5 GLFM/s)

Previous GPU Performance (3.8 GLFM/s)

Tesla P100 NVBIO Match (2.7 GLFM/s)

K2-16SFM (1.23 GLFM/s)K2-64bvSFM (1.97 GLFM/s)

Page 29: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

ResultsRoofline (Knights Landing)

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14Search Intensity (LFM/Byte)

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

GLF

M/s

STRE

AM

Pea

k Ba

ndw

idth

RANDOM Pe

ak B

andw

idth

Peak LF Performance (15.10 GLFM/s)

Previous CPU Performance (0.5 GLFM/s)

Previous GPU Performance (3.8 GLFM/s)Tesla P100 NVBIO Match (2.7 GLFM/s)

K2-16SFM (5.1G GLFM/s)K2-64bvSFM (11.6 GLFM/s)

Page 30: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Outline

1 FM-Index Search AlgorithmSampled FM-IndexK-Step FM-IndexBit-Vector FM-Index

2 Hardware resources

3 ResultsRANDOM BenchmarkThroughput results

4 Conclusions

Page 31: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Conclusions

Reduced data movement

Efficiently used memory bandwidth

Great performance improvement

Up to 11.7 G LFM/s3x faster than previous GPU versions

Page 32: Exact Alignment with FM-Index on the Intel Xeon Phi ...webdiis.unizar.es/~chus/pubs/2018_AACBB_FM_Index_presentac.pdf · FM-Index Search Algorithm Hardware resources Results Conclusions

FM-Index Search Algorithm Hardware resources Results Conclusions

Exact Alignment with FM-Index on the Intel XeonPhi Knights Landing Processor

Jose M. Herruzo1 Sonia Gonzalez-Navarro1 Pablo Ibanez2

Vıctor Vinals2 Jesus Alastruey-Benede2 Oscar Plata1

1Departamento de Arquitectura de ComputadoresUniversidad de Malaga

2Grupo de Arquitectura de ComputadoresUniversidad de Zaragoza

Accelerator Architecture in Computational Biology andBioinformatics, 2018