Accelerating a BWT-based exact search on multi-GPU heterogeneous computing platforms
David Nogueira
Accelerating a BWT-based exact search on multi-GPU heterogeneous computing platforms
David Alberto Baião da Constantina Jácome Nogueira
22/11/2014 1 Accelerating a BWT-based exact search on multi-GPU
heterogeneous computing platforms
Thesis to obtain the Master of Science Degree in Electrical and Computer Engineering
November 14, 2014
Supervisor: Doctor Nuno Filipe Valentim Roma
Co-supervisor: Doctor Pedro Filipe Zeferino Tomás
Examination Committee
Chairperson: Doctor Nuno Cavaco Gomes Horta
Supervisor: Doctor Nuno Filipe Valentim Roma
Member of the Committee: Doctor Luís Manuel Silveira Russo
Index
• Introduction
  • Motivation
  • Problem definition
  • Objectives
• Algorithmic background
• Proposed solution
• Experimental evaluation
• Conclusions and Future Work
Motivation
• Exact string matching is of extreme importance in several domains:
  • Bioinformatics;
  • Pattern recognition;
  • Document matching and text mining;
  • Plagiarism detection;
  • Intrusion detection systems (IDS);
  • Image and signal processing, etc.
Problem definition
• Exact string matching
• Input:
  • Reference string (length m)
  • Short string, called pattern or query (length n, n << m)
• Output:
  • List of all occurrences of the pattern in the reference string
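As a point of reference, the problem statement can be illustrated with a minimal online (index-free) search. This Python sketch is not part of the original slides: the function name `find_all` is hypothetical, and reporting 0-based start positions is an assumption made here for illustration.

```python
def find_all(reference: str, pattern: str) -> list[int]:
    """Return the 0-based start position of every occurrence of pattern."""
    hits = []
    start = reference.find(pattern)
    while start != -1:
        hits.append(start)
        # Restart one character later so that overlapping matches are kept.
        start = reference.find(pattern, start + 1)
    return hits


print(find_all("mississippi", "ssi"))  # [2, 5]
```

Each query scans the reference again, which is the cost that the indexed approach described in the following slides is meant to avoid.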
Objectives
• Development of a pattern matching tool, BowMapCL:
  • for parallel offline exact search
  • based on the BWT and the FM-Index
• Mainly targeting GPU cards
  • Scale throughput with the number of GPUs
• Heterogeneous platforms
  • e.g., CPUs and GPUs
Index
• Introduction
• Algorithmic background
  • Indexed string matching
  • How to create the index?
  • How to search with the index?
• Proposed solution
• Experimental evaluation
• Conclusions and Future Work
Indexed string matching
• Online search: sequentially searching for the pattern directly in the reference text, without any supporting data structure;
• Offline search: an indexed approach that takes as input a previously computed data structure, called the index:
  • leads to a reduction in the search execution time;
• Index-based approaches:
  • Hash tables;
  • Suffix trees;
  • Suffix arrays;
  • Burrows-Wheeler Transform and FM-index (chosen approach).
How to create the index? Algorithmic background: Burrows-Wheeler Transform
BWT(mississippi$) = ipssm$pissii
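The transform on this slide can be reproduced with a short Python sketch. It uses the textbook construction (sort all rotations of the text, then read the last column), which is quadratic in memory and is shown only for illustration; it is not claimed to be how the presented tool builds its index. The function name `bwt` is hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler Transform of text, which must end with a unique '$'."""
    # All cyclic rotations, sorted lexicographically ('$' sorts before letters).
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("mississippi$"))  # ipssm$pissii
```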
How to create the index? Algorithmic background: FM-index
BWT(T’) = BWT(mississippi$) = ipssm$pissii
Two additional data structures are created using the BWT output:
• C vector: C(x) is the number of characters in the text that are lexicographically smaller than character x
• OCC matrix: Occ(x,y) is the number of occurrences of character x in the prefix BWT(T’)[1…y]
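The two structures defined above can be built directly from the BWT string. The following Python sketch stores the full, unsampled OCC matrix as one list per character; the function name `build_c_occ` and the dictionary layout are assumptions made for illustration, not the data layout of the presented tool.

```python
def build_c_occ(bwt: str):
    """Build the C vector and the full OCC matrix from a BWT string."""
    alphabet = sorted(set(bwt))
    # C[x]: number of characters in the text lexicographically smaller than x.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    # Occ[x][y]: occurrences of x in the prefix bwt[1..y]; Occ[x][0] is 0.
    Occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            Occ[c].append(Occ[c][-1] + (c == ch))
    return C, Occ


C, Occ = build_c_occ("ipssm$pissii")
print(C)             # {'$': 0, 'i': 1, 'm': 5, 'p': 6, 's': 8}
print(Occ["s"][12])  # 4
```

A full matrix needs one counter per character and per position, which is the memory cost that encoding and sampling of the OCC matrix aim to reduce.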
How to search with the index? Algorithmic background: Backward search algorithm
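Query string: “ssi”
Sorted rotations of mississippi$ (F is the first column, L is the last column):
 1 $mississippi
 2 i$mississipp
 3 ippi$mississ
 4 issippi$miss
 5 ississippi$m
 6 mississippi$
 7 pi$mississip
 8 ppi$mississi
 9 sippi$missis
10 sissippi$mis
11 ssippi$missi
12 ssissippi$mi
Initial interval: FIRST=1 LAST=12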
After matching “i”: FIRST=2 LAST=5
After matching “si”: FIRST=9 LAST=10
After matching “ssi”: FIRST=11 LAST=12 (two occurrences)
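The interval updates shown in this walkthrough follow the standard FM-index backward search rule: for each query character x, read from right to left, FIRST becomes C(x) + Occ(x, FIRST-1) + 1 and LAST becomes C(x) + Occ(x, LAST). The Python sketch below applies that rule to the slide's example and records the interval after every step. It uses a full OCC matrix and hypothetical function names, so it illustrates the algorithm rather than the tool's GPU kernel.

```python
BWT = "ipssm$pissii"  # BWT(mississippi$), as given on the slides


def build_c_occ(bwt):
    """C vector and full OCC matrix (1-based prefix lengths)."""
    alphabet = sorted(set(bwt))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    Occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            Occ[c].append(Occ[c][-1] + (c == ch))
    return C, Occ


def backward_search(pattern, bwt):
    """Return (intervals after each step, final interval or None)."""
    C, Occ = build_c_occ(bwt)
    first, last = 1, len(bwt)          # 1-based, inclusive row interval
    steps = [(first, last)]
    for ch in reversed(pattern):       # consume the query right to left
        if ch not in C:
            return steps, None         # character absent from the text
        first = C[ch] + Occ[ch][first - 1] + 1
        last = C[ch] + Occ[ch][last]
        steps.append((first, last))
        if first > last:
            return steps, None         # empty interval: no occurrence
    return steps, (first, last)


steps, interval = backward_search("ssi", BWT)
print(steps)     # [(1, 12), (2, 5), (9, 10), (11, 12)]
print(interval)  # (11, 12)
```

The number of steps depends only on the query length, not on the reference length, and every step touches positions of the OCC matrix that are far apart.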
How to search with the index? Algorithmic background: Output conversion with SA
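The figure for this slide is not reproduced in the transcript. As a hedged illustration of the idea named in the title: the rows of the final interval are positions in the sorted rotation matrix, and the suffix array (SA) maps each row back to a position in the reference. The Python sketch below builds the SA naively by sorting suffixes; the function names are hypothetical and the 0-based output convention is an assumption.

```python
def suffix_array(text: str) -> list[int]:
    """0-based start positions of the suffixes of text, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def locate(text: str, first: int, last: int) -> list[int]:
    """Convert a 1-based, inclusive row interval into text positions."""
    sa = suffix_array(text)
    return sorted(sa[row - 1] for row in range(first, last + 1))


# Rows 11 and 12 are the final interval for the query "ssi".
print(locate("mississippi$", 11, 12))  # [2, 5]
```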
Index
• Introduction
• Algorithmic background
• Proposed solution
  • Main challenges
  • General purpose hardware architectures comparison
  • Proposed parallelization approach
  • Multiple buffering and events dependencies graph
  • Index partitioning
  • Other enhancements
• Experimental evaluation
• Conclusions and Future Work
Main challenges
Main challenges to overcome:
• Unpredictable/irregular memory access pattern
• How to exploit the existing parallelism (architecture comparison)
• Cope with the reduced GPU memory space
• Scale the throughput with the number of GPUs
• Overlap computation and data transfers in GPUs
• Reduce the overhead of I/O operations
• Accept any alphabet (data type)
• Accept any input file size
General purpose hardware architectures comparison
CPU (Intel Haswell architecture)
• Faster clock frequency
• Low-latency access to memory (due to multiple levels of cache)
• Complex control for out-of-order and speculative execution
GPU (NVIDIA Kepler 110 architecture)
• Data parallel instructions
• High-throughput computation
Proposed parallelization approach
Asynchronous multi-threaded host execution
The approach uses:
• a producer-consumer scheme with multiple threads dedicated to each GPU device and to the involved I/O operations
• a multiple buffering technique
This makes it possible to:
• overlap the I/O operations from disk to main memory with the string matching procedure
• overlap the OpenCL data transfers between the host and the target devices with the kernel execution on those same devices.
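The producer-consumer scheme with multiple buffers can be sketched on the host side with plain Python threads. This is only an analogy under stated assumptions: it makes no OpenCL calls, the bounded queue stands in for the sets of buffers, the `search` callback stands in for the data transfer plus kernel execution, and all names (`run_pipeline`, `num_buffers`) are hypothetical.

```python
import queue
import threading


def run_pipeline(batches, search, num_buffers=2):
    """Overlap batch production (I/O) with batch consumption (search)."""
    # At most num_buffers batches are in flight, like a fixed set of buffers.
    slots = queue.Queue(maxsize=num_buffers)
    results = []

    def producer():
        for batch in batches:      # stands in for reading queries from disk
            slots.put(batch)       # blocks while every buffer is occupied
        slots.put(None)            # sentinel: no more work

    def consumer():
        while (batch := slots.get()) is not None:
            results.append(search(batch))  # stands in for transfer + kernel

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Count non-overlapping occurrences with str.count as a stand-in search.
count_hits = lambda batch: ["mississippi".count(q) for q in batch]
print(run_pipeline([["ssi", "pp"], ["x"]], count_hits))  # [[2, 1], [0]]
```

With `num_buffers=1` the producer must wait for the consumer to take each batch before queuing the next one; with two or more, the next batch can be prepared while the current one is being searched.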
Multiple buffering and events dependencies graph
a) Multiple buffering scheme is not used.
b) Multiple (double) buffering scheme is used.
Index partitioning
What if the index size happens to be larger than the device memory?
• With index partitioning, there are no longer any relevant restrictions:
  • on the host memory size (RAM),
  • nor on the global memory of the GPUs;
• The approach is independent of the size of the considered reference input.
Other enhancements
• To the kernel:
  • Local memory usage
  • Coalesced memory accesses
• To the index data structures:
  • OCC matrix bitmap encoding and sampling
  • SA sampling and on-the-fly computation
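To make the sampling idea concrete, the Python sketch below keeps Occ values only at every `step`-th position and recomputes the remainder on the fly by scanning a short stretch of the BWT. It covers sampling only (not the bitmap encoding), and the checkpoint layout, the function names and the `step` parameter are assumptions made for illustration rather than the tool's actual format.

```python
def build_checkpoints(bwt: str, step: int) -> dict[str, list[int]]:
    """Store Occ(x, y) only for y = 0, step, 2*step, ..."""
    alphabet = sorted(set(bwt))
    running = dict.fromkeys(alphabet, 0)
    checkpoints = {c: [0] for c in alphabet}
    for y, ch in enumerate(bwt, start=1):
        running[ch] += 1
        if y % step == 0:
            for c in alphabet:
                checkpoints[c].append(running[c])
    return checkpoints


def occ(bwt: str, checkpoints: dict, step: int, x: str, y: int) -> int:
    """Occurrences of x in bwt[1..y], computed from the nearest checkpoint."""
    block = y // step
    # Stored count up to the checkpoint, plus a scan of at most step-1 chars.
    return checkpoints[x][block] + bwt[block * step:y].count(x)


bwt = "ipssm$pissii"
cp = build_checkpoints(bwt, step=4)
print(occ(bwt, cp, 4, "s", 10))  # 4
```

A larger `step` shrinks the stored matrix but lengthens the scan performed for every lookup, which is the memory versus runtime trade-off examined in the experimental evaluation.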
Index
• Introduction
• Algorithmic background
• Proposed solution
• Experimental evaluation
  • OCC matrix bit encoding and index sampling
  • CPU-based tools comparison
  • CUDA GPU-based tools comparison
  • Index and query set size scalability
  • GPU scalability and multiple buffering study
  • Load balancing
• Conclusions and Future Work
OCC matrix bit encoding and index sampling
Memory footprint reduction and runtime performance variation
CPU-based tools comparison
CUDA GPU-based tools comparison (1): CUSHAW
CUDA GPU-based tools comparison (2): HPG-BWT
CUDA GPU-based tools comparison (3): SOAP3
Index and query set size scalability
Variable index size, fixed query set size (100 M queries), execution time:
• E. coli genome: 26.03 s
• Human chromosome 1: 28.95 s
Fixed index size (Human chromosome 1), variable query set size, execution time:
• 10 M queries: 2.95 s
• 100 M queries: 28.95 s
Almost the same performance with different index sizes and the same query set size (26.03 s versus 28.95 s).
The execution time scales with the query set size for a fixed index: 10× more queries take roughly 9.8× longer (2.95 s versus 28.95 s).
GPU scalability and multiple buffering study
The results show that:
• by using more than one set of buffers it is possible to overlap multiple concurrent operations in the same device:
  • resulting in a speed-up of around 2× when using two buffers;
• by using a higher number of buffers it is possible to further exploit the GPU spatial resources.
Load balancing
Index
• Introduction
• Algorithmic background
• Proposed solution
• Experimental evaluation
• Conclusions and Future Work
  • Conclusions
  • Future work
  • Contributions
Conclusions
• Major contributions of the proposed solution (BowMapCL):
  • Speed-ups between 10× and 15× over state-of-the-art CPU tools, and between 1.5× and 5× over state-of-the-art GPU tools;
  • Linear scaling of the offered throughput with the number of GPU devices;
  • Efficient load-balanced execution among the several devices of a heterogeneous platform;
  • Agnostic to the type of input data (e.g., DNA, proteins, text);
  • Supports any number of queries, with no limitation on the size of the input reference.
Future Work
• Extend the tool to non-exact search, or integrate it into the pipeline of a sequence alignment tool;
• Expand the tool to support text in other encodings (besides ASCII);
• Use MPI to run the application on multiple nodes;
• Allow automatic optimization of the execution on other OpenCL devices (besides GPUs).
Contributions
• David Nogueira, Pedro Tomás, Nuno Roma, “Burrows-Wheeler Transform based indexed exact search on a multi-GPU OpenCL platform”, International Conference on High Performance Computing & Simulation (HPCS 2014), Bologna, Italy, July 2014.
• David Nogueira, Pedro Tomás, Nuno Roma, “BowMapCL: Burrows-Wheeler Mapping on Multiple Heterogeneous Accelerators”, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2014 (under review).