resolve: generation of high‐performance sorting...
TRANSCRIPT
![Page 1: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/1.jpg)
Resolve: Generation of High‐Performance Sorting Architectures from High‐Level Synthesis
Janarbek Matai, Dustin Richmond, Dajung Lee, Zac Blair, Qiongzhi Wu, Amin Abazari, Ryan Kastner
Department of Computer Science and Engineering University of California, San Diego
02/23/2016
![Page 2: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/2.jpg)
Motivation: Sorting is everywhere
[2] Microsoft, “Xpress Compression Algorithm”[3] Brajovic et al. “A Sorting Image Sensor”, ICRA’96
Database [1] Compression/ image processing [2, 3]
~31,500 records (UCSD) <1000 records > 1 Billion records (Facebook)Map Reduce and Big Data[4,5]
[1]Casper et al. "Hardware acceleration of database operations", FPGA''14
[4] Dean et al. “MapReduce: simplified data processing on large clusters”, Communications of ACM’08[5] Herodotou et al. "Starfish: A Self‐tuning System for Big Data Analytics", CIDR'11
![Page 3: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/3.jpg)
Motivation: Sorting is everywhere
Database [1] Compression and image processing [2, 3] Map Reduce and Big Data[4,5]
~31,500 records (UCSD) <1000 records
Resolve provides FPGA version of std:: sortstd:: sort uses insertion sort for <16 std::sort uses quicksort + (heapsort) for larger arrays
> 1 Billion records (Facebook)
Resolve aims to generate sorting architectures based on user constraints (e.g., size, frequency, area,…)
![Page 4: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/4.jpg)
Resolve built upon HLS
Database [1] Compression and image processing [2, 3] Map Reduce and Big Data[4,5]
~31,500 records (UCSD) <1000 records > 1 Billion records (Facebook)
Resolve aims to generate sorting architectures based on user constraints (e.g., size, frequency, area,…)
Resolve provides FPGA version of std:: sortstd:: sort uses insertion sort for <16 std::sort uses quicksort + (heapsort) for larger arrays
![Page 5: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/5.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
![Page 6: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/6.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
Insertion Sort
Merge sort
BitonicSort…
Based on function composition:g([]) [sorted]f(g[],g[]) [sorted]
![Page 7: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/7.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
Merge Unit
Merge Unit
Merge Unit
![Page 8: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/8.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
Sorting Algorithms Sorting Primitive Kernels
Merge Sort Merge unit, compare‐swap
![Page 9: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/9.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
Sorting Algorithms Sorting Primitive Kernels
Merge Sort Merge unit, compare‐swap
Radix sort Prefix sum, Histogram
![Page 10: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/10.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
Sorting Algorithms Sorting Primitive Kernels
Merge Sort Merge unit, compare‐swap
Radix sort Prefix sum, Histogram
Insertion sort Compare‐swap, Smart Cell
![Page 11: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/11.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
Sorting Algorithms Sorting Primitive Kernels
Merge Sort Merge unit, compare‐swap
Radix sort Prefix sum, Histogram
Insertion sort Compare‐swap, Smart Cell
Bubble sort Compare‐swap,
Selection sort Compare‐swap
Quick Sort Compare‐swap, Prefix sum
Counting Sort Prefix sum, Histogram
Rank Sort Histogram, compare‐swap
Bitonic Sort Compare‐swap
Odd‐even transposition sort Compare‐swap (Merge)
![Page 12: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/12.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
Insertion Sort
Merge sort
BitonicSort…
Merge Unit
Smart‐Cell
Prefix sum…
Optimized Sorting Algorithms (Templates)
![Page 13: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/13.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
Insertion Sort
Merge sort
BitonicSort…
Merge Unit
Smart‐Cell
Prefix sum…
Optimized Sorting Algorithms (Templates)
![Page 14: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/14.jpg)
Optimized Sorting Algorithms
Optimized Sorting Algorithms = Restructured Code + directives
Optimized Sorting Algorithms is a library of HLS sorting code
![Page 15: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/15.jpg)
Optimized Sorting Algorithms
Why we do this way ?
![Page 16: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/16.jpg)
Insertion Sort
2 5 13
![Page 17: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/17.jpg)
Insertion Sort
2 5 13
![Page 18: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/18.jpg)
Insertion Sort
3 2 5 1
2 5 13
![Page 19: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/19.jpg)
Insertion Sort
3 2 5 1
2 5 13
![Page 20: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/20.jpg)
Insertion Sort
3 2 5 1
2 5 13
2 3 5 1
![Page 21: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/21.jpg)
Insertion Sort
3 2 5 1
2 5 13
2 3 5 1
![Page 22: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/22.jpg)
Insertion Sort
3 2 5 1
2 5 13
2 3 5 1
2 3 5 1
![Page 23: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/23.jpg)
Insertion Sort
3 2 5 1
2 5 13
2 3 5 1
2 3 5 1
![Page 24: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/24.jpg)
Insertion Sort
3 2 5 1
2 5 13
2 3 5 1
2 3 5 1
1 2 3 5
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
n = 32
![Page 25: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/25.jpg)
Insertion Sort
0
5000
10000
15000
20000
25000
30000
0 1 2 3 4 5 6Area
Throughput (Million samples/s)
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
n = 32
![Page 26: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/26.jpg)
Insertion Sort
0
5000
10000
15000
20000
25000
30000
0 1 2 3 4 5 6Area
Throughput (Million samples/s)
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
n = 32
![Page 27: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/27.jpg)
Insertion Sort
0
5000
10000
15000
20000
25000
30000
0 1 2 3 4 5 6Area
Throughput (Million samples/s)
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
#pragma HLS PIPELINE II=1
n = 32
![Page 28: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/28.jpg)
Insertion Sort
0
5000
10000
15000
20000
25000
30000
0 1 2 3 4 5 6Area
Throughput (Million samples/s)
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
#pragma HLS PIPELINE II=1
n = 32
![Page 29: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/29.jpg)
Insertion Sort
0
5000
10000
15000
20000
25000
30000
0 1 2 3 4 5 6Area
Throughput (Million samples/s)
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
#pragma HLS PIPELINE II=1
n = 32
![Page 30: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/30.jpg)
Insertion Sort
0
5000
10000
15000
20000
25000
30000
0 1 2 3 4 5 6Area
Throughput (Million samples/s)
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
#pragma HLS PIPELINE II=1 #pragma HLS PARTITION numbers
n = 32
![Page 31: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/31.jpg)
Insertion Sort
0
5000
10000
15000
20000
25000
30000
0 1 2 3 4 5 6Area
Throughput (Million samples/s)
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
#pragma UNROLL FACTOR= 4
n = 32
![Page 32: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/32.jpg)
Insertion Sort
0
5000
10000
15000
20000
25000
30000
0 1 2 3 4 5 6Area
Throughput (Million samples/s)
We want to be here
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
#pragma UNROLL FACTOR= 4
n = 32
![Page 33: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/33.jpg)
Insertion Sort
0
5000
10000
15000
20000
25000
30000
0 1 2 3 4 5 6Area
Throughput (Million samples/s)
void InsertionSort(DTYPE numbers[n]){
int i, j, index;for (i = 1; i < n; i++){
index = numbers[i];j = i;while ((j > 0) && (numbers[j‐1] > index)){
numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
#pragma UNROLL FACTOR= 4
n = 32
![Page 34: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/34.jpg)
251 3 25 13
Insertion Sort: Hardware
![Page 35: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/35.jpg)
251 3
Insertion Sort: Hardware
![Page 36: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/36.jpg)
51 32
Insertion Sort: Hardware
![Page 37: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/37.jpg)
51 3 2
Insertion Sort: Hardware
![Page 38: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/38.jpg)
1 3 2
Insertion Sort: Hardware
5
![Page 39: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/39.jpg)
1 25 3
Insertion Sort: Hardware
![Page 40: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/40.jpg)
251 3 25 13
void cell(hls::stream<DTYPE> &IN, hls::stream<DTYPE> &OUT){static DTYPE CURR_REG=0;DTYPE IN_A=IN.read();
if(IN_A>CURR_REG) {OUT.write(CURR_REG);CURR_REG = IN_A;
}else {
OUT.write(IN_A);}
}
Insertion Sort: Hardware
![Page 41: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/41.jpg)
251 3 25 13
void cell(hls::stream<DTYPE> &IN, hls::stream<DTYPE> &OUT){static DTYPE CURR_REG=0;DTYPE IN_A=IN.read();
if(IN_A>CURR_REG) {OUT.write(CURR_REG);CURR_REG = IN_A;
}else {
OUT.write(IN_A);
}}
void InsertionSort(hls::stream<DTYPE> &INPUT, hls::stream<DTYPE> &OUT){
#pragma HLS DATAFLOWhls::stream<DTYPE> out0("out0_stream");hls::stream<DTYPE> out1("out1_stream");
// Function calls;cell0(INPUT, out1);cell1(out1, out2);… … … cell31(out30, OUT);};
Insertion Sort: Hardware
![Page 42: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/42.jpg)
Insertion Sort: Software vs. Restructured
void InsertionSort(DTYPE numbers[n]){int i, j, index;for (i=1; i < n; i++){index = numbers[i];j = i;while ((j > 0)&& (numbers[j‐1] > index) ){
#pragma HLS PIPELINE II=1numbers[j] = numbers[j‐1];j = j ‐ 1;
}numbers[j] = index;
}}
void InsertionSort(hls::stream<DTYPE> &INPUT, hls::stream<DTYPE> &OUT){
#pragma HLS DATAFLOWhls::stream<DTYPE> out0("out0_stream");hls::stream<DTYPE> out1("out1_stream");
// Function calls;cell0(INPUT, out1);cell1(out1, out2);
cell31(out30, OUT);};
>>n2 2n
![Page 43: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/43.jpg)
Insertion Sort: Software vs. Restructured
void InsertionSort(hls::stream<DTYPE> &INPUT, hls::stream<DTYPE> &OUT){
#pragma HLS DATAFLOWhls::stream<DTYPE> out0("out0_stream");hls::stream<DTYPE> out1("out1_stream");
// Function calls;cell0(INPUT, out1);cell1(out1, out2);
cell31(out30, out2);};
Resolve library = optimized HLS sorting routines
Sorting Algorithms
Merge Sort
Radix sort
Insertion sort
Bubble sort
Selection sort
Quick Sort
Counting Sort
Rank Sort
Bitonic Sort
Odd‐even transposition sort
![Page 44: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/44.jpg)
Resolve: A Domain‐Specific Framework
Sorting Architecture GeneratorConstraints (e.g., size)
Custom Sorter
Insertion Sort
Merge sort
BitonicSort…
Merge Unit
Smart‐Cell
Prefix sum…
Optimized Sorting Algorithms (Templates)
![Page 45: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/45.jpg)
Resolve: A Domain‐Specific Framework
Insertion Sort
Merge sort
BitonicSort…
Merge Unit
Smart‐Cell
Prefix sum…
Optimized Sorting Algorithms (Templates)
(sort, sort) new sort
![Page 46: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/46.jpg)
Resolve: A Domain‐Specific Framework A[]
Insertion Radix Sort Bitonic sort….
Primitive :: = | Selection| Rank| Insertion| Bubble| Merge| Quick| Radix| Odd‐even| Bitonic
![Page 47: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/47.jpg)
Resolve: A Domain‐Specific Framework Primitive :: =
| Selection| Rank| Insertion| Bubble| Merge| Quick| Radix| Odd‐even| Bitonic
Sort :: = | Primitive| Merge Sort Sort
A[]
Insertion Radix Sort Bitonic sort….
A[]/2
Insertion
A[]/2
Insertion
Merge
![Page 48: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/48.jpg)
Resolve: A Domain‐Specific Framework A[]
Insertion Radix Sort Bitonic sort….
A[]/2
Radix Sort
A[]/2
Radix Sort
Merge
Primitive :: = | Selection| Rank| Insertion| Bubble| Merge| Quick| Radix| Odd‐even| Bitonic
Sort :: = | Primitive| Merge Sort Sort
![Page 49: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/49.jpg)
Resolve: A Domain‐Specific Framework A[]
Insertion Radix Sort Bitonic sort….
A[]/4
Insertion
A[]/4
Insertion
Merge
A[]/4
Insertion
A[]/4
Insertion
Merge
Merge
Primitive :: = | Selection| Rank| Insertion| Bubble| Merge| Quick| Radix| Odd‐even| Bitonic
Sort :: = | Primitive| Merge Sort Sort
![Page 50: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/50.jpg)
Resolve: A Domain‐Specific Framework
conf=Configuration(‘xc7vx1140’, ‘10’,…)
# Call data =[2,3,1]HW=sort_fpga(input_data=data);….
from components import RadixSortfrom components import InsertionSort….
Vivado HLS Tools
[1,2,3]
ResolveInput_data
![Page 51: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/51.jpg)
Resolve: Utilization vs. Performance
![Page 52: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/52.jpg)
Resolve: Utilization vs. Performance
BRAMs >2000
BRAMs >1000
BRAMs >500
BRAMs ~250
![Page 53: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/53.jpg)
Resolve: Utilization vs. PerformanceParameters:SizePipeline or unpipelineClock period
![Page 54: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/54.jpg)
Resolve: Utilization vs. PerformanceParameters:SizePipeline or unpipelineClock period
![Page 55: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/55.jpg)
Prototype of ResolveToolsVivado Suite and Vivado HLS
Hardware platform:VC709Xc7vx690tffg1761‐2RIFFA 2.2
Performance~ 0.5 GB/s at 125 MHz
Number of elements
FF / LUT BRAM IP Core II
RIFFA N / A 19472 / 16395 71 N / A
RIFFA + Resolve 16384 25118 / 20368 141 18434
RIFFA + Resolve 65536 26353 / 21707 333 73730
![Page 56: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/56.jpg)
Conclusion & Future Work
Resolve makes it easy to generate and use flexible (customized)sorting accelerators
Sorting accelerator can be integrated with RIFFA framework
Testing with http://sortbenchmark.org/Release of Resolve
![Page 57: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/57.jpg)
Backup slides
![Page 58: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/58.jpg)
Comparison to Previous Work
SN1 and SN3 are high performance with larger areaSN2 and SN4 balance area and performance SN5 is optimized for area.
![Page 59: Resolve: Generation of High‐Performance Sorting ...cseweb.ucsd.edu/~jmatai/presentations/FPGA2016_Resolve.pdfFeb 23, 2016 · Resolve built upon HLS Database [1] Compression and](https://reader036.vdocument.in/reader036/viewer/2022071219/6055856d2953fa66006ca66c/html5/thumbnails/59.jpg)
Insertion Sort: Hardware