

TEMPLATE DESIGN © 2008

www.PosterPresentations.com

A Comparison-Free Sorting Algorithm

Saleh Abdel-hafeez¹ and Ann Gordon-Ross²

¹ Jordan University of Science and Technology, Irbid 22110, Jordan, [email protected]
² University of Florida, FL 36211, USA, [email protected]

Introduction

Sorting algorithms have been widely researched for decades due to the ubiquitous need for sorting in myriad application domains.

Much research has focused on iteratively moving data back and forth between comparison units and memory components, which are realized in the form of main memory, caches, or registers.

Current custom sorting ICs involve numerous shift, swap, and comparison operations, together with complicated control logic that does not scale well to big data. They also consume considerable power due to the substantial activity of data computations.

Due to the ever-increasing computational power of parallel multi-core CPU and GPU-based processing, much research has focused on harnessing these resources for efficient sorting.

A novel sorting algorithm is proposed that sorts data elements without any comparison operations between the elements (comparison-free sorting), with an overall sorting time of 2N clock cycles for N data elements. In addition, the algorithm supports parallel computation that utilizes existing parallel resources.

Results

- N = number of elements
- m = element binary bit size
- E = order Hamming representation of a binary number
- Technology: 90 nm TSMC, 1 V power supply
- Clock write cycle time: $\mathrm{CLK}_W = T_D + T_C + T_W$
- Clock read cycle time: $\mathrm{CLK}_R = T_C + T_R + T_B + T_S$
- SRAM: based on the 8T cell, storing the $(N \times K)^T$ matrix
- Parallel counter: N states, based on state look-ahead logic
- Binary-to-Hamming decoder: a one-hot decoder (m-bit input, N-bit output)
- Serial shift buffer: N registers, each m bits wide
- Clock frequency: 0.5 GHz (a 2 ns period) for N = K = 1024 elements and m = 10 bits
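As a rough check of what the 2N-cycle bound means at the reported operating point, the sketch below multiplies 2N cycles by the clock period. It assumes that write and read cycles both run at the single 0.5 GHz clock quoted above (the poster gives separate CLK_W and CLK_R expressions but only one clock figure); `sort_time_seconds` is a hypothetical helper name.

```python
def sort_time_seconds(n_elements: int, clock_hz: float) -> float:
    """Estimate total sort time as 2N clock cycles at one clock rate.

    Assumption: N write cycles followed by N read cycles, all at clock_hz.
    """
    cycles = 2 * n_elements
    period_s = 1.0 / clock_hz
    return cycles * period_s


t = sort_time_seconds(1024, 0.5e9)
print(f"{t * 1e6:.3f} us")  # 2048 cycles x 2 ns = 4.096 us
```

Under these assumptions, sorting N = 1024 elements takes about 4.1 µs; any extra pipeline or control cycles in the real design would add to this.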

Methods

The input bus is m bits wide, carrying $N = 2^m$ binary data elements. The memory stores the data elements in Hamming representation of size K = N (the lossless case), produced by a binary-to-Hamming conversion. For example, an input bus of m = 4 bits gives N = 16 elements and a lossless Hamming size of K = 16 bits:

$0111_2 \Rightarrow$ 0000000001000000
$0101_2 \Rightarrow$ 0000000000010000
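The binary-to-Hamming conversion is a one-hot encoding. The sketch below reproduces the two examples above; it assumes the convention implied by those examples and by the `H[i][Element[i]-1]` indexing in Figure 2, namely that value v sets bit v − 1 counted from the least significant bit, so valid values are 1..K. `binary_to_hamming` is a hypothetical name.

```python
def binary_to_hamming(value: int, k: int) -> str:
    """Return the K-bit one-hot ("Hamming") code of value as a bit string.

    Assumed convention: value v sets bit v - 1 (LSB = bit 0), so 1 <= v <= k.
    """
    if not 1 <= value <= k:
        raise ValueError(f"value must be in 1..{k}, got {value}")
    # Shift a single 1 into position v - 1 and zero-pad to K characters.
    return format(1 << (value - 1), f"0{k}b")


print(binary_to_hamming(0b0111, 16))  # 0000000001000000
print(binary_to_hamming(0b0101, 16))  # 0000000000010000
```

In hardware this corresponds to an m-to-N decoder; the string form is only for readability.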

The shift buffer stores the data in binary column format (m bits per element). Sorting multiplies the transposed Hamming data (memory) by the shift buffer (binary column). The multiplication operator is simple gated logic, since each bit in memory enables or disables the associated binary data element in the binary-column shift buffer.
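The gated "multiplication" can be sketched as follows. Each row j of the transposed Hamming matrix acts as a vector of enable bits over the binary column; AND-gating each m-bit element with its enable bit and OR-ing the results yields the element whose value is j + 1. This is an illustrative model, not the authors' circuit: it assumes distinct values in 1..K (so each row holds at most one 1), reads rows from the highest value down to match Figure 2, and simply skips empty rows. `gated_multiply_sort` is a hypothetical name.

```python
from typing import List


def gated_multiply_sort(elements: List[int], k: int) -> List[int]:
    """Sort distinct values in 1..k by gating a binary column with rows of E^T."""
    if any(not 1 <= v <= k for v in elements):
        raise ValueError(f"all values must be in 1..{k}")
    n = len(elements)
    # E: one one-hot row per element (bit v - 1 set for value v).
    e = [[1 if col == v - 1 else 0 for col in range(k)] for v in elements]
    # E^T: row j gathers bit j of every element's Hamming code.
    e_t = [[e[i][j] for i in range(n)] for j in range(k)]

    sorted_out = []
    for j in reversed(range(k)):  # highest value first, as in Figure 2
        row = e_t[j]
        if not any(row):
            continue  # no element has value j + 1
        # Gating: each enable bit passes or blocks its m-bit element,
        # and OR-ing the gated words yields the single selected element.
        word = 0
        for enable, value in zip(row, elements):
            word |= value if enable else 0
        sorted_out.append(word)
    return sorted_out


print(gated_multiply_sort([3, 1, 4, 2], 4))  # [4, 3, 2, 1]
```

No element is ever compared with another; order comes entirely from the row index of E^T.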


Figure 1: Sorting example considering a 4-bit data input bus.

[Figure panels: Input Buffers; Hamming Maximum Order Matrix (E), 1-bit elements; Transposed Hamming Maximum Order Matrix (E^T), 1-bit elements; Binary Matrix, 4-bit elements; Sorted Matrix, 4-bit elements. Example input elements: 3, 1, 4, 2.]
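A worked version of the Figure 1 example, reconstructed from the algorithm rather than read off the figure: it assumes input order 3, 1, 4, 2, the bit v − 1 convention, and K = N = 4 so the matrices stay small. Row i of E is the one-hot code of element i (columns indexed from bit 0), B is the binary column, and $E^T B$ places each element at the row matching its value:

$$
E = \begin{pmatrix} 0&0&1&0\\ 1&0&0&0\\ 0&0&0&1\\ 0&1&0&0 \end{pmatrix},\quad
B = \begin{pmatrix}3\\1\\4\\2\end{pmatrix},\quad
E^T B = \begin{pmatrix} 0&1&0&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&0&1&0 \end{pmatrix}
\begin{pmatrix}3\\1\\4\\2\end{pmatrix}
= \begin{pmatrix}1\\2\\3\\4\end{pmatrix}
$$

Reading the product from the last row upward gives the descending order produced by the Figure 2 pseudocode.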

[Figure panels: m = 10-bit input bus of sequential binary data numbers (N = 1024 distinct elements); Binary-to-Hamming Converter driving a K = 1024-bit bus; Transposed Hamming Maximum Order Matrix, K (1024-bit) × N (1024-bit), entries E_00 through E_KN; Serial Shift Buffer of N = 1024 registers, 10 bits each; Sorted Shift Buffer of K = 1024 registers, 10 bits each; blocks connected by 1024-bit buses.]

Figure 3: Block diagram of our sorting algorithm's hardware data path
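To connect the block diagram with the 2N-cycle claim, the following behavioural sketch counts one cycle per element written (converter, memory row, and shift-in) and one cycle per row of the transposed matrix read into the sorted buffer. It is a hypothetical model, not the authors' RTL: it assumes distinct values in 1..K and K = N, which is when N + K cycles equals 2N. `ComparisonFreeSorterModel` is an invented name.

```python
from typing import List


class ComparisonFreeSorterModel:
    """Cycle-counting sketch of the Figure 3 data path (illustrative only)."""

    def __init__(self, k: int):
        self.k = k
        self.shift_buffer: List[int] = []  # serial shift buffer (binary column)
        self.hamming_rows: List[int] = []  # one K-bit one-hot word per element
        self.cycles = 0

    def write(self, value: int) -> None:
        """Write phase: binary-to-Hamming conversion and shift-in, one cycle."""
        if not 1 <= value <= self.k:
            raise ValueError(f"value must be in 1..{self.k}")
        self.hamming_rows.append(1 << (value - 1))
        self.shift_buffer.append(value)
        self.cycles += 1

    def read_all(self) -> List[int]:
        """Read phase: one cycle per row j of E^T, highest value first."""
        sorted_buffer: List[int] = []
        for j in reversed(range(self.k)):
            word = 0
            for code, value in zip(self.hamming_rows, self.shift_buffer):
                if (code >> j) & 1:  # bit j of this element's code gates it
                    word |= value
            if word:  # empty rows carry no element
                sorted_buffer.append(word)
            self.cycles += 1
        return sorted_buffer


model = ComparisonFreeSorterModel(k=8)
for v in [3, 8, 1, 6, 2, 7, 5, 4]:
    model.write(v)
print(model.read_all())  # [8, 7, 6, 5, 4, 3, 2, 1]
print(model.cycles)      # 16, i.e. 2N with N = K = 8
```

If N < K, the read phase still visits all K rows in this model, so the cycle count becomes N + K rather than 2N.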


Input: integer Element[0 : n - 1]
Output: integer Sorted[0 : n - 1]
Hamming memory: Boolean H[0 : n - 1][0 : n - 1], initialized to zero
for i ← 0 to n - 1 do
    H[i][Element[i] - 1] ← 1
endfor
k ← 0
for j ← n - 1 downto 0 do
    for i ← 0 to n - 1 do
        if H[i][j] = 1 then
            Sorted[k] ← Element[i]
            k ← k + 1
        endif
    endfor
endfor

Figure 2: Pseudo code for our sorting algorithm assuming a uniprocessor system with no threading
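A direct Python translation of the Figure 2 pseudocode, useful for checking it on small inputs. It assumes every value lies in 1..n, as implied by the `H[i][Element[i] - 1]` indexing, and produces descending order because j runs from n − 1 down to 0. Since H has one row per element, duplicates are preserved. The inner loop over i is what, on our reading, the hardware replaces with parallel gating. `comparison_free_sort` is a hypothetical name.

```python
from typing import List


def comparison_free_sort(element: List[int]) -> List[int]:
    """Uniprocessor translation of the Figure 2 pseudocode (descending order)."""
    n = len(element)
    if any(not 1 <= v <= n for v in element):
        # Guard: a value of 0 would otherwise index column -1 silently.
        raise ValueError(f"all values must be in 1..{n}")
    # Hamming memory H[0:n-1][0:n-1], initialized to zero.
    h = [[0] * n for _ in range(n)]
    for i in range(n):
        h[i][element[i] - 1] = 1

    sorted_out = [0] * n
    k = 0
    for j in range(n - 1, -1, -1):  # scan Hamming columns from high to low
        for i in range(n):          # select every element whose bit j is set
            if h[i][j] == 1:
                sorted_out[k] = element[i]
                k += 1
    return sorted_out


print(comparison_free_sort([3, 1, 4, 2]))  # [4, 3, 2, 1]
print(comparison_free_sort([2, 2, 1]))     # [2, 2, 1]
```

In software this is O(n²) work, so the benefit claimed for the algorithm rests on the hardware evaluating each row of H in a single cycle.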