Software and Hardware Circular Buffer Operations
First presented in ENCM515 2005. There are 3 earlier lectures that are useful for midterm review.
M. R. Smith, ECE, University of Calgary
Canada
04/18/23 Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada
Tackled today
Circular buffer issues
  DCRemoval( )
  FIR( )
Coding a software circular buffer in C++ and TigerSHARC assembly code
Coding a hardware circular buffer
Where to next?
DCRemoval( )
Not as complex as FIR, but has many of the same requirements
Does an “implied” multiplication by a FIR coefficient of 1 and then does the sum
Easier to handle
You use the same ideas in optimizing FIR over Labs 2 and 3
Two issues: speed and accuracy
Develop suitable tests for the CPP code and check that the various assembly language versions satisfy the same tests
Memory intensive
Addition intensive
Loops for main code
FIFO implemented as circular buffer: “memory shuffle approach”
Set up time
In principle 1 cycle / instruction
2 + 4 instructions
First key element: Sum Loop, Order (N)
4 + N * 5 instructions
Second key element: Shift Loop, Order (log2N)
1 + 2 * log2N instructions
No J parallel shifter
Do it the M68000 way
Third key element: FIFO circular buffer, Order (N)
Update outgoing parameters: 6 instructions
Update FIFO: 3 + 6 * N instructions
Function return: 2 instructions
Next stage in improving code speed: software circular buffers

Stage                         Instructions    Was
Set up pointers to buffers    2
Insert values into buffers    4
SUM LOOP                      4 + N * 5
SHIFT LOOP                    1               1 + 2 * log2N
Update outgoing parameters    6
Update FIFO                   3 + 6 * N
Function return               2
---------------------------
Total                         23 + 11 N       22 + 11 N + 2 log2N

N = 128: instructions = 1430
1430 + 300 delay cycles = 1730 cycles
DCRemoval( )
If there are N points in the circular buffer, then this approach of moving the data from memory location to memory location requires
  N memory reads / N memory writes (possible data bus conflicts)
  2N memory address calculations
FIFO implemented as circular buffer
Uses the memory shuffle approach from Lab 1
NOTE: This approach can sometimes be the “fastest” (see later Labs)
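The memory shuffle approach can be sketched in C++ as follows. This is a minimal single-channel illustration, not the Lab 1 code: the names ShuffleFifo and dc_removal_shuffle, the length N = 8 and the use of a plain integer divide in place of the shift loop are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;   // hypothetical FIFO length

struct ShuffleFifo {
    int data[N] = {};          // data[0] is the oldest value, data[N - 1] the newest
};

// One DC-removal step using a memory-shuffle FIFO (illustrative sketch).
int dc_removal_shuffle(ShuffleFifo& fifo, int input) {
    fifo.data[N - 1] = input;                      // insert the newest value

    int sum = 0;                                   // SUM LOOP: order N additions
    for (std::size_t i = 0; i < N; ++i) {
        sum += fifo.data[i];
    }
    int average = sum / static_cast<int>(N);       // the assembly version uses shifts here

    // Update FIFO: every value moves down one slot, so roughly N reads,
    // N writes and 2N address calculations per call.
    for (std::size_t i = 0; i + 1 < N; ++i) {
        fifo.data[i] = fifo.data[i + 1];
    }
    return input - average;                        // input with the running average removed
}
```

Feeding a constant input of 8 into an initially empty FIFO, the output falls from 7 on the first call to 0 on the eighth, once the FIFO holds nothing but the constant.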
Alternative approach
Move pointers rather than memory values
In principle: 1 memory read, 1 memory write, pointer addition, conditional equate
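A minimal C++ sketch of the pointer-moving idea is shown below. It uses an array index rather than a raw pointer, and the names SoftCircularFifo and fifo_insert and the length N = 8 are hypothetical choices for illustration only.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;    // hypothetical FIFO length

struct SoftCircularFifo {
    int data[N] = {};
    std::size_t next = 0;       // index of the "next-empty" slot (holds the oldest value)
};

// Insert one value without moving any of the stored data.
void fifo_insert(SoftCircularFifo& fifo, int value) {
    fifo.data[fifo.next] = value;   // one memory write: overwrite the oldest value
    ++fifo.next;                    // pointer addition
    if (fifo.next == N) {           // conditional equate: wrap back to the start
        fifo.next = 0;
    }
}
```

The cost per insert no longer depends on N, but every access now has to go through the moving index, which is where the extra pointer checking comes from.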
Note: Software circular buffer is NOT necessarily more efficient than data moves
Watch out
Circular buffers can be implemented with the newest element placed “last” in the FIFO buffer, or with the newest element placed “first” in the FIFO buffer
SHARC (2002, 2003, 2004): used “first approach”
TigerSHARC: used “first approach” and failed max. optimization
Note: Software circular buffer is NOT necessarily more efficient than data moves
Now spending more time on moving / checking the software circular buffer pointers than moving the data?
SLOWER
FASTER
On TigerSHARC
Since we can have multiple instructions on one line, then “perhaps”, if we can avoid pipeline delays, the software circular buffer is faster than memory moves
Pipeline delay
XR4 = R4 + R5;;
XR4 = R4 + R6;;
Second instruction needs result of first
No Pipeline delay
XR4 = R4 + R5;;
XR3 = R4 + R6;;
Second instruction DOES NOT need result of first
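The same dependency idea can be illustrated at the C++ level with a sum that uses two accumulators, so that neighbouring additions do not need each other's results. This is only a sketch with hypothetical function names: whether it actually removes stalls depends on the compiler and on the processor pipeline.

```cpp
#include <cassert>
#include <cstddef>

// Dependent chain: every addition needs the result of the previous one.
int sum_one_accumulator(const int* values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// Two accumulators: the two additions in the loop body are independent.
int sum_two_accumulators(const int* values, std::size_t count) {
    int even = 0;
    int odd = 0;
    std::size_t i = 0;
    for (; i + 1 < count; i += 2) {
        even += values[i];       // does not need the result of the odd sum
        odd += values[i + 1];    // does not need the result of the even sum
    }
    if (i < count) {
        even += values[i];       // left-over element when count is odd
    }
    return even + odd;
}
```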
Generate the tests for the software circular buffer routine
New static pointers needed in Software circular buffer code
New sets of register defines
Now using many of the TigerSHARC registers
Code for storing a new value into the FIFO requires knowledge of the “next-empty” location
First you must get the address where the static variable saved_next_pointer is stored
Second you must access that address to get the actual pointer
Third you must use the pointer value
Will be a problem in labs and exams with static variables stored in memory
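The three steps can be spelled out in C++, where the compiler normally hides them. In this sketch only saved_next_pointer comes from the slide; the array fifo, its length and the function store_new_value are assumptions for illustration, and the pointer is deliberately not advanced here.

```cpp
#include <cassert>

constexpr int N = 8;                        // hypothetical FIFO length
static int fifo[N];                         // hypothetical FIFO storage
static int* saved_next_pointer = fifo;      // static variable stored in memory

// Store a value at the "next-empty" location, one explicit step per line.
void store_new_value(int value) {
    int** address_of_static = &saved_next_pointer;  // first: address of the static variable
    int* next_empty = *address_of_static;           // second: read memory to get the actual pointer
    *next_empty = value;                            // third: use the pointer value
}
```

In assembly code each of these lines is a separate instruction, and forgetting the second step means writing to the location of the static variable instead of to the FIFO.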
Adjustment of software circular buffer pointer must be done carefully
Get and update pointer
Check the pointer
Save corrected pointer
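A C++ sketch of the get / check / save sequence is given below. The array fifo, the length N = 4 and the function advance_next_pointer are hypothetical; the sketch assumes the pointer moves upward through memory and wraps at the end of the array.

```cpp
#include <cassert>

constexpr int N = 4;                        // hypothetical FIFO length
static int fifo[N];                         // hypothetical FIFO storage
static int* saved_next_pointer = fifo;      // pointer kept between function calls

// Move the saved pointer on by one slot, wrapping at the end of the FIFO.
void advance_next_pointer() {
    int* next = saved_next_pointer;   // get the pointer
    ++next;                           // update it
    if (next == fifo + N) {           // check: has it stepped past the last element?
        next = fifo;                  // correct it back to the start of the FIFO
    }
    saved_next_pointer = next;        // save the corrected pointer
}
```

Saving the pointer before the check would leave saved_next_pointer pointing outside the array on every N-th call.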
Next stage in improving code speed: software circular buffers

Stage                         Instructions    Was
Set up pointers to buffers    2
Insert values into buffers    8               4
SUM LOOP                      4 + N * 5
SHIFT LOOP                    1               1 + 2 * log2N
Update outgoing parameters    6
Update FIFO                   14              3 + 6 * N
Function return               2
---------------------------
Total                         37 + 5 N        23 + 11 N

N = 128: instructions = 677 cycles
677 + 360 delay cycles = 1011 cycles
Was 1430 + 300 delay cycles = 1730 cycles
Next step: hardware circular buffer
Do exactly the same pointer calculations as with software circular buffers, but now the calculations are done behind the scenes, at high speed, using specialized pointer features
Only available with the J0, J1, J2 and J3 registers (on the older ADSP-21061, all pointer registers)
  Jx: the pointer register
  JBx: the BASE register, set to the start of the FIFO array
  JLx: the length register, set to the length of the FIFO array
VERY BIG WARNING? Reset the length register to zero. On the older ADSP-21061 it was very important that the length register be reset to zero, otherwise all the other functions using this register would suddenly start using circular buffers by mistake. Still advisable, but special syntax is now needed to cause circular buffer operations to occur
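What the hardware does behind the scenes can be modelled in C++. This is a software emulation under stated assumptions, not a description of the actual silicon: the struct CircularRegs and the function cb_post_modify are hypothetical names, addresses are treated as plain word counts, and the modifier is assumed to be no larger in magnitude than the buffer length.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one circular-buffer register set.
struct CircularRegs {
    std::uint32_t j;    // Jx: the pointer register
    std::uint32_t jb;   // JBx: base, start of the FIFO array
    std::uint32_t jl;   // JLx: length of the FIFO array
};

// Model of a post-modify circular access: return the address used for the
// access, then update the pointer and wrap it back inside the buffer.
std::uint32_t cb_post_modify(CircularRegs& regs, std::int32_t modifier) {
    assert(regs.jl > 0);                               // a zero length would mean "no circular buffer"
    std::uint32_t used = regs.j;                       // the access uses the old pointer value
    std::int64_t next = static_cast<std::int64_t>(regs.j) + modifier;
    if (next >= static_cast<std::int64_t>(regs.jb) + regs.jl) {
        next -= regs.jl;                               // stepped past the end: wrap down
    } else if (next < static_cast<std::int64_t>(regs.jb)) {
        next += regs.jl;                               // stepped before the base: wrap up
    }
    regs.j = static_cast<std::uint32_t>(next);
    return used;
}
```

With a base of 100 and a length of 4, four accesses with a modifier of 1 use addresses 100, 101, 102 and 103 and leave the pointer back at 100, which is the same result as the software get / check / save sequence but without the explicit compare.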
Setting up the circular buffer functions
Remember all the tests to start with
Store values into hardware FIFO
CB instruction ONLY works on POST-MODIFY operations
CB [J1 += J2] not CB [J1 + J2]
Now perform the math operation using the circular buffer operation
MUST NOT DO XR2 = CB [J0 + i_J8];
Save N cycles as no longer need to increment index
Update the static variables
Further special CB instructions
A few cycles saved here
Next stage in improving code speed: hardware circular buffers

Stage                         Instructions    Was
Set up pointers to buffers    2
Insert values into buffers    8               4
SUM LOOP                      3 + N * 4       4 + N * 5
SHIFT LOOP                    1               1 + 2 * log2N
Update outgoing parameters    6
Update FIFO                   14              3 + 6 * N
Function return               2
---------------------------
Total                         37 + 4 N        23 + 5 N

N = 128: instructions = 549 cycles
549 + 300 delay cycles = 879 cycles
Delays are now >50% of useful time
Was 677 + 360 delay cycles = 1011 cycles
Tackle the summation part of FIR
Exercise in using CB (Lab 2)
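As a reference point for the exercise, the FIR summation over a circular delay line might look like this in C++. This is a sketch and not the Lab 2 solution: FirState, fir_step and the tap count TAPS = 4 are hypothetical, and the newest sample is assumed to be paired with coefficient 0.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t TAPS = 4;   // hypothetical number of FIR coefficients

struct FirState {
    int delay[TAPS] = {};         // circular delay line
    std::size_t newest = 0;       // index of the most recent sample
};

// One FIR output: store the new sample, then sum coefficient * sample,
// walking backwards through the delay line with wrap-around.
int fir_step(FirState& state, const int (&coeffs)[TAPS], int input) {
    state.newest = (state.newest + 1 == TAPS) ? 0 : state.newest + 1;
    state.delay[state.newest] = input;

    int sum = 0;
    std::size_t index = state.newest;
    for (std::size_t k = 0; k < TAPS; ++k) {
        sum += coeffs[k] * state.delay[index];        // coeffs[0] pairs with the newest sample
        index = (index == 0) ? TAPS - 1 : index - 1;  // step back one sample, wrapping
    }
    return sum;
}
```

Feeding in a single 1 followed by zeros returns the coefficients one at a time, which makes a convenient test. With every coefficient set to 1 the loop reduces to the DCRemoval( ) sum.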
Place assembly code here
The code is too slow because we are not taking advantage of the available resources
Bring in up to 128 bits (4 instructions) per cycle
Ability to bring in 4 32-bit values along the J data bus (data1) and 4 along the K bus (data2)
Perform address calculations in the J and K ALUs: single cycle hardware circular buffers
Perform math operations on both X and Y compute blocks
Background DMA activity
Off-load some of the processing to the second processor
Tackled today
Have moved the DCRemoval( ) over to the X Compute block
Circular buffer issues
  DCRemoval( )
  FIR( )
Coding a software circular buffer in C++ and TigerSHARC assembly code
Coding a hardware circular buffer
Where to next?