EXCELlent way of looking at FIR optimization as a function...
TRANSCRIPT
![Page 1: EXCELlentway of looking at FIR optimization as a function ...people.ucalgary.ca/~smithmr/2017webs/encm515_enel653_17/17_Le… · EXCELlentway of looking at FIR optimization as a function](https://reader033.vdocument.in/reader033/viewer/2022051921/600dbc32e7dd691b72101e44/html5/thumbnails/1.jpg)
EXCELlent way of looking at FIR optimization as a function of processor architecture
Assignment 3
Knowledge expected by midterm
Start with basic FIR filter
float FIR_Filter(float newValue, float *FIFO, float *coeffs, int numTaps)
Registers: newValue in R4, FIFO in R8, coeffs in R12, numTaps in ?
Course exams – I WILL PROBABLY say – pretend numTaps comes in R16
How to handle in real life – write in C++ first and see what the compiler does to handle this situation – then copy that
Careful – the compiler treats these situations differently as “it knows more in the second case”
float FIR_Filter_1(float newValue, float *FIFO, float *coeffs, int numTaps) {
}
And
extern volatile float FIFO[ ];
extern volatile float coeffs[ ];
float FIR_Filter_2(float newValue, int numTaps) {
}
And these differently – and perhaps differently between debug and release modes
extern volatile float FIFO[ ];
extern volatile float coeffs[ ];
#define numTaps 120
float FIR_Filter_2(float newValue) {
}

extern volatile float FIFO[ ];
extern volatile float coeffs[ ];
volatile int numTaps = 120;
float FIR_Filter_2(float newValue) {
}

extern volatile float FIFO[ ];
extern volatile float coeffs[ ];
int numTaps = 120;
float FIR_Filter_2(float newValue) {
}
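A minimal sketch of why the three declarations of the tap count can lead to different generated code, assuming an ordinary host C compiler rather than the course toolchain. The names NUM_TAPS_DEFINE, numTapsVolatile, numTapsPlain, demoCoeffs and the three sum_ functions are hypothetical and chosen only for this illustration. With the #define the trip count is a compile-time constant, with volatile the count has to be re-read from memory at every loop test, and with a plain global the compiler may read it once and keep it in a register.

```c
#include <assert.h>

#define NUM_TAPS_DEFINE 4            /* hypothetical small tap count for the demo */
volatile int numTapsVolatile = 4;    /* has to be re-read on every access */
int numTapsPlain = 4;                /* may be read once and kept in a register */

static const float demoCoeffs[4] = {1.0f, 2.0f, 3.0f, 4.0f};

/* Trip count known at compile time: the loop is a candidate for full unrolling. */
float sum_define(void) {
    float sum = 0.0f;
    for (int i = 0; i < NUM_TAPS_DEFINE; i++)
        sum += demoCoeffs[i];
    return sum;
}

/* Trip count is volatile: every loop test performs a fresh memory read. */
float sum_volatile(void) {
    assert(numTapsVolatile <= 4);    /* stay inside demoCoeffs */
    float sum = 0.0f;
    for (int i = 0; i < numTapsVolatile; i++)
        sum += demoCoeffs[i];
    return sum;
}

/* Trip count is an ordinary global: known only at run time, but not volatile. */
float sum_plain(void) {
    assert(numTapsPlain <= 4);       /* stay inside demoCoeffs */
    float sum = 0.0f;
    for (int i = 0; i < numTapsPlain; i++)
        sum += demoCoeffs[i];
    return sum;
}
```

All three functions return the same value; the difference only shows up in the assembly listing, so compare the debug and release output for each.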
Standard FIR filter from Lab 1
float FIR_Filter(float newValue, float *FIFO, float *coeffs, int numTaps) {
    // Shift the FIFO down by one to make room for the new value
    for (int count = 1; count < numTaps; count++)
        FIFO[count - 1] = FIFO[count];

    float *FIFOpt = FIFO + numTaps - 1;   // Does C do pointer arithmetic?
    *FIFOpt = newValue;

    // Newest sample pairs with coeffs[0], oldest with coeffs[numTaps - 1]
    float sum = 0.0;
    for (int count = 0; count < numTaps; count++)
        sum = sum + *FIFOpt-- * *coeffs++;

    return sum;
}
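One quick way to check a filter like the Lab 1 routine is to feed it a unit impulse followed by zeros: the outputs should then be the coefficients in order. The sketch below assumes an index-based restatement of the same algorithm so that it compiles on its own; fir_filter_demo and impulse_response are hypothetical names, not part of the lab code.

```c
#include <assert.h>

/* Index form of the same shift-then-sum algorithm as the Lab 1 filter. */
float fir_filter_demo(float newValue, float *FIFO, const float *coeffs, int numTaps) {
    assert(numTaps > 0);
    for (int count = 1; count < numTaps; count++)
        FIFO[count - 1] = FIFO[count];          /* shift older samples down */
    FIFO[numTaps - 1] = newValue;               /* newest sample goes at the top */

    float sum = 0.0f;
    for (int count = 0; count < numTaps; count++)
        sum += FIFO[numTaps - 1 - count] * coeffs[count];
    return sum;
}

/* Clears the FIFO, then feeds 1, 0, 0, ... and stores each output in out[]. */
void impulse_response(const float *coeffs, int numTaps, float *FIFO, float *out) {
    for (int i = 0; i < numTaps; i++)
        FIFO[i] = 0.0f;
    for (int n = 0; n < numTaps; n++)
        out[n] = fir_filter_demo(n == 0 ? 1.0f : 0.0f, FIFO, coeffs, numTaps);
}
```

If out[] does not reproduce coeffs[], the FIFO shift direction or the pointer walk is the first place to look.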
Assume the processor architecture is von Neumann and can’t do a data fetch, add or multiplication in the same cycle
Now – increase cycle time by 25% to do pt++ in same cycle as fetch – STEP 1
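A small sketch of the trade-off in STEP 1, assuming execution time is simply cycle count times cycle time. A 25% longer cycle only pays off when the new loop needs fewer than 80% of the original cycles. The function names and any cycle counts passed in are hypothetical; the real counts come from the resource charts on the slides.

```c
#include <assert.h>

/* Returns 1 when the design with the 25% longer cycle finishes sooner.
   time_before = cycles_before * T and time_after = cycles_after * 1.25 * T,
   so the change wins only when 4 * cycles_before > 5 * cycles_after. */
int step1_is_faster(int cycles_before, int cycles_after) {
    assert(cycles_before > 0 && cycles_after > 0);
    return 4 * cycles_before > 5 * cycles_after;
}

/* Ratio time_before / time_after; values above 1.0 mean a net gain. */
double step1_speedup(int cycles_before, int cycles_after) {
    assert(cycles_before > 0 && cycles_after > 0);
    return (double)cycles_before / (1.25 * (double)cycles_after);
}
```

For example, a hypothetical loop that drops from 10 cycles to 8 only breaks even; it has to drop to 7 or fewer before the slower clock is worth it.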
Now – increase cycle time by 25% to do pt++ in same cycle as fetch – STEP 1
Change pipeline to allow 1 Math op to occur during next fetch – STEP 2
UNROLL LOOP TO OPEN UP OTHER POSSIBLE PARALLEL INSTRUCTIONS
TOTALLY MEMORY / DAG 1 RESOURCE LIMITED
NEED TO CHANGE PROCESSOR ARCHITECTURE
Instead of 1 cycle mult + 1 cycle add
Use 2 cycle (pipelined) MACC instruction
Multiply / Accumulate
Does 1 or 2 cycle MACC improve performance?
• FETCH MULT INSTRUCTION
• DO MULT -- FETCH ADD INSTRUCTION
• DO ADD
• Compared to 2 cycle MACC
• FETCH MACC INSTRUCTION
• DO MULT
• DO ADD
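The two sequences above take the same three cycles for a single tap, so the difference only appears over many taps. The sketch below is a simplified cycle model under stated assumptions: one instruction fetch per cycle, each fetch overlapping the execution of the previous instruction, operands already in registers, and no stalls. The function names are hypothetical and the model is not taken from the slides.

```c
#include <assert.h>

/* Separate MULT and ADD: two instruction fetches per tap,
   plus one final cycle for the last ADD to execute. */
int cycles_mult_add(int taps) {
    assert(taps > 0);
    return 2 * taps + 1;
}

/* Pipelined 2 cycle MACC: one fetch per tap,
   plus two cycles for the last MULT and ADD to drain. */
int cycles_macc_2cycle(int taps) {
    assert(taps > 0);
    return taps + 2;
}

/* 1 cycle MACC: one fetch per tap, plus one cycle for the last MACC. */
int cycles_macc_1cycle(int taps) {
    assert(taps > 0);
    return taps + 1;
}
```

Under these assumptions the MACC roughly halves the cycle count because it halves the instruction fetches, while going from a 2 cycle to a 1 cycle MACC saves only a single cycle over the whole loop.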
Assume Harvard architecture with floating point MACC (SHARC)
Harvard processor without the MAC
Colour each resource for an instruction
Take advantage (carefully) of parallel DM and PM operations to fetch instructions earlier
In principle 4 cycles faster for twice round the loop – but data dependencies conflict
You complete the analysis with separate Add and Mult instructions
Show the advantages of using a 2 cycle MACC instruction. Does a 1 cycle MACC offer any further advantage?
Move over to Super Harvard architecture with instruction cache in use always. Start using PM bus for data ops
• DON’T LOOK AT NEXT SLIDE UNTIL YOU HAVE TACKLED LAST SLIDE
Loop of size 10 for twice around loop
Key resource – FETCH INSTR 8 / 10
Using cache ONLY when instr / data conflict on the PM bus means we can have a smaller (cheaper) cache
Get more speed by UNROLLING THE LOOP 3 times and then thinking
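A minimal C sketch of what unrolling the multiply-accumulate loop three times looks like, assuming a plain dot product over two arrays. The names dot_rolled and dot_unrolled3 are hypothetical. Writing three taps side by side is what exposes the independent fetches and multiplies that can then be overlapped by hand.

```c
#include <assert.h>

/* Rolled reference: one multiply-accumulate per trip round the loop. */
float dot_rolled(const float *x, const float *c, int n) {
    assert(n >= 0);
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i] * c[i];
    return sum;
}

/* Unrolled by 3: three taps per trip, then a clean-up loop
   for the n % 3 taps that do not fill a whole trip. */
float dot_unrolled3(const float *x, const float *c, int n) {
    assert(n >= 0);
    float sum = 0.0f;
    int i = 0;
    for (; i + 2 < n; i += 3) {
        sum += x[i]     * c[i];
        sum += x[i + 1] * c[i + 1];
        sum += x[i + 2] * c[i + 2];
    }
    for (; i < n; i++)
        sum += x[i] * c[i];
    return sum;
}
```

The single accumulator keeps the additions in the same order as the rolled loop, so both versions give the same result; the gain comes only from the scheduling freedom inside the unrolled body.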
Re-roll the loop and execute N-2 times
Next step – MOVE TO VLIW instruction set
WHERE INSTR ALLOWS MATH-OP, dm and pm fetch at the same time
DOES NOT HAVE TO WAIT
Next step – MOVE TO V-VLIW instruction set
WHERE INSTR ALLOWS + and *, dm and pm fetch at the same time
DOES NOT HAVE TO WAIT
IF USE V-VLIW INSTR: * + dm pm
then loop is 1 cycle
FIR loop looks like this
• FETCH DATA1
• FETCH DATA2, DO MULT OF DATA1
• FETCH DATA3, DO MULT OF DATA2, ADD OF DATA1
• FETCH DATA4, DO MULT OF DATA3, ADD OF DATA2
• DO MULT OF DATA4, ADD OF DATA3
• ADD OF DATA4
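The schedule listed above can be sketched in C as a software-pipelined loop: a two-step prologue, a steady-state body that runs N-2 times, and a two-step epilogue. This is an illustration on a host compiler, not SHARC assembly; fir_pipelined and the cycles counter are hypothetical, and each counted "cycle" stands for one row of the schedule, where the fetch, multiply and add would issue together on the V-VLIW hardware.

```c
#include <assert.h>

/* Software-pipelined dot product. Within each modelled cycle the ADD uses the
   product from the previous cycle and the MULT uses the operands fetched in
   the previous cycle, so the three operations are independent of each other. */
float fir_pipelined(const float *data, const float *coeffs, int n, int *cycles) {
    assert(n >= 2);
    float sum = 0.0f;
    float d, c;       /* operands fetched in the previous cycle */
    float prod;       /* product computed in the previous cycle */
    int count = 0;

    /* Prologue, cycle 1: FETCH 1 */
    d = data[0]; c = coeffs[0]; count++;
    /* Prologue, cycle 2: FETCH 2, MULT 1 */
    prod = d * c; d = data[1]; c = coeffs[1]; count++;

    /* Steady state, n - 2 trips: ADD k, MULT k+1, FETCH k+2 */
    for (int k = 2; k < n; k++) {
        sum += prod;
        prod = d * c;
        d = data[k]; c = coeffs[k];
        count++;
    }

    /* Epilogue: MULT of last tap with ADD of the one before, then the final ADD */
    sum += prod; prod = d * c; count++;
    sum += prod; count++;

    *cycles = count;   /* n + 2 rows in total */
    return sum;
}
```

For four taps this gives the six rows of the list, and for large N the cost approaches one cycle per tap, which is the 1 cycle loop the V-VLIW instruction is aiming for.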
Lab 2
• Programming VLIW assembly code (single cycle FIR hardware loop)
• Does C++ automatically switch to this mode in release mode if we pass dm and pm memory array pointers?
• If not – how do we make C++ switch to this mode?