TigerSHARC CLU Exploration of XCORRS for Take-Home Quiz 4 BIAWPQHI -- 13 April – start of class

M. Smith,

University of Calgary, Canada

smithmr@ucalgary.ca

Ideal -- Take Home Quiz

Develop tests for complex correlation
    Time and functionality

Evaluate on
    “C++” – in default and optimized mode (especially optimized)
    Your optimized complex assembly code in complex correlation in SISD and SIMD modes
    XCORRS in complex correlation in SISD and SIMD modes

Reasonable -- Take Home Quiz
Code and report

Develop Functionality and Time tests for real FIR -- based on Lab. 3
    Use on optimized C++ and your SISD and SIMD FIR

Develop Functionality and Time tests for real correlation -- based on Lab. 3 / 4
    Use on optimized C++ and your SISD and SIMD correlation

Work out (theory) the speed changes expected on your SISD and SIMD code if you went to complex data
    Use as template for expected changes in optimized C++

Develop Functionality and Time tests for complex FIR
    Use on optimized C++

Develop Functionality and Time tests for complex correlation
    Use on optimized C++ and your SISD and SIMD XCORRS only

Report on whether changes in C++ code speed work the way you expect
    Use these figures to scale for FIR and correlation to complex data

Report on relative speeds
    “C++” – in default and optimized mode (especially optimized)
    Your optimized complex assembly code in complex correlation in SISD and SIMD modes
    XCORRS in complex correlation in SISD and SIMD modes

Mark assignment

My tests and C++ are available on the web
    If you use my tests, then you must say so, and 10% of marks are deducted
    If you use my C++ code, then you must say so, and 10% of marks are deducted
    If you use my C++ code and my tests, then you must say so, and 20% of marks are deducted

Speed comparison – Part 1

Real FIR
float / int values[ ], params[ ]

Loop:
    sum = sum + values * params

2 memory fetches, 1 add and 1 mult per loop cycle – done in ½ cycle in theory

Time N / 2 + overhead

Determine overhead by measuring with and without the loop-sum

Complex FIR
CMPX float / int values[ ], params[ ]

Loop: many common factors with FFT – Hint for final?
    sum = sum + values * params
    Real sum = v.re * p.re - v.im * p.im
    Imag sum = v.re * p.im + v.im * p.re

8 memory fetches, 3 add / sub and 4 mult per loop

Time ??? + overhead
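The real and imaginary sums above can be written out as a short loop. This is a plain C++ sketch for checking the arithmetic on a host machine, not TigerSHARC code: the names Cmplx and complexFirSketch, and the choice to keep the real and imaginary parts side by side in one struct, are assumptions made for illustration only.

```cpp
#include <cstddef>

// Hypothetical stand-in for the CMPLX type named on the slides.
struct Cmplx {
    float re;
    float im;
};

// Complex FIR sum: sum = sum + values * params, one tap per loop pass.
// Each of the four operand parts is read into a local once and then reused.
Cmplx complexFirSketch(const Cmplx *vals, const Cmplx *params, std::size_t size) {
    Cmplx sum = {0.0f, 0.0f};
    for (std::size_t i = 0; i < size; ++i) {
        const float vr = vals[i].re;
        const float vi = vals[i].im;
        const float pr = params[i].re;
        const float pi = params[i].im;
        sum.re += vr * pr - vi * pi;   // Real sum
        sum.im += vr * pi + vi * pr;   // Imag sum
    }
    return sum;
}
```

Counting the reads, multiplies and adds in this loop body gives one starting point for the "Time ??? + overhead" estimate.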

Speed comparison – Part 2

Speed in theory without doing anything special

Any special way to store complex values to speed up memory access?

Do we need to do 8 memory fetches? On the Blackfin? On the TigerSHARC?

Expected optimal speed? Time ??? + overhead


Speed comparison – Part 3?

Do these speed calculations scale the same way for complex correlation as for complex FIR?

Do a theory calculation and then compare result for debug and optimized C++ code to validate – within 25% of predicted changes is probably more than reasonable for a back-of-envelope calculation

Use scaling factor on your real FIR and correlation functions
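The overhead subtraction and the 25% check described above can be kept in two small helpers. This is only a sketch: loopCycles and withinTolerance are hypothetical names, and the cycle counts are assumed to come from whatever timing routine your test framework already provides.

```cpp
#include <cmath>

// Loop cost = time measured with the loop minus time measured with overhead == true.
double loopCycles(double totalCycles, double overheadCycles) {
    return totalCycles - overheadCycles;
}

// True when the measured speed ratio is within `tolerance` (a fraction, default 25%)
// of the ratio predicted by the back-of-envelope calculation.
bool withinTolerance(double predictedRatio, double measuredRatio, double tolerance = 0.25) {
    return std::fabs(measuredRatio - predictedRatio) <= tolerance * std::fabs(predictedRatio);
}
```

A typical use is to divide the complex loop cost by the real loop cost and compare that ratio against the predicted one.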

Tests for following functions needed
When to convert from float to int?

void ConvertReal2Complex(float *, CMPX32 *, int size)
    Make Complex = Real + j0

bool ConvertC32_2_C8(CMPX32 *, CMPX8 *, int size)
    Take bottom 8 bits of complex 32
    Return false if overflows
    Complex 8 is padded 2 complex into 32 bits --- int format

bool ConvertC32_2_C1(CMPX32 *, CMPX1 *, int size)
    Take bottom 1 bit of complex 32
    Return false if overflows, or if not +-1 +-j1 format
    Complex 1 is padded 16 complex into 32 bits --- int format

void ConvertC8_2_C32(CMPX8 *, CMPX32 *, int size) needed? YES

void ConvertC1_2_C32(CMPX1 *, CMPX32 *, int size) needed?

Tests for following functions needed

float RealFIR(float *vals, float *params, int size, bool overhead);

CMPLX ComplexFIR(CMPLX *vals, CMPLX *params, int size, bool overhead);
    vals in dm and params in pm

void RealCorrs(float *vals, int size1, float *params, int size2, float *result, int *size3, bool overhead);

void ComplexCorrs(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int *size3, bool overhead);

void XCORRS(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int *size3, bool overhead, int version);

version = 0 – works, = 1 SISD, = 2 SIMD

Some hints

void XCORRS(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int *size3, bool overhead, int version) {
    // Outline only, not compilable code
    bool ConvertC32_2_C8(CMPX32 *, dm CMPX8 *, int size1)
    bool ConvertC32_2_C1(CMPX32 *, pm CMPX1 *, int size2)
    *size3 = size1 - size2
    for each of the size3 results
        result[ ] = 0;
    if (!overhead) XCORRS(dm CMPX8 *, pm CMPX1 *, dm? result, size1, size2, size3, whichversion)
}

Some Hints

void ComplexCorrs(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int *size3, bool overhead) {
    // Outline only, not compilable code
    if (overhead) return;
    *size3 = size1 - size2;
    for loop = 0 to size3 - 1
        result[loop] = ComplexFIR(vals, params, size2, overhead);
        vals++;
    end loop;
}

Some decisions

Complex 32 – first decision
    Store real in dm space and imaginary in pm space?

Complex8 in dm space, Complex1 in pm space

Doing everything with static pm variables

Using dm variables on stack, in an attempt to avoid running out of memory

Try with satellite data of size 2048 and PRN data of size 1024, but I suspect there may not be enough room when working with Complex 32, so I may have to test on a smaller size for comparison

I ended up generating the same data as for the xcorrs( ) shown last Friday – size 48 = 16 * 3. Decided that if I could handle that (3 times round the xcorrs loop) then that was a fair enough test

Some Tests developed 1

TEST(ConvertReal2CMPLX32, D_TEST) {
    TEST_LEVEL(1);

    #define TEST_SIZE 8
    float values[TEST_SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};
    float zeros[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};

    ConvertReal2Complex(values, C32Real, C32Imag, TEST_SIZE);
    ARRAYS_EQUAL(values, C32Real, TEST_SIZE);
    ARRAYS_EQUAL(zeros, C32Imag, TEST_SIZE);
}
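A function that would satisfy this test can be very small. The sketch below assumes the split real / imaginary array form used in the test call rather than the CMPX32 prototype shown earlier, drops the pm qualifier so that it compiles as standard C++, and uses the hypothetical name ConvertReal2ComplexSketch.

```cpp
// Make Complex = Real + j0: copy the real samples and zero the imaginary part.
void ConvertReal2ComplexSketch(const float *real, float *outR, float *outI, int size) {
    for (int i = 0; i < size; i++) {
        outR[i] = real[i];
        outI[i] = 0.0f;
    }
}
```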

Test for padded data – C8 format

#define TEST_SIZE 8
pm float imag1[TEST_SIZE] = {0x04, 0x14, -0x8, -0x18, 0x24, 0x34, 0x44, 0x54};
float real1[TEST_SIZE] = {0x08, 0x18, -1, -2, 0x28, 0x38, 0x48, 0x58};

TEST(ConvertToCMPLX8, D_TEST) {
    TEST_LEVEL(1);

    unsigned int result[4] = {0x14180408, 0xE8FEF8FF, 0x34382428, 0x54584448};
    CHECK(!ConvertC32_2_C8(real1, imag1, DATAC8, 1));           // Odd size is rejected
    CHECK(ConvertC32_2_C8(real1, imag1, DATAC8, TEST_SIZE));

    ARRAYS_EQUAL(DATAC8, result, TEST_SIZE / 2);
}

Test for padded data C1 format

#define LONGER_SIZE 32
pm float imag2[LONGER_SIZE] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ……..
float real2[LONGER_SIZE] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ………..
pm float imag4[LONGER_SIZE];
float real4[LONGER_SIZE];

TEST(ConvertCMPLX1, D_TEST) {
    TEST_LEVEL(1);
    unsigned int result1[2] = {0x00000000, 0x00000000};
    unsigned int result2[2] = {0xFFFFFFFF, 0xFFFFFFFF};

    CHECK(!ConvertC32_2_C1(real1, imag1, PRNC1, 1));
    CHECK(!ConvertC32_2_C1(real1, imag1, PRNC1, TEST_SIZE));
    CHECK(!ConvertC32_2_C1(real2, imag2, PRNC1, 1));
    CHECK(ConvertC32_2_C1(real2, imag2, PRNC1, LONGER_SIZE));
    ARRAYS_EQUAL(PRNC1, result1, LONGER_SIZE / 16);

    for (int i = 0; i < LONGER_SIZE; i++) {
        real4[i] = -1 * real2[i];
        imag4[i] = -1 * imag2[i];
    }
    CHECK(ConvertC32_2_C1(real4, imag4, PRNC1, LONGER_SIZE));
    ARRAYS_EQUAL(PRNC1, result2, LONGER_SIZE / 16);
}
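The slides do not show ConvertC32_2_C1 itself, so the following is only a guess at one implementation that would pass the test above. The bit layout is an assumption: 2 bits per complex value, real sign in the lower bit, +1 stored as 0 and -1 stored as 1, with value 0 in the least significant bits. The test data (all +1 or all -1) cannot tell this ordering apart from others, and the real XCORRS hardware format may differ. The pm qualifier is dropped and unsigned int is assumed to be 32 bits.

```cpp
// Sketch of a C32 -> C1 conversion under the layout assumed in the lead-in.
// Returns false if size is not a multiple of 16 or any part is not +1 or -1.
bool ConvertC32_2_C1Sketch(const float *inR, const float *inI, unsigned int *C1, int size) {
    if (size <= 0 || (size % 16) != 0) return false;

    // Validate everything first so the output is untouched on failure.
    for (int i = 0; i < size; i++) {
        if (inR[i] != 1.0f && inR[i] != -1.0f) return false;
        if (inI[i] != 1.0f && inI[i] != -1.0f) return false;
    }

    // 16 complex values (32 sign bits) go into each output word.
    for (int word = 0; word < size / 16; word++) {
        unsigned int packed = 0;
        for (int k = 0; k < 16; k++) {
            const int i = word * 16 + k;
            if (inR[i] < 0) packed |= 1u << (2 * k);       // real sign bit
            if (inI[i] < 0) packed |= 1u << (2 * k + 1);   // imaginary sign bit
        }
        C1[word] = packed;
    }
    return true;
}
```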

RealFIR

#define TEST_SIZE 8
pm float params[TEST_SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};

TEST(RealFIR, D_TEST) {
    TEST_LEVEL(1);
    float impulse[TEST_SIZE];
    float results[TEST_SIZE];

    for (int i = 0; i < TEST_SIZE; i++) {
        for (int j = 0; j < TEST_SIZE; j++)     // Set to zero
            impulse[j] = 0;
        impulse[i] = 1;
        results[i] = RealFIR(impulse, params, TEST_SIZE, false);
    }
    ARRAYS_EQUAL(results, params, TEST_SIZE);
}

Complex FIR tests (3 of them)
To see if I got both Real and Imag correct

pm float resultsI[TEST_SIZE];

TEST(ComplexFIR, D_TEST) {
    TEST_LEVEL(1);
    float impulse[TEST_SIZE];
    float resultsR[TEST_SIZE];
    float zeros[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};

    for (int i = 0; i < TEST_SIZE; i++) {
        for (int j = 0; j < TEST_SIZE; j++)     // Set to zero
            impulse[j] = 0;
        impulse[i] = 1;

        for (int j = 0; j < TEST_SIZE; j++) {
            C32Real[j] = impulse[j];  C32Imag[j] = 0;
            C32Real1[j] = params[j];  C32Imag1[j] = 0;
        }
        ComplexFIR(C32Real, C32Imag, C32Real1, C32Imag1, &resultsR[i], &resultsI[i], TEST_SIZE, false);
    }
    ARRAYS_EQUAL(resultsR, params, TEST_SIZE);
    ARRAYS_EQUAL(resultsI, zeros, TEST_SIZE);
}

Real Correlation

pm float PRN32I[TEST_SIZE] = {1, -1, 1, -1, 1, 0, 0, 0};

TEST(RealCorrelation, D_TEST) {
    TEST_LEVEL(1);
    float data[TEST_SIZE * 2] = {0, 0, 0, 0, 1, -1, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0};
    float result[TEST_SIZE];
    int Iresult[TEST_SIZE];
    int size3;

    RealCorrs(data, 2 * TEST_SIZE, PRN32I, TEST_SIZE, result, &size3, false);
    CHECK(size3 == TEST_SIZE);

    for (int j = 0; j < TEST_SIZE; j++)
        Iresult[j] = result[j];
    CHECK(MaximumLocation(Iresult, TEST_SIZE) == 4);
}
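MaximumLocation is used by this test but not shown on the slides. The sketch below is an assumed version that returns the index of the first largest element, paired with a host-side real correlation so that the expected peak at lag 4 can be confirmed without the TigerSHARC tools. Both names ending in Sketch are hypothetical.

```cpp
// Assumed helper: index of the first largest element of data[0..size-1].
int MaximumLocationSketch(const int *data, int size) {
    int best = 0;
    for (int i = 1; i < size; i++) {
        if (data[i] > data[best]) best = i;
    }
    return best;
}

// Host-side real correlation: one FIR-style sum per lag, size3 = size1 - size2 lags.
void RealCorrsSketch(const float *vals, int size1, const float *params, int size2,
                     float *result, int *size3) {
    *size3 = size1 - size2;
    for (int lag = 0; lag < *size3; lag++) {
        float sum = 0.0f;
        for (int k = 0; k < size2; k++) {
            sum += vals[lag + k] * params[k];
        }
        result[lag] = sum;
    }
}
```

With the data and PRN arrays from the test, the five non-zero PRN entries line up with the data only at lag 4, which is where the sum reaches 5.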

Complex Correlation -- Simple Test

pm float dataI[TEST_SIZE * 2] = {0, 0, 0, 0, 1.0, -1, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0};
pm float resI[TEST_SIZE];

TEST(ComplexCorrelation, D_TEST) {
    TEST_LEVEL(1);
    float dataR[TEST_SIZE * 2] = {0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    float resR[TEST_SIZE];
    int Iresult[TEST_SIZE];
    float parR[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};
    int size3;

    ComplexCorrs(dataR, dataI, TEST_SIZE * 2, parR, PRN32I, TEST_SIZE, resR, resI, &size3, false);

    CHECK(size3 == TEST_SIZE);
    for (int j = 0; j < TEST_SIZE; j++) {
        Iresult[j] = abs(resR[j]);
    }
    CHECK(MaximumLocation(Iresult, TEST_SIZE) == 4);
}

Complex Correlation – related to results from last lecture

for (int i = 0; i < 96; i += 3) {
    satXCORRSR[i] = -1;  satXCORRSR[i+1] = 1;  satXCORRSR[i+2] = 1;
    satXCORRSI[i] = 0;   satXCORRSI[i+1] = 0;  satXCORRSI[i+2] = 0;
}
for (int i = 0; i < 48; i += 3) {
    prnXCORRSR[i] = -1;  prnXCORRSR[i+1] = 1;  prnXCORRSR[i+2] = 1;
    prnXCORRSI[i] = -1;  prnXCORRSI[i+1] = 1;  prnXCORRSI[i+2] = 1;
}
ComplexCorrs(satXCORRSR, satXCORRSI, 96, prnXCORRSR, prnXCORRSI, 48, resXCORRSR, resXCORRSI, &size3, false);

CHECK(size3 == 48);
for (int j = 0; j < 48; j++) { Iresult[j] = abs(resXCORRSR[j]); }
for (int j = 1; j < 45; j += 3) {
    CHECK(resXCORRSR[j-1] == 48);
    CHECK(resXCORRSR[j] == -16);
    CHECK(resXCORRSR[j+1] == -16);
    CHECK(MaximumLocation(Iresult + j, 48 - j) == 2);
}
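The expected values 48, -16, -16 can be confirmed on a host machine with a plain C++ version of the correlation. The sketch below uses the split real / imaginary arrays of the test call, omits the pm qualifiers and the overhead flag, and uses the hypothetical name ComplexCorrsSketch. Like the ComplexFIR in these slides, it multiplies without conjugating the parameters.

```cpp
// Host-side complex correlation: one complex FIR-style sum per lag.
void ComplexCorrsSketch(const float *valR, const float *valI, int size1,
                        const float *parR, const float *parI, int size2,
                        float *resR, float *resI, int *size3) {
    *size3 = size1 - size2;
    for (int lag = 0; lag < *size3; lag++) {
        float sumR = 0.0f;
        float sumI = 0.0f;
        for (int k = 0; k < size2; k++) {
            const float vr = valR[lag + k];
            const float vi = valI[lag + k];
            sumR += vr * parR[k] - vi * parI[k];
            sumI += vr * parI[k] + vi * parR[k];
        }
        resR[lag] = sumR;
        resI[lag] = sumI;
    }
}
```

Because the satellite data has a zero imaginary part, the real result is just the real data against the real PRN: every aligned group of three contributes +3 (48 in total over 16 groups), and every misaligned group contributes -1 (-16 in total).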

Complex Correlation ASM – related to results from last lecture

for (int i = 0; i < 96; i += 3) {
    satXCORRSR[i] = -1;  satXCORRSR[i+1] = 1;  satXCORRSR[i+2] = 1;
    satXCORRSI[i] = 0;   satXCORRSI[i+1] = 0;  satXCORRSI[i+2] = 0;
}
for (int i = 0; i < 48; i += 3) {
    prnXCORRSR[i] = -1;  prnXCORRSR[i+1] = 1;  prnXCORRSR[i+2] = 1;
    prnXCORRSI[i] = -1;  prnXCORRSI[i+1] = 1;  prnXCORRSI[i+2] = 1;
}
ComplexCorrsASM(satXCORRSR, satXCORRSI, 96, prnXCORRSR, prnXCORRSI, 48, resXCORRSR, resXCORRSI, &size3, false);

CHECK(size3 == 48);
for (int j = 0; j < 48; j++) { Iresult[j] = abs(resXCORRSR[j]); }
for (int j = 1; j < 45; j += 3) {
    CHECK(resXCORRSR[j-1] == 48);
    CHECK(resXCORRSR[j] == -16);
    CHECK(resXCORRSR[j+1] == -16);
    CHECK(MaximumLocation(Iresult + j, 48 - j) == 2);
}

bool ConvertC32_2_C8(float *inR, pm float *inI, unsigned int *C8, int size) {
    float *holdR = inR;
    pm float *holdI = inI;

    // Reject any value that does not fit in a signed 8-bit range
    for (int i = 0; i < size; i++) {
        if ((*inR > 127) || (*inR < -128)) return false;
        if ((*inI > 127) || (*inI < -128)) return false;
        inR++; inI++;
    }

    // Not going to bother with things that don't fit
    if (size & 1) return false;

    inR = holdR; inI = holdI;
    for (int half = 0; half < size; half += 2) {
        unsigned int first  = ((int) *inR++) & 0xFF;
        unsigned int second = ((int) *inI++) & 0xFF;
        unsigned int third  = ((int) *inR++) & 0xFF;
        unsigned int fourth = ((int) *inI++) & 0xFF;
        *C8++ = ((((((fourth << 8) + third) << 8) + second) << 8) + first);
    }
    return true;
}

float UINT8ToFloat(unsigned int value) {
    if (value & 0x80) {                 // Negative 8-bit value: sign extend
        value = value | 0xFFFFFF00;
        return ((int) value);
    }
    else return value;
}

void ConvertC8_2_C32(unsigned int *C8, float *inR, pm float *inI, int size) {
    for (int i = 0; i < size; i += 2) {
        unsigned int value = *C8++;
        *inR++ = UINT8ToFloat(value & 0xFF); value >>= 8;
        *inI++ = UINT8ToFloat(value & 0xFF); value >>= 8;
        *inR++ = UINT8ToFloat(value & 0xFF); value >>= 8;
        *inI++ = UINT8ToFloat(value & 0xFF);
    }
}
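The byte order used by the two conversion routines above (real 0 in the lowest byte, then imag 0, real 1, imag 1) can be checked with a one-word round trip. The helper names PackTwoC8 and UnpackC8Byte are hypothetical, and the sketch assumes a 32-bit unsigned int with two's-complement integers, as the slide code does.

```cpp
// Pack two complex 8-bit values into one 32-bit word:
// byte 0 = re0, byte 1 = im0, byte 2 = re1, byte 3 = im1.
unsigned int PackTwoC8(int re0, int im0, int re1, int im1) {
    return ((unsigned int)(im1 & 0xFF) << 24) |
           ((unsigned int)(re1 & 0xFF) << 16) |
           ((unsigned int)(im0 & 0xFF) << 8)  |
            (unsigned int)(re0 & 0xFF);
}

// Extract byte `index` (0..3) and sign-extend it from 8 bits.
int UnpackC8Byte(unsigned int word, int index) {
    const unsigned int byte = (word >> (8 * index)) & 0xFFu;
    return (byte & 0x80u) ? (int)byte - 256 : (int)byte;
}
```

Packing the third and fourth entries of real1 / imag1 from the C8 test, (-1, -0x8) and (-2, -0x18), gives the second expected word 0xE8FEF8FF.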

C8 <-> C32 and C16 <-> C32

FIR filters

float RealFIR(float *values, pm float *params, int size, bool overhead) {
    if (overhead) return 0.0;
    float sum = 0;
    for (int i = 0; i < size; i++)
        sum += *values++ * *params++;
    return sum;
}

pm float sumI = 0;

void ComplexFIR(float *valR, pm float *valI, float *parR, pm float *parI,
                float *resultR, pm float *resultI, int size, bool overhead) {
    if (overhead) { *resultR = *resultI = 0; return; }
    float sumR = 0;
    sumI = 0;                           // Was a static variable
    for (int i = 0; i < size; i++) {
        sumR += *valR * *parR - *valI * *parI;
        sumI += *valR * *parI + *valI * *parR;
        valR++; valI++; parR++; parI++;
    }
    *resultR = sumR;
    *resultI = sumI;
    return;
}

Correlation

void RealCorrs(float *vals, int size1, pm float *params, int size2, float *result, int *size3, bool overhead) {
    if (overhead) return;
    *size3 = size1 - size2;
    // One FIR sum per output lag; loop over the number of results (*size3), not size2
    for (int j = 0; j < *size3; j++)
        *result++ = RealFIR(vals++, params, size2, overhead);
}

void ComplexCorrs(float *valR, pm float *valI, int size1, float *parR, pm float *parI, int size2, float *resR, pm float *resI, int *size3, bool overhead) {
    if (overhead) return;
    *size3 = size1 - size2;
    // One complex FIR sum per output lag
    for (int j = 0; j < *size3; j++)
        ComplexFIR(valR++, valI++, parR, parI, &resR[j], &resI[j], size2, false);
}

Correlation XCORRS

extern "C" void xcorrsfunc(unsigned int *C8, pm unsigned int *C1, unsigned int *C16, int size);

void ComplexXCORRS(float *valR, pm float *valI, int size1, float *parR, pm float *parI, int size2, float *resR, pm float *resI, int *size3, bool overhead) {
    ConvertC32_2_C8(valR, valI, DATAC8, size1);
    *PRNC1 = 0x0;                       // Need to shift the PRN to location C15
    ConvertC32_2_C1(parR, parI, PRNC1 + 1, size2);
    *size3 = size1 - size2;
    if (!overhead) xcorrsfunc(DATAC8, PRNC1, RESULTC16, *size3);
    ConvertC16_2_C32(RESULTC16, resR, resI, *size3);
}

XCORRS – same code as before, except – need to transfer results out

// Shift out the values in TR registers into results
xR3:0 = TR3:0;;
Q[J6 += 4] = xR3:0;;
xR3:0 = TR7:4;;
Q[J6 += 4] = xR3:0;;
xR3:0 = TR11:8;;
Q[J6 += 4] = xR3:0;;
xR3:0 = TR15:12;;
Q[J6 += 4] = xR3:0;;
IF NLC0E, JUMP OUTERLOOP;;

Need to get inpars and go round more than 16 times

J0 = zeros;;                // Clear the THR registers the hard way
R3:0 = Q[J0 += 4];;
THR3:0 = R3:0;;
R7:4 = R3:0;;
// K0 = prn;;
J2 = J4;;                   // satellite_data;;

LC0 = 3;;
OUTERLOOP:
K0 = J5;;
J2 = J4;;
J4 = J4 + 8;                // Increment by 8 and not 16

REST OF CODE UNCHANGED
// Load THR with PRN code
R1:0 = L[K0 += 2];;
THR1:0 = R1:0;;
R1:0 = L[K0 += 2];;
THR3:2 = R1:0;;

Test results
