s3.kth.se dsp lecture 30/3-2010 per zetterberg. agenda general. starting ccs comparing matlab and...
![Page 1: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/1.jpg)
s3.kth.se
DSP Lecture 30/3-2010
Per Zetterberg
![Page 2: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/2.jpg)
Agenda
• General.
• Starting CCS.
• Comparing matlab and DSP results.
• Profiling when comparing matlab and DSP results.
• Matlab<->DSP communication.
• EDMA.
• QUAD_DAC_ADC (headphones).
• _empty.
• State machine using case statement.
• Data formats.
• Overlap and add.
• Stack and heap.
• Simple optimization rules.
• Cache.
• Some advice.
![Page 3: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/3.jpg)
DSP Programming
Setup in the project course:
PC or ”host”
DSP or ”target” (or DSK)
![Page 4: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/4.jpg)
What is a DSP ?
A CPU which is optimized for signal processing:
• Special instructions for common signal processing operations, e.g. multiply and accumulate.
• Often on-chip circuits that handle input/output (IO).
• Low power consumption.
• Cheap (compared to processors in e.g. desktop computers).
Multiply and accumulate: $\sum_n x_n y_n$
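The multiply and accumulate operation mentioned above can be sketched in plain C as follows. This is only an illustration; the function name `dot_product` is an assumption and does not come from the course skeletons.

```c
#include <stddef.h>

/* Multiply and accumulate: returns the sum over i of x[i]*y[i].
   The loop body is one multiply and one add per sample, which is
   the pattern a DSP is built to execute efficiently. */
float dot_product(const float *x, const float *y, size_t n)
{
    float acc = 0.0f;
    size_t i;

    for (i = 0; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}
```

Called with x = {1, 2, 3} and y = {4, 5, 6}, the function returns 32.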
![Page 5: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/5.jpg)
Project Prototype: DSP versus PC
•Concurrently running programs at both the DSP and the PC.
•DSP-card used for:
•Signal processing
•IO (sampling/playback)
•PC used for:
•Graphical User Interface (GUI)
•Controlling the application, receiving results.
![Page 6: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/6.jpg)
The DSP in the project course
You will use a Texas Instruments C6713 floating point digital signal processor.
• Massively parallel architecture (VLIW) - up to eight 32 bit instructions are executed simultaneously.
• Running at 225 MHz, giving 1.35 GFlops peak performance (six floating point operations per cycle).
Belongs to the TI C6x family of DSPs
Widely used in industry
![Page 7: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/7.jpg)
Software pipelining
• The processor can be programmed to perform eight operations in parallel (e.g. MULT, ADD, MV).
• Every instruction has a certain latency.
• The compiler will pipeline code, i.e. perform several instructions in parallel in loops, if:
– There are no function calls in the loop.
– Optimization -o3 is selected.
– ..
• Check that important loops are pipelined.
![Page 8: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/8.jpg)
Technical Requirements of Prototype
• Real-time functionality.
• DSP-card: signal processing.
• PC: user interface.
• User interface through a GUI (windows style) implemented in matlab.
• No unnecessary use of processor time on the PC.
• Well structured and adequately commented source code.
For more details see
www.s3.kth.se/signal/edu/projekt/examination.shtml
![Page 9: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/9.jpg)
Development Tools
• Matlab
– Algorithm development.
– Prototype verification.
– User interface development (GUI).
– Control of DSP card.
– Control of code profiling.
• DSP: Code Composer Studio
– Algorithm implementation in C/Assembler.
– Debugging in conjunction with Matlab implementation.
– Code profiling.
![Page 10: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/10.jpg)
How to learn …
How to Quickly Learn DSP Programming :
http://www.s3.kth.se/signal/edu/projekt/DSPsupport/getting_started.shtml
Our web-pages:
http://www.s3.kth.se/signal/edu/projekt/DSPsupport/
Ask me:
Search on the net, newsgroups, ….
![Page 11: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/11.jpg)
PC programming (GUI)
Two methods:
• Using GUIDE (a GUI for creating a GUI).
• Programmatically.
![Page 12: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/12.jpg)
Starting CCS
• CCStudio v3.3 is the code development environment.
• Use Setup CCStudio v3.3 when you need to change between targets.
– C6713 DSK-USB
– C6713 Device Cycle Accurate Simulator (little endian)
– C6416 Device Cycle Accurate Simulator (little endian)
• Connect to matlab
– cc=ccsdsp;
– cc.visible(0), cc.run, cc.isrunning.
The hardware
When doing tutorial
![Page 13: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/13.jpg)
Comparing matlab and DSP result
Principle to test isolated functions e.g. a decoder:
• Generate input in matlab.
• Write input to the DSP.
• Call DSP version of function.
• Read output from the DSP.
• Call matlab version of function.
• Compare results.
Let’s have a look at the compare_with_matlab_31 skeleton!
![Page 14: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/14.jpg)
Test important functions by
• Copy the entire compare_with_matlab_31.pjt project.
• Replace FuncionToBeTested with your code:
– In the C-code.
– In the matlab code.
• Define input and output data & parameters as relevant for your function.
• Change the matlab code to generate relevant input data.
• Sometimes called ”test harness” in industry.
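The compare step of such a test harness can be sketched in C as below. In the course setup the comparison is done in matlab against the matlab version of the function; this host-side sketch only illustrates the idea. The names `apply_gain` (a stand-in for the function under test) and `max_abs_error` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical function under test: applies a gain to a buffer. */
void apply_gain(const float *in, float *out, size_t n, float gain)
{
    size_t i;

    for (i = 0; i < n; i++) {
        out[i] = gain * in[i];
    }
}

/* Largest absolute difference between a result buffer and a
   reference buffer, both of length n. */
float max_abs_error(const float *result, const float *reference, size_t n)
{
    float worst = 0.0f;
    size_t i;

    for (i = 0; i < n; i++) {
        float d = result[i] - reference[i];
        if (d < 0.0f) {
            d = -d;
        }
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

A typical check is that `max_abs_error` stays below a tolerance chosen for the data format in use, rather than requiring exact equality.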
![Page 15: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/15.jpg)
Matlab <-> DSP communication 1(3)
Sending data between matlab and DSP when the DSP is not running (matlab code):
Input_obj=createobj(cc,'Input'); % Input is a global in the DSP code.
write(Input_obj,Input); % write data
Input=read(Input_obj); % read data
![Page 16: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/16.jpg)
DSP -> PC communication 2(3)
When the DSP is running (RTDX):
On the DSP side:
RTDX_write(&ctrl_chan_dsp2pc, &data_to_matlab, sizeof(float)*NO_FLOATS_TO_MATLAB );
On the matlab side:
data_from_DSP=readmsg(cc.rtdx,'ctrl_chan_dsp2pc', 'single')
Recommendation: Re-use code in the ”_empty” skeletons.
![Page 17: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/17.jpg)
Matlab <-> DSP communication 3(3)
The PC<->DSP interface is slow
Allowed cheating (if necessary):
Pre-read data into memory before real-time processing.
Read result from memory, after real-time processing.
Large memory areas available in external memory:
#pragma DATA_SECTION(Data,".external_mem") // On DSP
short Data[1000]; // On DSP
write(cc,h_Data.address(1), int16(Data)); %% In matlab
The data is not cleared when the program is reloaded.
![Page 18: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/18.jpg)
Enhanced Direct Memory Access (EDMA)
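[Figure: EDMA data paths. Memory holds a TX buffer and an RX buffer; one EDMA channel moves data from the TX buffer to DXR in the McBSP and on to the DAC, and another EDMA channel moves data from the ADC via DRR in the McBSP to the RX buffer.]
Triggers interrupt HWI_INT8 when ready.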
Leaves DSP free from moving data back and forth to ADC/DAC!
![Page 19: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/19.jpg)
EDMA PaRAM
![Page 20: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/20.jpg)
Ping-Pong Buffering
hEdmaReloadXmtPing: SRC=&gBufferXmtPing, DST=DXR, LINK=hEdmaReloadXmtPong
hEdmaReloadXmtPong: SRC=&gBufferXmtPong, DST=DXR, LINK=hEdmaReloadXmtPing
Let me show you EDMA_RTDX_GPIO_empty and QUAD_DAC_ADC_empty!
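The ping-pong linking can be illustrated with a small host-side model in C. This is not EDMA code and does not use the TI chip support library; `XferEntry`, `pingpong_init`, `entry_after` and `cpu_buffer` are hypothetical names, and only the buffer names are taken from the slide.

```c
#include <stddef.h>

#define BUFFSIZE 8  /* small value, for illustration only */

/* Simplified stand-in for a reload entry: a source buffer and a link
   to the entry that takes over when this transfer has completed. */
typedef struct XferEntry {
    const short *src;
    const struct XferEntry *link;
} XferEntry;

static short gBufferXmtPing[BUFFSIZE];
static short gBufferXmtPong[BUFFSIZE];
static XferEntry reload_ping;
static XferEntry reload_pong;

/* Ping links to pong and pong links back to ping. */
void pingpong_init(void)
{
    reload_ping.src = gBufferXmtPing;
    reload_ping.link = &reload_pong;
    reload_pong.src = gBufferXmtPong;
    reload_pong.link = &reload_ping;
}

/* Entry that is active after `completed` transfers, starting with ping. */
const XferEntry *entry_after(size_t completed)
{
    const XferEntry *e = &reload_ping;

    while (completed > 0) {
        e = e->link;
        completed--;
    }
    return e;
}

/* Buffer the CPU may fill while `active` is being transmitted. */
short *cpu_buffer(const XferEntry *active)
{
    return (active->src == gBufferXmtPing) ? gBufferXmtPong : gBufferXmtPing;
}
```

The point of the design is that the CPU always writes to the buffer that is not currently being transmitted, so processing and output never touch the same memory at the same time.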
![Page 21: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/21.jpg)
Skeleton programs handling EDMA+RTDX
”Single-antenna”:
• EDMA_RTDX_GPIO_31_empty: code development.
• EDMA_RTDX_GPIO_31: Matlab prototype.
”Dual-antenna”:
• QUAD_ADC_DAC_31_empty: code development.
• QUAD_ADC_DAC_31: Matlab prototype.
![Page 22: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/22.jpg)
QUAD_DAC_ADC_31
Let’s go through QUAD_DAC_ADC_31_empty
Then go through QUAD_DAC_ADC_31
This is the DSP<->matlab interface to be used in the matlab prototype!!
Note: Documentation in “main.c”!
![Page 23: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/23.jpg)
State Machine using Case Statement in appl_Process
int State=START; // The state values may be integers defined using #define in a header file.
void appl_Process(short *receive_buffer, short *transmit_buffer) {
  switch(State) {
  case START:
    res1=func1(arg1,arg2);
    res2=func2(signal1);
    State=WAITING;
    break;
  case WAITING:
    sync=func3(arg1,res2);
    if (sync) { State=RUNNING; transmit(signal2); }
    else { State=WAITING; transmit(signal1); }
    break;
  case RUNNING:
    .....
  }
}
![Page 24: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/24.jpg)
Data formats
• C-types: char=8bits, short=16bits, int=32bits, float=32bits.
• Integers are signed or unsigned.
• Float: sign=1bit, exponent=8bits, fraction=23bits.
• In C, conversion is automatic (when pointers are not involved…).
• However, note the range …..
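The range issue can be made concrete with a saturating conversion from float to short. A plain cast of a float outside the short range does not give a useful result, so one option is to clip first. The function name `float_to_short_sat` is an assumption, not part of the skeletons, and NaN inputs are not handled in this sketch.

```c
#include <limits.h>

/* Converts a float to a 16-bit short, clipping values that fall
   outside [SHRT_MIN, SHRT_MAX]. Values inside the range are
   truncated toward zero by the cast. NaN is not handled. */
short float_to_short_sat(float x)
{
    if (x >= (float)SHRT_MAX) {
        return SHRT_MAX;
    }
    if (x <= (float)SHRT_MIN) {
        return SHRT_MIN;
    }
    return (short)x;
}
```

With a 16-bit short, an input of 40000.0 is clipped to 32767 and an input of 1234.7 becomes 1234.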
![Page 25: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/25.jpg)
The buffers in QUAD_DAC_ADC …
appl_Process(short *receive_buffer,short *transmit_buffer)
• The buffers consist of BUFFSIZE shorts (range [-2^15,2^15-1]).
• BUFFSIZE is defined in EDMA_RTDX_GPIO.h to be 1024.
• The number of bytes is 2*BUFFSIZE=2048.
• In EDMA_RTDX_GPIO there are 4 channels (i.e. ADC and DAC converters) which are interleaved.
• Thus the number of 4-dimensional vector samples is BUFFSIZE/4=256.
• BUFFSIZE can be changed.
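Splitting an interleaved buffer into per-channel arrays might look like the sketch below. It assumes that the four channels of one sample instant are stored next to each other, i.e. channel ch of sample n sits at index 4*n+ch; the actual ordering should be checked against the documentation in the skeleton. The names `NUM_CHANNELS`, `FRAMES` and `deinterleave` are hypothetical.

```c
#define BUFFSIZE     1024
#define NUM_CHANNELS 4
#define FRAMES       (BUFFSIZE / NUM_CHANNELS)  /* 256 vector samples */

/* Assumed layout: buffer[NUM_CHANNELS*n + ch] holds channel ch of
   vector sample n. */
void deinterleave(const short *buffer, short channels[NUM_CHANNELS][FRAMES])
{
    int n, ch;

    for (n = 0; n < FRAMES; n++) {
        for (ch = 0; ch < NUM_CHANNELS; ch++) {
            channels[ch][n] = buffer[NUM_CHANNELS * n + ch];
        }
    }
}
```

Working on one contiguous array per channel is usually easier to get software pipelined than indexing with a stride of four inside the processing loop.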
![Page 26: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/26.jpg)
Overlap and add
Say we want to implement a FIR filter.
• The input buffer is 128 samples.
• The filter is 10 samples.
• The filtered signal is 128+10-1=137 samples.
• But the output buffer is 128 samples ….
• Solution: overlap and add.
• Variant 1: Save the last 9 samples. Add them to the next buffer.
• Variant 2: Overlap-and-add with an additional buffer. See next slide.
![Page 27: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/27.jpg)
Overlap and Add: With additional buffer
[Figure: an additional buffer laid out as 128 samples, 128 samples and 9 samples, shown before and after an update. Annotations: ”Add the new signal”, ”Move 128+9 samples”, ”Zero these samples”.]
Good if transmit signal is 128 samples and unsynchronized!
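A minimal sketch of the first variant (save the last 9 output samples and add them to the next buffer) is given below, using the block and filter lengths from the example. It is a straightforward, unoptimized implementation; the names `OlaState`, `ola_init` and `ola_fir_block` are hypothetical.

```c
#define BLOCK 128          /* input and output buffer length */
#define TAPS  10           /* filter length */
#define TAIL  (TAPS - 1)   /* samples carried over to the next block */

typedef struct {
    float tail[TAIL];      /* last 9 output samples of the previous block */
} OlaState;

void ola_init(OlaState *s)
{
    int n;

    for (n = 0; n < TAIL; n++) {
        s->tail[n] = 0.0f;
    }
}

/* Filters one block of BLOCK samples with the FIR filter h. */
void ola_fir_block(OlaState *s, const float *h, const float *in, float *out)
{
    float full[BLOCK + TAIL];   /* 128+10-1 = 137 samples */
    int n, k;

    for (n = 0; n < BLOCK + TAIL; n++) {
        full[n] = 0.0f;
    }
    /* Linear convolution of this block alone. */
    for (n = 0; n < BLOCK; n++) {
        for (k = 0; k < TAPS; k++) {
            full[n + k] += in[n] * h[k];
        }
    }
    /* Add the tail saved from the previous block. */
    for (n = 0; n < TAIL; n++) {
        full[n] += s->tail[n];
    }
    /* First 128 samples are the output, last 9 are saved. */
    for (n = 0; n < BLOCK; n++) {
        out[n] = full[n];
    }
    for (n = 0; n < TAIL; n++) {
        s->tail[n] = full[BLOCK + n];
    }
}
```

Note that `full` is a local array of 137 floats and therefore lives on the stack, so the stack size has to allow for it.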
![Page 28: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/28.jpg)
Stack and Heap
float myfunction(short *buffer)
{
float internal_buffer[1000];
…
This data is stored on the stack. At least 4000 bytes needed.
The stack size is set in ”build options”. No warning is given by the compiler if the stack size is too small!!!
float *internal_buffer;
internal_buffer = (float *) malloc(1000*sizeof(float));
…
Allocated in heap
The heap size is also set in ”build options”. Also no warning!!!
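Since no warning is given at build time, one precaution for the heap case is to check the pointer returned by malloc, which is NULL when the request cannot be satisfied. The sketch below shows the pattern; the function `sum_of_squares` is a hypothetical example. A stack that is too small cannot be detected this way.

```c
#include <stdlib.h>

/* Computes the sum of squares of a buffer using a temporary array
   on the heap. Returns 0 on success and -1 if the allocation
   failed, e.g. because the configured heap is too small. */
int sum_of_squares(const short *buffer, size_t n, float *result)
{
    float *internal_buffer;
    float acc = 0.0f;
    size_t i;

    internal_buffer = (float *) malloc(n * sizeof(float));
    if (internal_buffer == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        internal_buffer[i] = (float)buffer[i] * (float)buffer[i];
    }
    for (i = 0; i < n; i++) {
        acc += internal_buffer[i];
    }
    free(internal_buffer);
    *result = acc;
    return 0;
}
```

In a real-time program it is usually preferable to allocate such buffers once during initialization instead of in every call.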
![Page 29: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/29.jpg)
Code Optimization
Let me show you optimization_example .
![Page 30: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/30.jpg)
Simple Optimization Rules 1(2)
• Turn optimization on. Flags ”-o3”, program mode compilation ”-pm” and ”-op3” if possible.
• Turn debug off, i.e. do not use ”-g”.
• Avoid function calls inside loops!
• Use of division ”/” is a function call! Use _rcpsp instead. For other intrinsics see table 8-6 in spru187n.
• Avoid math functions such as ”sin(x)”; use look-up tables instead.
• Check that all important loops are pipelined by searching for "SOFTWARE PIPELINE INFORMATION" in the generated ”.asm” files.
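The look-up table rule can be sketched as follows. The table here has only 16 entries with hard-coded values and linear interpolation, which keeps the example short; a real application would probably use a larger table sized for the accuracy it needs. The names `sin_table` and `sin_lookup` and the choice to express the angle as a fraction of a full turn are assumptions made for this example.

```c
#define SIN_TABLE_SIZE 16   /* must be a power of two */

/* sin(2*pi*k/16) for k = 0..15, rounded to six decimals. */
static const float sin_table[SIN_TABLE_SIZE] = {
     0.000000f,  0.382683f,  0.707107f,  0.923880f,
     1.000000f,  0.923880f,  0.707107f,  0.382683f,
     0.000000f, -0.382683f, -0.707107f, -0.923880f,
    -1.000000f, -0.923880f, -0.707107f, -0.382683f
};

/* Approximates sin(2*pi*turns) for 0 <= turns < 1 by linear
   interpolation between neighbouring table entries. */
float sin_lookup(float turns)
{
    float pos = turns * SIN_TABLE_SIZE;
    int i = (int)pos;
    float frac = pos - (float)i;
    float a = sin_table[i & (SIN_TABLE_SIZE - 1)];
    float b = sin_table[(i + 1) & (SIN_TABLE_SIZE - 1)];

    return a + frac * (b - a);
}
```

The function contains no calls and no division, so a loop that uses it (after inlining) remains a candidate for software pipelining.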
![Page 31: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/31.jpg)
Simple Optimization Rules 2(2)
• Allocate all time-critical code and data in internal memory (in our skeletons this is the default; allocating to external memory requires a #pragma statement).
• Use the touch function in an initialization routine to have the most important data structures cached in internal memory. (This function can be copied from the cache_miss_example skeleton.)
float ImportantData[100];
….
touch(ImportantData,100);
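The idea behind a touch routine is to read at least one element from every cache line that the data occupies, so that the lines are loaded before the time-critical code runs. The sketch below is not the touch function from cache_miss_example, whose exact signature is not shown here; `touch_floats` is a hypothetical stand-in that assumes 32-byte cache lines.

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 32

/* Reads one float from every 32-byte stretch of the array, plus the
   last element in case the array does not start on a line boundary.
   Returns the number of reads performed. */
size_t touch_floats(const float *data, size_t count)
{
    const volatile float *p = data;   /* volatile: reads are not removed */
    const size_t step = CACHE_LINE_BYTES / sizeof(float);
    volatile float sink;
    size_t reads = 0;
    size_t i;

    for (i = 0; i < count; i += step) {
        sink = p[i];
        reads++;
    }
    if (count > 0) {
        sink = p[count - 1];
        reads++;
    }
    (void)sink;
    return reads;
}
```

For an array of 100 floats (400 bytes) this performs 14 reads, one per 8 floats plus the final element.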
![Page 32: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/32.jpg)
TMS320C6713 cache
CPU core
L1P (program cache): 4 kB
L1D (data cache): 4 kB
Memory: 256 kB internal, 16 MB external
![Page 33: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/33.jpg)
One-way cache (L1P)
Cache line: SDRAM addresses that map to it
Line 0: Mem 0x0000-0x001F, Mem 0x1000-0x101F
Line 1: Mem 0x0020-0x003F, Mem 0x1020-0x103F
...
Line 127: Mem 0x0FE0-0x0FFF, Mem 0x1FE0-0x1FFF
![Page 34: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/34.jpg)
Two-way cache (L1D)
Cache lines: memory addresses that map to them
Line 0A, Line 0B: Mem 0x0000-0x001F, Mem 0x0800-0x081F
Line 1A, Line 1B: Mem 0x0020-0x003F, Mem 0x0820-0x083F
...
Line 63A, Line 63B: Mem 0x07E0-0x07FF, Mem 0x0FE0-0x0FFF
![Page 35: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/35.jpg)
L1D cache
L1D address allocation:
Tag: bits 31-11. Set index: bits 10-5. Offset: bits 4-0.
•A new line of 32 bytes is loaded on a read-miss with a penalty of 4 clock cycles.
•If two words are loaded per clock cycle (reading sequentially from a memory segment) the overhead is 8/32*4=1 clock cycle per instruction cycle.
•A write-miss doesn’t lead to loading of a new line. A write buffer of four words handles up to four misses without penalty.
![Page 36: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/36.jpg)
cache_miss_example
main.c: Illustrates impact of L1D write and read misses (compulsory misses).
main2.c: Illustrates the problem with several data objects in the same set (thrashing).
Two data objects are in the same set if: Aa = K*2048 + Ab, for some addresses Aa and Ab in object A and B respectively, and for some integer K.
Two code objects are in the same set if: Aa = K*4096 + Ab, for some addresses Aa and Ab in object A and B respectively, and for some integer K.
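The rule for data objects can be checked with a few lines of C. The sketch below follows the L1D layout described in these slides (32-byte lines, 64 sets, so the set index repeats every 2048 bytes) and compares addresses at cache-line granularity; the function names are hypothetical.

```c
#include <stdint.h>

#define L1D_LINE_BYTES 32u
#define L1D_NUM_SETS   64u

/* Set index of an address: bits 10-5 of the address. */
unsigned l1d_set_index(uint32_t address)
{
    return (unsigned)((address / L1D_LINE_BYTES) % L1D_NUM_SETS);
}

/* Non-zero if the two addresses compete for the same L1D set. */
int l1d_same_set(uint32_t a, uint32_t b)
{
    return l1d_set_index(a) == l1d_set_index(b);
}
```

For example, addresses 0x1000 and 0x1000+3*2048 give the same set index, while 0x1000 and 0x1020 do not. Running such a check on the addresses listed in the .map file is one way to spot objects that may thrash.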
![Page 37: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/37.jpg)
What to consider when programming to make good use of the cache
• Align all data buffers on 32-byte boundaries (#pragma DATA_ALIGN).
• Avoid allocating more than two objects that map to the same set in the same algorithm.
• Avoid having two or more computationally complex algorithms that map to the same set.
• Profile the algorithms with and without cached data and program (see cache_miss_example).
• Force caching of important data and code before the real-time program starts (e.g. in appl_Init()) by reading the data (touch) and calling the functions.
• Test processing data in smaller buffers to see if performance improves.
![Page 38: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/38.jpg)
Some advice 1(2)
• Start with a skeleton.
• Only insert functions which have been checked against matlab.
• Make one change at a time => much easier to find out what went wrong.
• Save ”before” and ”after” code.
• Don’t use printf.
![Page 39: S3.kth.se DSP Lecture 30/3-2010 Per Zetterberg. Agenda General. Starting CCS Comparing matlab and DSP results. Profiling when comparing matlab and DSP](https://reader036.vdocument.in/reader036/viewer/2022062314/56649ddd5503460f94ad5d1b/html5/thumbnails/39.jpg)
Some advice 2(2)
• Check that all pointers are initialized.
• If a variable is corrupted, check the .map file to see how it could be overwritten.
• Use an extern declaration both in the file where the variable is declared and where it is used.
• In real-time debugging, store results in ”debug-globals”.
• When using sqrt, log, log10, use ”#include <math.h>”.