CtoS Coding Tips
Jan 31, 2011
Coding Tips for High Quality of Results
Module 6
01/31/2011 SystemC Synthesis using C-to-Silicon Compiler 6-2
Module Objective
Your objective:
To code your design for optimal Quality of Results
Topics:
Hardcoding compiler optimizations
Controlling expression size and dynamics
Facilitating scheduler optimizations
Current C-to-Silicon known problems and solutions
Miscellaneous issues
General Compiler Optimizations
Compilers automatically do some of these optimizations:
Move loop-invariant code out of loop statements
Reduce operation strength
You can guarantee these optimizations by coding them yourself!
The tool may perform such optimizations.
Move Invariant Expressions out of the Loop
Move loop-invariant calculations to before or after the loop.
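Invariant inside the loop (schedules unnecessary operations):
for (i=0; i<...; i++) {
  max = a > b ? a : b;
  c[i] = max * b;
}
Invariant moved before the loop:
max = a > b ? a : b;
for (i=0; i<...; i++) {
  c[i] = max * b;
}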
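The hoisting idea can be checked in plain C++ outside any synthesis flow. The sketch below is only an illustration under assumptions: the function names (scale_naive, scale_hoisted, check_hoisting) and the use of std::vector are hypothetical and not part of the course material.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive form: the max selection is recomputed on every iteration,
// although it depends only on a and b.
std::vector<int> scale_naive(int a, int b, std::size_t n) {
    std::vector<int> c(n);
    for (std::size_t i = 0; i < n; ++i) {
        int max = a > b ? a : b;   // loop-invariant
        c[i] = max * b;
    }
    return c;
}

// Hoisted form: the invariant is computed once, before the loop.
std::vector<int> scale_hoisted(int a, int b, std::size_t n) {
    std::vector<int> c(n);
    const int max = a > b ? a : b;
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = max * b;
    }
    return c;
}

// Sanity helper: both forms must agree for the given inputs.
void check_hoisting(int a, int b, std::size_t n) {
    assert(scale_naive(a, b, n) == scale_hoisted(a, b, n));
}
```

Both functions return the same values; only the amount of work inside the loop body differs.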
Reduce Operation Strength
Convert multiplication and division operations to shift operations to the extent practical.
Before (synthesis infers at least 6-bit operations):
a = b * 48; c = b / 48;
Reduce strength. After (operation strength reduced):
a = (b << 5) + (b << 4); c = (b >> 4) / 3;
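As a quick check of the arithmetic behind multiplying and dividing by 48, the following plain C++ sketch compares shift-based forms against the direct operators. It assumes unsigned 32-bit operands; the function names are hypothetical, and the shift-and-add form of the multiplication is one of several equivalent rewrites.

```cpp
#include <cassert>
#include <cstdint>

// 48 = 32 + 16, so b * 48 == (b << 5) + (b << 4) in unsigned arithmetic.
std::uint32_t mul48(std::uint32_t b) {
    return (b << 5) + (b << 4);
}

// 48 = 16 * 3, so b / 48 == (b >> 4) / 3 for unsigned b.
// The divider now sees the constant 3 instead of 48.
std::uint32_t div48(std::uint32_t b) {
    return (b >> 4) / 3;
}

// Compare the reduced forms against the direct operators for all b < limit.
void check_strength_reduction(std::uint32_t limit) {
    for (std::uint32_t b = 0; b < limit; ++b) {
        assert(mul48(b) == b * 48u);
        assert(div48(b) == b / 48u);
    }
}
```

The division rewrite relies on the operand being unsigned (or known non-negative); a right shift of a negative signed value does not round the same way as division.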
Controlling Expression Size and Dynamics
Synthesis tools automatically do some of these optimizations:
Explicitly specify constant expressions
Explicitly size state variables
Explicitly size expressions
Control variable dynamics
Pad array inner dimensions to powers of 2
You can guarantee these optimizations by coding them yourself!
The tool may perform such optimizations.
Explicitly Specify Constants
Synthesis cannot always statically determine your design intent.
Explicitly declare constants to clarify your design intent
This code infers a barrel shifter:
sc_int<...> a, b;
sc_int<...> c;
c = 10;
...
a = b >> c;
Use a constant. This code infers a constant shift (wires):
sc_int<...> a, b;
const sc_int<...> c = 10;
...
a = b >> c;
Explicitly Size State Variables
Synthesis cannot always statically determine your design intent.
Explicitly size state variables to clarify your design intent
Synthesis infers a 32-bit counter:
int counter = 0;
...
counter++;
if (counter == 25)
  counter = 0;
Size the variable. Synthesis infers a 5-bit counter:
sc_uint<5> counter = 0;
...
counter++;
if (counter == 25)
  counter = 0;
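The reason 5 bits suffice is that the counter never holds a value above 25, and 25 < 32. A minimal plain C++ sketch of that reasoning follows; it uses a bit-field as a hypothetical stand-in for a 5-bit SystemC type, and the names Counter5, tick, and count_after are illustrative assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for a 5-bit unsigned register.
struct Counter5 {
    unsigned value : 5;   // can hold 0..31
};

// Advance the modulo-25 counter by one step.
void tick(Counter5& c) {
    assert(c.value < 25);        // invariant: the counter stays in 0..24
    c.value = c.value + 1;       // at most 25, which still fits in 5 bits
    if (c.value == 25)
        c.value = 0;
}

// Run the counter for a number of steps and report its final value.
unsigned count_after(unsigned steps) {
    Counter5 c;
    c.value = 0;
    for (unsigned s = 0; s < steps; ++s)
        tick(c);
    return c.value;
}
```

Because the intermediate value 25 is representable in 5 bits, the comparison against 25 behaves exactly as it does with a 32-bit int.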
Explicitly Size Expressions
Synthesis cannot always statically determine your design intent.
Explicitly size expressions to clarify your design intent
Synthesis infers a 64-bit comparator, because it assumes the maximum width for the literal 1, i.e. long long (1LL):
sc_uint<4> a, b;
...
if ((a-1) > b)
...
Size the expression. Synthesis infers a 4-bit comparator:
sc_uint<4> a, b;
...
if ((a-sc_uint<4>(1)) > b)
...
Explicitly size expressions only when needed, and be very careful not to induce errors!
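The warning about inducing errors can be made concrete with a plain C++ model. The sketch below is an assumption-laden illustration, not tool behavior: it models the wide comparison with a 64-bit subtraction and the sized comparison by masking the difference to 4 bits. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Wide comparison: the subtraction is carried out at 64 bits,
// so 0 - 1 wraps to the largest 64-bit value.
bool greater_wide(std::uint8_t a, std::uint8_t b) {
    assert(a < 16 && b < 16);    // operands model 4-bit values
    std::uint64_t diff = static_cast<std::uint64_t>(a) - 1;
    return diff > b;
}

// Sized comparison: the difference is truncated to 4 bits,
// so 0 - 1 wraps to 15 instead.
bool greater_4bit(std::uint8_t a, std::uint8_t b) {
    assert(a < 16 && b < 16);
    unsigned diff = (a - 1u) & 0xFu;
    return diff > b;
}
```

For a = 5 and b = 3 both forms agree. For a = 0 and b = 15 the wide form yields true and the 4-bit form yields false, which is the kind of silent change in behavior that sizing an expression can introduce.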
Control Variable Dynamics
Synthesis cannot always statically determine your design intent.
Explicitly control variable dynamics to clarify your design intent
32-bit variable shift. 16-bit variable shift.
sc_in<...> valid_in;
sc_in<...> word_in;
...
unsigned shift(unsigned data)
{
  while (!valid_in) wait();
  sc_uint<...> word = word_in;
  return data
Pad Array Inner Dimensions to Powers of 2
Simplify address calculation: concatenate instead of multiply and add.
If mapped to registers, unused registers are removed.
If mapped to RAM, unused RAM may remain.
Multiply and add: i*9+j
Concatenate: { i[1:0], j[3:0] }
int A[3][9]; ...
for (int i=0; i
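The two address calculations named above can be compared in plain C++. This sketch assumes the inner dimension is padded from 9 to 16 so that the column index occupies exactly 4 bits; the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 3;
constexpr std::size_t kCols = 9;         // original inner dimension
constexpr std::size_t kPaddedCols = 16;  // padded to a power of 2

// Unpadded layout A[3][9]: the address needs a multiply and an add.
std::size_t addr_unpadded(std::size_t i, std::size_t j) {
    assert(i < kRows && j < kCols);
    return i * kCols + j;
}

// Padded layout A[3][16]: the address is the concatenation
// { i[1:0], j[3:0] }, which is a shift and an OR (pure wiring in hardware).
std::size_t addr_padded(std::size_t i, std::size_t j) {
    assert(i < kRows && j < kCols);
    return (i << 4) | j;
}
```

The padded layout addresses 3 * 16 = 48 locations instead of 27, which is the storage cost mentioned above when the array is mapped to RAM.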
Code an Optimal Control Flow
Eliminate code not reachable in the target operating environment (caused by input value constraints that synthesis does not know about).
Simplify and compact consecutive or nested if conditions to reduce the number of multiplexors.
Rewrite a cascaded if else if statement (priority implementation) as a switch statement (parallel implementation) where applicable.
Consecutive if statements:
if (cond) do_this();
if (!cond) do_that();
Compacted into if else:
if (cond)
  do_this();
else
  do_that();
Cascaded if else if (priority implementation):
if (value==0) do_this(); else
if (value==1) do_that(); else
do_other();
Switch (parallel implementation):
switch (value) {
  case 0: do_this(); break;
  case 1: do_that(); break;
  default: do_other(); break;
}
Provide Realistic Timing Constraints
Do not overconstrain the clock!
An overconstrained clock unnecessarily increases area and timing.
It can prevent resource sharing that otherwise would occur.
It can prevent operator rescheduling to a less-utilized pipeline state: if the operator delay exceeds the clock cycle, the tool will not move the operator, potentially leaving it bundled with other ops.
Constrain latency and ops, not the clock.
[Figure: clock constraint versus realized clock]
Fully Describe a Datapath in One Thread
The scheduler cannot share resources between threads (or, by extension, modules).
Group datapath operations into as few threads as practical (you cannot always group operations executing at different throughputs).
Operations that can be grouped:
void my_module::proc1() {
  wait();
  for (;;) {
    if (cond)
      ya = a1 + a2;
    wait();
  }
}
void my_module::proc2() {
  wait();
  for (;;) {
    if (!cond)
      yb = b1 + b2;
    wait();
  }
}
Reduce the number of threads. Operations grouped into one thread:
void my_module::proc() {
  wait();
  for (;;) {
    if (cond)
      ya = a1 + a2;
    else
      yb = b1 + b2;
    wait();
  }
}
Pass Function Arguments by Value
Pass-by-Pointer (accepted): may produce suboptimal timing.
int func(int *in, int *out);
Pass-by-Reference (better): may produce suboptimal area.
int func(int &in, int &out);
Pass-by-Value (best): most aggressive optimization.
int func(int in);
Move Local Write/Read Arrays to Module Body
Synthesis initializes non-const function-local variables in accordance with C++ semantics.
Synthesis must schedule initialization of non-const local arrays mapped to RAM.
Local array mapped to RAM (initialized):
SC_MODULE (my_module) {
  ...
private:
  ...
};
void my_module::foo() {
  int array[100]={};
  ...
}
Make the array a member. Member array mapped to RAM (not initialized):
SC_MODULE (my_module) {
  ...
private:
  int array[100]={};
};
void my_module::foo() {
  ...
  ...
}
Separate I/O and Computation to Facilitate Scheduling
Synthesis must schedule I/O operations in the cycle where coded.
Separate I/O and computation to allow scheduling flexibility.
I/O and computation in one cycle:
while (true) {
  ...
  wait();
  ...
  result.write( subtract.read()
    ? a - b
    : a + b );
}
Generally separate out I/O ops. Flexible scheduling:
while (true) {
  bool sub = subtract.read();
  wait();
  ...
  result.write( sub
    ? a - b
    : a + b );
}
Combine I/O and Computation to Facilitate Sharing
Combining I/O and computation in one cycle can reduce resources.
Opcode registers are not shared. ALU is shared:
while (true) {
  op1 = opcode1.read();
  op2 = opcode2.read();
  ...
  opN = opcodeN.read();
  result1 = ALU(data,op1);
  result2 = ALU(data,op2);
  ...
  resultN = ALU(data,opN);
  wait(N);
}
Opcode register is shared (muxed). ALU is shared:
while (true) {
  op1 = opcode1.read();
  result1 = ALU(data,op1);
  wait();
  op2 = opcode2.read();
  result2 = ALU(data,op2);
  wait();
  ...
  opN = opcodeN.read();
  resultN = ALU(data,opN);
  wait();
}
Forcing Signal Semantics Suppresses Register Sharing
Prohibiting resource sharing can improve timing by removing multiplexors.
This is not a recommended style, but it can sometimes be useful.
Assume each ALU operation fully utilizes the clock cycle.
Register shared between cycles:
  int result1;
  int result2;
};
int module::func(int data_in) {
  result1=ALU(data_in,opcode1);
  wait();
  result2=ALU(result1,opcode2);
  wait();
  return ALU(result2,opcode3);
}
Registers not shared between cycles:
  sc_signal<int> result1;
  sc_signal<int> result2;
};
int module::func(int data_in) {
  result1=ALU(data_in,opcode1);
  wait();
  result2=ALU(result1,opcode2);
  wait();
  return ALU(result2,opcode3);
}
Known Problems and Solutions
Tips and current limitations specific to the C-to-Silicon Compiler:
Declare large classes as SystemC modules
Limit each pointer to maximum of 16 objects
Declare Large Classes as SystemC Modules
The C-to-Silicon Compiler handles modules more efficiently than arbitrary classes.
Converting arbitrary classes to modules may solve a capacity problem.
Potential capacity problem:
class mpeg_decoder { // A really big class
  ...
};
SC_MODULE(my_module) {
  ...
private:
  mpeg_decoder my_decoder;
};
Make it a module. May resolve the capacity problem:
SC_MODULE (mpeg_decoder) { // A really big class
  ...
};
SC_MODULE(my_module) {
  ...
  SC_CTOR(my_module)
    : my_decoder("my_decoder")
  {...}
private:
  mpeg_decoder my_decoder;
};
Limit Each Pointer to Maximum of 16 Objects
The C-to-Silicon Compiler tracks up to 16 objects that a pointer can point to.
You can reassign the pointer any number of times, as long as it points to no more than 16 distinct objects.
Cannot use one pointer for 20 objects:
SC_MODULE(...) {
  ...
private:
  int buf00[32];
  int buf01[32];
  ...
  int buf19[32];
  ...
  int *ptr00to19;
};
Use one pointer for a maximum of 16 objects:
SC_MODULE(...) {
  ...
private:
  int buf00[32];
  int buf01[32];
  ...
  int buf19[32];
  ...
  int *ptr00to15;
  int *ptr16to19;
};
Coding for High QoR Quiz
1. Explain how explicitly sizing expressions might cause problems.
2. Suggest a reason why synthesis might not be able to remove code representing functionality that your device will never use.
Coding for High QoR Quiz Solution
1. Explain how explicitly sizing expressions might cause problems.
When explicitly sizing expressions, you can very easily lose the more significant result bits of operations such as addition and multiplication.
2. Suggest a reason why synthesis might not be able to remove code representing functionality that your device will never use.
Synthesis cannot be aware of how the target environment might restrict the value ranges of data inputs and the combinations of control inputs, so it sometimes cannot strip design functionality that will never be used.