Optimizing SystemC for higher speed and coverage
Dogan Fennibay
Why?
SystemC is becoming the de facto system-level design language
SystemC emulates parallelism via scheduling
  Scheduling is an additional element affecting the result
  We have to make up for this hole in coverage
We want faster simulations
  To do more executions / cheaper executions
  SystemC’s flexibility adds up to slowness
Outline
Automatic generation of schedulings for higher coverage
  Introduction
  Related work
  Definitions
  Algorithms
  Evaluation
Scoot
  Introduction
  Related work
  Idea
  Evaluation
Conclusion
SystemC
Do you know SystemC?
  No / Yes
Introduction
3 different schedulings => 3 different results
  a; b; a; te; b; a => “Ok”
  a; b; a; te; a; b => “Ko”
  b; a; te; b => deadlock (lost notification)
Include process C
  30 different schedulings => same 3 different results
  Equivalence classes
void top::A() {
  wait(e);
  wait(20, SC_NS);
  if (x) cout << "Ok\n";
  else cout << "Ko\n";
}
void top::B() {
  e.notify();
  x = 0;
  wait(20, SC_NS);
  x = 1;
}
void top::C() {
  sc_time T(20, SC_NS);
  wait(T);
}
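The three outcomes on this slide can be reproduced without a SystemC kernel. The following is a minimal sketch in plain C++ that replays a scheduling string for processes A and B. The function name replay, the program-counter encoding, and the initial value of x are assumptions made for illustration; they are not part of the original slides or of SystemC.

```cpp
#include <string>
#include <vector>

// Replays one scheduling of processes A and B and returns "Ok", "Ko",
// "deadlock" or "invalid". Hypothetical stand-in for the SystemC kernel.
std::string replay(const std::vector<std::string>& scheduling) {
    int x = 0;         // shared variable (initial value is an assumption)
    int pcA = 0;       // 0 start, 1 blocked on e, 2 woken by e,
                       // 3 blocked on time, 4 woken by time, 5 finished
    int pcB = 0;       // 0 start, 1 blocked on time, 2 woken by time, 3 finished
    std::string result;

    for (const std::string& t : scheduling) {
        if (t == "a") {
            if (pcA == 0) pcA = 1;                 // wait(e)
            else if (pcA == 2) pcA = 3;            // wait(20, SC_NS)
            else if (pcA == 4) {                   // if (x) ... else ...
                result = x ? "Ok" : "Ko";
                pcA = 5;
            } else return "invalid";               // A is not runnable
        } else if (t == "b") {
            if (pcB == 0) {
                if (pcA == 1) pcA = 2;             // e.notify(): lost if A is not waiting
                x = 0;
                pcB = 1;                           // wait(20, SC_NS)
            } else if (pcB == 2) {
                x = 1;
                pcB = 3;
            } else return "invalid";               // B is not runnable
        } else if (t == "te") {                    // time elapses: timed waits expire
            if (pcA == 3) pcA = 4;
            if (pcB == 1) pcB = 2;
        } else return "invalid";
    }
    if (!result.empty()) return result;
    // A never got past wait(e) and B has no notification left to send.
    return (pcA <= 1 && pcB == 3) ? "deadlock" : "invalid";
}
```

Calling replay with the three schedulings listed above yields the three results on the slide; process C is left out because it does not touch e or x.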
Introduction
Scheduling also affects the results
  Not just input data
We have to test all possible schedulings
  Impossible schedulings: do not test them
    Due to synchronization constraints
  Equivalent schedulings: test only one
    e.g. two reads from a shared variable
Focus is on scheduling
  Input data generation is not considered
Introduction: Dynamic Dependency Graph
void top::P() {
  wait(e);
  wait(20, SC_NS);
  if (x) cout << "Ok\n";
  else cout << "Ko\n";
}
void top::Q() {
  e.notify();
  x = 0;
  wait(20, SC_NS);
  x = 1;
}
Legend: green: non-permutable, red: non-commutative
Related work
Formal model
  Extract a formal model from the SystemC model
  Combine with a formal model of the non-deterministic scheduler => model checking
  State space explosion
Partial order reduction
  Dynamic extension is new
  Used by model checkers, but no non-abstract uses
Test case generation & output checker
  Assertion-based verification is promising
Definitions
SUTD: System Under Test + one test data
  Assume: independent test case generator
  Generator always independent of scheduling
Process: event or thread
  p, q, r, …
Transition: one execution of a process in a scheduling
  a, b, c, … or p1, p2, q1, r1, p3, r2, …
Scheduling
  String of transitions & new cycles (delta or te)
Full state
  Full memory dump incl. PC of processes
p:
  a = x;
  wait(e);              // end of p1
  printf("%d\n", a);
  wait(e2);             // end of p2
  a = x * 2;            // p3
q:
  e.notify();
  b = x;
  wait(20, SC_NS);      // end of q1
  x = b * 2;            // q2
Definitions
Permutation
  Modify a scheduling
  Change the order of a and b
  Other transitions may come in-between
Equivalence
  Two different schedulings lead to the same full state
p:
  a = x
q:
  b = x
Definitions
Permutability: a and b in a scheduling can be exchanged
  An equivalent scheduling with a & b consecutive is available
  a and b can be exchanged in this equivalent scheduling
Commutativity: which permutations are useful?
  Exchanging a and b produces an equivalent scheduling
  Non-commutative permutations are interesting
void t1() {
  …
  wait(e1);
  v2 = 2;
}
void t2() {
  …
  v2 = 1;
  e1.notify();
  wait();
}
void t3() {
  …
  printf("%d\n", v1);
  wait();
}
void t4() {
  …
  printf("%d\n", v1 + 1);   // variant shown on the slide: ++v1
  wait();
}
Definitions
Dependency
  Boolean: permutable’ + permutable.commutative’ (’ is NOT, + is OR, . is AND)
  a must come before b; otherwise the scheduling (1) is impossible or (2) produces a different result
Causal order: permutable transitions w.r.t. dependency
  Equivalent schedulings have the same causal order
void t1() {
  …
  wait(e1);
  v2 = 2;
}
void t2() {
  …
  v2 = 1;
  e1.notify();
  wait();
}
void t3() {
  …
  printf("%d\n", v1);
  wait();
}
void t4() {
  …
  printf("%d\n", ++v1);
  wait();
}
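The Boolean definition of dependency above can be written as a small predicate. This is only a sketch: the function name dependent and its two parameters are hypothetical, and how permutability and commutativity are actually computed is outside its scope.

```cpp
// permutable:  a and b can be exchanged in some equivalent scheduling.
// commutative: exchanging a and b yields an equivalent scheduling.
// Returns true when a must stay before b.
bool dependent(bool permutable, bool commutative) {
    // permutable' + permutable.commutative'
    return !permutable || (permutable && !commutative);
}
```

Only a pair that is both permutable and commutative is independent; every other combination is a dependency.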
Algorithms: Computing commutativity
Non-commutative actions
  Shared variables
    Read, then modifying write
    Modifying write, then read
    Write, then modifying write
  Events
    Notification, then wait
    Wait, then notification
    Caught notification, then notification
All other actions do not harm commutativity
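The six ordered pairs above can be encoded as a lookup over the actions recorded for two transitions. The sketch below is an illustration only: the Kind and Action types and the function names are hypothetical, and treating a modifying write as a kind of write in the third rule is an assumption about the slide's intent.

```cpp
#include <string>
#include <vector>

// One recorded action of a transition (hypothetical encoding for this sketch).
enum class Kind { Read, Write, ModifyingWrite, Notify, Wait, CaughtNotify };

struct Action {
    Kind kind;
    std::string object;  // name of the shared variable or event
};

// True if `first` followed by `second` is one of the six ordered pairs.
bool nonCommutativePair(const Action& first, const Action& second) {
    if (first.object != second.object) return false;  // different objects never clash
    const bool firstWrites =
        first.kind == Kind::Write || first.kind == Kind::ModifyingWrite;
    switch (second.kind) {
        case Kind::ModifyingWrite:  // read or write, then modifying write
            return first.kind == Kind::Read || firstWrites;
        case Kind::Read:            // modifying write, then read
            return first.kind == Kind::ModifyingWrite;
        case Kind::Wait:            // notification, then wait
            return first.kind == Kind::Notify;
        case Kind::Notify:          // wait or caught notification, then notification
            return first.kind == Kind::Wait || first.kind == Kind::CaughtNotify;
        default:
            return false;
    }
}

// Transition a (executed first) and transition b commute if no action of a
// followed by an action of b forms a non-commutative pair.
bool commute(const std::vector<Action>& a, const std::vector<Action>& b) {
    for (const Action& x : a)
        for (const Action& y : b)
            if (nonCommutativePair(x, y)) return false;
    return true;
}
```

With this encoding, two transitions that only read v1 commute, while a read of v1 followed by a transition performing ++v1 does not.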
Algorithms: Causal Partial Order
Computed step-by-step
Start with empty scheduling
Choose candidates a, b; where
  a or b are new cycles (delta or te)
  a and b are from the same process
  b is woken up by a
Extend CPO set
  Add (a, b)
  Add non-commutative transitions of b
  Compute & add transitive closure of calculated relations
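The last step above, the transitive closure, can be sketched with Warshall's algorithm on a Boolean matrix. The Relation alias and the function name are hypothetical, and the slides do not say which closure algorithm the prototype uses; this is one standard way to do it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// r[i][j] == true means transition i is ordered before transition j.
using Relation = std::vector<std::vector<bool>>;

// Returns the transitive closure of r (Warshall's algorithm, O(n^3)).
Relation transitiveClosure(Relation r) {
    const std::size_t n = r.size();
    for (const auto& row : r) assert(row.size() == n);  // matrix must be square
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            if (r[i][k])
                for (std::size_t j = 0; j < n; ++j)
                    if (r[k][j]) r[i][j] = true;  // i before k, k before j => i before j
    return r;
}
```

For example, from the pairs (0, 1) and (1, 2) the closure adds (0, 2) and nothing in the reverse direction.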
Algorithms: Generating schedulings
Generating one alternative scheduling
  Choose two non-commutative transitions: a and b
  Execute the scheduling until a
  Execute additional transitions not causally ordered to a
  Execute b, then a
  Execute the rest
Generating all schedulings
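The steps for one alternative scheduling can be sketched as a rearrangement of the recorded scheduling. This is a simplification under stated assumptions: the names Scheduling, Order, noOrder and swapTransitions are hypothetical, the causal order is given as a precomputed matrix, a and b are assumed not to be causally ordered themselves, and a real checker would re-execute the model rather than reorder a string, since the transitions after the swap may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Scheduling = std::vector<std::string>;
// before[i][j] == true means transition i is causally ordered before transition j.
using Order = std::vector<std::vector<bool>>;

// Helper: an n x n order with no causal constraints.
Order noOrder(std::size_t n) {
    return Order(n, std::vector<bool>(n, false));
}

// Builds the scheduling in which s[b] runs before s[a] (indices, a < b).
Scheduling swapTransitions(const Scheduling& s, std::size_t a, std::size_t b,
                           const Order& before) {
    assert(a < b && b < s.size() && before.size() == s.size());
    Scheduling out(s.begin(), s.begin() + static_cast<std::ptrdiff_t>(a));  // 1. until a
    Scheduling delayed;
    for (std::size_t i = a + 1; i < b; ++i) {
        if (before[a][i]) delayed.push_back(s[i]);  // depends on a: keep after a
        else out.push_back(s[i]);                   // 2. not ordered to a: run first
    }
    out.push_back(s[b]);                            // 3. b, then a
    out.push_back(s[a]);
    out.insert(out.end(), delayed.begin(), delayed.end());  // 4. the rest
    out.insert(out.end(), s.begin() + static_cast<std::ptrdiff_t>(b) + 1, s.end());
    return out;
}
```

With the scheduling p1, a, c, b, d and no causal constraints, swapping a and b gives p1, c, b, a, d; if c is causally after a, the result is p1, b, a, c, d.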
Evaluation: prototype
Model and kernel instrumented
Checker
  Get the scheduling, generate a new one, feed it to the patched kernel
  Until no more schedulings are available
Evaluation: experiments
Is the overhead of calculating schedulings worth it?
  V.T vs G.T + O
3 examples
  Indexer
    Small, V calculable
  MPEG Decoder
    50 KLOC, 4 processes
  Full SoC
    250 KLOC, 57 processes
Indexer
  128 element array for hash table
  n components, each with 2 threads, each write 4 elements
  G << V
    n = 2, V = 3.35e11
    n = 3, V = 2.43e25
Evaluation: experiments
MPEG decoder
  Overhead is insignificant
    G.T = 50 s, O = 18 s
  Special structures in code not recognized
    Persistent events
Complete SoC
  Scalability
    Not tested fully because of manual instrumentation
    Expectation: ok up to 200 transitions
  Observation: more detailed models produce more constrained schedulings => longer schedulings testable
Scoot
Helmstetter et al. explore all schedulings
  Too much time spent
Let’s go in the opposite direction
  Make SystemC less flexible to get it faster
  Blanc et al.
Introduction
Faster execution (up to 5 times!)
Use verification back-ends
  CBMC, SATABS
=> Get a plain C++ model from SystemC
=> Use C++ frontend to support more language constructs
Related work
Work on HW synthesis via model extraction
  Kostaras & Vergos and Castillo et al.
  Only for a small subset of C++
Savoiu et al.
  Speedup via Petri-net reductions
  Only 1.5 times
Pérez et al.
  Static scheduling
  Only event processes considered
Idea
SystemC is very flexible
  Dynamic run-time binding of ports
    Via polymorphism
  Sensitivity lists
  Module hierarchy
  => Consolidate hierarchy
Scheduler’s inefficiencies
  Run-time memory allocations
  Processes triggered via function pointers
  => Convert to static schedule
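The difference between triggering processes through function pointers and running a static schedule can be illustrated with a toy model. The sketch below is not Scoot's generated code; the Model struct, the two processes and both run functions are hypothetical and only show the kind of transformation meant by "convert to static schedule".

```cpp
#include <vector>

// Toy model state shared by two processes (hypothetical example).
struct Model {
    int counter = 0;
    int total = 0;
};

void producer(Model& m) { ++m.counter; }
void consumer(Model& m) { m.total += m.counter; }

using Process = void (*)(Model&);

// Dynamic style: the runnable list lives on the heap and every process
// is invoked through a function pointer on each cycle.
int runDynamic(int cycles) {
    Model m;
    std::vector<Process> runnable = {producer, consumer};
    for (int c = 0; c < cycles; ++c)
        for (Process p : runnable) p(m);
    return m.total;
}

// Static style: the same order is fixed at compile time, so the calls are
// direct and can be inlined; no list is allocated at run time.
int runStatic(int cycles) {
    Model m;
    for (int c = 0; c < cycles; ++c) {
        producer(m);
        consumer(m);
    }
    return m.total;
}
```

Both functions compute the same result for the same number of cycles; the static version simply removes the indirection and the run-time allocation.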
Evaluation
AES encryption/decryption
Speedup achieved up to 5.3 times
Conclusion
Helmstetter et al.
  Eliminated the effect of scheduling
  At a reasonable overhead
  Problems with scalability
Scoot
  Significant speedup achieved
  Most structures of C++ supported
  Preparation for model checking
Further discussion
Why equivalence classes among schedulings? Shouldn’t all schedulings produce the same result?
Why not use Helmstetter’s algorithm for regular software to catch races?
Uses for Scoot?
Extras
SystemC Primer
A system-level design language
Used for HW/SW codesign
Based on native C++
Different abstraction levels: TLM to RTL
#include <systemc.h>
SC_MODULE(nand2) {
  sc_in<bool> A, B;      // input ports
  sc_out<bool> F;        // output port
  void do_nand2() {      // a C++ function
    F.write(!(A.read() && B.read()));
  }
  SC_CTOR(nand2) {
    SC_METHOD(do_nand2); // register do_nand2 as a method process
    sensitive << A << B; // re-run it whenever A or B changes
  }
};
SystemC Primer: Concepts
Modules
  Containers for other SystemC elements incl. modules
Channels
  Communication means of modules
Ports
  Connection point of modules to channels
Interfaces
  Connection point of channels to modules
Method processes
  Non-blocking code parts triggered on events they’re sensitive to
Thread processes
  Independent flows of execution
  May call wait
Events
  Basic means of synchronization
Shared variables
  Same as C++
SystemC Primer: Scheduler
Non-deterministic specification: unspecified order
Non-preemptive
Delta cycles used to imitate concurrency
True parallelism on the real system
Properties of relations
Reflexivity
  aRa
Symmetry
  aRb => bRa
  e.g. “is equal to”
Totality
  aRb or bRa
Transitivity
  aRb and bRc => aRc
  e.g. “is ancestor of”
Transitive closure
  e.g. all ancestors in a community
References
Blanc, N., Kroening, D. and Sharygina, N., 2008, “Scoot: A Tool for the Analysis of SystemC Models”, TACAS, 2008, 467-470.
Helmstetter, C., Maraninchi, F., Maillet-Contoz, L. and Moy, M., 2006, “Automatic Generation of Schedulings for Improving the Test Coverage of Systems-on-a-Chip”, Verimag Research Report, TR-2006-06.
Helmstetter, C., Maraninchi, F. and Maillet-Contoz, L., 2007, “Test Coverage for Loose Timing Annotations”, Formal Methods: Applications and Technology, 4346/2007, 100-115.