Bluespec talk
DESCRIPTION
Bluespec is a language based on Haskell for designing hardware, an alternative to VHDL and Verilog.
TRANSCRIPT
Synthesis of Synchronous Assertions with Guarded Atomic Actions
Authors from MIT and Bluespec, Inc.
Presented by: Suman Karumuri
Andy Bartholomew
Life Cycle of a chip
Quarks to Parallel Universes …
• PFET/NFET
• Transistor (2 FETs)
• NAND / OR / NOT gates
• Circuits
• Modules
• Integrated Circuits (IC)
• ASICs / Chip
Birth of a chip
• Requirements
• Design
• Coding
• Testing and simulation.
• Formal verification.
• Synthesis
25 % time
75 % time
Birth of a chip
• Requirements
• Design
• Coding
  – HDL, RTL.
  – Verilog HDL
  – VHDL
    • System (transistor level)
    • Behavioral (expressions)
    • Structural (functions)
    • OO (Regular Languages)
  – SystemC
  – Lava, Bluespec (high-level languages).
• Testing and simulation
• Formal verification
• Synthesis
Birth of a chip
• Requirements
• Design
• Coding
• Testing and simulation
  – Software
  – Hardware
• Formal verification
  – Model Checking
  – Proving programs
• Synthesis
3% of test space
This paper
Turing Award 2008
Birth of a chip
• Requirements
• Design
• Coding
• Testing and simulation
• Formal verification
• Synthesis (Burn the design on FPGA)
  – Chip Area
  – Power consumption
  – Minimal number of transistors
  – Speed
Bluespec
Motivation
• SystemC lessons
  – Single assignment.
  – No state.
  – No destructive assignment.
  – Chaining of states.
  – Weak type system.
• Lava lessons
  – Haskell (functional, monads, polymorphic type inference).
  – Modules.
• Bluespec
  – Full-fledged language instead of Haskell modules.
  – “Behavioral model is atomic actions with guards on state”.
• Data flow model – OO for reuse.
Bluespec -> Chip
[Figure: compilation flow. Stages shown: Extended Haskell, TRS, Verilog or C, RTL, Synthesis. Annotations: concurrency and atomicity; correct programs.]
TRS: Term Rewriting System
RTL: Register Transfer Language
Bluespec Language
• Extended Haskell + Bit Vectors (Data types)
• No clocks.
• Modules for OO.
  – Rules
  – Methods
  – Scheduler
• Data Flow language.
  – Guarded atomic actions.
Rules
• Atomic Expressions.
• Execute when the guard is true.
• Run for 1 clock cycle.
• Local to a module (like private methods).
• Can call methods.

rule sync_cache (state == Synchronize);        // guard
  case (cache[index]) matches
    tagged Valid {.tag, .data, .isDirty}:
      if (isDirty) begin
        writeToMemory({index, tag}, data);     // method call
      end
    default: noAction;
  endcase
endrule
Methods
• Set of commands invoked by a rule or other methods.
• Like public methods in C++.
• Perform an Action, Value or ActionValue.

method Action get_data(Address addr) if (state == Ready);   // another way of adding guards
  Index i = get_index(addr);
  case (cache[i]) matches
    tagged Valid {.tag, .data, .isDirty}:
      if (tag == get_tag(addr))
        sendToProc(addr, data);   // hit
      else                        // conflict miss
        getFromMemory(addr);
  endcase
endmethod
Modules
• Consists of Interfaces, Rules and Method implementations.
• Enables reuse.

interface CacheController;
  method Action get_data(Address addr);
  method Action write_data(Address addr, Value v);
  method Action sync();
  method Action flush();
endinterface
Summary: Bluespec
All state (e.g., Registers, FIFOs, RAMs, ...) is explicit.
Behavior is expressed in terms of guarded atomic actions on the state:
  Rule: condition → action
Rules can manipulate state in other modules only via their interfaces.
[Figure: a module and its interface]
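The "condition → action" idea can be sketched as a small software model. This is only an illustration in Python, not Bluespec: the `Rule` class, the `fire` helper and the `swap` rule are hypothetical names introduced here. The point it shows is that an action reads the old state and all of its updates become visible together.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]

@dataclass
class Rule:
    name: str
    guard: Callable[[State], bool]      # condition on the current state
    action: Callable[[State], State]    # updates to apply atomically

def fire(rule: Rule, state: State) -> State:
    """Apply one rule for one cycle; the state is unchanged if the guard is false."""
    if not rule.guard(state):
        return state
    updates = rule.action(state)        # computed from the old state only
    return {**state, **updates}         # all updates become visible together

# Hypothetical rule: swap two registers when 'go' is set, then clear 'go'.
swap = Rule(
    name="swap",
    guard=lambda s: s["go"] == 1,
    action=lambda s: {"a": s["b"], "b": s["a"], "go": 0},
)
```

Because both new values are computed before any register changes, the swap needs no temporary, which is the behaviour one expects from registers updated on the same clock edge.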
Scheduler
• Generates a static schedule by looking at guard conditions on rules and methods.
• Ensures atomicity.
• Runs non-conflicting rules concurrently.
• Rules are scheduled locally.
• Methods are scheduled globally.
Compiler model
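[Figure: compiler model. Modules (current state) feed rules 1..n, each with a guard and an action. The guards reach the scheduler as “CAN_FIRE” signals 1..n; the scheduler produces “WILL_FIRE” signals 1..n, which drive the muxing for each state element to give Modules (next state).]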
Compiler generates a scheduler to pick a non-conflicting subset of “ready” rules
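A rough software sketch of that selection step follows. It assumes a greedy, fixed-priority choice and a hand-written conflict set. Both are assumptions made for illustration: the real compiler derives conflicts from how rules read and write state and bakes the result into logic. The function name `schedule` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

def schedule(can_fire: Dict[str, bool],
             priority: List[str],
             conflicts: Set[Tuple[str, str]]) -> Dict[str, bool]:
    """Map CAN_FIRE to WILL_FIRE: a ready rule fires unless it conflicts
    with a rule that was already chosen."""
    chosen: List[str] = []
    for rule in priority:                       # earlier in the list = more urgent
        if not can_fire.get(rule, False):
            continue
        clash = any((rule, other) in conflicts or (other, rule) in conflicts
                    for other in chosen)
        if not clash:
            chosen.append(rule)
    return {rule: rule in chosen for rule in priority}
```

With `r1` and `r2` conflicting and all three rules ready, `r1` and `r3` fire in the same cycle and `r2` waits.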
SVA + Bluespec = BSV
SystemVerilog Assertions (SVA)
• A temporal logic.
• Validate behavior of a design.
• Uses: test benches, formal verifiers, simulation.
• Sequences, Properties.
Sequence
• Simple Sequence
sequence seq;
(x ##1 y)
or
(x ##1 y ##1 z);
endsequence
• The first alternative, (x ##1 y), is true on CC1.
• The second alternative, (x ##1 y ##1 z), is true on CC2.
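To make the two alternatives concrete, here is a small Python sketch that checks the sequence against a recorded trace, one dictionary of signal values per clock cycle. The helper names and the trace format are assumptions for illustration. Cycles are numbered from the cycle in which the attempt starts.

```python
from typing import Dict, List, Optional

Trace = List[Dict[str, bool]]

def match_chain(trace: Trace, start: int, signals: List[str]) -> Optional[int]:
    """Cycle in which signals[0] ##1 signals[1] ##1 ... completes, or None."""
    for offset, name in enumerate(signals):
        cycle = start + offset
        if cycle >= len(trace) or not trace[cycle].get(name, False):
            return None
    return start + len(signals) - 1

def match_seq(trace: Trace, start: int) -> List[int]:
    """End cycles of (x ##1 y) or (x ##1 y ##1 z) for an attempt at `start`."""
    ends = [match_chain(trace, start, ["x", "y"]),
            match_chain(trace, start, ["x", "y", "z"])]
    return [e for e in ends if e is not None]
```

For a trace with x in cycle 0, y in cycle 1 and z in cycle 2, `match_seq(trace, 0)` reports both end points, cycle 1 and cycle 2, because `or` keeps every alternative that matches.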
Complex Sequence
sequence reqack;
  req && data_in == 0          // first clock cycle
  ##1 data_in > 0 [*3:5]       // starting in the second clock cycle, for the next 3-5 clock cycles
  ##1 ack && data_in == 0;     // finally data_in is low when we get an ack
endsequence
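The repetition range `[*3:5]` means the middle step may last 3, 4 or 5 cycles, so one attempt can end at several different cycles. The Python sketch below tries each length in turn. The trace format (a list of per-cycle dictionaries) and the function name are assumptions made for this illustration.

```python
from typing import Dict, List

Cycle = Dict[str, int]

def reqack_ends(trace: List[Cycle], start: int) -> List[int]:
    """End cycles of all matches of reqack for an attempt beginning at `start`."""
    def at(i: int) -> Cycle:
        return trace[i] if i < len(trace) else {}

    first = at(start)
    if not (first.get("req", 0) and first.get("data_in", 0) == 0):
        return []
    ends = []
    for reps in range(3, 6):                        # the [*3:5] repetition
        body = range(start + 1, start + 1 + reps)
        if not all(at(i).get("data_in", 0) > 0 for i in body):
            break                                   # longer runs would fail too
        last = at(start + 1 + reps)
        if last.get("ack", 0) and last.get("data_in", 0) == 0:
            ends.append(start + 1 + reps)
    return ends
```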
Properties
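• Made up of sequences.
• Implication operator |->.
• sequence |-> property (overlapping: the property is checked starting in the cycle in which the sequence ends)

property goodbuffer;
  (req ##1 data_in > 0)
    |-> !fifo_in.full;
endproperty

• sequence |=> property (non-overlapping: the property is checked starting one cycle after the sequence ends)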
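The difference between the two implication operators is a one-cycle shift, sketched below in Python. This is a simplification under stated assumptions: the consequent is a plain per-cycle boolean (as in `goodbuffer`), the antecedent's match end cycles are already known, and the function name is hypothetical.

```python
from typing import Callable, List

def check_implication(antecedent_ends: List[int],
                      consequent: Callable[[int], bool],
                      overlapping: bool) -> bool:
    """True if the consequent holds for every match of the antecedent.

    overlapping=True models |-> (checked in the cycle the match ends);
    overlapping=False models |=> (checked one cycle later).
    With no matches at all the implication holds vacuously.
    """
    shift = 0 if overlapping else 1
    return all(consequent(end + shift) for end in antecedent_ends)
```

For example, with `full = [False, True, False]` and an antecedent that ends in cycle 1, the `|->` form checks `!full` in cycle 1 and fails, while the `|=>` form checks cycle 2 and passes.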
Assertions
• Properties are checked via assertions.
always assert property (goodbuffer);
Bluespec SystemVerilog (BSV)
Challenges
• SVA model is clocked.
• Bluespec model is not.
• Some schedules will not be valid; designer intervention required.
  – Achieved through scheduler configuration.
Compiling assertions
• Sequences and properties are compiled into FSMs.
• An assertion is turned into a module.
• Assertions are run as rules.
• We can use the same Bluespec compilation techniques as before.
Compiling sequences
• x ##1 y
[Figure: FSM with transitions x, then y, then end]
Assertions in hardware
• Properties can run across multiple clock cycles.
• In software, we just spawn a concurrent thread to check the assertion.
• You can’t do that in hardware.
• Instead we create multiple copies of the same FSM along the length of the sequence.
Assertions in hardware
• always assert x ##1 y
[Figure, shown at t=0, t=1 and t=2: two copies of the FSM (x, then y, then end) combined with "or", both fed by the inputs x and y.]
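The reason a fixed number of FSM copies is enough can be seen in a small Python model. It starts a new attempt every cycle, advances every live attempt by one step, and records how many attempts are in flight at once. The function name and trace format are assumptions for illustration, and only a simple `##1` chain is modelled.

```python
from typing import Dict, List, Tuple

def run_attempts(trace: List[Dict[str, bool]],
                 chain: List[str]) -> Tuple[List[Tuple[int, bool]], int]:
    """Start a new attempt of chain[0] ##1 chain[1] ##1 ... every cycle.

    Returns (results, max_live): results holds (start_cycle, passed) for each
    finished attempt, max_live is the largest number of attempts in flight.
    """
    live: List[Tuple[int, int]] = []       # (start_cycle, position in chain)
    results: List[Tuple[int, bool]] = []
    max_live = 0
    for cycle, values in enumerate(trace):
        live.append((cycle, 0))            # a fresh FSM copy starts now
        max_live = max(max_live, len(live))
        survivors = []
        for start, pos in live:
            if not values.get(chain[pos], False):
                results.append((start, False))       # this copy fails
            elif pos + 1 == len(chain):
                results.append((start, True))        # reached "end"
            else:
                survivors.append((start, pos + 1))   # continue next cycle
        live = survivors
    return results, max_live
```

Since one attempt starts per cycle and each advances one step per cycle, no two live attempts share a position, so `max_live` never exceeds the length of the chain. For `x ##1 y` that bound is two, matching the two FSM copies on the slide.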
Composing sequences
• Simple booleans can be generalized into a sequence module.
Other combinations
General model of an assertion
Coverage
• Several SVA productions are not covered in BSV.
• Recursion
  – Would require solving the halting problem to generate FSMs.
  – Can be used when the recursion depth can be statically determined.
• "disable iff" and other properties.
Case study
Functional assertion
• “On a write request only one cache-way is written”

property goodWriteRequest;
  write_request |=>
    if (cache_tag_resp.next_evict_way0)
      isWrite(way0_req) && !isWrite(way1_req)
    else
      isWrite(way1_req) && !isWrite(way0_req);
endproperty
Performance assertion
“When a CPU request is made, a cache memory read is made in the same cycle. For read requests, either main memory is read or the result is returned in the next cycle.”

property cpu_read_perf;
  read_request |->
    isRead(way0_req) && isRead(way1_req) && isRead(tag_req)
    ##1 isRead(c2memory_req) || isRead(c2p_data);
endproperty
Statistic-gathering assertion!
• You couldn’t do this before!
• Counting read hits

property count_read_hits;
  read_request |=> isValid(c2p_data);
endproperty

always assert property (count_read_hits)
  read_hits <= read_hits + 1;
else
  read_misses <= read_misses + 1;
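As a rough software analogue, the counters can be modelled over a recorded trace in Python. The field names `read_request` and `c2p_valid` (standing in for `isValid(c2p_data)`) are assumptions. So is the choice to count only cycles in which `read_request` actually fires: the slides do not say how a vacuously passing cycle is treated.

```python
from typing import Dict, List, Tuple

def count_read_hits(trace: List[Dict[str, bool]]) -> Tuple[int, int]:
    """Return (read_hits, read_misses) for read_request |=> valid data."""
    hits = misses = 0
    for now, nxt in zip(trace, trace[1:]):   # |=> looks at the next cycle
        if now["read_request"]:
            if nxt["c2p_valid"]:
                hits += 1                     # pass branch of the assertion
            else:
                misses += 1                   # else branch of the assertion
    return hits, misses
```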
Advantages
• Code Reuse. High level semantics.
• High-level programming constructs from Bluespec + the temporal logic à la SVA.
• More tests. Hardware simulation is a lot faster (1000x) than software simulation.
• Dynamic testing.
• Statistics gathering.
Misgivings
• Ad-hoc design (from a theoretical viewpoint).
• Guards may reduce concurrency.
• Correct concurrent behavior can’t be guaranteed.
• No public docs.
• Tweaking the scheduler for the clocked model can be problematic.
• Only a subset of SVA is supported.
Extensions
• BSV could be extended to assertions checked at specific times instead of always.
• Further coverage of SVA.
• Constraint-guided scheduler.
Compiling Guards
Before compilation
rule r1 (fifo1)
… do r1
… call r2
rule r2 (fifo2)
… do r2
After Compilation
rule r3
(fifo1 and
fifo2)
… do r1
… do r2
Better model
rule r1
(if fifo1)
… do r1
(if fifo2)
… do r2
Now another rule can use fifo1 while fifo2 is being used by r1.
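The difference between the two models can be shown with a toy Python comparison. It is a sketch only: the function names are hypothetical, readiness of each FIFO is reduced to a boolean, and the "better model" is taken at face value from the slide.

```python
from typing import List

def fire_conjoined(fifo1_ready: bool, fifo2_ready: bool) -> List[str]:
    """Compiled rule r3: both actions run, but only when both guards hold."""
    return ["do r1", "do r2"] if (fifo1_ready and fifo2_ready) else []

def fire_split(fifo1_ready: bool, fifo2_ready: bool) -> List[str]:
    """'Better model': each action is guarded by its own FIFO."""
    actions = []
    if fifo1_ready:
        actions.append("do r1")
    if fifo2_ready:
        actions.append("do r2")
    return actions
```

When only `fifo1` is ready, the conjoined rule does nothing at all, while the split version still makes progress on the `r1` part. That lost progress is the reduced concurrency the slides refer to.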
Guards
• No correctness guarantees.
• Reduced concurrency.
Solution:
Transactions in Bluespec.
Questions?
BSV code compilation

function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
  // Declare vectors
  Vector#(4, Vector#(64, Complex)) stage_data;
  stage_data[0] = in_data;
  for (Integer stage = 0; stage < 3; stage = stage + 1)
    stage_data[stage+1] = stage_f(stage, stage_data[stage]);
  return (stage_data[3]);
endfunction

The loop is unrolled at compile time:

stage_data[1] = stage_f(0, stage_data[0]);
stage_data[2] = stage_f(1, stage_data[1]);
stage_data[3] = stage_f(2, stage_data[2]);

stage_f can be inlined now. But the number of transistors has tripled.
Folding
Reuse a block over multiple cycles.
[Figure: blocks f and g, before and after folding]
We expect:
• Throughput to decrease (less parallelism). Speeding up the clock to compensate gives a hyper-linear increase in energy.
• Area to decrease (reusing a block).
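The trade-off can be illustrated with a toy Python model that counts block instances (a stand-in for area) and cycles (a stand-in for throughput). The `stage_f` body here is a hypothetical placeholder, not the real IFFT stage, and the counts are bookkeeping rather than synthesis results.

```python
from typing import List, Tuple

def stage_f(stage: int, data: List[int]) -> List[int]:
    """Hypothetical stand-in for one IFFT stage."""
    return [v + stage + 1 for v in data]

def combinational(data: List[int], stages: int = 3) -> Tuple[List[int], int, int]:
    """All stages in one cycle: one copy of stage_f per stage."""
    for s in range(stages):
        data = stage_f(s, data)
    return data, stages, 1            # result, stage_f copies, cycles

def folded(data: List[int], stages: int = 3) -> Tuple[List[int], int, int]:
    """One copy of stage_f reused over several cycles via a register."""
    reg = data
    cycles = 0
    for s in range(stages):           # each iteration models one clock cycle
        reg = stage_f(s, reg)
        cycles += 1
    return reg, 1, cycles             # result, stage_f copies, cycles
```

Both versions compute the same result. The folded one uses a third of the blocks and takes three times as many cycles.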
802.11a Transmitter Synthesis results (only the IFFT block is changing)

IFFT Design                 Area (mm²)   Throughput Latency (CLKs/sym)   Min. Freq Required
Pipelined                   5.25         04                              1.0 MHz
Combinational               4.91         04                              1.0 MHz
Folded (16 Bfly-4s)         3.97         04                              1.0 MHz
Super-Folded (8 Bfly-4s)    3.69         06                              1.5 MHz
SF (4 Bfly-4s)              2.45         12                              3.0 MHz
SF (2 Bfly-4s)              1.84         24                              6.0 MHz
SF (1 Bfly-4)               1.52         48                              12 MHz

TSMC .18 micron; numbers reported are before place and route.
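The "Min. Freq Required" column follows from the clocks-per-symbol column once a symbol time is fixed. The Python sketch below assumes the 4 µs OFDM symbol duration of 802.11a. That value is not stated on the slides, but it reproduces every row of the table.

```python
SYMBOL_TIME_US = 4.0   # assumed 802.11a OFDM symbol duration, in microseconds

def min_freq_mhz(clks_per_sym: int) -> float:
    """Lowest clock frequency that finishes one symbol within one symbol time."""
    return clks_per_sym / SYMBOL_TIME_US   # clocks per microsecond = MHz

# CLKs/sym values taken from the table above.
designs = {
    "Pipelined": 4,
    "Combinational": 4,
    "Folded (16 Bfly-4s)": 4,
    "Super-Folded (8 Bfly-4s)": 6,
    "SF (4 Bfly-4s)": 12,
    "SF (2 Bfly-4s)": 24,
    "SF (1 Bfly-4)": 48,
}
required = {name: min_freq_mhz(clks) for name, clks in designs.items()}
```

Halving the number of butterfly units doubles the clocks per symbol, and with it the clock frequency needed to keep up with the symbol rate.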
The same source code
All these designs were done in less than 24 hours!