Simplifying the Integration of Processing Elements in Computing
Systems using a Programmable Controller
By
Lesley Shannon and Paul Chow
University of Toronto
Motivation
• FPGAs are used to implement increasingly complex designs
• Need to minimize system design time
• Previously designed modules can be reused as Processing Elements (PEs)
Objective
• Simplify the reuse of PEs in new applications
– Facilitate the physical integration of PEs
– Abstract data transfers from the physical design
– Make it easier for designers to alter a PE’s functionality
Solution
• Standardize the physical interconnections between modules
• Standardize the communication protocols for passing data between modules
• Separate the functionality from the communication protocols
How a Hardware CE Works
[Figure labels: PE, Local Program, Execution Engine, Rx Remote Instr, Tx Remote Instr, off-chip communication.]
Internal Structure of a Hardware CE
[Figure: a Computing Element (CE) made up of a PE (Hardware IP), a SIMPPL Controller, and a SIMPPL Control Sequencer (SCS). Other labels: External I/O Signals; Internal Rx and Tx Communication Links (FIFOs).]
SIMPPL Controller Datapath
[Figure labels: SIMPPL Controller (EX, IR, a0 REG), Prog Instr, Internal Rx Link, Internal Tx Link, Received Data, Transmitted Data, Controller Status Bits, Processing Element (Hardware IP), SIMPPL Control Sequencer (SCS), Optional Asynchronous FIFOs.]
Instruction Packet Format
[Figure labels: Instruction (opcode, Num Data Words (NDW)); Immediate Address; Data Packet (Data 0, Data 1, Data 2, ..., Data NDW - 1); program word control bit (column of values: 1, 0, 0, 0, 0, 0); TxCE; RxCE; a brace marked *Optional.]
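The packet layout can be sketched in software. The following is a minimal model, not the authors' encoding: the class name `InstructionPacket`, the tuple representation, and the absence of any bit widths are assumptions made for illustration. It also assumes the control bit is 1 on the instruction word and 0 on every following word, which is one reading of the column of values on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class InstructionPacket:
    """Hypothetical model of one instruction packet (field widths not modelled)."""
    opcode: int
    data: List[int] = field(default_factory=list)
    immediate_address: Optional[int] = None  # present only for some instructions

    @property
    def ndw(self) -> int:
        # Num Data Words: the number of data words that follow the instruction.
        return len(self.data)

    def words(self) -> List[Tuple[int, tuple]]:
        """Return (control_bit, payload) pairs in transmission order."""
        out = [(1, ("instr", self.opcode, self.ndw))]  # assumed: bit = 1
        if self.immediate_address is not None:
            out.append((0, ("addr", self.immediate_address)))
        out.extend((0, ("data", w)) for w in self.data)
        return out


# Example: an instruction carrying an immediate address and two data words.
packet = InstructionPacket(opcode=3, data=[10, 11], immediate_address=0x40)
stream = packet.words()
```

In this sketch a receiver can find packet boundaries from the control bit alone, and NDW tells it how many data words to expect.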
Instruction Types
• Immediate Data Transfer
• Immediate Data + Immediate Address
• Address Register Initialization
• Address Register Arithmetic
• Immediate Data + Indirect Addressing
• Immediate Data + Autoincrementing
• Wait Receive
• Noop
• Reset
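To make the addressing-related instruction types concrete, here is a small behavioural sketch of how they might act on a memory-like PE with one address register. The names `Op` and `AddressedPE` are hypothetical, and the exact semantics are assumptions: address arithmetic is modelled as adding a signed offset, and autoincrement as advancing a0 by the number of words written.

```python
from enum import Enum, auto


class Op(Enum):
    IMM_DATA_IMM_ADDR = auto()   # Immediate Data + Immediate Address
    ADDR_INIT = auto()           # Address Register Initialization
    ADDR_ARITH = auto()          # Address Register Arithmetic
    IMM_DATA_INDIRECT = auto()   # Immediate Data + Indirect Addressing
    IMM_DATA_AUTOINC = auto()    # Immediate Data + Autoincrementing


class AddressedPE:
    """Hypothetical memory-like PE with a single address register a0."""

    def __init__(self, size: int) -> None:
        self.mem = [0] * size
        self.a0 = 0

    def _store(self, base: int, data) -> None:
        for i, word in enumerate(data):
            self.mem[base + i] = word

    def execute(self, op: Op, data=(), addr=None) -> None:
        if op is Op.ADDR_INIT:
            self.a0 = addr
        elif op is Op.ADDR_ARITH:
            self.a0 += addr                 # assumed: addr is a signed offset
        elif op is Op.IMM_DATA_IMM_ADDR:
            self._store(addr, data)         # a0 is not involved
        elif op is Op.IMM_DATA_INDIRECT:
            self._store(self.a0, data)      # a0 is left unchanged
        elif op is Op.IMM_DATA_AUTOINC:
            self._store(self.a0, data)
            self.a0 += len(data)            # assumed: advance by words written
        else:
            raise ValueError(f"unsupported op: {op}")


pe = AddressedPE(size=32)
pe.execute(Op.ADDR_INIT, addr=4)
pe.execute(Op.IMM_DATA_AUTOINC, data=[1, 2])
pe.execute(Op.IMM_DATA_INDIRECT, data=[9])
```

Wait Receive, Noop, and Reset are left out because they do not touch the address register in this model.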
SIMPPL Controller Sequencer
[Figure labels: SIMPPL Controller; SIMPPL Control Sequencer (SCS) with Store Unit (Program) and PC; signals Program Word, Program Control Bit, Valid Instruction, Program Instruction Read, Status Bits.]
A SIMPPL Example
write start addr to a0;
for (i=0; i<1024; i++){
  while (!valid_sensor_data);
  write 8 data words starting at addr (a0);
  a0 = a0 + 8;
}
return;
[Figure labels: Sensor Unit CE (Environmental Data Sampling Unit), Memory CE (32 KB).]
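The example program can be mimicked in software to check its address arithmetic. This is a sketch under stated assumptions: `run_sensor_program` and `read_sample` are hypothetical names, the blocking wait on valid_sensor_data is replaced by a function call that returns a ready sample, and the start address is taken to be 0.

```python
import itertools

NUM_SAMPLES = 1024
WORDS_PER_SAMPLE = 8
BYTES_PER_WORD = 4  # assumption: 32-bit data words (not stated on the slide)


def run_sensor_program(read_sample):
    """Software stand-in for the Sensor Unit CE writing to the Memory CE."""
    memory = [0] * (NUM_SAMPLES * WORDS_PER_SAMPLE)
    a0 = 0  # write start addr to a0 (assumed to be 0 here)
    for _ in range(NUM_SAMPLES):
        sample = read_sample()  # replaces: while (!valid_sensor_data);
        if len(sample) != WORDS_PER_SAMPLE:
            raise ValueError("each sample must contain 8 data words")
        memory[a0:a0 + WORDS_PER_SAMPLE] = sample  # write 8 words at (a0)
        a0 += WORDS_PER_SAMPLE                     # a0 = a0 + 8
    return memory, a0


# Fake sensor that returns consecutive integers, 8 per sample.
counter = itertools.count()
memory, final_a0 = run_sensor_program(
    lambda: [next(counter) for _ in range(WORDS_PER_SAMPLE)]
)
```

Under the 32-bit word assumption, 1024 samples of 8 words come to 32 KB, which matches the size given for the Memory CE.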
Sensor Unit SCS Program Counter
//Next-state state machine for the PC:
Case(PCstate){
  Write a0 state: if ((Instruction Read) && (rst=0)) nextPC = Write address state; else nextPC = Write a0 state;
  Write address state: if (Instruction Read) nextPC = Write autoinc state; else nextPC = Write address state;
  Write autoinc state: if (SampleCntr=1024) nextPC = Done state; else nextPC = Write autoinc state;
  Done state: nextPC = Done state;
}
if (rst=1) PCstate <= Write a0 state; else PCstate <= nextPC;
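The next-state logic above can be written as a pure function, which makes the transitions easy to test. This is a behavioural sketch, not the hardware description: the names `PC` and `next_pc` are hypothetical, and the registered update on reset is folded into the same function.

```python
from enum import Enum


class PC(Enum):
    WRITE_A0 = 0
    WRITE_ADDRESS = 1
    WRITE_AUTOINC = 2
    DONE = 3


def next_pc(state: PC, instruction_read: bool, sample_cntr: int,
            rst: bool = False) -> PC:
    """Return the PC state for the next clock cycle."""
    if rst:
        return PC.WRITE_A0  # reset always returns to the first state
    if state is PC.WRITE_A0:
        return PC.WRITE_ADDRESS if instruction_read else PC.WRITE_A0
    if state is PC.WRITE_ADDRESS:
        return PC.WRITE_AUTOINC if instruction_read else PC.WRITE_ADDRESS
    if state is PC.WRITE_AUTOINC:
        return PC.DONE if sample_cntr == 1024 else PC.WRITE_AUTOINC
    return PC.DONE  # Done is an absorbing state until reset
```

Each of the first two states holds until the controller reads the pending program word, and the autoincrement state repeats until the sample counter reaches 1024.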
Sensor Unit SCS Program
Case(PCstate){
  Write a0 state: program_word = Write a0 instruction; program_control_bit = 1;
  Write address state: program_word = Write address to a0; program_control_bit = 0;
  Write autoinc state: program_word = Write data line instr; program_control_bit = 1;
  Done state: program_word = Stall controller; program_control_bit = 0;
}
Case(PCstate){
  Write a0 state: valid_instruction = 1;
  Write address state: valid_instruction = 1;
  Write autoinc state: valid_instruction = valid_sensor_data;
  Done state: valid_instruction = 0;
}
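The two case statements amount to a lookup from PC state to the three SCS outputs. A minimal sketch of that mapping follows; the function name `scs_outputs` and the string stand-ins for states and program words are hypothetical, since the real program words are encoded instructions.

```python
def scs_outputs(state: str, valid_sensor_data: bool):
    """Return (program_word, program_control_bit, valid_instruction)."""
    table = {
        # state:          (program word,            control bit, valid)
        "write_a0":       ("WRITE_A0_INSTRUCTION",  1, True),
        "write_address":  ("ADDRESS_FOR_A0",        0, True),
        "write_autoinc":  ("WRITE_DATA_LINE_INSTR", 1, bool(valid_sensor_data)),
        "done":           ("STALL_CONTROLLER",      0, False),
    }
    return table[state]
```

Only the autoincrement state depends on an input: the write instruction is offered to the controller only while the sensor reports valid data, which is how the while loop in the example is realized.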
Controller Implementation Results
Measured Quantity          Vid_In CE   Vid_Out CE   Mem CE
Number of LUTs             350         260          436
Number of flipflops        177         163          161
Instr. Fetch Overhead      1 cycle     1 cycle      1 cycle
Instr. Decode Overhead     1 cycle     1 cycle      1 cycle
Mem. Arb. Overhead         N/A         N/A          3 cycles
Instr. Execute Overhead    2 cycles    4 cycles     2 cycles
Buffering Overhead         1 cycle     1 cycle      1 cycle
*Early Indication Cycles   -4 cycles   -20 cycles   N/A
Total Overhead             1 cycle     -13 cycles   8 cycles
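The Total Overhead row is the sum of the per-stage rows, with the early indication cycles counted as negative. The short check below reproduces that arithmetic from the table values; treating N/A entries as 0 is an assumption, and the dictionary keys are illustrative names.

```python
# Per-stage overheads in cycles, copied from the table (N/A treated as 0).
overheads = {
    "Vid_In CE":  {"fetch": 1, "decode": 1, "mem_arb": 0, "execute": 2,
                   "buffering": 1, "early_indication": -4},
    "Vid_Out CE": {"fetch": 1, "decode": 1, "mem_arb": 0, "execute": 4,
                   "buffering": 1, "early_indication": -20},
    "Mem CE":     {"fetch": 1, "decode": 1, "mem_arb": 3, "execute": 2,
                   "buffering": 1, "early_indication": 0},
}

# Sum every stage for each CE to obtain its total overhead.
totals = {ce: sum(stages.values()) for ce, stages in overheads.items()}
```

The computed totals agree with the table: 1 cycle for Vid_In, -13 cycles for Vid_Out, and 8 cycles for Mem.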
SCS Implementation Results
Sample System              Vid_In SCS  Vid_Out SCS  MemA SCS  MemB SCS
Streaming Video LUTs       29          2            0         42
Snap Shot LUTs             34          2            0         40
Streaming Video Flipflops  20          3            0         19
Snap Shot Flipflops        23          3            0         22
• Both systems were implemented on-chip in 6 hours!
Summary
• Described the SIMPPL computing model that significantly reduces design time
• Created a hardware CE architecture to simplify PE reuse
• Demonstrated that CEs can easily be adapted to different applications
Future Work
• What types of on-chip debugging and verification tools can be used for designing with the SIMPPL model?
• Can the SCS be autogenerated from a high-level description?
• Can a PE-specific controller be generated from a high-level description?
Thank you.
Standardizing IP Interconnect
[Figure, panels (a) and (b): hardware IP blocks attached to Bus A and Bus B. Labels: H/W IP, IP Interface, H/W IP to OCP, OCP to Bus A, OCP to Bus B.]
Shared Memory Computing Element
[Figure: Mem CE. Labels: SIMPPL Controller (two instances), SCS A, SCS B, Mem Bank A, Mem Bank B, ARBITER (sel0, sel1, req, ack), Mem Bank 0 Controller, Mem Bank 1 Controller, Mem Bank 0, Mem Bank 1, Internal Communication Links to other CEs, I/O Communication Links to off-chip Memory.]
Reusing Processing Elements
• PEs may require redesign to be incorporated into new Computing Systems due to:
– Differences in the physical interface
– Differences in the communication protocols
– Differences in the functional requirements
Design Space
• Data Intensive systems
• Point-to-Point Communications (Directed Communications)
• Modular Design