
Simplifying the Integration of Processing Elements in Computing Systems using a Programmable Controller

By

Lesley Shannon and Paul Chow

University of Toronto

Overview

• Motivation

• Computing System Design and Architecture

• SIMPPL Controller

• Future Work

Motivation

• FPGAs are used to implement increasingly complex designs

• Need to minimize system design time

• Previously designed modules can be reused as Processing Elements (PEs)

Objective

• Simplify the reuse of PEs in new applications

– Facilitate the physical integration of PEs

– Abstract data transfers from the physical design

– Make it easier for designers to alter a PE’s functionality

Solution

• Standardize the physical interconnections between modules

• Standardize the communication protocols for passing data between modules

• Separate the functionality from the communication protocols

The SIMPPL Model

[Figure: six Computing Elements (CEs) connected to one another, with on-chip and off-chip links labeled.]

CE = Computing Element


How a Hardware CE Works

[Figure labels: PE; Local Program; Execution Engine; Rx Remote Instr; Tx Remote Instr; off-chip communication]

Internal Structure of a Hardware CE

[Figure labels: Computing Element (CE); PE (Hardware IP); SIMPPL Controller; SIMPPL Control Sequencer (SCS); Internal Rx and Tx Communication Links (FIFOs); External I/O Signals]


SIMPPL Controller Datapath

[Figure labels: SIMPPL Controller; EX; IR; a0; REG; Prog Instr; Internal Rx Link; Internal Tx Link; Received Data; Transmitted Data; Controller Status Bits; Processing Element (Hardware IP); SIMPPL Control Sequencer (SCS); Optional Asynchronous FIFOs]

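Instruction Packet Format

[Figure: a packet sent from a Tx CE to an Rx CE. Each word carries a program word control bit. Braces in the figure group the words into the Instruction and the Data Packet, and one group is marked *Optional.]

  Control bit   Word
  1             Instruction word: Num Data Words (NDW), opcode
  0             Immediate Address
  0             Data 0
  0             Data 1
  0             Data 2
  ...           ...
  0             Data NDW - 1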
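The packet layout can be sketched in Python as a list of (control bit, payload) words. This is only an illustration: the slides do not give bit widths or opcode values, so `OPCODE_BITS`, the packing in `make_instruction`, and the helper names `build_packet` and `split_packets` are all assumptions.

```python
from typing import List, Optional, Tuple

# One link word: (program word control bit, payload).
Word = Tuple[int, int]

# Hypothetical field width for the opcode; the slides do not state it.
OPCODE_BITS = 4


def make_instruction(opcode: int, ndw: int) -> int:
    """Pack Num Data Words (NDW) and opcode into one instruction payload."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode out of range")
    if ndw < 0:
        raise ValueError("NDW must be non-negative")
    return (ndw << OPCODE_BITS) | opcode


def build_packet(opcode: int, data: List[int],
                 address: Optional[int] = None) -> List[Word]:
    """Return the words of one packet in transmission order."""
    # The instruction word is the only word with the control bit set.
    packet: List[Word] = [(1, make_instruction(opcode, len(data)))]
    if address is not None:
        packet.append((0, address))       # immediate address word
    packet.extend((0, d) for d in data)   # Data 0 .. Data NDW - 1
    return packet


def split_packets(words: List[Word]) -> List[List[Word]]:
    """Split a word stream into packets at each control bit of 1."""
    packets: List[List[Word]] = []
    for bit, payload in words:
        if bit == 1:
            packets.append([])
        if not packets:
            raise ValueError("stream must start with an instruction word")
        packets[-1].append((bit, payload))
    return packets
```

A receiver modeled this way needs only the control bit to find packet boundaries, which is one plausible reason for carrying it alongside every word.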

Instruction Types

• Immediate Data Transfer

• Immediate Data + Immediate Address

• Address Register Initialization

• Address Register Arithmetic

• Immediate Data + Indirect Addressing

• Immediate Data + Autoincrementing

• Wait Receive

• Noop

• Reset


SIMPPL Control Sequencer (SCS)

[Figure labels: SIMPPL Controller; SIMPPL Control Sequencer (SCS); Store Unit (Program); PC; Program Word; Program Control Bit; Valid Instruction; Program Instruction Read; Status Bits]

A SIMPPL Example

[Figure: Environmental Data Sampling Unit, made up of a Sensor Unit CE and a Memory CE (32 KB).]

Program run by the Sensor Unit CE:

  write start addr to a0;
  for (i=0; i<1024; i++){
    while (!valid_sensor_data);
    write 8 data words starting at addr (a0);
    a0 = a0 + 8;
  }
  return;
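A small Python model can make the effect of this program on the Memory CE concrete. It is a sketch under assumptions: `MemoryCE`, `write_a0`, `write_autoinc`, and `run_sensor_program` are hypothetical names, and the wait on `valid_sensor_data` is modeled simply by iterating over samples as they become available.

```python
WORDS_PER_LINE = 8      # data words written per sample (from the slide)
NUM_SAMPLES = 1024      # loop bound (from the slide)
MEM_WORDS = NUM_SAMPLES * WORDS_PER_LINE   # 8192 words in total


class MemoryCE:
    """Toy model of a memory CE with a single address register, a0."""

    def __init__(self, size_words):
        self.mem = [0] * size_words
        self.a0 = 0

    def write_a0(self, addr):
        # Address register initialization.
        self.a0 = addr

    def write_autoinc(self, words):
        # Write the words starting at a0, then advance a0 past them.
        for offset, word in enumerate(words):
            self.mem[self.a0 + offset] = word
        self.a0 += len(words)


def run_sensor_program(memory, samples, start_addr=0):
    """Replay the slide's loop: one 8-word write per valid sample."""
    memory.write_a0(start_addr)
    for sample in samples:
        memory.write_autoinc(sample)
    return memory
```

The program writes 1024 x 8 = 8192 words. That would fill a 32 KB memory exactly if each word is 32 bits wide, although the slides do not state the word width.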


Sensor Unit SCS Program Counter

  // Next-state state machine for the PC:
  Case(PCstate){
    Write a0 state:
      if ((Instruction Read) && (rst == 0)) nextPC = Write address state;
      else nextPC = Write a0 state;
    Write address state:
      if (Instruction Read) nextPC = Write autoinc state;
      else nextPC = Write address state;
    Write autoinc state:
      if (SampleCntr == 1024) nextPC = Done state;
      else nextPC = Write autoinc state;
    Done state:
      nextPC = Done state;
  }

  // State register update:
  if (rst == 1) PCstate <= Write a0 state;
  else PCstate <= nextPC;
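The program counter can be modeled in Python as a four-state machine. This is a sketch, not the authors' implementation: the `PC` enum and the function names are hypothetical, and `sample_cntr` is passed in by the caller because the slide does not show how SampleCntr is incremented.

```python
from enum import Enum, auto


class PC(Enum):
    WRITE_A0 = auto()
    WRITE_ADDRESS = auto()
    WRITE_AUTOINC = auto()
    DONE = auto()


def next_pc(state, instruction_read, rst, sample_cntr):
    """Combinational next-state logic, one branch per PC state."""
    if state is PC.WRITE_A0:
        if instruction_read and not rst:
            return PC.WRITE_ADDRESS
        return PC.WRITE_A0
    if state is PC.WRITE_ADDRESS:
        return PC.WRITE_AUTOINC if instruction_read else PC.WRITE_ADDRESS
    if state is PC.WRITE_AUTOINC:
        return PC.DONE if sample_cntr == 1024 else PC.WRITE_AUTOINC
    return PC.DONE  # the Done state holds until reset


def clock(state, instruction_read, rst, sample_cntr):
    """One clock edge: reset takes priority, else load the next state."""
    if rst:
        return PC.WRITE_A0
    return next_pc(state, instruction_read, rst, sample_cntr)
```

Calling `clock` repeatedly with `instruction_read=True` steps through the two setup states once and then stays in the autoincrement state until the sample counter reaches 1024.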


Sensor Unit SCS Program

  // Program word and control bit for each PC state:
  Case(PCstate){
    Write a0 state:      program_word = Write a0 instruction;   program_control_bit = 1;
    Write address state: program_word = Write address to a0;    program_control_bit = 0;
    Write autoinc state: program_word = Write data line instr;  program_control_bit = 1;
    Done state:          program_word = Stall controller;       program_control_bit = 0;
  }

  // Valid-instruction signal for each PC state:
  Case(PCstate){
    Write a0 state:      valid_instruction = 1;
    Write address state: valid_instruction = 1;
    Write autoinc state: valid_instruction = valid_sensor_data;
    Done state:          valid_instruction = 0;
  }
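The two case statements amount to a lookup from PC state to three outputs. A minimal Python sketch follows. The state keys and the function name `scs_outputs` are hypothetical, and the program words are kept as descriptive strings because the slides do not give their encodings.

```python
def scs_outputs(state, valid_sensor_data):
    """Return (program_word, program_control_bit, valid_instruction)."""
    table = {
        # Instruction words carry a control bit of 1.
        "write_a0": ("Write a0 instruction", 1, 1),
        # The address operand follows with a control bit of 0.
        "write_address": ("Write address to a0", 0, 1),
        # The data-line write is only valid when sensor data is ready.
        "write_autoinc": ("Write data line instr", 1,
                          int(bool(valid_sensor_data))),
        # In the Done state nothing valid is presented to the controller.
        "done": ("Stall controller", 0, 0),
    }
    return table[state]
```

Gating `valid_instruction` with `valid_sensor_data` in the autoincrement state is how the hardware realizes the `while (!valid_sensor_data);` wait without a separate state.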

Streaming System Architecture

[Figure labels: Vid_In CE; Vid_Out CE; Mem CE; Mem Bank 0; Mem Bank 1]

Snap-Shot System Architecture

[Figure labels: Vid_In CE; Vid_Out CE; Mem CE; Mem Bank 0; Mem Bank 1; switch]

Controller Implementation Results

  Measured Quantity          Vid_In CE    Vid_Out CE    Mem CE
  Number of LUTs             350          260           436
  Number of flipflops        177          163           161
  Instr. Fetch Overhead      1 cycle      1 cycle       1 cycle
  Instr. Decode Overhead     1 cycle      1 cycle       1 cycle
  Mem. Arb. Overhead         N/A          N/A           3 cycles
  Instr. Execute Overhead    2 cycles     4 cycles      2 cycles
  Buffering Overhead         1 cycle      1 cycle       1 cycle
  *Early Indication Cycles   -4 cycles    -20 cycles    N/A
  Total Overhead             1 cycle      -13 cycles    8 cycles
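The Total Overhead row appears to be the sum of the per-stage cycle counts above it, with N/A entries contributing nothing. That reading is inferred from the numbers rather than stated on the slide. The short Python check below uses hypothetical dictionary keys and reproduces the three totals.

```python
# Cycle counts copied from the table; None stands for N/A.
overheads = {
    "Vid_In CE": {"fetch": 1, "decode": 1, "mem_arb": None,
                  "execute": 2, "buffering": 1, "early_indication": -4},
    "Vid_Out CE": {"fetch": 1, "decode": 1, "mem_arb": None,
                   "execute": 4, "buffering": 1, "early_indication": -20},
    "Mem CE": {"fetch": 1, "decode": 1, "mem_arb": 3,
               "execute": 2, "buffering": 1, "early_indication": None},
}


def total_overhead(row):
    """Sum the cycle counts in one column, skipping N/A entries."""
    return sum(v for v in row.values() if v is not None)


totals = {name: total_overhead(row) for name, row in overheads.items()}
```

A negative total, as for Vid_Out CE, means the early-indication credit exceeds the controller's added latency.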

SCS Implementation Results

  Sample System                Vid_In SCS    Vid_Out SCS    MemA SCS    MemB SCS
  Streaming Video LUTs         29            2              0           42
  Snap Shot LUTs               34            2              0           40
  Streaming Video Flipflops    20            3              0           19
  Snap Shot Flipflops          23            3              0           22

• Both systems were implemented on-chip in 6 hours!

Adding to the System

[Figure labels: Vid_In CE; Vid_Out CE; Mem CE; Mem Bank 0; Mem Bank 1; switch; Proc Image CE]

Summary

• Described the SIMPPL computing model that significantly reduces design time

• Created a hardware CE architecture to simplify PE reuse

• Demonstrated that CEs can easily be adapted to different applications

Future Work

• What types of on-chip debugging and verification tools can be used for designing with the SIMPPL model?

• Can the SCS be autogenerated from a high-level description?

• Can a PE-specific controller be generated from a high-level description?


Thank you.

Standardizing IP Interconnect

[Figure, two panels. (a) labels: H/W IP; IP Interface; Bus A; Bus B. (b) labels: H/W IP to OCP; OCP to Bus A; OCP to Bus B.]


Shared Memory Computing Element

[Figure labels: Mem CE; ARBITER (sel0, sel1, req, ack); Mem Bank 0; Mem Bank 1; Mem Bank 0 Controller; Mem Bank 1 Controller; SIMPPL Controller Mem Bank A; SIMPPL Controller Mem Bank B; SCS A; SCS B; Internal Communication Links to other CEs; I/O Communication Links to off-chip Memory]

Reusing Processing Elements

• PEs may require redesign to be incorporated into new Computing Systems due to:

– Differences in the physical interface

– Differences in the communication protocols

– Differences in the functional requirements


Design Space

• Data Intensive systems

• Point-to-Point Communications (Directed Communications)

• Modular Design

Example: Block Diagram of MPEG4

[Figure labels: Switch (two); Frame Store; Shape coding; Motion estimation; motion texture coding; video multiplex; DCT; Q; Q^-1; IDCT; pred. 1; pred. 2; pred. 3; adders and a subtractor]