Re-configurable Parallel Stream Processor with self-assembling and self-restorable micro-architecture
Lev Kirischian, Irina Terterian, Pil Woo Chun and Vadim Geurkov
Embedded and Re-configurable Systems Lab
RYERSON University, CANADA
Example of Multi-task Data-Flow workload where each task can run in different modes
[Figure: tasks plotted against time. Task 1: Mode 1, Mode 2, Mode 3. Task 2: Mode 1, Mode 2. Task 3. Task 4: Mode 1, Mode 3, Mode 4, Mode 7.]
Usual Approach: Conventional Processors with Software-to-Task Optimization (Compilers + OS)
Software-to-task optimization allows the use of conventional computing platforms with fixed architecture (superscalar, VLIW, etc.) coupled with software compilers and an OS.
Limitations of conventional processors
1. If tasks are executed on a sequential computing system, processing time often cannot meet the specification requirements.
2. If tasks are executed on a parallel computing system with fixed architecture, the cost-effectiveness of the parallel computer depends strongly on the task algorithm or data structure.
Alternative Approach: Application Specific Processors (ASP) with Static Hardware-to-Task Optimization
An ASP can reach the required cost-performance parameters because the ASP architecture is optimized for the data-flow graph of the task and the task data structure.
Limitations of Application Specific Processors
1. Performance decreases if the task algorithm or data structure changes.
2. Limited possibility for further modernization.
3. High cost for multi-task or multi-mode custom computing systems.
Proposed Approach: Reconfigurable Processor with Dynamic Architecture-to-Task Optimization
A high-performance computing system for multi-task data-flow applications should contain two major components:
1. A Dynamically Re-configurable Computing Platform based on partially reconfigurable FPGA devices, to provide the maximum possible hardware flexibility.
2. A library of Application Specific Virtual Processors (ASVP): configuration bit-streams that program the on-chip Application Specific Processor's circuitry for the period of time while the application (task) is active.
Architecture of Partially Reconfigurable FPGA devices (Xilinx "Virtex" Family)
[Figure: FPGA layout. Labels: I/O Frames, CLBs Frame #1 ... #i ... #N, Block RAM, Internal (Virtual) Bus, In, Out, Internal Configuration SRAM, Configuration Data Files.]
CLB (Configurable Logic Block): uniform logic element of a Frame, the smallest individually configurable component in the FPGA.
Concept of Application Specific Virtual Processor (ASVP)
An Application Specific Virtual Processor (ASVP) is a group of logic resources dedicated and optimally configured to reflect the algorithm and data structure of the task.
An ASVP is represented as a configuration data file (configuration bit-stream) that is downloaded into the FPGA when the task is to be activated.
Life-cycle of Application Specific Virtual Processor
1. The ASVP core is downloaded to the reconfigurable platform before task activation.
2. The ASVP performs the task's data processing for as long as necessary, without interruption and without time-sharing its dedicated logic resources with any other task.
3. After task completion, all resources included in the ASVP can be re-configured for any other task.
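The three life-cycle steps can be sketched as a toy slot allocator in Python. Everything here is hypothetical: the class `ReconfigurablePlatform`, its method names, and the slot counts are illustrative only, and the slides do not define a software interface. A real system would write configuration bit-streams into FPGA frames rather than update a list.

```python
class ReconfigurablePlatform:
    """Toy model of a partially reconfigurable FPGA divided into slots (hypothetical)."""

    def __init__(self, num_slots):
        # None marks a free slot; otherwise the slot holds the owning task's name.
        self.slots = [None] * num_slots

    def load_asvp(self, task, slots_needed):
        """Step 1: configure free slots for the task before it is activated."""
        free = [i for i, owner in enumerate(self.slots) if owner is None]
        if len(free) < slots_needed:
            raise RuntimeError(f"not enough free slots for {task}")
        chosen = free[:slots_needed]
        for i in chosen:
            self.slots[i] = task
        return chosen

    def owner_of(self, slot):
        """Step 2: a slot serves exactly one task for as long as that task runs."""
        return self.slots[slot]

    def release_asvp(self, task):
        """Step 3: after completion, the task's slots become available again."""
        self.slots = [None if owner == task else owner for owner in self.slots]


platform = ReconfigurablePlatform(num_slots=6)
slots_t1 = platform.load_asvp("task1", 2)
slots_t2 = platform.load_asvp("task2", 3)
platform.release_asvp("task1")             # task2 keeps its slots untouched
slots_t3 = platform.load_asvp("task3", 2)  # reuses the slots freed by task1
print(slots_t1, slots_t2, slots_t3)
```

In this sketch, releasing one task never touches the slots of another, which mirrors the requirement that active tasks are not interrupted when a neighbouring task terminates.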
ASVP Architecture-to-Task Optimization in Partially Reconfigurable FPGA
[Figure: a data-flow graph (Data In, XOR, +, Data Out) and its mapping onto the FPGA. Labels: Input, Output, XOR, +, FPGA Slots 1 2 3 ..., Internal (Virtual) Bus.]
Micro-architecture of a Virtual Hardware Component
[Figure: Virtual Hardware Component (VHC) for XOR. Labels: Processing Element (PEi), Interface Element (IEj), local routing, tri-state buffers, global routing lines, Virtual Bus, Virtual Hardware Component boundary, Virtual Hardware Component & Virtual Bus interconnection.]
Micro-architecture of Application Specific Virtual Processor (ASVP)
[Figure: ASVP built from VHC i, VHC n and VHC k, with {MOi} and IE i, {MOn} and IE n, {MOk} and IE k. Labels: I/O block, Xi Yi ... Xn Yn, Result, Virtual Bus lines #i, #i+1, #i+2, #n, #n+1, #n+2, #k.]
The micro-architecture of an ASVP is based on Virtual Hardware Components interconnected via Virtual Bus lines.
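As a rough illustration of components joined only through bus lines, the sketch below models each Virtual Hardware Component as an operation that reads some bus lines and drives one bus line. The names `VirtualHardwareComponent` and `run_asvp`, the bus-line numbering, and the example graph (x XOR y) + z are assumptions made for this example and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VirtualHardwareComponent:
    """One VHC: a processing element plus its interface to bus lines (hypothetical model)."""
    name: str
    operation: Callable[..., int]  # stands in for the processing element
    input_lines: List[int]         # bus lines the interface element reads
    output_line: int               # bus line the interface element drives


def run_asvp(vhcs: List[VirtualHardwareComponent], bus: Dict[int, int]) -> Dict[int, int]:
    """Evaluate the VHCs in data-flow order; each one drives exactly one bus line."""
    for vhc in vhcs:
        operands = [bus[line] for line in vhc.input_lines]
        bus[vhc.output_line] = vhc.operation(*operands)
    return bus


# Assumed data-flow graph: (x XOR y) + z, mapped onto two VHCs.
asvp = [
    VirtualHardwareComponent("XOR", lambda a, b: a ^ b, input_lines=[0, 1], output_line=3),
    VirtualHardwareComponent("ADD", lambda a, b: a + b, input_lines=[3, 2], output_line=4),
]
bus = run_asvp(asvp, {0: 0b1100, 1: 0b1010, 2: 1})
print(bus[4])
```

Because the components only name bus lines and never each other, swapping one component for another leaves the rest of the list unchanged in this model.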
Parallel Task Processing on the Dynamically Re-configurable Stream Processor (DRSP)
[Figure: ASVP 1 (for Task 1), ASVP 2 and ASVP 3 on the Virtual Bus. Labels: FU 1 to FU 4, I/O 1 to I/O 4, RIM 1 to RIM 4, Data in #1 and #2, Data out #1, #2 and #3.]
DRSP: System Level Architecture
[Figure: system-level block diagram. Labels: Host PC; Task Memory (Task 1: {Afix + Amodes} ... Task h: {Afix + Amodes}); PCI Bus; PCI Interface Module; PRCP-base; RT-HOS; Reconfigurable Functional Unit (Afix i + ...); Data Stream Source; Data Out; Configuration & Data Bus; Cache Memory {Amodes i}.]
Architecture of Reconfigurable Computing Module
Reconfigurable Computing Module based on the Xilinx "Virtex-E" family of FPGA devices:
• 2 x 3.43 Gbit/s (12 bit x 300 MHz) input LVDS ports
• 2 x 3.43 Gbit/s (12 bit x 300 MHz) output LVDS ports
• 8.12 Gbit/s LVTTL bus (64 bit x 133 MHz)
• Reconfigurable Functional Unit [RFM 0111-002]
• Real-Time Hardware Operating System based on an XCV50E Virtex FPGA
• PCI interface, 800 Mbit/s
• SPI
• Configuration files / data cache (4 x 512 KB)
Restoration of ASVP using spare CLB-column
[Figure: FPGA columns before and after restoration. Labels: Input, Output, XOR, +, AP i, Column # 1 2 3 ..., Communication Field.]
If a hardware fault occurs, the damaged Virtual Hardware Component can be relocated to the reserved CLB-column.
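A minimal sketch of the relocation step, assuming the device is modelled as a mapping from column number to the component it holds, with `None` marking a reserved spare column. The function `relocate_vhc` and the `FAULTY` marker are hypothetical names; fault detection and the actual bit-stream rewrite are outside the scope of this example.

```python
FAULTY = "FAULTY"  # marker for a column taken out of service (hypothetical)


def relocate_vhc(columns, faulty_column):
    """Move the component in a faulty column to the first spare column.

    `columns` maps column number -> component name, or None for a spare column.
    Returns the column that now holds the component.
    """
    vhc = columns[faulty_column]
    if vhc is None or vhc == FAULTY:
        raise ValueError("no virtual hardware component in that column")
    spares = [c for c in sorted(columns) if columns[c] is None]
    if not spares:
        raise RuntimeError("no spare CLB-column left")
    target = spares[0]
    columns[target] = vhc              # reconfigure the spare column
    columns[faulty_column] = FAULTY    # never reuse the damaged column
    return target


columns = {1: "XOR", 2: "ADD", 3: None}
new_column = relocate_vhc(columns, faulty_column=1)
print(new_column, columns)
```

Only the damaged component moves in this model; the component in column 2 keeps its place and its configuration.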
When is the proposed technology most beneficial?
• The workload consists of many tasks, where each task can run in different modes.
• Each task requires high-speed data-stream processing.
• Task algorithms may be modified within the life cycle of the system.
• Active tasks must run in parallel and must not be interrupted when one of the tasks switches its mode or terminates.
• The system must be restorable, either remotely or by itself, when a hardware fault occurs.
DRSP Application for Networked Intelligent Manufacturing Systems
High-performance parallel data-stream processing (up to thousands of billions of operations per second) of large volumes of data (up to hundreds of gigabits) for:
a) Complex image processing and image recognition,
b) Spectrum analysis and digital signal processing,
c) Data transmission via LAN with data compression / decompression and encryption / decryption,
d) Control of high performance manufacturing equipment and robotic systems.
Acceleration of Task / Mode Switching
[Figure: acceleration (vertical axis, 0 to 25) versus number of CLB-slots in the Virtual Component (horizontal axis, 1 to 10).]
Compared with a system that reconfigures the entire FPGA, task or mode switching is accelerated most when the number of CLB-columns in the ASVP is minimal; switching can then be more than 20 times faster.
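One simple way to reason about this trend is to assume that reconfiguration time is proportional to the number of CLB-columns written, so that the speed-up over full reconfiguration is the ratio of total columns to columns in the component. Both the proportionality and the device size of 24 columns used below are assumptions made for this sketch; the slides give only the chart.

```python
def switching_acceleration(total_columns, columns_in_component):
    """Speed-up of partial over full reconfiguration, assuming time ~ columns written."""
    if not 1 <= columns_in_component <= total_columns:
        raise ValueError("component must occupy between 1 and total_columns columns")
    return total_columns / columns_in_component


TOTAL_COLUMNS = 24  # assumed device size, not taken from the slides

for k in range(1, 11):
    print(k, round(switching_acceleration(TOTAL_COLUMNS, k), 1))
```

Under these assumptions the speed-up falls off as 1/k, so the largest gain comes from single-column components.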
Minimization of Hardware Resources
Minimization of logic resources in the DRSP approach compared with entire FPGA-based systems:

Tasks \ Modes     2      4      8     16
4               2.8    4.4    7.6     14
8               5.6    8.8   15.2     28
16             11.2   17.6   30.4     56

As the number of tasks and task modes in a workload increases, the cost-effectiveness of DRSP increases accordingly.
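The tabulated values happen to match the closed form 0.1 x T x (3 + 2M), where T is the number that starts each row and M is the value in the header row 2 4 8 16. This is only an observation about the numbers: the slides give no formula, the meaning of the coefficients is not stated, and reading rows as task counts and columns as mode counts is itself an assumption. A short check:

```python
def table_value(tasks, modes):
    """Empirical fit to the slide's table; not a formula given by the authors."""
    return tasks * (3 + 2 * modes) / 10


# Values copied from the table; keys are the row labels.
table = {
    4: [2.8, 4.4, 7.6, 14],
    8: [5.6, 8.8, 15.2, 28],
    16: [11.2, 17.6, 30.4, 56],
}
mode_counts = [2, 4, 8, 16]

# Largest absolute difference between the fit and the tabulated values.
max_error = max(
    abs(table_value(tasks, modes) - expected)
    for tasks, row in table.items()
    for modes, expected in zip(mode_counts, row)
)
print(max_error < 1e-9)
```

The fit is linear in the row label and affine in the column label, which is consistent with the statement that the benefit grows with both the number of tasks and the number of modes.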
SUMMARY: DRSP Compared with Conventional CPU, DSP or ASP Platforms

DRSP          Conv. CPU               DSP                     ASP
Performance   Much lower than DRSP    Much lower than DRSP    Somewhat higher
Flexibility   Lower than DRSP         Much lower than DRSP    None, or very little
Reliability   Much lower than DRSP    Lower than DRSP         Lower than DRSP
Thank you