Optimization of Parallel Task Execution on the Adaptive Reconfigurable Group Organized Computing System

Page 1

Optimization of Parallel Task Execution on the Adaptive Reconfigurable Group Organized Computing System

Presenter: Lev Kirischian

Department of Electrical and Computer Engineering

RYERSON Polytechnic University

Toronto, Ontario, CANADA

Page 2

Digital signal processing (DSP);
High performance control & data acquisition;
Digital communication and broadcasting;
Cryptography and data security;
Process modeling and simulation.

Application of parallel computing systems for data-flow tasks

Page 3

[Figure: data-flow graph with macro-operators MO 1 - MO 6 connected between Data In and Data Out]

Presentation of a data-flow task in the form of a data-flow graph

MO 1 - MO n - macro-operators, e.g. digital filtering, FFT, matrix scaling, etc.
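To make the graph representation concrete, here is a minimal Python sketch of a task described as a graph of macro-operators. The edge set is illustrative only; the slide's figure does not spell out the exact connections between MO 1 - MO 6.

```python
# Minimal sketch: a data-flow task as a graph of macro-operators.
# The edges are illustrative only; the figure does not specify the
# exact connections between MO 1 - MO 6.
dataflow_task = {
    "MO1": ["MO2", "MO3"],   # Data In enters at MO1
    "MO2": ["MO5"],
    "MO3": ["MO4"],
    "MO4": ["MO6"],
    "MO5": ["MO6"],
    "MO6": [],               # Data Out leaves from MO6
}

def topological_order(graph):
    """Order macro-operators so that every producer precedes its consumers."""
    seen, order = set(), []
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for succ in graph[node]:
            visit(succ)
        order.append(node)
    for node in graph:
        visit(node)
    return list(reversed(order))

print(topological_order(dataflow_task))   # e.g. ['MO1', 'MO3', 'MO4', 'MO2', 'MO5', 'MO6']
```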

Page 4

If the data-flow task is processed on a conventional SISD architecture, the processing time often cannot satisfy the specification requirements;

If the task is processed on SIMD or MIMD architectures, the cost-effectiveness of these parallel computers strongly depends on the task algorithm or data structure.

One possible solution to reach the required cost-performance is to develop a custom computing system whose architecture "covers" the data-flow graph of the task.

Correspondence between task and computing system architecture

Page 5

1. Decrease of performance if the task algorithm or data structure changes;

2. No possibility for further modernization;

3. High cost for multi-task or multi-mode custom computing systems.

Limitations of custom computing systems with a fixed architecture

Page 6

One possible solution – reconfigurable parallel computing systems

1. Ability for custom configuration of each processing (functional) unit for a specific macro-operator;

2. Ability for custom configuration of information links between functional units.

The above features allow hardware customization for any data-flow graph and reconfiguration when task processing is completed.

Page 7

[Figure: a data stream processed by functional units FU 1 - FU 6, each configured for the corresponding macro-operator MO 1 - MO 6 of the data-flow graph]

Example of an FPGA-based system with the architecture configured for the data-flow task

Page 8

Concept of the Group Processor in the reconfigurable computing system

• Group Processor (GP) – a group of computing resources dedicated to the task and configured to reflect the task requirements.

Page 9

Group Processor life-cycle

1. In the GP, links and functional units are configured before task processing;

2. The GP performs the task as long as it is necessary, without interruption or time sharing with any other task;

3. After task completion, all resources included in the GP can be reconfigured for any other task.
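A minimal Python sketch of this life-cycle; the class and method names (configure, run, release) are hypothetical and only mirror the three steps above, not an ARGO API.

```python
# Sketch of the Group Processor life-cycle; names are hypothetical.
class GroupProcessor:
    def __init__(self, resource_pool):
        self.pool = resource_pool     # free functional units of the system
        self.resources = []

    def configure(self, task):
        """Step 1: claim resources and configure links / functional units
        for the task before processing starts."""
        self.resources = [self.pool.pop() for _ in range(task.units_needed)]

    def run(self, task, data_stream):
        """Step 2: process the data stream for as long as necessary,
        without interruption or time sharing with any other task."""
        for block in data_stream:
            task.process(block, self.resources)

    def release(self):
        """Step 3: return all resources so they can be reconfigured
        for any other task."""
        self.pool.extend(self.resources)
        self.resources = []
```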

Page 10

The concept of the Reconfigurable Group Organized computing system

[Figure: a Host PC connected via the Configuration Bus and a Virtual Bus to Functional Units (FUs), each attached through a Reconfigurable Interface Module (RIM); the data stream enters and leaves through I/O ports on the Input / Output data bus]

Page 11

[Figure: functional units FU 1 - FU 4 on the Virtual Bus partitioned into Group Processors GP 1 (for Task 1), GP 2 and GP 3, each with its own data-in / data-out I/O ports (#1 - #3)]

Parallel processing of different tasks on separate Group Processors

Page 12

[Figure: timeline T0 - T1 - T2 showing the GP built from data-in, memory, multiplier, adder and filter units]

Concept of adaptation of the Group Processor architecture to the task

Architecture-to-task adaptation for the GP = selection of the resource configuration which:
• satisfies all requirements for task processing (e.g. performance, data throughput, reliability, etc.);
• requires minimal hardware (i.e. logic gates).
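Stated as a small optimization problem, the rule above keeps only the configurations that meet the task requirements and takes the one with the fewest logic gates. The sketch below uses hypothetical candidate records (names, times and gate counts are made up for illustration).

```python
# Sketch of the adaptation rule above: among feasible configurations,
# take the one with minimal hardware.  Candidate records are hypothetical.
candidates = [
    {"name": "config_A", "time_ns": 40,  "gates": 5200},
    {"name": "config_B", "time_ns": 60,  "gates": 3100},
    {"name": "config_C", "time_ns": 120, "gates": 1800},
]

def adapt(candidates, max_time_ns):
    feasible = [c for c in candidates if c["time_ns"] <= max_time_ns]
    return min(feasible, key=lambda c: c["gates"]) if feasible else None

print(adapt(candidates, max_time_ns=80))   # -> config_B (meets 80 ns with fewer gates than config_A)
```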

Page 13

Virtual Hardware Objects - the resource base of the reconfigurable computing system

• For FPGA-based systems, all architecture components (resources) can be presented as Virtual Hardware Objects (VHOs) described in one of the hardware description languages (for example VHDL or AHDL).

• Each resource can be presented in different variants Ri,j, where i indicates the type of resource (adder, multiplier, interface module, etc.) and j indicates the variant of the resource presentation in the architecture (for example: 8-bit adder, 16-bit adder, etc.).
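A minimal sketch of how such a library of variants could be indexed as R(i, j); the entries and file names below are hypothetical examples, not the actual VHO library.

```python
# Sketch: Virtual Hardware Objects indexed by resource type i and variant j.
# Entries are hypothetical; real VHOs would reference VHDL/AHDL descriptions.
vho_library = {
    "adder": {                     # resource type i = adder
        1: {"width_bits": 8,  "hdl": "adder8.vhd"},
        2: {"width_bits": 16, "hdl": "adder16.vhd"},
    },
    "multiplier": {                # resource type i = multiplier
        1: {"width_bits": 8,  "hdl": "mult8.vhd"},
        2: {"width_bits": 16, "hdl": "mult16.vhd"},
    },
}

r_adder_2 = vho_library["adder"][2]    # R(adder, 2): the 16-bit adder variant
print(r_adder_2["hdl"])                # -> adder16.vhd
```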

Page 14

Concept of Architecture Configuration Graph (ACG)

[Figure: Architecture Configuration Graph with numbered nodes 1 - 12 representing adder, multiplier and bus variants]

Page 15

Architecture Configuration Graph arrangement

Local arrangement of variants for each type of system resource

[Figure: adder variants ordered locally by processing time: 40 ns and 20 ns]

Architecture graph partial arrangement requires two procedures: 1. Local arrangement and 2. Hierarchic arrangement

Page 16

[Figure: hierarchical arrangement of six configurations built from multiplier variants (80 ns, 40 ns, 20 ns) and adder variants (40 ns, 20 ns); the resulting total processing times are 120, 100, 80, 60, 60 and 40 ns, shown for two different orderings of the resources]

Hierarchical arrangement of system resources

Arrangement criterion: K(Ri) = [Tmax(Ri) - Tmin(Ri)] / (mi - 1)

K(Mult) = (120 - 60) / (3 - 1) = 30  >  K(Adder) = (120 - 100) / (2 - 1) = 20
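A short sketch of the arrangement criterion, evaluated with the numbers on this slide (varying the multiplier spans total times from 120 ns to 60 ns over 3 variants; varying the adder spans 120 ns to 100 ns over 2 variants).

```python
# The arrangement criterion K(Ri) = (Tmax(Ri) - Tmin(Ri)) / (mi - 1),
# evaluated with the numbers from the slide; the resource with the larger
# K is placed at the higher level of the hierarchical arrangement.
def k_criterion(t_max_ns, t_min_ns, num_variants):
    return (t_max_ns - t_min_ns) / (num_variants - 1)

k_mult  = k_criterion(120, 60, 3)    # multiplier: 3 variants
k_adder = k_criterion(120, 100, 2)   # adder: 2 variants
print(k_mult, k_adder)               # -> 30.0 20.0, so the multiplier is arranged first
```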

Page 17

[Figure: arranged ACG with multiplier variants (80 ns, 40 ns, 20 ns) and adder variants (40 ns, 20 ns) forming six configurations with total processing times 120, 100, 80, 60, 60 and 40 ns]

Selection of Group Processor architecture based on the arranged ACG

Required processing time for the task Y = A*X + B is T < 80 ns

Required performance

GP-architecture = Multiplier (#2) + Adder (#1)
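A hedged sketch of this selection step, using the arranged configuration times from the slide (120, 100, 80, 60, 60, 40 ns). It assumes the configurations are ordered from slowest (presumably cheapest) to fastest, so the first one meeting the deadline can be found with a binary search; the log2 term in the experiment count on the next slide hints at such a search, but the exact selector procedure is not shown here.

```python
# Sketch: pick a GP architecture from the arranged ACG for Y = A*X + B
# with the requirement T < 80 ns.  The times (ns) of the six arranged
# configurations are taken from the slide.
times_ns = [120, 100, 80, 60, 60, 40]        # arranged configurations 1..6

def first_meeting_deadline(times_ns, deadline_ns):
    """Leftmost configuration with time strictly below the deadline,
    assuming the times are non-increasing."""
    lo, hi = 0, len(times_ns)                # search interval [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if times_ns[mid] < deadline_ns:
            hi = mid                         # mid already satisfies: look further left
        else:
            lo = mid + 1                     # mid too slow: discard the left half
    return lo

idx = first_meeting_deadline(times_ns, 80)
print(idx + 1, times_ns[idx])                # -> configuration 4 at 60 ns
# The slide identifies the selected architecture as Multiplier (#2) + Adder (#1).
```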

Page 18

Number of experiments for GP-architecture selection

N(GPopt) = (n + 1) + log2(m1 * m2 * ... * mn)

n - number of resources (VHOs) included in the architecture of the Group Processor

mi - number of variants of each type of resource

Example: if n = 16 and m1 = m2 = … = mn = 32, the total number of experiments (task runs on the estimated GP-architecture) is N(GPopt) = 16 + 1 + 16 * 5 = 97
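A quick check of this formula in Python, reproducing the worked example.

```python
import math

# N(GPopt) = (n + 1) + log2(m1 * m2 * ... * mn), with the slide's example:
# n = 16 resources, each with mi = 32 variants.
def experiments(variant_counts):
    n = len(variant_counts)                          # number of resources (VHOs)
    return (n + 1) + math.log2(math.prod(variant_counts))

print(experiments([32] * 16))   # -> 97.0, matching 16 + 1 + 16*5 = 97
```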

Page 19

Self-adaptation mechanism for FPGA-based reconfigurable data-flow computing systems

[Figure: block diagram of the self-adaptation mechanism: Data Source, Reconfigurable platform, Performance Analyzer, and a Host PC containing the Architecture Generator, the Architecture Selector and the Library of Virtual Hardware Objects, linked by the Configuration Bus]
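A minimal sketch of the control loop implied by the block diagram. The callbacks stand in for the Architecture Selector, the Configuration Bus loading step and the Performance Analyzer; all names and the toy timing numbers are hypothetical.

```python
# Sketch of the self-adaptation loop implied by the block diagram.
# The callbacks stand in for the real blocks; names are hypothetical.
def self_adapt(candidates, configure_platform, measure_performance, requirement_ns):
    """Try candidate GP configurations until one meets the timing requirement."""
    for config in candidates:                        # architecture selector role
        configure_platform(config)                   # load over the Configuration Bus
        if measure_performance(config) <= requirement_ns:   # performance analyzer role
            return config
    raise RuntimeError("no configuration meets the requirement")

# Toy usage with made-up numbers:
configs = ["cfg_small", "cfg_medium", "cfg_large"]
fake_times_ns = {"cfg_small": 95.0, "cfg_medium": 70.0, "cfg_large": 45.0}
chosen = self_adapt(
    configs,
    configure_platform=lambda c: None,               # stub: would reconfigure the FPGA
    measure_performance=lambda c: fake_times_ns[c],
    requirement_ns=80.0,
)
print(chosen)   # -> cfg_medium
```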

Page 20

First prototype of the Adaptive Reconfigurable Group Organized (ARGO) computing platform

Page 21

[Figure: processing blocks: input MPEG 2 data stream, synchro-signal detection, PCR detection, null-packet analysis & removal, output frequency adjustment, PCR re-stamping, reference frequency, output MPEG 2 data stream]

Data Flow Graph for DVB MPEG 2 processing

Page 22

Mode #   Packet length (bits)   Output frequency (Mbit/s)   Architecture selection time (ms)
1        188                    28                          115.2
2        188                    30                          162.8
3        188                    34                          205.9
4        204                    28                          129.2
5        204                    30                          187.2
6        204                    34                          253.1

Architecture selection time for the 6-mode DVB MPEG 2 stream processor

1. Average time for each architecture configuration - 7.18 ms
2. Average time for GP-architecture selection (for the specific mode) - 175.6 ms
3. Total time for architecture selections for all modes - 1.054 s
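As a quick sanity check, the per-mode selection times in the table above are consistent with the average and total quoted here, up to rounding.

```python
# Per-mode architecture selection times (ms) from the table above.
mode_times_ms = [115.2, 162.8, 205.9, 129.2, 187.2, 253.1]

total_ms = sum(mode_times_ms)
print(total_ms)                        # -> about 1053.4 ms (the slide quotes 1.054 s)
print(total_ms / len(mode_times_ms))   # -> about 175.6 ms per mode, as quoted
```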

Page 23

[Figure: the DVB MPEG 2 data-flow graph above mapped onto two functional units: FU #1 with an 8-bit input port and FU #2 with the output port, connected by a 16-line virtual bus]

Hardware implementation of DVB MPEG 2 stream processor for modes 1 and 4

Page 24

[Figure: the same data-flow graph mapped onto three functional units: FU #1 with an 8-bit input port, FU #2, and FU #3 with the output port, connected by a 16-line virtual bus]

Hardware implementation of DVB MPEG 2 stream processor for modes 2, 3, 5 and 6

Page 25

Summary

1. The Adaptive Reconfigurable Group Organized (ARGO) parallel computing system is an FPGA-based configurable system with the ability to adapt to the task algorithm / data structure.

2. The ARGO system allows parallel processing of different data-flow tasks on dynamically configured Group Processors (GPs), where each GP architecture configuration corresponds to the algorithm / data specifics of the task assigned to that processor.

3. The above principles allow the development of cost-effective parallel computing systems with programmable performance and reliability, at minimum cost of hardware components and development time.