
Contract no. 248424

FP7 STREP Project

MADNESS

Methods for predictAble Design of heterogeNeous Embedded System with adaptivity and reliability Support

D6.6: Prototype integrating the system level design framework for adaptive MPSoC

Due Date of Deliverable 31st December, 2012

Completion Date of Deliverable 31st December, 2012

Start Date of Project 1st January, 2010 - Duration 36 Months

Lead partner for Deliverable UL

Revision: v1.0

Project co-funded by the European Commission within the 7th Framework Programme (2007-2013)

Dissemination Level

PU Public X

PP Restricted to other programme participants (including Commission Services)

RE Restricted to a group specified by the consortium (including Commission Services)

CO Confidential, only for members of the consortium (including Commission Services)


PUBLIC

Contents

1 Introduction

2 Hardware platform

2.1 Programming model

2.2 Message Passing support

2.3 Interrupt generation support

2.4 Fault tolerance support

2.4.1 Task migration hardware module

2.5 Prototype platform template

3 Middleware layer

3.1 PPN communication API

3.2 Process migration mechanism

4 DSE-based run-time remapping policies

4.1 Generation of alternative application mappings

4.2 Selection of application mapping at run-time

4.3 Example of process migration scenario

5 Attachment

MADNESS Deliverable-6.6-v1.0 Page 1 of 14


1. Introduction

This document describes the prototype that was developed to integrate the outcome of WP5 and

WP6 with the rest of the MADNESS framework. In particular, among all the techniques developed

in WP5 and WP6, we chose the ones that best suit the considered application (H.264 decoder) and

the hardware platform features (Virtex-6 FPGA board).

The prototype can be logically divided into three components:

1. the hardware platform (see Section 2);

2. the middleware infrastructure, which allows dynamic run-time remapping of processes (Section 3);

3. the DSE techniques, which make remapping decisions at run-time, given a fault scenario (Section 4).

A complete version of the developed prototype, including hardware and software/middleware sources,

is available online at http://www.madnessproject.org/open-source/dl/adaptivity_prototype.tar.gz.


2. Hardware platform

The prototype hardware platform is actually an instance of the MADNESS reference platform

described in greater detail in Deliverables D3.1 and D6.1. The MADNESS reference platform is a

distributed-memory tile-based template, in which tiles are interconnected through a Network-on-Chip

(NoC). All the tiles in the prototype are homogeneous and comprise a MicroBlaze processor as CPU.

The communication network is built using an extended version of the ×pipes-lite library of

synthesizable components [1]. The topology can be completely arbitrary, since it includes a fabric of

routers and links that can be almost entirely customized. Network access points are Network Interfaces

(NIs), which are in charge of constructing the packets on the basis of the communication transactions

requested by the cores. NIs, placed at the interface between processing elements and the communication

network, have been extended with support for the message-passing communication model. For instance,

a programmable message manager with DMA capabilities is integrated within the NI inside a module

called Network Adapter.

As described in Deliverable D6.3, some modifications at the hardware level have been implemented to

ease the execution of PPN applications on our platform and to perform process migration in a reactive,

efficient way. These modifications are summarized in the following Sections (2.1-2.4).

Figure 2.1: A general overview of an example template instance

2.1 Programming model

Reference primitives implementing message-passing communication are built, according to the general

definition of such a model, upon two base functions: send() and receive(). These two primitives are

implemented in C, and interact with the hardware structures described in Section 2.2. According to

the usual message-passing signatures, to send a message with a send(), the programmer has to specify

the address (SendAddress hereafter) inside the private memory that contains the information to be sent


(message data), a tag assigned to the message (SendTag), the size of the transfer (SendDim), and the

ID of the destination processor (or process, in case of multi-context execution in the processing elements

- SendID). The receive() parameters are the tag of the expected message (ReceiveTag), the sender ID

(ReceiveID) and the address where the received message data has to be stored (ReceiveAddress). Two

implementations of the receive() are provided, with blocking and non-blocking behavior.
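
The difference between the two receive() variants can be illustrated with a small model. The sketch below is in Python purely for readability; the actual primitives are written in C. The function names, the `Pending` store and the `wait_step` callback are assumptions made for this illustration, and memory addresses and transfer sizes (SendAddress, SendDim, ReceiveAddress) are replaced by Python `bytes` objects.

```python
from typing import Callable, Dict, Optional, Tuple

# Messages already delivered to a tile, keyed by (sender ID, tag).
# This dictionary is a stand-in for the hardware described in Section 2.2.
Pending = Dict[Tuple[int, int], bytes]


def send(dest_pending: Pending, data: bytes, send_tag: int, sender_id: int) -> None:
    """Deliver `data` to the destination tile's pending store."""
    dest_pending[(sender_id, send_tag)] = data


def receive_nonblocking(pending: Pending, receive_tag: int,
                        receive_id: int) -> Optional[bytes]:
    """Return the matching message, or None if it has not arrived yet."""
    return pending.pop((receive_id, receive_tag), None)


def receive_blocking(pending: Pending, receive_tag: int, receive_id: int,
                     wait_step: Callable[[], None]) -> bytes:
    """Poll until the matching message arrives.

    `wait_step` is a hypothetical hook that lets the rest of the system make
    progress between two polls (in this model, it may call send()).
    """
    while True:
        message = receive_nonblocking(pending, receive_tag, receive_id)
        if message is not None:
            return message
        wait_step()
```

In this model the non-blocking variant returns immediately with `None` when no message with the expected tag and sender is available, whereas the blocking variant only returns once such a message has been delivered.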

2.2 Message Passing support

The Network Adapter architecture is depicted in Fig. 2.1 (left side). Both the instruction and data

private memories of the processor have two access ports, in order to allow the processor to keep on

accessing code and data from one instruction and one data port, while, at the same time, the other ports

can be used to directly load/store data from/to the memory in case of message send/receive. In this

way, communication and computation can overlap, potentially leading to a significant speed-up. The

NA integrates a local bus that, according to the address requested by the processor interface, enables

access to:

- the private memory,

- a module called DMA message-passing handler (MPH),

- a set of performance counters to obtain statistics about the application execution.

In the figure, the gray part represents the additional circuitry supporting fault tolerance, which is

described in Section 2.4.1. The MPH embeds a set of memory-mapped registers that are programmed

by the processor, to control send and receive operations, setting the previously described parameters.

It also includes an address generator in charge of generating the addresses when the private memories

must be accessed from the port reserved for message passing.

When the processor wants to call a send(), the code that implements the primitive stores the required

values into the send-related memory-mapped registers. As soon as the registers are programmed, the

address generator starts to load SendDim words from the memory, starting from address SendAddress, and

propagates them to the NI. The destination address requested for the network transaction is obtained

by the address generator according to the content of SendID, translating the destination process ID into

the network address of the destination processor private memory.

At the other end of the communication, the processor needs to execute a receive() to complete the

transaction. It may happen that the receive() has not been called at the moment the packets composing

the message actually arrive at the destination network node. In this case the message data is stored in

the memory, inside a (configurable) memory buffer reserved for such a purpose. The identification fields

related to the incoming message (sender, tag, buffer address) are stored inside an event file, in order

to enable the receive() primitive to retrieve the message from the memory when it is eventually

executed. The receive() microcode, as a first step, stores the parameters inside three memory-mapped

registers. Once such registers are programmed, the processor must keep accessing the DMA, scanning

the event file locations, to check if the message under reception is already inside the buffer. In the case

of a match, the processor copies the message data from the buffer to the ReceiveAddress. If the message

is not found in the event file, the processor keeps polling the DMA handler, where dedicated circuitry

is in charge of comparing the incoming messages with the contents of the three registers. In case of a

match, the message data is stored in memory, directly at the location identified by ReceiveAddress.

In order to allow partial buffer de-fragmentation, the buffer is treated as a list.
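
The two receive paths described above (message arrives before receive() is called, or after) can be summarised in a behavioural model. This is a simplified Python sketch, not the hardware or the C implementation: the class name, the method names and the bump-style buffer allocation are assumptions, and the list-based buffer management used for de-fragmentation is not modelled.

```python
class MessagePassingHandler:
    """Toy model of the receive side of the MPH for a single tile."""

    def __init__(self, memory_words=64, buffer_base=32):
        self.memory = [0] * memory_words   # private data memory, in words
        self.next_free = buffer_base       # start of the reserved buffer
        self.event_file = []               # (sender, tag, buffer_addr, length)
        self.armed = None                  # (tag, sender, receive_address)

    def on_message(self, sender, tag, words):
        """Called when a complete message arrives from the network."""
        if self.armed is not None and self.armed[:2] == (tag, sender):
            # A receive() is already waiting: store directly at ReceiveAddress.
            addr = self.armed[2]
            self.memory[addr:addr + len(words)] = words
            self.armed = None
            return "direct"
        # No matching receive() yet: park the data and log it in the event file.
        addr = self.next_free
        self.memory[addr:addr + len(words)] = words
        self.next_free += len(words)
        self.event_file.append((sender, tag, addr, len(words)))
        return "buffered"

    def receive(self, tag, sender, receive_address):
        """Program the three receive registers, then scan the event file."""
        for index, (s, t, addr, length) in enumerate(self.event_file):
            if (s, t) == (sender, tag):
                # Early message: copy it from the buffer to ReceiveAddress.
                self.memory[receive_address:receive_address + length] = \
                    self.memory[addr:addr + length]
                del self.event_file[index]
                return True
        # Not there yet: leave the registers armed for the matching circuitry.
        self.armed = (tag, sender, receive_address)
        return False
```

A `receive()` that returns `False` corresponds to the polling case: the registers stay programmed, and a later matching `on_message()` writes the data straight to ReceiveAddress without passing through the buffer.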


2.3 Interrupt generation support

A tag decoder has been instantiated inside the Network Adapter. It is in charge of detecting a set of

pre-determined tag configurations that are reserved for remote interrupt generation. On

a match, the tag decoder triggers an interrupt signal that is connected to the processor interrupt

controller. This feature can be used to allow a processor in the system to generate an asynchronous event

on another processor, such as, for example, the initiation of the migration process.
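
The behaviour of the tag decoder can be sketched as follows. This Python model is illustrative only: the reserved tag values and the `raise_interrupt` callback are hypothetical, since the actual tag configurations are not listed in this document.

```python
# Hypothetical reserved tag values; the real configurations are not given here.
INTERRUPT_TAGS = frozenset({0xF0, 0xF1})


class TagDecoder:
    """Toy model of the tag decoder inside the Network Adapter."""

    def __init__(self, reserved_tags, raise_interrupt):
        self.reserved_tags = frozenset(reserved_tags)
        # Stand-in for the line connected to the processor interrupt controller.
        self.raise_interrupt = raise_interrupt

    def observe(self, tag):
        """Inspect the tag of an incoming message; fire on a reserved tag."""
        if tag in self.reserved_tags:
            self.raise_interrupt(tag)
            return True
        return False
```

Ordinary message tags pass through untouched, so regular send()/receive() traffic and remote interrupt generation can share the same network path.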

2.4 Fault tolerance support

The MADNESS project focuses on the development of fault tolerance solutions which are not dependent

on a technology-related low-level fault model, but rather on technology-abstracting functional-level

error models. The implemented fault tolerance approaches focus on the detection of run-time faults and

on the use of reconfiguration strategies at different levels. In the MADNESS framework, three main

types of components are considered, i.e., processing cores, storage elements, and NoC modules. In the

final prototype we implemented only the solutions proposed for the case of faults on processing cores.

2.4.1 Task migration hardware module

Task migration can be used as a reconfiguration mechanism to let the system survive in the presence

of faulty processing cores. However, one fundamental restriction in such a scenario is that the faulty

processor cannot aid in carrying out the migration procedure. As a remedy to this problem, a task

migration hardware (TMH) module is proposed which is responsible for extracting the critical data from

the faulty tile.

As shown in Fig. 2.1, the TMH resides alongside the network adapter of each tile. Upon detection

of a fault, the TMH initiates the migration procedure with the following actions:

- the TMH isolates the faulty processing core;

- the TMH notifies the run-time manager (RM), which resides on a fault-free core, that a fault has

been detected.

The rest of the migration procedure is carried out by the RM as described in Section 3.

The main figure of merit adopted when designing this module was circuit complexity: the module is kept

simple so that its failure rate is much lower than that of the processing core. Moreover, the TMH and the

software-based task migration procedure are loosely coupled, so that later modifications to the

software-based task migration procedure do not affect the functionality of the TMH and require

minimal changes to the TMH, if any.

2.5 Prototype platform template

As depicted in the left part of Figure 2.2, the platform instance that was chosen for our final prototype

is a mesh of 2x3 tiles. In the first place, to choose the mesh size, we derived experimentally the size of

the instruction and data memory required to execute at least 3 nodes of the test-case application (H.264

decoder), to provide an adequate level of flexibility to the remapping decisions. This memory requirement

is 128 KB for each instruction and data memory of the tiles. Then, given this memory requirement and


the on-board memory limitation of the Virtex-6 FPGA board, we derived the maximum number of tiles

(6) which could fit into the FPGA.

Figure 2.2: Prototype hardware platform topology (left) and internal tile structure (right).


3. Middleware layer

Each tile of the platform described in Section 2 is endowed with the software stack depicted in

Figure 3.1. The application level resides at the top of the software stack. In MADNESS, applications

are specified using the Polyhedral Process Networks (PPNs) model of computation.

At the bottom of the software stack, the local operating system provides basic functionalities such as

process management (process creation/deletion, setting process priorities) and multitasking capabilities.

The middleware level of the software stack, highlighted in the left part of Figure 3.1, comprises the

three main components listed below:

1. PPN communication API: it provides a set of primitives which allow the execution of applications

modeled as PPNs on NoC-based MPSoC platforms. A brief explanation of the communication

API is given in Section 3.1.

2. Process migration mechanism: the migration mechanism described in detail in Deliverable D6.3

has been extended to emulate the behavior of the Task Migration Hardware; i.e., the steps which

have to be performed on the faulty tile have been reduced to the minimum, such that they can be

executed by a very simple piece of hardware external to the processor. Further details are provided

in Section 3.2.

3. Run-time manager: it runs on a single tile and makes decisions, at run-time, concerning the

adaptation of the system to changing resource availability. The remapping decisions are based on

the DSE techniques described in Section 4.

[Figure: left, the mesh of tiles (tile0 to tile5); right, the per-tile software stack, from top to bottom: Application(s) (PPN processes P1, P2, P3), Middleware (PPN communication, process migration, run-time manager), Local Operating System.]

Figure 3.1: Software stack.

3.1 PPN communication API

Based on the programming model described in Section 2.1, the PPN communication API provides

a set of primitives which allow the execution of applications modeled as PPNs on NoC-based MPSoC

platforms. In particular, this API must enforce the semantics of the PPN model of computation over

NoC implementations with no direct remote memory access, such as the one considered in MADNESS.


Figure 3.2: Producer-consumer inter-tile communication implementation

Several methods to implement the PPN communication over NoC-based MPSoCs are described in

[2], namely Virtual Connector, Virtual Connector with Variable Rate, and Request-driven. However, in

our final prototype we adopt the Request-driven communication approach as it leads to an easier

implementation of the migration mechanism due to the reduced number of synchronization points between

processes.

An example of PPN producer and consumer processes communicating over a NoC is shown in Fig. 3.2.

In the Request-driven approach, each FIFO buffer of the original PPN graph is split into two buffers,

one on the producer tile and one on the consumer tile. For instance, $B_1$ in the top part of Fig. 3.2 is

split into $B_1^P$ on tile1 and $B_1^C$ on tile2. The size of these buffers is set such that, for all channels $B_i$ in the

original graph, $B_i^P = B_i^C = B_i$. Moreover, the transfer of tokens from the producer tile to the consumer

tile is initiated by the consumer. This means that every time the consumer is blocked on a read at

a given FIFO channel, it sends a request to the producer to send new tokens for that channel. The

producer, after receiving this request, sends as many tokens as it has in its software FIFO implementing

that channel.
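
The Request-driven scheme can be sketched with two small classes, one per end of a split channel. This is a Python illustration under stated assumptions, not the middleware code: the class and method names are invented here, and the request and the token transfer are modelled as direct method calls rather than NoC messages.

```python
from collections import deque


class ProducerEnd:
    """Producer-side half of a split FIFO channel (B_i^P)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.fifo = deque()

    def write(self, token):
        """Append a token; return False when the producer would block."""
        if len(self.fifo) >= self.capacity:
            return False
        self.fifo.append(token)
        return True

    def on_request(self):
        """Serve a consumer request: ship every token currently buffered."""
        tokens = list(self.fifo)
        self.fifo.clear()
        return tokens


class ConsumerEnd:
    """Consumer-side half of the same channel (B_i^C)."""

    def __init__(self, capacity, producer):
        self.capacity = capacity
        self.fifo = deque()
        self.producer = producer
        self.requests_sent = 0

    def read(self):
        """Return the next token, or None while the consumer stays blocked."""
        if not self.fifo:
            # Blocked on a read: ask the producer for new tokens.
            self.requests_sent += 1
            self.fifo.extend(self.producer.on_request())
        if not self.fifo:
            return None
        return self.fifo.popleft()
```

Because a request is only issued when the consumer-side buffer is empty, and both halves have the same capacity, the tokens returned by the producer always fit in the consumer-side buffer.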

3.2 Process migration mechanism

A simple example of a process migration scenario is depicted in Fig. 3.3. The figure shows the tiles

directly involved in the process migration procedure, which are:

- the source tile, namely the tile which runs the process before the migration;

- the destination tile, which is the tile that will execute the process after the migration;

- the predecessor tiles, which run the predecessor processes of one of the processes mapped on the

source tile;

- the successor tiles, which execute the successor processes of one of the processes mapped on the

source tile.

The migration mechanism described in Deliverable D6.3 required a considerable number of actions from

the source tile. In the final implementation of our prototype, we decided to reduce as much as possible

the actions required from the source tile. In this way, all these actions can be performed by a simple

Task Migration Hardware (TMH), which is a much more realistic assumption in case of a permanent

fault of the main processor of a tile.


Figure 3.3: Migration scenario

The list of actions required to perform the migration procedure, in the newly devised approach, is

given below.

1. A “fault-triggering” message is sent to the tile in which the fault will be triggered. This emulates

the detection of a fault on the source tile.

2. The TMH of the faulty tile (in Figure 3.3 called source tile) is emulated by a dedicated interrupt

handler and sends a “fault-detected” control message to the Run-time Manager (RM) to notify

the detection of the fault.

3. The RM forwards the “fault-detected” message to all the other tiles, so that the whole system is aware

that a system adaptation is taking place.

4. All the predecessor tiles send a “flush-message” to notify that all the communications with the

source tile have been completed (i.e., there are no tokens which are still on their way on the NoC).

5. After receiving a “flush-message” from all the predecessor tiles, the TMH of the source tile sends

the state of its process(es) to the RM.

6. Meanwhile, the RM makes the necessary remapping decisions, based on the DSE techniques

described in Section 4.

7. The RM forwards the state of each process that has to be migrated to the correct destination tile.

8. The processes can re-start their execution on their destination tiles, after loading the migrated

state.


4. DSE-based run-time remapping policies

When one tile of the platform experiences a permanent fault, the Resource Manager (RM), i.e. the run-time manager introduced in Section 3, has to

make a remapping decision. In our prototype, the remapping decision is based on a set of alternative

Pareto-optimal mappings, which are determined using DSE techniques. In the following two sections,

we explain how this set of Pareto-optimal mappings is derived, and how the Resource Manager selects

the next mapping among this set.

4.1 Generation of alternative application mappings

To explore the design space for optimum design points, within MADNESS we adopt a search engine

that utilizes heuristic search techniques, such as multi-objective Genetic Algorithms (GAs). In particular,

the SESAME framework has been extended in order to simulate the PPN application execution on a

NoC communication infrastructure, according to the PPN communication API mentioned in Section 3.

In order to meet the requirements of our prototype, the design space search has to be driven to match

the limitations of the actual system implementation. For instance, due to memory limitations, not every

process can be mapped on every tile. A necessary condition for a process to be executed on a tile is

that a replica of the code of that process has to be loaded in the instruction memory of the tile. For

the prototype described in this document, we have chosen the H.264 decoder as a test application. Its

topology is shown in Figure 4.1. The actual allocation of replicas of the H.264 processes is shown in

Figure 4.2. The allocation of replicas restricts the mapping of a certain process only to a subset of tiles.

For instance, processes H0 and H1 can only be mapped to tile0 because there are no replicas on the other

tiles. By contrast, H2 can be mapped on several tiles (0,1,3,4,5).
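
The replica constraint amounts to a simple membership check on each process-to-tile assignment. The Python sketch below is illustrative: the function name is an assumption, and the table only contains the entries stated in the text above (H0, H1 and H2); processes without an entry are conservatively treated as not mappable.

```python
# Tiles holding a code replica of each process, as stated in the text.
# Entries for the remaining processes are deliberately left out.
REPLICAS = {
    "H0": {0},
    "H1": {0},
    "H2": {0, 1, 3, 4, 5},
}


def respects_replicas(mapping, replicas=REPLICAS):
    """Check that every process is mapped on a tile that holds its code."""
    return all(tile in replicas.get(process, set())
               for process, tile in mapping.items())
```

A design space search restricted in this way only ever produces mappings that can actually be loaded on the prototype.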

[Figure: process graph with nodes H0 to H5; node labels: get_data, parser, cavlc, idct, deblock, intrapred, printMB.]

Figure 4.1: (Simplified) topology of the H.264 application.

4.2 Selection of application mapping at run-time

Once the set of alternative Pareto-optimal mappings has been derived, it can be stored in a simple

data structure accessible by the Resource Manager. At run time, the Resource Manager is responsible

for selecting the best remapping (from mapping $M_i$, before fault $i$, to mapping $M_{i+1}$, after fault $i$) when a

tile experiences a permanent fault.


Figure 4.2: H.264 process replica allocation. The highlighted replicas correspond to the initial mapping.

The Resource Manager (RM in Figure 4.2) selects an appropriate mapping based on two criteria,

which are listed below.

1. A mapping has to be feasible with the currently available platform resources. After a series of faults,

only a subset of the alternative remappings will be feasible. In particular, only those mappings

which do not involve faulty tiles will be feasible.

2. Among the subset of feasible mappings, the Resource Manager selects the mapping which has the

best performance. If several mappings provide the same performance, the Resource Manager selects

the one with the smallest Hamming distance from the previous mapping $M_i$. This choice is made in

order to minimize the run-time cost of migration.
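
The two selection criteria can be sketched as a short function. This Python example is a minimal illustration, not the Resource Manager code: the data layout (a list of `(mapping, performance)` pairs), the function names, and the convention that a higher performance value is better are all assumptions.

```python
def hamming(mapping_a, mapping_b):
    """Number of processes that sit on different tiles in the two mappings."""
    return sum(1 for process in mapping_a
               if mapping_a[process] != mapping_b[process])


def select_mapping(candidates, current, faulty_tiles):
    """Pick the next mapping among the pre-computed alternatives.

    candidates: list of (mapping, performance), mapping = {process: tile}.
    Returns None when no alternative avoids all faulty tiles.
    """
    faulty = set(faulty_tiles)
    # Criterion 1: keep only mappings that do not involve faulty tiles.
    feasible = [(m, perf) for m, perf in candidates
                if not (set(m.values()) & faulty)]
    if not feasible:
        return None
    # Criterion 2: best performance, ties broken by smallest Hamming distance.
    best = max(perf for _, perf in feasible)
    ties = [m for m, perf in feasible if perf == best]
    return min(ties, key=lambda m: hamming(m, current))
```

The Hamming distance counts the processes whose tile changes, so minimising it minimises the number of processes that have to be migrated.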

4.3 Example of process migration scenario

Given the initial mapping shown in Figure 4.2, this section describes an example of process migration

scenario, which is depicted in Figure 4.3. In this scenario, tile3 will be halted due to an emulated fault.

The migration is performed in the following steps:

1. A control thread which runs on tile2 waits until a certain amount of time has elapsed, then sends

to tile3 an interrupt-generating message, which emulates a permanent error.

2. The Task Migration Hardware on tile3 is emulated by an interrupt handler; this handler is used to

send the state of process H2, which has to be migrated, to the Resource Manager (on tile2).

3. The Resource Manager selects the remapping from the table provided by the DSE framework. In

this example, process H2 has to be moved from tile3 to tile4. The current state of H2 is sent to tile4.

4. H2 restarts its execution on tile4.


Figure 4.3: Example of process migration scenario on H.264.


5. Attachment

After the Bibliography, we provide the presentation of Workpackage 6 as shown in the final review

of the MADNESS project.


Bibliography

[1] M. Dall’Osso, G. Biccari, L. Giovannini, D. Bertozzi, and L. Benini, “Xpipes: a Latency Insensitive Parameterized Network-on-Chip Architecture for Multi-Processor SoCs,” in Proc. of the 21st Int. Conf. on Computer Design (ICCD’03), Washington, DC, USA, 2003, pp. 536–.

[2] E. Cannella, O. Derin, P. Meloni, G. Tuveri, and T. Stefanov, “Adaptivity Support for MPSoCs based on Process Migration in Polyhedral Process Networks,” VLSI Design, vol. 2012, Article ID 987209, 17 pages.


MADNESS project – www.madnessproject.org

“Though this be madness, yet there is method in't”

Hamlet Act 2, scene 2, 193–206

Work performed in WP6: Support for System Adaptivity


Madness Framework Overview

[Figure: block diagram of the MADNESS framework. Labelled blocks include DAEDALUS (SESAME; ESPAM, parallel application code generation), system-level description, HW library, SW/MW library, reconfigurable toolchain, HDL generation and synthesis, memory initialization, FPGA-based evaluation platform, HW and MW support for fault tolerance and for adaptivity, metrics for architectural optimization, and work packages WP 2 to WP 7.]


Madness Framework Overview: WP6 work

[Figure: the framework block diagram repeated, with SESAME annotated as "DSE for adaptive MPSoCs" and ESPAM as "parallel application code generation".]


Objective of WP6

• Research and develop system-level design tools for adaptive MPSoCs in order to cope with Quality of Service and/or dependability demands

• In this context provide:

  - DSE for adaptive MPSoC platforms

    - Extend the SESAME environment with modeling and simulation techniques that are adaptivity aware

  - SW infrastructure for efficient Run-time (Re-)mapping of application tasks

    - Tasks’ structure and code generation supporting task migration

    - Local run-time task management using lightweight MTOS on every processor

    - Task migration mechanisms and migration policies

  - HW infrastructure for efficient system adaptivity

    - IP components facilitating task migration and monitoring of system activities

    - IP components which handle task migration in case of a faulty processor

  - Prototype on the MADNESS NoC-based platform developed in WP2&WP3


Tasks in WP6

• Task 6.1 - DSE for adaptive MPSoC platforms (completed in 3rd year)

Activity Period: M19 – M36

• Task 6.2 - System-level middleware support for multi-application, adaptive MPSoCs (completed in 2nd year)

Activity Period: M04 – M24

• Task 6.3 - RTL Coding of the hardware IPs providing support for dynamic runtime management (completed in 2nd year)

Activity Period: M04 – M24

• Task 6.4 - Integration of an FPGA-based platform for adaptive MPSoC evaluation (completed in 2nd year)

Activity Period: M13 – M24

• Task 6.5 – Integration of a system-level design framework for adaptive MPSoCs (completed in 3rd year)

Activity Period: M25 – M36


Task 6.2 : System-level middleware support for adaptive MPSoCs

MADNESS Application Model:

Kahn/Polyhedral Process Network (PPN)

MADNESS MPSoC platform:

Tiles connected via NoC

• MADNESS middleware SW components have been researched and developed:

PPN communication API implements the PPN communication semantics over a NoC

Process migration API supports, e.g.,

communication with migration manager

start, stop, load of migrated process

get and communicate the state of a process

Run-time (Re-)Mapping
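The start/stop and state-transfer operations listed above can be illustrated with a minimal sketch. All names here (`ProcessState`, `MigratableProcess`, `migrate`) are hypothetical and are not the MADNESS middleware API; the sketch assumes the migratable state is just an iteration counter plus any pending tokens.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessState:
    """Hypothetical migratable state: loop iteration and pending FIFO tokens."""
    iteration: int = 0
    pending: list = field(default_factory=list)


class MigratableProcess:
    """Toy process that can be stopped, have its state read, and be resumed."""

    def __init__(self, state=None):
        self.state = state if state is not None else ProcessState()
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def step(self, token):
        # One firing of the process body; a stopped process must not fire.
        if not self.running:
            raise RuntimeError("process is stopped")
        self.state.iteration += 1
        return token * token  # stand-in for the real computation

    def get_state(self):
        # Return a copy so the destination does not share data with the source.
        return ProcessState(self.state.iteration, list(self.state.pending))


def migrate(src):
    """Stop the source process, copy its state, and start a replica elsewhere."""
    src.stop()
    dst = MigratableProcess(src.get_state())
    dst.start()
    return dst
```

In this sketch the destination process resumes at the iteration where the source stopped, which is the property the state-transfer step is meant to preserve.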


Task 6.3 : Hardware IPs for dynamic runtime management

• Programmable DMA controller for PPN communication

Moves incoming data packets from Network Interface (NI) to SW FIFOs in Local memory

• Interrupt Generator for reactive process migration

Triggers the task migration mechanism in a tile

HW extension of NI generating interrupts when messages with a specific tag are received

• Task Migration Hardware module:

Handles the task migration procedure in case the Processing Element is permanently damaged

• Hardware Counters in NI and inside the switches

Allow run-time monitoring of the communication activity of processors and switches
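As a rough software model of the interrupt generator and the NI counters described above: a message whose tag matches a configured value triggers a handler instead of being queued, and every received packet increments a counter. The class name, the tag value and the handler interface are assumptions made for illustration; the actual IPs are RTL blocks.

```python
from collections import deque

MIGRATION_TAG = 0x1  # hypothetical tag value reserved for migration requests


class NetworkInterfaceModel:
    """Toy model of an NI with a tag decoder and a received-packet counter."""

    def __init__(self, irq_tag, irq_handler):
        self.irq_tag = irq_tag
        self.irq_handler = irq_handler
        self.rx_fifo = deque()
        self.rx_count = 0  # models a hardware counter of received packets

    def receive(self, tag, payload):
        self.rx_count += 1
        if tag == self.irq_tag:
            # Tag match: model the interrupt by calling the handler directly.
            self.irq_handler(payload)
        else:
            # Ordinary data packet: store it for the processor to read later.
            self.rx_fifo.append(payload)
```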


Task 6.4 : FPGA-based platform for adaptive MPSoC evaluation

• Develop a simple evaluation platform and application

• Several task migrations have been demonstrated

Processes

P1: initVideoIn + videoIn

P2: DCT

P3: Q

P4: VLE + videoOut

[Figure panels: Migration 1, Migration 2, Migration 3]


Task 6.1 DSE for adaptive MPSoC platforms

• Detection

• Recovery

• E.g. trade-off checkpoint overhead / restart overhead

• Design options

• Different effects on reliability

• Affects other objectives (like performance, power and costs)

[Figure labels: DMR, TMR; ASSERT; DMR (skip); TMR (restart); TMR (restart), 1 check / frame]

Fault-tolerance as adaptivity driver: reliability-aware DSE!
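The difference between the two redundancy patterns named above can be sketched as follows: DMR can only detect a mismatch between two replicas, while TMR can mask a single faulty replica by majority vote. This is a generic illustration, not the project's implementation; the function names and the convention of returning `(result, ok)` are assumptions.

```python
from collections import Counter


def dmr(replica_a, replica_b, frame):
    """Dual modular redundancy: compare two replicas, detect a mismatch."""
    a, b = replica_a(frame), replica_b(frame)
    if a == b:
        return a, True
    return None, False  # fault detected; the caller may skip or restart


def tmr(replicas, frame):
    """Triple modular redundancy: majority vote over three replicas."""
    outputs = [replica(frame) for replica in replicas]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes >= 2:
        return value, True  # a single faulty replica is outvoted
    return None, False  # no majority; recovery (e.g. restart) is needed
```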


Task 6.1 DSE for adaptive MPSoC platforms

Patternization


Task 6.1 DSE for adaptive MPSoC platforms

Blown-up application (2x DMR)


Task 6.1 DSE for adaptive MPSoC platforms

Binding to resources


Task 6.1 DSE for adaptive MPSoC platforms

• Fault-tolerant DSE

• Available patterns: DMR, TMR with restart and checkpoint budget between 0 and 6

[Figure, MJPEG panel: Frame Drop Ratio (0% to 100%) against Power; series DMR, DMR (Res), TMR, TMR (Res), and the Pareto front]

Status: paper published at CODES+ISSS’12, and more in the pipeline.

[Figure, MP3 and Sobel panels: Frame Drop Ratio against Power; series DMR, DMR (Res), TMR, TMR (Res), and the Pareto front]
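The Pareto front in these plots is the set of design points that no other point beats on both objectives at once. A minimal sketch of that filter follows, assuming both power and frame drop ratio are to be minimized; the function name and the tuple layout are illustrative choices, not taken from the DSE tool.

```python
def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized).

    Each point is a tuple of objective values, e.g. (power, frame_drop_ratio).
    A point q dominates p when q is no worse in every objective and differs
    from p, which implies it is strictly better in at least one objective.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and all(qi <= pi for qi, pi in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

With made-up points such as `(5.0, 0.5)` and `(6.0, 0.6)`, the second is dropped because the first uses less power and drops fewer frames.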


• Run-time task mapping to

• cope with changing (demands of) application workloads

• dynamically optimize system’s performance / energy consumption

• Based on application workload scenarios (WP 4)

• Two-step approach

• Static (design-time) scenario-based DSE step

• Dynamic optimization of task mapping

Task 6.1 DSE for adaptive MPSoC platforms

Scenario-based run-time task mapping for MPSoCs


Task 6.1 DSE for adaptive MPSoC platforms

Design-time DSE:

• Cluster workload scenarios

• Per-cluster optimal mapping

• Mapping database

Run-time mapping:

• Scenario cluster detection

• Remapping application(s)

• Run-time mapping optimization: monitor applications, detected performance problem(s), problem analysis (heuristics), remap bottleneck task(s)
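The two-step flow can be sketched as a database lookup followed by a local repair step. The heuristic below (move the heaviest task from the busiest processor to the least loaded one) is a deliberately simple stand-in and is not one of the algorithms evaluated in the deliverable; all function and parameter names are assumptions.

```python
def lookup_mapping(scenario, clusters, mapping_db):
    """Step 1: find the cluster of the active scenario and fetch its mapping."""
    for cluster_id, members in clusters.items():
        if scenario in members:
            return dict(mapping_db[cluster_id])
    raise KeyError(f"scenario {scenario!r} is not in any cluster")


def remap_bottleneck(mapping, task_load, processors):
    """Step 2: move the heaviest task off the busiest processor if that helps."""
    load = {p: 0 for p in processors}
    for task, proc in mapping.items():
        load[proc] += task_load[task]
    busiest = max(load, key=load.get)
    idlest = min(load, key=load.get)
    candidates = [t for t, p in mapping.items() if p == busiest]
    new_mapping = dict(mapping)
    if not candidates:
        return new_mapping
    task = max(candidates, key=task_load.get)
    # Only migrate when the target ends up less loaded than the source was.
    if load[idlest] + task_load[task] < load[busiest]:
        new_mapping[task] = idlest
    return new_mapping
```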


Task 6.1 DSE for adaptive MPSoC platforms

[Figure: results for scenario-based run-time task mapping; legend entries OPT, FFBP, ORB, RBPR, STM. Panels:

a. estimated execution time (scenario execution time in cycles, per intra-application scenario ID)

b. estimated energy consumption (scenario energy consumption in nJ, per intra-application scenario ID)

c. total intra-app scenario execution time (total scenario execution time in cycles)

d. total intra-app scenario energy consumption (total scenario energy consumption in nJ)

e. cost of algorithms, including algorithm execution cost and task migration cost (total scenario cost in ns)]

Status: paper accepted at DAC’13, and more in the pipeline.


• System adaptivity techniques developed in WP5 and WP6 have been applied to a more complex and industrially relevant application (DPI+H.264)

• PPN communication API extended to handle cyclic PPNs

• Process migration mechanism has been modified to make it compatible with the TMH implementation

• Intensive testing to handle corner cases

Task 6.5 System-level design framework for adaptive MPSoCs


• Development of the final system adaptivity prototype

• 2x3 mesh of tiles on a NoC, MicroBlaze processors

• Integration of DSE techniques in the Resource Manager (RM) remapping policies

• RM can select a remapping from a table of alternative Pareto-optimal mappings

Task 6.5 System-level design framework for adaptive MPSoCs
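One way to picture the RM selecting from a table of alternative mappings is sketched below. The selection rule (lowest energy among mappings that avoid failed tiles and meet a deadline) and every name in the code are assumptions made for illustration; the slides do not state which criterion the RM applies.

```python
from dataclasses import dataclass


@dataclass
class MappingEntry:
    """Hypothetical table row: a task-to-tile assignment and its estimated cost."""
    assignment: dict  # task name -> tile index
    exec_time: float  # estimated execution time
    energy: float     # estimated energy consumption


def select_mapping(table, failed_tiles, deadline):
    """Pick the lowest-energy entry that avoids failed tiles and meets the deadline."""
    usable = [
        entry for entry in table
        if not (set(entry.assignment.values()) & set(failed_tiles))
        and entry.exec_time <= deadline
    ]
    if not usable:
        return None  # no stored mapping fits; another policy would be needed
    return min(usable, key=lambda entry: entry.energy)
```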