smart art) [compatibility mode]

Upload: yang-bo

Post on 06-Apr-2018


TRANSCRIPT

  • 8/3/2019 Smart Art) [Compatibility Mode]

    1/25

    Parameter-Optimized Simulated Annealing for Application Mapping on Networks-on-Chip

    Bo Yang, Liang Guang, Tero Säntti, Juha Plosila

  • 2/25

    Outline

    Introduction

    Application Mapping

    Implementation of SA

    Nelder-Mead Simplex Method

    Experiment and Analysis

    Conclusion

  • 3/25

    Introduction

    Moore's Law is still valid (ITRS's perspective)

    What could we do with billions of transistors?

    Tens to hundreds of cores on a single chip

    80-core Intel Terascale Chip

    Tilera TILE-Gx Family with 16 to 100 processing cores

    ...

    Intel: Why a 1,000-core chip is feasible. ZDNet, 2010

    Manycore architecture has become the mainstream for parallel computing

  • 4/25

    Introduction

    Major Concern

    Communication, instead of computation

    Great impact on performance and energy consumption

    Conventional buses and point-to-point connections become the bottleneck

    Networks-on-Chip (NoC)

    Better scalability

    Higher reliability

    More reusability

  • 5/25

    Application Mapping

    Application

    A set of concurrent tasks

    Modeled by the communication weighted graph (CWG)

    Many-core NoC

    A set of tiles and links

    Modeled by the computation and communication resource graph (CCRG)

  • 6/25

    Application Mapping

    The role of application mapping is to determine how to place each task on a tile of the NoC so that the specific design interests and constraints are fulfilled.

    Objective: find the mapping solution that minimizes the communication energy consumption

  • 7/25

    Application Mapping

    Energy model of NoC [Jingcao2005]

    Energy consumed by one communication, expressed in terms of:

    the data volume transferred from task i to task j

    the distance of the communication channel from node i to node j on the NoC

    the energy consumed by a switch and by a link for transferring one bit of data on the NoC
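    One plausible form of the per-communication energy can be sketched from the three quantities listed on this slide. This is an assumption, not the slide's own equation: the symbols $v_{ij}$ (data volume), $d_{ij}$ (distance), $E_S$ and $E_L$ (per-bit switch and link energy) are assumed names, and $d_{ij}$ is assumed to count the switches on the path, so that the path contains $d_{ij} - 1$ links.

```latex
\begin{align}
E_{ij}  &= v_{ij}\,\bigl( d_{ij}\,E_S + (d_{ij} - 1)\,E_L \bigr) \\
E_{app} &= \sum_{(i,j)} E_{ij}
         = (E_S + E_L) \sum_{(i,j)} v_{ij}\, d_{ij} \;-\; E_L \sum_{(i,j)} v_{ij}
\end{align}
```

    Under these assumptions the last sum depends only on the application, not on the mapping, so minimizing the application energy amounts to minimizing $\sum v_{ij} d_{ij}$.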

  • 8/25

    Application Mapping

    Objective Formulation

    Communication energy consumption of an application (Eapp)

    Given the constant per-bit switch and link energies, Eapp is linearly proportional to the product of data volume and distance, summed over all communications.

    Weighted Communication of an Application (WCA)

    The objective of the application mapping is to find the optimal solution with minimal WCA.

    Smaller WCA, better solution
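    A minimal sketch of how WCA could be evaluated for one candidate mapping, taking WCA as the sum of data volume times distance over all communications. It assumes a 2D mesh with tiles numbered row by row and distance measured as Manhattan hop count; the names `manhattan` and `wca`, the dictionary layout of the CWG, and the example volumes are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical CWG encoding: (source task, destination task) -> data volume.
CWG = Dict[Tuple[int, int], float]


def manhattan(tile_a: int, tile_b: int, width: int) -> int:
    """Hop distance between two tiles of a mesh numbered row by row (assumed topology)."""
    ax, ay = tile_a % width, tile_a // width
    bx, by = tile_b % width, tile_b // width
    return abs(ax - bx) + abs(ay - by)


def wca(cwg: CWG, tile_of: Sequence[int], width: int) -> float:
    """Sum of volume * distance over all communications; tile_of[t] is the tile of task t."""
    return sum(
        volume * manhattan(tile_of[src], tile_of[dst], width)
        for (src, dst), volume in cwg.items()
    )


# Three tasks on a 2x2 mesh: task 0 -> task 1 (10 units), task 1 -> task 2 (5 units).
example = {(0, 1): 10.0, (1, 2): 5.0}
print(wca(example, [0, 1, 3], width=2))  # 15.0
print(wca(example, [0, 3, 1], width=2))  # 25.0
```

    The second mapping places the heavily communicating tasks 0 and 1 on diagonal tiles, which is why its WCA is larger.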

  • 9/25

    Application Mapping

    NP-hard problem

    To map m tasks on n cores ($m \le n$)

    $\frac{n!}{(n-m)!}$ possible solutions

    Search space increases exponentially with problem size m and n

    Exhaustive search is infeasible (e.g., for n = m = 25, $25! \approx 1.55 \times 10^{25}$)

    Heuristic search, including Simulated Annealing (SA), Tabu Search (TS), Greedy Incremental (GI), etc.
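    The size of the search space can be checked with a few lines of Python. The sketch assumes at most one task per tile, in which case the number of placements of m distinct tasks on n distinct tiles is n!/(n-m)!; the function name `count_mappings` is hypothetical.

```python
import math


def count_mappings(n_tiles: int, m_tasks: int) -> int:
    """Ways to place m distinct tasks on n distinct tiles, one task per tile."""
    if not 0 <= m_tasks <= n_tiles:
        raise ValueError("requires 0 <= m_tasks <= n_tiles")
    return math.factorial(n_tiles) // math.factorial(n_tiles - m_tasks)


# The slide's example: n = m = 25 gives 25! placements.
print(f"{count_mappings(25, 25):.2e}")  # 1.55e+25
```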

  • 10/25

    Simulated Annealing

    Pro and Con

    Able to find global optima

    Numerous computations and evaluations lead to long runtime

    Parameters and Functions

    Initial temperature T0

    Final temperature Tf

    Cost function Cost(S)

    Temperature function Temp(i)

    Acceptance function Accept(ΔC, T)

    Termination function Terminate(i, R)

    Move function Move(S, T)

    etc.

  • 11/25

    Simulated Annealing

    Cost function Cost(S)

    Cost(S) = WCA of solution S

    Temperature function Temp(i)

    $Temp(i) = T_0 \cdot q^{\lfloor i/L \rfloor}$

    i: # of iterations, q: cooling ratio

    L: # of iterations at each temperature
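    A minimal sketch of a geometric cooling schedule built from the symbols on this slide (T0, q, L, i). It assumes the temperature is held for L iterations and then multiplied by q, that is Temp(i) = T0 * q^floor(i/L); the floor is an assumption, and the names `temp` and `iters_per_temp` are hypothetical.

```python
def temp(i: int, t0: float, q: float, iters_per_temp: int) -> float:
    """Temperature at iteration i: t0 is multiplied by q once every iters_per_temp iterations."""
    if not 0.0 < q < 1.0:
        raise ValueError("cooling ratio q must lie in (0, 1)")
    if iters_per_temp < 1 or i < 0:
        raise ValueError("iters_per_temp must be >= 1 and i must be >= 0")
    return t0 * q ** (i // iters_per_temp)


# With T0 = 100, q = 0.9 and L = 10, the first ten iterations run at the
# initial temperature and the next ten at 100 * 0.9.
schedule = [temp(i, 100.0, 0.9, 10) for i in (0, 9, 10, 19, 20)]
```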

  • 12/25

    Simulated Annealing

    Acceptance function Accept(ΔC, T)

    $prob = \frac{1}{1 + \exp\left(\frac{\Delta C}{K C_0 T}\right)}$; the move is accepted when $random() < prob$

    C0: initial cost, ΔC: cost difference

    K: normalizing ratio

    Termination function Terminate(i, R)

    [Formula garbled in the transcript; recoverable terms: Temp(i), Tf, R, Rmax]

    Move function Move(S, T)

    Single random swapping

    A randomly selected task in the current solution is swapped to a randomly selected tile to generate a new solution
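    The acceptance test and the single random swap can be sketched as follows. The sketch assumes the acceptance probability has the form 1 / (1 + exp(ΔC / (K * C0 * T))) and that a move is accepted when a uniform random number falls below it. The placement encoding (a list whose first `n_tasks` entries are the tiles of the tasks and whose remaining entries are free tiles) and all function names are hypothetical, and this `move` ignores the temperature argument that Move(S, T) carries on the slide.

```python
import math
import random
from typing import List


def accept_probability(delta_c: float, temperature: float, c0: float, k: float) -> float:
    """Assumed form: 1 / (1 + exp(delta_c / (k * c0 * temperature)))."""
    x = delta_c / (k * c0 * temperature)
    if x > 700.0:  # math.exp would overflow; the probability is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def accept(delta_c: float, temperature: float, c0: float, k: float,
           rng: random.Random) -> bool:
    """Accept the move when a uniform random number falls below the probability."""
    return rng.random() < accept_probability(delta_c, temperature, c0, k)


def move(placement: List[int], n_tasks: int, rng: random.Random) -> List[int]:
    """Swap the tile of one random task with another random tile (occupied or free)."""
    if len(placement) < 2 or not 1 <= n_tasks <= len(placement):
        raise ValueError("need at least two tiles and 1 <= n_tasks <= number of tiles")
    new = list(placement)
    a = rng.randrange(n_tasks)            # index of a task
    b = rng.randrange(len(new) - 1)       # any other index
    if b >= a:
        b += 1
    new[a], new[b] = new[b], new[a]
    return new
```

    With this form a move that leaves the cost unchanged is accepted with probability 0.5, improvements with a higher probability, and deteriorations with a lower one.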

  • 13/25

    Simulated Annealing

    Initial temperature T0 and final temperature Tf

    Solve the acceptance function for T:

    $T = \frac{\Delta C}{K C_0 \ln\left(\frac{1}{prob} - 1\right)}$

    T0 and Tf can be derived by:

    $T_0 = \frac{\Delta C_{max}}{K C_0 \ln\left(\frac{1}{prob_0} - 1\right)}$

    $T_f = \frac{\Delta C_{min}}{K C_0 \ln\left(\frac{1}{prob_f} - 1\right)}$

    prob0: probability of accepting ΔCmax at temperature T0

    probf: probability of accepting ΔCmin at temperature Tf
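    A hedged sketch of deriving T0 and Tf by inverting the acceptance function, assuming prob = 1 / (1 + exp(ΔC / (K * C0 * T))), which gives T = ΔC / (K * C0 * ln(1/prob - 1)). The function name `temperature_for` is hypothetical, and the values of C0, ΔCmax and ΔCmin below are made up for illustration; K, prob0 and probf are the VOPD values reported in the deck's results table.

```python
import math


def temperature_for(delta_c: float, prob: float, c0: float, k: float) -> float:
    """Temperature at which a cost increase delta_c is accepted with probability prob."""
    # For delta_c > 0 the assumed acceptance function never exceeds 0.5,
    # so only probabilities in (0, 0.5) correspond to a positive temperature.
    if not 0.0 < prob < 0.5:
        raise ValueError("prob must lie in (0, 0.5)")
    if delta_c <= 0 or c0 <= 0 or k <= 0:
        raise ValueError("delta_c, c0 and k must be positive")
    return delta_c / (k * c0 * math.log(1.0 / prob - 1.0))


# Illustrative (made-up) cost figures.
c0, dc_max, dc_min = 4000.0, 800.0, 16.0
t0 = temperature_for(dc_max, prob=0.44, c0=c0, k=0.72)
tf = temperature_for(dc_min, prob=0.05, c0=c0, k=0.72)
```

    Because ΔCmax is at least ΔCmin and prob0 is larger than probf, T0 comes out above Tf, so the cooling schedule has a well-defined range.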

  • 14/25

    Simulated Annealing

    To summarize, we need to determine the parameters:

    q: cooling ratio

    K: normalizing ratio

    prob0: probability of accepting ΔCmax at temperature T0

    probf: probability of accepting ΔCmin at temperature Tf

    ΔCmax, ΔCmin: computed using a finite number of trial moves

    Considerations on parameter selection

    The parameters are problem-specific

    The parameters jointly affect the performance of SA

    The set of parameters should be selected in a systematic way, instead of being set manually and independently.

  • 15/25

    Nelder-Mead Simplex Method

    Method for minimization of a function f(p)

    Proposed by Nelder and Mead in 1965

    f(p): function with n variables $x_1, x_2, \ldots, x_n$

    n + 1 points form the initial simplex; each point $p_k$ is an n-tuple $(x_{k1}, x_{k2}, \ldots, x_{kn})$

    Sort the n + 1 function values so that $f(p_0) \le f(p_1) \le \cdots \le f(p_n)$

    To get the minimum of f(p), in each iteration:

    A new simplex is formed, either by replacing the point $p_n$ with the reflection point, the expansion point, or the contraction point, or by updating all points when the preceding replacements fail.

    Sort the n + 1 function values of the points in the new simplex and continue the process

  • 16/25

    Nelder-Mead Simplex Method

    The process terminates when $f(p_0), f(p_1), \ldots, f(p_n)$ converge to one value, which approximates the minimum value of the function f(p)

    For more detail, refer to J. A. Nelder and R. Mead, "A simplex method for function minimization."
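    The iteration described on these two slides can be sketched in plain Python. This is a simplified textbook variant, not necessarily the authors' implementation: the coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5) are the commonly used defaults, the stopping rule compares the best and worst function values against a tolerance, and the names `nelder_mead`, `tol` and `max_iter` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def nelder_mead(f: Callable[[List[float]], float],
                simplex: Sequence[Sequence[float]],
                tol: float = 1e-12,
                max_iter: int = 1000) -> Tuple[List[float], float]:
    """Minimize f starting from n + 1 points in n dimensions; returns (best point, best value)."""
    pts = [list(p) for p in simplex]
    n = len(pts) - 1
    vals = [f(p) for p in pts]
    for _ in range(max_iter):
        # Sort so that vals[0] <= vals[1] <= ... <= vals[n].
        order = sorted(range(n + 1), key=vals.__getitem__)
        pts = [pts[k] for k in order]
        vals = [vals[k] for k in order]
        if vals[-1] - vals[0] < tol:  # function values have converged
            break
        worst = pts[-1]
        centroid = [sum(p[d] for p in pts[:-1]) / n for d in range(n)]

        def along(t: float) -> List[float]:
            # Point on the line through the worst point and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        refl = along(1.0)
        f_refl = f(refl)
        if f_refl < vals[0]:
            expd = along(2.0)  # reflection is the new best: try expanding
            f_expd = f(expd)
            if f_expd < f_refl:
                pts[-1], vals[-1] = expd, f_expd
            else:
                pts[-1], vals[-1] = refl, f_refl
        elif f_refl < vals[-2]:
            pts[-1], vals[-1] = refl, f_refl
        else:
            # Contract outside if the reflection beat the worst point, else inside.
            con = along(0.5 if f_refl < vals[-1] else -0.5)
            f_con = f(con)
            if f_con < min(f_refl, vals[-1]):
                pts[-1], vals[-1] = con, f_con
            else:
                # All replacements failed: shrink every point toward the best one.
                best = pts[0]
                pts = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                for p in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    k = min(range(n + 1), key=vals.__getitem__)
    return pts[k], vals[k]


# Usage on a smooth test function with its minimum at (1, -2).
point, value = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                           [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```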

  • 17/25

    Nelder-Mead Simplex Method

    Parameter-Optimized SA (POSA) algorithm

    Variables: q, K, prob0 and probf

    Initial simplex: 5 initial points, each consisting of selected values of the 4 variables

    The SA algorithm applies the set of parameters of each point and finds one mapping solution

    The WCA of the mapping solution found by the SA algorithm is defined as the value of the function f(p) and compared with the others

    The Nelder-Mead method terminates when all 5 points converge to one point, which represents the set of optimized parameters we are looking for

    This set of optimized parameters is then applied to the SA algorithm to find the best mapping solution
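    The function f(p) that the Nelder-Mead method would minimize can be sketched as one SA run per parameter point p = (q, K, prob0, probf), returning the WCA of the mapping it finds. Everything below is an illustrative assumption rather than the authors' code: the mesh topology and Manhattan distance, the fixed random seed (which makes f(p) deterministic), the temperature-only stopping rule, the validity region for p, and the names `sa_cost`, `iters_per_temp` and `trials`.

```python
import math
import random
from typing import Dict, Sequence, Tuple

CWG = Dict[Tuple[int, int], float]  # (source task, destination task) -> volume


def wca(cwg: CWG, tile_of: Sequence[int], width: int) -> float:
    """Sum of volume * Manhattan distance on a row-major mesh (assumed topology)."""
    total = 0.0
    for (i, j), volume in cwg.items():
        a, b = tile_of[i], tile_of[j]
        total += volume * (abs(a % width - b % width) + abs(a // width - b // width))
    return total


def sa_cost(p: Sequence[float], cwg: CWG, n_tiles: int, width: int,
            iters_per_temp: int = 20, trials: int = 50, seed: int = 0) -> float:
    """f(p): best WCA found by one SA run using p = (q, K, prob0, probf)."""
    q, k, prob0, probf = p
    # Assumed validity region; points outside it are given an infinite cost.
    if not (0.0 < q < 1.0 and k > 0.0 and 0.0 < probf < prob0 < 0.5):
        return math.inf
    rng = random.Random(seed)
    n_tasks = 1 + max(max(edge) for edge in cwg)
    tiles = list(range(n_tiles))
    rng.shuffle(tiles)  # tiles[t] is the tile of task t; the rest are free tiles

    def swap() -> Tuple[int, int]:
        a = rng.randrange(n_tasks)
        b = rng.randrange(n_tiles - 1)
        if b >= a:
            b += 1
        tiles[a], tiles[b] = tiles[b], tiles[a]
        return a, b

    c0 = cur = best = wca(cwg, tiles, width)
    increases = []  # positive cost differences seen in trial moves
    for _ in range(trials):
        a, b = swap()
        delta = wca(cwg, tiles, width) - cur
        tiles[a], tiles[b] = tiles[b], tiles[a]  # undo the trial move
        if delta > 0:
            increases.append(delta)
    if c0 == 0 or not increases:
        return best
    scale = k * c0
    t = max(increases) / (scale * math.log(1.0 / prob0 - 1.0))   # T0
    tf = min(increases) / (scale * math.log(1.0 / probf - 1.0))  # Tf
    while t > tf:
        for _ in range(iters_per_temp):
            a, b = swap()
            new = wca(cwg, tiles, width)
            x = (new - cur) / (scale * t)
            prob = 0.0 if x > 700.0 else 1.0 / (1.0 + math.exp(x))
            if rng.random() < prob:
                cur = new
                best = min(best, cur)
            else:
                tiles[a], tiles[b] = tiles[b], tiles[a]  # reject: undo
        t *= q
    return best


# A 4-task chain on a 2x2 mesh; the point uses illustrative parameter values.
chain = {(0, 1): 10.0, (1, 2): 10.0, (2, 3): 10.0}
cost = sa_cost((0.9, 0.5, 0.4, 0.05), chain, n_tiles=4, width=2)
```

    A Nelder-Mead routine started from 5 points in (q, K, prob0, probf) space would then call `sa_cost` once per evaluated point and keep the point with the smallest returned WCA.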

  • 18/25

    Experiment and Result

    Setup

    Four applications

    Video object plane decoder (VOPD, 16 tasks)

    MPEG4 (12 tasks)

    Multimedia systems application (MMS, 25 tasks)

    H.264 decoder (H264, 16 tasks)

    Reference work: NoCMap ([Jingcao2005])

    Parameters are set manually

    q: 0.9, T0: 100, Tf: unbounded

    Exponential form of acceptance function

    Random swapping move function

    The simulator in NoCMap is used to obtain the communication energy consumption for both POSA and NoCMap

  • 19/25

    Experiment and Result

    Optimized Parameters

    Application   q      prob0   probf   K
    VOPD          0.91   0.44    0.05    0.72
    MPEG4         0.95   0.34    0.05    0.36
    MMS           0.94   0.36    0.05    0.62
    H264          0.89   0.42    0.05    0.49

    Parameters are problem-specific

    Instead of using an identical set of parameters for different problems, a different set of parameters should be applied to the SA algorithm for each problem

  • 20/25

    Experiment and Result

    Number of Iterations

    Application   NoCMap   POSA     POSA/NoCMap
    VOPD          4.30e6   2.74e4   0.64%
    MPEG4         2.61e6   2.77e4   1.06%
    MMS           1.14e7   1.18e5   1.04%
    H264          1.61e6   1.94e4   1.02%
    Avg.                            0.94%

    POSA uses significantly fewer iterations

    On average, less than 1% of the iterations used by NoCMap

  • 21/25

    Experiment and Result

    Runtime of SA (seconds)

    App     NoCMap   POSA (a)   NoCMap/POSA (a)   POSA (b)   NoCMap/POSA (b)
    VOPD    31.69    15.50      2.04              0.087      364
    MPEG4   15.74    9.67       1.63              0.059      267
    MMS     171.74   181.75     0.94              1.17       147
    H264    12.34    11.90      1.04              0.072      171
    Avg.                        1.41                         237

    POSA (a) includes the runtime of the Nelder-Mead method

    POSA (b) is the runtime of SA applying the optimized parameters

    On average, a speedup of 237 times is achieved
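    The two average speedups can be recomputed from the runtimes in the table. The sketch assumes the first POSA runtime column includes the Nelder-Mead search and the second is the SA run with the optimized parameters only; the names `runtimes` and `mean_speedup` are hypothetical.

```python
# app: (NoCMap, POSA including Nelder-Mead, SA with optimized parameters), in seconds
runtimes = {
    "VOPD": (31.69, 15.50, 0.087),
    "MPEG4": (15.74, 9.67, 0.059),
    "MMS": (171.74, 181.75, 1.17),
    "H264": (12.34, 11.90, 0.072),
}


def mean_speedup(column: int) -> float:
    """Arithmetic mean over applications of NoCMap runtime / runtime in the given column."""
    ratios = [row[0] / row[column] for row in runtimes.values()]
    return sum(ratios) / len(ratios)


print(round(mean_speedup(1), 2))  # 1.41
print(round(mean_speedup(2)))     # 237
```

    The gap between the two averages shows that almost all of the POSA runtime is spent in the parameter search rather than in the final SA run.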

  • 22/25

    Experiment and Result

    Weighted Communication (WCA)

    The mapping solution of POSA yields a WCA comparable to that of NoCMap

  • 23/25

    Experiment and Result

    Energy Consumption (EC)

    Consistent with the WCA result

    The mapping solution of POSA yields communication energy consumption comparable to that of NoCMap

  • 24/25

    Conclusion

    A method to systematically select the parameters of the SA algorithm for the application mapping problem is proposed.

    With the set of optimized parameters, significantly fewer evaluations are processed in POSA and the SA algorithm is accelerated.

    The accelerated POSA algorithm achieves energy consumption comparable to that of NoCMap.

    For the set of benchmarks, POSA obtains mapping solutions of the same quality while using less than 1% of the iterations of NoCMap and achieving an average speedup of 237 times.

  • 25/25

    Thank you for your attention!

    Comments and Questions