energy scalability under iso performance analysis of parallel algorithms

Upload: sanddyv

Post on 29-May-2018

  • 8/9/2019 Energy Scalability Under Iso Performance Analysis of Parallel Algorithms

    Analysis of Parallel Algorithms for Energy Conservation in Scalable Multicore Architectures

    Vijay Anand Reddy and Gul Agha

    University of Illinois

    Overview

    Motivation

    Problem Definition & Assumptions

    Methodology & A case study

    Related Work & Conclusion

    Energy and Multi-core

    2% of the energy consumed in the US is used by computers.

    Efficiency = Performance/Watt

    Want to optimize efficiency: low-power processors are typically more efficient.

    Vary the frequency at which cores run to balance performance and energy consumption.

    Parallel Programming

    Parallel programming involves

    1. Dividing computation into autonomous actors

    2. Specifying interaction (shared memory or message passing) between them.

    Parallel Performance

    How many actors may execute at the same time: concurrency index

    The number of available cores

    The speed at which they execute

    How much and when they need to communicate: communication overhead

    Network congestion at memory affects performance

    Performance depends on both the parallel application and the parallel architecture.

    Scalable Multicore Architectures

    We are interested in (energy) efficiency as the number of cores is scaled up.

    Can multicore architectures be scaled up?

    Performance vs. Number of Cores

    [Figure: performance versus number of cores. Taken from IEEE Spectrum magazine (Sandia Research Labs).]

    Increasing the number of cores may not benefit parallel programming applications if shared memory is maintained.

    Message Passing, Performance and Energy Consumption

    Parallel programming involves message passing between actors.

    Increasing the number of cores:

    Leads to an increase in the number of messages communicated between them.

    May reduce performance.

    May lead to increased energy consumption.

    The effect depends on the parallel application and architectural parameters.

    Energy versus Performance

    For a fixed performance target, increasing the number of cores may decrease the energy consumed for computation: cores can be run at a lower frequency.

    But increasing the number of cores will also increase the energy consumed for communication.

    Question: what is the trade-off?

    Depends on the parallel application.

    Depends on the network architecture.

    Depends on the memory structure at each core.

    Energy Scalability under Iso-Performance

    Given a parallel algorithm, an architecture model and a performance measure, what is the appropriate number of cores required for minimum energy consumption as a function of input size?

    Important for response time in interactive applications.

    Simplifying Architectural Assumptions

    All cores operate at the same speed.

    The speed of the cores can be varied by frequency scaling.

    Computation time of the cores can be scaled (by controlling the speed), but not communication time between cores.

    Communication time between cores is constant.

    No memory hierarchy at the cores.

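    Energy Model

    Energy consumed by a core to execute a given number of cycles:

    $E = E_c \cdot T \cdot X^3$

    where $E_c$ is a hardware constant and $X$ is the frequency of the processor.

    Running time:

    $T = (\text{number of cycles}) \cdot (1/X)$

    Substituting $T$ gives $E = E_c \cdot (\text{number of cycles}) \cdot X^2$.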
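As a rough numerical illustration of this model, the following Python sketch evaluates the energy and running time of a fixed amount of work at two frequencies. The function names and the value of `E_C` are assumptions made for the example, not values from the slides.

```python
E_C = 1.0  # hypothetical hardware constant (assumed value, not from the slides)

def running_time(cycles, freq):
    """T = (number of cycles) * (1 / X)."""
    return cycles / freq

def energy(cycles, freq, e_c=E_C):
    """E = E_c * T * X^3, which simplifies to E_c * cycles * X^2."""
    return e_c * running_time(cycles, freq) * freq ** 3

# Same work (1000 cycles) at full frequency and at half frequency.
full = energy(cycles=1000, freq=2.0)
half = energy(cycles=1000, freq=1.0)
print(full / half)  # 4.0
```

Halving the frequency doubles the running time but cuts the energy for the same number of cycles to a quarter, which is why spreading work over more, slower cores can reduce computation energy.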

    Constants

    $E_m$: Energy consumed per message.

    $F$: Maximum frequency of a core.

    $N$: Input size.

    $M$: Number of cores.

    $K_c$: Number of cycles at maximum frequency for a single message's communication time.

    $P_{idle}$: Static power consumed per unit of time.

    Case Study: Adding N Numbers

    Example: N numbers, 4 actors.

    [Figure: actors 1 to 4 each perform N/4 additions, followed by a communication period.]

    In the end, actor 1 stores the sum of all N numbers.
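The example can be simulated with a short Python sketch. The pairwise (tree-shaped) combining pattern in the communication period is an assumption, since the slide's figure is not recoverable; the function name `parallel_sum` is likewise hypothetical. Index 0 in the code plays the role of actor 1.

```python
def parallel_sum(numbers, m):
    """Simulate m actors adding the numbers; m is assumed to be a power of two."""
    n = len(numbers)
    chunk = n // m  # assumes m divides n evenly
    # Computation phase: each actor adds its own block of n/m numbers.
    partial = [sum(numbers[i * chunk:(i + 1) * chunk]) for i in range(m)]
    messages = 0
    rounds = 0
    stride = 1
    # Communication phase: in each round, actor i + stride sends its partial
    # sum to actor i, which adds it to its own partial sum.
    while stride < m:
        for i in range(0, m, 2 * stride):
            partial[i] += partial[i + stride]
            messages += 1
        stride *= 2
        rounds += 1
    return partial[0], messages, rounds

total, messages, rounds = parallel_sum(list(range(1, 17)), m=4)
print(total, messages, rounds)  # 136 3 2
```

With 4 actors the sketch sends 3 messages in 2 communication rounds, and the first actor ends up holding the full sum.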

    Methodology

    Step 1: Evaluate the critical path of the parallel algorithm.

    Step 2: Partition the critical path into communication and computation steps.

    [Figure: the critical path across actors 1 to 4, divided into computation and communication steps.]

    Step 3: Scale the computation steps so that the parallel performance matches the sequential performance.

    $F' = F \cdot \dfrac{\beta \, (N/M - 1 + \log M)}{\beta N - \log(M) \, K_c}$

    where $F'$ is the scaled frequency of the cores and $\beta$ is the number of cycles per addition.
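A minimal Python sketch of this scaling step follows, assuming the logarithm is base 2 (as in a pairwise reduction) and approximating the sequential running time as (cycles per addition) x N / F. The function and parameter names are hypothetical.

```python
import math

def scaled_frequency(n, m, f_max, beta, k_c):
    """Frequency at which m cores match the single-core running time.

    Solves beta*(n/m - 1 + log2(m))/F' + log2(m)*k_c/f_max = beta*n/f_max
    for F', where beta is the number of cycles per addition.
    """
    log_m = math.log2(m)  # base 2 is an assumption of this sketch
    numerator = beta * (n / m - 1 + log_m)
    denominator = beta * n - log_m * k_c
    if denominator <= 0:
        raise ValueError("communication alone exceeds the sequential running time")
    return f_max * numerator / denominator

f_scaled = scaled_frequency(n=10**6, m=4, f_max=1.0, beta=1.0, k_c=5.0)
print(f_scaled)
```

For large inputs the result is close to F/M: with four cores, each core can run at roughly a quarter of the maximum frequency and still match the sequential running time.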

    Step 4: Evaluate the number of messages sent in the parallel algorithm.

    $M - 1$ messages.

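    Step 5: Frame an equation for the energy consumption of the parallel application.

    Energy for communication:

    $E_{comm} = E_m \, (M - 1)$

    Energy for computation:

    $E_{comp} = E_c \, (N - 1) \, \beta \, F'^2$

    where $F'$ is the scaled frequency from Step 3 and $\beta$ is the number of cycles per addition.

    Energy for idle computation (static power).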
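The communication and computation terms can be combined in a short Python sketch. The idle (static) term is left out because the slides do not give its formula; the base-2 logarithm, the function names, and the parameter values in the loop are assumptions for illustration only.

```python
import math

def scaled_frequency_ratio(n, m, beta, k_c):
    """F'/F from Step 3, with the logarithm taken base 2 (an assumption)."""
    log_m = math.log2(m)
    return beta * (n / m - 1 + log_m) / (beta * n - log_m * k_c)

def energy(n, m, e_m, e_c, f_max, beta, k_c):
    """E_comm + E_comp; the idle (static) term is omitted in this sketch."""
    f_scaled = f_max * scaled_frequency_ratio(n, m, beta, k_c)
    e_comm = e_m * (m - 1)                         # one unit of E_m per message
    e_comp = e_c * (n - 1) * beta * f_scaled ** 2  # all additions at frequency F'
    return e_comm + e_comp

# Hypothetical parameter values, chosen only to show the trend.
for m in (1, 4, 16, 64):
    print(m, energy(n=10**6, m=m, e_m=500.0, e_c=1.0, f_max=1.0, beta=1.0, k_c=5.0))
```

In this sketch the computation term shrinks roughly as 1/M^2 while the communication term grows linearly in M, so the total first falls and then rises as cores are added.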

    Step 6: Analyze the equation to obtain the appropriate number of cores required for minimum energy consumption as a function of input size.

    Differentiate w.r.t. the number of cores.
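Instead of differentiating by hand, the minimum can also be located numerically. The Python sketch below swaps in a brute-force search over the number of cores; it works in units of E_c F^2, omits the idle-energy term (whose formula is not given in the slides), and assumes a base-2 logarithm. All names and parameter values are hypothetical.

```python
import math

def normalized_energy(n, m, k, beta=1.0, k_c=5.0):
    """(E_comm + E_comp) / (E_c * F^2), with k = E_m / (E_c * F^2).

    The idle (static) energy term is left out of this sketch.
    """
    log_m = math.log2(m)
    ratio = beta * (n / m - 1 + log_m) / (beta * n - log_m * k_c)  # F'/F
    return k * (m - 1) + (n - 1) * beta * ratio ** 2

def best_core_count(n, k, max_cores=2000):
    """Brute-force search over M = 1..max_cores for the lowest energy."""
    return min(range(1, max_cores + 1), key=lambda m: normalized_energy(n, m, k))

for n in (10**6, 10**8):
    print(n, best_core_count(n, k=500.0))
```

Because idle energy is ignored, the core counts printed here need not agree with values obtained from the full model; the sketch only shows that the energy-optimal number of cores grows with the input size.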

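    Plot: Energy-N-M

    [Figure: energy as a function of input size N and number of cores M.]

    Parameters: $\beta = 1$ (cycles per addition), $K_c = 5$ units, $E_m / (E_c F^2) = 500$, $P_s / F = 1$.

    270 cores at $N = 10^{10}$

    70 cores at $N = 10^{8}$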

    Sensitivity Analysis ($k = E_m / (E_c F^2)$)

    As k increases, the optimal number of cores decreases.
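One way to see this trend is a back-of-the-envelope derivation under simplifying assumptions that are not in the slides: ignore idle energy, take $N$ large enough that the scaled frequency is approximately $F/M$, and write $\beta$ for the number of cycles per addition.

```latex
E(M) \approx E_m (M - 1) + E_c \, \beta N \, \frac{F^2}{M^2}

\frac{dE}{dM} = E_m - \frac{2 E_c \, \beta N F^2}{M^3} = 0
\quad\Longrightarrow\quad
M^{*} \approx \left( \frac{2 \beta N}{k} \right)^{1/3},
\qquad k = \frac{E_m}{E_c F^2}
```

Under these assumptions the optimum scales as $k^{-1/3}$, so a larger per-message energy cost pushes the energy-optimal core count down.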

    Naive Quicksort

    Assume the input array is on a single core.

    A single core partitions an array and sends part of it to another core.

    Recursively divide the array until all the cores are used (assume static division).

    Merge the numbers.

    Naive Quicksort Analysis

    Case Study: Naive Quicksort

    [Figure: energy as a function of the number of cores and input size.]

    No tradeoff: a single core is good enough.

    Parallel Quicksort

    Data to be sorted is distributed across the cores (assume parallel I/O).

    A single pivot is broadcast to all cores.

    Each core partitions its own data.

    Data is moved so that the lessers are at cores in one region, and the greaters are in another.

    Recursively quicksort each region.
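The steps above can be simulated sequentially in Python, with each core's data held in its own list. This is only a sketch: the pivot choice (first element of the first non-empty block), the even redistribution in `spread`, and both function names are assumptions, and no real message passing takes place.

```python
def parallel_quicksort(blocks):
    """Simulate the scheme on a list of per-core blocks."""
    if len(blocks) == 1:
        return [sorted(blocks[0])]  # one core left: sort locally
    nonempty = [b for b in blocks if b]
    if not nonempty:
        return blocks
    pivot = nonempty[0][0]  # a single pivot, "broadcast" to every core
    lessers, greaters = [], []
    for block in blocks:  # each core partitions its own data
        lessers.extend(x for x in block if x <= pivot)
        greaters.extend(x for x in block if x > pivot)
    # Move lessers to the lower region of cores and greaters to the upper one,
    # then recursively sort each region.
    half = len(blocks) // 2
    low = parallel_quicksort(spread(lessers, half))
    high = parallel_quicksort(spread(greaters, len(blocks) - half))
    return low + high

def spread(items, cores):
    """Deal items out to the given number of cores as evenly as possible."""
    return [items[i::cores] for i in range(cores)]

data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0, 11, 10]
result = parallel_quicksort(spread(data, 4))
print(result)
```

Reading the returned blocks from the first core to the last yields the data in sorted order; a poor pivot leaves some cores with little or no data, which is the load-imbalance risk of a single broadcast pivot.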

    Parallel Quicksort Analysis

    Parallel Quicksort Algorithm

    Comparing Quicksort Algorithms

    Recall: Parallel Quicksort has better scalability characteristics under performance iso-efficiency than Naive Quicksort (Vipin Kumar et al.).

    Both Quicksort algorithms have similarly bad energy scalability under iso-performance.

    LU Factorization

    Given an N x N matrix A, find a unit lower triangular matrix L and an upper triangular matrix U such that A = L U.

    Use the coarse-grain 1-D column parallel algorithm.
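For reference, a minimal sequential Python sketch of the factorization (without pivoting) is shown below. The comments indicate where a 1-D column distribution would split the work across cores; that mapping is an assumption of this sketch, not the algorithm as analyzed in the slides, and the function name is hypothetical.

```python
def lu_factorize(a):
    """Return (L, U) with L unit lower triangular and A = L U (no pivoting)."""
    n = len(a)
    u = [row[:] for row in a]  # working copy that becomes U
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        # In a 1-D column layout, the core owning column k would compute
        # these multipliers and send them to the other cores.
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]  # assumes a nonzero pivot
        # Each core would then update the columns j > k that it owns.
        for j in range(k + 1, n):
            for i in range(k + 1, n):
                u[i][j] -= l[i][k] * u[k][j]
        # Entries below the diagonal in column k are eliminated.
        for i in range(k + 1, n):
            u[i][k] = 0.0
    return l, u

L, U = lu_factorize([[2, 1, 1], [4, -6, 0], [-2, 7, 2]])
print(L)
print(U)
```

Multiplying the returned factors reproduces the input matrix; a production version would add pivoting for numerical stability.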

    LU Factorization Analysis

    Case Study: LU Factorization

    Related Work

    Hardware simulation-based technique (J. Li and J.F. Martinez)

    Runtime adaptation technique (online).

    Goal: find the appropriate frequency and number of cores for power-efficient execution.

    Search space: O(L M), where L is the number of available frequency levels and M is the number of cores.

    Prediction-based technique (Matthew et al.)

    Performance prediction model with low runtime overhead: dynamically adjust L and M.

    Statistically analyzes samples of hardware event rates (collected from performance monitors).

    Based on profiled data collected from real workloads.

    Conclusion and Future Work

    A theoretical methodology has been proposed to evaluate the energy-performance tradeoffs for parallel applications on multi-core architectures as a function of input size.

    We plan to analyze various genres of parallel algorithms for energy-performance tradeoffs.

    We also plan to build on this methodology to consider various memory structures for energy analysis of parallel applications.

    References

    [1] Vipin Kumar et al., Introduction to Parallel Computing.

    [2] J. Li and J.F. Martinez, Dynamic Power-Performance Adaptation of Parallel Computation on Chip Multiprocessor, 2006.

    [3] Matthew et al., Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores, 2008.