energy scalability under iso performance analysis of parallel algorithms
TRANSCRIPT
-
8/9/2019 Energy Scalability Under Iso Performance Analysis of Parallel Algorithms
1/35
Analysis of Parallel Algorithms for Energy Conservation in Scalable Multicore Architectures
Vijay Anand Reddy and Gul Agha
University of Illinois
-
Overview
Motivation
Problem Definition & Assumptions
Methodology & A case study
Related Work & Conclusion
-
Energy and Multi-core
Computers account for 2% of the energy consumed in the US.
Efficiency = Performance/Watt
We want to optimize efficiency: low-power processors are typically more efficient.
The frequency at which cores run can be varied to balance performance and energy consumption.
-
Parallel Programming
Parallel programming involves
1. Dividing computation into autonomous actors
2. Specifying interaction (shared memory or message passing) between them.
-
Parallel Performance
How many actors may execute at the same time: Concurrency Index
The number of available cores
The speed at which they execute
How much and when they need to communicate: Communication overhead
Network congestion at memory affects performance
Performance depends on both the parallel application and the parallel architecture.
-
Scalable Multicore Architectures
We are interested in (energy) efficiency as the number of cores is scaled up
Can multicore architectures be scaled up?
-
Performance vs. Number of Cores
Taken from IEEE Spectrum magazine (Sandia Research Labs)
Increasing cores may not benefit parallel programming applications if shared memory is maintained.
-
Message Passing, Performance and Energy Consumption
Parallel programming involves message passing between actors.
Increasing the number of cores:
Leads to an increase in the number of messages communicated between them.
Increasing cores may reduce performance.
May lead to increased energy consumption.
Depends on the parallel application and architectural parameters.
-
Energy versus Performance
For a fixed performance target, increasing cores may decrease the energy consumed for computation:
Cores can be run at lower frequency
But increasing cores will also increase the energy consumed for communication.
Question: what is the trade-off?
Depends on the parallel application.
Depends on the network architecture.
Depends on the memory structure at each core.
-
Energy Scalability under Iso-Performance
Given a parallel algorithm, an architecture model and the performance measure, what is the appropriate number of cores required for minimum energy consumption as a function of input size?
Important for response time in interactive applications.
-
Simplifying Architectural Assumptions
All cores operate at the same speed.
Speed of cores can be varied by frequency scaling.
Computation time of the cores can be scaled (by controlling the speed), but not communication time between cores.
Communication time between cores is constant.
No memory hierarchy at the cores.
-
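Energy Model
Energy: $E = E_c \cdot T \cdot X^3$
where $E_c$ is a hardware constant, $X$ is the frequency of the processor, and $T$ is the running time.
Running time: $T = (\text{number of cycles}) \cdot (1/X)$
Substituting $T$ gives $E = E_c \cdot (\text{number of cycles}) \cdot X^2$.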
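The model can be sketched in a few lines of Python, assuming the energy expression is read as E = E_c * T * X^3 with T = cycles / X. The function names and the default `e_c=1.0` are illustrative, not from the slides. The sketch shows why frequency scaling pays off: halving the frequency doubles the running time but cuts the energy to a quarter.

```python
def running_time(cycles, freq):
    """T = (number of cycles) * (1 / X)."""
    return cycles / freq


def compute_energy(cycles, freq, e_c=1.0):
    """E = E_c * T * X^3, which reduces to E_c * cycles * X^2.

    e_c is the hardware constant; 1.0 is an arbitrary illustrative value.
    """
    return e_c * running_time(cycles, freq) * freq ** 3


# Same work at full and at half frequency (arbitrary units).
full = compute_energy(cycles=1000, freq=2.0)
half = compute_energy(cycles=1000, freq=1.0)
```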
-
Constants
$E_m$: Energy consumed per message.
$F$: Maximum frequency of a core.
$N$: Input size.
$M$: Number of cores.
$K_c$: Number of cycles at maximum frequency for a single message communication.
$P_{idle}$: Static power consumed per unit of time.
-
Case Study: Adding N Numbers
Example: N numbers, 4 actors
[Figure: actors 1 to 4 each perform N/4 additions, followed by a communication period]
In the end, actor 1 stores the sum of all N numbers
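A small Python simulation of this case study, assuming the partial sums are combined in a binary tree (the slides do not spell out the pattern, but a tree is consistent with the log(M) and M - 1 terms used in the methodology). The function name is illustrative, and actors are indexed from 0, so index 0 plays the role of actor 1.

```python
def parallel_sum(numbers, num_actors):
    """Return (total, messages sent, communication rounds)."""
    chunk = len(numbers) // num_actors
    partial = []
    for i in range(num_actors):
        # The last actor also takes any leftover elements.
        end = (i + 1) * chunk if i < num_actors - 1 else len(numbers)
        partial.append(sum(numbers[i * chunk:end]))  # local additions

    messages = 0
    rounds = 0
    stride = 1
    while stride < num_actors:
        # In each round, every second surviving actor sends its partial sum.
        for receiver in range(0, num_actors, 2 * stride):
            sender = receiver + stride
            if sender < num_actors:
                partial[receiver] += partial[sender]
                messages += 1
        stride *= 2
        rounds += 1
    return partial[0], messages, rounds


result = parallel_sum(list(range(1, 101)), 4)  # (5050, 3, 2)
```

With 4 actors the sketch sends 3 messages in 2 rounds, which is where the communication part of the critical path comes from.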
-
Methodology
Step 1: Evaluate the critical path of the parallel algorithm
-
Methodology
Step 2: Partition the critical path based on communication and computation steps.
[Figure: critical path across actors 1 to 4, with computation and communication steps marked]
-
Methodology
Step 3: Scale computation steps so that the parallel performance matches the sequential performance.
$F' = F \cdot \dfrac{(N/M - 1 + \log(M)) \cdot \beta}{N \cdot \beta - \log(M) \cdot K_c}$
where $F'$ is the scaled core frequency and $\beta$ is the number of cycles per addition
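The scaling in Step 3 can be checked numerically. The sketch below assumes the formula reads F' = F (N/M - 1 + log M) beta / (N beta - log M K_c), takes log(M) as base 2, and uses illustrative default values for `beta` and `k_c`. The function name is hypothetical.

```python
import math


def scaled_frequency(n, m, f_max, beta=1.0, k_c=5.0):
    """Frequency at which m cores match the sequential running time.

    n: input size, m: number of cores, f_max: maximum frequency F,
    beta: cycles per addition, k_c: cycles (at F) per message.
    """
    levels = math.log2(m)  # assumption: log(M) is base 2
    parallel_cycles = (n / m - 1 + levels) * beta
    # Cycles left for computation once communication time is subtracted.
    budget_cycles = n * beta - levels * k_c
    if budget_cycles <= 0:
        raise ValueError("communication alone exceeds the time budget")
    return f_max * parallel_cycles / budget_cycles


f_prime = scaled_frequency(n=1024, m=4, f_max=1.0)  # 257 / 1014 of F
```

More cores leave each core fewer additions, so the cores can run well below the maximum frequency while keeping the same overall running time.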
-
Methodology
Step 4: Evaluate the number of messages sent in the parallel algorithm
$M - 1$ messages
-
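Methodology
Step 5: Frame an equation for the energy consumption of the parallel application
Energy for communication: $E_{comm} = E_m \cdot (M - 1)$
Energy for computation: $E_{comp} = E_c \cdot (N - 1) \cdot \beta \cdot F'^2$, where $F'$ is the scaled core frequency and $\beta$ is the number of cycles per addition
Energy for idle computation (static power).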
-
Methodology
Step 6: Analyze the equation to obtain the appropriate number of cores required for minimum energy consumption as a function of input size.
Differentiate w.r.t. the number of cores.
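As a rough numerical counterpart to Step 6, the sketch below evaluates communication plus computation energy for the N-number sum and picks the best core count by brute-force search rather than by differentiation. It is a sketch under stated assumptions: idle (static) energy is left out because its formula is not given in the slides, log(M) is taken as base 2, energy is expressed in units of E_c F^2 so that only the ratio k = E_m / (E_c F^2) matters, and the function names are illustrative.

```python
import math


def total_energy(n, m, k=500.0, beta=1.0, k_c=5.0):
    """Energy of the N-number sum on m cores, in units of E_c * F^2.

    Idle (static) energy is omitted; k = E_m / (E_c * F^2).
    """
    levels = math.log2(m)  # assumption: log(M) is base 2
    # Step 3: ratio F'/F that keeps the parallel run as fast as the sequential one.
    ratio = (n / m - 1 + levels) * beta / (n * beta - levels * k_c)
    e_comp = (n - 1) * beta * ratio ** 2  # E_c (N - 1) beta F'^2, normalised
    e_comm = k * (m - 1)                  # E_m (M - 1), normalised
    return e_comp + e_comm


def best_core_count(n, max_cores, **params):
    """Brute-force search over 1..max_cores for the minimum-energy core count."""
    return min(range(1, max_cores + 1),
               key=lambda m: total_energy(n, m, **params))


cores = best_core_count(10**6, 200)
```

Because the static-power term is missing, the numbers from this sketch are not expected to match the plotted results exactly. Passing a larger `k` (more expensive messages) yields an equal or smaller optimal core count.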
-
Plot: Energy-N-M
Parameters: $\beta = 1$ (cycles per addition), $K_c = 5$ units, $E_m / (E_c F^2) = 500$, $P_s / F = 1$
70 cores at $N = 10^8$
270 cores at $N = 10^{10}$
-
Sensitivity Analysis ($k = E_m / (E_c F^2)$)
As k increases, the optimal number of cores decreases.
-
Naïve Quicksort
Assume the input array is on a single core.
A single core partitions an array and sends part of it to another core.
Recursively divide the array until all the cores are used (assume static division).
Merge the numbers.
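The steps above can be simulated sequentially as follows. This is only a sketch: the pivot choice (first element), the even split of cores between the two halves, and the way transferred elements are counted (once on the way out, once on the way back for the merge) are assumptions made for illustration.

```python
def naive_quicksort(data, cores):
    """Return (sorted list, number of elements moved between cores)."""
    if cores <= 1 or len(data) <= 1:
        return sorted(data), 0  # a single core sorts what it holds
    pivot = data[0]
    low = [x for x in data[1:] if x < pivot]
    high = [x for x in data[1:] if x >= pivot]
    # Static division: half the cores keep `low`, the rest receive `high`.
    left, sent_left = naive_quicksort(low, cores // 2)
    right, sent_right = naive_quicksort(high, cores - cores // 2)
    # `high` is sent to another core and returned for the final merge.
    sent = 2 * len(high) + sent_left + sent_right
    return left + [pivot] + right, sent


ordered, moved = naive_quicksort([5, 3, 8, 1, 9, 2, 7], cores=4)
```

The first partition still runs on one core over the whole array, which limits how much the cores can be slowed down.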
-
Naïve Quicksort Analysis
-
Case Study: Naïve Quicksort
[Plot: energy versus number of cores and input size]
No tradeoff: a single core is good enough
-
Parallel Quicksort
Data to be sorted is distributed across the cores (assume parallel I/O).
A single pivot is broadcast to all cores.
Each core partitions its own data.
Data is moved so that the lessers are at cores in one region, and the greaters are in another.
Recursively quicksort each region.
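A sequential Python simulation of these steps, where `blocks[i]` stands for the data held by core i. The pivot choice (first available element), the even split of cores into two regions, and the round-robin redistribution in `spread` are assumptions for illustration; the slides do not specify them.

```python
def spread(items, parts):
    """Deal items round-robin onto `parts` cores."""
    return [items[i::parts] for i in range(parts)]


def parallel_quicksort(blocks):
    """Return per-core blocks whose concatenation is globally sorted."""
    if len(blocks) == 1:
        return [sorted(blocks[0])]  # one core left: sort locally
    flat = [x for block in blocks for x in block]
    pivot = flat[0] if flat else None  # the single pivot broadcast to all cores
    # Each core partitions its own block against the pivot.
    lessers = [x for block in blocks for x in block if x < pivot]
    greaters = [x for block in blocks for x in block if x >= pivot]
    # Move lessers to the lower region of cores and greaters to the upper one.
    half = len(blocks) // 2
    low = parallel_quicksort(spread(lessers, half))
    high = parallel_quicksort(spread(greaters, len(blocks) - half))
    return low + high


sorted_blocks = parallel_quicksort([[9, 4], [1, 7], [8, 2], [5, 3]])
```

A poor pivot leaves some cores with little or no data, so the load balance depends heavily on the pivot choice.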
-
Parallel Quicksort Analysis
-
Parallel Quicksort Algorithm
-
Comparing Quicksort Algorithms
Recall: Parallel Quicksort has better scalability characteristics under performance iso-efficiency than Naïve Quicksort (Vipin Kumar et al.).
Both Quicksort algorithms have similarly bad energy scalability under iso-performance.
-
LU Factorization
Given an N x N matrix A, find a unit lower triangular matrix L and an upper triangular matrix U such that A = L U
Use the coarse-grain 1-D column parallel algorithm
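For reference, a sequential Python sketch of the factorization itself (Doolittle elimination without pivoting, so every pivot is assumed to be nonzero). It is not the coarse-grain 1-D column algorithm; the comments only mark the column loop that a column-wise distribution across cores would split. The function name is illustrative.

```python
def lu_factor(a):
    """Return (L, U) with L unit lower triangular and U upper triangular."""
    n = len(a)
    u = [row[:] for row in a]  # U starts as a copy of A
    l = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for k in range(n):
        # Multipliers come from column k (held by one core in a 1-D column layout).
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]
        # Each column j is updated independently of the others,
        # so the columns can be shared out among the cores.
        for j in range(k, n):
            for i in range(k + 1, n):
                u[i][j] -= l[i][k] * u[k][j]
    return l, u


lower, upper = lu_factor([[2, 1, 1], [4, 3, 3], [8, 7, 9]])
```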
-
LU Factorization Analysis
-
Case Study : LU Factorization
-
Related Work
Hardware Simulation Based Technique (J. Li and J.F. Martinez)
Runtime adaptation technique (online)
Goal: Find the appropriate frequency and number of cores for power efficient execution.
Search space: $O(L \times M)$, where L is the number of available frequency levels and M is the number of cores.
Prediction Based Technique (Matthew et al.)
Performance prediction model with low runtime overhead: dynamically adjust L and M.
Statistically analyzes samples of hardware event rates (collected from performance monitors).
Based on profiled data collected from real workloads
-
Conclusion and Future Work
A theoretical methodology has been proposed to evaluate the energy-performance trade-offs for parallel applications on multi-core architectures as a function of input size.
We plan to analyze various genres of parallel algorithms for energy-performance trade-offs.
We also plan to build on this methodology to consider various memory structures for energy analysis of parallel applications.
-
References
[1] Introduction to Parallel Computing, Vipin Kumar et al.
[2] Dynamic Power-Performance Adaptation of Parallel Computation on Chip Multiprocessors, J. Li and J.F. Martinez, 2006
[3] Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores, Matthew et al., 2008