Post on 21-Dec-2015
High-level Power Simulation for DVS-aware Processors
Traineeship under supervision of: Prof. H. Corporaal, M.Sc. S.V. Gheorghita
by Hans Giesen
2 / 28 High-level Power Simulation for DVS-aware Processors
18-04-23
Overview
• Introduction
• Design
• Implementation
• Experiments
• Future Work
18-04-23 Introduction 4 / 28
DVS principle
• Energy depends quadratically on supply voltage: $E \propto N V^2$
• Clock frequency $f$ depends almost linearly on supply voltage $V$ (for $V$ above the threshold voltage $V_T$)
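As a rough worked example of the quadratic relation (a sketch, not taken from the slides): if the same number of cycles N runs at a lower supply voltage, energy shrinks with the square of the voltage ratio. The helper name `relative_energy` and the voltages 1.5 V and 1.0 V are hypothetical, chosen only for illustration.

```c
#include <stdio.h>

/* Ratio E_new / E_old for the same cycle count N, assuming that energy is
 * proportional to N * V^2. Hypothetical helper, not part of any tool. */
double relative_energy(double v_old, double v_new)
{
    double ratio = v_new / v_old;
    return ratio * ratio;
}

/* Prints the ratio for an illustrative voltage pair (1.5 V down to 1.0 V). */
void print_energy_example(void)
{
    printf("E_new / E_old = %.3f\n", relative_energy(1.5, 1.0));
}
```

Under these assumptions, dropping the voltage by a third leaves less than half of the original energy, which is why stretching execution towards the deadline at a lower voltage can pay off.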
18-04-23 Introduction 5 / 28
Real-time embedded systems
[Figure: two power-versus-time plots; labels: Deadline, Idle]
• Real-time systems have timing constraints
18-04-23 Introduction 6 / 28
Simulation toolset
Power simulator, DVS simulator
[Diagram labels: Cycles, Combine, Schedule, Remove, Deadline, Calculate, Max, Statistics, Sample]
18-04-23 Introduction 7 / 28
Simulation levels
• System-level analytical models
• Abstract performance simulation
• Instruction set simulation
• Cycle-accurate simulation
• HDL / RTL simulation
• Synthesis
Accuracy / detail runs from low (top of the list) to high (bottom); simulation speed runs from high (top) to low (bottom).
18-04-23 Design 9 / 28
XTREM power simulator
Input: ARM binary, architecture information
Tool: XTREM
Output: performance + energy trace
18-04-23 Design 10 / 28
Intel XScale architecture
• Based on ARM architecture
• Used in e.g. Intel PXA255 and PXA270 processors
18-04-23 Design 11 / 28
Simple tracefile example
Cycle PC IPC DEC BTB
0 02000100 0 6.0e-16
200000 02002164 0.34 0.034 0.024
400000 0200210C 0.36 0.040 0.028
600000 020021B4 0.37 0.040 0.028
800000 020002A0 0.38 0.040 0.028
1000000 020021C0 0.38 0.040 0.028
1200000 0200217C 0.38 0.040 0.028
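A trace line like the ones above could be read with a few lines of C. This is a minimal sketch that assumes whitespace-separated columns in the order Cycle, PC (hexadecimal), IPC, DEC, BTB; the struct and function names are hypothetical and are not XTREM code.

```c
#include <stdio.h>

/* One line of the trace; field names follow the column headers. */
struct trace_sample {
    unsigned long cycle;  /* value of the Cycle column */
    unsigned long pc;     /* value of the PC column, hexadecimal in the file */
    double ipc;           /* value of the IPC column */
    double dec;           /* value of the DEC column */
    double btb;           /* value of the BTB column */
};

/* Parses one whitespace-separated line. Returns the number of fields read
 * (5 for a complete line), so callers can detect short lines. */
int parse_trace_line(const char *line, struct trace_sample *out)
{
    return sscanf(line, "%lu %lx %lf %lf %lf",
                  &out->cycle, &out->pc, &out->ipc, &out->dec, &out->btb);
}
```

Returning the field count rather than a boolean lets a caller decide how to treat a line with fewer values, such as the first line of the example.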
18-04-23 Design 12 / 28
Problems for DVS simulation
• Trace is only valid for one combination of frequency and voltage
• Sample periods have fixed length
18-04-23 Design 13 / 28
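Adapting XTREM
Before adaptation: the code is divided into fixed-length sample periods; each sample $i$ records $P_i$ together with the same $N$.
After adaptation: sample periods are delimited by marks in the code; each sample $i$ records $f_i$, $V_i$, $N_i$ and $P_i$.
[Figure: code timeline with four sample periods, shown before and after adaptation]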
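One way to picture an adapted sample is as a small record from which duration and energy follow directly. The sketch below assumes that N is the number of clock cycles in the sample and P its average power; neither reading is stated on the slide, and all names are hypothetical.

```c
/* One variable-length sample, as recorded between two marks.
 * Assumption: N counts clock cycles and P is the average power in watts. */
struct dvs_sample {
    double f;  /* clock frequency in Hz */
    double V;  /* supply voltage in volts */
    double N;  /* number of cycles in the sample (assumed meaning) */
    double P;  /* average power in watts (assumed meaning) */
};

/* Duration of the sample in seconds: cycles divided by cycles per second. */
double sample_time(const struct dvs_sample *s)
{
    return s->N / s->f;
}

/* Energy of the sample in joules: average power times duration. */
double sample_energy(const struct dvs_sample *s)
{
    return s->P * sample_time(s);
}
```

For instance, 200000 cycles at 400 MHz last 0.5 ms, and at an average of 0.5 W that sample would consume 0.25 mJ.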
18-04-23 Design 14 / 28
DVS simulator
Input: architecture information, DVS algorithm, trace from XTREM
Output: performance + energy trace
18-04-23 Design 15 / 28
DVS simulator
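[Diagram: the code is split into three segments by two DVS blocks. Each segment $i$ has an input sample with $f_i$, $V_i$, $N_i$ and $P_i$, a selected $f_i$ and $V_i$, and a resulting time $t_i$ and energy $E_i$. Each DVS block between segments carries its own $t_a$ and $E_a$.]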
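The totals reported by such a simulator can be sketched as a simple accumulation. This example assumes that $t_a$ and $E_a$ are a fixed time and energy cost charged once between each pair of consecutive segments; that interpretation, and every name below, is an assumption made for illustration.

```c
#include <stddef.h>

/* Time (s) and energy (J) of one code segment. */
struct segment {
    double t;
    double E;
};

/* Accumulated time (s) and energy (J). */
struct totals {
    double t;
    double E;
};

/* Sums all segments and adds the assumed transition cost (t_a, E_a) once
 * between each pair of consecutive segments. */
struct totals accumulate(const struct segment *seg, size_t count,
                         double t_a, double E_a)
{
    struct totals sum = {0.0, 0.0};
    for (size_t i = 0; i < count; i++) {
        sum.t += seg[i].t;
        sum.E += seg[i].E;
        if (i + 1 < count) {   /* a transition follows every segment but the last */
            sum.t += t_a;
            sum.E += E_a;
        }
    }
    return sum;
}
```

With three segments, the transition cost is counted twice, matching the two DVS blocks in the diagram.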
18-04-23 Implementation 17 / 28
Deriving power formulas
Before (result has the form cV):
double senseamp_power(int cols)
{
    return((double) cols * Vdd / 8 * .5e-3);
}
After (result is the constant c):
double senseamp_power(int cols)
{
    return((double) cols / 8 * .5e-3);
}
18-04-23 Implementation 18 / 28
Deriving power formulas
Before:
power->btb_datapower =
    ram_decoder_power(logtwo(rowsb), 2)                                        /* cfV2 */
    + ram_wordline_power(rowsb, colsb, 1, 1, CACHE)                            /* cfV2 */
    + BTB_DATA_BITLINE_AF * ram_bitline_power(rowsb, colsb, 1, 1, CACHE)       /* cfV2 */
    + senseamp_power(colsb);                                                   /* cV */
After:
power->btb_datapower_fV2 =
    ram_decoder_power(logtwo(rowsb), 2)
    + ram_wordline_power(rowsb, colsb, 1, 1, CACHE)
    + BTB_DATA_BITLINE_AF * ram_bitline_power(rowsb, colsb, 1, 1, CACHE);
power->btb_datapower_V = senseamp_power(colsb);
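Once a unit's power is split into a coefficient for the $f V^2$ part and a coefficient for the $V$ part, it can be evaluated for any operating point. The sketch below illustrates the idea with hypothetical names and values; it is not the XTREM code.

```c
/* Frequency- and voltage-independent coefficients of one functional unit. */
struct power_coeff {
    double c_fV2;  /* coefficient of the f * V^2 term */
    double c_V;    /* coefficient of the V term */
};

/* Power of the unit at clock frequency f (Hz) and supply voltage V (volts). */
double unit_power(struct power_coeff c, double f, double V)
{
    return c.c_fV2 * f * V * V + c.c_V * V;
}
```

Keeping the coefficients separate means a single simulation run yields values that can be re-evaluated for every frequency and voltage pair a DVS algorithm might select.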
18-04-23 Implementation 19 / 28
Power formulas
[Table: functional unit and format of its power formula. Each formula combines constant coefficients $c_i$ with $f$ and $V$, in terms such as $c_1 f V^2$ and $c_2 V$.]
Functional units:
• Instruction decoder
• Branch target buffer
• Fill buffer
• Write buffer
• Pend buffer
• Register file
• Instruction cache
• Data cache
• Arithmetic-logic unit
• Shift unit
• Multiplier-accumulator unit
• Internal memory bus
• Memory control unit
• Clock
18-04-23 Implementation 20 / 28
Example with DVS marks
#include <stdio.h>
#include "DVS.h"

int main()
{
    /* %g instead of %u: the deadline 0.026 is a floating-point value */
    DVS("Deadline=%g RWEC=%u", 0.026, 3425256);
    puts("This is a piece of code");
    DVS("RWEC=982428");
    puts("This is another piece of code");
    DVS("%s=%u", "Deadline", 0);
    return 0;
}
18-04-23 Implementation 21 / 28
DVS marks
C source of simulated program
  → (call) void DVS(const char *iFormat, ...)
  → (call) int syscall(int number, ...)
  → (swi instruction) XTREM system call interface (syscall.c)
  → (DVS_parameters variable) XTREM tracefile output (xtrem.c)
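The program-side half of this chain can be sketched as a printf-style wrapper. In the sketch below the hand-off to the simulator is replaced by a plain buffer so that it runs on a host machine; the buffer name `last_dvs_parameters` and its size are hypothetical, and the real implementation passes the string on through `syscall()` and a swi instruction.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for the simulator side: the last parameter string received.
 * Hypothetical; it replaces the syscall/swi hand-off for illustration. */
static char last_dvs_parameters[256];

/* printf-style DVS mark: formats its arguments into one parameter string.
 * Output longer than the buffer is truncated by vsnprintf. */
void DVS(const char *iFormat, ...)
{
    va_list args;
    va_start(args, iFormat);
    vsnprintf(last_dvs_parameters, sizeof last_dvs_parameters, iFormat, args);
    va_end(args);
}
```

Because the arguments are interpreted by the format string, each one must match its conversion specifier, for example an unsigned integer for `%u`.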
18-04-23 Experiments 23 / 28
Experiments
• Comparison of total values of original XTREM and adapted XTREM
• Simulation of MP3 decoder
  – 20 DVS marks
  – 3 DVS algorithms
  – 4 test files
18-04-23 Experiments 24 / 28
DVS algorithms
• Constant algorithm
  – Similar to using no DVS
• Worst Case Execution Path (WCEP) algorithm
  – At each DVS mark, the lowest f and V are calculated for which the deadline is still met in all cases
• Oracle algorithm
  – f and V are calculated using the execution path, which must be known in advance
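The WCEP idea can be sketched as picking the lowest available frequency that still completes the remaining worst-case work before the deadline. This is a minimal sketch under several assumptions: that RWEC in the DVS marks means remaining worst-case execution cycles, that the deadline is given in seconds, and that the processor offers a discrete list of frequency levels. The function name and the levels are hypothetical.

```c
#include <stddef.h>

/* Returns the lowest frequency in levels[] (sorted ascending, in Hz) that
 * still finishes rwec cycles within time_left seconds (time_left > 0).
 * If no level is fast enough, the highest level is returned. */
double lowest_safe_frequency(const double *levels, size_t count,
                             double rwec, double time_left)
{
    double required = rwec / time_left;  /* cycles per second needed */
    for (size_t i = 0; i < count; i++) {
        if (levels[i] >= required)
            return levels[i];
    }
    return levels[count - 1];
}
```

With levels of 100, 200, 300 and 400 MHz, 3425256 remaining cycles and 0.026 s left require about 132 MHz, so the 200 MHz level would be chosen. An oracle variant would pass the cycle count of the actual path instead of the worst case.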
18-04-23 Experiments 25 / 28
Frequency graph
[Figure: frequency (Hz, axis from 0 to 5.00E+08) versus time (s, axis from 0 to 0.07) for the Constant, WCEP and Oracle algorithms, with two deadlines marked]
18-04-23 Experiments 26 / 28
Power graph
[Figure: power (W, axis from 0 to 5) versus time (s, axis from 0 to 0.07) for the Constant, WCEP and Oracle algorithms, with two deadlines marked]
18-04-23 Experiments 27 / 28
Total energy consumption
Algorithm   Energy (J)   Energy saved (%)
Constant    345.5        0.0
WCEP        334.3        3.2
Oracle      310.9        10.0
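The "Energy saved" column follows from the energy column when the Constant algorithm is taken as the baseline. A small sketch of that arithmetic, with a hypothetical function name:

```c
/* Percentage of energy saved relative to a baseline, both in joules. */
double energy_saved_percent(double baseline_joules, double joules)
{
    return (baseline_joules - joules) / baseline_joules * 100.0;
}
```

For WCEP this gives (345.5 - 334.3) / 345.5, about 3.2 %, and for Oracle (345.5 - 310.9) / 345.5, about 10.0 %, in agreement with the table.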
18-04-23 Experiments 28 / 28
Power distribution
DEC 14.6%
BTB 5.5%
MM 28.6%
CLK 15.4%
I$ 8.6%
MEM 10.3%
D$ 6.3%
FB 0.0%
PB 0.0%
WB 1.9%
SHF 0.0%
MAC 0.0%
REG 2.7%
ALU 5.9%
18-04-23 Future Work 30 / 28
Future Work
• Other DVS algorithms
• Add more variables to power formulas
• Improve accuracy of power simulator