A Fine-grained Component-level Power Measurement Method
Zehan Cui, Yan Zhu, Yungang Bao, Mingyu Chen
Institute of Computing Technology, Chinese Academy of Sciences
July 28, 2011
Motivation
Design & Implementation
Experiments
Conclusion & Work in Progress
Outline
Watts/Server
[Source: The Problem of Power Consumption in Servers, Intel, 2009]
The CPU no longer dominates system power.
Background
[Source: Barroso et al., The Datacenter as a Computer, 2009]
Measurement is the basis.
Motivation
[Diagram labels: low power (hardware, software), model, measurement]
Component-Level: ATX-based method
Existing Measurement Method
accuracy
Directly powered through ATX wires.
Modern motherboards mostly have dedicated ATX wires for the processor. VRM (Voltage Regulation Module) loss.
Usually deduced from multiple ATX wires. Platform dependent.
[Figure labels: CPU, disk, power supply]
Disk & CPU
◦ Similar to other ATX-based methods
Memory & Add-in Card Devices
◦ Wrapper-based methods
Advantages
◦ Accurate: direct measurement
◦ Easy to use: no deduction needed
◦ Portable: multi-platform
Our Solution: A Hybrid Way
[Figure labels: wrapper, memory, current sensor]
Prototype
◦ Disk power
◦ CPU power
◦ Memory power
Implementation
Component | Count | Description
Wrapper Card | 1 | Memory power measurement. Supports DDR2-400 DIMM.
Intermediate Card | 1 | 8 channels. A channel is capable of converting one current into voltages.
DMM | 2 | Agilent 34411A. One channel each. Max speed: 50K samples per second. LAN interface.
Collector | 1 | PC. Collects data from the DMMs.
Component | Detail
CPU | Intel Core2 Duo E4500. # of cores: 2. Clock speed: 2.2GHz. L2 cache: 2MB. FSB speed: 800MHz.
Memory | DDR2-400 2GB UDIMM. Frequency: 200MHz. Max bandwidth: 3.2GB/s.
Disk | 640GB SATA
Experimental Setup
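The 3.2GB/s maximum bandwidth listed for the DDR2-400 DIMM can be checked with a short calculation. The sketch below assumes a double-data-rate bus (two transfers per clock) and a 64-bit (8-byte) data bus on a single channel; the bus width is not stated in the slides, and the function name is only illustrative.

```python
def peak_bandwidth_gb_s(clock_mhz: float, transfers_per_clock: int = 2,
                        bus_width_bytes: int = 8) -> float:
    """Peak DIMM bandwidth in GB/s, with 1 GB taken as 10**9 bytes.

    bus_width_bytes=8 is an assumption (standard 64-bit DIMM data bus).
    """
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bus_width_bytes / 1e9


# DDR2-400: 200 MHz clock, two transfers per clock, 8 bytes per transfer.
print(peak_bandwidth_gb_s(200))
```

With a 200MHz clock this gives 400 million transfers per second times 8 bytes, which matches the 3.2GB/s in the setup table.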
401.bzip2 from SPEC CPU2006
An Example
[Figure: power of components (unit: Watt, 0 to 50) versus time from beginning (unit: Second, 0 to 70); series: Memory, Disk, CPU]
The more frequently we measure power, the more detail we get.
Time Granularity
Observation: 5,000 samples/s is an appropriate sampling frequency at the component level.
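One way to see why the sampling frequency matters is to take a trace sampled at 5,000 samples/s and average consecutive samples to emulate a slower meter. The sketch below uses a synthetic trace with made-up numbers (a 10 W baseline and a 1 ms burst at 30 W); it is not data from the slides, and `downsample` is a hypothetical helper.

```python
def downsample(samples, factor):
    """Average each run of `factor` consecutive samples into one value,
    emulating a meter that samples `factor` times more slowly."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(samples) - len(samples) % factor  # drop a partial last window
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]


# Synthetic 1-second trace at 5,000 samples/s (made-up numbers):
# a 10 W baseline with a 1 ms burst (5 samples) at 30 W.
trace = [10.0] * 5000
for i in range(1000, 1005):
    trace[i] = 30.0

for rate, factor in [(5000, 1), (500, 10), (50, 100)]:
    peak = max(downsample(trace, factor))
    print(f"{rate:>5} samples/s: observed peak = {peak:.1f} W")
```

The average power is the same at every rate, but the short burst is progressively flattened as the effective sampling rate drops.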
Graph BFS (Breadth-First Search)
Higher bandwidth, but lower power
Lower bandwidth, higher power
Microbenchmark
Time: 6.5 times longer
Power: slightly lower
Energy: 5.9 times higher
Malloc 512MB
Access in different strides
Two causes
◦ Row conflicts: increase the row buffer hit rate
◦ Many TLB misses: large pages may be more efficient
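The three ratios on this slide are linked by energy = average power × time, so the power ratio can be recovered from the other two. The sketch below assumes "power" on the slide means average power over the run; the function name is illustrative.

```python
def average_power_ratio(energy_ratio: float, time_ratio: float) -> float:
    """Since E = P_avg * t, the ratio of average powers is
    the energy ratio divided by the time ratio."""
    return energy_ratio / time_ratio


ratio = average_power_ratio(energy_ratio=5.9, time_ratio=6.5)
print(f"average power ratio: {ratio:.3f}")
```

A ratio of about 0.91 is consistent with the slide's "slightly lower" power: the slow case draws roughly 9% less power but runs 6.5 times longer, so it costs far more energy.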
What is the relationship between performance and power?
64MB memory
◦ Random vs. sequential
Jump at least 64B to eliminate cache hits
Large pages (2MB) to eliminate TLB misses
Load/Store_Unit % = LSU_stall_time / CPU_Cycle
Random vs. Sequential
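The two access patterns compared here can be sketched as lists of byte offsets into the buffer. This is only an illustration of the patterns: a real microbenchmark would be written in a lower-level language with large pages enabled, since Python cannot control cache or TLB behavior. The function name, the seed parameter, and the small buffer size are assumptions for the example.

```python
import random


def access_offsets(buffer_bytes: int, stride_bytes: int = 64,
                   sequential: bool = True, seed: int = 0):
    """Return byte offsets that touch each `stride_bytes`-sized block once.

    A 64-byte stride means consecutive accesses never fall in the same
    64-byte block, which is how the slide avoids cache hits.
    """
    offsets = list(range(0, buffer_bytes, stride_bytes))
    if not sequential:
        # Same set of offsets, visited in a seeded pseudo-random order.
        random.Random(seed).shuffle(offsets)
    return offsets


# Small buffer for illustration; the slides use 64MB.
seq = access_offsets(4096, sequential=True)
rnd = access_offsets(4096, sequential=False, seed=1)
print(seq[:4])
print(rnd[:4])
```

Both patterns touch the same blocks the same number of times, so any difference in measured power comes from the access order rather than the amount of data.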
Observation: It seems that DRAM power is already proportional to bandwidth. But the fact is that …
Use different seeds to generate different random access patterns.
Power varies by less than 1.1%.
Random Access
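The slides do not say how the 1.1% variation was computed. One plausible definition, shown below purely as an assumption, is the spread between the highest and lowest average power across seeds, relative to the mean. The power values are hypothetical placeholders, not measurements.

```python
def relative_spread(values):
    """(max - min) / mean, as a fraction.

    This is one possible reading of 'varies'; the slides do not define it.
    """
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


# Hypothetical average DRAM power (W) for five seeds; not measured data.
power_by_seed = [4.00, 4.02, 3.99, 4.01, 4.03]
print(f"{relative_spread(power_by_seed):.2%}")
```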
Observation: DRAM power is highly correlated with two factors
• Load/Store Unit utilization
• Sequential vs. random access
We can build memory power models based on these two factors rather than on bandwidth.
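A minimal sketch of such a model, assuming power is linear in Load/Store Unit utilization with a separate line for each access pattern. The slides do not give the model's form, so the linear shape, the function names, and all data points below are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_power_model(samples):
    """samples: iterable of (pattern, lsu_utilization, power_watts).
    Returns {pattern: (intercept, slope)}, one line per access pattern."""
    by_pattern = {}
    for pattern, util, power in samples:
        xs, ys = by_pattern.setdefault(pattern, ([], []))
        xs.append(util)
        ys.append(power)
    return {p: fit_line(xs, ys) for p, (xs, ys) in by_pattern.items()}


def predict(model, pattern, util):
    """Predicted power (W) for an access pattern at a given LSU utilization."""
    intercept, slope = model[pattern]
    return intercept + slope * util


# Made-up points for illustration only; not data from the slides.
samples = [
    ("sequential", 0.2, 2.4), ("sequential", 0.5, 3.0), ("sequential", 0.8, 3.6),
    ("random", 0.2, 2.8), ("random", 0.5, 4.0), ("random", 0.8, 5.2),
]
model = fit_power_model(samples)
print(predict(model, "random", 0.6))
```

Keeping one fitted line per access pattern reflects the observation that the pattern itself is a factor, so a single bandwidth-only line would not capture both cases.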
We use a hybrid approach
◦ ATX-based: CPU/Disk
◦ Wrapper card: DRAM/…
5 kHz is an appropriate sampling frequency to disclose fine-grained power behavior.
DRAM power is highly correlated with Load/Store Unit utilization, rather than bandwidth.
Takeaway Messages (Conclusions)
Upgrade current system
◦ Support DDR3
◦ Support large memory capacity
◦ Support 40 simultaneous measuring channels
Use FPGA to collect measured data
Correlate the measured power data with high-level semantic information
Work in progress
Thanks!
Questions?
Backup
A wrapper card already existed.
We made only a few small modifications.
Wrapper Card Design
Current Sensor
Power Supply Signals
Memory Capacity Limitation
DIMM: Dual In-line Memory Module
[Figure: normal configuration versus our initial wrapper card; labels: DIMM slot, motherboard, DIMM, wrapper card]
Inside a DRAM Device
[Diagram labels: Bank 0, Row Decoder, Column Decoder, Sense Amps, ODT, Receivers, Drivers, Registers, Write FIFO]
Banks
• Independent arrays
• Asynchronous: independent of memory bus speed
I/O Circuitry
• Runs at bus speed
• Clock sync/distribution
• Bus drivers and receivers
• Buffering/queueing
On-Die Termination
• Required by bus electrical characteristics for reliable operation
• Resistive element that dissipates power when bus is active
[Source: H. David et al., Memory Power Management via Dynamic Voltage/Frequency Scaling, ICAC, 2011]
Can be approximately divided into
◦ Background power: considered to be stable
◦ Bank power: activate/precharge; related to the frequency of row operations
◦ I/O power: bursts; proportional to bandwidth
◦ Termination power: termination resistors; proportional to bandwidth
DRAM power
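The four-part breakdown can be written as a small additive model: a constant background term, a bank term that scales with the rate of row operations, and I/O and termination terms that scale with bandwidth. The sketch below is only an illustration; the class name and every coefficient are hypothetical placeholders, not values from the slides or from any datasheet.

```python
from dataclasses import dataclass


@dataclass
class DramPowerModel:
    # All coefficients are hypothetical placeholders, not measured values.
    background_w: float            # treated as constant
    bank_j_per_row_op: float       # energy per activate/precharge pair
    io_j_per_byte: float           # I/O burst energy per byte transferred
    termination_j_per_byte: float  # termination energy per byte transferred

    def components(self, row_ops_per_s: float, bytes_per_s: float) -> dict:
        """Power of each component in watts."""
        return {
            "background": self.background_w,
            "bank": self.bank_j_per_row_op * row_ops_per_s,
            "io": self.io_j_per_byte * bytes_per_s,
            "termination": self.termination_j_per_byte * bytes_per_s,
        }

    def total(self, row_ops_per_s: float, bytes_per_s: float) -> float:
        return sum(self.components(row_ops_per_s, bytes_per_s).values())


model = DramPowerModel(background_w=1.0, bank_j_per_row_op=2e-8,
                       io_j_per_byte=2e-10, termination_j_per_byte=1e-10)

# Same bandwidth (1 GB/s), different row-operation rates.
few_row_ops = model.total(row_ops_per_s=1e6, bytes_per_s=1e9)
many_row_ops = model.total(row_ops_per_s=1.5e7, bytes_per_s=1e9)
print(few_row_ops, many_row_ops)
```

Under this model, two workloads with identical bandwidth can still differ in power when one of them triggers more row operations, because only the I/O and termination terms track bandwidth.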
Current Sensor
P = U * I
[Diagram labels: DC current, CSA (Current-Sense Amplifier), DC voltage, ADC or DMM, data, collector (PC)]
DC voltage: doesn't fluctuate much, less than 2% on our platform.
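The P = U * I step can be sketched as follows, assuming the current-sense amplifier outputs a voltage proportional to the current through a shunt resistor (V_out = I × R_shunt × gain). The shunt resistance, gain, and sample readings below are hypothetical; the slides do not give the sensor's parameters. The 1.8 V supply is the nominal DDR2 voltage.

```python
def current_from_csa(v_out: float, shunt_ohms: float, gain: float) -> float:
    """Invert V_out = I * R_shunt * gain for a current-sense amplifier."""
    return v_out / (shunt_ohms * gain)


def power_trace(supply_volts, csa_volts, shunt_ohms: float, gain: float):
    """Per-sample power P = U * I from paired voltage readings."""
    return [u * current_from_csa(v, shunt_ohms, gain)
            for u, v in zip(supply_volts, csa_volts)]


# Hypothetical readings: 1.8 V supply, 0.01 ohm shunt, gain of 50.
supply = [1.8, 1.8, 1.8]
csa_out = [1.0, 1.5, 0.5]  # volts at the amplifier output
print(power_trace(supply, csa_out, shunt_ohms=0.01, gain=50))
```

Because the supply voltage varies by less than 2% on the authors' platform, treating U as a constant and sampling only the amplifier output would introduce a correspondingly small error.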
Possible reason for the non-proportionality of random-access power in slide 17:
◦ When bandwidth is low, auto-precharge (caused by refresh) means every access needs an ACTIVE command; bank power is proportional to bandwidth.
◦ When bandwidth is high, some accesses may hit in the row buffer and need fewer ACTIVE commands; bank power rises with a lower slope than before.
DRAM power