SCALABLY VERIFIABLE DYNAMIC POWER MANAGEMENT
Opeoluwa Matthews, Meng Zhang, and Daniel J. Sorin
20th International Symposium on High Performance Computer Architecture (HPCA)
Orlando, Florida, February 17-19, 2014
- Krishnaprasad K and Yashas Krishna
SOME BACKGROUND
The biggest problem today: power management
- How much power each component gets
- When power is given
- How the system gets power when needed, etc.
Types of power management
- Static power management: pre-allocate power to each component
- Dynamic power management (DPM): allocate power when needed, e.g. dynamic voltage/frequency scaling
PROBLEMS WITH DPM
Designing DPM is difficult because of the increasing scale of computer systems
- Cores per processor are increasing
- Processors per system are increasing
Challenges to efficient DPM
- Scalability: scalable to large-scale systems
- Verifiability: verify correctness in all situations
- Scalability affects verifiability, but there are no automated methods to verify DPM
IMPORTANT FACTORS IN DPM
Scalability factor
- Scalability is proportional to power consumption
- High scale = high power requirement; low scale = low power requirement
Verification of DPM and its benefits
- Find bugs in the DPM
- Prove correctness of the DPM
- If not done: components overheat, system failure and damage
So a scalably verifiable DPM is needed
CONTENTS
- Existing system model and issues
- Introducing a new DPM system: Fractal DPM
- How is verification possible in the new system?
- Fractal DPM vs. performance: tradeoffs
- New system evaluation
- Implementation strategy
- Comparison to prior work
- Conclusion
INITIAL SYSTEM MODEL
DPM model
- Dynamically allocate power to each component Ci
- Power allotted is proportional to current performance Xi
- Xi = f(Pi, Xmaxi), where Pi is the current power allocation and Xmaxi is the current unconstrained performance
Initial setting
- Set a power budget
- Allot power to components so that the budget is satisfied: maximize Xi subject to Sum(Pi) < Budget
Power/performance model
- 5 possible power settings for each Ci: Low (L), Medium_Low (ML), Medium (M), Medium_High (MH), High (H)
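The allocation problem above can be sketched as a brute-force search. This is only an illustration: the numeric power weights in POWER and the capped-linear perf function are assumptions made here, not values from the paper.

```python
from itertools import product

# Hypothetical relative power cost of each setting (assumed, not from the paper).
POWER = {"L": 1, "ML": 2, "M": 3, "MH": 4, "H": 5}


def perf(setting, xmax):
    """Assumed model: performance grows with power but is capped at xmax."""
    return min(POWER[setting], xmax)


def best_allocation(xmax_list, budget):
    """Try every combination of settings and keep the one that maximizes
    total performance while total power stays strictly below the budget."""
    best, best_perf = None, -1
    for combo in product(POWER, repeat=len(xmax_list)):
        total_power = sum(POWER[s] for s in combo)
        if total_power >= budget:
            continue  # violates Sum(Pi) < Budget
        total_perf = sum(perf(s, x) for s, x in zip(combo, xmax_list))
        if total_perf > best_perf:
            best, best_perf = combo, total_perf
    return best, best_perf


if __name__ == "__main__":
    print(best_allocation([5, 1], budget=7))
```

The search visits every combination of settings, so it is only usable for a handful of components.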
INITIAL MODEL : ISSUES
Design using existing tools
- Fully automated formal verification methodologies
- Tool: MurΦ model checker
  - Exhaustive state space search
  - Checks whether the invariant is satisfied
Issue: state space explosion
- As the number of components Ci increases, the number of states increases
- Infeasible to traverse all states, e.g. 5 components with 5 settings each means 5^5 states
Typical solution: check at small scale and, if satisfied, assume the large scale also satisfies the invariant
- Need not always be true
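A quick way to see the explosion is to count the combinations of power settings. The sketch below counts only those combinations; the full model a checker explores would presumably be larger, since it also tracks controller state and messages in flight.

```python
def state_count(num_components, num_settings=5):
    """Number of distinct assignments of power settings to components."""
    return num_settings ** num_components


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, state_count(n))
```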
FRACTAL DPM DESIGN
Fractal design
- A design in which the system behaves the same at every scale
- This makes inductive verification possible
  - Base case: verify that the minimum system satisfies its power constraints
  - Inductive step: verify that larger systems are equivalent to smaller systems
- Both done using MurΦ
FRACTAL SYSTEM ORGANIZATION
Hierarchical structure: binary tree model
- Leaves: computing resources (CRs)
- Intermediate nodes: DPM controllers
  - Record the power states of child nodes
  - Handle power requests from CRs
Power requests
- A CR can request more power by sending a request to its DPM controller (parent)
- The DPM controller responds either directly or by passing the request to its own parent controller
A DPM controller and its two children are considered a single “node”, like a single CR
- Each such node has a combined power setting: the average of its child nodes L:R
- E.g. if the children are H and L, then the average is MH
- The L:R format represents the power setting of the left child : power setting of the right child
FRACTAL POWER INVARIANT
The invariant must be fractal
- Applicable at all scales of the system
- A plus point of Fractal DPM: this makes it unique among DPMs
Fractal invariant
- It is impossible for both children of a DPM controller to be at the High power setting at the same time
- Why? Good for cases when Sum(Pi) > Budget; limits system-wide power consumption
Limitation
- Other invariants are not considered or compared: future work
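The invariant can be checked with a recursive walk over the tree. This is a minimal sketch: the Node class is hypothetical, and each controller's combined setting is assumed to be recorded already rather than computed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: a leaf is a computing resource, an inner
    node is a DPM controller whose combined setting is stored as given."""
    setting: str  # "L", "ML", "M", "MH" or "H"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def invariant_holds(node):
    """True if no DPM controller in the tree has both children at H."""
    if node.left is None or node.right is None:
        return True  # computing resource (leaf): nothing to check
    if node.left.setting == "H" and node.right.setting == "H":
        return False
    return invariant_holds(node.left) and invariant_holds(node.right)


if __name__ == "__main__":
    print(invariant_holds(Node("MH", Node("H"), Node("L"))))
    print(invariant_holds(Node("H", Node("H"), Node("H"))))
```

The same check applies at every level, which is what makes the invariant usable at all scales.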
FRACTAL DPM : SPECIFICATION
Table-based specification method
- Each entry in the table corresponds to a state/event combination, and the entry specifies what happens in that situation.
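A table-based specification maps naturally onto a lookup keyed by (state, event). The entries below are invented for illustration only: the state names, event names and actions are assumptions, and the real table is the one in the paper.

```python
# Hypothetical entries for illustration; not the paper's specification table.
# Key: (current state, event) -> (action, next state)
SPEC = {
    ("L", "want_more_power"): ("send request to parent", "Pend-ML"),
    ("Pend-ML", "grant"): ("send Ack", "ML"),
    ("Pend-ML", "deny"): ("send Ack", "L"),
}


def step(state, event):
    """Look up what happens for one state/event combination."""
    try:
        return SPEC[(state, event)]
    except KeyError:
        raise ValueError(
            f"no table entry for state={state!r}, event={event!r}"
        ) from None


if __name__ == "__main__":
    print(step("L", "want_more_power"))
```

A missing entry raises an error instead of being silently ignored, so gaps in the table show up as soon as they are hit.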
SPECIFICATION CONTINUED
Special states
- Pend-*: family of pending states in which the computing resource has requested a new power state and is waiting for a response
- Block-*: family that includes states such as block-L:ML, in which the DPM controller granted or denied a request to a child, is blocked waiting on the Ack from the child, and will then go to state L:ML
Specification of the root DPM
- Same as a non-root DPM, except that the root has no parent DPM to request power from
- No pending states, only block states
Non-root DPM
- Handles a request by itself if the node state is unchanged; otherwise passes it to the parent DPM
- 4 exceptions: invariant not satisfied
FRACTAL DPM : SCALABILITY ISSUES
At high scale
- Tree height increases
- Requests from leaves to the root take more time
- Latency issues: more hops
Possible solution
- Multi-degree tree: reduces the height of the tree
- Problem: MurΦ doesn't support this; couldn't verify
Scalability issues: not a big concern
- The latency of DPM itself is not critical
- Many requests can be satisfied without traveling far up the tree
- Experimental results on a real, modestly sized system (16 computing resources) show that latencies are reasonable
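The hop count in the worst case is the height of the tree. A small sketch, assuming a full tree in which a request that reaches the root travels one hop per level:

```python
def tree_height(num_crs, degree=2):
    """Levels of DPM controllers above the leaves in a full tree that
    holds num_crs computing resources (degree children per controller)."""
    height = 0
    capacity = 1
    while capacity < num_crs:
        capacity *= degree
        height += 1
    return height


if __name__ == "__main__":
    print(tree_height(16))            # binary tree
    print(tree_height(16, degree=4))  # multi-degree tree
```

For 16 computing resources the binary tree has 4 levels and a degree-4 tree has 2, which is the height reduction the multi-degree idea aims at.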
VERIFICATION OF FRACTAL DPM
Scalable verification
- Verification effort: independent of the number of CRs
- Steps: base case verification, induction step verification
Base case: minimum system verification
- The base system must be complete: include all basic components
- Incomplete base system: when some elements are not considered
  - Gives incomplete verification: spurious actions
- MurΦ verifies whether the invariants are satisfied
BASE CASE :MINIMUM SYSTEM VERIFICATION
VERIFICATION OF FRACTAL DPM
Inductive step: equivalence verification
- Observation equivalence verification chosen
  - Only the externally visible behavior of systems at different scales is considered
  - No internal actions are considered
  - Considers only how the system reacts to inputs
- Two perspectives
  - Looking down: when the system is scaled downwards
  - Looking up: when the system is scaled upwards
  - In both cases, verify that the larger system behaves the same as the subsystem
- Tool: MurΦ is used
  - Using the same tool for both steps reduces translation errors
  - On-the-fly mode: no extra state space
EQUIVALENCE VERIFICATION
POWER MANAGEMENT EFFICIENCY
System-wide power consumption is upper bounded
- Max power consumed by C computing resources: (C-1) MH + H
- As C approaches infinity, the max average power per CR = MH
- F-DPM allows all CRs to be at MH
F-DPM does not permit certain cases
- Causes inefficiency, but it is a tradeoff against the fractal invariant
- Such cases are rare and the inefficiency caused is small
- Another inefficiency: F-DPM forces one CR from H down to MH
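The bound and its limit can be checked numerically. The power weights below are assumed values chosen only to make the arithmetic concrete.

```python
# Hypothetical numeric power weights (assumed, not from the paper).
POWER = {"L": 1.0, "ML": 2.0, "M": 3.0, "MH": 4.0, "H": 5.0}


def max_total_power(c):
    """Upper bound (C-1) * MH + H for a system with c computing resources."""
    return (c - 1) * POWER["MH"] + POWER["H"]


def max_average_power(c):
    """Upper bound on the average power per computing resource."""
    return max_total_power(c) / c


if __name__ == "__main__":
    for c in (2, 8, 1024):
        print(c, max_average_power(c))
```

With these weights the average is 4.5 for two CRs and moves toward the MH weight of 4.0 as C grows.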
EVALUATION OF SYSTEM
Goal
- Does Fractal DPM actually do its job well in allocating power to CRs dynamically and efficiently?
Simulation methodology
1. Dynamically set Xmaxi for all CRs
   1. Keep it changing at time steps
2. Give weights to power settings
3. Model the behavior of CRs and DPMCs
   1. Specification tables
4. Compute the performance of each CR
   1. Function of the power it is granted by the DPM per time step
PERFORMANCE MODELING
How to determine the performance of a given CR at a given power setting?
- Each CR can use power in a different way
- May achieve different performance at the same setting
- Abstract way: as a function of Pi and Xmaxi
Two functions
- Perf1: decreasing marginal performance benefit
  - E.g. using more power to enable a faster core clock frequency helps performance, but eventually performance becomes memory-bound
- Perf2: linear performance benefit
  - E.g. ideal voltage/frequency scaling
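The two shapes can be sketched as follows. The slides do not give the actual formulas, so the square-root curve for perf1, the straight line for perf2 and the POWER weights are all assumptions picked to show the qualitative difference.

```python
import math

# Hypothetical numeric power weights (assumed, not from the paper).
POWER = {"L": 1, "ML": 2, "M": 3, "MH": 4, "H": 5}


def perf1(setting, xmax):
    """Assumed concave shape: each extra step of power helps less."""
    frac = POWER[setting] / POWER["H"]
    return xmax * math.sqrt(frac)


def perf2(setting, xmax):
    """Assumed linear shape: performance scales directly with power."""
    frac = POWER[setting] / POWER["H"]
    return xmax * frac


if __name__ == "__main__":
    for s in POWER:
        print(s, round(perf1(s, 10.0), 2), round(perf2(s, 10.0), 2))
```

Under these assumed shapes, dropping from H to MH costs more with perf2 than with perf1.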
PERFORMANCE COMPARISON AND RESULTS
Compare against an un-implementable oracle (ideal DPM)
- Gives the best possible allocations, even H:H allocations
Results (given #CRs = 8)
- In the majority of time steps (>72%): performance(FDPM) = performance(Oracle)
- The performance gap is never more than 37% for perf1 and 46% for perf2
- The performance difference is greater for perf2: perf2 models greater performance at higher power states, and thus being at a lower power state (to maintain the fractal invariant) is somewhat more costly
Thus: amount of performance sacrificed = small
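The two reported metrics can be computed from per-time-step performance traces. A sketch, assuming the gap is measured relative to the oracle (the slides do not say how it is defined) and using made-up sample numbers:

```python
def compare_to_oracle(fdpm_perf, oracle_perf):
    """Return (fraction of time steps where both perform equally,
    largest gap relative to the oracle)."""
    if not fdpm_perf or len(fdpm_perf) != len(oracle_perf):
        raise ValueError("traces must be non-empty and of equal length")
    equal_steps = 0
    worst_gap = 0.0
    for f, o in zip(fdpm_perf, oracle_perf):
        if f == o:
            equal_steps += 1
        elif o > 0:
            worst_gap = max(worst_gap, (o - f) / o)
    return equal_steps / len(fdpm_perf), worst_gap


if __name__ == "__main__":
    # Made-up traces, only to show the call.
    print(compare_to_oracle([10, 8, 10, 6], [10, 10, 10, 8]))
```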
IMPLEMENTATION STRATEGY
Dynamic voltage/frequency scaling as the power adjustment strategy
- V/F adjusted at core-pair granularity
- Possible because of the fractal structure
CRs and DPMCs implemented as Linux daemons
- Communication through sockets
Optimization: OptiFDPM
- A CR re-requests the next lower power setting if its current request is rejected
- The optimized version retains the scalable verifiability of FDPM
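The re-request behavior of OptiFDPM can be sketched as a simple loop. The function name and the try_request callback are hypothetical; in the implementation described above the request would travel over a socket to the DPM controller daemon.

```python
SETTINGS = ["L", "ML", "M", "MH", "H"]  # lowest to highest


def request_with_fallback(current, wanted, try_request):
    """Ask for `wanted`; if rejected, re-request the next lower setting
    until one is granted or nothing above `current` is left.
    Only requests for more power are covered by this sketch."""
    current_idx = SETTINGS.index(current)
    wanted_idx = SETTINGS.index(wanted)
    for idx in range(wanted_idx, current_idx, -1):
        if try_request(SETTINGS[idx]):
            return SETTINGS[idx]
    return current  # every request was rejected


if __name__ == "__main__":
    # Stand-in for the controller: grants nothing above M.
    print(request_with_fallback("L", "H", lambda s: s in {"ML", "M"}))
```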
EVALUATION OF IMPLEMENTATION
Compare the power and performance of fractal DPM against an un-implementable oracle DPM scheme that always assigns the optimal power levels to core pairs.
Compare the power and performance of fractal DPM against a provably correct power management scheme that statically sets all cores to a given power level.
Determine the latency to service requests for new power levels
Comparison to Oracle Power Management
Comparison to Static Power Management
Latency
COMPARISON : PREVIOUS WORKS
Lungu et al.’s research on verifiable DPM for multicore processors [9]
- Observed that DPM schemes cannot be verified at large scale
- Showed state space explosion
Zhang et al.’s work on Fractal Coherence [14]
- Source of the fractal design idea
- Used for DPM for the first time here
Other works on DPM [10][8][6]
- Did not use verification
CONCLUSION
Design of a scalably verifiable DPM
- Uses fractal design for verifiability
- Only a small performance inefficiency
- On par with the oracle model
REFERENCE
[1] D. Bergamini, N. Descoubes, C. Joubert, and R. Mateescu, “BISIMULATOR: A Modular Tool for On-the-Fly Equivalence Checking,” in Proceedings of TACAS’05, volume 3440 of LNCS, 2005, pp. 581–585.
[2] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” in Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2008.
[3] C.-T. Chou, P. Mannava, and S. Park, “A Simple Method for Parameterized Verification of Cache Coherence Protocols,” in Formal Methods in Computer-Aided Design, 2004, pp. 382–398.
[4] G. Dhiman, K. K. Pusukuri, and T. Rosing, “Analysis of Dynamic Voltage Scaling for System Level Energy Management,” in Proceedings of the 2008 Conference on Power Aware Computing and Systems, 2008.
[5] D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang, “Protocol Verification as a Hardware Design Aid,” in IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1992, pp. 522–525.
[6] A. Efthymiou and J. D. Garside, “Adaptive Pipeline Depth Control for Processor Power-Management,” in Proceedings of the IEEE International Conference on Computer Design, 2002.
[7] J.-C. Fernandez, H. Garavel, A. Kerbrat, L. Mounier, R. Mateescu, and M. Sighireanu, “CADP - A Protocol Validation and Verification Toolbox,” in Proceedings of the 8th International Conference on Computer Aided Verification, 1996, pp. 437–440.
[8] C. Isci, A. Buyuktosunoglu, C.-Y. Cher, P. Bose, and M. Martonosi, “An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget,” in Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006.
[9] A. Lungu, P. Bose, D. J. Sorin, S. German, and G. Janssen, “Multicore Power Management: Ensuring Robustness via Early-Stage Formal Verification,” in Proceedings of the Seventh ACM-IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), 2009.
[10] R. Maro, Y. Bai, and R. I. Bahar, “Dynamically Reconfiguring Processor Resources to Reduce Power Consumption in High-Performance Processors,” in Proceedings of the Workshop on Power-Aware Computer Systems, pp. 97–111, Nov. 2000.
[11] S. Park, S. Das, and D. L. Dill, “Automatic Checking of Aggregation Abstractions Through State Enumeration,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 10, pp. 1202–1210, Nov. 2006.
[12] S. Park and D. L. Dill, “Verification of FLASH Cache Coherence Protocol by Aggregation of Distributed Transactions,” in Proceedings of the Eighth ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 288–296.
[13] D. J. Sorin, M. Plakal, M. D. Hill, A. E. Condon, M. M. K. Martin, and D. A. Wood, “Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol,” IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 6, pp. 556–578, Jun. 2002.
[14] M. Zhang, A. R. Lebeck, and D. J. Sorin, “Fractal Coherence: Scalably Verifiable Cache Coherence,” in Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010.