September 28th 2004, University of Utah
A preliminary look
Karthik Ramani
Power and Temperature-Aware Microarchitecture
Motivation
ITRS Roadmap: Reasons for increasing power consumption
Higher chip operating frequencies
Increased gate leakage of transistors
Higher interconnect capacitances and resistances
Lack of an interconnect architecture design tool until 2009
Inability of the Interconnect to scale for performance beyond 2009
Heterogeneous Interconnects: A starting point
Two sets of interconnects
Low-delay, high-power wires
Low-power wires (high delay)
Easier to target instructions
Augurs well for a more sophisticated model
Interconnect transfers - Types
[Pie chart: % of different transfers in the interconnect. Categories: ready register value, bypassed register value, load value, store value, address transfer. Slice labels: 24%, 31%, 16%, 4%, 25%.]
Bypassed Register Values
Operands produced in a cluster that are immediately required by another cluster
Criticality based on two factors
Operand arrival time at the cluster
Actual issue time of the sourcing instruction
Criticality changes at runtime
Needs a dynamic predictor
[Figure: Rename & Dispatch feeding three clusters, each with an IQ, Regfile, and FU. Example: the consumer instruction is dispatched at cycle 100; the producing instruction completes execution at cycle 120.]
The Data Criticality Predictor
A table indexed by the lower-order bits of the instruction address, updated dynamically to indicate the criticality of data.
Difference between arrival time and usage time calculated for each operand of an instruction
Difference < Threshold: Critical
Difference > Threshold: Non-Critical
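The predictor described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' design: the table size, the one-bit entries, the default prediction, the treatment of a difference exactly equal to the threshold, and all names (CriticalityPredictor, predict, update) are assumptions made for the example.

```python
class CriticalityPredictor:
    """Hypothetical table of one-bit criticality predictions."""

    def __init__(self, index_bits=10, threshold=2):
        # index_bits and threshold are illustrative values, not from the slides.
        self.mask = (1 << index_bits) - 1
        self.threshold = threshold
        # Assumption: entries start out as "critical" (the safe default).
        self.table = [True] * (1 << index_bits)

    def _index(self, instr_addr):
        # Index with the lower-order bits of the instruction address.
        return instr_addr & self.mask

    def predict(self, instr_addr):
        """Return True if the operand is predicted to be critical."""
        return self.table[self._index(instr_addr)]

    def update(self, instr_addr, arrival_cycle, issue_cycle):
        """Record the observed gap between operand arrival and its use."""
        difference = issue_cycle - arrival_cycle
        # Difference < threshold: critical; otherwise non-critical.
        # (Equality is treated as non-critical here; the slide leaves it open.)
        self.table[self._index(instr_addr)] = difference < self.threshold


predictor = CriticalityPredictor(index_bits=4, threshold=5)
predictor.update(0x1234, arrival_cycle=100, issue_cycle=120)
print(predictor.predict(0x1234))  # operand sat idle for 20 cycles
```

A single bit per entry keeps the table small; a saturating counter would add hysteresis at the cost of more storage.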
Summary of transfers
Critical                          Non-Critical
Load value                        Store value
Effective address (unpredicted)   Effective address (predicted)
Bypassed register value           Bypassed register value
                                  Ready register value
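One way to read the summary above is as a steering rule that picks a wire set per transfer. The sketch below is only an illustration under stated assumptions: the names Wire and select_wire and the string labels for transfer kinds are hypothetical, and it assumes the unpaired "Ready register value" entry belongs to the non-critical column.

```python
from enum import Enum


class Wire(Enum):
    LOW_DELAY = "low-delay, high-power"
    LOW_POWER = "low-power, high-delay"


def select_wire(kind, predicted_critical=True, address_predicted=False):
    """Pick a wire set for one transfer (hypothetical steering rule)."""
    if kind == "load_value":
        return Wire.LOW_DELAY  # always critical
    if kind in ("store_value", "ready_register"):
        return Wire.LOW_POWER  # always non-critical (assumed for ready values)
    if kind == "effective_address":
        # A predicted address only needs verification, so it can go slowly.
        return Wire.LOW_POWER if address_predicted else Wire.LOW_DELAY
    if kind == "bypassed_register":
        # Bypassed values depend on the dynamic criticality prediction.
        return Wire.LOW_DELAY if predicted_critical else Wire.LOW_POWER
    raise ValueError(f"unknown transfer kind: {kind}")


print(select_wire("bypassed_register", predicted_critical=False).value)
```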
Result summary
Two kinds of non-critical transfers
Data that are not immediately used: 36%
Verification of address predictions: 13%
Criticality-based case
49% of all data transfers go through the power-optimized wires
Performance penalty: only 2.5%
Potential energy savings of around 50% in the interconnects
Things that are missing
Power modeling for the processor as a whole.
Implications on transient temperature variations for varying workloads.
Lack of a good on-chip interconnect power/temperature simulator
Complexity-effective design for the criticality predictor
Interconnect simulator: Problems
Should account for:
Number of wires in the particular process.
Deal with a 3-D space for routing of wires.
Satisfy the design rule constraints.
More of a layout optimization problem.
What we propose to do
Wattch: incorporated into a scalable 16-cluster system
HotSpot: Transient temperature model
HotLeakage: Leakage power model
Build a prototype layout to satisfy the above requirements
Wattch
Power model from Princeton University
Simulates an out-of-order processor (Alpha 21264)
Caveat: Interconnects are not accurately simulated
Wattch Modified
Wattch uses a single instruction window logic
Issue queue model
Separate Int and FP wakeup logic
Separate Int and FP selection logic
Helps in efficient distribution
Wattch Modified
Single result bus, FUs and register files
Distributed units
Separate integer and floating-point register files
Separate integer and floating-point execution and result bus units
Wattch Modified
Wattch: Simple Alpha 21264
Modified for a scalable 16-cluster system
Modular: easy for adaptation and testability.
Caveat: There is a lot of scope for improvement
Visual Feature Recognition
Elastic Bunch Graph Matching (EBGM)
History
No particular algorithm known
Many algorithms for face and object recognition
Few feature recognition benchmarks like the FERET
Eigenfaces: traditionally known for face recognition
Motivation: EBGM
[Figure: Steps in Face Recognition: FLESH TONING, SEGMENTATION, FACE DETECTION, FACE RECOGNITION]
No segmentation needed in EBGM!
EBGM
Steps involved in EBGM
NORMALIZATION/PREPROCESSING
FACE GRAPH CREATION
FACE IDENTIFICATION
Looks easy
EBGM: Mathematically
Image descriptions are based on a wavelet transform
Gabor jets are extracted from each landmark
Local image information around each node is the key
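To make the jet idea concrete, here is a small pure-Python sketch of extracting a Gabor jet at one landmark and comparing two jets by magnitude. It assumes the kernel family commonly used in the EBGM literature (5 scales, 8 orientations, sigma = 2*pi); the slides do not state these parameters, and the function names, the truncation radius, and the list-of-rows image format are choices made for this example only.

```python
import cmath
import math

SIGMA = 2 * math.pi  # assumed kernel width, as commonly used for EBGM


def gabor_kernel_value(dx, dy, nu, mu):
    """Complex Gabor wavelet at offset (dx, dy) for scale nu, orientation mu."""
    k = 2 ** (-(nu + 2) / 2) * math.pi      # wave number for this scale
    phi = mu * math.pi / 8                  # orientation angle
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    envelope = (k * k / SIGMA**2) * math.exp(
        -k * k * (dx * dx + dy * dy) / (2 * SIGMA**2))
    # Subtracting exp(-sigma^2 / 2) removes the kernel's DC response.
    wave = cmath.exp(1j * (kx * dx + ky * dy)) - math.exp(-SIGMA**2 / 2)
    return envelope * wave


def gabor_jet(image, x, y, radius=8, scales=5, orientations=8):
    """Convolve a window around (x, y) with every kernel; return the jet."""
    height, width = len(image), len(image[0])
    jet = []
    for nu in range(scales):
        for mu in range(orientations):
            acc = 0j
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    px, py = x + dx, y + dy
                    if 0 <= px < width and 0 <= py < height:
                        acc += image[py][px] * gabor_kernel_value(-dx, -dy, nu, mu)
            jet.append(acc)
    return jet


def jet_similarity(jet_a, jet_b):
    """Normalized dot product of jet magnitudes (phase is ignored)."""
    mag_a = [abs(c) for c in jet_a]
    mag_b = [abs(c) for c in jet_b]
    dot = sum(p * q for p, q in zip(mag_a, mag_b))
    norm = math.sqrt(sum(p * p for p in mag_a) * sum(q * q for q in mag_b))
    return dot / norm if norm else 0.0


# Toy image: vertical stripes, stored as a list of rows.
stripes = [[1.0 if x % 4 < 2 else 0.0 for x in range(16)] for _ in range(16)]
jet = gabor_jet(stripes, 8, 8, radius=4)
print(len(jet))  # one complex coefficient per (scale, orientation) pair
```

Each landmark needs scales x orientations windowed convolutions, which is why jet extraction dominates the compute cost.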
EBGM: What is missing?
Landmark localization is less reliable
Difficult to track small differences in face orientation now
Compute-intensive Gabor jets