September 28th 2004 University of Utah 1

A preliminary look

Karthik Ramani

Power and Temperature-Aware Microarchitecture

Motivation

ITRS Roadmap: Reasons for increasing power consumption

Higher chip operating frequencies

Increased gate leakage of transistors

Higher interconnect capacitances and resistances

Lack of an interconnect architecture design tool until 2009

Inability of the interconnect to scale for performance beyond 2009

Heterogeneous Interconnects: A starting point

Two sets of interconnects:

Low-delay, high-power wires

Low-power, high-delay wires

Easier to target instructions

Augurs well for a more sophisticated model

Interconnect transfers - Types

[Pie chart: % Different Transfers in the Interconnect. Categories: ready register value, bypassed register value, load value, store value, address transfer. Slice values shown: 24%, 31%, 16%, 4%, 25%.]

Bypassed Register Values

Operands produced in a cluster that are immediately required by another cluster

Criticality is based on two factors:

Operand arrival time at the cluster

Actual issue time of the sourcing instruction

Criticality changes at runtime

Needs a dynamic predictor

[Figure: Rename & Dispatch feeding three clusters, each with an IQ, Regfile, and FU. Producing instruction completes execution at cycle 120; consumer instruction dispatched at cycle 100.]

The Data Criticality Predictor

A table indexed by the lower-order bits of the instruction address, updated dynamically to indicate the criticality of data.

Difference between arrival time and usage time calculated for each operand of an instruction

Difference < Threshold: Critical

Difference > Threshold: Non-Critical
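The predictor described above can be sketched roughly as follows. This is a minimal illustration, not the design from the talk: the table size, the threshold of 5 cycles, the 4-byte instruction alignment, the default prediction for unseen entries, and the treatment of a difference exactly equal to the threshold are all assumptions, and the names `CriticalityPredictor`, `predict`, and `update` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CriticalityPredictor:
    index_bits: int = 10   # assumed table size: 2**index_bits entries
    threshold: int = 5     # assumed slack threshold, in cycles
    table: list = field(init=False)

    def __post_init__(self):
        # Assumption: entries start out "critical", the safe default for performance.
        self.table = [True] * (1 << self.index_bits)

    def _index(self, pc):
        # Lower-order bits of the instruction address.
        # Assumption: 4-byte instructions, so the two lowest bits are dropped.
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def predict(self, pc):
        """Return True if data for this instruction is predicted critical."""
        return self.table[self._index(pc)]

    def update(self, pc, arrival_cycle, use_cycle):
        """Record the observed gap between operand arrival and actual use."""
        difference = use_cycle - arrival_cycle
        # Difference < threshold: critical. Otherwise non-critical
        # (equality is treated as non-critical here; the slides do not say).
        self.table[self._index(pc)] = difference < self.threshold


pred = CriticalityPredictor(index_bits=4, threshold=5)
pred.update(pc=0x4000, arrival_cycle=120, use_cycle=121)  # used right after arrival
pred.update(pc=0x4004, arrival_cycle=100, use_cycle=140)  # sat unused for 40 cycles

# Steer the next transfer: critical data takes the low-delay wires.
wire = "low-delay" if pred.predict(0x4000) else "low-power"
```

A single bit per entry keeps the table small; a saturating counter per entry would be a natural alternative if predictions turn out to flip too often.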

Summary of transfers

Critical                          Non-Critical
Load values                       Store value
Effective address (unpredicted)   Effective address (predicted)
Bypassed register value           Bypassed register value
                                  Ready register value

Result summary

Two kinds of non-critical transfers:

Data that are not immediately used – 36%

Verification of address predictions – 13%

Criticality-based case:

49% of all data transfers (36% + 13%) go through the power-optimized wires

Performance penalty of only 2.5%

Potential energy savings of around 50% in the interconnects

Things that are missing

Power modeling for the processor as a whole.

Implications on transient temperature variations for varying workloads.

Lack of a good on-chip interconnect power/temperature simulator

Complexity-effective design for the criticality predictor

Interconnect simulator: Problems

Should account for:

Number of wires in the particular process.

Deal with a 3-D space for routing of wires.

Satisfy the design rule constraints.

More of a layout optimization problem.

What we propose to do

Wattch: incorporated into a scalable 16-cluster system

HotSpot: Transient temperature model

HotLeakage: Leakage power model

Build a prototype layout to satisfy the above requirements

Wattch

Power model from Princeton University

Simulates an out-of-order processor (Alpha 21264)

Caveat: Interconnects are not accurately simulated

Wattch Modified

Wattch uses a single instruction window logic

Issue queue model:

Separate Int and FP wakeup logic

Separate Int and FP selection logic

Helps in efficient distribution

Wattch Modified

Single result bus, FUs and register files

Distributed units:

Separate integer and floating-point register files

Separate integer and floating-point execution and result bus units

Wattch Modified

Wattch: Simple Alpha 21264

Modified for a scalable 16-cluster system

Modular: easy to adapt and test.

Caveat: There is a lot of scope for improvement

Visual Feature Recognition

Elastic Bunch Graph Matching (EBGM)

History

No particular algorithm known

Many algorithms for face and object recognition

Few feature recognition benchmarks, like the FERET

Eigenfaces – traditionally known for face recognition

Motivation: EBGM

[Figure: Steps in Face Recognition. FLESHTONING, then SEGMENTATION, then FACE DETECTION, then FACE RECOGNITION.]

No segmentation needed in EBGM!

EBGM

Steps involved in EBGM:

NORMALIZATION/PREPROCESSING

FACE GRAPH CREATION

FACE IDENTIFICATION

Looks easy

EBGM: Mathematically

Image descriptions are based on a wavelet transform

Gabor jets are extracted from each landmark

Local image information around each node is the key
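To make the idea of a Gabor jet concrete, here is a rough pure-Python sketch of extracting one at a landmark pixel. It is a simplified illustration rather than the EBGM formulation: the kernel omits the DC-compensation term, the wavelengths, orientation count, sigma, and window radius are arbitrary assumptions, and the magnitude-based `jet_similarity` is a commonly used comparison that the slides do not mention. All function names are hypothetical.

```python
import cmath
import math


def gabor_response(image, x0, y0, wavelength, theta, sigma, radius):
    """Complex response of one Gabor kernel centered on pixel (x0, y0)."""
    k = 2 * math.pi / wavelength
    total = 0j
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = y0 + dy, x0 + dx
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                # Gaussian envelope times a complex plane wave along theta.
                envelope = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                phase = k * (dx * math.cos(theta) + dy * math.sin(theta))
                total += image[y][x] * envelope * cmath.exp(1j * phase)
    return total


def gabor_jet(image, x0, y0, wavelengths=(4, 8), n_orientations=4):
    """Collect the responses of all kernels at one landmark into a jet."""
    jet = []
    for wavelength in wavelengths:
        for i in range(n_orientations):
            theta = math.pi * i / n_orientations
            jet.append(gabor_response(image, x0, y0, wavelength, theta,
                                      sigma=wavelength / 2, radius=wavelength))
    return jet


def jet_similarity(jet_a, jet_b):
    """Normalized dot product of the jets' magnitudes (phase is ignored)."""
    mags_a = [abs(c) for c in jet_a]
    mags_b = [abs(c) for c in jet_b]
    dot = sum(a * b for a, b in zip(mags_a, mags_b))
    norm = math.sqrt(sum(a * a for a in mags_a) * sum(b * b for b in mags_b))
    return dot / norm if norm else 0.0


# Synthetic 32x32 test image: vertical stripes with a period of 8 pixels.
image = [[1.0 if x % 8 < 4 else 0.0 for x in range(32)] for _ in range(32)]
jet = gabor_jet(image, 16, 16)
```

Each jet entry costs a full pass over a (2 * radius + 1)-squared window, repeated for every wavelength, orientation, and landmark. In practice the same responses are usually computed with FFT-based convolution instead of nested loops.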

EBGM: What is missing?

Landmark localization is less reliable

Difficult to track small differences in face orientation now

Compute-intensive Gabor jets

Questions?

Thank you