Page 1

Defining Wakeup Width for Efficient Dynamic Scheduling

A. Aggarwal, O. Ergin – Binghamton University; M. Franklin – University of Maryland

Presented by: Deniz Balkan

Page 2

Dynamic Scheduler

• Workings of a dynamic scheduler

– Wake up dependent instructions

– Select instructions from a pool of ready instructions

• Together, these two operations form a critical path

• Adding even a single cycle to this critical path impacts performance, because dependent instructions can then no longer issue in back-to-back cycles
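To make the wakeup/select loop concrete, here is a minimal Python sketch, assuming single-cycle execution latency and an oldest-first select policy. The names `Entry`, `wakeup`, `select`, and `run` are hypothetical and do not come from the paper. Because the tags of the instructions selected in one cycle wake their dependents in time for the next select, a dependent chain issues in back-to-back cycles.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, so queue.remove() removes this exact entry
class Entry:
    name: str
    dest: str | None  # result tag, or None for stores/branches
    srcs: set[str]    # source tags still outstanding
    age: int          # lower value = older instruction


def wakeup(queue: list[Entry], broadcast: set[str]) -> None:
    """Wakeup: every entry compares the broadcast tags with its outstanding sources."""
    for e in queue:
        e.srcs -= broadcast


def select(queue: list[Entry], issue_width: int) -> list[Entry]:
    """Select: grant up to issue_width ready entries, oldest first."""
    ready = sorted((e for e in queue if not e.srcs), key=lambda e: e.age)
    return ready[:issue_width]


def run(queue: list[Entry], issue_width: int, max_cycles: int = 100) -> list[list[str]]:
    """Repeat select + wakeup once per cycle; returns the names issued in each cycle."""
    schedule = []
    for _ in range(max_cycles):
        if not queue:
            break
        issued = select(queue, issue_width)
        for e in issued:
            queue.remove(e)
        # Single-cycle loop: tags broadcast now make dependents ready for the next select.
        wakeup(queue, {e.dest for e in issued if e.dest is not None})
        schedule.append([e.name for e in issued])
    return schedule


queue = [
    Entry("i0", "r1", set(), 0),
    Entry("i1", "r2", {"r1"}, 1),
    Entry("i2", "r3", {"r2"}, 2),
    Entry("i3", "r4", set(), 3),
    Entry("i4", None, {"r3"}, 4),  # e.g. a store consuming r3
]
print(run(queue, issue_width=2))  # [['i0', 'i3'], ['i1'], ['i2'], ['i4']]
```

If wakeup and select together took two cycles, each link of the chain i0 → i1 → i2 → i4 would pick up an extra bubble, which is the performance loss the slide refers to.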

Page 3

Implications of a large Dynamic Scheduler

• A larger dynamic scheduler has the potential to exploit more ILP

– Larger issue queue

– Larger issue width

• Implications

– Longer wire delays associated with driving register tags

– Longer wire delays in driving tag comparison results

– Longer select logic latency

• Overall, increased scheduler latency, resulting in a slower clock speed

Page 4

Contributions of this paper

• Wakeup width definition – the effective number of results used for instruction wakeup in a cycle

– Usually equal to the issue width

• Reduced wakeup width dynamic scheduler

– Issue width remains the same

– Reduces instruction wakeup latency, energy consumption, and area

– Less than 2% reduction in IPC

Page 5

Program Behavior Study

• Not all instructions produce a result

– Branch and store instructions form about 30% of instructions

• The entire issue width of the processor is not used in every cycle

• The average number of tags generated per cycle is considerably less than the processor issue width

Page 6

Tags generated in a cycle

• To generate more tags per cycle, a fetch, issue, and commit width of 12 was used

• Almost 50% of cycles have either 0 or 1 tag generated, even with this large issue width

• About 80% of cycles generate 3 or fewer tags
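Both percentages above are points on the cumulative distribution of tags generated per cycle. A minimal sketch of that computation, assuming a per-cycle tag count has already been collected from the simulator; the function names are hypothetical and the 10-cycle trace is invented for illustration, not data from the paper.

```python
from __future__ import annotations

from collections import Counter


def tag_count_cdf(tags_per_cycle: list[int], max_k: int) -> list[float]:
    """Fraction of cycles that generated at most k tags, for k = 0..max_k."""
    counts = Counter(tags_per_cycle)
    total = len(tags_per_cycle)
    cdf, running = [], 0
    for k in range(max_k + 1):
        running += counts[k]  # Counter returns 0 for counts never seen
        cdf.append(running / total)
    return cdf


def mean_tags(tags_per_cycle: list[int]) -> float:
    """Average number of tags generated per cycle."""
    return sum(tags_per_cycle) / len(tags_per_cycle)


trace = [0, 1, 1, 0, 3, 2, 5, 1, 0, 2]  # hypothetical trace
print(tag_count_cdf(trace, 3))  # [0.3, 0.6, 0.8, 0.9]
print(mean_tags(trace))         # 1.5
```

In this made-up trace 60% of cycles generate at most one tag, and the mean of 1.5 tags is far below a 12-wide issue width, which is the kind of gap the slide describes.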

Page 7

Useful tags

• Not all generated tags are immediately useful

– Branch mispredictions lead to tags being generated along the wrong path

– Tags not immediately required: dependent instructions are not yet in the issue queue, or are still waiting for other operands

• The average number of useful tags per cycle is even lower than the average number of tags generated per cycle
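The two causes above can be expressed as a simple test. This is a hypothetical sketch, not the paper's exact definition: it assumes a tag counts as immediately useful only if it was produced on the correct path and is the last missing operand of at least one instruction already in the issue queue.

```python
from __future__ import annotations


def is_useful(tag: str, wrong_path: bool, issue_queue: list[set[str]]) -> bool:
    """Hypothetical 'immediately useful' test.

    issue_queue holds, for each waiting instruction, the set of source tags
    it is still waiting on."""
    if wrong_path:
        return False  # generated after a branch misprediction
    # Useful only if this tag is the *last* operand some queued instruction needs.
    return any(outstanding == {tag} for outstanding in issue_queue)


queue = [{"r1"}, {"r2", "r3"}]
print(is_useful("r1", False, queue))  # True: its consumer becomes ready
print(is_useful("r2", False, queue))  # False: the consumer still waits for r3
print(is_useful("r4", False, queue))  # False: no consumer in the issue queue yet
print(is_useful("r1", True, queue))   # False: wrong-path tag
```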

Page 8

Useful tags

Only about 50-60% of instructions produce a tag that is immediately required

Page 9

Reduced Wakeup Width Dynamic Scheduler

• The wakeup width is reduced while the issue width is kept intact

– Some tags may have to wait before waking up their dependent instructions

• The performance impact is not expected to be high

– Cycles with fewer tags will soon follow

– Waiting tags can use the wakeup slots available in those cycles

– Delaying tags that are not immediately useful may have no performance impact at all
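The argument above can be checked with a small idealized model: tags that do not fit in the reduced wakeup width wait in a FIFO and use the free slots of later, quieter cycles. This is a hypothetical abstraction (one shared pool of wakeup slots rather than the pairwise tag-line sharing of the hardware implementation), and the trace is invented.

```python
from __future__ import annotations

from collections import deque


def rww_tag_delays(tags_per_cycle: list[int], wakeup_width: int) -> list[int]:
    """Idealized RWW model: tags generated in a cycle join a FIFO of waiting
    tags, and at most wakeup_width tags are broadcast per cycle.
    Returns each tag's delay in cycles (0 = broadcast when generated)."""
    if wakeup_width < 1:
        raise ValueError("wakeup_width must be at least 1")
    pending: deque[int] = deque()  # generation cycle of each waiting tag
    delays = []
    cycle = 0
    while cycle < len(tags_per_cycle) or pending:
        if cycle < len(tags_per_cycle):
            pending.extend([cycle] * tags_per_cycle[cycle])
        for _ in range(min(wakeup_width, len(pending))):
            delays.append(cycle - pending.popleft())
        cycle += 1
    return delays


trace = [4, 0, 1, 3, 0]  # hypothetical tags generated per cycle
print(rww_tag_delays(trace, wakeup_width=6))  # [0, 0, 0, 0, 0, 0, 0, 0]
print(rww_tag_delays(trace, wakeup_width=3))  # [0, 0, 0, 1, 0, 0, 0, 0]
```

Halving the wakeup width here delays only one of eight tags by one cycle, because the burst in cycle 0 drains into the empty cycle that follows; whether that delay costs IPC depends on whether the delayed tag was immediately useful.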

Page 10

Hardware Implementation – Conventional DS

• Select logic decides which instruction executes on which FU

• Register tags of issued instructions are placed in tag-latches

• Enable signals turn on the drivers that drive the tags across the instruction window

Page 11

Hardware Implementation – RWW DS

• Wakeup width reduced to half the issue width

• Two tag-latches (and hence two FUs) share a common set of tag-lines

• If both tag-latches hold tags, only one of them is driven; the other remains in its tag-latch

• To prevent the waiting tag from being overwritten, a 1-bit indicator latch controls the selection process
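A minimal sketch of one such FU pair, assuming that a tag left waiting from an earlier cycle is driven before a newly written one (the slides do not state this priority). The class and method names are hypothetical; `can_accept` is where the indicator latch feeds back into selection.

```python
class SharedTagLines:
    """Hypothetical model of two FUs whose tag-latches share one set of
    tag-lines, so at most one tag is driven per cycle."""

    def __init__(self):
        self.latch = [None, None]        # tag held in each FU's tag-latch
        self.indicator = [False, False]  # 1-bit indicator: latch holds a waiting tag

    def can_accept(self, fu: int) -> bool:
        """Select may give FU `fu` a tag-producing instruction only if no tag waits there."""
        return not self.indicator[fu]

    def cycle(self, new_tags):
        """new_tags: the tag (or None) written by FU0 and FU1 this cycle.
        Returns the tag driven on the shared tag-lines, or None."""
        for fu, tag in enumerate(new_tags):
            if tag is not None:
                if self.indicator[fu]:
                    raise RuntimeError(f"FU{fu} would overwrite a waiting tag")
                self.latch[fu] = tag
        full = [fu for fu in (0, 1) if self.latch[fu] is not None]
        if not full:
            return None
        # Assumption: a tag that already waited goes first; otherwise FU0 wins.
        winner = next((fu for fu in full if self.indicator[fu]), full[0])
        driven = self.latch[winner]  # enable this latch's tag-line driver
        self.latch[winner] = None
        self.indicator[winner] = False
        for fu in full:
            if fu != winner:
                self.indicator[fu] = True  # this tag stays in its latch and waits
        return driven


group = SharedTagLines()
print(group.cycle(("p1", "p2")))  # p1  (p2 waits in FU1's latch)
print(group.can_accept(1))        # False
print(group.cycle(("p3", None)))  # p2  (p3 now waits in FU0's latch)
print(group.cycle((None, None)))  # p3
```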

Page 12

FU arbiter

• Decides which instruction is executed on the FU

• Conventional arbiter, giving priority to the oldest instruction:

Grant1 = NOT req0 AND req1 AND enable

• Arbiter with the RWW dynamic scheduler, where “a” is the value of the indicator latch for the arbiter:

Grant1 = NOT req0 AND NOT a AND req1 AND enable
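The two grant equations can be checked with a small truth-table sketch of a two-input arbiter cell, where `req0` is the older request. The `grant0` expression is an assumption by symmetry (the slide shows only Grant1), and the function name is hypothetical; passing `a=False` reduces it to the conventional arbiter.

```python
from __future__ import annotations

from itertools import product


def arbiter_cell(req0: bool, req1: bool, enable: bool, a: bool = False) -> tuple[bool, bool]:
    """Two-input arbiter cell; req0 is the older request and has priority.
    `a` is the FU's indicator latch (True = a tag is waiting in its tag-latch)."""
    grant0 = req0 and not a and enable  # assumed by symmetry; not shown on the slide
    grant1 = (not req0) and (not a) and req1 and enable
    return grant0, grant1


# Truth table with enable = 1: columns are req0, req1, a, [grant0, grant1]
for req0, req1, a in product((False, True), repeat=3):
    print(int(req0), int(req1), int(a), [int(g) for g in arbiter_cell(req0, req1, True, a)])
```

With `a` set, neither request is granted, so no tag-producing instruction can overwrite the waiting tag.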

Page 13

Experimental Setup

• Simulator based on SimpleScalar used to collect the performance statistics

• Delay, energy, and area estimated from actual VLSI layouts using SPICE, in a TSMC 0.18-micron, 6-metal-layer CMOS process

• Dynamic scheduler size – 128-entry issue queue, 6-way issue width

Page 14

Performance Results

• Compared to the I6W6 (issue width 6, wakeup width 6) configuration:

– I6W3 has 15% lower wakeup logic latency

• IPC impact of about 5% for I6W3

– Higher for high-IPC FP benchmarks

– Significantly better than I3W3, which has the same wakeup logic latency as I6W3

Page 15

IPC of FP benchmarks with RWW

Reasons for IPC impact

• Instructions delayed due to waiting tags

• Issue slots wasted because of waiting tags

Page 16

Reasons for IPC impact

• Delayed register tags have more impact than issue slot wastage

• As the wakeup width is reduced, the impact of delayed register tags increases dramatically

Page 17

Area and Energy Results

• Activation statistics obtained through simulation, combined with energy consumption values from our detailed layouts

– I6W3 reduces wakeup logic energy consumption by 10%

• Area of the CAM cells (tag part of the instruction window) is reduced by about 30% for I6W3

Page 18

Reduced Issue Slots Wastage (RWIS)

• Issue slots are wasted because no instructions are issued to FUs that already hold waiting tags

• Instructions classified into

– Tag-producing instructions

– Non-tag-producing instructions

• Non-tag-producing instructions can still be issued to FUs with waiting tags without overwriting the tag value

• A type bit included with each instruction controls issue
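A hypothetical sketch of how the type bit changes selection, assuming identical FUs, an oldest-first ready list, and FUs filled in index order (real select logic uses per-FU arbiters). The function name and the `(name, produces_tag)` tuples are illustrative; `produces_tag` stands in for the type bit.

```python
def select_rwis(ready, indicators, rwis=True):
    """ready: oldest-first list of (name, produces_tag); produces_tag models the type bit.
    indicators[fu]: True if FU fu's tag-latch holds a waiting tag.
    Returns {fu: name} for the instructions issued this cycle."""
    assignment, remaining = {}, list(ready)
    for fu, waiting in enumerate(indicators):
        if waiting and not rwis:
            continue  # basic RWW: an FU with a waiting tag receives no instruction
        for i, (name, produces_tag) in enumerate(remaining):
            if waiting and produces_tag:
                continue  # would overwrite the waiting tag
            assignment[fu] = name
            del remaining[i]
            break
    return assignment


ready = [("add1", True), ("st1", False), ("add2", True)]
indicators = [True, False, False]  # FU0 still holds a waiting tag
print(select_rwis(ready, indicators, rwis=False))  # {1: 'add1', 2: 'st1'}
print(select_rwis(ready, indicators, rwis=True))   # {0: 'st1', 1: 'add1', 2: 'add2'}
```

Without RWIS the FU holding a waiting tag idles and `add2` misses the cycle; with RWIS the store fills that slot.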

Page 19

Reduced Tag Delays (RTD)

• Register tags are delayed when multiple tag-producing instructions are issued to FUs sharing the same tag-lines (an FU-group)

• RTD limits the number of tag-producing instructions issued to an FU-group

– The waiting tags of the previous cycle are used for this purpose

• Non-tag-producing instructions can still be issued to FUs with indicator bits set
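The slides do not give the exact RTD rule, so the following is only one plausible reading, sketched as a hypothetical model: RTD-k caps at k the number of tags expected to be left waiting across all FU-groups after this cycle, estimated from the indicator bits set in the previous cycle and assuming each issued tag-producing instruction writes its tag in the same cycle. FU-groups are taken to be consecutive FUs (`group_size` per group) sharing `tag_lines` tag-lines; all names are illustrative.

```python
def select_rtd(ready, indicators, group_size, max_waiting, tag_lines=1):
    """ready: oldest-first list of (name, produces_tag).
    indicators[fu]: True if FU fu holds a tag left waiting in the previous cycle.
    Returns {fu: name} for the instructions issued this cycle."""
    n_groups = (len(indicators) + group_size - 1) // group_size
    waiting_prev = [0] * n_groups
    for fu, w in enumerate(indicators):
        waiting_prev[fu // group_size] += w
    new_tags = [0] * n_groups

    def left_waiting(g):
        # Tags still waiting in group g after tag_lines tags are driven this cycle.
        return max(0, waiting_prev[g] + new_tags[g] - tag_lines)

    projected = sum(left_waiting(g) for g in range(n_groups))
    assignment, remaining = {}, list(ready)
    for fu, waiting in enumerate(indicators):
        g = fu // group_size
        for i, (name, produces_tag) in enumerate(remaining):
            if produces_tag:
                if waiting:
                    continue  # would overwrite the waiting tag
                before = left_waiting(g)
                new_tags[g] += 1
                after = left_waiting(g)
                if projected - before + after > max_waiting:
                    new_tags[g] -= 1
                    continue  # RTD-k: would leave too many tags waiting
                projected += after - before
            # Non-tag-producing instructions go even to FUs with indicator bits set.
            assignment[fu] = name
            del remaining[i]
            break
    return assignment


ready = [("st1", False), ("a", True), ("b", True), ("c", True)]
indicators = [True, False, False, False, False, False]  # 3 groups of 2 FUs
print(select_rtd(ready, indicators, group_size=2, max_waiting=1))  # {0: 'st1', 1: 'a', 2: 'b', 4: 'c'}
print(select_rtd(ready, indicators, group_size=2, max_waiting=2))  # {0: 'st1', 1: 'a', 2: 'b', 3: 'c'}
```

Under this reading, RTD-1 steers `c` to a group whose tag-line is free, while RTD-2 lets it share a group with `b` and wait, which would match the slide's observation that RTD-2 delays more instructions.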

Page 20

Enhanced Performance

• RTD-1 (a maximum of 1 waiting tag) is the most effective

• RWIS reduces the wastage of issue slots; RTD also reduces the number of waiting register tags

• RTD-2 results in more instructions being delayed (compared to RTD-1) by waiting register tags

Page 21

Conclusions

• Larger dynamic schedulers can exploit more ILP, thus increasing performance

• However, a larger dynamic scheduler results in longer scheduler latency

• The reduced wakeup width (RWW) dynamic scheduler exploits the property that the number of useful tags generated per cycle is significantly less than the issue width

• Significant reduction in wakeup logic latency and in dynamic scheduler area and energy consumption, with minimal IPC impact