Lecture 11: Pipelining and Branch Prediction

EEN 312: Processors: Hardware, Software, and Interfacing

Department of Electrical and Computer Engineering
Spring 2014, Dr. Rozier (UM)

THE QUIZ SHOW!

Today’s class will be a quiz show

• We will be solving puzzles involving pipelining, branch prediction, and the stack.

• Form up into groups of 8 individuals

• Points are awarded for correct solutions, and extra credit goes to the top teams:
    – 4 pts for 1st place
    – 3 pts for 2nd place
    – 2 pts for 3rd place
    – 1 pt for 4th place

The Rules!

• Each group will elect a “buzzer”. When the buzzer raises his hand, your group will be called on to solve the puzzle.

• One representative will be sent up per group. They will give their answer and explain it.

• Once the buzzer has raised his hand, your group must stop discussing the answer!

PIPELINING

Pipelining

• Assume r5 != r4

• Assume there is one memory for instructions and data.

• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.

(100) A structural hazard exists. What is it?

    str r0, [r1, #16]
    ldr r0, [r1, #8]
    cmp r5, r4
    beq label
    add r5, r2, r4
    add r5, r5, r0

Pipelining

• Assume r5 != r4

• Assume there is one memory for instructions and data.

• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.

(200) Can this structural hazard be eliminated by adding “bubbles” to the pipeline in the form of NOP instructions?

    str r0, [r1, #16]
    ldr r0, [r1, #8]
    cmp r5, r4
    beq label
    add r5, r2, r4
    add r5, r5, r0

Pipelining

• Assume r5 != r4

• Assume there is one memory for instructions and data.

• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.

(300) To guarantee forward progress, how must this hazard be resolved? In favor of data access, or instruction fetching? Why?

    str r0, [r1, #16]
    ldr r0, [r1, #8]
    cmp r5, r4
    beq label
    add r5, r2, r4
    add r5, r5, r0

Pipelining

• Assume r5 != r4

• Assume there is one memory for instructions and data.

• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.

(400) Draw the 5-stage pipeline for this code. Assume the stages are:

Fetch, Decode, Execute, Memory, Writeback.

What is the total execution time?

    str r0, [r1, #16]
    ldr r0, [r1, #8]
    cmp r5, r4
    beq label
    add r5, r2, r4
    add r5, r5, r0
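When checking a pipeline diagram, it can help to compare it against the stall-free baseline. The sketch below is in Python rather than the ARM assembly used on the slides, and the function names are hypothetical. It assumes one instruction issues per cycle and that every stall adds exactly one cycle. The number of stalls for the code above is left for you to work out from your diagram.

```python
def pipeline_cycles(instructions, stages=5, stall_cycles=0):
    """Cycles to run `instructions` through a `stages`-deep pipeline.

    The first instruction needs `stages` cycles to drain; each later
    instruction finishes one cycle after its predecessor. Every stall
    (bubble) delays completion by one more cycle.
    """
    return stages + (instructions - 1) + stall_cycles


def speedup(old_cycles, new_cycles):
    """Ratio of old to new execution time at the same clock period."""
    return old_cycles / new_cycles


# Six instructions on an ideal 5-stage pipeline, no stalls.
baseline = pipeline_cycles(6)
```

With no stalls, six instructions take 5 + 6 - 1 = 10 cycles. Any structural or data hazard you find adds to that figure.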

Pipelining

• Assume r5 != r4

• Assume there is one memory for instructions and data.

• During a cycle either data can be loaded for an instruction OR an instruction can be fetched, not both.

(500) Assume we have a new processor such that when the offset is zero on a memory operation, the Execute stage (ALU) can be skipped. The MEM and EXECUTE can now be overlapped in the pipeline. What speedup is achieved with this new architecture?

    str r0, [r1, #0]
    ldr r0, [r10, #0]
    cmp r5, r4
    beq label
    add r5, r2, r4
    add r5, r5, r0

DATA DEPENDENCIES

Data Dependencies

(100) Find all data dependencies in this sequence.

    ldr r1, [r1, #0]
    and r1, r1, r2
    ldr r2, [r1, #0]
    ldr r1, [r3, #0]
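One way to check a hand-derived dependency list is a small script. The sketch below is Python, not ARM, and its helper names are hypothetical. It assumes a simplified instruction format in which the first register is the destination, except for str and cmp, which write no register. It reports read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW) pairs, and skips a pair when an instruction in between overwrites the register. Whether WAR and WAW count as "dependencies" for this puzzle is a convention to settle with your team.

```python
import re


def regs(instr):
    """Return (dest, sources) for a simplified ARM-like instruction string."""
    op, rest = instr.split(None, 1)
    names = re.findall(r"r\d+", rest)
    if op in ("str", "cmp"):          # assumed: these write no register
        return None, names
    return names[0], names[1:]


def dependencies(program):
    """List (kind, register, earlier_index, later_index) tuples."""
    decoded = [regs(p) for p in program]
    found = []
    for j, (dj, sj) in enumerate(decoded):
        for i in range(j):
            di, si = decoded[i]
            # Registers written strictly between instructions i and j.
            between = {decoded[k][0] for k in range(i + 1, j)}
            if di is not None and di not in between:
                if di in sj:
                    found.append(("RAW", di, i, j))
                if di == dj:
                    found.append(("WAW", di, i, j))
            if dj is not None and dj in si and dj not in between:
                found.append(("WAR", dj, i, j))
    return found


program = [
    "ldr r1, [r1, #0]",
    "and r1, r1, r2",
    "ldr r2, [r1, #0]",
    "ldr r1, [r3, #0]",
]
deps = dependencies(program)
```

Indices count from 0, so `("RAW", "r1", 0, 1)` means the second instruction reads the r1 value produced by the first.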

Data Dependencies

(200) Find all hazards in this sequence, with and without forwarding, for a 5-stage pipeline. Assume the stages are:

Fetch, Decode, Execute, Memory, Writeback.

    ldr r1, [r1, #0]
    and r1, r1, r2
    ldr r2, [r1, #0]
    ldr r1, [r3, #0]

Data Dependencies

(300) To reduce the clock cycle time, we are considering a split of the MEM stage into two stages.

Find all hazards in this sequence for a 5-stage pipeline, with and without forwarding. Assume the stages are:

Fetch, Decode, Execute, Memory, Writeback.

    add r1, r2, r1
    ldr r2, [r1, #0]
    ldr r1, [r1, #4]
    orr r3, r1, r2

Data Dependencies

• Assume all data memory values are 0’s.

• Assume:
    – r0 = 0
    – r1 = -1
    – r2 = 31
    – r3 = 1500

• Assume the processor has forwarding logic for hazards.

(400) What value is the first one to be forwarded, and what is the value it overrides?

    add r1, r2, r1
    ldr r2, [r1, #0]
    ldr r1, [r1, #4]
    orr r3, r1, r2

Data Dependencies

• Assume all data memory values are 0’s.

• Assume:
    – r0 = 0
    – r1 = -1
    – r2 = 31
    – r3 = 1500

(500) The hazard detection unit assumes forwarding was implemented, but the processor designers (UF students) forgot to implement it!

What are the final register values? What should they be?

Add NOPs to this sequence to ensure correct execution despite UF’s screw-up!

    add r1, r2, r1
    ldr r2, [r1, #0]
    ldr r1, [r1, #4]
    orr r3, r1, r2

BRANCH PREDICTION

Branch Prediction

(100) When building a branch prediction unit, state for each of the following cases whether the best prediction is “branch not taken” or “branch taken”:

1. Branches associated with “If” statements
2. Branches associated with “Else if” statements
3. Branches associated with “Else” statements
4. Branches associated with “For” statements

Branch Prediction

(200) Design a dynamic branch predictor for if statements and loops. Describe how to implement it in hardware. What new hardware might it require?
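One common design to compare your answer against is a table of 2-bit saturating counters indexed by low bits of the branch address. The sketch below models that scheme in Python. It is one possible predictor, not necessarily the one intended for this puzzle. The table size, the initial counter value, and the class and function names are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (0-1: not taken, 2-3: taken)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries      # assumed start: weakly not taken

    def _index(self, pc):
        # Drop the byte offset of a 4-byte instruction, then wrap to the table.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(predictor, pc, outcomes):
    """Run a sequence of actual outcomes and count correct predictions."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


# A loop branch that is taken nine times, then falls through once.
loop_outcomes = [True] * 9 + [False]
first_pass = count_correct(TwoBitPredictor(), 0x20, loop_outcomes)
```

In hardware, this corresponds to a small RAM of 2-bit counters, an index taken from the program counter, and increment/decrement logic driven by the resolved branch outcome. Under the assumed starting state, the first pass through the loop mispredicts only the first and last iterations.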

Branch Prediction

• Assume branch prediction is handled by branch not taken.

• Assume one element of the array at r2 is equal to 100.

(300) How many times is the branch predicted correctly versus incorrectly?

    mov r1, #0
    mov r2, #DEADBEEF
LOOP:
    ldr r3, [r2, r0 lsl 2]
    cmp r3, #100
    beq LABEL
    mov r4, r3
LABEL:
    add r0, r0, #1
    cmp r0, #5
    beq LOOP
    mov r0, r4
    add r0, r0, #1

Branch Prediction

• Assume branch prediction is handled by branch not taken.

• Assume one element of the array at r2 is equal to 100.

• Assume the PC pipeline is three instructions deep

• Assume the PC pipeline can be flushed in one cycle, and on a misprediction must be fully flushed.

• Assume a pipeline with the phases: Fetch, Decode, Issue, Execute, Memory, and Writeback

• Assume branches are evaluated in the issue step, and the pipeline flushed during execute

(400) How many cycles does the loop take?

    mov r1, #0
    mov r2, #DEADBEEF
LOOP:
    ldr r3, [r2, r0 lsl 2]
    cmp r3, #100
    beq LABEL
    mov r4, r3
LABEL:
    add r0, r0, #1
    cmp r0, #5
    beq LOOP
    mov r0, r4
    add r0, r0, #1

Branch Prediction

• Assume branch prediction is handled by branch not taken.

• Assume the PC pipeline is three instructions deep

• Assume the PC pipeline can be flushed in one cycle, and on a misprediction must be fully flushed.

• Assume a pipeline with the phases: Fetch, Decode, Issue, Execute, Memory, and Writeback

• Assume branches are evaluated in the issue step, and the pipeline flushed during execute

(500) Act as the compiler. Optimize the code for branch not taken. How many cycles does it take?

    mov r1, #0
    mov r2, #DEADBEEF
LOOP:
    ldr r3, [r2, r0 lsl 2]
    cmp r3, #100
    beq LABEL
    mov r4, r3
LABEL:
    add r0, r0, #1
    cmp r0, #5
    beq LOOP
    mov r0, r4
    add r0, r0, #1

PROCESSOR ARCHITECTURE

Processor Architecture

(100) For a five-stage pipeline with stages Fetch, Decode, Execute, Memory, and Writeback, describe what happens in each stage.

Processor Architecture

(200) Describe the purpose of a clock signal in a processor. Why do processors need clock signals?

Processor Architecture

(300) Describe how registers are selected from the register file during the Decode phase. How is this accomplished in hardware?

Processor Architecture

(400) Why must we allocate new registers in the datapath for the writeback register instead of reading it from the decode phase?

Processor Architecture

(500) Design a one-bit full adder.
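To check a gate-level design, you can model it and compare against integer addition for all eight input combinations. The sketch below is a Python model of the standard two-half-adder construction (two XOR gates, two AND gates, one OR gate). The function name is hypothetical, and other gate arrangements are equally valid answers.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR, AND, and OR operations.

    All inputs and outputs are 0 or 1.
    """
    partial = a ^ b                              # first half adder: sum
    total = partial ^ carry_in                   # second half adder: sum
    carry_out = (a & b) | (partial & carry_in)   # either half adder carries
    return total, carry_out


# Exhaustive truth table: (a, b, carry_in) -> (sum, carry_out)
truth_table = {
    (a, b, c): full_adder(a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
}
```

Each row should satisfy a + b + carry_in = sum + 2 × carry_out, which is what the design has to guarantee.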

REPRESENTATION OF DATA

Representation of Data

(100) Describe the difference between big endian and little endian representations.
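As a quick illustration of byte order, the Python sketch below lays out a 32-bit value in both formats. The value 0x0A0B0C0D and the function name are chosen only for illustration and are not from the slides. The 4-byte width is also an assumption.

```python
def byte_layout(value, width=4):
    """Return the bytes of `value` in memory order, lowest address first,
    for big-endian and little-endian storage."""
    big = value.to_bytes(width, "big")
    little = value.to_bytes(width, "little")
    return ([f"{b:02x}" for b in big], [f"{b:02x}" for b in little])


big_endian, little_endian = byte_layout(0x0A0B0C0D)
# big_endian    -> ['0a', '0b', '0c', '0d']  (most significant byte first)
# little_endian -> ['0d', '0c', '0b', '0a']  (least significant byte first)
```

The numeric value is the same in both cases. Only the order in which its bytes appear at increasing memory addresses differs.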

Representation of Data

(200) Represent the following data in big endian and little endian formats:

1. 00ac8eff

2. 54897743

3. be88fac8

Representation of Data

(300) Represent the following data as hexadecimal numbers in big and little endian formats. Assume unsigned integers.

1. 128

2. 976

Representation of Data

(400) Represent the following data as hexadecimal numbers in big and little endian formats. Assume signed integers.

1. -55

2. 99
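For signed values, the usual encoding is two's complement. The Python sketch below encodes an integer in both byte orders. It assumes a 32-bit word, which the slides do not state, and uses example values that are not from the puzzle. The function name is hypothetical.

```python
def encode(value, width=4):
    """Return (big_endian_hex, little_endian_hex) for a two's complement
    integer stored in `width` bytes."""
    mask = (1 << (8 * width)) - 1
    pattern = value & mask            # wraps negatives into two's complement
    return (
        pattern.to_bytes(width, "big").hex(),
        pattern.to_bytes(width, "little").hex(),
    )


negative_two = encode(-2)    # ('fffffffe', 'feffffff')
three_hundred = encode(300)  # ('0000012c', '2c010000')
```

Masking with 2^32 - 1 produces the two's complement bit pattern for a negative number. Non-negative values that fit in the word pass through unchanged.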

Representation of Data

(500) Write assembly code which takes data from one register in Big Endian format and stores it in a new register in Little Endian format.

You may use temporary registers.
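Before writing the assembly, it may help to pin down the operation itself. The Python sketch below reverses the bytes of a 32-bit word using only masks, shifts, and ORs, which map onto ordinary assembly instructions. The function name and the 32-bit width are assumptions. Translating it to ARM with temporary registers is the actual puzzle.

```python
def swap32(word):
    """Reverse the four bytes of a 32-bit word."""
    b0 = (word & 0x000000FF) << 24   # lowest byte moves to the top
    b1 = (word & 0x0000FF00) << 8    # second byte moves up one position
    b2 = (word >> 8) & 0x0000FF00    # third byte moves down one position
    b3 = (word >> 24) & 0x000000FF   # top byte moves to the bottom
    return b0 | b1 | b2 | b3


swapped = swap32(0x0A0B0C0D)   # 0x0D0C0B0A
```

Applying the swap twice returns the original word, which is a handy sanity check for any implementation.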

FINAL QUESTION

Final Question

• Each team should decide an amount of points to bid.

• Write down your bids on a sheet of paper and hand them in.

• You will have only 60 seconds to answer the next question as a team. Write your answers down before the time limit.
    – Answer correctly and you will add your bid to your score.
    – Answer incorrectly and you will lose those points.

Final Question

In order to detect data hazards, new hardware must be added. Assuming that the register IDs involved in an instruction are available during the decode stage, what hardware would be necessary to check for data hazards?
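One way to reason about this is to model the comparison the hardware would have to make. The Python sketch below compares the source register IDs of the instruction in Decode against the destination register IDs held by instructions further down the pipeline, with one equality check per pair. It is a sketch under assumptions, not the expected answer. The stage names, the dictionary layout, and the function name are hypothetical.

```python
def detect_raw_hazards(decode_sources, pending_dests):
    """Find source registers that match a destination still in flight.

    decode_sources: register IDs read by the instruction in Decode.
    pending_dests:  maps a later stage name to the register ID that the
                    instruction there will write, or None if it writes none.
    Returns a list of (register_id, stage) matches.
    """
    hits = []
    for stage, dest in pending_dests.items():
        for src in decode_sources:
            # Each test here stands for one equality comparator in hardware.
            if dest is not None and src == dest:
                hits.append((src, stage))
    return hits


# Decode reads r1 and r2; the instruction in EX will write r1.
hazards = detect_raw_hazards([1, 2], {"EX": 1, "MEM": None, "WB": 3})
```

In this model, an instruction with two sources checked against three later stages needs six comparators. Their outputs would then drive stall or forwarding control.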

WRAP UP

For next time

• Enjoy your spring break!

• Read Chapter 5, sections 5.1 – 5.3