recap - indiana university bloomington · recap b649 parallel architectures and programming. b629:...

Post on 19-Jul-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

RECAPB649

Parallel Architectures and Programming

B629: Practical Compiling for Modern Machines

RECAP

2

B629: Practical Compiling for Modern Machines

Recap

• ILP• Exploiting ILP• Dynamic scheduling• Thread-level Parallelism• Memory Hierarchy• Other topics through student presentations• Virtual Machines

3

B629: Practical Compiling for Modern Machines

ILP: Pipelining

4

B629: Practical Compiling for Modern Machines

Pipelining: Adding Latches

5

B629: Practical Compiling for Modern Machines

Pipelining: Adding Forwarding

6

B629: Practical Compiling for Modern Machines

Pipelining: Adding Branch Delay Slots

7

B629: Practical Compiling for Modern Machines

Extending the Basic Pipeline

8

B629: Practical Compiling for Modern Machines

Extending the Basic Pipeline

8

B629: Practical Compiling for Modern Machines

Extending the Basic Pipeline

8

B629: Practical Compiling for Modern Machines

Exploiting ILP Through Compiler Techniques

9

• Loop unrolling• Making use of branch delayed slots• Static branch prediction• Loop fusion• Unroll and jam• ...

B629: Practical Compiling for Modern Machines

Dynamic Branch Prediction

10

bits to index BPB

Address bits

BranchPredictionBuffer1

0

10

B629: Practical Compiling for Modern Machines

Two-bit Branch Predictor

11

B629: Practical Compiling for Modern Machines

General n-bit Correlating Branch Predictors

12

bits to index BPB

Address bits

BranchPredictionBuffer

global shift register (m bits)

n bits

Use Branch Target Buffers (BTBs) for caching branch targets

B629: Practical Compiling for Modern Machines

Dynamic Scheduling: Tomasulo’s Approach

13

B629: Practical Compiling for Modern Machines

Tomasulo’s Approach: Observations

• RAW hazards handled by waiting for operands• WAR and WAW hazards handled by register

renaming★ only WAR and WAW hazards between instructions

currently in the pipeline are handled; is this a problem?★ larger number of hidden names reduces name dependences

• CDB implements forwarding

14

B629: Practical Compiling for Modern Machines

Tomasulo’s Approach + Speculation

15

Fields in ROB1. Instruction type2. Destination3. Value4. Ready

B629: Practical Compiling for Modern Machines

Observations on Speculation

16

• Speculation enables precise exception handling★ defer exception handling until instruction ready to commit

• Branches are critical to performance★ prediction accuracy★ latency of misprediction detection★ misprediction recovery time

•Must avoid hazards through memory ★WAR and WAW already taken care of (how?)★ for RAW

✴ don’t allow load to proceed if an active ROB entry has Destination field matching with A field of load

✴ maintain program order for effective address computation (why?)

B629: Practical Compiling for Modern Machines

Multiple Issue Processor Types

17

B629: Practical Compiling for Modern Machines

Dyn. Scheduling+Multiple Issue+Speculation

18

• Design parameters★ two-way issue (two instruction issues per cycle)★ pipelined and separate integer and FP functional units★ dynamic scheduling, but not out-of-order issue★ speculative execution

• Task per issue: assign reservation station and update pipeline control tables (i.e., control signals)

• Two possible techniques★ do the task in half a clock cycle★ build wider logic to issue any pair of instructions together

• Modern processors use both (4 or more way superscalar)

B629: Practical Compiling for Modern Machines

Shared-Memory Multiprocessors

19

B629: Practical Compiling for Modern Machines

Distributed-Memory Multiprocessors

20

B629: Practical Compiling for Modern Machines

Other Ways to Categorize Parallel Programming

21

data vs task parallel

client-sever vs p-to-p /

master-slave vs symm.

shared m

em vs m

sg

passing

tight

ly v

s lo

osel

y co

uple

dthreads vs

producer-consumer

course vs fine grained

SPM

D vs

MPM

Drecursive vs iterative

synch. vs asynch.

Parallel Programs

B629: Practical Compiling for Modern Machines

Write Invalidate Cache Coherence Protocolfor Write-Back Caches

22

B629: Practical Compiling for Modern Machines

Distributed Memory+Directories

23

B629: Practical Compiling for Modern Machines

Directory-Based Cache Coherence

24

B629: Practical Compiling for Modern Machines

Other Topics

• x86 assembly programming• VLIW / EPIC• Vector processors• Embedded systems• Scientific applications• GPUs and GPGPUs• CUDA and OpenCL• Interconnection networks• Virtualization

25

B629: Practical Compiling for Modern Machines

WHAT’S NEXT?

26

B629: Practical Compiling for Modern Machines

Future

• Continued importance of parallel programming★ challenge: how to program multiprocessors★ role of programming languages and compilers

• Convergence or specialization?★ “standardization” of general purpose architecture★ migration of “special-purpose” CPUs for general use

27

B629: Practical Compiling for Modern Machines

Landscape of Parallel Computing Research:A View from Berkeley

28

top related