single-chip multiprocessors: the rebirth of parallel architecture guri sohi university of wisconsin

41
Single-Chip Single-Chip Multiprocessors: the Multiprocessors: the Rebirth of Parallel Rebirth of Parallel Architecture Architecture Guri Sohi University of Wisconsin

Post on 20-Dec-2015

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

Single-Chip Multiprocessors: the Single-Chip Multiprocessors: the Rebirth of Parallel ArchitectureRebirth of Parallel Architecture

Guri Sohi

University of Wisconsin

Page 2: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

2

OutlineOutline

Waves of innovation in architecture Innovation in uniprocessors Lessons from uniprocessors Future chip multiprocessor architectures Software and such things

Page 3: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

3

Waves of Research and InnovationWaves of Research and Innovation

A new direction is proposed or new opportunity becomes available

The center of gravity of the research community shifts to that direction SIMD architectures in the 1960s HLL computer architectures in the 1970s RISC architectures in the early 1980s Shared-memory MPs in the late 1980s OOO speculative execution processors in the 1990s

Page 4: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

4

WavesWaves

Wave is especially strong when coupled with a “step function” change in technology Integration of a processor on a single chip Integration of a multiprocessor on a chip

Page 5: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

5

Uniprocessor Innovation Wave: Part 1Uniprocessor Innovation Wave: Part 1

Many years of multi-chip implementations Generally microprogrammed control Major research topics: microprogramming,

pipelining, quantitative measures Significant research in multiprocessors

Page 6: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

6

Uniprocessor Innovation Wave: Part 2Uniprocessor Innovation Wave: Part 2

Integration of processor on a single chip The inflexion point Argued for different architecture (RISC)

More transistors allow for different models Speculative execution

Then the rebirth of uniprocessors Continue the journey of innovation Totally rethink uniprocessor microarchitecture Jim Keller: “Golden Age of Microarchitecture”

Page 7: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

7

Uniprocessor Innovation Wave: ResultsUniprocessor Innovation Wave: Results

Current uniprocessor very different from 1980’s uniprocessor

Uniprocessor research dominates conferences

MICRO comes back from the dead Top 1% (NEC Citeseer)

Impact on compilers Source: Rajwar and Hill, 2001

Page 8: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

8

Why Uniprocessor Innovation Wave?Why Uniprocessor Innovation Wave?

Innovation needed to happen Alternatives (multiprocessors) not practical

option for using additional transistors Innovation could happen: things could be

done differently Identify barriers (e.g., to performance) Use transistors to overcome barriers (e.g., via

speculation) Simulation tools facilitate innovation

Page 9: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

9

Lessons from UniprocessorsLessons from Uniprocessors

Don’t underestimate what can be done in hardware Doing things in software was considered easy;

in hardware considered hard Now perhaps the opposite

Barriers or limits become opportunities for innovation Via novel forms of speculation E.g., barriers in Wall’s study on limits of ILP

Page 10: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

10

Multiprocessor ArchitectureMultiprocessor Architecture

A.k.a. “multiarchitecture” of a multiprocessor Take state-of-the-art uniprocessor Connect several together with a suitable

network Have to live with defined interfaces

Expend hardware to provide cache coherence and streamline inter-node communication Have to live with defined interfaces

Page 11: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

11

Software ResponsibilitiesSoftware Responsibilities

Have software figure out how to use MP Reason about parallelism Reason about execution times and overheads Orchestrate parallel execution Very difficult for software to parallelize

transparently

Page 12: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

12

Explicit Parallel ProgrammingExplicit Parallel Programming

Have programmer express parallelism Reasoning about parallelism is hard Use synchronization to ease reasoning

Parallel trends towards serial with the use of synchronization

Page 13: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

13

Net ResultNet Result

Difficult to get parallelism speedup Computation is serial Inter-node communication latencies

exacerbate problem Multiprocessors rarely used for parallel

execution Used to run threaded programs

Lower-overhead sync would help Used to improve throughput

Page 14: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

14

The Inflexion Point for MultiprocessorsThe Inflexion Point for Multiprocessors

• Can put a basic small-scale MP on a chip

• Can think of alternative ways of building multiarchitecture• Don’t have to work with defined interfaces!

• What opportunities does this open up?• Allows for parallelism to get performance.

• Allows for use of novel techniques to overcome (software and hardware) barriers

• Other opportunities (e.g., reliability)

Page 15: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

15

Parallel SoftwareParallel Software

Needs to be compelling reason to have a parallel application

Won’t happen if difficult to create Written by programmer or automatically

parallelized by compiler Won’t happen if insufficient performance gain

Page 16: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

16

Changes in MP MultiarchitectureChanges in MP Multiarchitecture

Inventing new functionality to overcome barriers Consider barriers as opportunities

Developing new models for using CMPs Revisiting traditional use of MPs

Page 17: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

17

Speculative MultithreadingSpeculative Multithreading

Speculatively parallelize an application Use speculation to overcome ambiguous

dependences Use hardware support to recover from mis-

speculation E.g., multiscalar Use hardware to overcome limitations

Page 18: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

18

Overcoming Barriers: Memory ModelsOvercoming Barriers: Memory Models

Weak models proposed to overcome performance limitations of SC

Speculation used to overcome “maybe” dependences

Series of papers showing SC can achieve performance of weak models

Page 19: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

19

ImplicationsImplications

Strong memory models not necessarily low performance

Programmer does not have to reason about weak models

More likely to have parallel programs written

Page 20: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

20

Overcoming Barriers: SynchronizationOvercoming Barriers: Synchronization

Synchronization to avoid “maybe” dependences Causes serialization

Speculate to overcome serialization Recent work on techniques to dynamically

elide synchronization constructs

Page 21: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

21

ImplicationsImplications

Programmer can make liberal use of synchronization to ease programming

Little performance impact of synchronization More likely to have parallel programs written

Page 22: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

22

Overcoming Barriers: CoherenceOvercoming Barriers: Coherence

Caches used to reduce data access latency; need to be kept coherent

Latencies of getting value from remote location impact performance

Getting remote value is two-part operation Get value Get permissions to use value Can separating these help?

Page 23: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

23

Coherence DecouplingCoherence Decoupling

Sequential execution

Speculativeexecution

Time

Time

Coherencemiss detected

Valuepredicted

Permissiongranted

Value arrivesand verified

Retried (onmisspeculation)

Coherencemiss detected

Permissionand value arrive

Miss latency

Worst case latency

Best case latency

Page 24: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

24

Zeros/Ones in Coherence MissesZeros/Ones in Coherence Misses

Preliminary Data, Simics (Ultrasparc/Solaris, 16P), Cache (4MB 4-way SA L2, 64B lines, MOSI)

Load Value 1 0 -1 TotalOLTP 5.1% 32.3% 9.9% 47.3%Apache 0.7% 27.8% 0.5% 29.0%JBB 18.6% 12.6% 0.5% 31.7%Barnes 1.7% 34.1% 1.7% 37.5%Ocean 1.3% 25.9% 1.7% 28.9%Store Value 1 0 -1 TotalOLTP 7.1% 29.1% 13.1% 49.4%Apache 3.4% 17.0% 7.4% 27.7%JBB 0.7% 21.3% 1.4% 23.4%Barnes 1.8% 29.8% 17.2% 48.7%Ocean 13.2% 4.9% 3.5% 21.5%

Page 25: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

25

Other Performance OptimizationsOther Performance Optimizations

Clever techniques for inter-processor communication Remember: no artificial constraints on chip

Further reduction of artificial serialization

Page 26: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

26

New Uses for CMPsNew Uses for CMPs

Helper threads, redundant execution, etc.

• will need extensive research in the context of CMPs

How about trying to parallelize application, i.e., “traditional” use of MPs?

Page 27: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

27

Revisiting Traditional Use of MPsRevisiting Traditional Use of MPs

Compilers and software for MPs Digression: Master/Slave Speculative

Parallelization (MSSP) Expectations for future software Implications

Page 28: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

28

Parallelizing Apps: A Moving TargetParallelizing Apps: A Moving Target

Learned to reason about certain languages, data structures, programming constructs and applications

Newer languages, data structures, programming constructs and applications appear

Always playing catch up Can we get a jump ahead?

Page 29: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

29

Master/Slave Speculative Parallelization (MSSP)Master/Slave Speculative Parallelization (MSSP)

Take a program and create program with two sub-programs: Master and Slave

Master program is approximate (or distilled) version of original program

Slave (original) program “checks” work done by master

Portions of the slave program execute in parallel

Page 30: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

30

MSSP - OverviewMSSP - Overview

A

B

C

D

E

MasterSlave

Slave

Slave

Original Code

AD

BD

CD

DD

ED

Distilled Code

Distilled Code on Master

Original Code concurrently on Slaves verifies Distilled Code

Use checkpoints to communicate changes

Page 31: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

31

MSSP - DistillerMSSP - Distiller

Program with many paths

Page 32: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

32

MSSP - DistillerMSSP - Distiller

Program with many paths

Dominant paths

Page 33: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

33

MSSP - DistillerMSSP - Distiller

Program with many paths

Dominant paths

Page 34: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

34

MSSP - DistillerMSSP - Distiller

Program with many paths

Dominant paths

Page 35: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

35

MSSP - DistillerMSSP - Distiller

Approximate important pathsCompiler Optimizations

Distilled Code

Page 36: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

36

MSSP - ExecutionMSSP - Execution

BBD

DDD

A AD

C CD

EDE

Slave Slave Slave Master

Verify checkpoint of AD -- CorrectVerify checkpoint of BD -- CorrectVerify checkpoint of CD -- CorrectVerify checkpoint of DD -- Wrong

EDE

Restart E and ED

Page 37: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

37

MSSP SummaryMSSP Summary

Distill away code that is unlikely to impact state used by later portions of program

Performance tracks distillation ratio Better distillation = better performance

Verification of distilled program done in parallel on slaves

Page 38: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

38

Future Applications and SoftwareFuture Applications and Software

What will future applications look like? Don’t know

What language will they be written in? Don’t know; don’t care

Code for future applications will have “overheads” Overheads for checking for correctness Overheads for improving reliability Overheads for checking security

Page 39: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

39

Overheads as an OpportunityOverheads as an Opportunity

Performance costs of overhead have limited their use Overheads not a limit; rather an opportunity Run overhead code in parallel with non-overhead

code Develop programming models Develop parallel execution models (a la MSSP) Recent work in this direction

Success at reducing overhead cost will encourage even more use of “overhead” techniques

Page 40: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

40

SummarySummary

New opportunities for innovation in MPs Expect little resemblance between MPs today

and CMPs in 15 years We need to invent and define differences Not because uniprocessors are running out of

steam But because innovation in CMP

multiarchitecture possible

Page 41: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin

41

SummarySummary

Novel techniques for attacking performance limitations

New models for expressing work (computation and overhead)

New parallel processing models Simulation tools