Why Intel is designing multi-core processors

Geoff Lowney

Intel Fellow, Microprocessor Architecture and Planning

July, 2006

Moore’s Law at Intel 1970-2005

[Figure: four log-scale trend plots, each annotated with a growth factor. Transistor Count, 1970 to 2020 (2.0x). Frequency (MHz), 1970 to 2020 (1.5x, 2.0x). CPU Power (W), 1990 to 2015 (1.4x). Feature Size (um), 1970 to 2020 (0.70x, 0.79x).]

Power trend not sustainable

386 Processor
• May 1986 @ 16 MHz core
• 275,000 1.5µ transistors
• ~1.2 SPECint2000

Pentium® 4 Processor
• August 27, [email protected] GHz core
• 55 Million 0.13µ transistors
• 1249 SPECint2000

17 Years
• Core frequency: 200x
• Transistor count / feature size: 200x / 11x
• SPECint2000: 1000x

[Figure: SPECint2000 and clock frequency versus introduction date, Jan-85 to Jan-05, both on log scales from 1 to 10000.]

Performance: 1000x

Frequency: 200x

SPECint2000/MHz (normalized)

[Figure: normalized SPECint2000/MHz (0 to 7) versus introduction date, Jan-85 to Jan-05, for the 386, 486, 486DX2, P5, P55-MMX, PPro/II, PIII, and P4.]

[Figure: two bar charts across four microarchitecture steps (Pipelined, S-Scalar, OOO-Spec, Deep Pipe). The first shows the increase (X) in area and performance; the second shows the increase (X) in power and the change in Mips/W (%).]

Performance scales with area**.5

Power efficiency has dropped

Pushing Frequency

Pipeline & Performance
[Figure: performance (0 to 10) versus relative frequency from pipelining (1 to 10), annotated "Diminishing Return".]

Maximized frequency by
• Deeper pipelines
• Pushed process to the limit

Process Technology
[Figure: sub-threshold leakage (0 to 10) versus relative frequency (1 to 10). Sub-threshold leakage increases exponentially.]
• Dynamic power increases with frequency
• Leakage power increases with reducing Vt

Diminishing return on performance. Increase in power.

Moore’s Law at Intel 1970-2005

[Figure: the four trend plots shown again: Transistor Count, Frequency (MHz), CPU Power (W), Feature Size (um).]

Power trend not sustainable

Reducing power with voltage scaling

Power = Capacitance * Voltage**2 * Frequency

Frequency ~ Voltage in region of interest

Power ~ Voltage ** 3

10% reduction of voltage yields

• 10% reduction in frequency

• 30% reduction in power

• Less than 10% reduction in performance

Rule of Thumb (percentage change in each quantity)

Voltage   Frequency   Power   Performance
1%        1%          3%      0.66%
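The arithmetic behind this slide can be checked with a short sketch. It assumes the slide's own model (power = capacitance * voltage**2 * frequency, with frequency proportional to voltage). The function name `scale_voltage` and the linear use of the 0.66 performance factor from the rule-of-thumb table are illustrative assumptions, not something the talk specifies.

```python
def scale_voltage(v_ratio, perf_per_freq=0.66):
    """Return (frequency, power, performance) relative to a baseline of 1.0.

    v_ratio is the new voltage divided by the old voltage.
    perf_per_freq is an assumed linear factor: 1% less frequency
    costs about 0.66% performance, as in the rule-of-thumb table.
    """
    freq = v_ratio                              # frequency ~ voltage
    power = v_ratio ** 2 * freq                 # power ~ V**2 * f ~ V**3
    perf = 1.0 - perf_per_freq * (1.0 - freq)   # linear approximation
    return freq, power, perf


freq, power, perf = scale_voltage(0.90)
print(f"frequency reduction:   {1 - freq:.1%}")
print(f"power reduction:       {1 - power:.1%}")
print(f"performance reduction: {1 - perf:.1%}")
```

Under these assumptions the exact cubic gives a power reduction of about 27% for a 10% voltage drop; the slide's 30% comes from the linearized rule of 3% power per 1% voltage.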

Dual core with voltage scaling

RULE OF THUMB: a 15% reduction in voltage yields

Frequency Reduction   Power Reduction   Performance Reduction
15%                   45%               10%

SINGLE CORE: Area = 1, Voltage = 1, Freq = 1, Power = 1, Perf = 1

DUAL CORE: Area = 2, Voltage = 0.85, Freq = 0.85, Power = 1, Perf = ~1.8
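The dual-core figures follow from applying the rule of thumb to each core and adding the results. The sketch below is a minimal illustration, assuming both cores are kept fully busy by a perfectly parallel workload; the function name `scaled_multicore` and its parameters are hypothetical.

```python
def scaled_multicore(cores=2, power_reduction=0.45, perf_reduction=0.10):
    """Total (power, performance) of `cores` voltage-scaled cores.

    Both results are relative to one unscaled core. The default
    reductions are the slide's rule of thumb for a 15% voltage drop.
    Assumes the workload parallelizes perfectly across the cores.
    """
    power_per_core = 1.0 - power_reduction
    perf_per_core = 1.0 - perf_reduction
    return cores * power_per_core, cores * perf_per_core


total_power, total_perf = scaled_multicore()
print(f"dual core power:       {total_power:.2f}")
print(f"dual core performance: {total_perf:.2f}")
```

With these inputs the pair of cores draws about 1.1 times the power of the single core, which the slide rounds to 1, while delivering about 1.8 times the performance.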

Multiple cores deliver more performance per watt

[Figure: a big core with its cache compared with four small cores (C1 to C4) sharing a cache, with bar charts of power and performance for each design.]

Small core relative to the big core: Power = ¼, Performance = 1/2

Power ~ area

Single thread performance ~ area**.5

Many core is more power efficient
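The two scaling rules on this slide can be combined in a few lines. This is a sketch under stated assumptions: power is exactly proportional to area, single-thread performance is exactly the square root of area, and total throughput is the sum over cores (a perfectly parallel workload). The names `single_thread_perf` and `chip` are illustrative.

```python
import math


def single_thread_perf(area):
    """Single-thread performance ~ area**0.5 (the slide's rule)."""
    return math.sqrt(area)


def chip(cores, area_per_core):
    """Return (power, throughput) for a chip of identical cores.

    Power ~ total core area. Throughput assumes every core is
    kept busy, which only holds for parallel workloads.
    """
    power = cores * area_per_core
    throughput = cores * single_thread_perf(area_per_core)
    return power, throughput


big = chip(cores=1, area_per_core=1.0)
small = chip(cores=4, area_per_core=0.25)
print("one big core:     power, throughput =", big)
print("four small cores: power, throughput =", small)
```

In this model four quarter-size cores use the same power as the big core and deliver twice the throughput, at the cost of halving single-thread performance.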

Memory Gap

Growing Performance Gap
[Figure: peak instructions per DRAM access (0 to 700), 1992 to 2002, for the Pentium 66MHz, Pentium-Pro 200MHz, Pentium III 1100MHz, and Pentium 4 2GHz, showing a widening gap between LOGIC and MEMORY.]

Reduce DRAM access with large caches
• Extra benefit: power savings. Cache is lower power than logic

Tolerate memory latency with multiple threads
• Multiple cores
• Hyper-threading

Multi-threading tolerates memory latency

Serial Execution
  Ai, Idle, Ai+1, Bi, Idle, Bi+1

Multi-threaded Execution
  Thread A: Ai, Idle, Ai+1
  Thread B: Bi, Bi+1

Execute thread B while thread A waits for memory

Multi-core has a similar effect

Multi-core tolerates memory latency

Serial Execution
  Ai, Idle, Ai+1, Bi, Idle, Bi+1

Multi-core Execution
  Thread A: Ai, Idle, Ai+1
  Thread B: Bi, Idle, Bi+1

Execute thread A and B simultaneously
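The three execution timelines can be compared with a toy timing model. This is a sketch, not a description of any real scheduler: each thread is a list of compute bursts separated by a fixed memory wait, each thread is assumed to have at least one burst, thread switches are free, and the function names are hypothetical.

```python
def serial_time(threads, mem_latency):
    """One core, one thread at a time; the core idles during every memory wait."""
    return sum(sum(bursts) + mem_latency * (len(bursts) - 1) for bursts in threads)


def multithreaded_time(threads, mem_latency):
    """One core that runs another ready thread while the current one waits."""
    ready_at = [0.0] * len(threads)   # time at which each thread can run again
    next_burst = [0] * len(threads)   # index of each thread's next compute burst
    now = 0.0
    while True:
        pending = [i for i, bursts in enumerate(threads) if next_burst[i] < len(bursts)]
        if not pending:
            return now
        i = min(pending, key=lambda j: ready_at[j])  # earliest-ready thread runs next
        now = max(now, ready_at[i])                  # idle only if no thread is ready
        now += threads[i][next_burst[i]]             # run the compute burst
        next_burst[i] += 1
        ready_at[i] = now + mem_latency              # then the thread waits for memory


def multicore_time(threads, mem_latency):
    """One core per thread; all threads run simultaneously."""
    return max(sum(bursts) + mem_latency * (len(bursts) - 1) for bursts in threads)


# Threads A and B: two unit-length bursts each, with one memory wait in between.
threads = [[1, 1], [1, 1]]
for model in (serial_time, multithreaded_time, multicore_time):
    print(model.__name__, model(threads, mem_latency=1))
```

For this input the model gives 6 time units for serial execution, 4 for multi-threaded execution on one core, and 3 for two cores. Multi-threading removes the idle gaps; multi-core overlaps the threads entirely.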

Core Area (with L1 caches) Trend

[Figure: core area in mm^2 (0 to 350) versus introduction date, 1970 to 2005, for the 4004, 386, 486, Pentium processor, PPro, Pentium II processor, Pentium 4 processor, McKinley, Prescott, Banias, Yonah, and Merom.]

Die area/core shrinking after peaking with PPro

Why shrinking? Diminishing returns on performance.

Interconnect. Caches. Complexity.

Reliability in the long term

Soft Error FIT/Chip (Logic & Mem)
[Figure: relative soft error FIT/chip (0 to 150) by process generation: 180, 130, 90, 65, 45, 32, 22, 16.]
~8% degradation/bit/generation

Extreme device variations
[Figure: relative distribution (0 to 100) versus Vt (mV), 100 to 200, annotated "Wider".]

Time dependent device degradation
[Figure: relative Ion (0 to 1) versus time (1 to 10).]

In future process generations, soft and hard errors will be more common.

Redundant multi-threading: an architecture for fault detection and recovery

Two copies of each architecturally visible thread

Compare results: signal fault if different

[Figure: a Sphere of Replication containing the Leading Thread and the Trailing Thread, with Input Replication and Output Comparison between the sphere and the Memory System.]

Multi-core enables many possible designs for redundant threading.
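The compare-and-signal idea can be illustrated in software. The sketch below is only an analogy under stated assumptions: the talk describes a hardware architecture, while this runs two copies of an ordinary Python function on the same inputs and compares the outputs. `run_redundant` and `FaultDetected` are hypothetical names, and recovery is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


class FaultDetected(Exception):
    """Raised when the two redundant copies disagree."""


def run_redundant(fn, *inputs):
    """Run two copies of fn on replicated inputs and compare their outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        leading = pool.submit(fn, *inputs)    # input replication: same arguments
        trailing = pool.submit(fn, *inputs)
        a, b = leading.result(), trailing.result()
    if a != b:                                # output comparison
        raise FaultDetected(f"results differ: {a!r} != {b!r}")
    return a


print(run_redundant(sum, [1, 2, 3]))
```

A real design compares results before they leave the sphere of replication, so that a fault is caught before it reaches the memory system; this sketch only compares final return values.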

Moore’s Law at Intel 1970-2005

[Figure: the four trend plots shown again: Transistor Count, Frequency (MHz), CPU Power (W), Feature Size (um).]

Multi-core addresses power, performance, memory, complexity, reliability

Moore’s Law will provide transistors

Intel process technology capabilities

High Volume Manufacturing                        2004   2006   2008   2010   2012   2014   2016   2018
Feature Size                                     90nm   65nm   45nm   32nm   22nm   16nm   11nm   8nm
Integration Capacity (Billions of Transistors)   2      4      8      16     32     64     128    256

[Figure: a transistor for the 90nm process, labeled 50nm (Source: Intel), shown beside an influenza virus (Source: CDC).]

Use transistors for multiple cores, caches, and new features.

Multi-Core Processors

• PENTIUM® M: Single Core
• WOODCREST: Dual Core
• CLOVERTOWN: Quad Core

Future Architecture: More Cores

Open issues

Cores
• How many?
• What size?
• Homogeneous?
• Heterogeneous?

On-die Interconnect
• Topology
• Bandwidth

Cache hierarchy
• Number of levels
• Sharing
• Inclusion

Scalability

[Figure: a many-core die drawn as an array of cores, labeled with: power delivery and management, high bandwidth memory, reconfigurable cache, scalable fabric, fixed-function units.]

The Importance of Threading

Do Nothing: Benefits Still Visible
• Operating systems ready for multi-processing
• Background tasks benefit from more compute resources
• Virtual machines

Parallelize: Unlock the Potential
• Native threads
• Threaded libraries
• Compiler generated threads

Performance Scaling

[Figure: performance (0 to 10) versus number of cores (0 to 30), with one curve per serial fraction.]

Amdahl’s Law: Parallel Speedup = 1/(Serial% + (1-Serial%)/N)

• Serial% = 6.7%: N = 16, N1/2 = 8 (16 Cores, Perf = 8)
• Serial% = 20%: N = 6, N1/2 = 3 (6 Cores, Perf = 3)

Parallel software key to Multi-core success
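The two data points on this slide can be reproduced directly from the formula. The sketch below is a plain transcription of Amdahl's Law as written on the slide; the function name `amdahl_speedup` is an illustrative choice, and the serial fraction is passed as a number between 0 and 1 rather than a percentage.

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Parallel speedup = 1 / (serial + (1 - serial) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


for serial, cores in [(0.067, 16), (0.20, 6)]:
    speedup = amdahl_speedup(serial, cores)
    print(f"serial = {serial:.1%}, N = {cores}: speedup = {speedup:.2f}")
```

Both cases land at half the core count: about 8 on 16 cores and 3 on 6 cores. As N grows, the speedup approaches 1/serial_fraction, so a 20% serial program can never exceed 5x no matter how many cores are added.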

How does Multicore Change Parallel Programming?

No change in fundamental programming model

Synchronization and communication costs greatly reduced

• Makes it practical to parallelize more programs

Resources now shared

• Caches

• Memory interface

• Optimization choices may be different

[Figure: SMP, with processors P1 to P4, each with its own cache, connected to Memory; and CMP, with cores C1 to C4, each with its own cache, connected to Memory.]

Threading for Multi-Core

• Architectural Analysis: Intel® VTune™ Analyzers
• Introducing Threads: Intel® C++ Compiler
• Debugging: Intel® Thread Checker
• Performance Tuning: Intel® Thread Profiler

Intel has a full set of tools for parallel programming

The Era Of Tera

Terabytes of data. Teraflops of performance.

• Improved medical care
• Machine vision
• Tele-present meetings
• Interactive learning
• Immersive 3D entertainment
• Virtual realities
• Scientific simulation
• Text mining

When personal computing finally becomes personal

Image credits: Steven K. Feiner, Columbia University. Electronic Visualization Laboratory, Univ. of Illinois at Chicago. Tsinghua University HPC Center.

Computationally intensive applications of the future will be highly parallel

21st Century Computer Architecture

(Slide from Patterson 2006)

Old CW: Since cannot know future programs, find set of old programs to evaluate designs of computers for the future

• E.g., SPEC2006

What about parallel codes?

• Few available, tied to old models, languages, architectures, …

New approach: Design computers of future for numerical methods important in future

Claim: key methods for next decade are 7 dwarves (+ a few), so design for them!

• Representative codes may vary over time, but these numerical methods will be important for > 10 years

Patterson, UC Berkeley, also predicts importance of parallel applications.

Phillip Colella’s “Seven dwarfs”

(Slide from Patterson 2006)

High-end simulation in the physical sciences = 7 numerical methods:

1. Structured Grids (including locally structured grids, e.g. Adaptive Mesh Refinement)
2. Unstructured Grids
3. Fast Fourier Transform
4. Dense Linear Algebra
5. Sparse Linear Algebra
6. Particles
7. Monte Carlo

If add 4 for embedded, covers all 41 EEMBC benchmarks

8. Search/Sort
9. Filter
10. Combinational logic
11. Finite State Machine

Note: Data sizes (8 bit to 32 bit) and types (integer, character) differ, but algorithms the same

Slide from “Defining Software Requirements for Scientific Computing”, Phillip Colella, 2004

Well-defined targets from algorithmic, software, and architecture standpoint

Note scientific computing, media processing, machine learning, statistical computing share many algorithms

Workload Acceleration

RMS Workload Speedup (Simulated)

[Figure: speedup (0 to 64) versus # of cores (0 to 64) for Recognition, Mining, and Synthesis workloads: Ray Tracing (Avg.), Mod. Levelset, FB_Est, Body Tracker, Gauss-Seidel, Sparse Matrix (Avg.), Dense Matrix (Avg.), Kmeans, SVD, SVM_classification, Cholesky (watson), Cholesky (mod2), Forward Solver (pds-10), Forward Solver (world), Backward Solver (pds-10), Backward Solver (watson). Two bands are annotated: 48x - 57x and 19x - 33x.]

Group I: Scale well with increasing core count
Examples: Ray tracing, body tracking, physical simulation

Group II: Worst-case scaling examples, yet still scale
Examples: Forward & backward solvers

Source: Intel Corporation

Tera-Leap to Parallelism: Energy Efficient Performance

[Figure: ENERGY-EFFICIENT PERFORMANCE versus TIME, rising through Instruction level parallelism and Hyper-Threading (single-core chips), then Dual Core, Quad-Core, and Tera-Scale Computing.]

More performance using less energy

Science fiction becomes A REALITY

Summary

Technology is driving Intel to build multi-core processors.
• Power
• Performance
• Memory latency
• Complexity
• Reliability

Parallel programming is a central issue.

Parallel applications will become mainstream