manycores in the future rob schreiber hp labs. dont forget these views are mine, not necessarily hps...


Upload: evelyn-davidson

Post on 26-Mar-2015


Page 1: Manycores in the Future Rob Schreiber hp labs. Dont Forget These views are mine, not necessarily HPs Never make forecasts, especially about the future

Manycores in the Future

Rob Schreiber

hp labs


Don’t Forget

These views are mine, not necessarily HP’s

Never make forecasts, especially about the future

― Sam Goldwyn


hp labs, 1939


HP / HP Labs Today

• World’s biggest technology company, 2006 sales $91B, #14 in the US.

• Printing, PCs, servers, software, services

• HP Labs has 700 researchers

−Palo Alto, Bristol, Haifa, Beijing, Bangalore, Tokyo, St. Petersburg

−Invests in medium and long-term research that has a good potential for return on the investment

−New director -- Prith Banerjee, dean of UIC College of Engineering

−www.hpl.hp.com


The Future. It seems clear that:

Single-thread performance is not getting better

All machines will be parallel

Further speedup will come to the extent that we can use the parallel hardware effectively

Parallelism has been a huge success in scientific computing

Communication bandwidth and energy efficiency are the key limits to improved performance

We should not make the next generation of parallel machines any harder to program than they are now


Moore’s Law

• Number of transistors per chip is 1.59^{year - 1959}

−Now slope is less; but we should see 10 -- 100X or more growth (65 nm – sub 10 nm)

• Classical performance scaling model – performance grows as O(n^3)

−With feature size scaling of n

•You get O(n^2) transistors

•They run O(n) times faster
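The two growth rules on this slide can be written out as a small sketch. The function names are illustrative and not from the talk; `n` is taken to be the linear feature-size shrink factor, which is an assumption about how the slide uses the symbol.

```python
def transistors_per_chip(year: int, base: float = 1.59, start: int = 1959) -> float:
    """Slide's fit: transistor count grows as base ** (year - start)."""
    return base ** (year - start)


def classical_scaling(n: float) -> dict:
    """Classical model: shrinking features by a factor n gives n^2 times
    as many transistors, each n times faster, so performance grows as n^3."""
    return {"transistors": n ** 2, "speed": n, "performance": n ** 3}


if __name__ == "__main__":
    # Relative growth over one decade under the 1.59x-per-year fit.
    decade = transistors_per_chip(1969) / transistors_per_chip(1959)
    print(f"10-year transistor growth: {decade:.0f}x")
    # A 2x shrink in the classical model.
    print(classical_scaling(2.0))
```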


How long will this last?

There’s no getting around the fact that we make these things out of atoms

– Gordon Moore


Single core/thread performance

Moore’s Law says the number of transistors scales as O(n^2) and speed as O(n)

Microprocessor performance should scale as O(n^3)

For quite some time, it hasn’t

[Figure: plot with labels "Number of Transistors", "(log) Performance", and "Efficiency"; regions marked N^3 Era, N^2 Era, N^1 Era, N^0, N^-1]


N^3 Era

Expansion of data paths from 4 to 32 bits

Pipelining, floating point hardware

N^2 Era

Large caches – miss rate ~ (cache size)^{-1/2}

Wide issue – double the IPC with quad issue

N^1 Era

Very little benefit from increases in issue width and cache size for many applications

Slowdown due to size, long wires


Microprocessor Power

• Figure source: Shekhar Borkar, “Low Power Design Challenges for the Decade”, Proceedings of the 2001 Conference on Asia South Pacific Design Automation, IEEE.


Voltage Scaling

Power is C V^2 f

Lowered voltage has reduced power (12/1.1)^2 = 119X over 24 years!

ITRS projects minimum voltage of 0.7V in 2018

Only (1.1/0.7)^2 = 2.5X reduction left in next 14 years!

Conclusion: Where GHz is concerned, we are close to the practical limit.
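The arithmetic behind these two ratios is easy to check. The sketch below assumes capacitance and frequency are held fixed, so that only the V^2 term changes; the function names are illustrative.

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """Dynamic power P = C * V^2 * f."""
    return c * v * v * f


def voltage_power_ratio(v_old: float, v_new: float) -> float:
    """Power reduction factor from lowering voltage at fixed C and f."""
    return (v_old / v_new) ** 2


# Voltages taken from the slide: 12 V historically, 1.1 V now, 0.7 V projected.
past = voltage_power_ratio(12.0, 1.1)
remaining = voltage_power_ratio(1.1, 0.7)

if __name__ == "__main__":
    print(f"reduction so far:    {past:.0f}X")
    print(f"reduction remaining: {remaining:.1f}X")
```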


How Big?

The Memory Wall

The Power Wall


Data center thermal management

Modeling datacenters with CFD

Static (design time) and dynamic Smart Cooling


Does it matter, the end of GHz?

Word won’t go any faster

The problem in commercial computing is to keep up with the enormous volume of data

The problem in scientific computing is to keep up with the enormous volume of data

Throughput is needed. Parallelism works

491 of TOP500 have > 256 processors

512 – 2048 processors is the “sweet spot” today for scientific machines


Where are we today?

Intel Xeon:

2007: 45 nm – 4 cores

2008: 32 nm – 8 cores

2010: 22 nm – 16 cores

Intel ships more multi than unicore chips, Q4 2006

All these have < 3 GHz clocks

80 small, low power cores are possible in 65 nm


The Future, Part I

More than 100 cores, perhaps 1000, will be possible in server-oriented parts optimized for maximum performance per watt

In 10 – 15 years we may be looking at 10 Tflops on a socket


What changes with manycores?

• Flops are really free

• Communication (between cores, with memory) is costly

−Memory bandwidths of 5 GB/s today, going up to 20 – 40 GB/s

−Flop rates headed towards 1 Tflops per socket

−Fixed clock rates mean latency does not get any worse

−But the needed bandwidth scales linearly


How Much Bandwidth Is Enough?

• Scientific and Commercial data-centric computing has high BW demands

• I/O bandwidth is critical in commercial computing

• HPCC Benchmarks (icl.cs.utk.edu/hpcc) show the ratio (bytes/flop) of bandwidth to compute

• 0.5 < (bytes/flop) < 2.0 for almost all the machines on the HPCC list

• A typical PC has much less bandwidth/flop
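To see how far a manycore socket could fall from the HPCC range, the bytes/flop ratio can be computed directly. This is a rough sketch using figures quoted elsewhere in the talk (5 GB/s today, 20 – 40 GB/s projected, about 1 Tflops per socket); pairing those figures this way is an illustration, not a measurement.

```python
def bytes_per_flop(bandwidth_gb_s: float, gflops: float) -> float:
    """Machine balance: memory bandwidth divided by peak flop rate."""
    return bandwidth_gb_s / gflops


def required_bandwidth_gb_s(gflops: float, target_bytes_per_flop: float) -> float:
    """Bandwidth needed to hit a target balance at a given flop rate."""
    return gflops * target_bytes_per_flop


SOCKET_GFLOPS = 1000.0   # about 1 Tflops per socket, from the talk
HPCC_LOW = 0.5           # lower end of the HPCC bytes/flop range

projected = bytes_per_flop(40.0, SOCKET_GFLOPS)          # optimistic 40 GB/s
needed = required_bandwidth_gb_s(SOCKET_GFLOPS, HPCC_LOW)

if __name__ == "__main__":
    print(f"projected balance: {projected:.2f} bytes/flop")
    print(f"bandwidth needed for {HPCC_LOW} bytes/flop: {needed:.0f} GB/s")
```

Under these assumptions the projected balance is more than an order of magnitude below the low end of the HPCC range.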


How much bandwidth can we get?

• 1000 pins would provide TB/s bandwidths

• But at a minimum of 2 x 10^{-12} J/b * 10^13 b/s = 20 W

• 10 TB/s = 200 W or more
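The power estimate is energy per bit times bit rate. The sketch below reproduces the 20 W figure and then shows two readings of the 10 TB/s case. The slide does not say how it gets from 20 W to 200 W; the 10 bits-per-byte figure (for example, to cover line encoding) is an assumption made here to match it.

```python
def io_power_watts(joules_per_bit: float, bits_per_second: float) -> float:
    """Off-chip signalling power = energy per bit * bit rate."""
    return joules_per_bit * bits_per_second


ENERGY_PER_BIT = 2e-12  # J/b, the slide's stated minimum

# The slide's own case: 10^13 b/s.
slide_case = io_power_watts(ENERGY_PER_BIT, 1e13)

# 10 TB/s with 8 bits per byte, and with an assumed 10 bits per byte on the wire.
ten_tb_8 = io_power_watts(ENERGY_PER_BIT, 10e12 * 8)
ten_tb_10 = io_power_watts(ENERGY_PER_BIT, 10e12 * 10)

if __name__ == "__main__":
    print(f"10^13 b/s:           {slide_case:.0f} W")
    print(f"10 TB/s (8 b/byte):  {ten_tb_8:.0f} W")
    print(f"10 TB/s (10 b/byte): {ten_tb_10:.0f} W")
```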


Don’t Caches Make BW Less Important?

• Some kernels (dense matrix ops) cache perfectly, need very little memory BW

• Unfortunately, large meshes and graphs, iterative solution methods, and multigrid do not

• Even when cache works, writing the programs is a formidable job

−vendor BLAS

−self tuned libraries

−multiple levels of blocking

−doing more work to save time
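As a small illustration of what "blocking" means, here is a single-level tiled matrix multiply in plain Python. It is only a sketch of the loop structure: the block size is an arbitrary illustrative value, tuned libraries use several nested levels matched to the cache hierarchy, and in pure Python the tiling gives no real speedup.

```python
def matmul_blocked(a, b, block=2):
    """Multiply a (n x m) by b (m x p), visiting the operands tile by tile
    so that each tile is reused while it would still be in cache."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):            # tile of rows of C
        for kk in range(0, m, block):        # tile of the shared dimension
            for jj in range(0, p, block):    # tile of columns of C
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + block, m)):
                        aik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += aik * row_b[j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_blocked(a, b))
```

Even this simplest form turns three loops into six, which gives some sense of why the fully tuned versions are a formidable job.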


What about communication?

• On chip networks

−two-dimension meshes are a natural thing on a chip

−but they have been tried and rejected in HPC

• Stacked memory

−capacity

−cooling

• Optics (integrated on board and on chip)

−the energy costs can be low and the bandwidth can be high

−more onchip and offchip bandwidth at reasonable power?

−cost, reliability, manufacturability…


The Future, Part II

Without a breakthrough in memory bandwidth, many of the parallel applications that could otherwise use manycore chips won’t be able to

This will be a serious problem for the industry and its customers


Architectures, Accelerators

1985 – 2005: The “killer micro” made all other machines obsolete

Slowdown of single cores appears to open the door to other architectures

FPGAs, GPGPUs, and accelerators

Example: Clearspeed

32 SIMD lanes with local memory

block data transfers from main memory under program control, overlapped with computation
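The overlap of block transfers with computation is commonly done by double buffering: start fetching the next block, then compute on the one already fetched. The sketch below imitates that pattern on a host using a worker thread. `fetch_block` is a hypothetical stand-in for an accelerator's block transfer, not a Clearspeed API.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(data, start, size):
    """Hypothetical stand-in for a block transfer from main memory."""
    return data[start:start + size]


def sum_of_squares_double_buffered(data, block_size=4):
    """Compute on block i while block i+1 is being fetched."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for start in range(0, len(data), block_size):
            # Issue the next transfer before computing on the previous block.
            upcoming = pool.submit(fetch_block, data, start, block_size)
            if pending is not None:
                total += sum(x * x for x in pending.result())
            pending = upcoming
        # Drain the last block once no more transfers remain.
        if pending is not None:
            total += sum(x * x for x in pending.result())
    return total


if __name__ == "__main__":
    print(sum_of_squares_double_buffered(list(range(10))))
```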


But if flops are free…

Move functions into the chip, onto the cores

NICs

Computational kernels

Graphics

Makes it tough to sell a machine that accelerates computation


Writing the Programs

There are some new things worth trying

GAS (global address space) languages for scientific computing

Transactions, for more complicated algorithms

There now is a parallel Matlab

Improvements to the architecture can have a big impact on programmability

Lower latency across chip than board

Higher bandwidth to memory

Fast synchronization

Use some of the cores to help with communication


I hope it is even more clear that:

Single-thread performance is not getting better

All machines will be parallel very soon

There are a lot of apps involving enormous datasets that have plenty of parallelism

Further throughput will come from using the parallel hardware effectively

Communication bandwidth and energy efficiency are the key limits to improved performance

We may not need to make parallel machines any harder to program than they are now


hp labs, 2007