Debunking the 100X GPU vs CPU Myth: An Evaluation of Throughput Computing on CPU and GPU

Victor W. Lee, et al.
Intel Corporation

ISCA ’10, June 19-23, 2010, Saint-Malo, France

MythBusters' view on the topic

CPU vs GPU: http://videosift.com/video/MythBusters-CPU-vs-GPU-or-Paintball-Cannons-are-Cool

Full movie: http://www.nvidia.com/object/nvision08_gpu_v_cpu.html

The Initial Claim

Over the past 4 years NVIDIA has made a great many claims regarding how porting various types of applications to run on GPUs instead of CPUs can tremendously improve performance, by anywhere from 10x to 500x.

But it actually began much earlier (SIGGRAPH 2004): http://pl887.pairlitesite.com/talks/2004-08-08-GP2-CPU-vs-GPU-BillMark.pdf

Intel’s Response?

Intel, unsurprisingly, sees the situation differently, but has remained relatively quiet on the issue, possibly because Larrabee was going to be positioned as a discrete GPU.

Intel’s Response?

The recent announcement that Larrabee has been repurposed as an HPC/scientific computing solution may therefore be partially responsible for Intel ramping up an offensive against NVIDIA's claims regarding GPU computing.

At the International Symposium on Computer Architecture (ISCA) this June, a team from Intel presented a whitepaper purporting to investigate the real-world performance delta between CPUs and GPUs.

But before that…

December 16, 2009: one month after ISCA’s final papers were due.

The Federal Trade Commission filed an antitrust-related lawsuit against Intel on Wednesday, accusing the chip maker of deliberately attempting to hurt its competition and, ultimately, consumers.

The Federal Trade Commission's complaint against Intel for alleged anticompetitive practices has a new twist: graphics chips.

2009 was expensive for Intel

The European Commission fined Intel nearly 1.5 billion USD,

the US Federal Trade Commission sued Intel on anti-trust grounds, and

Intel settled with AMD for another 1.25 billion USD.

If nothing else it was an expensive year, and while the settlement with AMD was a significant milestone for the company, it was not the end of its troubles.

Finally the settlement(s)

The EU fine is still under appeal ($1.45B).

8/4/2010: Intel settles with the FTC.

Then there is the whole Dell issue…

So back to the paper: what did Intel say?

Throughput Computing

Kernels: what is a kernel?

Kernels selected: SGEMM, MC, Conv, FFT, SAXPY, LBM, Solv, SpMV, GJK, Sort, RC, Search, Hist, Bilat
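To make "kernel" concrete: SAXPY, one of the kernels listed above, computes y ← a·x + y over single-precision vectors. The sketch below is a plain-Python reference version written for illustration (it is not the paper's optimized CPU or GPU code, and Python floats are double precision). Every output element is independent, which is what makes it a throughput kernel, and with three 4-byte memory accesses per two flops it is usually limited by memory bandwidth rather than arithmetic.

```python
from typing import List


def saxpy(a: float, x: List[float], y: List[float]) -> List[float]:
    """Return a*x + y element-wise (Single-precision A*X Plus Y).

    Each output element depends only on x[i] and y[i], so the work can be
    split across threads or SIMD lanes with no synchronization at all.
    Reference sketch only: Python floats are 64-bit, not 32-bit.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


# Per element: load x[i], load y[i], store y[i] (3 x 4-byte floats)
# for 2 flops (one multiply, one add).
BYTES_PER_ELEMENT = 3 * 4
FLOPS_PER_ELEMENT = 2
ARITHMETIC_INTENSITY = FLOPS_PER_ELEMENT / BYTES_PER_ELEMENT  # flop per byte

if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    print(f"arithmetic intensity: {ARITHMETIC_INTENSITY:.3f} flop/byte")
```

At roughly 0.17 flop per byte, a kernel like this tends to run at whatever speed the memory system allows on either processor.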

The Hardware selected

CPU: 3.2GHz Core i7-960, 6GB RAM

GPU: 1.3GHz eVGA GeForce GTX280 w/ 1GB

Optimizations:

CPU: multithreading, cache blocking, and reorganization of memory accesses for SIMDification

GPU: minimizing global synchronization, and using local shared buffers.
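Cache blocking, the second CPU optimization listed, restructures a loop nest so that each small tile of data is reused while it is still cache-resident. The following is a minimal sketch of a blocked matrix multiply (the SGEMM pattern) in plain Python; the tile size `block` is an assumed tuning parameter, and interpreted Python will not show the actual cache benefit, only the loop structure. On the GPU side, the analogous move is to stage each tile in a thread block's on-chip shared memory, which is one reading of "using local shared buffers".

```python
from typing import List

Matrix = List[List[float]]


def matmul_blocked(A: Matrix, B: Matrix, block: int = 32) -> Matrix:
    """Compute C = A @ B for square n x n matrices, one tile at a time."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):          # tile of rows of A and C
        for kk in range(0, n, block):      # tile of the shared dimension
            for jj in range(0, n, block):  # tile of columns of B and C
                # Only a block x block piece of A, B, and C is touched in
                # this inner nest, so it can stay in cache while reused.
                for i in range(ii, min(ii + block, n)):
                    row_a = A[i]
                    row_c = C[i]
                    for k in range(kk, min(kk + block, n)):
                        a_ik = row_a[k]
                        row_b = B[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(matmul_blocked(A, A, block=2))
```

The i-k-j ordering inside each tile also walks rows of B and C contiguously, which is the kind of memory-access reorganization that makes the innermost loop amenable to SIMD.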

This even made Slashdot

Hardware: Intel, NVIDIA Take Shots At CPU vs. GPU Performance

And PCWorld: "Intel: 2-year-old Nvidia GPU Outperforms 3.2GHz Core i7"

Intel researchers have published the results of a performance comparison between their latest quad-core Core i7 processor and a two-year-old Nvidia graphics card, and found that the Intel processor can't match the graphics chip's parallel processing performance.

http://www.pcworld.com/article/199758/intel_2yearold_nvidia_gpu_outperforms_32ghz_core_i7.html

From the paper's abstract: "In the past few years there have been many studies claiming GPUs deliver substantial speedups ...over multi-core CPUs...[W]e perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7 960 processor narrows to only 2.5x on average."

Do you have a problem with this statement?

Intel's own paper indirectly raises a question when it notes: "The previously reported LBM number on GPUs claims 114X speedup over CPUs. However, we found that with careful multithreading, reorganization of memory access patterns, and SIMD optimizations, the performance on both CPUs and GPUs is limited by memory bandwidth and the gap is reduced to only 5X."
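Why would a memory-bandwidth limit shrink the gap? A simple roofline-style bound says attainable throughput is min(peak flops, arithmetic intensity × memory bandwidth). When a kernel sits on the bandwidth side of that bound on both processors, the ratio between them collapses to the ratio of their bandwidths, no matter how large the gap in peak flops is. The sketch below illustrates this with assumed spec-sheet-style peak figures and hypothetical intensities; none of these numbers are measurements from the paper, and real kernels achieve sustained bandwidth below peak.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    peak_gflops: float    # assumed peak single-precision GFLOP/s
    bandwidth_gbs: float  # assumed peak DRAM bandwidth in GB/s

    def attainable_gflops(self, intensity: float) -> float:
        """Roofline bound; intensity is flops per byte of DRAM traffic."""
        return min(self.peak_gflops, intensity * self.bandwidth_gbs)


def speedup(fast: Processor, slow: Processor, intensity: float) -> float:
    """Ratio of roofline-attainable throughput at a given intensity."""
    return fast.attainable_gflops(intensity) / slow.attainable_gflops(intensity)


# Illustrative, assumed figures only (not results reported in the paper).
CPU = Processor("Core i7-960", peak_gflops=102.4, bandwidth_gbs=25.6)
GPU = Processor("GTX280", peak_gflops=933.0, bandwidth_gbs=141.7)

if __name__ == "__main__":
    # Hypothetical low-intensity (bandwidth-bound) and high-intensity
    # (compute-bound) kernels.
    for intensity in (0.5, 50.0):
        ratio = speedup(GPU, CPU, intensity)
        print(f"intensity {intensity:5.1f} flop/B -> GPU/CPU about {ratio:.1f}x")
```

With these assumed inputs the low-intensity case reduces to the bandwidth ratio (about 5.5x), which is consistent with the paper's explanation that LBM is bandwidth-limited on both platforms.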

What is important about the context?

The International Symposium on Computer Architecture (ISCA) in Saint-Malo, France, interestingly enough, is the same event where NVIDIA’s Chief Scientist Bill Dally received the prestigious 2010 Eckert-Mauchly Award for his pioneering work in architecture for parallel computing.

NVIDIA Blog Response:

"It’s a rare day in the world of technology when a company you compete with stands up at an important conference and declares that your technology is *only* up to 14 times faster than theirs."

http://blogs.nvidia.com/blog/2010/06/23/gpus-are-only-up-to-14-times-faster-than-cpus-says-intel/

NVIDIA Blog Response: (cont)

"The real myth here is that multi-core CPUs are easy for any developer to use and see performance improvements."

Undergraduate students learning parallel programming at M.I.T. disputed this when they looked at the performance increase they could get from different processor types and compared this with the amount of time they needed to spend rewriting their code.

According to them, for the same investment of time as coding for a CPU, they could get more than 35x the performance from a GPU.

Despite substantial investments in parallel computing tools and libraries, efficient multi-core optimization remains in the realm of experts like those Intel recruited for its analysis.

In contrast, the CUDA parallel computing architecture from NVIDIA is a little over 3 years old and already hundreds of consumer, professional and scientific applications are seeing speedups ranging from 10 to 100x using NVIDIA GPUs.

Questions

Where did the 2.5x, 5x, and 14x come from?

How big were the problems that Intel used for comparisons? [compare w/ cache size]

How were they selected? What optimizations were done?
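One way to approach the "[compare w/ cache size]" question is to estimate a problem's working set and compare it with the CPU's last-level cache (8 MB of shared L3 on the Core i7-960). The hypothetical helper below does this for SGEMM only, assuming three n×n single-precision matrices (A, B, C) at 4 bytes per element; the other kernels would each need their own footprint formula.

```python
import math

FLOAT32_BYTES = 4
L3_BYTES = 8 * 1024 * 1024  # Core i7-960 shared L3 cache: 8 MB


def sgemm_working_set_bytes(n: int) -> int:
    """Footprint of A, B, and C for C = A @ B with n x n float32 matrices."""
    return 3 * n * n * FLOAT32_BYTES


def fits_in_cache(n: int, cache_bytes: int = L3_BYTES) -> bool:
    """True if the whole SGEMM working set fits in the given cache."""
    return sgemm_working_set_bytes(n) <= cache_bytes


def largest_n_fitting(cache_bytes: int = L3_BYTES) -> int:
    """Largest n with 3 * n^2 * 4 <= cache_bytes."""
    return math.isqrt(cache_bytes // (3 * FLOAT32_BYTES))


if __name__ == "__main__":
    n_max = largest_n_fitting()
    print(f"largest SGEMM size fully resident in L3: {n_max} x {n_max}")
    for n in (512, 1024, 4096):
        mib = sgemm_working_set_bytes(n) / 2**20
        print(f"n={n}: {mib:.1f} MiB, fits in L3: {fits_in_cache(n)}")
```

Whole-working-set residency is a crude criterion: a blocked SGEMM only needs its current tiles in cache, so problem size by itself does not settle whether a benchmark favors the CPU or the GPU.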

Fermi cards were almost certainly unavailable when Intel commenced its project, but it's still worth noting that some of the GF100's architectural advances partially address (or at least alleviate) certain performance-limiting handicaps Intel points to when comparing Nehalem to a GT200 processor.

Bottom Line

Parallelization is hard, whether you're working with a quad-core x86 CPU or a 240-core GPU; each architecture has strengths and weaknesses that make it better or worse at handling certain kinds of workloads.

Other Reading

On the Limits of GPU Acceleration: http://www.usenix.org/event/hotpar10/tech/full_papers/Vuduc.pdf