GPUs and CPUs: The Uneasy Alliance Panel Discussion
Posted on 20-Dec-2015
GPUs and CPUs: The Uneasy Alliance
Panel Discussion
Panelists
• Neil Trevett, 3Dlabs
• Michael Doggett, ATI
• Adam Lake, Intel
• David Kirk, NVIDIA
• Bill Mark, University of Texas at Austin
Moderator
• Peter N. Glaskowsky, MemoryLogix
Neil Trevett, 3Dlabs
Neil Trevett is Senior Vice President for Market Development at 3Dlabs, Inc. Trevett also serves as President of the Web3D Consortium and secretary of the Khronos Group, which develops the OpenML and OpenGL ES standards for dynamic media processing and for graphics APIs on embedded appliances and applications.
© Copyright 3Dlabs 2004 - Page 4
GP2 Musings
Neil Trevett, Senior VP Market Development, 3Dlabs
President, Khronos Group
Los Angeles 2004
CPUs and GPUs – Dynamic Tension
CPUs and GPUs exist because of their different design goals
– CPUs: maximize performance and minimize cost of executing SCALAR code
– GPUs: exploit parallelism to beat CPUs at executing VECTOR code
BUT GPUs are rapidly integrating many CPU techniques, learned and refined by the CPU community over decades:
– Demand-Paged Virtual Memory – 256GB
– Virtual Shader Program Memory – 256K instructions
– Efficient multi-tasking and isochronous channel
– 512MB memory -> 1GB memory
– High-level Language Programmability
Advanced GPUs designed exclusively for PROFESSIONAL PRODUCTIVITY
If you would like to try a Wildcat Realizm board, email [email protected]
A message from your sponsor
CPUs and GPUs – Dynamic Tension
Fundamentally different designs finding increasingly common ground
Increasing commonality creates possibilities for tighter integration
– E.g. merge virtual address spaces with cache coherency
– Would enable new CPU/GPU cooperative paradigms
– Possibility of increased coprocessor linkage
– Break the AGP/PCIe bottleneck
[Diagram: CPU and GPU have fundamental differences in design approach but increasing areas of commonality; a CPU subsystem and a GPU subsystem share a cache-coherent unified virtual memory space]
GPUs – More Than Graphics Processors?
The volume of graphics shipments has created the GPU phenomenon
– Ingenious work is ongoing to find alternative uses for these graphics machines
Can GPUs be modified to address non-graphics needs?
– E.g. double precision, less SIMD and more MIMD, more general data storage
Primarily an economic question
– Not just technology
Does reaching for new markets decrease your graphics market share?
– Increased costs bring no benefit for the core market
[Figure: Market Design Spectrum running from Graphics through Imaging to HPC, plotted against $. Annotations: “Design shift will only occur if the ‘Integral of Achieved Profit’ is increased”; “Shifting this far – decreases effectiveness in graphics market?”; “Probably a small stretch for increased volume”]
Programming GPUs – Industry Challenge
GPU microarchitectures will not be exposed externally any time soon
– Too much intellectual property would be exposed
– Would create too much architectural inertia at a time of rapid innovation
Agree that Domain Specific Libraries are an effective, pragmatic approach
– Good to start solving specific real problems now
But should we aim higher than just a library approach?
– Feels like we need to expose the full flexibility of programmability
Creating effective industry programming infrastructure is a challenge
[Diagram: Domain Languages above a “Firewall to GPU ISAs”, Evolving GPU architectures below; with many domain languages and many evolving GPU architectures, connecting every pair is a Combinatorial Problem]
Combine desirable features from the different approaches
Industry Standard Virtual Machine?
Could a Virtual Machine standard avoid combinatorial explosion?
– Uncouples multiple languages from multiple GPUs
– Target for domain language architects AND enables innovation by GPU vendors
Create an open and cross-platform industry standard virtual machine?
– The correct virtual machine could help persuade GPUs to evolve into stream processors
What should that virtual machine be?
– Can we work together to figure out this key question?
Candidate virtual machines, sitting between Domain Languages and GPUs:
• ARB Vertex and Fragment extensions? – Too graphics-oriented, too low-level to track the capabilities of evolving GPU architecture?
• OpenGL Shading Language? – Too graphics-oriented? Effectively a graphics Domain Specific Library, with the flexibility of programmability? Can it be extended for more generality? What direction should the OpenGL ARB take?
• Brook or sh? – The level of abstraction we need to break out of the graphics mind-set? TOO big a leap from the graphics base? Too high-level to be a useful virtual machine?
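The combinatorial problem, and the saving a shared virtual machine would bring, can be counted directly. A minimal sketch, assuming that without a common target every domain language needs its own compiler for every GPU ISA, while with a virtual machine each language needs one front end and each GPU one back end; the function names are hypothetical.

```python
def translators_without_vm(num_languages: int, num_gpus: int) -> int:
    """No common target: one compiler per (language, GPU ISA) pair."""
    return num_languages * num_gpus


def translators_with_vm(num_languages: int, num_gpus: int) -> int:
    """Shared VM: each language targets the VM once, each GPU maps the VM once."""
    return num_languages + num_gpus


if __name__ == "__main__":
    for langs, gpus in [(2, 2), (4, 5), (10, 10)]:
        print(f"{langs} languages x {gpus} GPUs: "
              f"{translators_without_vm(langs, gpus)} direct translators vs "
              f"{translators_with_vm(langs, gpus)} with a shared VM")
```

For 2 languages and 2 GPUs the counts are equal (4 vs. 4); for 4 languages and 5 GPUs they are 20 vs. 9, and for 10 of each 100 vs. 20, so the benefit grows with the number of languages and architectures.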
Battery Powered GPUs!
The Khronos Group is now defining OpenGL ES 2.0
– The OpenGL Shading Language comes to cell phones!
Driven hard by the cell-phone industry's demand for compelling hand-held gaming
– Aggressive development to match the availability of GPUs in handsets
OpenGL ES 2.0 will not just be in phones, e.g. games consoles
– Sony PlayStation is a Khronos Member
[Timeline figure: OpenGL ES 1.0 (Mid-03, from OpenGL 1.3): enabled software AND hardware 3D engines, including small-footprint, low-end fixed-point platforms. OpenGL ES 1.1 (Mid-04, from OpenGL 1.5): increased emphasis on hardware acceleration and an enhanced 3D pipeline. OpenGL ES 2.0 (Mid-05, from OpenGL 2.0): GLSL-based shader programmability for embedded devices, tackling issues such as remote compilation.]
Embedded Industry - GP2 Genetic Diversity
Cell phones – 100Ms of units a year that will have GPUs
– 3D gaming now, PLUS phones mutating into general-purpose personal compute devices
Size, power and cost – low-power design is now getting a lot of attention
– Interesting for building handhelds AND large arrays for HPC etc.
Embedded industry has fast innovation and flexible infrastructure
– Tight CPU/GPU integration might happen here first – systems on a chip
Programmable acceleration avoids multiple media acceleration blocks
– A programmable GPU can accelerate 3D, images, video, audio, speech and …
– OpenMAX – a new Khronos standard – domain-specific primitive libraries
Uneasy alliance with DSPs too!!
– Will GPUs even assume some baseband processing?
[Diagram: a Single Chip in which an ARM CPU Core and a Low Power GPU Core share a Cache Coherent Unified Virtual Memory Space; domain-specific primitive libraries can be accelerated on GPUs]
Michael Doggett, ATI
Michael Doggett is an architect at ATI. He is working on upcoming graphics hardware for Microsoft and on desktop PC graphics chips. Before joining ATI, Doggett was a post doc at the University of Tuebingen in Germany and completed his Ph.D. at the University of New South Wales in Sydney, Australia.
GPUs and CPUs: The Uneasy Alliance
Mike Doggett, ATI
GPUs and CPUs: The Uneasy Alliance?
GPUs
• Not stream processors
• Graphics black box
• Deep pipeline
– Arithmetic intensity
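Arithmetic intensity is usually measured as arithmetic operations per byte of memory traffic. A minimal sketch, assuming fp32 operands (4 bytes each) and coefficients kept on chip, that contrasts a low-intensity kernel (SAXPY, y = a*x + y) with a high-intensity one (Horner evaluation of a polynomial at each element); the function names are hypothetical.

```python
FP32_BYTES = 4  # assumption: single-precision (fp32) operands


def saxpy_intensity(n: int) -> float:
    """FLOPs per byte of memory traffic for y[i] = a * x[i] + y[i]."""
    flops = 2 * n                      # one multiply and one add per element
    bytes_moved = 3 * FP32_BYTES * n   # read x[i], read y[i], write y[i]
    return flops / bytes_moved


def horner_intensity(degree: int) -> float:
    """FLOPs per byte for evaluating a degree-`degree` polynomial at each x[i]
    with Horner's rule; coefficients are assumed to stay on chip."""
    flops_per_element = 2 * degree       # one multiply and one add per step
    bytes_per_element = 2 * FP32_BYTES   # read x[i], write result[i]
    return flops_per_element / bytes_per_element


print(f"SAXPY: {saxpy_intensity(1_000_000):.3f} FLOP/byte")
print(f"Degree-16 polynomial: {horner_intensity(16):.1f} FLOP/byte")
```

SAXPY performs about 0.17 FLOP per byte moved and is limited by memory bandwidth; the degree-16 polynomial performs 4 FLOP per byte, the kind of arithmetic-heavy work a deep, throughput-oriented pipeline is built to keep busy.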
GPUs
• How to get new features into GPUs?
– Get game developers to use them
• Architectural Specs
– API definition
– GPUBench
• Double precision
– Performance tradeoff
– Simulated double
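The slide does not say how double precision would be simulated. One well-known technique, shown here as an assumption rather than as ATI's method, is float-float ("double-single") arithmetic: a value is held as an unevaluated sum hi + lo of two fp32 numbers, and error-free transformations such as Knuth's TwoSum recover the rounding error of each addition. The sketch emulates fp32 rounding on a CPU with Python's `struct` module; the function names are hypothetical, and `ff_add` is a simplified variant rather than the most accurate published one.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 (fp32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum with every operation rounded to fp32.
    Returns (s, e) where s = fl32(a + b) and s + e equals a + b exactly."""
    s = f32(a + b)
    b_virtual = f32(s - a)
    a_virtual = f32(s - b_virtual)
    e = f32(f32(a - a_virtual) + f32(b - b_virtual))
    return s, e


def ff_add(x: tuple[float, float], y: tuple[float, float]) -> tuple[float, float]:
    """Add two float-float numbers (hi, lo).
    Simplified variant: cheaper, but with a weaker error bound than careful versions."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))   # fold in the low-order parts
    hi = f32(s + e)                 # renormalise so |lo| is small relative to hi
    lo = f32(e - f32(hi - s))
    return hi, lo


# Adding 1e-8 to 1.0: plain fp32 loses it, float-float keeps it in the low word.
tiny = f32(1e-8)
print(f32(1.0 + tiny))                   # plain fp32 sum rounds back to 1.0
print(ff_add((1.0, 0.0), (tiny, 0.0)))   # (hi, lo) pair
```

One float-float addition here costs eleven fp32 additions or subtractions instead of one, which illustrates the performance tradeoff of simulating higher precision.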
GPU future
• Competitive market
• More of the same
Adam Lake, Intel
Adam Lake is a Sr. Software Engineer at Intel specializing in 3D graphics. Previous areas of work include stream processing, compilers for high level shading languages, and non-photorealistic rendering. He holds an M.S. degree from the University of North Carolina at Chapel Hill.
A few alternatives…
Intel IXP Network Processor Family
IXP Perf. Characteristics: IXP2800 [Intel02]
• 51 GB/s peak to RDRAM
– 3 RDRAM channels input and output, total aggregate @ 533 MHz
• 32 GB/s peak to SDRAM
– 4 QDR II SDRAM ports (2 read/2 write) @ 250 MHz
• Example Application: 10 Gb/s Ethernet
• 1.4 GHz clock rate
• IXP2400: 4,800 MIPS
• IXP1200: 1,200 MIPS
• Notes:
– NO FPU!!
– Packet arrival rate determines # instructions executed per packet
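The note that packet arrival rate determines the instruction count is simple arithmetic: the cycles available per packet are the clock cycles per second divided by packets per second. A minimal sketch, assuming back-to-back minimum-size 64-byte Ethernet frames plus 20 bytes of preamble and inter-frame gap on a 10 Gb/s link, and an idealised one instruction per cycle; the engine count is a hypothetical parameter, not a figure from the slide.

```python
ETH_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap
MIN_FRAME_BYTES = 64     # minimum Ethernet frame, including FCS


def packets_per_second(link_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Worst-case arrival rate: back-to-back frames of the given size."""
    bits_per_frame = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


def instructions_per_packet(clock_hz: float, engines: int, pps: float) -> float:
    """Cycle budget per packet, assuming one instruction per cycle per engine
    and packets spread evenly across engines (an idealisation)."""
    return clock_hz * engines / pps


pps = packets_per_second(10e9)  # 10 Gb/s link
print(f"{pps / 1e6:.2f} Mpps")
print(f"{instructions_per_packet(1.4e9, 1, pps):.1f} instructions per packet for one engine")
```

At about 14.88 million minimum-size packets per second, a single 1.4 GHz engine would have roughly 94 cycles per packet; spreading packets over N engines multiplies that budget by N. This is the sense in which the arrival rate bounds the instructions executed per packet.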
Key takeaways for IXP
• Designed for network processing workloads
• Switch-on-event model for hardware resources
• No FPU, nor plans for an FPU
• Improving software stack
– Shangri-la project
MXP5800
Specs of MXP5800
• Internal B/W: 532 Mbytes/s/connection
• Theoretical external B/W: 1 GByte/s
• 130 nm
• 256 MHz
• 35 mm x 35 mm die
Key takeaways from MXP
• Not a general-purpose microprocessor
• Shipping today with software tools
• One common ISA for all execution units
So what’s the point?
• Some alternatives for general-purpose computing on special-purpose hardware
• Larger context of stream processing architectures
Programming Models
• Getting the programming model right is hard
– Graphics architects got it right for graphics
• Made harder if you try to be completely general
– Reason: increase generality and you lose performance
– You can quickly lose any benefit of your stream programming model
• Fully general streaming, in the limit, is multithreading
Call to Action
• For some applications in computational science and other domains, performance is the dominant factor, not cost
• However, in other domains cost is dominant:
– Purchase price per MIP
– Not just raw performance
• Call to action: consider chipset implementations
– Analysis of GPGPU taking raw $ cost into account
– There are 3 options, not 2: CPU vs. CPU and chipset vs. GPU
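The price-per-MIP comparison across the three options reduces to dividing purchase price by delivered throughput. A minimal sketch: the option names follow the slide, but the slide gives no prices or throughputs, so both are inputs the reader must supply from measurements; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str             # e.g. "CPU", "CPU and chipset", "GPU"
    price_dollars: float  # purchase price of the hardware for this option
    mips: float           # sustained MIPS on the target workload (measured, not peak)


def dollars_per_mips(option: Option) -> float:
    """Purchase price per MIPS: lower is more cost-effective."""
    return option.price_dollars / option.mips


def rank_by_cost_effectiveness(options: list[Option]) -> list[Option]:
    """Cheapest price per MIPS first; ties keep their input order."""
    return sorted(options, key=dollars_per_mips)
```

Using sustained rather than peak MIPS matters here: an option with a high peak rating but poor utilisation on the application of interest can rank below a cheaper, slower one.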
The BIG Problems
• How do we program it?
– Programming model
• How do we feed it?
– Memory hierarchy and bandwidth
• How do we keep it cool?
– Power and thermal requirements provide significant challenges for ALL architectures
David Kirk, NVIDIA
David Kirk has been NVIDIA's Chief Scientist since January 1997. Prior to joining NVIDIA, Kirk held positions at Crystal Dynamics and the Apollo Systems Division of Hewlett-Packard Company. Kirk holds M.S. and Ph.D. degrees in Computer Science from the California Institute of Technology.
[Diagram: (Year 2000) The GeForce256 Graphics Pipeline: vertex (vertex transform & lighting) → setup/rasterizer (polygon setup & rasterization) → pixel (per-pixel interpolation) → texture (per-pixel texture filter & x8 blending) → image (Z-buffer, x8 blending & anti-alias) → memory]
[Diagram: (Year 2004) The GeForce6 Graphics Pipeline: vertex (programmable vertex processing, fp32) → setup/rasterizer (polygon setup, culling, rasterization) → pixel (programmable per-pixel math, fp32) → texture (per-pixel texture, fp16 blending) → image (Z-buf, fp16 blending, anti-alias, MRT) → memory]
[Diagram: (Year 2004) The GeForce6 NON-Graphics Pipeline: data (programmable MIMD processing, fp32) → setup/rasterizer (lists, SIMD “rasterization”) → data (programmable SIMD processing, fp32) → data (data fetch, fp16 blending) → data (predicated write, fp16 blend, multiple output) → memory]
[Diagram: “GP” Processors: shared peak input bandwidth from memory, dedicated peak processing power, shared peak output bandwidth to memory]
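The “GP” processor slide separates shared peak input bandwidth, shared peak output bandwidth, and dedicated peak processing power. One simple way to reason about such a processor, assumed here rather than taken from the talk, is a bottleneck bound: a streaming pass can finish no faster than the slowest of its input transfer, output transfer, and computation, even if all three overlap perfectly. The function and parameter names are hypothetical.

```python
def stage_times(bytes_in: float, bytes_out: float, ops: float,
                in_bw: float, out_bw: float,
                ops_per_proc: float, processors: int) -> dict[str, float]:
    """Seconds each resource would need on its own.
    Input/output bandwidth is shared by all processors (does not scale),
    while processing power is dedicated per processor (scales with count)."""
    return {
        "input bandwidth": bytes_in / in_bw,
        "output bandwidth": bytes_out / out_bw,
        "processing": ops / (ops_per_proc * processors),
    }


def lower_bound(bytes_in: float, bytes_out: float, ops: float,
                in_bw: float, out_bw: float,
                ops_per_proc: float, processors: int) -> tuple[float, str]:
    """Best-case run time if transfers and compute overlap perfectly,
    plus the name of the resource that sets it."""
    times = stage_times(bytes_in, bytes_out, ops, in_bw, out_bw,
                        ops_per_proc, processors)
    limiter = max(times, key=times.get)
    return times[limiter], limiter
```

For a hypothetical pass that reads 8 MB, writes 4 MB and performs 2 million operations on a machine with 1 GB/s of each bandwidth and 10^9 operations/s per processor, the bound is 8 ms, set by input bandwidth; adding processors shrinks only the processing term and leaves that bound unchanged.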
Bill Mark, University of Texas at Austin
Bill Mark is an assistant professor in the Department of Computer Sciences at the University of Texas at Austin. Mark was the lead architect of NVIDIA's Cg language and development system. He holds a Ph.D. from the University of North Carolina at Chapel Hill.
GP2 Panel Presentation
William Mark, University of Texas at Austin
We’re entering an era of disruptive change
• Driven by VLSI technology
– Too many transistors: CPU performance plateau
– Heat/power is now a first-class constraint
– Possible to fit many processors on a single chip
• Two kinds of change coming:
– Technical: single-chip parallel computation
– Industry structure: pressure for vertical re-integration
What do we mean by “CPU vs. GPU”?
• General HW vs. specialized HW
– GPUs moving towards generality, but not fully there yet
• Sequential vs. parallel
– Latency-optimized vs. throughput-optimized
• Two separate chips
• Different sets of companies (exception: Intel)
• Raw HW access vs. Managed code
Need at least two parallel programming models
• Stream model
– Naturally exposes parallelism and communication
– Easy to use, when the problem maps well
• Communicating sequential processes (e.g. pthreads)
– Explicitly exposes spatial dimension of HW parallelism
– Efficiently supports data-dependent communication patterns
– Useful for creating/modifying large irregular data structures
– Harder to use, e.g. race conditions
– Hard to get performance portability
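The contrast between the two models fits in a few lines. A minimal illustration, assuming Python's standard `threading` module stands in for pthreads: the stream version applies a side-effect-free kernel to each element independently, so parallelism is implicit, while the communicating-threads version updates a shared histogram at data-dependent locations and needs a lock to avoid a race condition. The function names are hypothetical.

```python
import threading


def stream_map(kernel, stream):
    """Stream model: apply a side-effect-free kernel to each element.
    No invocation reads another's output, so all could run in parallel."""
    return [kernel(x) for x in stream]


def threaded_histogram(values, num_bins, num_threads=4):
    """Communicating-threads model: workers update shared state at
    data-dependent locations, so access must be synchronised."""
    bins = [0] * num_bins
    lock = threading.Lock()

    def worker(chunk):
        for v in chunk:
            b = v % num_bins   # destination depends on the data
            with lock:         # without the lock, concurrent += can lose updates
                bins[b] += 1

    chunks = [values[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bins


print(stream_map(lambda x: x * x, [1, 2, 3]))    # [1, 4, 9]
print(threaded_histogram(list(range(10)), 3))    # [4, 3, 3]
```

The histogram is the kind of data-dependent write pattern that is awkward in a pure stream model, because two elements may target the same bin.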
HW must satisfy mass-market needs
• Games will continue to dominate
– Rendering
– Simulation? An opportunity
• Maximize impact of research by meeting game needs
– Chicken/egg problem: co-evolve algorithms and architectures
– Different visibility algorithms: ray casting?
– Global illumination: shadows, ambient occlusion, reflection, …
– Parallelize model management, simulation, game behavior, …
• Solving these problems will help other applications
2-year predictions
• CPUs: multi-core trend accelerates
– Multicore used by games and HPC
• GPUs: more powerful streaming model
– Scatter, gather, conditional streams, reductions, etc.
– Start to see more success stories for GPGPU
– But limits of stream model become apparent
• “Dark Horses” attract increasing attention
– CELL and others
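The streaming operations predicted for GPUs have simple sequential definitions. A minimal sketch in plain Python, assuming the usual meanings: gather reads from computed addresses, scatter writes to computed addresses, a conditional stream keeps only elements that pass a predicate, and a reduction combines a stream into one value with an associative operator. On a GPU each would run in parallel; these are reference semantics only, and the function names are hypothetical.

```python
import operator


def gather(source, indices):
    """out[i] = source[indices[i]]: reads from data-dependent addresses."""
    return [source[j] for j in indices]


def scatter(values, indices, size, fill=0):
    """out[indices[i]] = values[i]: writes to data-dependent addresses.
    When two elements target the same slot, this sequential version keeps
    the later one; parallel hardware needs an explicit collision rule."""
    out = [fill] * size
    for v, j in zip(values, indices):
        out[j] = v
    return out


def conditional_stream(values, predicate):
    """Keep only elements that pass the predicate; output length varies."""
    return [v for v in values if predicate(v)]


def tree_reduce(values, op=operator.add, identity=0):
    """Combine a stream with an associative operator in pairwise passes.
    Each pass halves the length and its pairs are independent, so n elements
    need about log2(n) parallel steps."""
    vals = list(values) or [identity]
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(identity)  # pad odd-length passes with the identity
        vals = [op(vals[i], vals[i + 1]) for i in range(0, len(vals), 2)]
    return vals[0]
```

Scatter and conditional streams are the harder cases for stream hardware: scatter needs a rule for colliding writes, and a conditional stream produces output whose length is only known after the predicate has run.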
6-year predictions
• One processing chip for PCs
– Who makes it?
• Heterogeneous architecture for this chip:
– Classical CPU
– Parallel fine-grained shared memory (pthreads)
– Parallel stream processor (Brook)
• Supports ray-casting visibility
• This architecture emerges in console space first
• This architecture meets many HPC needs
Peter N. Glaskowsky, MemoryLogix
Peter Glaskowsky is Chief System Architect at MemoryLogix, a Silicon Valley microprocessor design startup. Formerly, Glaskowsky was editor in chief of Microprocessor Report and a principal analyst with In-Stat/MDR, a chief engineer at Integrated Device Technology, and a lead engineer at SuperMac and Telebit.
Some Panel Topics
• Which problems are the natural province of the CPU?
• …of the GPU?
• Which CPU design elements will be borrowed by GPUs, and vice versa?
• Which problems support cooperation between the CPU and GPU?
– How do we stimulate this cooperation?
– Or will it be more like competition?