GPU Architecture and Programming


Page 1: GPU Architecture and Programming. GPU vs CPU

GPU Architecture and Programming

Page 2: GPU Architecture and Programming. GPU vs CPU

GPU vs CPU
https://www.youtube.com/watch?v=fKK933KK6Gg

Page 3: GPU Architecture and Programming. GPU vs CPU

GPU Architecture

• GPUs (Graphics Processing Units) were originally designed as graphics accelerators, used for real-time graphics rendering.

• Starting in the late 1990s, the hardware became increasingly programmable, culminating in NVIDIA's GeForce 256 in 1999, which was marketed as the first GPU.

Page 4: GPU Architecture and Programming. GPU vs CPU

• CPU + GPU is a powerful combination:
– CPUs consist of a few cores optimized for serial processing.
– GPUs consist of thousands of smaller, more efficient cores designed for parallel performance.
– Serial portions of the code run on the CPU, while parallel portions run on the GPU.

Page 5: GPU Architecture and Programming. GPU vs CPU

Architecture of GPU

Image copied from http://www.pgroup.com/lit/articles/insider/v2n1a5.htm
Image copied from http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf

Page 6: GPU Architecture and Programming. GPU vs CPU

CUDA Programming

• CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA for its GPUs.

• By using CUDA, you can write programs that directly access the GPU.

• The CUDA platform is accessible to programmers via CUDA libraries and extensions to programming languages like C, C++, and Fortran.
– C/C++ programmers use "CUDA C/C++", compiled with the nvcc compiler.
– Fortran programmers can use CUDA Fortran, compiled with the PGI CUDA Fortran compiler.

Page 7: GPU Architecture and Programming. GPU vs CPU

• Terminology:
– Host: the CPU and its memory (host memory)
– Device: the GPU and its memory (device memory)

Page 8: GPU Architecture and Programming. GPU vs CPU

Programming Paradigm

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

Each parallel function of the application is executed as a kernel.

Page 9: GPU Architecture and Programming. GPU vs CPU

Programming Flow

1. Copy input data from CPU memory to GPU memory.
2. Load the GPU program and execute it.
3. Copy results from GPU memory back to CPU memory.
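These three steps map directly onto CUDA runtime calls. A minimal sketch in CUDA C, assuming an illustrative float buffer of N elements and a placeholder kernel name someKernel (neither is from the slides):

#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    const int N = 1024;                                  // illustrative size
    size_t size = N * sizeof(float);
    float *h_data = (float *)calloc(N, sizeof(float));   // host (CPU) buffer
    float *d_data;
    cudaMalloc(&d_data, size);                           // device (GPU) buffer

    // 1. Copy input data from CPU memory to GPU memory
    cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice);

    // 2. Load the GPU program (kernel) and execute it
    // someKernel<<<numBlocks, threadsPerBlock>>>(d_data);

    // 3. Copy results from GPU memory back to CPU memory
    cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    free(h_data);
    return 0;
}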

Page 10: GPU Architecture and Programming. GPU vs CPU

• Each parallel function of the application is executed as a kernel.

• That means GPUs are programmed as a sequence of kernels; typically, each kernel completes execution before the next kernel begins.

• The Fermi architecture has some support for executing multiple, independent kernels simultaneously, but most kernels are large enough to fill the entire machine.

Page 11: GPU Architecture and Programming. GPU vs CPU

Image copied from http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf

Page 12: GPU Architecture and Programming. GPU vs CPU

Hello World! Example

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

__global__ is a CUDA C/C++ keyword meaning:
• mykernel() will be executed on the device
• mykernel() will be called from the host
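The code image for this slide did not survive the transcript; what follows is a minimal sketch of the Hello World example it refers to, reusing the mykernel() name from the bullets above:

#include <stdio.h>

__global__ void mykernel(void) {
    // executed on the device; does nothing in this minimal example
}

int main(void) {
    mykernel<<<1, 1>>>();        // kernel launch from the host
    cudaDeviceSynchronize();     // wait for the device to finish
    printf("Hello World!\n");
    return 0;
}

// Compile with the nvcc compiler, e.g.: nvcc hello.cu -o hello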

Page 13: GPU Architecture and Programming. GPU vs CPU

Addition Example

• Since add() runs on the device, the pointers a, b, and c must point to device memory.

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
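The kernel image is missing from the transcript; a sketch of the single-element addition kernel the bullet describes, following the cited presentation:

__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;    // a, b, c are device pointers; one thread does the addition
}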

Page 14: GPU Architecture and Programming. GPU vs CPU

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
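The host-side code image for this page is also missing; a sketch of what it likely contains, following the standard pattern (allocate device memory, copy the inputs over, launch add() with one block and one thread, copy the result back). The input values are illustrative:

int main(void) {
    int a = 2, b = 7, c;                 // host copies
    int *d_a, *d_b, *d_c;                // device pointers
    int size = sizeof(int);

    cudaMalloc(&d_a, size);
    cudaMalloc(&d_b, size);
    cudaMalloc(&d_c, size);

    cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

    add<<<1, 1>>>(d_a, d_b, d_c);        // one block, one thread

    cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}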

Page 15: GPU Architecture and Programming. GPU vs CPU

Vector Addition Example

Kernel Function:

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
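The kernel image did not survive the transcript; a sketch of the block-indexed vector-addition kernel this page introduces (one block per element, so blockIdx.x selects the element):

__global__ void add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];   // each block handles one element
}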

Page 16: GPU Architecture and Programming. GPU vs CPU

main:

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
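Again the image is missing; a sketch of the host code, assuming an illustrative vector length N of 512 and launching one thread per block:

#define N 512

int main(void) {
    int *a, *b, *c;                  // host copies
    int *d_a, *d_b, *d_c;            // device copies
    int size = N * sizeof(int);

    cudaMalloc(&d_a, size);
    cudaMalloc(&d_b, size);
    cudaMalloc(&d_c, size);

    a = (int *)malloc(size);
    b = (int *)malloc(size);
    c = (int *)malloc(size);
    // ... fill a and b with input values ...

    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    add<<<N, 1>>>(d_a, d_b, d_c);    // N blocks, 1 thread per block

    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}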

Page 17: GPU Architecture and Programming. GPU vs CPU

Alternative 1:

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
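The code image is missing; in the cited presentation, Alternative 1 indexes by thread instead of block, launching a single block of N threads. A sketch:

__global__ void add(int *a, int *b, int *c) {
    c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];   // each thread handles one element
}

// launched from main as a single block of N threads:
// add<<<1, N>>>(d_a, d_b, d_c);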

Page 18: GPU Architecture and Programming. GPU vs CPU

Alternative 2:

int globalThreadId = threadIdx.x + blockIdx.x * M;           // M is the number of threads in a block

int globalThreadId = threadIdx.x + blockIdx.x * blockDim.x;  // blockDim.x gives the number of threads per block

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

Page 19: GPU Architecture and Programming. GPU vs CPU

• So the kernel becomes

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
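The image is missing; combining blocks and threads with the index formula above, the kernel would look like:

__global__ void add(int *a, int *b, int *c) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;   // global thread ID
    c[index] = a[index] + b[index];
}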

Page 20: GPU Architecture and Programming. GPU vs CPU

• The main() function becomes

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
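The image is missing; the only change to main() is the launch configuration, splitting the N elements across blocks of THREADS_PER_BLOCK threads each (the value 512 is illustrative):

#define THREADS_PER_BLOCK 512
// ... allocation and host-to-device copies as before ...
add<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c);
// ... copy the result back and free as before ...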

Page 21: GPU Architecture and Programming. GPU vs CPU

Handling Arbitrary Vector Sizes

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
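The code image is missing here as well; handling an arbitrary vector size n means the grid may contain more threads than elements, so the kernel takes n as a parameter and guards the access, and the launch rounds the block count up. A sketch:

__global__ void add(int *a, int *b, int *c, int n) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n)                                   // skip threads past the end of the vectors
        c[index] = a[index] + b[index];
}

// launch with enough blocks to cover all N elements, rounding up:
// add<<<(N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, N);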