GPUS AND DETECTORS Tim Madden ODG/XSD


Post on 13-Dec-2015


What is a GPU?

Graphics Processing Unit.
The graphics card in your PC.
“Hardware accelerated graphics.”
The video game industry is the main driver.
More recently used for non-graphics applications.

How does a GPU work?

A card on the PCI-Express bus.
The GPU card contains its own RAM and processor(s).
What is a core?
  A core is an ALU (arithmetic logic unit).
  An ALU is basically a single processor that can run a computer program.
  Modern PCs are “quad core”: basically 4 processors. This refers to the processor on the motherboard that runs Windows.
A GPU has hundreds of cores!

GPU for Graphics Applications

The programmer uses an API to write graphics code:
  OpenGL: from Silicon Graphics; now a common standard on most computers.
  DirectX: Xbox, Windows.
The programmer calls functions from these APIs and compiles.
If the computer has a suitable GPU (“Direct3D compatible,” etc.), the code runs on the GPU with no extra work: the compiled program links at runtime to libraries that run on the GPU. This is “hardware acceleration.”
These APIs are only useful for drawing and for responding to the mouse, joystick, etc.
MSDN, the Microsoft site for DirectX: http://msdn.microsoft.com/en-us/directx/default

GPU for Science Applications

DirectX and OpenGL are predefined sets of graphics functions that can run on the GPU.
NVIDIA created CUDA for non-graphics applications on GPUs.
CUDA allows writing a C++ program that runs on the GPU.
Cross-compiler: you write the code on a Windows box and compile it on the Windows box, but the code runs on the GPU.
The CUDA tools interface with the Microsoft compiler.
CUDA allows you to create your OWN functions that run on the GPU, not just whatever DirectX gives you.

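What is the advantage of a GPU?

Parallel programming.
What is a “thread”?
  A sequence of commands in a program that run one after another.

  #include <stdio.h>
  #include <windows.h>  // Sleep()

  // One thread: print a counter once per second, forever.
  void oneThread(int N)
  {
      int counter = 0;
      while (1) {
          printf("Thread %d Count %d\n", N, counter++);
          Sleep(1000);  // milliseconds
      }
  }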

Multi-Threaded Programs

A typical program on a PC has many threads running at once.
An EPICS IOC has about 20 threads running.
This PowerPoint program is running 8 threads (at the time of typing this sentence).

[Figure: many copies of oneThread() running side by side]
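Several threads running at once can be sketched in portable C++ with std::thread in place of the Windows-specific Sleep() call. This is only an illustration: the names countingThread and runThreads are made up for the example, and the loop is bounded (unlike oneThread) so that the program terminates.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Total number of lines printed by all threads (shared, hence atomic).
std::atomic<int> g_totalCounts{0};

// One thread: print a counter `steps` times, pausing between prints.
void countingThread(int n, int steps, int pauseMs)
{
    for (int counter = 0; counter < steps; ++counter) {
        std::printf("Thread %d Count %d\n", n, counter);
        ++g_totalCounts;
        std::this_thread::sleep_for(std::chrono::milliseconds(pauseMs));
    }
}

// Start `numThreads` threads that all run at once, wait for them to
// finish, and return the total number of lines they printed.
int runThreads(int numThreads, int steps, int pauseMs)
{
    assert(numThreads >= 0 && steps >= 0);
    g_totalCounts = 0;
    std::vector<std::thread> threads;
    for (int n = 0; n < numThreads; ++n) {
        threads.emplace_back(countingThread, n, steps, pauseMs);
    }
    for (std::thread &t : threads) {
        t.join();
    }
    return g_totalCounts;
}
```

Calling runThreads(8, 3, 10) starts 8 threads; their output lines interleave in an order that can change from run to run.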

Threads on the Host Processor

The more threads running, the slower each thread runs.
The solution is to add more processors. A “core” is a processor.
A “quad core” PC has 4 processors, each running hundreds of threads.

[Figure: four boxes labeled PROCESSOR, each running several copies of oneThread()]

Demo: How to Crash a Computer

Make a thread that in turn makes a new thread, and so on.

void haveChildren():
  Update the global thread counter and printf it.
  Sleep 500 ms.
  Call haveChildren() on a new thread.
  Display a window.
  If OK is hit on the window, then exit(0).

Once haveChildren() is called, new threads are created without end, and a new window is displayed for each one.
The threads show up in Task Manager.
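A bounded variant of this demo can be sketched as follows. It assumes portable C++ with std::thread, leaves out the window, and adds a maxThreads limit (not part of the original demo) so that the chain stops instead of exhausting the machine. The name runChain is invented for the sketch.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Global thread counter, updated by every call in the chain.
std::atomic<int> g_threadCount{0};

// Each call counts itself, sleeps, then starts one child thread that
// calls haveChildren() again. Without the maxThreads check this never ends.
void haveChildren(int maxThreads, int sleepMs)
{
    const int mine = ++g_threadCount;
    std::printf("Thread count: %d\n", mine);
    if (mine >= maxThreads) {
        return;  // safety limit, an addition made for this sketch
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));

    // The parent waits for its child, so every thread in the chain
    // stays alive until the limit is reached.
    std::thread child(haveChildren, maxThreads, sleepMs);
    child.join();
}

// Run the chain starting on the calling thread and return how many
// times haveChildren() was called.
int runChain(int maxThreads, int sleepMs)
{
    g_threadCount = 0;
    haveChildren(maxThreads, sleepMs);
    return g_threadCount;
}
```

Removing the limit reproduces the runaway behavior of the demo, which is why the sketch keeps it.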

Threads on a GPU

Instead of running hundreds of threads, let us run millions of threads!
A GPU can have 1024 processors.
Each processor can run thousands of threads at once.
Adding more processors speeds up the program.

Threads on a GPU

[Figure: a large grid of boxes, each labeled “Thread”]

Adding 1 to an image

  // My image data (zero-initialized here; normally filled from the detector)
  short *image = new short[1024 * 1024]();
  int k;

  for (k = 0; k < 1024 * 1024; k++) {
      image[k] = image[k] + 1;
  }

On the host (not the GPU) we write a single thread to process the image, 1 pixel at a time.
For a 1k x 1k image, this is 1M operations in sequence.

Adding 1 to an Image on a GPU

Write code for a single pixel, and call the code in 1M separate threads.
CUDA will dole out threads to cores for you on the GPU.
Pixel X runs on thread X.

  // Kernel: each thread handles one pixel.
  __global__ void subtractDarkImage_k(
      unsigned short *d_Dst,
      unsigned short *d_Src,
      int dataSize)
  {
      // Global index of this thread = index of the pixel it handles.
      const int i = blockDim.x * blockIdx.x + threadIdx.x;
      if (i >= dataSize)
          return;  // extra threads past the end of the image do nothing

      d_Dst[i] = d_Src[i] + 1;
  }
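To see how the index calculation covers every pixel exactly once, the launch can be emulated on the host in plain C++. This is only a sketch: the two loops stand in for blockIdx.x and threadIdx.x, the function names are invented for the example, and a real GPU runs the threads in parallel rather than in loop order.

```cpp
#include <cassert>
#include <vector>

// Number of blocks needed so that blocks * threadsPerBlock >= dataSize.
int blocksNeeded(int dataSize, int threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return (dataSize + threadsPerBlock - 1) / threadsPerBlock;
}

// Body of the kernel for one emulated thread.
void addOneThread(unsigned short *dst, const unsigned short *src, int dataSize,
                  int blockDimX, int blockIdxX, int threadIdxX)
{
    const int i = blockDimX * blockIdxX + threadIdxX;
    if (i >= dataSize) {
        return;  // threads past the end of the image do nothing
    }
    dst[i] = static_cast<unsigned short>(src[i] + 1);
}

// Emulated launch: visit every (block, thread) pair once.
std::vector<unsigned short> addOneEmulated(const std::vector<unsigned short> &src,
                                           int threadsPerBlock)
{
    const int dataSize = static_cast<int>(src.size());
    std::vector<unsigned short> dst(src.size(), 0);
    const int numBlocks = blocksNeeded(dataSize, threadsPerBlock);
    for (int b = 0; b < numBlocks; ++b) {
        for (int t = 0; t < threadsPerBlock; ++t) {
            addOneThread(dst.data(), src.data(), dataSize, threadsPerBlock, b, t);
        }
    }
    return dst;
}
```

With 256 threads per block, a 1024 x 1024 image needs 4096 blocks. A 1000-pixel image needs 4 blocks, and the last 24 emulated threads hit the bounds check and return without touching memory.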

Try this at home

Install the Microsoft compiler.
Download CUDA and install it.
Open the examples.
CUDA plugs into Microsoft Visual Studio.
When you build, both the host code and the GPU code are built.
.cpp files run on the host; .cu and .cuh files hold the code that runs on the GPU.

GPU and EPICS

A plugin to Area Detector runs calculations on the GPU.
When a new image comes from the detector:
  The host sends the image to the GPU.
  The GPU does the calculations.
  The host retrieves the result from the GPU.
  The host sends the results to EPICS, etc.
The GPU code is compiled as a DLL. EPICS Area Detector loads the DLL and runs it.
  Allows arbitrary calculations on the GPU: just make a new DLL.
  Separates the cross-compile of the GPU code from the EPICS build.
  One Area Detector plugin serves all GPU calculations.
  EPICS variables can be defined in the DLL. The host queries the DLL for parameters and connects EPICS PVs.
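One way to picture the split between the generic plugin and a calculation DLL is a small interface that every calculation implements. Everything in this sketch is hypothetical: the names GpuCalc, ParamInfo, AddOffsetCalc and onNewImage are not from the actual plugin, the calculation runs on the host instead of a GPU, and the DLL loading and EPICS PV connection are left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Description of one parameter that the host could expose as an EPICS PV.
struct ParamInfo {
    std::string name;
    double value;
};

// Interface the host-side plugin would call; each DLL would supply
// one implementation.
class GpuCalc {
public:
    virtual ~GpuCalc() = default;
    virtual std::vector<ParamInfo> params() const = 0;
    virtual void setParam(const std::string &name, double value) = 0;
    virtual std::vector<unsigned short> processImage(
        const std::vector<unsigned short> &image) = 0;
};

// Example calculation: add a configurable offset to every pixel.
class AddOffsetCalc : public GpuCalc {
public:
    std::vector<ParamInfo> params() const override
    {
        return std::vector<ParamInfo>{ParamInfo{"Offset", offset_}};
    }
    void setParam(const std::string &name, double value) override
    {
        if (name == "Offset") {
            offset_ = value;
        }
    }
    std::vector<unsigned short> processImage(
        const std::vector<unsigned short> &image) override
    {
        std::vector<unsigned short> out(image.size());
        for (std::size_t k = 0; k < image.size(); ++k) {
            out[k] = static_cast<unsigned short>(image[k] + static_cast<int>(offset_));
        }
        return out;
    }

private:
    double offset_ = 1.0;
};

// What the host does for each new image: hand it to whichever
// calculation is loaded and pass the result on.
std::vector<unsigned short> onNewImage(GpuCalc &calc,
                                       const std::vector<unsigned short> &image)
{
    return calc.processImage(image);
}
```

Because the host only sees the GpuCalc interface, a new calculation means a new implementation, with no change to the host side. That mirrors the "just make a new DLL" point above.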

Demo: Simulated Detector with GPU

Sending an image to the GPU and back.
Dark subtraction on the host versus the GPU.
Fast convolution on the GPU versus the host.
Running several programs on the GPU at once.
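For reference, dark subtraction on the host can be sketched as a single loop over the pixels. The clamping at zero is an assumption of this sketch (the slides do not say how negative results are handled), and subtractDarkHost is a name invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Subtract a dark image from a raw image, pixel by pixel, on the host.
// Results that would be negative are clamped to 0 (an assumption).
std::vector<unsigned short> subtractDarkHost(const std::vector<unsigned short> &image,
                                             const std::vector<unsigned short> &dark)
{
    assert(image.size() == dark.size());
    std::vector<unsigned short> out(image.size(), 0);
    for (std::size_t k = 0; k < image.size(); ++k) {
        if (image[k] > dark[k]) {
            out[k] = static_cast<unsigned short>(image[k] - dark[k]);
        }
    }
    return out;
}
```

On the GPU the loop body would become the kernel, with one thread per pixel as in the add-1 example.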