Multi-Core Development
Kyle Anderson
Overview
• History
• Pollack’s Law
• Moore’s Law
• CPU
• GPU
• OpenCL
• CUDA
• Parallelism
History
• First 4-bit microprocessor – 1971
  • 60,000 instructions per second
  • 2,300 transistors
• First 8-bit microprocessor – 1974
  • 290,000 instructions per second
  • 4,500 transistors
  • Altair 8800
• First 32-bit microprocessor – 1985
  • 275,000 transistors
History
• First Pentium processor released – 1993
  • 66 MHz
• Pentium 4 released – 2000
  • 1.5 GHz
  • 42,000,000 transistors
• Clock speeds approached 4 GHz, 2000–2005
• Core 2 Duo released – 2006
  • 291,000,000 transistors
History
Pollack’s Law
• Processor performance grows with the square root of die area
Pollack’s Law
Moore’s Law
• “The number of transistors incorporated in a chip will approximately double every 24 months.”
– Gordon Moore, Intel co-founder
• Smaller and smaller transistors
Moore’s Law
CPU
• Sequential
• Fully functioning cores
• 16 cores maximum currently
• Hyperthreading
• Low latency
GPU
• Higher latency
• Thousands of cores
• Simple calculations
• Used for research
OpenCL
• Multitude of Devices
• Run-time compilation ensures the most up-to-date features on each device
• Lock-Step
OpenCL Data Structures
• Host
• Device
  • Compute Units
• Work-Group
  • Work-Item
• Command Queue
• Kernel
• Context
OpenCL Types of Memory
• Global
• Constant
• Local
• Private
OpenCL
OpenCL Example
CUDA
• NVIDIA’s proprietary API for its GPUs
• Stands for “Compute Unified Device Architecture”
• Compiles directly to hardware
• Used by Adobe, Autodesk, National Instruments, Microsoft and Wolfram Mathematica
• Often faster than OpenCL because it compiles directly to the hardware and focuses on a single architecture
CUDA Indexing
CUDA Example
CUDA Function Call
cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice );
cudaMemcpy( dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice );
add<<<N,1>>>( dev_a, dev_b, dev_c );
Types of Parallelism
• SISD
• SIMD
• MISD
• MIMD
• Instruction parallelism
• Task parallelism
• Data parallelism
SISD
• Stands for Single Instruction, Single Data
• Does not use multiple cores
SIMD
• Stands for “Single Instruction, Multiple Data Streams”
• Can process multiple data streams concurrently
MISD
• Stands for “Multiple Instruction, Single Data”
• Risky because several instructions are processing the same data
MIMD
• Stands for “Multiple Instruction, Multiple Data”
• Each processor executes its own instruction stream on its own data
Instruction Parallelism
• Requires mutually independent instructions
• MIMD and MISD often use this
• Allows multiple instructions to be run at once
• Instructions considered operations
• Not done by the programmer
  • Hardware
  • Compiler
Task Parallelism
• Dividing up of main tasks or controls
• Runs multiple threads concurrently
Data Parallelism
• Used by SIMD and MIMD
• A list of instructions works concurrently on several data sets