Multi-Core Development
Kyle Anderson
Overview
• History
• Pollack’s Law
• Moore’s Law
• CPU
• GPU
• OpenCL
• CUDA
• Parallelism
History
• First 4-bit microprocessor – 1971
  • 60,000 instructions per second
  • 2,300 transistors
• First 8-bit microprocessor – 1974
  • 290,000 instructions per second
  • 4,500 transistors
  • Altair 8800
• First 32-bit microprocessor – 1985
  • 275,000 transistors
History
• First Pentium processor released – 1993
  • 66 MHz
• Pentium 4 released – 2000
  • 1.5 GHz
  • 42,000,000 transistors
• Clock speeds approached 4 GHz, 2000–2005
• Core 2 Duo released – 2006
  • 291,000,000 transistors
History
Pollack’s Law
• Processor performance grows with the square root of die area
Pollack’s Law
Moore’s Law
• “The number of transistors incorporated in a chip will approximately double every 24 months.”
– Gordon Moore, Intel co-founder
• Smaller and smaller transistors
Moore’s Law
CPU
• Sequential
• Fully functioning cores
• 16 cores maximum currently
• Hyperthreading
• Low latency
GPU
• Higher latency
• Thousands of cores
• Simple calculations
• Used for research
OpenCL
• Multitude of Devices
• Run-time compilation ensures the most up-to-date features on each device
• Lock-Step
OpenCL Data Structures
• Host
• Device
  • Compute Units
• Work-Group
  • Work-Item
• Command Queue
• Kernel
• Context
OpenCL Types of Memory
• Global
• Constant
• Local
• Private
OpenCL
OpenCL Example
CUDA
• NVIDIA’s proprietary API for its GPUs
• Stands for “Compute Unified Device Architecture”
• Compiles directly to hardware
• Used by Adobe, Autodesk, National Instruments, Microsoft and Wolfram Mathematica
• Often faster than OpenCL because it compiles directly to the hardware and focuses on a single architecture
CUDA Indexing
CUDA Example
CUDA Function Call
cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice );
cudaMemcpy( dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice );
add<<<N,1>>>( dev_a, dev_b, dev_c );
Types of Parallelism
• SISD
• SIMD
• MISD
• MIMD
• Instruction parallelism
• Task parallelism
• Data parallelism
SISD
• Stands for Single Instruction, Single Data
• Does not use multiple cores
SIMD
• Stands for “Single Instruction, Multiple Data Streams”
• Can process multiple data streams concurrently
MISD
• Stands for “Multiple Instruction, Single Data”
• Risky because several instructions are processing the same data
MIMD
• Stands for “Multiple Instruction, Multiple Data”
• Each processor executes its own instruction stream on its own data
Instruction Parallelism
• Requires mutually independent instructions
• MIMD and MISD often use this
• Allows multiple instructions to be run at once
• Instructions considered operations
• Not done by the programmer
  • Hardware
  • Compiler
Task Parallelism
• Dividing up of main tasks or controls
• Runs multiple threads concurrently
Data Parallelism
• Used by SIMD and MIMD
• A list of instructions works concurrently on several data sets