September 13, 2010
INF5063: Programming heterogeneous multi-core processors
INF5063, Pål Halvorsen, Carsten Griwodz, Håvard Espeland, Håkon Stensland University of Oslo
Overview
Course topic and scope
Background for the use of heterogeneous multi-core processors and for parallel processing on them
Examples of heterogeneous architectures
INF5063: The Course
People
Håvard Espeland email: haavares @ ifi
Håkon Kvale Stensland email: haakonks @ ifi
Carsten Griwodz email: griff @ ifi
Pål Halvorsen email: paalh @ ifi
Time and place
Lectures: Fridays 13.15 – 15.00 Store Aud. ??? Veilabben???
NB! The web page states that we will have group exercises on Thursdays 10.15 - 12.00, 3B. However, there will NOT be any weekly exercises; this slot is instead set aside for you to work on your mandatory assignments (we will NOT be there).
About INF5063: Topic & Scope
Content: The course gives …
- … an overview of heterogeneous multi-core processors in general, of three variants in particular, and of a modern general-purpose core (architectures and use)
- … an introduction to working with heterogeneous multi-core processors
• Intel IXP 2400 network processor card
• SSEx for x86
• nVIDIA’s family of GPUs and the CUDA programming framework
• The Cell Broadband Engine Architecture
- … some ideas of how to use/program heterogeneous multi-core processors
About INF5063: Topic & Scope
Tasks:
The most important part of the course is the lab assignments, in which you program each of the three examples of heterogeneous multi-core processors
1 exercise (not graded) on Intel IXP
- packet counter – download, run and extend wwpingbump
3 graded home exams (counting 33% each):
- Deliver code
- Make a demonstration and explain your design and code to the class
1. On the x86
• Video encoding – Improve performance of video compression by using SSE instructions.
2. On the nVIDIA graphics cards
• Video encoding – Improve the performance of video compression by using the G80 architecture
3. On the Cell processor
• Video encoding – the same as above, but exploit the parallelism of the Cell processor’s SPEs
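Block-based video encoders commonly spend much of their time in motion estimation, which compares blocks using the sum of absolute differences (SAD). The sketch below assumes 16x16 blocks of 8-bit luma samples and SSE2; it is an illustration of how one SIMD instruction (`_mm_sad_epu8`) replaces 16 scalar subtract/abs/add steps, not the course's reference encoder. The function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference: SAD over a 16x16 block; stride = bytes per image row. */
static unsigned sad16x16_scalar(const uint8_t *a, const uint8_t *b, size_t stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sum;
}

/* SSE2 version: _mm_sad_epu8 processes 16 pixels at once and leaves one
   partial sum in each 64-bit half of the register. */
static unsigned sad16x16_sse2(const uint8_t *a, const uint8_t *b, size_t stride)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        __m128i ra = _mm_loadu_si128((const __m128i *)(a + y * stride));
        __m128i rb = _mm_loadu_si128((const __m128i *)(b + y * stride));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(ra, rb));
    }
    /* Add the low and high 64-bit partial sums. */
    return (unsigned)(_mm_cvtsi128_si32(acc) +
                      _mm_cvtsi128_si32(_mm_srli_si128(acc, 8)));
}
```

Keeping the scalar version alongside the SIMD one is a simple way to check the optimized path for correctness before measuring its speed-up.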
Available Resources
Resources will be placed at
- http://www.ifi.uio.no/~griff/INF5063
- Login: inf5063
- Password: ixp
- Manuals, papers, code examples, …
Background and Motivation:
Moore’s Law
Motivation: Intel View
> 1 billion transistors integrated
2010:
• 2.3 billion - Intel 8-Core Xeon Nehalem-EX
• 3.0 billion - nVidia GF100 (Fermi)
1971:
• 2,300 - Intel 4004
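Taking the Nehalem-EX and 4004 transistor counts above at face value, the growth rate they imply can be checked directly. It works out to roughly one doubling every two years, the usual statement of Moore’s Law:

```latex
\frac{2.3\times10^{9}}{2.3\times10^{3}} = 10^{6},
\qquad \log_2 10^{6} \approx 19.9 \ \text{doublings},
\qquad \frac{2010-1971}{19.9} \approx 1.96 \ \text{years per doubling}
```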
Motivation: Intel View
> 1 billion transistors integrated
Clock frequency can still increase
2010: • 5 (6) GHz – IBM Power6
Motivation: Intel View
> 1 billion transistors integrated
Clock frequency can still increase
Future applications will demand TIPS (tera-instructions per second)
2010: 147,600 MIPS @ 3.3 GHz – Intel Core i7 Extreme Edition i980EE (6 cores)
Motivation: Intel View
> 1 billion transistors integrated
Clock frequency can still increase
Future applications will demand TIPS
Power? Heat?
Motivation: Intel View
Soon > 1 billion transistors integrated
Clock frequency can still increase
Future applications will demand TIPS
Power? Heat?
Motivation “Future applications will demand TIPS”
“Think platform beyond a single processor”
“Exploit concurrency at multiple levels”
“Power will be the limiter due to complexity and leakage”
Distribute workload on multiple cores
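Distributing a workload over several cores can be sketched with POSIX threads. The example below assumes a simple data-parallel job (summing an array) and a hypothetical fixed pool of four workers, one per core; each thread sums a contiguous slice and the calling thread combines the partial results.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4  /* assumption: one worker thread per core */

/* Work description for one thread: the half-open range [begin, end). */
struct slice {
    const double *data;
    size_t begin, end;
    double partial;   /* written by the worker, read after join */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Split data[0..n) into NUM_WORKERS contiguous slices, sum them on separate
   threads, then combine the partial sums on the calling thread. */
static double parallel_sum(const double *data, size_t n)
{
    pthread_t tid[NUM_WORKERS];
    int started[NUM_WORKERS];
    struct slice s[NUM_WORKERS];
    size_t chunk = (n + NUM_WORKERS - 1) / NUM_WORKERS;

    for (int t = 0; t < NUM_WORKERS; t++) {
        size_t b = (size_t)t * chunk;
        s[t].data = data;
        s[t].begin = b < n ? b : n;
        s[t].end = b + chunk < n ? b + chunk : n;
        s[t].partial = 0.0;
        /* If a thread cannot be created, process that slice right here. */
        started[t] = pthread_create(&tid[t], NULL, sum_slice, &s[t]) == 0;
        if (!started[t])
            sum_slice(&s[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NUM_WORKERS; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += s[t].partial;
    }
    return total;
}
```

Each worker writes only its own `partial` field, so no locking is needed; `pthread_join` provides the synchronization before the results are combined.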
Symmetric Multi-Core Processors
Phenom X4
Symmetric Multi-Core Processors
UltraSPARC
Intel Multi-Core Processors
Symmetric multi-processors allow multi-threaded applications to achieve higher performance with less die area and lower power consumption than single-core processors
Symmetric Multi-Core Processors
Good
- Growing computational power
Problematic
- Growing die sizes
- Unused resources
• Some cores used much more than others
• Many core parts frequently unused
Why not spread the load better?
- Functions exist only once per core
- Parallel programming is hard
⇒ Asymmetric multi-core processors
Asymmetric Multi-Core Processors
Asymmetric multi-processors consume power and provide increased computational power only on demand
[Figure: highly parallel / moderately parallel / sequential]
Motivation “Future applications will demand TIPS”
“Think platform beyond a single processor”
“Exploit concurrency at multiple levels”
“Power will be the limiter due to complexity and leakage”
Distribute workload on multiple cores
+ simple processors are “easier” to program
+ consume less energy
⇒ heterogeneous multi-core processors
Co-Processors
The original IBM PC included a socket for an Intel 8087 floating point co-processor (FPU)
- 50-fold speed-up of floating point operations
Intel kept the co-processor up to the i486
- the 486DX contained an optimized i487 block
- Still separate pipeline (pipeline flush when starting and ending use)
- Communication over an internal bus
The Commodore Amiga was one of the earlier machines that used multiple processors
- Motorola 680x0 main processor
- Blitter (block image transferrer: moving data, fill operations, line drawing, performing boolean operations)
- Copper (co-processor: changes the address for video RAM on the fly)
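The blitter’s boolean operations are selected by an 8-bit minterm value that combines up to three source channels (A, B, C) into the destination D: each bit enables one AND-term of the (possibly inverted) inputs, and the enabled terms are ORed together. The sketch below is a software model of that selection logic only, assuming 16-bit words; the function names are hypothetical, and the chip’s registers, shifts, masks, and DMA are left out.

```c
#include <stdint.h>

/* Bit i of lf enables the product term whose inputs match the bits of i:
   bit 2 of i selects A, bit 1 selects B, bit 0 selects C
   (lf bit 7 = A&B&C, lf bit 0 = ~A&~B&~C). */
static uint16_t blit_minterm(uint8_t lf, uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t d = 0;
    for (int i = 0; i < 8; i++) {
        if (!((lf >> i) & 1))
            continue;
        d |= (uint16_t)((i & 4 ? a : ~a) & (i & 2 ? b : ~b) & (i & 1 ? c : ~c));
    }
    return d;
}

/* Apply the same operation across a row of words, as one blit would. */
static void blit_row(uint8_t lf, const uint16_t *a, const uint16_t *b,
                     const uint16_t *c, uint16_t *d, int words)
{
    for (int w = 0; w < words; w++)
        d[w] = blit_minterm(lf, a[w], b[w], c[w]);
}
```

With this encoding, 0xF0 copies A, 0xCC copies B, and 0xCA computes "A ? B : C" bit by bit, i.e. uses A as a mask to cut a shape from B into background C.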
What now – are today’s cores really “Symmetric”?
Nehalem
Review of General Data Path on Conventional Computer Hardware Architectures
[Figure: data path for sending, receiving, and forwarding. The application runs in user space; the communication system (transport (TCP/UDP), network (IP), link) runs in kernel space.]
Network Processors: Main Idea
Traditional system: - slow - resource demanding - shared with other operations
Network processors: - a computer within the computer - special, programmable hardware - offloads host resources
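On a traditional host, every forwarded packet passes through the kernel’s network layer, which among other things validates the IPv4 header, decrements the TTL, and rewrites the header checksum. The plain-C sketch below shows that per-packet work (RFC 1071 checksum) as it would run on a host CPU; it is not IXP microengine code, and the function names are hypothetical. This kind of simple, repetitive operation is what a network processor takes off the host.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over hdr_len bytes (even for IPv4: IHL * 4). */
static uint16_t ip_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hdr_len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);  /* big-endian words */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);               /* fold carries */
    return (uint16_t)~sum;
}

/* Minimal forwarding step. Returns 0 to forward, -1 to drop. */
static int forward_ipv4(uint8_t *hdr)
{
    size_t ihl = (size_t)(hdr[0] & 0x0F) * 4;        /* header length, bytes */
    if (ihl < 20 || ip_checksum(hdr, ihl) != 0)       /* malformed or corrupt */
        return -1;
    if (hdr[8] <= 1)                                  /* TTL would expire */
        return -1;
    hdr[8]--;                                         /* byte 8 holds the TTL */
    hdr[10] = hdr[11] = 0;                            /* clear checksum field */
    uint16_t c = ip_checksum(hdr, ihl);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xFF);
    return 0;
}
```

A header that carries a correct checksum sums to 0xFFFF, so `ip_checksum` over the whole header returns 0; that is the validity test used before forwarding.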
IXA: Internet Exchange Architecture
IXA - a broad term to describe the Intel network architecture - HW & SW, control- & data plane
IXP: Internet Exchange Processor - processor that implements IXA
- IXP1200 is the first IXP chip (4 versions)
- IXP2xxx has now replaced the first version
IXA: Internet Exchange Architecture
IXP1200 basic features
- 1 embedded 232 MHz StrongARM
- 6 packet-processing 232 MHz µengines
- onboard memory
- 4 x 100 Mbps Ethernet ports
- multiple, independent busses
- low-speed serial interface
- interfaces for external memory and I/O busses
- …
IXA: Internet Exchange Architecture
IXP2400 basic features
- 1 embedded 600 MHz XScale
- 8 packet-processing 600 MHz µengines
- onboard memory
- 3 x 1 Gbps Ethernet ports
- multiple, independent busses
- low-speed serial interface
- interfaces for external memory and I/O busses
- …
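A back-of-the-envelope cycle budget shows why packet processing is spread over several µengines. The assumptions are not from the slides: every port receives minimum-size 64-byte Ethernet frames at line rate, each occupying 84 bytes on the wire including preamble and inter-frame gap, and all µengine cycles are available for packet processing (memory latency, the embedded RISC core, and multithreading are ignored):

```latex
\begin{aligned}
\text{IXP2400:}\quad & 8 \times 600\,\text{MHz} = 4.8\times10^{9}\ \text{cycles/s}, \qquad
3 \times \frac{10^{9}\ \text{bit/s}}{84 \times 8\ \text{bit}} \approx 4.46\times10^{6}\ \text{packets/s}\\
& \Rightarrow\ \frac{4.8\times10^{9}}{4.46\times10^{6}} \approx 1075\ \text{cycles per packet}\\
\text{IXP1200:}\quad & 6 \times 232\,\text{MHz} = 1.392\times10^{9}\ \text{cycles/s}, \qquad
4 \times \frac{10^{8}\ \text{bit/s}}{84 \times 8\ \text{bit}} \approx 5.95\times10^{5}\ \text{packets/s}\\
& \Rightarrow\ \frac{1.392\times10^{9}}{5.95\times10^{5}} \approx 2339\ \text{cycles per packet}
\end{aligned}
```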
IXP1200 Architecture
RISC processor: - StrongARM running Linux - control, higher layer protocols and exceptions - 232 MHz
Microengines: - low-level devices with limited set of instructions - transfers between memory devices - packet processing - 232 MHz
Access units: - coordinate access to external units
Scratchpad: - on-chip memory - used for IPC and synchronization
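As a generic illustration of how a small shared memory is used for IPC between processing elements, a single-producer/single-consumer ring lets one engine hand packet handles to the next. The sketch below uses C11 atomics on an ordinary multi-core CPU; it is an analogy, not IXP scratchpad instructions, and `struct ring`, `ring_put`, and `ring_get` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8   /* must be a power of two */

struct ring {
    uint32_t slot[RING_SIZE];
    _Atomic uint32_t head;   /* next slot to write; advanced by producer only */
    _Atomic uint32_t tail;   /* next slot to read; advanced by consumer only */
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false if the ring is full. */
static bool ring_put(struct ring *r, uint32_t v)
{
    uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == RING_SIZE)
        return false;
    r->slot[h & (RING_SIZE - 1)] = v;
    /* Release: the slot write becomes visible before the new head. */
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_get(struct ring *r, uint32_t *v)
{
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (h == t)
        return false;
    *v = r->slot[t & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}
```

The indices are free-running 32-bit counters; unsigned wrap-around keeps `h - t` correct, and masking with `RING_SIZE - 1` maps them onto slots.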
IXP1200 / IXP2400
[Figure: IXP1200 block diagram. Embedded RISC CPU (StrongARM), microengines 1 to 6, and SRAM, SDRAM, scratch memory, PCI, and IX access units, connected by multiple independent internal buses; external SRAM, FLASH, memory-mapped I/O, and DRAM attach via the SRAM and DRAM buses, alongside the PCI bus and the IX bus.]
IXP2400 Architecture
[Figure: IXP2400 block diagram. Embedded RISC CPU (XScale), microengines 1 to 8, a coprocessor, and SRAM, SDRAM, scratch memory, PCI, MSF, and slowport access units, connected by multiple independent internal buses; external SRAM, FLASH, and DRAM attach via the SRAM and DRAM buses, alongside the PCI bus and the receive and transmit buses.]
Graphics Processing Units (GPUs)
[Figure: graphics card with bus connector and memory hub]
GPU: a dedicated graphics rendering device (2D and 3D)
First GPUs:
- 80s: for early 2D operations, the Amiga and Atari used a blitter; the Amiga also had the copper
- 90s: 3D hardware for game consoles like the PlayStation and N64, and 3dfx Voodoo 3D add-on cards for PCs
New powerful GPUs, e.g.:
- Nvidia GeForce GTX 285: 240 shader cores at 1476 MHz, 1 GB memory, memory BW: 159 GB/sec, PCI Express 2.0
- similar for other manufacturers …
General Purpose Computing on GPU
The
- high arithmetic precision
- extreme parallel nature
- optimized, special-purpose instructions
- available resources
- …
… of the GPU allow general, non-graphics-related operations to be performed on the GPU
Generic computing workload is off-loaded from the CPU to the GPU
⇒ More generically: Heterogeneous multi-core processing
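GPGPU offloading relies on a data-parallel model. In CUDA, for example, a kernel is launched over a grid of thread blocks, and each thread derives the element it works on from its block index and thread index. The plain-C sketch below only emulates that index arithmetic sequentially, so it runs without a GPU; `struct dim`, `saxpy_thread`, and `launch_saxpy` are hypothetical names, not CUDA API.

```c
#include <stddef.h>

/* Hypothetical stand-in for CUDA's built-in index types (1D only). */
struct dim { unsigned x; };

/* "Kernel" body: one logical thread computes one element of y = a*x + y. */
static void saxpy_thread(struct dim blockIdx, struct dim blockDim,
                         struct dim threadIdx, size_t n,
                         float a, const float *x, float *y)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;  /* global index */
    if (i < n)                   /* the last block may be only partly used */
        y[i] = a * x[i] + y[i];
}

/* Sequential emulation of a 1D launch; threads_per_block must be > 0. */
static void launch_saxpy(unsigned threads_per_block, size_t n,
                         float a, const float *x, float *y)
{
    struct dim blockDim = { threads_per_block };
    struct dim gridDim = { (unsigned)((n + threads_per_block - 1) / threads_per_block) };
    for (unsigned b = 0; b < gridDim.x; b++)
        for (unsigned t = 0; t < blockDim.x; t++)
            saxpy_thread((struct dim){ b }, blockDim, (struct dim){ t }, n, a, x, y);
}
```

On a GPU the two loops in `launch_saxpy` disappear: the hardware distributes the blocks over its multiprocessors and runs their threads concurrently, which is only possible because each thread’s work is independent.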
nVIDIA G200 / GF100
- 1.4 / 3 billion transistors
- 240 / 512 shaders
- 512 / 384 bit memory bus (GDDR3 / 5)
- 159 / 177 GB/sec memory bandwidth
- 933 / 1344 Gflops
- PCI Express 2.0
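Peak memory bandwidth B is the bus width w in bytes times the effective transfer rate r, so the figures above imply the per-pin rates below. Using only the numbers on this slide, GF100 needs a much higher transfer rate (GDDR5 rather than GDDR3) to exceed G200’s bandwidth with its narrower bus:

```latex
B = \frac{w}{8}\, r \quad\Longrightarrow\quad r = \frac{8B}{w},
\qquad
r_{\text{G200}} = \frac{8 \times 159\ \text{GB/s}}{512} \approx 2.48\ \text{GT/s},
\qquad
r_{\text{GF100}} = \frac{8 \times 177\ \text{GB/s}}{384} \approx 3.69\ \text{GT/s}
```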
nVIDIA GT200 Streaming Multiprocessor ( SM )
[Figure: GT200 SM block diagram. Instruction fetch with instruction L1 cache, thread/instruction dispatch, 8 SPs (SP 0 to SP 7) each with its own register file (RF 0 to RF 7), 2 SFUs, constant L1 cache, shared memory, and paths to load from and store to memory and to load textures.]
Streaming Multiprocessors (SMs)
- fundamental thread block unit
- 8 streaming processors (SPs) (scalar ALU for threads)
- 2 special function units (SFUs) (cos, sin, log, ...)
- 8 32KB local register files (RFs)
- 16 kB level 1 cache
- 64 kB shared memory
- 256 kB global level 2 cache
Number of streaming multiprocessors
- 1: Quadro NVS 130M
- 16: GeForce 8800 GTX
- 30: GeForce GTX 285
- 4x30: Tesla S1070
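Since each SM contains 8 SPs, the SM counts translate directly into scalar processor counts. The GTX 285 figure matches the 240 shaders listed for G200:

```latex
\text{SPs} = \text{SMs} \times 8:\qquad
16 \times 8 = 128\ (\text{GeForce 8800 GTX}),\qquad
30 \times 8 = 240\ (\text{GeForce GTX 285}),\qquad
4 \times 30 \times 8 = 960\ (\text{Tesla S1070})
```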
Memory Bandwidth for CPU and GPU
Marketed as GPGPUs
STI (Sony, Toshiba, IBM) Cell
Motivation for the Cell
- Cheap processor
- Energy efficient
- For games and media processing
- Short time-to-market
Conclusion
- Use a multi-core chip
- Design around an existing, power-efficient design
- Add simple cores specific for game and media processing requirements
STI (Sony, Toshiba, IBM) Cell
Cell is a 9-core processor
- combining a light-weight general-purpose processor with multiple co-processors into a coordinated whole
- Power Processing Element (PPE)
• conventional Power processor
• not supposed to perform all operations itself, acting like a controller
• running conventional OSes
• 16 KB instruction/data level 1 cache
• 512 KB level 2 cache
STI (Sony, Toshiba, IBM) Cell
- Synergistic Processing Elements (SPE)
• specialized co-processors for specific types of code, i.e., very high performance vector processors
• local stores
• can do general purpose operations
• the PPE can start, stop, interrupt, and schedule processes running on an SPE
- Element Interconnect Bus (EIB)
• internal communication bus
• connects on-chip system elements: PPE & SPEs, the memory controller (MIC), and two off-chip I/O interfaces
• 25.6 GBps each way
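An SPE computes on its 256 KB local store rather than directly on main memory, with data moved in and out by DMA, so large data sets are typically streamed through in chunks that fit. The plain-C sketch below uses `memcpy` as a stand-in for those DMA transfers; the buffer size and the names `LS_BUF_FLOATS` and `scale_in_chunks` are hypothetical, and real SPE code would typically also double-buffer to overlap transfers with computation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size: 16 KB of floats, a small part of the 256 KB
   local store, which must also hold the program and its stack. */
#define LS_BUF_FLOATS 4096

static float local_buf[LS_BUF_FLOATS];   /* stands in for local-store memory */

/* Scale a large array in "main memory" by streaming it through the local
   buffer one chunk at a time: get, compute, put. */
static void scale_in_chunks(float *main_mem, size_t n, float factor)
{
    for (size_t off = 0; off < n; off += LS_BUF_FLOATS) {
        size_t len = n - off < LS_BUF_FLOATS ? n - off : LS_BUF_FLOATS;
        memcpy(local_buf, main_mem + off, len * sizeof *local_buf);  /* "DMA get" */
        for (size_t i = 0; i < len; i++)
            local_buf[i] *= factor;                                  /* compute */
        memcpy(main_mem + off, local_buf, len * sizeof *local_buf);  /* "DMA put" */
    }
}
```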
STI (Sony, Toshiba, IBM) Cell
- memory controller
• Rambus XDRAM interface to Rambus XDR memory
• dual channels at 12.8 GBps each, 25.6 GBps in total
- I/O controller
• Rambus FlexIO interface, which can be clocked independently
• dual configurable channels
• maximum ~ 76.8 GBps
STI (Sony, Toshiba, IBM) Cell
- Cell has in essence traded running everything at moderate speed for the ability to run certain types of code at high speed
- used for example in
• Sony PlayStation 3: 3.2 GHz clock, 6 SPEs for general operations, 1 SPE used by the OS for security
• Toshiba home cinema: decoding of 48 HDTV MPEG streams, showing dozens of thumbnail videos simultaneously on screen
• IBM blade centers: 3.2 GHz clock, Linux ≥ 2.6.11
The End: Summary
Heterogeneous multi-core processors are already everywhere
Challenge: programming
- Need to know the capabilities of the system
- Different abilities in different cores
- Memory bandwidth
- Memory sharing efficiency
- Need new methods to program the different components
Next time: how to start programming the Intel IXP