Programming the Cell Multiprocessor
Işıl Öz
Outline
Cell processor
– Objectives
– Design and architecture
Programming the Cell
– Programming models
CellSs
Cell Processor
Cell Broadband Engine Architecture (Cell BE)
Developed by the STI (SCEI-Toshiba-IBM) design center
– STI formed in 2000
– STI design center opened in 2001
– Introduced in 2005
– 65 nm in 2007, 45 nm in 2008
Cell Processor Objectives
Outstanding performance, especially on game/multimedia applications
– Memory latency
– Power efficiency
– Processor frequency and pipeline depth
Real-time response to the user and the network
Applicable to a wide range of platforms
Support for introduction in 2005
Cell Architecture
A 64-bit Power processor element (PPE)
8 synergistic processor elements (SPEs)
Memory controller
Bus-interface controller
Element interconnect bus
Synergistic Processor Elements
SPEs
– DMA (Direct Memory Access Unit)
– LS (Local Store Memory)
– SXUs (Execution Units)
Controllers
Memory Interface Controller
– Interfaces to the Rambus XDR I/O unit, which communicates directly with the DRAM modules
Bus Interface Controller
– Interfaces to Rambus FlexIO, which provides communication with other system components
Element Interconnect Bus
EIB
– Coherent, on-chip bus
– Connects the processing elements, memory, and I/O devices
Programming the Cell
Local store memory in SPEs (256 KB)
SIMD nature of dataflows
The size of the register file (128 registers of 128 bits each)
Single program context
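The local store limit shapes how SPE code is written: data has to be brought in from main memory in pieces that fit, processed, and written back. The sketch below imitates that pattern in plain C, with memcpy standing in for the DMA transfers. The function name scale_in_chunks, the buffer name ls_buffer, and the 16 KB per-buffer budget are assumptions made for illustration, not part of any Cell SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOCAL_STORE_BYTES (256 * 1024)  /* SPE local store size from the slide */
#define CHUNK_BYTES       (16 * 1024)   /* assumed budget for one data buffer */
#define CHUNK_FLOATS      (CHUNK_BYTES / sizeof(float))

/* Stand-in for a buffer that would live in the SPE local store. */
static float ls_buffer[CHUNK_FLOATS];

/* Scale n floats held in "main memory" by factor, one chunk at a time.
   Returns the number of chunks that were transferred. */
size_t scale_in_chunks(float *data, size_t n, float factor)
{
    size_t chunks = 0;

    /* The buffer must leave room for code and stack in the local store. */
    assert(sizeof(ls_buffer) < LOCAL_STORE_BYTES);

    for (size_t off = 0; off < n; off += CHUNK_FLOATS) {
        size_t len = (n - off < CHUNK_FLOATS) ? n - off : CHUNK_FLOATS;

        /* memcpy plays the role of a DMA transfer into the local store. */
        memcpy(ls_buffer, data + off, len * sizeof(float));

        for (size_t i = 0; i < len; i++)
            ls_buffer[i] *= factor;

        /* memcpy plays the role of a DMA transfer back to main memory. */
        memcpy(data + off, ls_buffer, len * sizeof(float));
        chunks++;
    }
    return chunks;
}
```

Calling scale_in_chunks on an array of 10000 floats processes it in three pieces: two full 4096-element chunks and one partial chunk. A real SPE kernel would typically use two such buffers so that one transfer overlaps with computation on the other.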
Programming Models
Function offload model
Device extension model
Computational acceleration model
Streaming models
Shared-memory multiprocessor model
Asymmetric thread runtime model
A programming model:CellSs
Cell superscalar
– Simple and flexible
– Automatic parallelization of sequential programs
– Task scheduling and data handling
CellSs Structure
Based on
– Code annotations
– C language
Composed of
– Source compiler
– Runtime library
CellSs Compiler
Source-to-source compiler; the annotations specify
– Functions (tasks) to be executed in the SPEs
– Function parameter directions
– Parameters that are arrays, and their lengths
No pointers!
Parallelism on CellSs
[Figure: annotated code, generated code for the PPE, generated code for the SPE]
CellSs Syntax
Three types of pragmas
– Initialization and finalization: css start and css finish
– Task: css task [input inout output]
– Synchronization: css wait
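As an illustration of how these pragmas might look in a program, here is a small blocked matrix multiplication in C. The clause spelling input(...) and inout(...) follows my reading of the CellSs papers and should be treated as an assumption, as should the names block_multiply, run_blocked_matmul, NB, and BS. A plain C compiler ignores the unknown pragmas (possibly with a warning), so the same source also runs sequentially.

```c
#include <assert.h>

#define NB 4   /* blocks per matrix dimension (illustrative) */
#define BS 8   /* elements per block dimension (illustrative) */

/* Task: c += a * b on one BS x BS block.
   The directions say that a and b are only read, c is read and written. */
#pragma css task input(a, b) inout(c)
void block_multiply(float a[BS][BS], float b[BS][BS], float c[BS][BS])
{
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++)
            for (int k = 0; k < BS; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Matrices stored as NB x NB grids of BS x BS blocks. */
static float matA[NB][NB][BS][BS];
static float matB[NB][NB][BS][BS];
static float matC[NB][NB][BS][BS];

/* Fills A with 1.0 and B with 2.0, multiplies, returns one element of C. */
float run_blocked_matmul(void)
{
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int x = 0; x < BS; x++)
                for (int y = 0; y < BS; y++) {
                    matA[i][j][x][y] = 1.0f;
                    matB[i][j][x][y] = 2.0f;
                    matC[i][j][x][y] = 0.0f;
                }

#pragma css start
    /* Each call is a candidate task; calls that update the same C block
       depend on each other through the inout parameter. */
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++)
                block_multiply(matA[i][k], matB[k][j], matC[i][j]);
#pragma css finish

    /* Uniform inputs give a uniform result, so any two elements agree. */
    assert(matC[NB - 1][NB - 1][BS - 1][BS - 1] == matC[0][0][0][0]);
    return matC[0][0][0][0];
}
```

With these sizes every element of C is a sum of NB * BS = 32 products of 1.0 and 2.0, so run_blocked_matmul returns 64.0. The main code stays sequential in appearance; the annotations are the only Cell-specific part.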
CellSs Runtime
When an annotated function is called
– Add a node to the task graph
– Data dependency analysis (RaW, WaR, WaW)
– Parameter renaming
– Task submission
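The dependency analysis step can be sketched with a toy task graph in C. This is not the CellSs runtime: the types direction and dependency, the functions classify, add_task, and depends_on, and the one-parameter-per-task simplification are all assumptions made for illustration. The sketch assumes that renaming removes WaR and WaW dependencies, so only RaW dependencies become edges.

```c
#include <assert.h>

typedef enum { DIR_INPUT, DIR_OUTPUT, DIR_INOUT } direction;
typedef enum { DEP_NONE, DEP_RAW, DEP_WAR, DEP_WAW } dependency;

static int reads(direction d)  { return d == DIR_INPUT  || d == DIR_INOUT; }
static int writes(direction d) { return d == DIR_OUTPUT || d == DIR_INOUT; }

/* Classify how a later access to some data depends on an earlier access
   to the same data. A true (RaW) dependency takes priority. */
dependency classify(direction earlier, direction later)
{
    if (writes(earlier) && reads(later))  return DEP_RAW;
    if (writes(earlier) && writes(later)) return DEP_WAW;
    if (reads(earlier)  && writes(later)) return DEP_WAR;
    return DEP_NONE;
}

#define MAX_TASKS 16

/* One parameter per task keeps the sketch short. */
typedef struct { const void *addr; direction dir; } task;

static task tasks[MAX_TASKS];
static int  edge[MAX_TASKS][MAX_TASKS];  /* edge[a][b] != 0: b waits for a */
static int  n_tasks;

/* Add a node to the task graph and return its id. A task that reads its
   parameter gets an edge from the most recent writer of that data. Tasks
   that only write get no edge, on the assumption that renaming gives them
   a fresh copy of the data. */
int add_task(const void *addr, direction dir)
{
    assert(n_tasks < MAX_TASKS);
    int id = n_tasks++;
    tasks[id].addr = addr;
    tasks[id].dir  = dir;

    if (reads(dir)) {
        for (int t = id - 1; t >= 0; t--) {
            if (tasks[t].addr == addr && writes(tasks[t].dir)) {
                edge[t][id] = 1;
                break;
            }
        }
    }
    return id;
}

/* Returns nonzero if task later must wait for task earlier. */
int depends_on(int later, int earlier)
{
    return edge[earlier][later];
}
```

For example, if one task writes x, a second reads x, and a third writes x again, only the second task gets an edge (from the first). The third task is free to run at once because its WaR dependency on the second is assumed to be renamed away.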
Tracing
A tracing component embedded in the CellSs runtime generates Paraver trace files, recording
– When the main program enters or exits
– When an annotated function is called in the main program
– When a task is started or finished
Performance Analysis
Matmul
– Block matrix multiplication
TSP
– Recursive implementation of the Traveling Salesman Problem
Cholesky
– Block matrix Cholesky factorization
Performance Analysis
[Figure: Paraver trace]
– x-axis: timeline
– y-axis: a thread of the application
– Green: events
– Yellow: communications
Pros and Cons
Annotations
– Simple
– But limited
Data transfer is transparent to the user code
Task dependency analysis
Relies on other compilers for
– Code vectorization (SPE performance)
– Lower-level code optimization
Related Work
OpenMP
Accelerated Library Framework (ALF)
Thread level synchronization
Sequoia
RapidMind
Ohara
Graphics Processor Units (GPUs)
References
J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy, “Introduction to the Cell multiprocessor”, IBM J. Res. & Dev., Vol. 49, No. 4/5, July/September 2005.
P. Bellens, J. M. Perez, R. M. Badia, J. Labarta, “CellSs: a Programming Model for the Cell BE Architecture”, Supercomputing Conference, 2006.
M. W. Riley, J. D. Warnock, D. F. Wendel, “Cell Broadband Engine processor: Design and implementation”, IBM J. Res. & Dev., Vol. 51, No. 5, September 2007.
J. M. Perez, P. Bellens, R. M. Badia, J. Labarta, “CellSs: Making it easier to program the Cell Broadband Engine processor”, IBM J. Res. & Dev., Vol. 51, No. 5, September 2007.
http://www.ibm.com/developerworks/power/cell/
www.bsc.es/cellsuperscalar