The CELL processor
Class Presentation of Custom DSP Implementation Course
Prepared and Presented by: S.H.R. Ahmadi
ECE Department – University of Tehran
This is a class presentation. All data are the copyright of their respective authors, as listed in the references, and have been used here for educational purposes only.
Notice:
• Photos and diagrams are proprietary to IBM
• The Cell processor, Power & PowerPC are trademarks of IBM
• PlayStation™ 3 is a trademark of Sony Computer Entertainment Inc. (SCEI)
• FlexIO™ & XDR™ are Rambus Inc. trademarks
• All data are gathered from public sources, which are listed in the “References”
Outline
• Development History
• Specifications & Architecture
• Applications
• Software Aspects
• Marketing & NEWS
• References
Development History
• Completely secret and under cover
• March 12, 2001 – “Cell” announced
– “supercomputer-on-a-chip” from Sony, Toshiba, IBM
– Capable of TeraFlops computation speed
– $400m investment in 5 years
• March, 2002 – Okamoto speech
– 2005 target date
– First glimpse of the Cell idea: 1000x figure
• August, 2002 – Cell design finished – near “tape out”
– “4-16 general-purpose processor cores per chip”
Development History
• November, 2002 – Rambus licenses “Yellowstone” technology to Toshiba
– Yellowstone: 3.2 GHz memory
• January, 2003 – Rambus licenses Yellowstone/Redwood technology to Sony
– Redwood: parallel interface between chips
• January, 2003
– Cell at 4 GHz, 1024-bit bus, 64 MB memory, PowerPC
– At least 4 patents in 2002 & 2003 on hardware & software architecture, processing modules, memory protection, and data synchronization
Development History
• 2004
– Marketing news
– Some general technical data
• May, 2004
– CELL-based workstation will be made
– Application: digital content creation
• February, 2005
– Formal introduction at ISSCC’05
– Extensive media coverage
• May, 2005
– Sony’s PlayStation 3 formal announcement
Specifications & Architecture
• Broadband Processor Architecture
– Optimized for broadband media and 3D graphics
• 90-nm PD-SOI process, 8 metal layers (copper)
• 234 million transistors in ~235 mm²
• 4.6 GHz operation at 1.3 V
• 85 °C operating temp. with heat sink
• Thermal protection schemes
• 2965 core connections / ~1300 pins
• 256 GFlops SP-FP, 26 GFlops DP-FP
• HUGE communication speed to outside
• 4 x 128-bit internal bus (ring), 96 Bytes/cycle
Specifications & Architecture
BPA (Cell) design features:
• Multi-Core Architecture
• Based on the Power Architecture
– Code compatibility
• Coherent and cooperative off-load processing
• Enhanced SIMD architecture
• Improved power efficiency
• “Absolute timers” allow “hard” real-time data processing
– Good estimation of execution time is possible
• Big-endian memory
– Same byte order as Apple (PowerPC) systems, unlike Intel (x86)
• Isolation mechanism for secure code execution
Specifications & Architecture
BPA (Cell) design justification:
• Multi-Core Non-Homogeneous Architecture
– Better Power
• 3-level Model of Memory
– Main Memory, Local Store, Registers
– Better Memory
• Large Register File & SW Controlled Branching
– Allows deeper pipelines
– Better Frequency
Specifications & Architecture
CPU: (Power Processor Element)
• 64-bit Power Architecture™ with VMX (SIMD)
• In-order, 2-way hardware multi-threading
– Simple design improvements possible
– Predictable execution times
• Coherent Load/Store Cache (32 KB L1, 512 KB L2)
• Redesigned for use in the Cell processor
Serves as a:
• Multi-OS GPP
• Control unit for SPEs
Specifications & Architecture
SPE: (Synergistic Processing Element)
• Dual issue, 128-bit 4-way SIMD
– Vector Processing
• 4 Integer Units + 4 FP Units
• 8-, 16-, 32-bit Integer + 32-, 64-bit FP
• 128 x 128-bit Registers
• 256 KB Local-Store Memory (specially designed)
– Caches are not used
– Data & Instructions in LS
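To make the SIMD figures on this slide concrete, the following Python sketch computes how many elements of each listed width fit in one 128-bit register and how large the 128-entry register file is. The function names are illustrative and do not come from any Cell toolchain.

```python
REGISTER_BITS = 128    # width of one SPE register (from the slide)
REGISTER_COUNT = 128   # registers per SPE (from the slide)


def lanes(element_bits: int) -> int:
    """Return how many elements of the given width fit in one register."""
    if element_bits <= 0 or REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return REGISTER_BITS // element_bits


def register_file_bytes() -> int:
    """Return the total size of the register file in bytes."""
    return REGISTER_COUNT * REGISTER_BITS // 8


if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        print(f"{bits:2d}-bit elements: {lanes(bits)} per register")
    print(f"register file: {register_file_bytes()} bytes")
```

The "4-way" label corresponds to the 32-bit case: four 32-bit elements per 128-bit register.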
Specifications & Architecture
SPE:
• Coherent & Cooperative off-load engines for CPU
– Works independently
– Not directly tied to CPU as co-processor
• Dedicated DMA engine
– Move data: CPU↔SPE or SPE↔SPE
– Parallel or Serial with other SPEs
• Dynamically configurable to protect resources
• Can perform security algorithms
Specifications & Architecture
• 8 SPE blocks, each with 32 GFlops or 32 Gops
Monstrous processing power
Needs to be fed accordingly
Solution:
• EIB
• High-Speed MEM (Dual XDR™)
• High-Speed IO (FlexIO™)
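As a rough check of the figures quoted so far, the sketch below reproduces 32 GFlops per SPE and 256 GFlops per chip. It assumes a 4 GHz clock (the figure quoted in the development history) and that each of the four SIMD lanes completes one fused multiply-add per cycle, counted as two floating-point operations. That second assumption is not stated on the slides.

```python
def spe_peak_gflops(clock_ghz: float,
                    simd_lanes: int = 4,
                    flops_per_lane_per_cycle: int = 2) -> float:
    """Peak single-precision GFlops of one SPE.

    flops_per_lane_per_cycle=2 assumes one multiply-add per lane per cycle
    (an assumption, not a figure taken from the slides).
    """
    return clock_ghz * simd_lanes * flops_per_lane_per_cycle


def chip_peak_gflops(clock_ghz: float, spe_count: int = 8) -> float:
    """Peak single-precision GFlops of all SPEs on one chip."""
    return spe_count * spe_peak_gflops(clock_ghz)


if __name__ == "__main__":
    print(spe_peak_gflops(4.0))    # per-SPE peak at the assumed 4 GHz
    print(chip_peak_gflops(4.0))   # 8 SPEs together
```

These are peak numbers only; sustained throughput depends on keeping the SPEs supplied with data, which is the point of the slide.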
Specifications & Architecture
EIB: (Element Interconnect Bus)
• Data ring for internal communication
• Four 16-byte data rings – low latency
• Multiple simultaneous transfers
• 96 B/cycle peak bandwidth (@ ½ CPU speed)
Specifications & Architecture
External Memory Bus:
• Licensed from Rambus
• Dual XDR™ interface (25.6 GB/s @ 3.2 GHz)
External IO:
• Licensed from Rambus
• FlexIO™ interface (each 2-wire bit @ 800Mbps)
• Total 76.8 GB/s (7 Tx Bytes + 5 Rx Bytes)
• Extensive shielding is necessary
– Many VDD/GND wires
– 90% of all pins
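The bandwidth figures on this slide can be sanity-checked with simple arithmetic. The sketch below is illustrative only. For XDR it assumes two channels, each 32 bits wide, at 3.2 Gbit/s per pin; the channel width is an assumption, not a slide figure. For FlexIO it assumes the quoted 76.8 GB/s total is spread evenly over the 12 byte lanes (7 transmit, 5 receive).

```python
import math


def xdr_bandwidth_gb_s(channels: int = 2,
                       bits_per_channel: int = 32,   # assumed channel width
                       gbit_per_pin: float = 3.2) -> float:
    """Aggregate memory bandwidth in GB/s under the stated assumptions."""
    return channels * bits_per_channel * gbit_per_pin / 8


def flexio_split_gb_s(total_gb_s: float = 76.8,
                      tx_lanes: int = 7,
                      rx_lanes: int = 5) -> tuple[float, float]:
    """Split the total I/O bandwidth into (transmit, receive) GB/s,
    assuming every byte lane carries an equal share."""
    per_lane = total_gb_s / (tx_lanes + rx_lanes)
    return per_lane * tx_lanes, per_lane * rx_lanes


if __name__ == "__main__":
    print(f"XDR: {xdr_bandwidth_gb_s():.1f} GB/s")
    tx, rx = flexio_split_gb_s()
    print(f"FlexIO: {tx:.1f} GB/s out, {rx:.1f} GB/s in")
    assert math.isclose(tx + rx, 76.8)
```

Under these assumptions the XDR result matches the 25.6 GB/s on the slide, and the FlexIO total splits into roughly 44.8 GB/s outbound and 32.0 GB/s inbound.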
Applications
According to IBM:
• CELL design was based on the analysis of a broad range of workloads in areas such as cryptography, graphics transform and lighting, physics, fast-Fourier transforms (FFT), matrix operations, and scientific workloads
• The Cell processor is designed for graphics- and network-intensive jobs ranging from video games to complex imaging for the medical, defense, automotive and aerospace industries
Applications
• Games, 3D Graphics, Video, Audio
– Image manipulation; Video processing, encoding, decoding
• DSP (Digital Signal Processing)
– FFT (e.g. SETI); Distributed DSP
• Digital Rights Management
– Cryptography; Secure data processing
• Scientific Calculations
– Linear system solvers; Linear algebra; PDE
• Super Computing
• Servers (Commercial databases)
• Stream Processing Applications
– Serial use of SPE blocks (e.g. Digital TV)
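The "serial use of SPE blocks" idea can be pictured as a pipeline in which each stage consumes the previous stage's output. The Python sketch below models that with generators. The three stages (scale, offset, clamp) are hypothetical stand-ins for real processing steps, and nothing here is Cell-specific code.

```python
from typing import Callable, Iterable, Iterator, List

Stage = Callable[[Iterator[int]], Iterator[int]]


def scale(samples: Iterator[int]) -> Iterator[int]:
    """Hypothetical stage 1: double every sample."""
    for s in samples:
        yield s * 2


def offset(samples: Iterator[int]) -> Iterator[int]:
    """Hypothetical stage 2: add a constant offset."""
    for s in samples:
        yield s + 1


def clamp(samples: Iterator[int]) -> Iterator[int]:
    """Hypothetical stage 3: clamp to the 0..255 range."""
    for s in samples:
        yield max(0, min(255, s))


def run_pipeline(source: Iterable[int], stages: List[Stage]) -> List[int]:
    """Chain the stages so each one streams into the next."""
    stream: Iterator[int] = iter(source)
    for stage in stages:
        stream = stage(stream)
    return list(stream)


if __name__ == "__main__":
    print(run_pipeline([10, 100, 200], [scale, offset, clamp]))
```

On real hardware each stage would run on its own SPE, with data handed from one local store to the next. Here the stages only share a Python iterator.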
Software Aspects
According to Experts:
• Programming the Cell processor requires new tools & a new programming paradigm
– Because SPE programs should be self-contained, with data and instruction bundles
• For a game console, programmers will craft custom optimized code. The next challenge for STI is to find a way to make this architecture accessible to programmers beyond game developers
• Cell is "OS neutral" and supports multiple OSes simultaneously
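One consequence of self-contained SPE programs is that a large data set has to be cut into pieces that fit in the 256 KB local store alongside the code. The sketch below plans such a split in Python. It is a simplified model: the two-buffer default (so that one buffer can be filled while the other is processed) and the code-size parameter are assumptions made for illustration, and alignment and DMA transfer-size limits are ignored.

```python
from typing import List, Tuple

LOCAL_STORE_BYTES = 256 * 1024   # local store size per SPE (from the slides)


def chunk_size(code_bytes: int, buffers: int = 2) -> int:
    """Largest chunk that fits when `buffers` equal buffers share the
    space left over after the program code."""
    free = LOCAL_STORE_BYTES - code_bytes
    if free <= 0 or buffers <= 0:
        raise ValueError("no room left in the local store for data buffers")
    return free // buffers


def plan_transfers(total_bytes: int, code_bytes: int,
                   buffers: int = 2) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover the whole data set."""
    size = chunk_size(code_bytes, buffers)
    return [(off, min(size, total_bytes - off))
            for off in range(0, total_bytes, size)]


if __name__ == "__main__":
    # Hypothetical case: 64 KB of code and a 200 KB input buffer.
    for off, length in plan_transfers(200 * 1024, 64 * 1024):
        print(f"transfer offset={off} length={length}")
```

With 64 KB of code, each of the two buffers gets 96 KB, so a 200 KB input is moved in three transfers.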
Software Aspects
• Tool chain for Cell is built on PowerPC Linux
– Early availability of SIMD-optimized compilers
– Development of high-performance graphics and media libraries for the Broadband Architecture entirely in C
– CELL team developed the first SPU compiler
– Development of an advanced parallelizing compiler with auto-SIMDization features based on IBM XL compiler technology
Marketing & NEWS
• Microprocessor Report: “Cell is basically a vector supercomputer on a chip”; the 2004 Microprocessor Report Analysts’ Choice Award for Best Technology went to the Cell processor
• IBM is working with companies to integrate Cell microprocessor into third-party products
• The companies are working with open-source compiler developers to create software development tools for programmers
Marketing & NEWS
• Sony PlayStation™ 3
• Cell Processor running at 3.2 GHz
– 7 special purpose 3.2 GHz processors
– 218 gigaflops of performance
• 256 MB XDR main RAM at 3.2 GHz
• 256 MB of GDDR VRAM at 700 MHz
• Support for seven Bluetooth controllers
• Supports Blu-ray Disc format
• System Floating Point Performance of 2 teraflops
• Communication: Ethernet, Wi-Fi IEEE 802.11, Bluetooth
• Output in HDTV resolution up to 1080p as standard
Marketing & NEWS
• Cell Processor Based Workstation (CPBW)
• From Sony Group and IBM
• First Prototype “Powered On”
• 16 TeraFlops in a rack (est.)
• Optimized for Digital Content Creation
– Computer entertainment
– Movies
– Real-time rendering
– Physics simulation
• Affordable by Small Businesses (and Individuals)
Marketing & NEWS
• CELL Industries
• Our Objective: Distributing Cell Power
• Facilitate small-scale supercomputer applications for Cell
• Cell-based systems
– affordable for individuals and small to medium-sized businesses
• Our Cell PCI-x plug-in card, xpac-zero
– fastest and most economical way for people to get their hands on some real computing power
• Uses Cell as a general-purpose numerical accelerator
– The xpac-zero card acts much like a video card
References
• IBM, Sony, Toshiba papers in ISSCC’05
– “A Streaming Processing Unit for a CELL Processor”, B. Flachs et al.
– “The Design and Implementation of a First-Generation CELL Processor”, D. Pham et al.
• “Microprocessor Report”, Reed Electronics Group, Jan. 31 & Feb. 14, 2005
• “IBM’s Cell Processor: The next generation of computing?”, D.K. Every, Shareware Press, Feb. 2005
References
• “Power Efficient Processor Architecture and The Cell Processor”, H.P. Hofstee, HPCA-11 2005
• “Power Efficient Processor Design and the Cell Processor”, IBM, 2005
• “Introducing the IBM/Sony/Toshiba Cell Processor”, J. H. Stokes, http://arstechnica.com/
• “Cell Architecture Explained”, N. Blachford, http://www.blachford.info/