
Prepared and Presented by:

Class Presentation of Custom DSP Implementation Course

This is a class presentation. All data are copyrights of their respective authors as listed in the references and have been used here for educational purposes only.

ECE Department – University of Tehran

S.H.R. Ahmadi

The CELL processor

Notice:

• Photos and diagrams are proprietary to IBM

• The Cell processor, Power & PowerPC are trademarks of IBM

• PlayStation™ 3 is a trademark of Sony Computer Entertainment Inc. (SCEI)

• FlexIO™ & XDR™ are Rambus Inc. trademarks

• All data are gathered from public sources which are listed in the “References”

Outline

• Development History
• Specifications & Architecture
• Applications
• Software Aspects
• Marketing & NEWS
• References

Development History

• Early development was completely secret and under cover
• March 12, 2001 – “Cell” announced
  – “Supercomputer-on-a-chip” from Sony, Toshiba, and IBM
  – Capable of teraflops computation speed
  – $400M investment over 5 years
• March 2002 – Okamoto speech
  – 2005 target date
  – First glimpse of the Cell idea: the 1000x figure
• August 2002 – Cell design finished, near “tape out”
  – “4-16 general-purpose processor cores per chip”

Development History

• November 2002 – Rambus licenses “Yellowstone” technology to Toshiba
  – Yellowstone: 3.2 GHz memory
• January 2003 – Rambus licenses Yellowstone/Redwood technology to Sony
  – Redwood: parallel interface between chips
• January 2003
  – Cell at 4 GHz, 1024-bit bus, 64 MB memory, PowerPC
  – At least 4 patents in 2002 & 2003 on:
    • Hardware & software architecture
    • Processing modules
    • Memory protection
    • Data synchronization

Development History

• 2004
  – Marketing news
  – Some general technical data
• May 2004
  – Announcement that a CELL-based workstation will be made
  – Application: digital content creation
• February 2005
  – Formal introduction at ISSCC’05
  – Extensive media coverage
• May 2005
  – Formal announcement of Sony’s PlayStation 3


Specifications & Architecture

• Broadband Processor Architecture
  – Optimized for broadband media and 3D graphics
• 90 nm PD-SOI process, 8 metal layers (copper)
• 234 million transistors in ~235 mm²
• 4.6 GHz operation at 1.3 V
• 85 °C operating temperature with heat sink
• Thermal protection schemes
• 2965 core connections / ~1300 pins
• 256 GFlops SP-FP, 26 GFlops DP-FP
• Very high communication bandwidth to the outside
• 4 × 128-bit internal bus (ring), 96 bytes/cycle

Specifications & Architecture

BPA (Cell) design features:
• Multi-core architecture
• Based on the Power Architecture
  – Code compatibility
• Coherent and cooperative off-load processing
• Enhanced SIMD architecture
• Improved power efficiency
• “Absolute timers” allow “hard” real-time data processing
  – Good estimation of execution time is possible
• Big-endian memory
  – Supports Apple, but not Intel
• Isolation mechanism for secure code execution

Specifications & Architecture

BPA (Cell) design justification:

• Multi-core non-homogeneous architecture
  – Better power
• 3-level memory model
  – Main memory, local store, registers
  – Better memory
• Large register file & software-controlled branching
  – Allows deeper pipelines
  – Better frequency


Specifications & Architecture

CPU: Power Processor Element (PPE)
• 64-bit Power Architecture™ with VMX (SIMD)
• In-order, 2-way hardware multi-threading
  – Simple design improvements possible
  – Predictable execution times
• Coherent load/store cache (32 KB L1, 512 KB L2)
• Redesigned for use in the Cell processor

Serves as a:
• Multi-OS general-purpose processor (GPP)
• Control unit for the SPEs

Specifications & Architecture

SPE: Synergistic Processing Element
• Dual-issue, 128-bit 4-way SIMD
  – Vector processing
• 4 integer units + 4 FP units
• 8-, 16-, 32-bit integer + 32-, 64-bit FP
• 128 × 128-bit registers
• 256 KB local-store (LS) memory (specially designed)
  – Caches are not used
  – Data & instructions reside in the LS
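To make the "128-bit 4-way SIMD" bullet concrete, here is a minimal Python sketch that emulates one 128-bit register as four 32-bit single-precision lanes and applies a lane-wise multiply-add. The helper names (`pack_register`, `unpack_register`, `simd_madd`) are hypothetical and are not part of any Cell toolchain; the sketch only illustrates the data layout, not how an SPE executes instructions.

```python
import struct

LANES = 4  # 128-bit register / 32-bit single-precision lanes


def pack_register(values):
    """Pack four floats into a 16-byte (128-bit), big-endian register image."""
    if len(values) != LANES:
        raise ValueError("a 128-bit register holds exactly four 32-bit lanes")
    return struct.pack(">4f", *values)


def unpack_register(reg):
    """Return the four 32-bit lanes of a register image as Python floats."""
    return list(struct.unpack(">4f", reg))


def simd_madd(a, b, c):
    """Lane-wise a * b + c over three register images (hypothetical helper)."""
    va, vb, vc = (unpack_register(r) for r in (a, b, c))
    return pack_register([x * y + z for x, y, z in zip(va, vb, vc)])


a = pack_register([1.0, 2.0, 3.0, 4.0])
b = pack_register([0.5, 0.5, 0.5, 0.5])
c = pack_register([1.0, 1.0, 1.0, 1.0])

result = unpack_register(simd_madd(a, b, c))

# 128 registers of 16 bytes each give the size of the whole register file.
register_file_bytes = 128 * len(a)
```

One operation updates all four lanes at once, which is why the same instruction stream can process four pixels or four samples per step.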

Specifications & Architecture

SPE:
• Coherent & cooperative off-load engines for the CPU
  – Work independently
  – Not directly tied to the CPU as a co-processor
• Dedicated DMA engine
  – Moves data between CPU and SPE, or between SPEs
  – Parallel or serial with other SPEs
• Dynamically configurable to protect resources
• Can perform security algorithms
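Because each SPE works out of its own local store and relies on DMA to fetch data, a common way to keep it busy is double buffering: request the next chunk while computing on the current one. The following Python sketch simulates that pattern under stated assumptions. `dma_get`, `process`, and the 16 KB buffer size are hypothetical stand-ins chosen for illustration; plain Python runs the steps sequentially, whereas on hardware the transfer would overlap with the computation.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local-store size quoted on the SPE slide
BUFFER_BYTES = 16 * 1024        # assumed chunk size, chosen for illustration


def dma_get(main_memory, offset, size):
    """Stand-in for a DMA transfer from main memory into the local store."""
    return bytes(main_memory[offset:offset + size])


def process(chunk):
    """Placeholder compute kernel: sum the bytes of one chunk."""
    return sum(chunk)


def run_double_buffered(main_memory):
    """Process main_memory chunk by chunk using two local-store buffers."""
    # Both buffers must fit in the local store alongside code and stack.
    assert 2 * BUFFER_BYTES <= LOCAL_STORE_BYTES
    total = 0
    offset = 0
    next_chunk = dma_get(main_memory, offset, BUFFER_BYTES)  # prefetch first chunk
    while next_chunk:
        current = next_chunk
        offset += len(current)
        # On real hardware this transfer would run while `current` is processed.
        next_chunk = dma_get(main_memory, offset, BUFFER_BYTES)
        total += process(current)
    return total


data = bytes(range(256)) * 200  # 51,200 bytes of sample "main memory"
total = run_double_buffered(data)
```

The design choice to show here is that the program, not a cache, decides what lives in the local store and when it arrives.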

Specifications & Architecture

• 8 SPE blocks, each with 32 GFlops or 32 Gops

Very high processing power

Needs to be fed accordingly

Solution:
• EIB
• High-speed memory (Dual XDR™)
• High-speed I/O (FlexIO™)
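The per-SPE and whole-chip figures can be reproduced with simple arithmetic. The sketch below is a back-of-the-envelope check, not a statement of how the figures were originally derived: it assumes one multiply-add (counted as 2 floating-point operations) per lane per cycle and the 4 GHz clock mentioned in the development history.

```python
SIMD_LANES = 4      # 128-bit register / 32-bit single-precision floats
FLOPS_PER_LANE = 2  # assumption: one multiply-add per lane per cycle
CLOCK_GHZ = 4.0     # assumption: the 4 GHz figure from the history slides
SPE_COUNT = 8       # number of SPE blocks on the chip

per_spe_gflops = SIMD_LANES * FLOPS_PER_LANE * CLOCK_GHZ
chip_gflops = SPE_COUNT * per_spe_gflops

print(f"per SPE: {per_spe_gflops:.0f} GFlops, chip: {chip_gflops:.0f} GFlops")
```

Under these assumptions the result matches the 32 GFlops per SPE and the 256 GFlops single-precision total quoted in the specifications.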

Specifications & Architecture

EIB: Element Interconnect Bus
• Data ring for internal communication
• Four 16-byte data rings, low latency
• Multiple simultaneous transfers
• 96 B/cycle peak bandwidth (at ½ CPU speed)

Specifications & Architecture

External Memory Bus:
• Licensed from Rambus
• Dual XDR™ interface (25.6 GB/s @ 3.2 GHz)

External I/O:
• Licensed from Rambus
• FlexIO™ interface (each 2-wire bit lane @ 800 MB/s)
• Total 76.8 GB/s (7 Tx bytes + 5 Rx bytes)
• Extensive shielding is necessary
  – Many VDD/GND wires
  – 90% of all pins
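The two bandwidth totals can be sanity-checked with integer arithmetic. This sketch rests on assumptions that the slides do not state: that the dual XDR interface is two 32-bit channels signalling at 3.2 Gbit/s per pin, and that each FlexIO byte lane consists of 8 two-wire bit lanes. For FlexIO it works backward from the 76.8 GB/s total to the per-lane rate that total implies.

```python
# Dual XDR memory interface (assumed: 2 channels x 32 data pins each).
XDR_CHANNELS = 2
XDR_PINS_PER_CHANNEL = 32
XDR_MBIT_PER_PIN = 3200  # 3.2 Gbit/s per pin

xdr_mb_per_s = XDR_CHANNELS * XDR_PINS_PER_CHANNEL * XDR_MBIT_PER_PIN // 8

# FlexIO: 7 transmit + 5 receive byte lanes, 76.8 GB/s in total.
FLEXIO_TOTAL_MB_PER_S = 76_800
FLEXIO_BYTE_LANES = 7 + 5
BIT_LANES_PER_BYTE = 8   # assumed: one 2-wire pair per bit

per_byte_lane_mb_per_s = FLEXIO_TOTAL_MB_PER_S // FLEXIO_BYTE_LANES
per_bit_lane_mb_per_s = per_byte_lane_mb_per_s // BIT_LANES_PER_BYTE
per_bit_lane_mbit_per_s = per_bit_lane_mb_per_s * 8

tx_mb_per_s = 7 * per_byte_lane_mb_per_s
rx_mb_per_s = 5 * per_byte_lane_mb_per_s
```

Under these assumptions the memory interface comes to 25.6 GB/s, and the I/O total splits into 44.8 GB/s outbound and 32 GB/s inbound.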


Applications

According to IBM:

• CELL design was based on the analysis of a broad range of workloads in areas such as cryptography, graphics transform and lighting, physics, fast-Fourier transforms (FFT), matrix operations, and scientific workloads

• The Cell processor is designed for graphics- and network-intensive jobs ranging from video games to complex imaging for the medical, defense, automotive and aerospace industries

Applications

• Games, 3D graphics, video, audio
  – Image manipulation; video processing, encoding, decoding
• DSP (Digital Signal Processing)
  – FFT (e.g. SETI); distributed DSP
• Digital rights management
  – Cryptography; secure data processing
• Scientific calculations
  – Linear system solvers; linear algebra; PDEs
• Supercomputing
• Servers (commercial databases)
• Stream processing applications
  – Serial use of SPE blocks (e.g. digital TV)
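The "serial use of SPE blocks" idea can be pictured as a pipeline in which each stage consumes the previous stage's output. The following Python sketch models that chaining with generators. The stage names and the toy arithmetic inside them are hypothetical placeholders, and generators only model the data flow, not the parallel execution that separate SPEs would provide.

```python
def stage_decode(stream):
    """First stage: stand-in for decoding an incoming packet."""
    for packet in stream:
        yield packet * 2


def stage_scale(stream):
    """Second stage: stand-in for scaling or filtering a frame."""
    for frame in stream:
        yield frame + 1


def stage_output(stream):
    """Last stage: stand-in for formatting a frame for display."""
    for frame in stream:
        yield f"frame:{frame}"


def build_pipeline(source, stages):
    """Chain the stages so each one consumes the previous one's output."""
    stream = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


frames = list(build_pipeline(range(4), [stage_decode, stage_scale, stage_output]))
```

Each stage only needs its own code and the current item, which fits the model of small, self-contained programs passing data along a chain.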


Software Aspects

According to experts:
• Programming the Cell processor requires new tools & a new programming paradigm
  – Because SPE programs should be self-contained, with data and instruction bundles
• For a game console, programmers will craft custom optimized code. The next challenge for STI is to find a way to make this architecture accessible to programmers beyond game developers
• Cell is “OS neutral” and supports multiple operating systems simultaneously

Software Aspects

• The tool chain for Cell is built on PowerPC Linux
  – Early availability of SIMD-optimized compilers
  – Development of high-performance graphics and media libraries for the Broadband Architecture entirely in C
  – The CELL team developed the first SPU compiler
  – Development of an advanced parallelizing compiler with auto-SIMDization features, based on IBM XL compiler technology


Marketing & NEWS

• Microprocessor Report: “Cell is basically a vector supercomputer on a chip”; the Cell processor received the 2004 Microprocessor Report Analysts’ Choice Award for Best Technology

• IBM is working with companies to integrate Cell microprocessor into third-party products

• The companies are working with open-source compiler developers to create software development tools for programmers

Marketing & NEWS

• Sony PlayStation™ 3
• Cell processor running at 3.2 GHz
  – 7 special-purpose 3.2 GHz processors
  – 218 gigaflops of performance
• 256 MB XDR main RAM at 3.2 GHz
• 256 MB of GDDR VRAM at 700 MHz
• Support for seven Bluetooth controllers
• Supports the Blu-ray Disc format
• System floating-point performance of 2 teraflops
• Communication: Ethernet, Wi-Fi (IEEE 802.11), Bluetooth
• Output in HDTV resolution up to 1080p as standard

Marketing & NEWS

• Cell Processor Based Workstation (CPBW)
• From Sony Group and IBM
• First prototype “powered on”
• 16 teraflops in a rack (est.)
• Optimized for digital content creation
  – Computer entertainment
  – Movies
  – Real-time rendering
  – Physics simulation
• Affordable for small businesses (and individuals)

Marketing & NEWS

• CELL Industries
• Our objective: distributing Cell power
• Facilitate small-scale supercomputer applications for Cell
• Cell-based systems
  – Affordable for individuals and small to medium-sized businesses
• Our Cell PCI-X plug-in card, xpac-zero
  – Fastest and most economical way for people to get their hands on some real computing power
• Uses Cell as a general-purpose numerical accelerator
  – The xpac-zero card acts much like a video card


References

• IBM, Sony, Toshiba papers in ISSCC’05
  – “A Streaming Processing Unit for a CELL Processor”, B. Flachs et al.
  – “The Design and Implementation of a First-Generation CELL Processor”, D. Pham et al.
• “Microprocessor Report”, Reed Electronics Group, Jan. 31 & Feb. 14, 2005
• “IBM’s Cell Processor: The next generation of computing?”, D.K. Every, Shareware Press, Feb. 2005

References

• “Power Efficient Processor Architecture and The Cell Processor”, H.P. Hofstee, HPCA-11 2005

• “Power Efficient Processor Design and the Cell Processor”, IBM, 2005

• “Introducing the IBM/Sony/Toshiba Cell Processor”, J. H. Stokes, http://arstechnica.com/

• “Cell Architecture Explained”, N. Blachford, http://www.blachford.info/

Thank you