Evaluating the Imagine Stream Processor
Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das
ISCA 2004
Motivation
• Provide efficiency of an ASIC
• Provide flexibility of a programmable processor
• Simplify special-purpose processor design
• Lower special-purpose processor design cost
• Provide better applicability
• Target media applications
Stream Architecture
Development Board
PowerPC, 150 MHz
2 x Imagine, 200 MHz
FPGA Bridge, 66 MHz
256 MB of SDRAM per Imagine, 100 MHz
Applications
Mapping
Execution on a Single Stream
Figure labels: SRF, Kernel 1, Input Stream, Output Stream, Iteration 1 … Iteration n
Execution of Multiple Kernels
Figure labels: SRF, Stream 1, Stream 2, Stream 3, Stream 4, Kernel 1, Kernel 2, Kernel 3 (each kernel marked "processing…")
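The multiple-kernel slide can be read as a chain in which each kernel consumes one stream and produces the next. The sketch below is a minimal illustration of that reading, not Imagine code: it assumes the streams and kernels alternate (Stream 1, Kernel 1, Stream 2, and so on), and the three kernel bodies and the `run_pipeline` helper are hypothetical placeholders.

```python
from typing import Callable, List

Stream = List[int]
Kernel = Callable[[Stream], Stream]

# Hypothetical kernel bodies; the slides do not say what each kernel computes.
def kernel_1(stream: Stream) -> Stream:
    return [x + 1 for x in stream]

def kernel_2(stream: Stream) -> Stream:
    return [x * 2 for x in stream]

def kernel_3(stream: Stream) -> Stream:
    return [x - 3 for x in stream]

def run_pipeline(stream_1: Stream, kernels: List[Kernel]) -> List[Stream]:
    """Apply each kernel to the stream produced by the previous one.

    Returns every stream in order (Stream 1, Stream 2, ...), so the
    intermediate streams are visible for inspection.
    """
    streams = [list(stream_1)]
    for kernel in kernels:
        streams.append(kernel(streams[-1]))
    return streams

streams = run_pipeline([1, 2, 3], [kernel_1, kernel_2, kernel_3])
stream_4 = streams[-1]
print(stream_4)
```

In this reading, Stream 2 and Stream 3 are intermediate results that only ever pass from one kernel to the next.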
Application Performance
GOPS: 18%
GFLOPS: 60%
Sources of Overhead
Stream Length Effects
Access Pattern Effects
Energy Efficiency
Energy consumption per FLOP (normalized to a 0.13 µm, 1.2 V process):
Imagine @ 200 MHz: 277 pJ/FLOP
TI C67x DSP @ 225 MHz: 889 pJ/FLOP (3.2x more)
Intel Pentium M @ 1200 MHz: 3600 pJ/FLOP (13x more)
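The "x more" factors follow directly from the listed energy figures. A short check, using only the pJ/FLOP values from this slide (the dictionary and function names are illustrative):

```python
# Energy per FLOP in picojoules, as listed on the slide
# (normalized to a 0.13 um, 1.2 V process).
ENERGY_PJ_PER_FLOP = {
    "Imagine": 277.0,
    "TI C67x DSP": 889.0,
    "Intel Pentium M": 3600.0,
}

def ratio_to_imagine(name: str) -> float:
    """Return how many times more energy per FLOP `name` uses than Imagine."""
    return ENERGY_PJ_PER_FLOP[name] / ENERGY_PJ_PER_FLOP["Imagine"]

# Round to one decimal place, matching the precision quoted on the slide.
ratios = {name: round(ratio_to_imagine(name), 1) for name in ENERGY_PJ_PER_FLOP}
print(ratios)
```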
Memory Bandwidth Requirement
Host Processor Bandwidth Requirement
Programming Model
Compiler Optimizations: Stream Ordering
Compiler Optimizations: SRF Overlapping and Packing
Compiler Optimizations: Strip-mining
Compiler Optimizations: Loop Unrolling and Software Pipelining
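The slide gives only the name of this optimization. As a rough sketch of the general idea of strip-mining, assuming it means splitting a long stream into fixed-size strips that are processed one at a time: the helpers `strips` and `run_strip_mined`, the example kernel, and the strip length are all hypothetical and do not reflect the actual Imagine compiler.

```python
from typing import Callable, Iterator, List, Sequence

def strips(stream: Sequence[float], strip_len: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most `strip_len` elements."""
    if strip_len <= 0:
        raise ValueError("strip_len must be positive")
    for start in range(0, len(stream), strip_len):
        yield stream[start:start + strip_len]

def run_strip_mined(
    kernel: Callable[[Sequence[float]], List[float]],
    stream: Sequence[float],
    strip_len: int,
) -> List[float]:
    """Run `kernel` on one strip at a time and concatenate the outputs."""
    output: List[float] = []
    for strip in strips(stream, strip_len):
        output.extend(kernel(strip))
    return output

# Hypothetical element-wise kernel used only for illustration.
def scale_by_two(strip: Sequence[float]) -> List[float]:
    return [2.0 * x for x in strip]

data = [float(i) for i in range(10)]
strip_sizes = [len(s) for s in strips(data, 4)]
result = run_strip_mined(scale_by_two, data, strip_len=4)
print(strip_sizes, result)
```

For an element-wise kernel like this one, the strip-mined result equals the result of running the kernel on the whole stream; only the amount of data held at once changes.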
Conclusions
• Provides performance close to that of an ASIC, with flexibility through programming
• Can sustain between 16% and 60% of the peak arithmetic performance
• Exposed 2-level register file allows compiler to exploit locality
• Broader applicability
• Requires considerable programming effort
• Limited to media applications with regular control flow
Collab Questions
• How does the performance compare to other processors? (Dan, Marko, Jason, Prateeksha, Chris)
• What is the compiler efficiency? (Mario, Liang)
• How were the design decisions motivated? (Jing, Marisabel)
• How does the programming model compare to that of GPUs? (Greg)
Kernels