Research on Reconfigurable Computing Using Impulse C
Carmen Li Shen
Mentor: Dr. Russell Duren
February 1, 2008
Presentation Overview
• Background Information
• Introduction
• Impulse C
• Current Work
• Conclusion & Future Research
• Questions
Background Information
• Reconfigurable computing
• Field Programmable Gate Arrays (FPGAs)
• Hardware Description Languages (HDLs):
– Verilog
– VHDL
• C++ and C-based software programming languages:
– SystemC
– Impulse C
Reconfigurable Computing
• Employing programmable logic devices where the hardware-based logic itself is being modified
• Reprogramming the hardware vs. modifying a program that uses a fixed hardware configuration
• Programming FPGAs vs. Von Neumann Computers
– Reconnecting internal gates to modify the hardware
– The hardware is optimized to perform one function
– Vs. changing software running on a processor
Image provided by: http://www.fhpca.org/images/Maxwell_small.jpg
Field Programmable Gate Array
[Diagram: FPGA blocks labeled Custom Circuitry, μProc, RAM, and I/O]
Microprocessor
• User I/O
• TCP/IP
• Control & Test Benches
Custom Circuitry
• Complex calculations (e.g., neural networks, DSP)
FPGA
Image provided by: http://www.nuhorizons.com/products/NewProducts/POQ13/xilinx.html
SRC-6e Hardware Architecture
Features:
• 2 XC2V6000 FPGAs
• 288 MACs, BRAMs
• 2 Pentium 3
• 24MB of SRAM
• 64-bit ports
• Cost ~ $300,000
[Diagram: SRC-6e architecture. Labels: μP Board, Intel® μP (x2), L2 (x2), MIOC, PCI, Common Memory, SNAP, MAP, Controller, On-Board Memory (24 MB), FPGA (x2), Chain Port (x2), 800 MB/s (x2), 6x 800 MB/s (x2), 315/195 MB/s]
XUP Virtex II Pro Platform
Features:
• XC2VP30 FPGA
• 136 MACs, BRAMs
• 2 PowerPC
• 256 MB DDR SDRAM
• 10/100 Ethernet
• SATA connectors
• Serial, JTAG, audio, video, USB, etc. ports
• Cost ~ $300 - $1,600
Research
• Our research:
– Impulse C
– Multiple FPGAs
• Methodology:
– Implement a calculation-intensive program
– Compare to previous work and the SRC-6e
Image provided by: http://www.gamedev.net/reference/programming/features/vehiclenn/figure1.png
Willis Troy, Dr. Eisenbarth, Dr. Duren
Neural Network
• Trained network
• 27 inputs
• 3 hidden layers (with 40, 50, and 70 nodes)
• 1200 outputs
• Additions, multiplication, squashing
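To make the network size concrete, here is a minimal C sketch of the arithmetic the bullets describe (additions, multiplications, squashing). The layer widths come from the slide. The bias term, the row-major weight layout, and the `squash` stand-in are assumptions made for illustration; they are not taken from the original implementation.

```c
#include <stddef.h>

/* Layer widths from the slide: 27 inputs, hidden layers of 40, 50 and 70
 * nodes, 1200 outputs. */
static const size_t LAYER_SIZES[] = {27, 40, 50, 70, 1200};
#define NUM_LAYERS (sizeof LAYER_SIZES / sizeof LAYER_SIZES[0])

/* Multiply-accumulate operations in one forward pass of a fully connected
 * network with these widths (bias additions are not counted). */
size_t count_macs(void)
{
    size_t total = 0;
    for (size_t i = 0; i + 1 < NUM_LAYERS; ++i)
        total += LAYER_SIZES[i] * LAYER_SIZES[i + 1];
    return total;
}

/* Stand-in squashing function (assumption): x / (1 + |x|), range (-1, 1).
 * It is not the sigmoid approximation used in the presented work. */
float squash(float x)
{
    float ax = x < 0.0f ? -x : x;
    return x / (1.0f + ax);
}

/* One fully connected layer:
 * out[j] = squash(bias[j] + sum over i of w[j * n_in + i] * in[i]).
 * The bias term and the row-major weight layout are assumptions. */
void layer_forward(const float *in, size_t n_in,
                   const float *w, const float *bias,
                   float *out, size_t n_out)
{
    for (size_t j = 0; j < n_out; ++j) {
        float acc = bias[j];
        for (size_t i = 0; i < n_in; ++i)
            acc += w[j * n_in + i] * in[i];
        out[j] = squash(acc);
    }
}
```

Under the fully connected assumption, `count_macs()` gives 27·40 + 40·50 + 50·70 + 70·1200 = 90,580 multiply-accumulates per evaluation, most of them in the output layer.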
Impulse C
• C-language development tool
• FPGA-accelerated computing
• Function library for parallel programming fully compatible with ANSI C
• CoDeveloper Tools
• Mixed software/hardware
• Cost ~ $3,000
Image provided by: http://www.ilink.co.jp/public/img/product/impulse/imp-c/flow.jpg
Impulse C
• Data movement via streams and shared memory
• Shared memory tradeoff: large but slow
– Memory accessed via OPB bus (opb2plb bridge)
• Floating point implementation supported
• Customized instructions
– xil_printf (2,953 bytes) vs. printf (51,788 bytes)
– xil_printf does not support real (floating-point) or long long (64-bit) types
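As an illustration of stream-style data movement, the sketch below models a stream as a small bounded FIFO in standard C. It is not the Impulse C API: `fifo_t`, `fifo_write`, `fifo_read`, and the depth of 8 are hypothetical names and values chosen only to show a producer and a consumer exchanging words through a fixed-size buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8  /* hypothetical buffer depth */

/* A bounded FIFO standing in for a hardware stream (illustrative only). */
typedef struct {
    int32_t data[FIFO_DEPTH];
    size_t head;   /* index of the next word to read */
    size_t count;  /* number of words currently stored */
} fifo_t;

void fifo_init(fifo_t *f)
{
    f->head = 0;
    f->count = 0;
}

/* Producer side: returns false when the buffer is full, in which case the
 * writer has to wait and retry. */
bool fifo_write(fifo_t *f, int32_t word)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->data[(f->head + f->count) % FIFO_DEPTH] = word;
    f->count++;
    return true;
}

/* Consumer side: returns false when the buffer is empty. */
bool fifo_read(fifo_t *f, int32_t *word)
{
    if (f->count == 0)
        return false;
    *word = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

A stream of this kind suits data consumed in order, one word at a time. Shared memory, by contrast, allows random access to a larger block at the cost of slower bus transfers, which is the tradeoff the bullets above describe.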
Impulse C to Bitstream
• Build Simulation Executable
• Launch ANSI-C Simulation Executable
• Generate HDL (select a platform target)
• Export Generated Hardware
• Export Generated Software
• Xilinx Platform Studio Project (EDK)
Image Filter DMA Example
Current Implementation
[Diagram: Inputs & 3 Hidden Layers, with two blocks of 600 Output Nodes each]
Neural Network
Big_NeuralNet_sw.c: Software Processes
Memory Object
Big_NeuralNet_hw.c: Hardware Process
Configuration Function
Sigmoid function: $y(x) = -y_0''\,(x - x_0)^2 + y_0'\,(x - x_0) + y_0$
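The quadratic form of the sigmoid approximation can be evaluated with two multiplications once the coefficients for a segment are known. The C sketch below evaluates exactly the expression on the slide. The struct and its field names are hypothetical, and the slide does not say how the coefficients are derived or how many segments are used, so those parts are left out.

```c
/* Coefficients of one quadratic segment expanded around x0.
 * Field names are hypothetical; the slide gives only the formula. */
typedef struct {
    double x0;    /* expansion point */
    double y0;    /* constant term, the value at x0 */
    double y0p;   /* y0'  coefficient */
    double y0pp;  /* y0'' coefficient */
} quad_seg;

/* Evaluates y(x) = -y0'' * (x - x0)^2 + y0' * (x - x0) + y0 in Horner
 * form: one subtraction, two multiplications, two additions. */
double quad_eval(const quad_seg *s, double x)
{
    double dx = x - s->x0;
    return (-s->y0pp * dx + s->y0p) * dx + s->y0;
}
```

For example, with x0 = 1, y0 = 2, y0' = 3, and y0'' = 0.5, evaluating at x = 3 gives -0.5·4 + 3·2 + 2 = 6. These numbers are purely illustrative, not coefficients from the presented network.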
Projects Comparison
Similarities
• Reconfigurable Computing
• Neural Network and Weights
• FPGAs
Differences
• Implementation using VHDL vs. C
• Fixed point vs. Floating point
• Platforms / Architectures
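To illustrate the fixed-point side of that contrast, here is a minimal C sketch of fixed-point multiplication. The Q16.16 format and all names are assumptions for illustration; the slides do not state which fixed-point format was used.

```c
#include <stdint.h>

#define Q_FRAC_BITS 16  /* assumed Q16.16 format: 16 integer, 16 fraction bits */

typedef int32_t q16_16;

/* Converts a double to Q16.16, truncating toward zero. */
q16_16 q_from_double(double x)
{
    return (q16_16)(x * (double)(1 << Q_FRAC_BITS));
}

/* Converts a Q16.16 value back to a double. */
double q_to_double(q16_16 q)
{
    return (double)q / (double)(1 << Q_FRAC_BITS);
}

/* Multiplies two Q16.16 values. The product is formed in 64 bits and then
 * shifted back down. Right-shifting a negative value is implementation-
 * defined in C; common compilers perform an arithmetic shift. */
q16_16 q_mul(q16_16 a, q16_16 b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return (q16_16)(wide >> Q_FRAC_BITS);
}
```

A fixed-point multiply like this maps onto an integer multiplier plus a shift, whereas floating point needs extra logic for exponent handling and normalization, which is one reason the choice of number format affects FPGA resource usage and timing.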
Timing Results for Neural Network Solutions
Architecture         Language                 Execution Time
PC – Pentium 4       C                        280 µs
SRC-6E               Carte C (parallel)       572.55 µs
SRC-6E               VHDL (serial)            1000 µs
SRC-6E               VHDL (parallel node)     250 µs
SRC-6E               VHDL (parallel input)    15 µs
Baylor RC Cluster    VHDL (1 board)           15 µs
Baylor RC Cluster    VHDL (3 boards)          6.7 µs
Baylor RC Cluster    Impulse C (3 boards)     TBD
Baylor RC Cluster    Impulse C (16 boards)    TBD
[Slide annotations: 2x, 2x]
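The reported times can be turned into speedup factors with simple division. The C sketch below does this for the measured rows, using the Pentium 4 time as the baseline. The row labels and the assignment of rows to architectures follow one reading of the table, and all identifiers are hypothetical.

```c
#include <stddef.h>

typedef struct {
    const char *label;
    double time_us;  /* execution time in microseconds */
} timing_row;

/* Measured rows from the timing table (TBD rows omitted). */
static const timing_row ROWS[] = {
    {"PC Pentium 4, C", 280.0},
    {"SRC-6E, Carte C (parallel)", 572.55},
    {"SRC-6E, VHDL (serial)", 1000.0},
    {"SRC-6E, VHDL (parallel node)", 250.0},
    {"SRC-6E, VHDL (parallel input)", 15.0},
    {"Baylor RC Cluster, VHDL (1 board)", 15.0},
    {"Baylor RC Cluster, VHDL (3 boards)", 6.7},
};
#define NUM_ROWS (sizeof ROWS / sizeof ROWS[0])

/* Speedup of a candidate over a baseline: values above 1 mean faster. */
double speedup(double baseline_us, double candidate_us)
{
    return baseline_us / candidate_us;
}

/* Speedup of the given row relative to the Pentium 4 software run. */
double speedup_vs_pc(size_t row)
{
    return speedup(ROWS[0].time_us, ROWS[row].time_us);
}
```

With these figures, the 15 µs VHDL results are about 18.7 times faster than the 280 µs software run, the 3-board VHDL result is about 41.8 times faster, and going from 1 board to 3 boards gives 15 / 6.7, or about 2.2 times.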
Conclusion & Future Work
• Reconfigurable Computing
• SRC-6e vs. XUP board architectures
• NN Calculations & Timing Results
• Explore different levels of parallelism across multiple FPGA boards using multiple communication schemes
• Ethernet, MPI, SATA Interfaces
• RC cluster of Virtex II Pro boards
Willis Troy, Dr. Eisenbarth, Dr. Duren
Acknowledgements
• Dr. Russell Duren
• Dr. Steven Eisenbarth
• Willis Troy
Questions