accelerated data processing on soc with fpga · marek vasut i software engineer at denx s.e. since...
TRANSCRIPT
![Page 1: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/1.jpg)
Accelerated Data Processing on SoC with FPGA
Marek Vasut <[email protected]>
June 3, 2015
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 2: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/2.jpg)
Marek Vasut
I Software engineer at DENX S.E. since 2011I Embedded and Real-Time Systems Services, Linux kernel and
driver development, U-Boot development, consulting, training.
I Versatile Linux kernel hacker
I Custodian at U-Boot bootloader
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 3: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/3.jpg)
Structure of the talk
I Motivation
I Introduction to FPGAs
I Your first FPGA data cruncher
I Interfacing with Linux
I Speeding things up
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 4: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/4.jpg)
Why listen to this talk
I Get fresh ideas
I Learn something new
I Reduce energy envelope of your device
I Process data quickly and efficiently
You won’t learn marketing stuff or random benchmark numbers
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 5: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/5.jpg)
FPGA
I Abbr. for Field Programmable Gate Array
I Programmable logicI Usually used for:
I Digital Signal Processing (DSP)I Data crunchingI Custom hardware interfacesI ASIC prototypingI . . .
I Common vendors – Xilinx, Altera, Lattice, Microsemi. . .
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 6: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/6.jpg)
Internal structure
W.T.Freemanhttp://www.vision.caltech.edu/CNS248/Fpga/fpga1a.gif
CC BY 2.5: http://creativecommons.org/licenses/by/2.5/
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 7: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/7.jpg)
FPGA and the outside
I FPGA has plenty of I/O options:I Regular I/O with configurable voltage levelsI Differential I/OI High-speed SerDesI . . .
I Usual interface with host:I Stand-alone FPGA, usually PCIe, USB, . . .I FPGA on a CPU bus (PowerPCs, ie. ML507)I Built into CPU (SoCFPGA/Zynq), usually AMBA/AXI
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 8: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/8.jpg)
Programming the FPGA
I Each vendor has his own tools – Altera Quartus, Xilinx Vivado
I FPGA tools often closed source :-(
I FPGA bitstream format is closed :-(
I Basic vendor tools available free of charge
I Sufficient amount of functionality to implement data cruncher
I Vendor tools needed for place-and-route and assembler
I Third-party tools for synthesis are available
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 9: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/9.jpg)
Comparison to a GPU – I.
CPU GPU FPGA
Toolchain Open Closed ClosedHW design Proprietary Proprietary Your ownHW units Fixed Fixed As neededI/O Limited None As needed
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 10: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/10.jpg)
HDL – Hardware Description Language
I FPGA content is written in HDLs
I HDL – Hardware Description Language
I HDLs are used to model behavior of logic block
I Two major HDLs – VHDL and Verilog
I Tools often allow seamless mixing of HDLs
I Many readily-available cores under acceptable license:OpenCores http://opencores.org/
OpenCores projects http://opencores.org/projects
CERN Open HW Repo http://www.ohwr.org/
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 11: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/11.jpg)
Modeling behavior
HW Behavior modeling vs. Writing CPU code:
I Vastly different and confusing to software people :-)
I CPU: Programmer implements an algorithm
I FPGA: Programmer implements hardware to run the algorithm
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 12: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/12.jpg)
Implicit parallelism
I Everything in a block is executed in parallel
I All conditions in a conditional statement are tested in parallelif, case – differs from C
1 if (foo == 1) bar <= 1’b0;
2 else bar <= 1’b1;
I Blocks are executed in parallel
1 begin
2 x <= 1’b0;
3 y <= 1’b1;
4 end
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 13: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/13.jpg)
Combinatorial vs. Sequential logic
I Combo – imm. value of var is the product of the imm. inputsof the function:
assign Z = X ^ Y;
I Seq logic is sync to clock (involves a latch)
always @(posedge clk)
Z <= DAT;
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 14: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/14.jpg)
Verilog example
I Looks like C, based on C, but behaves differently
I Used a lot in Europe
I Example: CRC5, polynomial x5 + x2 + x0
I Example modified from:http://www.asic-world.com/examples/verilog/
serial_crc.html
1 module crc5 (
2 /* SYSTEM I/O */
3 input reset,
4 input clk,
5 /* CRC5 I/O */
6 input data,
7 output reg [4:0] crc
8 );
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 15: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/15.jpg)
Verilog example II
1 always @(posedge clk) begin
2 if (reset) begin
3 crc <= 5’b00000;
4 end else begin
5 crc[0] <= data ^ crc[4];
6 crc[1] <= crc[0];
7 crc[2] <= crc[1] ^ data ^ crc[4];
8 crc[3] <= crc[2];
9 crc[4] <= crc[3];
10 end
11 end
12 endmodule
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 16: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/16.jpg)
VHDL example
I Distinctive syntax based on Ada
I More explicit typing system than Verilog
I Used a lot in the USA
I Example: CRC5, polynomial x5 + x2 + x0
I Example from http://outputlogic.com/?page_id=321
1 library ieee;
2 use ieee.std_logic_1164.all;
3
4 entity crc is
5 port ( data_in : in std_logic_vector (0 downto 0);
6 rst, clk : in std_logic;
7 crc_out : out std_logic_vector (4 downto 0));
8 end crc;
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 17: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/17.jpg)
VHDL example II
1 architecture imp_crc of crc is
2 signal lfsr_q: std_logic_vector (4 downto 0);
3 signal lfsr_c: std_logic_vector (4 downto 0);
4 begin
5 crc_out <= lfsr_q;
6 lfsr_c(0) <= lfsr_q(4) xor data_in(0);
7 lfsr_c(1) <= lfsr_q(0);
8 lfsr_c(2) <= lfsr_q(1) xor lfsr_q(4) xor data_in(0);
9 lfsr_c(3) <= lfsr_q(2);
10 lfsr_c(4) <= lfsr_q(3);
11
12 process (clk,rst) begin
13 if (rst = ’1’) then
14 lfsr_q <= b"11111";
15 elsif (clk’EVENT and clk = ’1’) then
16 lfsr_q <= lfsr_c;
17 end if;
18 end process;
19 end architecture imp_crc;
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 18: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/18.jpg)
Comparison to a GPU – II.
CPU GPU FPGA
Languages All OpenCL, CUDA OpenCL, HDLsDesign paradigm Sequential Seq/Par ParallelDesign granularity Instruction Instruction GateOpt. possibility Low Low HighOpt. difficulty Low Low High
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 19: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/19.jpg)
Development and debugging
I Simulation (on developer’s system)
I Probing (on-target)
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 20: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/20.jpg)
Simulation
I Simulation tools:Icarus Verilog http://iverilog.icarus.com/
ghdl http://home.gna.org/ghdl/
ModelSim http://en.wikipedia.org/wiki/ModelSim/
I Write testcase for a module in an augmented HDL
I Execute testcase
I Observe results
I View waveformsI Decode and inspect bussesI Trigger on complex conditionsI . . .
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 21: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/21.jpg)
Probing
I Used to observe design on target
I Think of this as a bus analyzer in the FPGA
I Probing tools (ie. SignalTap)
I Design is augmented with a probing IP, FPGA isreprogrammed
I Probing is controlled through a debug probe attached to theFPGA (JTAG or similar)
I Probe internal signals, observe waveforms, trigger on complexconditions. . .
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 22: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/22.jpg)
Structuring the design
I HDL files – lowest in the hierarchy
I IP block – collection of HDL files with an interface
I FPGA design – collection of IP blocks
I Vendor tools contain tools to assemble IP blocks into FPGAdesign – ie. Altera QSys.
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 23: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/23.jpg)
Comparison to a GPU – III.
CPU GPU FPGA
Simulation QEMU ? Icarus, ModelSimDebugger GDB CUDA-GDB, CodeXL SignalTap
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 24: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/24.jpg)
Linux interface
I No standard in-kernel FPGA interface due to variance ofdesigns
I Attempts do exist:I Device Tree Overlay(s) stored in FPGAI SDB –
http://www.ohwr.org/projects/fpga-config-space
I Usually there are control registers in the FPGA design
I Usually the DMA is involved (either on FPGA or CPU side)I Two options for controlling the FPGA:
I Custom Linux kernel driverI Userspace utility
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 25: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/25.jpg)
Custom kernel driver
I Driver written to match the particular FPGA bitstream
I Driver can crash the host machine if written badly :-(
I Driver usually exports custom userland I/O
I splice(2)
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 26: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/26.jpg)
Userland approach
I Userland accesses the FPGA registers via uio
I The uio is like a restricted devmem
I In case DMA is involved, kernel module to prepare the datafor the DMA (ie. assure cache coherency) is needed.
I CMA might be used to export large slab of custom kernelmemory to user
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 27: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/27.jpg)
Performance tuning
I FPGA is clocked at 50...200MHz, not much
I The fabric is usually rated at much more!
I Synthesize PLL, which generates faster clock
I Clock your design from the PLL
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 28: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/28.jpg)
Design tricks
I Use combo logic where possible
I Create pipelined designs and make sure the pipeline issaturated
I Synthesize multiple units and compute in parallel
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 29: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/29.jpg)
Altera OpenCL
I OpenCL EP 1.0 implementation for Altera FPGAs
I Easier for SW developers
I Only thin shim must be ported to the FPGA
I Closed source compiler :-(
I Even needs a license :-C
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 30: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/30.jpg)
Conclusion
I FPGAs are strong in parallel, pipelined workloads
I FPGAs give the user almost gate-level performance
I FPGAs are manufactured using bleeding-edge process
I FPGAs deploy excellent Performance-per-Watt
I There is no simple unified Linux interface
I Developing an FPGA content can be difficult
I The FPGA ecosystem is still rather closed
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA
![Page 31: Accelerated Data Processing on SoC with FPGA · Marek Vasut I Software engineer at DENX S.E. since 2011 I Embedded and Real-Time Systems Services, Linux kernel and driver development,](https://reader034.vdocument.in/reader034/viewer/2022042209/5eacb3f7dfa6b977944b7d3d/html5/thumbnails/31.jpg)
The End
Thank you for your attention!
Contact: Marek Vasut <[email protected]>
Marek Vasut <[email protected]> Accelerated Data Processing on SoC with FPGA