
Page 1: ECE699 lecture 7 - ece.gmu.edu · Recommended Videos & Slides M.S. Sadri, ZYNQ Training • Lesson 12 – AXI Memory Mapped Interfaces and Hardware Debugging • Lesson 7 – AXI

Efficient Communication Between Hardware Accelerators and PS

ECE 699: Lecture 7

Page 2

Recommended Videos & Slides M.S. Sadri, ZYNQ Training

•  Lesson 12 – AXI Memory Mapped Interfaces and Hardware Debugging
•  Lesson 7 – AXI Stream Interface In Detail (RTL Flow)
•  Lesson 9 – Software development for ZYNQ using Xilinx SDK (Transfer data from ZYNQ PL to PS)

Xilinx Advanced Embedded System Design on Zynq

•  Memory Interfacing (see Resources on Piazza)

Page 3

Recommended Paper & Slides

M. Sadri, C. Weis, N. Wehn, and L. Benini, “Energy and Performance Exploration of Accelerator Coherency Port Using Xilinx ZYNQ,” Proc. 10th FPGAworld Conference, Stockholm 2013, available at http://www.googoolia.com/wp/2014/03/07/my-cv/

Page 4

Mapping of an Embedded SoC Hardware Architecture to Zynq

Source: Xilinx White Paper: Extensible Processing Platform

Page 5

Source: M.S. Sadri, Zynq Training

Simple Custom Peripheral

Page 6

Source: M.S. Sadri, Zynq Training

Simple Custom Accelerator

Page 7

Source: M.S. Sadri, Zynq Training

Example of a Custom Accelerator

Page 8

Source: M.S. Sadri, Zynq Training

Block Diagram of the Pattern Counter

Page 9

Source: M.S. Sadri, Zynq Training

Ways of Implementing AXI4 Slave Units

Page 10

Source: M.S. Sadri, Zynq Training

Pixel Processing Engine

Page 11

PS-PL Interfaces and Interconnects

Source: The Zynq Book

Page 12

•  GP ports are designed for maximum flexibility

•  Allow register access from PS to PL or PL to PS

•  Good for Synchronization

•  Prefer ACP or HP port for data transport

General-Purpose Port Summary

Page 13

•  HP ports are designed for maximum bandwidth access to external memory and OCM

•  When combined, the HP ports can saturate external memory and OCM bandwidth
   –  HP ports: 4 * 64 bits * 150 MHz * 2 = 9.6 GByte/sec
   –  External DDR: 1 * 32 bits * 1066 MHz (effective data rate) * 1 = 4.3 GByte/sec
   –  OCM: 64 bits * 222 MHz * 2 = 3.5 GByte/sec

•  Optimized for large burst lengths and many outstanding transactions

•  Large data buffers to amortize access latency

•  Efficient upsizing/downsizing for 32 bit accesses

High-Performance Port Summary
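
The peak-bandwidth figures on this slide follow from ports * width * clock * channels. The sketch below reproduces that arithmetic. The function name and the reading of the last factor as "independent read and write channels" are assumptions for illustration; the DDR case uses a factor of 1 (a single shared data bus), which is what yields roughly 4.3 GByte/sec.

```c
/* Peak bandwidth in MByte/s.
 * ports      : number of identical interfaces
 * width_bits : data width of each interface
 * mhz        : transfer rate in MHz (millions of transfers per second)
 * channels   : 2 if read and write can proceed at once, 1 otherwise
 * (Parameter names are illustrative, not taken from Xilinx documentation.) */
static double peak_mbytes_per_s(unsigned ports, unsigned width_bits,
                                double mhz, unsigned channels)
{
    return ports * (width_bits / 8.0) * mhz * channels;
}
```

For example, `peak_mbytes_per_s(4, 64, 150.0, 2)` gives 9600 MByte/s for the four HP ports, while `peak_mbytes_per_s(1, 32, 1066.0, 1)` gives 4264 MByte/s for DDR, so the HP ports together exceed what the external memory can deliver.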

Page 14

Source: M.S. Sadri, Zynq Training

Using Central DMA

Page 15

Source: Xilinx Advanced Embedded System Design on Zynq

•  High-bandwidth Direct Memory Access (DMA) between a memory-mapped source address and a memory-mapped destination address

•  Optional Scatter Gather (SG)

•  Initialization, status, and control registers are accessed through an AXI4-Lite slave interface

Central DMA

Page 16

Source: M.S. Sadri, Zynq Training

Using Central DMA in the Scatter-Gather Mode

Page 17

Scatter Gather DMA Mode

Source: Symbian OS Internals/13. Peripheral Support
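
The idea behind scatter-gather mode can be sketched in software: instead of one (source, destination, length) triple, the engine follows a linked chain of descriptors, each covering one contiguous chunk. The descriptor layout and the names `sg_desc` and `sg_run` below are hypothetical and do not match the register-level descriptor format of the Xilinx CDMA; `memcpy` stands in for the hardware transfer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: one contiguous chunk of a larger transfer. */
struct sg_desc {
    const void *src;              /* where this chunk is read from */
    void *dst;                    /* where this chunk is written to */
    size_t len;                   /* chunk length in bytes */
    const struct sg_desc *next;   /* next descriptor, NULL ends the chain */
};

/* Software stand-in for the DMA engine: walk the chain, copy every chunk,
 * and return the total number of bytes moved. */
static size_t sg_run(const struct sg_desc *d)
{
    size_t total = 0;
    for (; d != NULL; d = d->next) {
        memcpy(d->dst, d->src, d->len);
        total += d->len;
    }
    return total;
}
```

With two descriptors pointing at separate source buffers and adjacent destination offsets, a single call to `sg_run` gathers both buffers into one contiguous block, which is the case a single simple transfer cannot express.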

Page 18

Source: M.S. Sadri, Zynq Training

Custom Accelerator with the Master AXI4 Interface

Page 19

Source: M.S. Sadri, Zynq Training

Ways of Implementing AXI4 Master Units

Page 20

Source: M.S. Sadri, Zynq Training

AXI4-Full

Page 21

Source: M.S. Sadri, Zynq Training

Image Rotation Unit

Page 22

Source: M.S. Sadri, Zynq Training

FFT Unit

Page 23

Source: M.S. Sadri, Zynq Training

Sample Generator

Page 24

Source: M.S. Sadri, Zynq Training

PL-PS Interfaces

Page 25

Accelerator Architecture with DMA

Source: Building Zynq Accelerators with Vivado HLS, FPL 2013 Tutorial

Page 26

AXI DMA-based Accelerator Communication

Write to Accelerator
•  processor allocates buffer
•  processor writes data into buffer
•  processor flushes cache for buffer
•  processor initiates DMA transfer

Read from Accelerator
•  processor allocates buffer
•  processor initiates DMA transfer
•  processor waits for DMA to complete
•  processor invalidates cache for buffer
•  processor reads data from buffer

Page 27

/* Flush the SrcBuffer before the DMA transfer */
Xil_DCacheFlushRange((u32)TxBufferPtr, BYTES_TO_SEND);

. . . . . . . .

/* Invalidate the DstBuffer after the DMA transfer */
Xil_DCacheInvalidateRange((u32)RxBufferPtr, BYTES_TO_RCV);

Flushing and Invalidating Cache
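
Why the flush must come before the transfer and the invalidate after it can be seen with a toy model. The sketch below is an illustration only: it models a single buffer with one write-back cached copy, and all names (`toy_mem`, `cpu_write`, `cache_flush`, and so on) are made up for this example. `cache_flush` and `cache_invalidate` are loose analogues of `Xil_DCacheFlushRange` and `Xil_DCacheInvalidateRange`, and the DMA functions touch DRAM only, as a non-coherent DMA on an HP or GP port would.

```c
#include <stdbool.h>
#include <string.h>

#define BUF_LEN 8

/* Toy model: one buffer in DRAM plus one cached copy of it. */
struct toy_mem {
    unsigned char dram[BUF_LEN];   /* what a non-coherent DMA engine sees */
    unsigned char cache[BUF_LEN];  /* what the CPU sees while valid */
    bool valid;
    bool dirty;
};

/* CPU store: lands in the cache only (write-back policy). */
static void cpu_write(struct toy_mem *m, const unsigned char *data)
{
    memcpy(m->cache, data, BUF_LEN);
    m->valid = true;
    m->dirty = true;
}

/* CPU load: hits the cache if valid, otherwise refills from DRAM. */
static const unsigned char *cpu_read(struct toy_mem *m)
{
    if (!m->valid) {
        memcpy(m->cache, m->dram, BUF_LEN);
        m->valid = true;
        m->dirty = false;
    }
    return m->cache;
}

/* Flush: write dirty data back to DRAM, then drop the cached copy. */
static void cache_flush(struct toy_mem *m)
{
    if (m->valid && m->dirty)
        memcpy(m->dram, m->cache, BUF_LEN);
    m->valid = false;
    m->dirty = false;
}

/* Invalidate: drop the cached copy without writing it back. */
static void cache_invalidate(struct toy_mem *m)
{
    m->valid = false;
    m->dirty = false;
}

/* Non-coherent DMA accesses: DRAM only, the cache is bypassed. */
static void dma_read_from_dram(const struct toy_mem *m, unsigned char *out)
{
    memcpy(out, m->dram, BUF_LEN);
}

static void dma_write_to_dram(struct toy_mem *m, const unsigned char *in)
{
    memcpy(m->dram, in, BUF_LEN);
}
```

In this model, a `dma_read_from_dram` issued right after `cpu_write` returns stale data until `cache_flush` is called, and a `cpu_read` issued after `dma_write_to_dram` returns the old cached copy until `cache_invalidate` is called.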

Page 28

1.  Start the MM2S channel running by setting the run/stop bit to 1, MM2S_DMACR.RS = 1.

2.  If desired, enable interrupts by writing a 1 to MM2S_DMACR.IOC_IrqEn and MM2S_DMACR.Err_IrqEn.

3.  Write a valid source address to the MM2S_SA register.

4.  Write the number of bytes to transfer in the MM2S_LENGTH register.

The MM2S_LENGTH register must be written last. All other MM2S registers can be written in any order.

Simple DMA Transfer Programming Sequence for MM2S channel (1)

Page 29

1.  Start the S2MM channel running by setting the run/stop bit to 1, S2MM_DMACR.RS = 1.

2.  If desired, enable interrupts by writing a 1 to S2MM_DMACR.IOC_IrqEn and S2MM_DMACR.Err_IrqEn.

3.  Write a valid destination address to the S2MM_DA register.

4.  Write the length in bytes of the receive buffer in the S2MM_LENGTH register.

The S2MM_LENGTH register must be written last. All other S2MM registers can be written in any order.

Simple DMA Transfer Programming Sequence for S2MM channel (1)
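
The MM2S and S2MM sequences have the same shape, so both can be sketched with one helper that takes the channel base. The sketch below writes into a mock register array instead of real memory-mapped registers so that the ordering can be checked on a host machine. The register offsets and bit positions are as recalled from the AXI DMA product guide and should be verified against the IP version in use; `regs`, `reg_write`, `reg_read`, `last_written_offset`, and `simple_transfer` are names invented for this example.

```c
#include <stdint.h>

/* Offsets and bits assumed from the AXI DMA register map; verify before use. */
#define MM2S_BASE        0x00u
#define S2MM_BASE        0x30u
#define DMACR_OFFSET     0x00u   /* control register, relative to channel base */
#define ADDR_OFFSET      0x18u   /* MM2S_SA or S2MM_DA */
#define LENGTH_OFFSET    0x28u   /* MM2S_LENGTH or S2MM_LENGTH */

#define DMACR_RS         (1u << 0)
#define DMACR_IOC_IRQEN  (1u << 12)
#define DMACR_ERR_IRQEN  (1u << 14)

/* Mock register file standing in for the AXI4-Lite register space. */
static uint32_t regs[0x60 / 4];
static uint32_t last_written_offset;   /* records write order for checking */

static void reg_write(uint32_t offset, uint32_t value)
{
    regs[offset / 4] = value;
    last_written_offset = offset;
}

static uint32_t reg_read(uint32_t offset)
{
    return regs[offset / 4];
}

/* Program one simple transfer on the channel at chan_base. */
static void simple_transfer(uint32_t chan_base, uint32_t buf_addr,
                            uint32_t nbytes, int enable_irq)
{
    uint32_t cr = reg_read(chan_base + DMACR_OFFSET) | DMACR_RS;  /* step 1 */
    if (enable_irq)
        cr |= DMACR_IOC_IRQEN | DMACR_ERR_IRQEN;                  /* step 2 */
    reg_write(chan_base + DMACR_OFFSET, cr);
    reg_write(chan_base + ADDR_OFFSET, buf_addr);                 /* step 3 */
    reg_write(chan_base + LENGTH_OFFSET, nbytes);                 /* step 4, last */
}
```

On real hardware the two accessor functions would be replaced by `Xil_Out32` and `Xil_In32` on the DMA base address; the point of the mock is only that the length write comes last in both channels.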

Page 30

/* Transmit a packet */
Status = XAxiDma_SimpleTransfer(&AxiDma, (u32) TxBufferPtr,
                                BYTES_TO_SEND, XAXIDMA_DMA_TO_DEVICE);
if (Status != XST_SUCCESS) {
    return XST_FAILURE;
}
while (!TxDone);    /* busy-wait until TxDone becomes nonzero */

. . . . . .

/* Receive a packet */
Status = XAxiDma_SimpleTransfer(&AxiDma, (u32) RxBufferPtr,
                                BYTES_TO_RCV, XAXIDMA_DEVICE_TO_DMA);
if (Status != XST_SUCCESS) {
    return XST_FAILURE;
}
while (!RxDone);    /* busy-wait until RxDone becomes nonzero */

Transmitting and Receiving a Packet Using High-Level Functions

Page 31

/* Transmit a packet */
Xil_Out32(AxiDma.TxBdRing.ChanBase + XAXIDMA_SRCADDR_OFFSET, (u32) TxBufferPtr);
Xil_Out32(AxiDma.TxBdRing.ChanBase + XAXIDMA_CR_OFFSET,
          Xil_In32(AxiDma.TxBdRing.ChanBase + XAXIDMA_CR_OFFSET) | XAXIDMA_CR_RUNSTOP_MASK);
/* The length register is written last */
Xil_Out32(AxiDma.TxBdRing.ChanBase + XAXIDMA_BUFFLEN_OFFSET, BYTES_TO_SEND);
while (TxDone == 0);

Transmitting a Packet Using Lower-Level Functions

Page 32

/* Receive a packet */
Xil_Out32(AxiDma.RxBdRing.ChanBase + XAXIDMA_DESTADDR_OFFSET, (u32) RxBufferPtr);
Xil_Out32(AxiDma.RxBdRing.ChanBase + XAXIDMA_CR_OFFSET,
          Xil_In32(AxiDma.RxBdRing.ChanBase + XAXIDMA_CR_OFFSET) | XAXIDMA_CR_RUNSTOP_MASK);
/* The length register is written last */
Xil_Out32(AxiDma.RxBdRing.ChanBase + XAXIDMA_BUFFLEN_OFFSET, BYTES_TO_RCV);
while (RxDone == 0);

Receiving a Packet Using Lower-Level Functions

Page 33

Source: M.S. Sadri, Zynq Training

PL-PS Interfaces

Page 34

Accelerator Architecture with Coherent DMA

Source: Building Zynq Accelerators with Vivado HLS, FPL 2013 Tutorial

Page 35

Coherent AXI DMA-based Accelerator Communication

Write to Accelerator
•  processor allocates buffer
•  processor writes data into buffer
•  processor flushes cache for buffer (not needed when the DMA goes through the coherent ACP)
•  processor initiates DMA transfer

Read from Accelerator
•  processor allocates buffer
•  processor initiates DMA transfer
•  processor waits for DMA to complete
•  processor invalidates cache for buffer (not needed when the DMA goes through the coherent ACP)
•  processor reads data from buffer

Page 36

•  ACP allows limited support for hardware coherency
   –  Allows a PL accelerator to access the caches of the Cortex-A9 processors
   –  PL has access through the same path as the CPUs, including caches, OCM, DDR, and peripherals
   –  Access is low latency (assuming data is in processor cache), with no switches in the path

•  ACP does not allow full coherency
   –  PL is not notified of changes in processor caches
   –  Use a write to a PL register for synchronization

•  ACP is a compromise between bandwidth and latency
   –  Optimized for cache line length transfers
   –  Low latency for L1/L2 hits
   –  Minimal buffering to hide external memory latency
   –  One shared 64 bit interface, limit of 8 masters

Accelerator Coherency Port (ACP) Summary

Page 37

•  Four AXI-based DMA services are provided
   –  Central DMA (CDMA)
      •  Memory-to-memory operations
   –  DMA
      •  Memory to/from AXI stream peripherals
   –  FIFO Memory Mapped To Streaming
      •  Streaming AXI interface alternative to traditional DMA
   –  Video DMA
      •  Optimized for streaming video application to/from memory

Source: Xilinx Advanced Embedded System Design on Zynq

AXI-based DMA Services

Page 38

Source: Xilinx Advanced Embedded System Design on Zynq

Streaming FIFO

Page 39

•  General AXI interconnect has no support for the AXI stream interface
   –  axi_fifo_mm_s provides this facility
   –  FIFO included

•  Added from the IP Catalog, like all other types of IP

•  Features
   –  AXI4/AXI4-Lite slave interface
   –  Independent internal 512B-128KB TX and RX data FIFOs
   –  Full duplex operation

Source: Xilinx Advanced Embedded System Design on Zynq

Streaming FIFO

Page 40

•  Slave AXI connection
   –  RX/TX FIFOs
   –  Interrupt controller
   –  Control registers

•  Three user-side AXI Stream interfaces
   –  TX data
   –  RX data
   –  TX control

Streaming FIFO

Page 41

Source: Xilinx Advanced Embedded System Design on Zynq

AXI Video DMA Controller

Page 42

Design Goal

Hardware accelerator capable of working for arbitrary values of parameters lm, ln, lp, defined in software, with the only limitations imposed by the total size and the word size of internal memories.

Page 43

Passing Parameters to an Accelerator

Option 1: Parameters (e.g., lm, ln, lp) are passed using AXI_Lite

Option 2: Parameters (e.g., lm, ln, lp) are passed in the header of input data

Option 3: Parameters inferred from the size of transmitted input data (not possible in general case of matrix multiplication)

Input size: $(2^{lm+ln} + 2^{lp+lm}) \cdot 8$

Output size: $2^{lp+ln} \cdot 32$ (for $lm \le 16$)
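
The size formulas can be evaluated directly in software, for example to decide how many bytes to hand to the DMA. The sketch below assumes the exponents are lm+ln, lp+lm, and lp+ln, that operands are 8 bits wide and results 32 bits wide, and that the sizes are counted in bits; these readings and the names `mm_sizes`, `mm_compute_sizes`, and `accumulator_fits` are assumptions for illustration. One plausible reason for the lm ≤ 16 bound is that each 8-bit by 8-bit product needs 16 bits, so a sum of 2^lm products needs 16 + lm bits.

```c
#include <stdint.h>

struct mm_sizes {
    uint64_t input_bits;    /* both input matrices, 8-bit elements */
    uint64_t output_bits;   /* result matrix, 32-bit elements */
};

/* Input: 2^(lm+ln) + 2^(lp+lm) elements of 8 bits.
 * Output: 2^(lp+ln) elements of 32 bits. */
static struct mm_sizes mm_compute_sizes(unsigned lm, unsigned ln, unsigned lp)
{
    struct mm_sizes s;
    s.input_bits  = ((UINT64_C(1) << (lm + ln)) + (UINT64_C(1) << (lp + lm))) * 8;
    s.output_bits = (UINT64_C(1) << (lp + ln)) * 32;
    return s;
}

/* Assumed rationale for lm <= 16: a sum of 2^lm 16-bit products
 * needs 16 + lm bits and must fit in a 32-bit result word. */
static int accumulator_fits(unsigned lm)
{
    return 16 + lm <= 32;
}
```

For lm = ln = lp = 2, `mm_compute_sizes` reports 256 input bits and 512 output bits, that is, two 4x4 matrices of bytes in and one 4x4 matrix of 32-bit words out.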

Page 44

Source: M.S. Sadri, Zynq Training

Choosing Optimal Parameters

Page 45

Energy and Performance Exploration of Accelerator Coherency Port Using Xilinx ZYNQ

Mohammadsadegh Sadri, Christian Weis, Norbert Wehn and Luca Benini

Department of Electrical, Electronic and Information Engineering (DEI) University of Bologna, Italy

Microelectronic Systems Design Research Group, University of Kaiserslautern, Germany

{mohammadsadegh.sadr2,luca.benini}@unibo.it, {weis,wehn}@eit.uni-kl.de

ver0

Page 46

Mohammadsadegh Sadri, Christian Weis, Norbert Wehn, Luca Benini – Energy and performance exploration of ACP Using ZYNQ

[Figure: the accelerator reads a source image (image_size bytes) from the source address, processes it with a FIR filter, and writes the result image (image_size bytes) to the destination address. The read, process, write loop runs N times and the execution interval is measured.]

•  FIFOs: 128K and 128K
•  Selection of packets (addressing): normal or bit-reversed
•  Buffers allocated by kmalloc or dma_alloc_coherent, depending on the memory sharing method
•  Image sizes: 4 KBytes, 16K, 65K, 128K, 256K, 1 MByte, 2 MBytes
•  We define: different methods to accomplish the task
•  We measure: execution time and energy

Processing Task Definition

Page 47

Memory Sharing Methods

•  ACP Only (HP Only is similar, but there is no SCU and L2 in the path)

[Diagram: Accelerator - ACP - SCU - L2 - DRAM]

•  CPU only (with and without cache)

•  CPU ACP (CPU HP is similar)

[Diagram: Accelerator - ACP - SCU - L2 - DRAM, with the CPU also in the path; steps 1 and 2; timeline: ACP --- CPU --- ACP ---]

Page 48

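Speed Comparison

[Plot: speed of each memory sharing method for image sizes 4K, 16K, 64K, 128K, 256K, and 1 MByte; annotated values 298 MBytes/s and 239 MBytes/s; annotation "ACP loses!"]

CPU OCM lies between CPU ACP and CPU HP.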

Page 49

Energy Comparison

•  CPU-only methods: worst case!

•  CPU ACP: always better energy than CPU HP0. As the image size grows, CPU ACP converges to CPU HP0.

•  CPU OCM: always between CPU ACP and CPU HP

Page 50

Lessons Learned & Conclusion

•  If a specific task should be done by the accelerator only:
   •  For small arrays, ACP Only and OCM Only can be used
   •  For large arrays (> size of L2$), HP Only always performs better

•  If a specific task should be done by the cooperation of CPU and accelerator:
   •  CPU ACP and CPU OCM are always better than CPU HP in terms of energy
   •  If we are running other applications which heavily depend on caches, CPU OCM and then CPU HP are preferred!
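
The conclusions above can be restated as a small decision helper. This is only a sketch of the slide's rules of thumb, not a method from the paper: the function and enumerator names are invented, the 512 KB constant is the Zynq-7000 L2 cache size, and treating everything up to that size as a "small array" is an assumption.

```c
#include <stddef.h>

#define L2_CACHE_BYTES (512u * 1024u)   /* Zynq-7000 L2 cache size */

enum sharing_choice {
    ACP_ONLY_OR_OCM_ONLY,   /* accelerator alone, small arrays */
    HP_ONLY,                /* accelerator alone, arrays larger than L2 */
    CPU_ACP_OR_CPU_OCM,     /* CPU and accelerator cooperate */
    CPU_OCM_THEN_CPU_HP     /* cooperate, other apps are cache-heavy */
};

/* Map the slide's lessons onto a choice of memory sharing method. */
static enum sharing_choice pick_sharing_method(size_t array_bytes,
                                               int cpu_cooperates,
                                               int other_apps_cache_heavy)
{
    if (!cpu_cooperates)
        return array_bytes > L2_CACHE_BYTES ? HP_ONLY : ACP_ONLY_OR_OCM_ONLY;
    if (other_apps_cache_heavy)
        return CPU_OCM_THEN_CPU_HP;
    return CPU_ACP_OR_CPU_OCM;
}
```

For instance, `pick_sharing_method(2u * 1024u * 1024u, 0, 0)` selects `HP_ONLY`, matching the observation that ACP loses once the working set no longer fits in the L2 cache.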