
California State University, Northridge

Hardware Acceleration of Edge Detection Using HLS

A graduate project submitted in partial fulfillment of the requirements

For the degree of Master of Science in

Computer Engineering

By

Pavan Kumar Murthy

May 2019


The graduate student Pavan Kumar Murthy is approved:

_________________________ _______________

Dr. John Valdovinos Date

_________________________ _______________

Dr. Xiaojun (Ashley) Geng Date

_________________________ _______________

Dr. Shahnam Mirzaei, Chair Date

California State University, Northridge


Acknowledgments

First, I would like to convey my sincere gratitude and thanks to my graduate chair, Dr. Shahnam Mirzaei, for providing his invaluable technical expertise whenever technical issues arose and for the constant motivation that helped me accomplish the project objective.

I would also like to extend my wholehearted thanks to all of the teaching and non-teaching staff for imparting the necessary knowledge and for addressing the technical needs that enabled completion and submission of the project on time.

Lastly, I would like to thank my parents and friends for their constant support throughout this long process.


Table of Contents

Signature Page

Acknowledgments

List of Figures

Abstract

Chapter 1. An Overview on Image Processing
1.1 Edge Detection
1.1.1 Sobel Operator
1.1.1.1 Pseudo Code for Sobel Edge Detection

Chapter 2. System-on-Chip Design Stages
2.1 Design
2.2 Design Verification
2.3 Optimization
2.4 Fabrication

Chapter 3. Hardware Acceleration Using HLS
3.1 High-Level Synthesis
3.1.1 Phases of HLS

Chapter 4. Zybo Z7-20: Zynq-7000 ARM/FPGA SoC Development Board
4.1 Features of Zybo Z7-20
4.2 Xilinx APSoC Architecture

Chapter 5. Image Processing Platform on Zybo Z7-20
5.1 Components in the Block Diagram

Chapter 6. Sobel Filter IP Creation Using Vivado HLS
6.1 Implementation
6.2 Interfacing
6.3 C Synthesis and Optimization

Chapter 7. Software Application

Chapter 8. Results and Analysis
8.1 C and Co-Simulation Results
8.2 Resource Utilization

Chapter 9. Conclusion

Bibliography

Appendix


List of Figures

Figure 1 Sobel Convolution Filters
Figure 2 Sobel pseudo convolution filter applied to an image
Figure 3 Sobel filter pixel calculation example
Figure 4 Gradient magnitude calculation
Figure 5 SoC Design Flow
Figure 6 HLS Scheduling and HLS Binding
Figure 7 HLS Control Logic Extraction Phase
Figure 8 Zybo Z7-20
Figure 9 Key Features
Figure 10 Block Diagram of Xilinx APSoC Architecture
Figure 11 Zynq Processing System
Figure 12 DVI2RGB
Figure 13 AXI GPIO
Figure 14 Video Timing Controller (Detector)
Figure 15 Video In to AXIS
Figure 16 AXI4-Stream Subset Converter
Figure 17 VDMA
Figure 18 Video Timing Controller (Generator)
Figure 19 RGB2DVI
Figure 20 Dataflow Pipelining
Figure 21 Input Image for C and Co-Simulation
Figure 22 Result of C and Co-Simulation
Figure 23 Parallel Sobel Operations
Figure 24 Performance Estimates
Figure 25 Resource Utilization
Figure 26 Apeman Action Camera
Figure 27 Image Processing Platform (Part 1)
Figure 28 Image Processing Platform (Part 2)
Figure 29 Image Processing Platform (Part 3)
Figure 30 HLS IP Integrated within the Image Processing Platform (Part 1)
Figure 31 HLS IP Integrated within the Image Processing Platform (Part 2)
Figure 32 HLS IP Integrated within the Image Processing Platform (Part 3)
Figure 33 HLS IP Integrated within the Image Processing Platform (Part 4)
Figure 34 HLS IP Integrated within the Image Processing Platform (Part 5)


Abstract

Hardware Acceleration of Edge Detection Using HLS

By

Pavan Kumar Murthy

Master of Science in Computer Engineering

This report describes the implementation of the Sobel edge detection algorithm on programmable logic, along with an analysis of the resources utilized on the Zynq FPGA. To implement edge detection, HLS functions are leveraged to create an image processing platform. HLS allows us to work at a higher level of abstraction when developing an FPGA application, thereby reducing the time required to develop the solution and, in the case of a commercial project, reducing costs.

The system consists of three main modules: the image processing platform, which provides the HDMI input-to-output path that acts as a base for implementing image processing algorithms; the Sobel edge detector IP, developed using Vivado HLS; and the software application required to facilitate image capture from the camera module. Together these accomplish real-time edge detection.

The hardware module required for this project is implemented using the Vivado Design Suite, specifically Vivado 2017.4 and Vivado HLS 2017.4.


CHAPTER 1

An Overview on Image Processing

Image processing is the technique of applying computer algorithms to an input image in order to obtain an enhanced image or to extract features from it. It is a form of signal processing in which the input signal is an image and the output may be either an image or a set of characteristics of that image. One of the most common and basic operations in image processing is to apply edge detection operators.

1.1 Edge Detection

Edge detection is a mathematical process in which the input image is digitally processed to identify the locations at which the image brightness varies rapidly. Edge detection provides useful information such as the shape, size, or location of a particular object of interest.

Of the many edge detection methods, the one implemented in this project is edge detection using the Sobel operator.


1.1.1 Sobel Operator

The Sobel operator typically produces a grayscale image in which the edges appear as shades of gray or white. The edges are identified by detecting the change in the gradient values in both the horizontal and vertical directions. To calculate the gradient magnitude, convolution filters are applied to the original image and the results of the convolutions are combined.

A convolution filter is much smaller than the whole input image, so the filter is slid over a small portion of the image to change a pixel's value, then shifted one pixel to the right, and so on until the end of the row is reached; the process then starts again at the beginning of the next row. An example of the Sobel convolution filters is shown below.

Figure 1 Sobel Convolution Filters
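Since only the caption of Figure 1 survives in this copy, the standard 3x3 Sobel kernels it depicts are reproduced for reference (in one common sign convention):

G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}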


Below is an example of the Sobel operation on an input image. The example shows a conventional Sobel filter applied at the top left corner of the image; the resulting pixel value at the point of interest is calculated using the formula shown below. The center of the filter is moved on top of the pixel of the input image that is to be manipulated, and the indices i and j are used to step the filter window through the multiplication operations.

Figure 2 Sobel pseudo convolution filter applied to an image

Figure 3 Sobel filter pixel calculation example


The gradient magnitude calculation formula is given below:

Figure 4 Gradient magnitude calculation
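As the figure itself is not reproduced here, the standard form of the calculation is, for reference:

|G| = \sqrt{G_x^2 + G_y^2} \approx |G_x| + |G_y|

The HLS implementation in the appendix approximates this combination with a weighted sum via hls::AddWeighted.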

1.1.1.1 Pseudo Code for Sobel Edge Detection

a. Read the input sample image.

b. Apply the convolution filters to the input image in the Gx and Gy directions separately.

c. Combine the gradient magnitudes calculated in the two directions to get the absolute magnitude of the gradient.

d. The resulting magnitude of the gradient is the image with the edges in it.

A minimal software sketch of these steps is shown below.
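The sketch assumes an 8-bit grayscale image stored row-major in a std::vector; it is illustrative only and is independent of the HLS implementation used later in this report.

// A minimal software sketch of the pseudo code above (illustrative only).
#include <cstdint>
#include <cstdlib>
#include <vector>

std::vector<uint8_t> sobel(const std::vector<uint8_t>& img, int w, int h) {
    static const int gx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int gy[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    std::vector<uint8_t> out(img.size(), 0);
    for (int y = 1; y < h - 1; ++y) {            // skip the one-pixel border
        for (int x = 1; x < w - 1; ++x) {
            int sx = 0, sy = 0;
            for (int i = -1; i <= 1; ++i)        // slide the 3x3 window
                for (int j = -1; j <= 1; ++j) {
                    int p = img[(y + i) * w + (x + j)];
                    sx += gx[i + 1][j + 1] * p;  // horizontal gradient
                    sy += gy[i + 1][j + 1] * p;  // vertical gradient
                }
            int mag = std::abs(sx) + std::abs(sy);            // |G| ~ |Gx| + |Gy|
            out[y * w + x] = mag > 255 ? 255 : (uint8_t)mag;  // clamp to 8 bits
        }
    }
    return out;
}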


CHAPTER 2

System-on-Chip Design Stages

2.1 Design

A system on chip includes both hardware functional units, such as microprocessors that can be programmed using high-level code like C/C++, and the communication fabric that connects, controls, directs, and interfaces between the functional modules.

The basic idea behind SoC design is the development of the hardware and software modules at the same time. While designing the system, optimizations and constraints are also taken into account. The other key activity in SoC design is the verification process.

After the architecture of the SoC is defined, the required hardware elements are created at an abstract level, either by describing the behavior of the system in a Hardware Description Language or by writing C++ and synthesizing the code into RTL using HLS tools. These elements are then connected using HDL to create the entire SoC design.


Figure 5 SoC Design Flow

2.2 Design Verification

After the SoC is designed, chip-level verification is performed to ensure that the system behaves as required. This process of chip-level verification is called functional verification, and it contributes a significant portion of the time and effort spent in the product development life cycle.

2.3 Optimization

Apart from the RTL/higher-level design itself, the SoC must operate efficiently, so applying optimizations is key in SoC design. The task of optimization considers factors such as power consumption, throughput, latency, and resource utilization.


2.4 Fabrication

The optimization factors mentioned earlier are the basis for the place-and-route tool to transform the designer's intent into the SoC design. While this transformation takes place, the design is analyzed using various methods, such as timing analysis, simulation, and logic analyzers, to confirm that the constraints related to timing, signal integrity, etc. specified by the designer are met. Once the physical design specifications are met, the files that describe each layer of the chip are sent to the foundry's mask shop for the etching process. These are then sent to the wafer fabrication plant where the SoC dice are created, packaged, and tested. SoCs are implemented using different types of technologies, some of the most common being full-custom ASICs, standard-cell ASICs, and FPGAs.


CHAPTER 3

Hardware Acceleration Using HLS

3.1 High-Level Synthesis

High-Level Synthesis (HLS) is an automated design process in which the desired algorithmic behavior is interpreted and digital hardware that implements that behavior is created. The main advantage is that the use of HLS raises the level of abstraction when developing an FPGA application, which saves time. Apart from this, HLS also gives better control over optimizing the design architecture by efficiently building and verifying the hardware.

Logic synthesis usually happens at the RTL level, but in the case of HLS we only define the algorithm the system has to implement and the interconnect protocol. The HLS tool handles the rest of the microarchitecture design by transforming any untimed or partially timed code into fully synthesizable RTL code that gives a detailed cycle-by-cycle implementation on the hardware.

Hardware acceleration is the technique of utilizing dedicated hardware to perform certain functions more efficiently than a software-based implementation. Here, hardware acceleration is performed using the Vivado HLS tool and its libraries:

HLS_OpenCV, which enables us to leverage the OpenCV framework

HLS_Video, which provides the necessary image processing functions


3.1.1 Phases of HLS

Unlike Verilog/VHDL designs, the high-level code used to describe the IP is untimed. Thus, when the HLS tool translates the high-level code into Verilog/VHDL, it goes through a number of stages before creating the RTL:

a. Scheduling – This stage determines in which cycles the operations to be implemented will execute.

b. Binding – Once the scheduling phase is completed, this stage assigns the scheduled operations to the resources available on the device.

c. Control Logic Extraction – This phase extracts the control signals and creates control structures, such as state machines, to exert control over the behavior of the model.

Figure 6 HLS Scheduling and HLS Binding


Figure 7 HLS Control Logic Extraction Phase

While performing C synthesis, the HLS tool trades off performance against the available logic resources, and this trade-off determines the performance of the implemented IP core.
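As a hypothetical illustration (not taken from this project's source), the following multiply-accumulate loop shows how a single directive steers these phases: the PIPELINE pragma asks the scheduler for a new iteration every clock cycle, and binding then maps the multiply and add onto device resources such as DSP slices.

// Hypothetical HLS C example: request an initiation interval of 1.
int mac64(const int a[64], const int b[64]) {
    int acc = 0;
MAC_LOOP:
    for (int i = 0; i < 64; i++) {
#pragma HLS PIPELINE II=1
        acc += a[i] * b[i];  // binding maps * and + onto DSP resources
    }
    return acc;
}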


CHAPTER 4

Zybo Z7-20: Zynq-7000 ARM/FPGA SoC Development Board

4.1 Features of Zybo Z7-20

In this project, the edge detection algorithm is implemented on the Zybo Z7-20 board. The Zybo Z7 has a rich set of multimedia and connectivity peripherals that can be used to implement embedded vision algorithms such as edge detection, object tracking, etc. The Zybo Z7 is based on the Xilinx APSoC architecture, which combines a dual-core ARM Cortex-A9 processor with Xilinx 7-series FPGA fabric.

Figure 8 Zybo Z7-20

One of the features of the Zybo board used in this project is the HDMI input and output, which enables attachment of an HDMI camera. The other key features of the Zybo board include the following:

Figure 9 Key Features


4.2 Xilinx APSoC Architecture

The APSoC is a special kind of FPGA that includes ARM-based processors as well as reconfigurable hardware on a single chip. The block diagram of the APSoC architecture is given below; the PS (processing system) section is marked in green and the PL (programmable logic) section in yellow.

Figure 10 Block Diagram of Xilinx APSoC architecture


The APSoC features include the following:

a. Processing System: A dual-core ARM Cortex-A9 MPCore processing system with integrated memory controllers and peripherals. The ARM processor is completely independent of the programmable logic.

b. Programmable Logic: The tightly integrated programmable logic is used to extend the processing system, and it provides high-performance ARM AXI interfaces. The PL side provides scalable density and performance.

c. Flexible Array of Input/Output: The APSoC provides a wide range of external multi-standard I/O interfaces. It also provides the HDMI IN and OUT ports that are used in this project for real-time video streaming, as well as high-performance integrated serial transceivers and analog-to-digital inputs.


CHAPTER 5

Image Processing Platform on Zybo Z7-20

The image processing platform is set up on a Zybo Z7-20 all-programmable SoC and provides an HDMI input-to-output path that is used as a base for HLS-based edge detection. In addition to the Zybo, the following hardware is used in this project:

1. HDMI camera (Apeman 1080P action camera)

2. Cables for the HDMI input and output ports

3. HDMI monitor

The image streaming application is created using the following software:

1. Vivado 2017.4

2. Xilinx SDK 2017.4

3. Digilent Vivado library (downloaded and extracted)

5.1 Components in the Block Diagram

In the Vivado block diagram we add the following IPs:

Zynq Processing System: The processing system is used to configure and provide control signals for the whole image processing system. The DDR attached to the processor is used as a DRAM buffer. The following configurations are explicitly ensured:

a. PL clock 0 = 200 MHz

b. PL clock 1 = 100 MHz

c. HP 0 slave enabled – this port is used to transfer images to and from the PS DDR

d. GP 0 master enabled – this port is used to configure and control the flow of data in the whole image processing chain

Figure 11 Zynq Processing System


DVI2RGB:

a. This IP converts the HDMI image/video stream into a 24-bit RGB

format with the appropriate Hsync and Vsync.

b. The resolution is set to 1280x720.

Figure 12 DVI2RGB


AXI GPIO:

a. This IP provides a general-purpose input/output interface on an AXI4-Lite interface.

b. The AXI GPIO is configured as a single output channel, and the width of the channel is set to 1.

c. This single output is used to assert hot plug detect to the HDMI source; failure to assert it means that no image/video is received from the camera module.

Figure 13 AXI GPIO


Video Timing Controller:

a. The video timing controller IP is generally used either with the Video In to AXI4-Stream core, to detect the format and timing of the incoming video signal, or with the AXI4-Stream to Video Out core, to create outgoing video timing for downstream sinks.

b. Here we configure the core to act as a detector, which is used to detect the mode of the incoming HDMI camera stream.

Figure 14 Video Timing Controller (Detector)


Video In to AXIS:

a. This IP core converts and synchronizes the parallel video into an AXI stream.

b. The IP is configured in independent-clock mode so that the pixel clock and AXI clock can differ.

c. The TDATA signal carries the image data, the TUSER signal identifies the start of a frame, and the TLAST signal identifies the end of a line.

Figure 15 Video In to AXIS


AXIS Subset Converter:

a. This IP provides re-mapping functionality, by which TDATA/TUSER can be re-mapped into the correct RGB format.

b. The TDATA width is configured to 3 bytes.

c. Two instances of this IP core are used, one before and one after the VDMA.

Figure 16 AXI-4 Stream Subset Converter


Video Direct Memory Access (VDMA):

a. This IP core converts the AXI-Stream video into AXI memory-mapped transactions for storage in the PS DDR memory.

b. For the output, the read channel accesses the PS DDR and converts AXI memory-mapped transactions back into an AXI stream.

c. Because we need to both store the video and stream it to the output, both the read and write channels are enabled.
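Both channels are later programmed from the software application; the relevant excerpt from helloworld.c in the appendix is:

vdma_DMA.EnableCircularBuf = 1;           /* cycle through the frame buffers */
vdma_DMA.VertSizeInput = vid.height;      /* 720 lines */
vdma_DMA.HoriSizeInput = vid.width * 3;   /* 3 bytes per RGB pixel */
vdma_DMA.Stride = vid.width * 3;
XAxiVdma_DmaConfig(&vdma, XAXIVDMA_WRITE, &vdma_DMA);  /* store channel */
XAxiVdma_DmaConfig(&vdma, XAXIVDMA_READ, &vdma_DMA);   /* stream channel */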

Figure 17 VDMA


Video Timing Controller:

a. This IP is used as a timing source by the AXI4-Stream to Video Out core to generate the output parallel video and syncs.

b. The generated timing depends upon the input video timing.

Figure 18 Video Timing Controller (Generator)


RGB2DVI:

a. This IP converts the output parallel video, along with the vertical/horizontal syncs, into HDMI format for display on an HDMI monitor.

Figure 19 RGB2DVI

Dynamic Clock Generator:

a. The dynamic clock generator is used to configure the output clocks dynamically.

b. This enables the pixel clock frequency to be changed over AXI-Lite depending on the video format received.
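The software application in the appendix drives this core through Digilent's dynclk helper functions; the base address shown is the one used in that listing:

ClkFindParams(vid.freq, &clk_Mode);  /* derive PLL parameters for the pixel clock */
ClkFindReg(&clk_Reg, &clk_Mode);     /* translate them into register values */
ClkWriteReg(&clk_Reg, 0x43C20000);   /* write to the dynamic clock base address */
ClkStop(0x43C20000);
ClkStart(0x43C20000);                /* restart the clock with the new settings */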


CHAPTER 6

Sobel Filter IP Creation Using Vivado HLS

The Sobel edge detection IP is created using HLS functions, which lets us skip the traditional RTL style of implementation on the FPGA. With the RTL approach, it would be necessary to create line buffers for the convolution filters and then implement the magnitude calculation, whereas with High-Level Synthesis the HLS tool takes care of the lower-level RTL implementation.

6.1 Implementation

The IP is implemented using Vivado HLS and its HLS_OpenCV and HLS_Video libraries, which raise the level of abstraction. The HLS_OpenCV library is used to leverage the OpenCV framework, which is popular for implementing image processing algorithms.

Some of the functions of the HLS_Video library used in this project are:

a. hls::CvtColor – converts the color scheme between color and grayscale, depending upon the configuration.

b. hls::GaussianBlur – applies Gaussian blurring to the input image for noise removal and image smoothing.

c. hls::Sobel – depending on the configuration, applies the Sobel convolution in either the horizontal or the vertical direction. Two instances of this function are used here, as the convolution is needed in both directions for the magnitude calculation.

d. hls::AddWeighted – calculates the resulting magnitude from the results of the Sobel operator in both directions.

6.2 Interfacing

The AXI stream is used to move image data within the programmable logic, enabling the creation of the high-performance image processing path needed for easy addition and creation of elements.

As the Sobel IP has to accept an AXI-Stream input and generate its output in the same format, the following functions from the Vivado HLS library are used:

a. hls::AXIvideo2Mat – converts an AXI-Stream image/video into hls::Mat format, on which further image processing algorithms can be implemented.

b. hls::Mat2AXIvideo – converts the image from hls::Mat format back into AXI-Stream format, which is used for displaying the result of edge detection.
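Putting this together, a skeleton of the top-level function follows; the full version, including the Sobel processing, appears in the appendix as cvt_colour.cpp.

// Skeleton only: a straight pass-through between the two conversions.
void image_filter(STRM_AXI& STRM_IN, STRM_AXI& STRM_OUT) {
#pragma HLS INTERFACE axis port=STRM_IN    // input port as AXI4-Stream
#pragma HLS INTERFACE axis port=STRM_OUT   // output port as AXI4-Stream
    EDGE_RGB frame(EDGE_HHT, EDGE_WTH);
    hls::AXIvideo2Mat(STRM_IN, frame);     // AXI stream -> hls::Mat
    // ... Sobel processing on hls::Mat images goes here (Section 6.3) ...
    hls::Mat2AXIvideo(frame, STRM_OUT);    // hls::Mat -> AXI stream
}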

6.3 C Synthesis and Optimization

For optimization, the DATAFLOW pragma is used in the C code to achieve the highest possible frame rate. The Sobel operation in the two directions is performed in parallel, and the dataflow pragma ensures coarse-grained pipelining.


Figure 20 Dataflow Pipelining

To apply this pipelining optimization, the result of the Gaussian blurring is split into two parallel paths and then recombined using the hls::AddWeighted function to calculate the magnitude of the gradient. The splitting is done using hls::Duplicate, which duplicates the Gaussian-blurred image into two separate output images that can be processed in parallel, as the excerpt below shows.
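The relevant excerpt from cvt_colour.cpp (the full listing is in the appendix) is:

hls::GaussianBlur<3,3>(edge_1, edge_2);                  // smooth the grayscale image
hls::Duplicate(edge_2, edge_2a, edge_2b);                // split into two parallel paths
hls::Sobel<1,0,3>(edge_2a, edge_3);                      // horizontal gradient
hls::Sobel<0,1,3>(edge_2b, edge_4);                      // vertical gradient
hls::AddWeighted(edge_4, 0.5, edge_3, 0.5, 0.0, edge_5); // recombine the magnitudes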

After the high-level code is ready, C simulation and synthesis are performed, producing an output image in which the edges are extracted from the input image.

The complete high-level code for the Sobel IP core, along with the video streaming code used to display the result of edge detection, is given in the appendix. It includes the header file, the test bench, and the Sobel filter function code.


CHAPTER 7

Software Application

After the block diagram is ready, the bitstream is generated and exported to Xilinx SDK to create the video streaming application. The software application, written in C, does the following:

a. The GPIO connected to the HDMI IN hot plug detect is asserted. Once this signal is asserted, the HDMI source begins generating video after a wait time of about 5 seconds.

b. The Video Timing Controller acting as the detector is configured to report the incoming video mode.

c. The Video Timing Controller acting as the generator is configured to generate the timing in accordance with the detected video mode.

d. The dynamic clock is configured.

e. The VDMA is configured for read/write access to the PS DDR.

The complete software application required for video streaming is given in the appendix.
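As an illustration, step (a) corresponds to the following excerpt from helloworld.c in the appendix, using the Xilinx standalone GPIO driver:

XGpio_Initialize(&hpdIn, XPAR_AXI_GPIO_0_DEVICE_ID);
XGpio_DiscreteWrite(&hpdIn, 1, 0x1);  /* assert HDMI hot plug detect */
sleep(20);                            /* give the source time to start streaming */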


CHAPTER 8

Results and Analysis

8.1 C and Co-Simulation Results

The input image provided for C and co-simulation is:

Figure 21 Input Image for C and Co-Simulation

The resulting output image with the edges extracted is:

Figure 22 Result of C and Co-Simulation


8.2 Resource Utilization

Before exporting the HLS IP to the Vivado IP repository, the resources utilized by the IP are analyzed to ensure that the IP can be implemented on the Zybo board.

The analysis also confirms that the two Sobel operators used in the IP design under the dataflow pragma operate in parallel.

Figure 23 Parallel Sobel Operations

The summary of the performance estimates of the HLS IP is as follows:

Figure 24 Performance Estimates


The summary of the resource utilization of the HLS IP is as follows:

Figure 25 Resource Utilization

From the estimates above, it is clear that the resources utilized to implement the HLS IP are less than the total resources available on the Zybo Z7-20 board.

When the IP is exported, a zip file is generated containing all the information necessary to add the newly created Sobel IP to a design. This file is added to the user IP repository in Vivado 2017.4 to integrate the HLS IP with the image processing platform, and real-time edge detection is implemented.


CHAPTER 9

Conclusion

Edge detection appears almost everywhere in computer vision solutions such as ADAS, consumer products, and retail. Designing these solutions in a pure Hardware Description Language flow is difficult because of constant algorithm upgrades: the RTL team involved in the creation and verification flow would have to start over with every change, which is highly undesirable and leads to delays in the production schedule. The solution is to adopt a High-Level Synthesis design flow, because high-level code is easier to maintain, modify, and reuse.

In this project, an example of how to implement edge detection was successfully demonstrated by integrating an Apeman HD camera with the very powerful Zynq device, which is used extensively in the field of image processing.

Although the objective of the project was accomplished, there is considerable scope for improvement, as the Zynq can be used to implement more complex image processing applications such as lane detection and object tracking, which I intend to implement in the future.


Bibliography

[1] Adam Taylor (2018, May 11). "Creating a Zynq or FPGA-Based, Image Processing Platform". https://www.hackster.io/adam-taylor/creating-a-zynq-or-fpga-based-image-processing-platform-e79394

[2] "Digital Image Processing". Wikipedia. https://en.wikipedia.org/wiki/Digital_image_processing

[3] Processing System 7 v5.5 LogiCORE IP Product Guide. https://www.xilinx.com/support/documentation/ip_documentation/processing_system7/v5_5/pg082-processing-system7.pdf

[4] AXI Video Direct Memory Access v6.3 LogiCORE IP Product Guide. https://www.xilinx.com/support/documentation/ip_documentation/axi_vdma/v6_3/pg020_axi_vdma.pdf

[5] Video Timing Controller v6.1 LogiCORE IP Product Guide. https://www.xilinx.com/support/documentation/ip_documentation/v_tc/v6_1/pg016_v_tc.pdf

[6] Jeff Johnson (2014, August 6). "Using the AXI DMA in Vivado". http://www.fpgadeveloper.com/2014/08/using-the-axi-dma-in-vivado.html

[7] Zynq-7000 Technical Reference Manual. https://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf

[8] XAPP1167, Accelerating OpenCV Applications with Zynq-7000 All Programmable SoC Using Vivado HLS Video Libraries. https://www.xilinx.com/support/documentation/application_notes/xapp1167.pdf

[9] XAPP890, Zynq All Programmable SoC Sobel Filter Implementation Using the Vivado HLS Tool. https://www.xilinx.com/support/documentation/application_notes/xapp890-zynq-sobel-vivado-hls.pdf

[10] Vivado Design Suite User Guide: High-Level Synthesis. https://www.xilinx.com/support/documentation/sw_manuals/xilinx2014_1/ug902-vivado-high-level-synthesis.pdf

[11] Samta Gupta, Susmita Ghosh Mazumdar (2013, February 2). "Sobel Edge Detection Algorithm". https://pdfs.semanticscholar.org/6bca/fdf33445585966ee6fb3371dd1ce15241a62.pdf


Appendix

Camera used for HDMI input to the Zybo:

Figure 26 Apeman Action Camera

Some of the features of the camera include:

a. Resolution: 1280 x 720

b. 14MP 1080P full HD

c. 2-inch LCD display


High-Level Synthesis source code:

Filename: cvt_colour.cpp

/**************************************************************************
 * Engineer: Pavan Kumar Murthy
 * Create Date: 03/14/2019
 * File Name: cvt_colour.cpp
 * Description: This file is used to apply the Sobel operator on the input image
 **************************************************************************/
#include "cvt_colour.h"

void image_filter(STRM_AXI& STRM_IN, STRM_AXI& STRM_OUT) {
#pragma HLS INTERFACE axis port=STRM_IN
#pragma HLS INTERFACE axis port=STRM_OUT

    EDGE_RGB edge_0(EDGE_HHT, EDGE_WTH);
    EDGE_GRY edge_1(EDGE_HHT, EDGE_WTH);
    EDGE_GRY edge_2(EDGE_HHT, EDGE_WTH);
    EDGE_GRY edge_2a(EDGE_HHT, EDGE_WTH);
    EDGE_GRY edge_2b(EDGE_HHT, EDGE_WTH);
    EDGE_GRY edge_3(EDGE_HHT, EDGE_WTH);
    EDGE_GRY edge_4(EDGE_HHT, EDGE_WTH);
    EDGE_GRY edge_5(EDGE_HHT, EDGE_WTH);
    EDGE_RGB edge_6(EDGE_HHT, EDGE_WTH);

#pragma HLS dataflow
    hls::AXIvideo2Mat(STRM_IN, edge_0);                      // AXI stream -> Mat
    hls::CvtColor<HLS_BGR2GRAY>(edge_0, edge_1);             // colour -> grayscale
    hls::GaussianBlur<3,3>(edge_1, edge_2);                  // smoothing / noise removal
    hls::Duplicate(edge_2, edge_2a, edge_2b);                // split into two parallel paths
    hls::Sobel<1,0,3>(edge_2a, edge_3);                      // horizontal gradient
    hls::Sobel<0,1,3>(edge_2b, edge_4);                      // vertical gradient
    hls::AddWeighted(edge_4, 0.5, edge_3, 0.5, 0.0, edge_5); // combine magnitudes
    hls::CvtColor<HLS_GRAY2RGB>(edge_5, edge_6);             // grayscale -> RGB
    hls::Mat2AXIvideo(edge_6, STRM_OUT);                     // Mat -> AXI stream
}


Filename: cvt_colour.h

/**************************************************************************
 * Engineer: Pavan Kumar Murthy
 * Create Date: 03/14/2019
 * File Name: cvt_colour.h
 * Description: This file is used as the header for cvt_colour.cpp
 **************************************************************************/
#include "hls_video.h"
#include <ap_fixed.h>

#define EDGE_WTH 1280
#define EDGE_HHT 720

typedef hls::stream<ap_axiu<24,1,1,1> > STRM_AXI;
typedef hls::Mat<EDGE_HHT, EDGE_WTH, HLS_8UC3> EDGE_RGB;
typedef hls::Mat<EDGE_HHT, EDGE_WTH, HLS_8UC1> EDGE_GRY;

void image_filter(STRM_AXI& STRM_IN, STRM_AXI& STRM_OUT);


Filename: cvt_colour_tb.cpp

/**************************************************************************
 * Engineer: Pavan Kumar Murthy
 * Create Date: 03/14/2019
 * File Name: cvt_colour_tb.cpp
 * Description: This file is used as a testbench for cvt_colour.cpp
 **************************************************************************/
#include <hls_opencv.h>
#include "cvt_colour.h"
#include <iostream>

using namespace std;

int main(int argc, char** argv) {
    IplImage* srceImg;
    IplImage* dstnImg;
    STRM_AXI srceAxiStrm, dstnAxiStrm;

    srceImg = cvLoadImage("sample.bmp");
    dstnImg = cvCreateImage(cvGetSize(srceImg), srceImg->depth, srceImg->nChannels);

    IplImage2AXIvideo(srceImg, srceAxiStrm);  // image -> AXI stream
    image_filter(srceAxiStrm, dstnAxiStrm);   // design under test
    AXIvideo2IplImage(dstnAxiStrm, dstnImg);  // AXI stream -> image

    cvSaveImage("edge.bmp", dstnImg);
    cvReleaseImage(&srceImg);
    cvReleaseImage(&dstnImg);

    return 0;
}


SDK software application:

Filename: helloworld.c

/************************************************************************
 * Engineer: Pavan Kumar Murthy
 * Create Date: 03/21/2019
 * File name: helloworld.c
 * Description: This file is used to stream real-time edge detection video
 ************************************************************************/
#include <stdio.h>
#include <string.h>
#include "platform.h"
#include "xil_printf.h"
#include "xparameters.h"
#include "xgpio.h"
#include "xvtc.h"
#include "xaxivdma.h"
#include "xaxivdma_i.h"
#include "vga_modes.h"
#include "dynclk.h"

#define DEMO_IMG_FRAME (720*1280)
#define DEMO_IMG_STRIDE (1280*3)
#define DISPLAY_IMG_FRAMES 3

XVtc VtcInst0, VtcInst1;                 /* VTC 0 = detector, VTC 1 = generator */
XVtc_Config *vtc_config0, *vtc_config1;
XGpio hpdIn;
XAxiVdma vdma;
XAxiVdma_DmaSetup vdma_DMA;
XAxiVdma_Config *vdma_Config;
ClkConfig clk_Reg;
ClkMode clk_Mode;
u32 BufFrm[DISPLAY_IMG_FRAMES][DEMO_IMG_FRAME];
u32 *FrmsP[DISPLAY_IMG_FRAMES];
VideoMode vid;

int main() {
    init_platform();
    disable_caches();

    XVtc_Timing vtc_Timing;
    XVtc_SourceSelect SelSrc;
    int op_status;

    vtc_config0 = XVtc_LookupConfig(XPAR_VTC_0_DEVICE_ID);
    XVtc_CfgInitialize(&VtcInst0, vtc_config0, vtc_config0->BaseAddress);
    vtc_config1 = XVtc_LookupConfig(XPAR_VTC_1_DEVICE_ID);
    XVtc_CfgInitialize(&VtcInst1, vtc_config1, vtc_config1->BaseAddress);

    /* Asserting the Hot Plug Detect so the HDMI source starts streaming */
    XGpio_Initialize(&hpdIn, XPAR_AXI_GPIO_0_DEVICE_ID);
    XGpio_DiscreteWrite(&hpdIn, 1, 0x1);
    sleep(20);
    XGpio_DiscreteWrite(&hpdIn, 2, 0x1);  /* needs time here */

    /* Set up the output (generator) timing for 1280x720 */
    vid = VMODE_1280x720;
    vtc_Timing.HActiveVideo  = vid.width;
    vtc_Timing.HFrontPorch   = vid.hps - vid.width;
    vtc_Timing.HSyncWidth    = vid.hpe - vid.hps;
    vtc_Timing.HBackPorch    = vid.hmax - vid.hpe + 1;
    vtc_Timing.HSyncPolarity = vid.hpol;
    vtc_Timing.VActiveVideo  = vid.height;
    vtc_Timing.V0FrontPorch  = vid.vps - vid.height;
    vtc_Timing.V0SyncWidth   = vid.vpe - vid.vps;
    vtc_Timing.V0BackPorch   = vid.vmax - vid.vpe + 1;
    vtc_Timing.V1FrontPorch  = vid.vps - vid.height;
    vtc_Timing.V1SyncWidth   = vid.vpe - vid.vps;
    vtc_Timing.V1BackPorch   = vid.vmax - vid.vpe + 1;
    vtc_Timing.VSyncPolarity = vid.vpol;
    vtc_Timing.Interlaced    = 0;

    memset((void *)&SelSrc, 0, sizeof(SelSrc));
    SelSrc.VBlankPolSrc       = 1;
    SelSrc.VSyncPolSrc        = 1;
    SelSrc.HBlankPolSrc       = 1;
    SelSrc.HSyncPolSrc        = 1;
    SelSrc.ActiveVideoPolSrc  = 1;
    SelSrc.ActiveChromaPolSrc = 1;
    SelSrc.VChromaSrc         = 1;
    SelSrc.VActiveSrc         = 1;
    SelSrc.VBackPorchSrc      = 1;
    SelSrc.VSyncSrc           = 1;
    SelSrc.VFrontPorchSrc     = 1;
    SelSrc.VTotalSrc          = 1;
    SelSrc.HActiveSrc         = 1;
    SelSrc.HBackPorchSrc      = 1;
    SelSrc.HSyncSrc           = 1;
    SelSrc.HFrontPorchSrc     = 1;
    SelSrc.HTotalSrc          = 1;

    XVtc_RegUpdateEnable(&VtcInst1);
    XVtc_SetGeneratorTiming(&VtcInst1, &vtc_Timing);
    XVtc_SetSource(&VtcInst1, &SelSrc);
    XVtc_EnableGenerator(&VtcInst1);
    XVtc_Enable(&VtcInst1);

    /* Enable the detector on the input path */
    XVtc_EnableDetector(&VtcInst0);
    XVtc_Enable(&VtcInst0);
    xil_printf("Video Mode = %dx%d", vid.width, vid.height);
    xil_printf("\n\r");

    for (int g = 0; g < DISPLAY_IMG_FRAMES; g++) {
        FrmsP[g] = BufFrm[g];
    }

    vdma_Config = XAxiVdma_LookupConfig(XPAR_AXIVDMA_0_DEVICE_ID);
    XAxiVdma_CfgInitialize(&vdma, vdma_Config, vdma_Config->BaseAddress);

    /* Program the dynamic clock generator for the pixel clock */
    ClkFindParams(vid.freq, &clk_Mode);
    ClkFindReg(&clk_Reg, &clk_Mode);
    ClkWriteReg(&clk_Reg, 0x43C20000);
    ClkStop(0x43C20000);
    ClkStart(0x43C20000);

    /* Configure the VDMA write (store) and read (stream) channels */
    vdma_DMA.FrameDelay = 0;
    vdma_DMA.EnableCircularBuf = 1;
    vdma_DMA.EnableSync = 0;
    vdma_DMA.PointNum = 0;
    vdma_DMA.EnableFrameCounter = 0;
    vdma_DMA.VertSizeInput = vid.height;
    vdma_DMA.HoriSizeInput = (vid.width) * 3;
    vdma_DMA.FixedFrameStoreAddr = 0;
    vdma_DMA.FrameStoreStartAddr[0] = (u32) FrmsP[0];
    vdma_DMA.Stride = (vid.width) * 3;

    XAxiVdma_DmaConfig(&vdma, XAXIVDMA_WRITE, &(vdma_DMA));
    op_status = XAxiVdma_DmaSetBufferAddr(&vdma, XAXIVDMA_WRITE, vdma_DMA.FrameStoreStartAddr);
    op_status = XAxiVdma_DmaStart(&vdma, XAXIVDMA_WRITE);
    op_status = XAxiVdma_StartParking(&vdma, 0, XAXIVDMA_WRITE);

    XAxiVdma_DmaConfig(&vdma, XAXIVDMA_READ, &(vdma_DMA));
    XAxiVdma_DmaSetBufferAddr(&vdma, XAXIVDMA_READ, vdma_DMA.FrameStoreStartAddr);
    XAxiVdma_DmaStart(&vdma, XAXIVDMA_READ);
    XAxiVdma_StartParking(&vdma, 0, XAXIVDMA_READ);

    while (1) {
    }
    return 0;
}


Block Diagram of the Image Processing Platform:

Part 1:

Figure 27 Image Processing Platform (Part 1)


Part 2:

Figure 28 Image Processing Platform (Part 2)


Part 3:

Figure 29 Image Processing Platform (Part 3)


Complete Block Diagram with the HLS IP Integrated:

Part 1:

Figure 30 HLS IP Integrated within the Image Processing Platform (Part 1)


Part 2:

Figure 31 HLS IP Integrated within the Image Processing Platform (Part 2)


Part 3:

Figure 32 HLS IP Integrated within the Image Processing Platform (Part 3)


Part 4:

Figure 33 HLS IP Integrated within the Image Processing Platform (Part 4)


Part 5:

Figure 34 HLS IP Integrated within the Image Processing Platform (Part 5)