Implementation in Hardware of Video Processing Algorithm

Performed by: Yony Dekell & Tsion Bublil
Supervisor: Mike Sumszyk

SPRING 2008

High Speed Digital System Lab

Project Goals

Real-time video signal filtering based on the nonlinear diffusion algorithm.

• Studying the algorithm of nonlinear diffusion.

• Studying the work environment of Synplify DSP.

• Implementing a real-time video processing algorithm on an FPGA.

Nonlinear Diffusion Filtering

Nonlinear diffusion is an iterative algorithm that smooths the picture locally while preserving its edges.

Here you can see 3 steps along the iterative process.

[Figure: original image, step one, step two, step three]
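The slides do not give the diffusion equation, so the following is only a minimal sketch of the iterative process, assuming the classic Perona-Malik scheme with an exponential edge-stopping function. The names `diffusion_step`, `diffuse`, `kappa`, and `step` are illustrative and are not taken from the project.

```python
import math

def diffusion_step(img, kappa=0.1, step=0.2):
    """One explicit Perona-Malik iteration on a 2-D list of floats (assumed scheme)."""
    rows, cols = len(img), len(img[0])
    g = lambda d: math.exp(-(d / kappa) ** 2)  # edge-stopping function
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            flux = 0.0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                # Clamp neighbour coordinates: replicated border, no flux outward.
                nr = min(max(r + dr, 0), rows - 1)
                nc = min(max(c + dc, 0), cols - 1)
                d = img[nr][nc] - img[r][c]
                flux += g(d) * d  # small differences diffuse, large ones barely do
            out[r][c] = img[r][c] + step * flux
    return out

def diffuse(img, iterations=3, kappa=0.1, step=0.2):
    """Apply several diffusion iterations in sequence."""
    for _ in range(iterations):
        img = diffusion_step(img, kappa, step)
    return img
```

With this choice of edge-stopping function, differences much smaller than `kappa` are smoothed away over the iterations, while differences much larger than `kappa` (edges) are left almost untouched.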

Project stages

• Simulink design of an existing Matlab code
• Adaptation of the Simulink design to SynplifyDSP components and constraints
• Synthesis of the VHDL code produced by SynplifyDSP using SynplifyPro
• Integration of the above RTL component within the Gidel card architecture using Quartus II and ProcWizard
• Place and route using Quartus II
• Loading the RBF file onto Gidel’s Procstar II card using ProcWizard

Comparison between SynplifyDSP and direct VHDL implementation

Pros:
• The SynplifyDSP tool plugs into the familiar Simulink environment.
• Development is fast.

Cons:
• It is hard to obtain an optimal implementation (the critical path is not optimal).
• The generated VHDL code is hard to understand, which makes changes difficult.

Simulink design

[Figures: two slides of the Simulink block diagram, each with R, G, and B channels]

From Simulink to SynplifyDSP

We had to change our design because:

1) We chose not to use any buffer between the DVI connection and the processing of the input.

2) In the Simulink design we use matrices to represent images, but SynplifyDSP can only use vectors.

Image representations

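• Image as matrix

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

• Image as vector

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{21} & a_{22} & a_{23} & a_{31} & a_{32} & a_{33} \end{pmatrix}$$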

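Computing derivation

Matrix derivation

$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
-
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
=
\begin{pmatrix} a_{11}-a_{11} & a_{12}-a_{12} & a_{13}-a_{13} \\ a_{21}-a_{11} & a_{22}-a_{12} & a_{23}-a_{13} \\ a_{31}-a_{21} & a_{32}-a_{22} & a_{33}-a_{23} \end{pmatrix}
$$

Vector derivation

$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{21} & a_{22} & a_{23} & a_{31} & a_{32} & a_{33} & 0 & 0 & 0 \end{pmatrix}
-
\begin{pmatrix} 0 & 0 & 0 & a_{11} & a_{12} & a_{13} & a_{21} & a_{22} & a_{23} & a_{31} & a_{32} & a_{33} \end{pmatrix}
$$

$$
= \Big( \underbrace{a_{11}-0 \;\; a_{12}-0 \;\; a_{13}-0}_{\text{false result}} \;\; \underbrace{a_{21}-a_{11} \;\; a_{22}-a_{12} \;\; a_{23}-a_{13} \;\; a_{31}-a_{21} \;\; a_{32}-a_{22} \;\; a_{33}-a_{23}}_{\text{true result}} \;\; \underbrace{-a_{31} \;\; -a_{32} \;\; -a_{33}}_{\text{false result}} \Big)
$$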
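The difference between the two derivations can be sketched in Python. This is an illustrative sketch only: the helper names are hypothetical, the 3×3 values are arbitrary, and the vector version models the shift as a delay of one row length with zero fill.

```python
def matrix_derivative(m):
    """Row difference with the first row replicated (border result is zero)."""
    shifted = [m[0]] + m[:-1]
    return [[a - b for a, b in zip(row, prev)] for row, prev in zip(m, shifted)]

def vector_derivative(v, width):
    """Same difference on a row-major pixel stream, delayed by one row with zero fill."""
    delayed = [0] * width + v[:-width]
    return [a - b for a, b in zip(v, delayed)]

m = [[1, 2, 3],
     [4, 6, 8],
     [9, 12, 15]]
v = [x for row in m for x in row]  # image as vector (row-major)

dm = [x for row in matrix_derivative(m) for x in row]
dv = vector_derivative(v, width=3)
# The first `width` entries of dv are false results; the remaining entries match dm.
```

In a streaming design the false results at the border have to be detected and handled separately, since the stream itself carries no row boundaries.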

SynplifyDSP design

[Figures: two slides of the SynplifyDSP block diagram, each with R, G, and B channels]

ROM component in SynplifyDSP design

• In SynplifyDSP we can’t implement the mathematical expression:

[Equation image not recovered; surviving fragment: 5.0]

To overcome this problem we use ROM components that function as lookup tables (LUTs). The ROM is loaded by creating an array.

• SynplifyDSP automatically uses a LUT to calculate the LOG function.
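A ROM used as a lookup table can be sketched as follows. The expression on the slide is not available here, so the square root below is only a stand-in function, and the 8-bit address width, the 12-bit fraction, and the names `build_rom` and `rom_lookup` are assumptions for illustration.

```python
import math

ADDR_BITS = 8    # assumed address width: 256 ROM entries
FRAC_BITS = 12   # assumed number of fractional bits in each stored value

def build_rom(func):
    """Precompute func over [0, 1) and store the results as fixed-point integers."""
    size = 1 << ADDR_BITS
    return [round(func(i / size) * (1 << FRAC_BITS)) for i in range(size)]

def rom_lookup(rom, x):
    """Approximate func(x) for 0 <= x < 1 by reading the nearest lower ROM entry."""
    addr = int(x * len(rom))
    return rom[addr] / (1 << FRAC_BITS)

rom = build_rom(math.sqrt)       # stand-in for the expression on the slide
approx = rom_lookup(rom, 0.25)   # read the entry that covers x = 0.25
```

The array passed to the ROM is computed once offline, so the hardware only performs an address lookup at run time.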

Fixed point precision

• In Matlab and Simulink we work at full precision.
• When we implement the above design on an FPGA, however, we have to work with fixed-point precision. Hence we need to estimate how many bits to use per signal in order to keep the error satisfactory.
• It appears that using 12 bits for the fraction of each signal provides satisfactory precision.
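The effect of a 12-bit fraction can be estimated with a short sketch. It is illustrative only: rounding to nearest is an assumption, the helper name `to_fixed` is hypothetical, and the sample values are arbitrary. With 12 fractional bits the step size is 2^-12, so the rounding error per value is at most 2^-13.

```python
FRAC_BITS = 12

def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize x to a fixed-point grid with the given number of fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

samples = [0.1, 0.333333, 0.5, 0.712345, 0.999]
errors = [abs(x - to_fixed(x)) for x in samples]
max_error = max(errors)  # bounded by 2 ** -(FRAC_BITS + 1)
```

Repeating this with a smaller `frac_bits` shows how quickly the error grows, which is the trade-off behind choosing the fraction width per signal.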

Matlab and Synplify comparison

• We measure the error between the Matlab code output and the SynplifyDSP output.

• For 1 iteration: relative root MSE = 1%

[Figure: Matlab result and Synplify result]
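The slides do not define the error measure, so the sketch below assumes that relative root MSE means the root of the mean squared difference divided by the root mean square of the reference (Matlab) output. The function name `relative_rmse` and the sample values are made up for illustration.

```python
import math

def relative_rmse(reference, test):
    """Root of the mean squared error, relative to the RMS of the reference signal."""
    n = len(reference)
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    ref_power = sum(r ** 2 for r in reference) / n
    return math.sqrt(mse / ref_power)

matlab_out = [100.0, 120.0, 80.0, 60.0]        # made-up reference pixels
synplify_out = [101.0, 119.0, 80.0, 60.0]      # made-up fixed-point pixels
err = relative_rmse(matlab_out, synplify_out)  # a fraction; multiply by 100 for percent
```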

SynplifyDSP – VHDL code

[Figure: VHDL code produced by SynplifyDSP]

Synplify Pro

[Figures: two slides of Synplify Pro images]

Performance Summary
*******************

Worst slack in design: 13.447

Starting Clock   Requested   Estimated   Requested     Estimated     Slack
                 Frequency   Frequency   Period [ns]   Period [ns]
--------------------------------------------------------------------------
clk              44.0 MHz    107.8 MHz   22.727        9.280         13.447
==========================================================================

• Requested Frequency – the minimal frequency we want to achieve.
• Estimated Frequency – the frequency of the current design.
• Requested Period – the maximal period we want to achieve for a single cycle.
• Estimated Period – the single-cycle time of the current design.
• Slack – the extra time we have in a single cycle. A negative value indicates that timing constraints could not be met.
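The numbers in the report are related by simple arithmetic, sketched here in Python (the variable names are illustrative): the period is the reciprocal of the frequency, and the slack is the requested period minus the estimated period.

```python
requested_mhz = 44.0
estimated_period_ns = 9.280  # taken from the report

requested_period_ns = 1e3 / requested_mhz  # 1000 / MHz gives nanoseconds
slack_ns = requested_period_ns - estimated_period_ns
estimated_mhz = 1e3 / estimated_period_ns

timing_met = slack_ns >= 0  # negative slack means the constraint was not met
```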

ProcWizard + Quartus

• In ProcWizard we create the interface between the FPGA and the DVI port of the daughter board.

• Quartus performs the place and route according to the ProcWizard interface and the Synplify Pro node-level netlist.

ProcWizard

[Figure: ProcWizard slide, image only]

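Block Diagram

[Figure: block diagram. Video in (DVD) → DVI connector → DVI receiver on the DVI daughter board → top-level design on the Procstar II board → DVI transmitter on the DVI daughter board → DVI connector → video out (computer screen). Each DVI link is labeled CLK, I2C (2), and Data (3). The signals between the daughter board and the top-level design, in each direction, are Clock, VSYNC, HSYNC, and 24-bit pixel data.]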

Rates & Frequencies

• The DVI connection provides one pixel (24 bits) per clock.

• The DVI frame rate is 60 frames per second.
• The minimum clock frequency of the DVI standard is 25.175 MHz.
• Our goal was 43 MHz (for 800 × 600).
• Achieved frequency: 107.8 MHz.
• We achieved our goal by using pipelining.
• The bit rate is 43M × 24 bit ≈ 1 Gbit/sec.
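The rates above can be checked with a few lines of Python (a sketch; the variable names are illustrative). The active-pixel rate counts only visible pixels and ignores the blanking intervals, so the pixel clock has to be higher than that figure.

```python
width, height, fps = 800, 600, 60
bits_per_pixel = 24
target_clock_hz = 43e6  # one pixel per clock

active_pixel_rate = width * height * fps     # visible pixels per second
bit_rate = target_clock_hz * bits_per_pixel  # bits per second at the target clock
headroom = 107.8e6 / target_clock_hz         # achieved frequency relative to the target
```

At the target clock this gives 1.032 Gbit/sec, and the achieved frequency leaves a margin of roughly 2.5 times over the target.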

Memory

• For 10 iterations we use 10 ROMs of 55KB, 3 log ROMs of 0.4KB, and 3 ROMs of 8KB.

• ROM size = 3*0.4K + 3*8K + 10*55K ≈ 575KB

[Equation fragments labeling the ROMs not recovered: 5.0, 2]
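The total can be checked by summing the ROM sizes as listed on the slide (a sketch; the dictionary keys are illustrative labels, not names from the design):

```python
roms_kb = {
    "log ROMs": (3, 0.4),    # (count, size in KB each)
    "8KB ROMs": (3, 8),
    "55KB ROMs": (10, 55),   # 10 ROMs for 10 iterations
}
total_kb = sum(count * size for count, size in roms_kb.values())
```

With the sizes as listed, `total_kb` evaluates to 575.2, up to floating-point rounding.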

Time table

Date (week starting at…): NOV 23, NOV 30, DEC 7, DEC 14, DEC 21, DEC 28, JAN 4

Assignment:
• Working on minimizing the fixed-point precision of the SynplifyDSP components in the Simulink implementation
• Working on minimizing the ROM size
• Studying the DVI protocol and fitting the implementation to work with DVI
• Planning and creating the parallel implementation
• Final presentation