Post on 18-Jul-2020

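Learning Objectives

• Learn the basic concepts of HW/SW codesign through hands-on practice.

• Run and analyze the JPEG example with ARM ADS, and use the results to identify which tasks are suited to HW execution.

• Design a HW block directly in HDL.

• Design, in HDL, a wrapper that connects the bus to the HW block.

• Simulate the designed modules with ModelSim and learn how to use the tool.

• Integrate several modules into a simple SoC and simulate it to confirm the performance gain from codesign.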

Guidelines for Using the Lecture Material

• Exercises based on examples other than JPEG are also worthwhile.

• The task moved to HW does not have to be the most computation-intensive one in the profile. The designer can choose the task based on other factors, such as reuse of the HW module or ease of design.

• Various communication schemes, such as polling or interrupts, can be used.

• Depending on the amount of communication, the codesign result can turn out worse than the original, so it is worth applying performance techniques such as double buffering or task-level pipelining.

• Results may differ depending on the version of the simulation tools. ModelSim 6.2a and ADS 1.2 are recommended. If the ARM compiler is RealView, compile for architecture 4, which does not use Thumb mode.

Contents

1. Overview

2. JPEG encoder profiling with ADS

3. HW IP design in HDL (VHDL or Verilog)

4. RTL simulation practice with ModelSim

5. Wrapper design for the AHB slave interface

6. Performance comparison with codesign

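Overview

• Final architecture (Simple SoC)

– Performance comparison with codesign

• JPEG encoder running on the ARM7 processor only

• JPEG encoder running on the ARM7 processor plus the Quantize and Zigzag HW block

[Figure: simple SoC block diagram. ARM7tdmi-s, ROM, RAM, UART, GPIO, and the Quantize & Zigzag HW IP with its wrapper, on the AMBA AHB bus]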

About the ADS

• ADS consists of a suite of applications, together with supporting documentation and examples, that enable you to write and debug applications for the ARM family of RISC processors.

• You can use ADS to develop, build, and debug C, C++, or ARM assembly language programs.

Components of ADS

• Command-line development tools

– armcc, armcpp, armasm, armlink, armsd

• GUI development tools

– AXD, CodeWarrior IDE

• Utilities

– fromELF, armprof

• Supporting software.

– ARMulator

• Documentation

– http://www.arm.com

Objective

• Use the ADS tools

• Profile the JPEG encoder code

– Structure of our JPEG encoder

“bmp” image source → ReadBmp → ChenDCT → Quantize → Zigzag → HuffEncode → “jpg” compressed image

About JPEG encoder

[Figure: JPEG encoder data flow. BMP image → 8×8 blocks → Preshift → DCT → Bound → Quantize → Zigzag → Encoding → JPG format]

Using Code Warrior (1)

• STEP 1

– new project

• file->new project

Using Code Warrior (2)

• STEP 2

– Add Files

• Copy the jpeg.c file to the project folder

• Add jpeg.c (right-click on the project background)

Using Code Warrior (3)

• STEP 3

– DebugRel setting

Using Code Warrior (4)

• STEP 4

– Target settings

Using Code Warrior (5)

• STEP 5

– ARM Assembler setting

Using Code Warrior (6)

• STEP 6

– Arm C compiler setting

Using Code Warrior (7)

• STEP 7

– ARM C compiler optimization level setting

Using Code Warrior (8)

• STEP 8

– Other compiler setting

• ARM C++ compiler, Thumb C compiler, Thumb C++ compiler

Using Code Warrior (9)

• STEP 9

– ARM Linker Setting (output)

Using Code Warrior (10)

• STEP 10

– MAKE ALL

– Launch AXD debugger

Using AXD (1)

• STEP 11

– Load Image with profiling on

Using AXD (2)

• STEP 12

– Profiling setting

Using AXD (3)

• STEP 13

– Run simulation until “_sys_exit”

– Write profiling data

Using AXD (4)

• STEP 14

– Run “armprof jpeg.prf > jpeg_profile_result.txt” command

HW IP design using HDL

• Design the HW IP in HDL (VHDL)

– Modify reference code

– Design quantize block

– Design zigzag block

Modify SW code (1)

• Reference C code

void Quantize(int *matrix, int *qmatrix)
{
    int i, m, q;

    for (i = 0; i < BLOCKSIZE; i++) {
        m = matrix[i];
        q = qmatrix[i];
        if (m > 0)
            matrix[i] = (m + q / 2) / q;   /* divide with rounding */
        else
            matrix[i] = (m - q / 2) / q;
    }
}

Designing a divider? It is slow and hard to implement!

Alternative: design a shifter instead.

Modify SW code (2)

• Replace divider with shifter

void Quantize(int *matrix, int *qmatrix)
{
    int i, m, q, d;

    for (i = 0; i < BLOCKSIZE; i++) {
        m = matrix[i];
        q = qmatrix[i];
        d = qdiv[i];    /* shift amount for entry i */
        if (m > 0)
            matrix[i] = (m + (q >> 1)) >> d;
        else
            matrix[i] = -((-m + (q >> 1)) >> d);
    }
}
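
The replacement only gives the same result as the division when each quantization step is a power of two. The sketch below checks this under the assumption that q equals 1 << d for every entry; the function names are illustrative and do not come from the course code.

```c
/* Reference rounding division, as in the original Quantize().
   Relies on C99 semantics: integer division truncates toward zero. */
int quantize_div(int m, int q)
{
    return (m > 0) ? (m + q / 2) / q : (m - q / 2) / q;
}

/* Shift-based version. Assumption: the quantization step is q = 1 << d. */
int quantize_shift(int m, int d)
{
    int q = 1 << d;
    return (m > 0) ? (m + (q >> 1)) >> d : -((-m + (q >> 1)) >> d);
}

/* Returns 1 if both versions agree for every m in [-limit, limit]
   and every shift amount d in [0, max_d], otherwise 0. */
int quantize_versions_agree(int limit, int max_d)
{
    int m, d;
    for (d = 0; d <= max_d; d++)
        for (m = -limit; m <= limit; m++)
            if (quantize_div(m, 1 << d) != quantize_shift(m, d))
                return 0;
    return 1;
}
```

Negating before the shift in the non-positive branch keeps the shift on a non-negative value, so the result does not depend on how the compiler shifts negative integers.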

Modify SW code (3)

• Quantization matrix → shift-adapted quantization matrix

#define BLOCKSIZE 64
//#define QuantizationMatrix LuminanceQuantization
#define QuantizationMatrix ShitfAdaptedQuantization
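
The slides do not show how the shift-adapted table was derived. One plausible construction, which is purely an assumption here, rounds each quantization step to the nearest power of two and keeps the exponent as the shift amount. The helper names and the qshift array below are hypothetical.

```c
/* Hypothetical helper: exponent d such that (1 << d) is the power of two
   nearest to q. Ties go to the larger power. Assumes q >= 1. */
int nearest_pow2_exponent(int q)
{
    int d = 0;
    while ((1 << (d + 1)) <= q)     /* largest d with 2^d <= q */
        d++;
    /* move up if q is at least halfway to the next power of two */
    if (q - (1 << d) >= (1 << (d + 1)) - q)
        d++;
    return d;
}

/* Fill a shift-adapted table (qshift) and its shift amounts (qdiv)
   from an ordinary quantization table of n entries. */
void make_shift_adapted(const int *qmatrix, int *qshift, int *qdiv, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        qdiv[i] = nearest_pow2_exponent(qmatrix[i]);
        qshift[i] = 1 << qdiv[i];
    }
}
```

Rounding the steps changes the quantization itself, so the encoded image differs slightly from one produced with the original table.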

Modify SW code (4)

• Launch AXD with modified code

Design the Quantize and Zigzag block in HDL

• Design the Quantize and Zigzag block in HDL

[Figure: HW_IP block. Ports: go, done, inData(31:0), outData(31:0), writeEnable, Addr(6:0), reset, clock]

<Pseudo Code>

if (go = '1') then
    DoBehavior()
    done <= '1';
end if;

DoBehavior() {
    Read data from inData memory
    Do quantize with shifter
    Do zigzag
    Write data to outData memory
}
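
Before writing the HDL, it can help to have a software model of DoBehavior() to compare simulation output against. The following is a minimal sketch, assuming the conventional JPEG zigzag scan of an 8×8 block and a shift-adapted table passed in as qshift (step) and qdiv (exponent); these parameter names and the output ordering convention are assumptions, not taken from the course code.

```c
#define BLOCKSIZE 64

/* Build the zigzag scan order of an 8x8 block: order[k] is the
   row-major index of the k-th coefficient visited. */
void build_zigzag(int order[BLOCKSIZE])
{
    int s, r, k = 0;
    for (s = 0; s <= 14; s++) {             /* s = row + column */
        int lo = (s > 7) ? s - 7 : 0;
        int hi = (s < 7) ? s : 7;
        if (s % 2 == 0)                     /* even diagonal: row decreases */
            for (r = hi; r >= lo; r--) order[k++] = r * 8 + (s - r);
        else                                /* odd diagonal: row increases */
            for (r = lo; r <= hi; r++) order[k++] = r * 8 + (s - r);
    }
}

/* Convenience lookup used for spot checks. */
int zigzag_index(int k)
{
    int order[BLOCKSIZE];
    build_zigzag(order);
    return order[k];
}

/* Software model of DoBehavior(): quantize with shifts, then reorder. */
void do_behavior(const int *inData, int *outData,
                 const int *qshift, const int *qdiv)
{
    int order[BLOCKSIZE], tmp[BLOCKSIZE], i;
    build_zigzag(order);
    for (i = 0; i < BLOCKSIZE; i++) {
        int m = inData[i], q = qshift[i], d = qdiv[i];
        tmp[i] = (m > 0) ? (m + (q >> 1)) >> d : -((-m + (q >> 1)) >> d);
    }
    for (i = 0; i < BLOCKSIZE; i++)
        outData[i] = tmp[order[i]];
}
```

Feeding the same 64 input words to this model and to the testbench gives a reference block to compare against the outData memory contents.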

Using ModelSim

• STEP 1

– File->New->Project

Using ModelSim

• STEP 2

– Add source codes to the project

Using ModelSim

• STEP 3

– Complete the quantize.vhd file

Using ModelSim

• STEP 4

– Compile all the HDL codes in order.

Using ModelSim

• STEP 5

– Simulate the tb (provided testbench) and verify the module

Using ModelSim

• STEP 6

– Add signals to the wave view for debugging

Using ModelSim

• STEP 7

– Run the simulation

Using ModelSim

• STEP 8

– Verify that the code runs correctly

AHB Slave Interface

AHB Decoder

AHB Timing

Response Type

HRESP[1:0]  Response  Description
00          OKAY      Transaction completed
01          ERROR     Error occurs
10          RETRY     Transaction not completed; master must retry
11          SPLIT     Transaction not completed; master must retry; slave informs completion

ERROR/RETRY/SPLIT: two-cycle response

Burst Mode

HBURST[2:0]  Type    Description
000          SINGLE  Single transfer
001          INCR    Incrementing burst of unspecified length
010          WRAP4   4-beat wrapping burst
011          INCR4   4-beat incrementing burst
100          WRAP8   8-beat wrapping burst
101          INCR8   8-beat incrementing burst
110          WRAP16  16-beat wrapping burst
111          INCR16  16-beat incrementing burst

Transfer Type

HTRANS[1:0]  Type    Description
00           IDLE    No data transfer required; requires a zero-wait-state OKAY response
01           BUSY    Same as IDLE, between burst transfers
10           NONSEQ  Single transfer or the first of a burst; address/control unrelated to the previous transfer
11           SEQ     Remaining transfers in a burst; address/control related to the previous transfer

Transfer Size

HSIZE[2:0]  Size       Description
000         8 bits     Byte
001         16 bits    Halfword
010         32 bits    Word
011         64 bits    -
100         128 bits   4-word line
101         256 bits   8-word line
110         512 bits   -
111         1024 bits  -
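
The encodings in the tables above can be captured as C constants, which is handy when checking bus signals dumped from a simulation. This is a small sketch; the enum and function names are illustrative and not part of any ARM header.

```c
/* HTRANS[1:0]: transfer type */
enum ahb_htrans { HTRANS_IDLE = 0, HTRANS_BUSY = 1, HTRANS_NONSEQ = 2, HTRANS_SEQ = 3 };

/* HRESP[1:0]: slave response */
enum ahb_hresp { HRESP_OKAY = 0, HRESP_ERROR = 1, HRESP_RETRY = 2, HRESP_SPLIT = 3 };

/* HBURST[2:0]: burst type */
enum ahb_hburst {
    HBURST_SINGLE = 0, HBURST_INCR = 1,
    HBURST_WRAP4 = 2,  HBURST_INCR4 = 3,
    HBURST_WRAP8 = 4,  HBURST_INCR8 = 5,
    HBURST_WRAP16 = 6, HBURST_INCR16 = 7
};

/* Transfer width in bits for a given HSIZE[2:0] value: 8, 16, ..., 1024. */
unsigned ahb_size_bits(unsigned hsize)
{
    return 8u << (hsize & 7u);
}

/* Number of beats in a burst; 0 stands for "unspecified length" (INCR). */
unsigned ahb_burst_beats(enum ahb_hburst burst)
{
    switch (burst) {
    case HBURST_SINGLE: return 1;
    case HBURST_INCR:   return 0;
    case HBURST_WRAP4:  case HBURST_INCR4:  return 4;
    case HBURST_WRAP8:  case HBURST_INCR8:  return 8;
    case HBURST_WRAP16: case HBURST_INCR16: return 16;
    }
    return 0;
}

/* NONSEQ and SEQ are the two types with HTRANS[1] set,
   i.e. the ones that actually carry a transfer. */
int ahb_is_active(unsigned htrans)
{
    return (int)((htrans >> 1) & 1u);
}
```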

AHB HW wrapper design

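• Design a wrapper between the AMBA AHB slave interface and the quantize block

[Figure: AHB slave interface ↔ AHB wrapper ↔ Quantize block. Signals between the wrapper and the block: go, done, writeEnable, addr(6:0), inData(31:0), outData(31:0)]

Wrapper diagram for AHB slave interface

[Figure: wrapper logic. AHB side: hsel, hwrite, htrans, haddr, hwdata, hrdata, hready. Block side: write, go, done, address, wdata, rdata. Labels recoverable from the figure: haddr(8:2), haddr(9:8), the expression "hsel and htrans(1) and hwrite and not(haddr(9))", and a D flip-flop with enable (D, Q, En)]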
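
The address slices named in the wrapper diagram can be modelled in C to check the decoding against the addresses used by the software. The sketch below assumes that haddr(9:8) selects one of four regions (inData, outData, go, done, matching the offsets 0x000, 0x100, 0x200 and 0x300 used by the polling code in this deck), that haddr(8:2) is the 7-bit word address sent to the block, and that the quoted expression is the memory write enable. All function and enum names are hypothetical.

```c
#include <stdint.h>

/* Assumed meaning of haddr(9:8) */
enum wrapper_region { REGION_INDATA = 0, REGION_OUTDATA = 1, REGION_GO = 2, REGION_DONE = 3 };

/* haddr(9:8): which region of the wrapper is addressed */
unsigned wrapper_region_of(uint32_t haddr)
{
    return (haddr >> 8) & 0x3u;
}

/* haddr(8:2): 7-bit word address passed to the block */
unsigned wrapper_word_addr(uint32_t haddr)
{
    return (haddr >> 2) & 0x7Fu;
}

/* hsel and htrans(1) and hwrite and not(haddr(9)) */
int wrapper_mem_write(int hsel, unsigned htrans, int hwrite, uint32_t haddr)
{
    return hsel && ((htrans >> 1) & 1u) && hwrite && !((haddr >> 9) & 1u);
}
```

With this reading, a write to 0x40000200 (the go flag) does not assert the memory write enable, because haddr(9) is set.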

HW/SW codesign

• Connect the designed wrapper and HW block to the processor for integrated verification

– Run HW/SW co-simulation using the provided ARM7tdmi-s processor model

• Goal : Compare two models

– Pure SW JPEG

– Modified SW JPEG and HW module (Quantize and Zigzag)

HW/SW codesign

• Our simple SoC platform

[Figure: simple SoC block diagram. ARM7tdmi-s, ROM, RAM, UART, GPIO, and the Quantize & Zigzag HW IP with its wrapper, on the AMBA AHB bus]

Module name  ID   Base address
ROM          (0)  0x00000000
RAM          (1)  0x10000000
UART         (2)  0x20000000
GPIO         (3)  0x30000000
HWIP         (4)  0x40000000
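
Since the base addresses are spaced 0x10000000 apart, a decoder can pick the slave from the top four address bits. The sketch below assumes that each module owns the whole 0x10000000 range starting at its base, which the slides do not state; soc_map and soc_decode are illustrative names.

```c
#include <stdint.h>

struct soc_module {
    const char *name;
    uint32_t base;
};

/* Memory map from the table above; the array index is the module ID. */
static const struct soc_module soc_map[] = {
    { "ROM",  0x00000000u },
    { "RAM",  0x10000000u },
    { "UART", 0x20000000u },
    { "GPIO", 0x30000000u },
    { "HWIP", 0x40000000u },
};

/* Returns the module ID selected by haddr[31:28], or -1 if unmapped. */
int soc_decode(uint32_t haddr)
{
    uint32_t id = haddr >> 28;
    if (id < sizeof soc_map / sizeof soc_map[0])
        return (int)id;
    return -1;
}
```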

Target I – SW only

• AXD simulation

– Run the ‘make’ command with the given ‘jpeg.c’ file

• Directory: $SRCS/testDALRisc/sw

• The ‘main’ function is replaced with the ‘C_Entry’ function

– ‘boot.s’ contains the initialization code and a jump instruction to ‘C_Entry’

– Launch AXD with the compiled ‘jpeg.axf’ file

Target I – SW only

• ModelSim simulation

– Steps

• Open project “dalrisc.mpf” (directory $SRCS/testDALRisc/modelsim)

• vsim work.tb

• mem load -i D:/works/dalrisc/sw/jpeg.mem /tb/uut/rom/memarr

• add waves

• run 104 ms

Target I – SW only

• Dump memory

– Steps

• export memory (GUI command)

– The final result is stored in the variable ‘ofp’

– The address range of ‘ofp’ is 0x98b ~ 0xb09

• ./hex2bin dumped.mem dumped.jpg

– This command converts the dumped hex data to a JPEG file.

Target II – HW/SW codesign

• Modify JPEG SW code

– Synchronization method : polling

extern void waitFlag(int *flag);

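void HWPollingQuantizeNZigzag(int *input, int *output)
{
    int *inData  = (int *)0x40000000;   /* input block memory of the HW IP */
    int *outData = (int *)0x40000100;   /* output block memory of the HW IP */
    int *go      = (int *)0x40000200;   /* start flag */
    int *done    = (int *)0x40000300;   /* completion flag */
    int i;

    for (i = 0; i < BLOCKSIZE; i++)
        *(inData + i) = input[i];

    *go = 1;            /* start the HW block */
    waitFlag(done);     /* poll until the HW block reports done */

    for (i = 0; i < BLOCKSIZE; i++)
        output[i] = *(outData + i);
}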
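
The source of waitFlag (waitFlag.s) is not shown in the slides. As a rough idea of what a polling wait does, here is a hypothetical C stand-in that reads the flag through a volatile pointer, so the compiler cannot hoist the read out of the loop, and gives up after a bounded number of reads. The name wait_flag_bounded and the timeout are assumptions for illustration only.

```c
/* Hypothetical C stand-in for a polling wait.
   Reads *flag until it becomes non-zero or max_polls reads were made.
   Returns the number of reads performed, or -1 on timeout. */
int wait_flag_bounded(volatile int *flag, int max_polls)
{
    int polls;
    for (polls = 1; polls <= max_polls; polls++) {
        if (*flag != 0)
            return polls;
    }
    return -1;
}
```

A bounded wait makes a hung HW block visible in simulation instead of leaving the processor spinning until the end of the run.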

Target II – HW/SW codesign

• Compile and link modified SW code

– Cygwin make environment

• armasm -cpu arm7tdmi-s boot.s

• armasm -cpu arm7tdmi-s waitFlag.s

• armcc -cpu arm7tdmi-s -g -c -O2 jpeg_quant.c

• armlink -o jpeg_quant.axf -ro-base 0x0 -rw-base 0x10000000 -map -symbols -info Sizes -first boot.o boot.o waitFlag.o jpeg_quant.o -list jpeg_quant.info

• fromelf jpeg_quant.axf -bin -output jpeg_quant.bin

• ./bin2hex jpeg_quant.bin jpeg_quant.mem

Target II – HW/SW codesign

• ModelSim simulation

– Steps

• Open project “dalrisc.mpf”

• Add your HW IP (ahbquantize.vhd, etc.)

• Modify the “Top_module.vhd” file: replace the “dummy” module with the “ahbquant” module

Target II – HW/SW codesign

• ModelSim simulation

– Steps (cont’d)

• compile uploaded codes

• vsim work.tb

• mem load -i D:/works/dalrisc/sw/jpeg_quant.mem /tb/uut/rom/memarr

• run 98 ms

• Dump memory

• mem save ...

Target II – HW/SW codesign

• ModelSim simulation

– Debugging

• compile modified codes

• restart -f

• mem load -i D:/works/dalrisc/sw/jpeg_quant.mem /tb/uut/rom/memarr

• add the waves that you want to see...

• set breakpoints where you want to stop...

• run -all

Summary

• Analyze and compare the results

– Total SW time: 2,581,050 cycles (1 cycle = 40 ns)

– Number of iterations: 180

– Pure SW design: 339,547 cycles

– HW/SW codesign: 138,420 cycles

Module name  SW time (cycles)  SW 1 iteration time (cycles)  Total communication  1 iteration communication
Quantize     250,104           1,389                         0                    0
Zigzag       149,443           830                           0                    0

Module name  HW time (cycles)  HW 1 iteration time (cycles)  Total communication  1 iteration communication
Quantize     11,700            65                            126,720              704
Zigzag       0                 0                             0                    0
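
The figures in the summary can be turned into simulation time and a speedup ratio with a few lines of arithmetic. The sketch below uses only the numbers quoted in the summary bullets (40 ns per cycle, 180 iterations, 339,547 and 138,420 cycles); the macro and function names are illustrative.

```c
/* Figures quoted in the summary */
#define CYCLE_NS        40.0
#define ITERATIONS      180L
#define TOTAL_SW_CYCLES 2581050L
#define PURE_SW_CYCLES  339547L
#define CODESIGN_CYCLES 138420L

/* Convert a cycle count to milliseconds at 40 ns per cycle. */
double cycles_to_ms(long cycles)
{
    return (double)cycles * CYCLE_NS / 1.0e6;
}

/* Ratio of the cycle count before the change to the count after it. */
double speedup(long before, long after)
{
    return (double)before / (double)after;
}

/* Average cycles per iteration, truncated to a whole number. */
long cycles_per_iteration(long total)
{
    return total / ITERATIONS;
}
```

For example, cycles_to_ms(TOTAL_SW_CYCLES) gives about 103.2 ms, in line with the "run 104 ms" step used for the SW-only simulation, and speedup(PURE_SW_CYCLES, CODESIGN_CYCLES) comes to roughly 2.45 for the Quantize and Zigzag portion.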
