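Learning Objectives
• Learn the basic concepts of HW/SW codesign through hands-on practice.
• Run and analyze a JPEG example using ARM ADS, and use the results to identify which tasks are suited to HW execution.
• Design a HW block directly in HDL.
• Design, in HDL, a wrapper that connects the HW block to the bus.
• Simulate the designed modules with ModelSim and learn how to use the tool.
• Integrate several modules into a simple SoC and simulate it to confirm the performance improvement obtained from codesign.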
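Guidelines for Using the Lecture Material
• Besides JPEG, the lab may also use various other examples.
• The profiling results do not oblige you to move the computation-intensive tasks to HW. Depending on the situation, such as reuse of HW modules or ease of design, the designer can decide which tasks to move to HW.
• Various communication methods can be used, such as polling or interrupts.
• Depending on the amount of communication, the codesign result can sometimes be worse than the original, so it is worth applying performance techniques such as double buffering and task-level pipelining.
• Results may differ depending on the version of the simulation tools. ModelSim 6.2a and ADS 1.2 are recommended. If the ARM compiler is RealView, compile for architecture 4, which does not use Thumb mode.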
Contents
1. Overview
2. JPEG encoder profiling using ADS
3. HW IP design using HDL (VHDL or Verilog)
4. RTL simulation lab using ModelSim
5. Design of a wrapper that satisfies the AHB slave interface
6. Performance comparison under codesign
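Overview
• Final architecture (Simple SoC)
– Performance comparison under codesign
• JPEG encoder running on the ARM7 processor only
• JPEG encoder running on the ARM7 processor together with the Quantize and Zigzag HW blocks
[Figure: ARM7tdmi-s, ROM, RAM, UART, GPIO, and the Quantize & Zigzag HW IP (through the wrapper) on the AMBA AHB bus]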
About the ADS
• ADS consists of a suite of applications, together with supporting documentation and examples, that enable you to write and debug applications for the ARM family of RISC processors.
• You can use ADS to develop, build, and debug C, C++, or ARM assembly language programs.
Components of ADS
• Command-line development tools
– armcc, armcpp, armasm, armlink, armsd
• GUI development tools
– AXD, CodeWarrior IDE
• Utilities
– fromELF, armprof
• Supporting software
– ARMulator
• Documentation
– http://www.arm.com
Objective
• Use the ADS tools
• Profile the JPEG encoder code
– Structure of our JPEG encoder
[Figure: “bmp” image source -> ReadBmp -> ChenDCT -> Quantize -> Zigzag -> HuffEncode -> “jpg” compressed image]
About JPEG encoder
[Figure: Bmp image -> 8 x 8 block -> Preshift -> DCT -> Bound -> Quantize -> Zigzag -> Encoding -> jpg format]
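The Preshift and Bound stages in the pipeline above are not shown as code in the slides. The following is a minimal sketch of what such stages commonly do in a JPEG encoder, assuming that Preshift subtracts a constant level shift and that Bound clamps each coefficient to a symmetric range. The function names `preshift_block` and `bound_block` and both parameter values are hypothetical, not taken from the lab's jpeg.c.

```c
#include <assert.h>

#define BLOCKSIZE 64  /* one 8 x 8 block, as in the slides' C code */

/* Preshift (assumed behaviour): subtract a constant level shift from
 * every sample. For 8-bit samples the usual JPEG level shift is 128. */
void preshift_block(int *matrix, int shift)
{
    int i;
    for (i = 0; i < BLOCKSIZE; i++)
        matrix[i] -= shift;
}

/* Bound (assumed behaviour): clamp every coefficient to [-bound, +bound].
 * The bound is a parameter because the slides do not state its value. */
void bound_block(int *matrix, int bound)
{
    int i;
    assert(bound >= 0);
    for (i = 0; i < BLOCKSIZE; i++) {
        if (matrix[i] > bound)
            matrix[i] = bound;
        else if (matrix[i] < -bound)
            matrix[i] = -bound;
    }
}
```

Both stages are simple element-wise loops, which is one reason they are less interesting HW candidates than Quantize, where the division dominates the cost.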
Using Code Warrior (1)
• STEP 1
– new project
• file->new project
Using Code Warrior (2)
• STEP 2
– Add Files
• Copy jpeg.c file to project folder
• Add jpeg.c (mouse right click on the project background)
Using Code Warrior (3)
• STEP 3
– DebugRel setting
Using Code Warrior (4)
• STEP 4
– Target settings
Using Code Warrior (5)
• STEP 5
– ARM Assembler setting
Using Code Warrior (6)
• STEP 6
– ARM C compiler setting
Using Code Warrior (7)
• STEP 7
– ARM C compiler optimization level setting
Using Code Warrior (8)
• STEP 8
– Other compiler setting
• ARM C++ compiler, Thumb C compiler, Thumb C++ compiler
Using Code Warrior (9)
• STEP 9
– ARM Linker Setting (output)
Using Code Warrior (10)
• STEP 10
– MAKE ALL
– Launch AXD debugger
Using AXD (1)
• STEP 11
– Load Image with profiling on
Using AXD (2)
• STEP 12
– Profiling setting
Using AXD (3)
• STEP 13
– Run simulation until “_sys_exit”
– Write profiling data
Using AXD (4)
• STEP 14
– Run “armprof jpeg.prf > jpeg_profile_result.txt” command
HW IP design using HDL
• Design the HW IP in HDL (VHDL)
– Modify reference code
– Design quantize block
– Design zigzag block
Modify SW code (1)
• Reference C code
void Quantize(int *matrix, int *qmatrix)
{
    int i, m, q;
    for (i=0; i<BLOCKSIZE; i++) {
        m = matrix[i];
        q = qmatrix[i];
        if (m>0)
            matrix[i] = (m+q/2)/q;  /* divide with rounding to nearest */
        else
            matrix[i] = (m-q/2)/q;
    }
}
Designing a divider? It is too slow and hard!
Other solution: design a shifter instead.
Modify SW code (2)
• Replace divider with shifter
void Quantize(int *matrix, int *qmatrix)
{
    int i, m, q, d;
    for (i=0; i<BLOCKSIZE; i++) {
        m = matrix[i];
        q = qmatrix[i];
        d = qdiv[i];  /* shift amount for entry i: q is the power of two 2^d */
        if (m>0)
            matrix[i] = (m+(q>>1))>>d;
        else
            matrix[i] = -((-m+(q>>1))>>d);
    }
}
Modify SW code (3)
• Quantization matrix -> Shift-adapted quantization matrix
#define BLOCKSIZE 64
//#define QuantizationMatrix LuminanceQuantization
#define QuantizationMatrix ShitfAdaptedQuantization
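The slides do not show how the shift-adapted matrix or the `qdiv` shift table was derived. One plausible approach, sketched below purely as an assumption, rounds each quantizer value to the nearest power of two and stores its exponent. The names `nearest_shift`, `quantize_div`, and `quantize_shift` are hypothetical; the last two mirror the two Quantize versions for a single coefficient so the results can be compared.

```c
#include <assert.h>

/* Return d such that 2^d is the power of two nearest to q.
 * Ties go to the larger power of two (an arbitrary choice). */
int nearest_shift(int q)
{
    int d = 0;
    assert(q >= 1 && q <= 32768);
    while ((1 << (d + 1)) <= q)
        d++;                                   /* now 2^d <= q < 2^(d+1) */
    if (q - (1 << d) >= (1 << (d + 1)) - q)
        d++;                                   /* upper neighbour is closer */
    return d;
}

/* Reference quantizer from the slides: rounding division. */
int quantize_div(int m, int q)
{
    return (m > 0) ? (m + q / 2) / q : (m - q / 2) / q;
}

/* Shift-based quantizer from the slides; valid when q == 2^d. */
int quantize_shift(int m, int q, int d)
{
    return (m > 0) ? ((m + (q >> 1)) >> d) : -((-m + (q >> 1)) >> d);
}
```

When q is already a power of two the two quantizers agree for every input, because the negative branch negates before shifting and so never right-shifts a negative value. For other q values the rounded matrix changes the quantization, so the output image differs from the one produced with LuminanceQuantization.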
Modify SW code (4)
• Launch AXD with modified code
Design the Quantize and Zigzag block in HDL
• HW_IP block ports: go, done, inData(31:0), outData(31:0), writeEnable, Addr(6:0), reset, clock
<Pseudo Code>
if (go = '1') then
    DoBehavior()
    done <= '1';
end if;

DoBehavior()
{
    Read data from inData memory
    Do quantize with shifter
    Do zigzag
    Write data to outData memory
}
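Before writing the VHDL it can help to have a bit-accurate software model of DoBehavior() to compare against the testbench output. The sketch below is such a model under stated assumptions: the table is the standard JPEG zigzag scan order, output position k receives the coefficient at raster position `zigzag_index[k]`, and the quantizer is the shift-based one from the modified SW code. The name `hw_ip_model` and the separate `q` and `qdiv` arrays are hypothetical, and the direction of the zigzag mapping in the lab's reference code may differ.

```c
#include <assert.h>

#define BLOCKSIZE 64

/* Standard JPEG zigzag scan: entry k is the raster index of the
 * k-th coefficient in zigzag order. */
static const unsigned char zigzag_index[BLOCKSIZE] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Behavioural model: read in_data, quantize with a shifter,
 * reorder in zigzag order, write out_data. */
void hw_ip_model(const int *in_data, const int *q, const int *qdiv,
                 int *out_data)
{
    int k;
    for (k = 0; k < BLOCKSIZE; k++) {
        int i = zigzag_index[k];          /* raster position to fetch */
        int m = in_data[i];
        int half = q[i] >> 1;             /* rounding offset */
        assert(qdiv[i] >= 0 && qdiv[i] < 31);
        out_data[k] = (m > 0) ? ((m + half) >> qdiv[i])
                              : -((-m + half) >> qdiv[i]);
    }
}
```

Feeding the same 64 input words to this model and to the simulated block gives a direct word-by-word check of the outData memory.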
Using ModelSim
• STEP 1
– File->New->Project
Using ModelSim
• STEP 2
– Add source codes to the project
Using ModelSim
• STEP 3
– Complete the quantize.vhd file
Using ModelSim
• STEP 4
– Compile all the HDL codes in order.
Using ModelSim
• STEP 5
– Simulate the tb (provided testbench) and verify the module
Using ModelSim
• STEP 6
– Add the signals to the wave view for debugging
Using ModelSim
• STEP 7
– Run the simulation
Using ModelSim
• STEP 8
– Verify that the code runs correctly
AHB Slave Interface
[Figure: AHB slave interface]
AHB Decoder
[Figure: AHB decoder]
AHB Timing
[Figure: AHB timing]
Response Type
HRESP[1:0]   Response   Description
00           OKAY       Transaction completed
01           ERROR      Error occurs
10           RETRY      Transaction not completed; master must retry
11           SPLIT      Transaction not completed; master must retry; slave informs completion
ERROR/RETRY/SPLIT: two-cycle response
Burst Mode
HBURST[2:0]   Type     Description
000           SINGLE   Single transfer
001           INCR     Incrementing burst of unspecified length
010           WRAP4    4-beat wrapping burst
011           INCR4    4-beat incrementing burst
100           WRAP8    8-beat wrapping burst
101           INCR8    8-beat incrementing burst
110           WRAP16   16-beat wrapping burst
111           INCR16   16-beat incrementing burst
Transfer Type
HTRANS[1:0]   Type     Description
00            IDLE     No data transfer required; requires zero wait state OKAY response
01            BUSY     Same as IDLE, between burst transfers
10            NONSEQ   Single transfer or the first of a burst; address/control unrelated to the previous transfer
11            SEQ      Remaining transfers in a burst; address/control related to the previous transfer
Transfer Size
HSIZE[2:0]   Size        Description
000          8 bits      Byte
001          16 bits     Halfword
010          32 bits     Word
011          64 bits     -
100          128 bits    4-word line
101          256 bits    8-word line
110          512 bits    -
111          1024 bits   -
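The encodings in the four tables above can be captured as constants for use in a software model or a testbench checker. The sketch below is one way to do that; the enum and function names are hypothetical. It relies on two regularities visible in the tables: the transfer size in bytes is 2^HSIZE, and HTRANS[1] is set exactly for NONSEQ and SEQ, which are the two types that actually move data.

```c
#include <assert.h>

enum ahb_htrans { HTRANS_IDLE = 0, HTRANS_BUSY = 1,
                  HTRANS_NONSEQ = 2, HTRANS_SEQ = 3 };

enum ahb_hresp  { HRESP_OKAY = 0, HRESP_ERROR = 1,
                  HRESP_RETRY = 2, HRESP_SPLIT = 3 };

enum ahb_hburst { HBURST_SINGLE = 0, HBURST_INCR = 1,
                  HBURST_WRAP4 = 2,  HBURST_INCR4 = 3,
                  HBURST_WRAP8 = 4,  HBURST_INCR8 = 5,
                  HBURST_WRAP16 = 6, HBURST_INCR16 = 7 };

/* Bytes moved per beat: 2^HSIZE (000 -> 1 byte ... 111 -> 128 bytes). */
unsigned ahb_bytes_per_beat(unsigned hsize)
{
    assert(hsize <= 7);
    return 1u << hsize;
}

/* Number of beats in a burst; 0 means "unspecified length" (INCR). */
unsigned ahb_burst_beats(enum ahb_hburst burst)
{
    switch (burst) {
    case HBURST_SINGLE: return 1;
    case HBURST_INCR:   return 0;
    case HBURST_WRAP4:  case HBURST_INCR4:  return 4;
    case HBURST_WRAP8:  case HBURST_INCR8:  return 8;
    case HBURST_WRAP16: case HBURST_INCR16: return 16;
    }
    return 0;
}

/* A data transfer is requested only for NONSEQ and SEQ, i.e. HTRANS[1] = 1. */
int ahb_transfer_active(unsigned htrans)
{
    assert(htrans <= 3);
    return (int)((htrans >> 1) & 1u);
}
```

The last helper is the reason a simple slave can qualify its select with the single bit htrans(1) rather than decoding both bits.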
AHB HW wrapper design
• Design a wrapper between the AMBA AHB slave interface and the quantize block
[Figure: AHB slave interface <-> AHB wrapper <-> Quantize block; signals between the wrapper and the block: go, done, writeEnable, addr(6:0), inData(31:0), outData(31:0)]
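Wrapper diagram for AHB slave interface
[Figure: wrapper logic. AHB side: hsel, hwrite, htrans, haddr, hwdata, hrdata, hready. Block side: write, go, done, address, wdata, rdata. Labels in the diagram: haddr(9:8), haddr(8:2), a D flip-flop with enable (D, Q, En), and the expression "hsel and htrans(1) and hwrite and not(haddr(9))"]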
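The labels in the wrapper diagram can be read as a small address decoder. The sketch below models that reading in C under explicit assumptions: haddr(9:8) selects one of four 0x100-byte regions (inData, outData, go, done, matching offsets 0x000 to 0x300 used by the polling software), haddr(8:2) becomes the 7-bit word address addr(6:0), and the labelled expression drives the block's write enable. The struct and function names are hypothetical, and the real wrapper also registers these signals with the flip-flop shown in the diagram, which this combinational model ignores.

```c
#include <assert.h>

enum wrapper_region { REGION_INDATA = 0, REGION_OUTDATA = 1,
                      REGION_GO = 2, REGION_DONE = 3 };

struct wrapper_access {
    enum wrapper_region region;  /* from haddr(9:8) */
    unsigned address;            /* haddr(8:2), i.e. addr(6:0) */
    int write_enable;            /* hsel and htrans(1) and hwrite and not haddr(9) */
};

/* Combinational view of the address-phase decode (assumed mapping). */
struct wrapper_access decode_ahb_access(int hsel, unsigned htrans,
                                        int hwrite, unsigned long haddr)
{
    struct wrapper_access a;
    int haddr9 = (int)((haddr >> 9) & 1ul);

    assert(htrans <= 3);
    a.region = (enum wrapper_region)((haddr >> 8) & 3ul);
    a.address = (unsigned)((haddr >> 2) & 0x7Ful);   /* word address, 7 bits */
    a.write_enable = (hsel != 0) && (((htrans >> 1) & 1u) != 0)
                     && (hwrite != 0) && !haddr9;
    return a;
}
```

Under this mapping the input memory occupies word addresses 0 to 63 and the output memory 64 to 127, because bit 8 of haddr is also the top bit of addr(6:0).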
HW/SW codesign
• Connect the designed wrapper and HW block to the processor for integrated verification
– Perform HW/SW co-simulation using the provided ARM7tdmi-s processor model
• Goal : Compare two models
– Pure SW JPEG
– Modified SW JPEG and HW module (Quantize and Zigzag)
HW/SW codesign
• Our simple SoC platform
[Figure: ARM7tdmi-s, ROM, RAM, UART, GPIO, and the Quantize & Zigzag HW IP (through the wrapper) on the AMBA AHB bus]
Module name   ID    Base memory map
ROM           (0)   0x00000000
RAM           (1)   0x10000000
UART          (2)   0x20000000
GPIO          (3)   0x30000000
HWIP          (4)   0x40000000
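In the memory map above each module's ID equals the top hexadecimal digit of its base address. A decoder model can exploit that, as in the sketch below. This is an assumption about how the platform's AHB decoder might work: the slides give only the base addresses, not the size of each region or the actual decode logic, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ROM_BASE   0x00000000u
#define RAM_BASE   0x10000000u
#define UART_BASE  0x20000000u
#define GPIO_BASE  0x30000000u
#define HWIP_BASE  0x40000000u
#define NUM_SLAVES 5

/* Map an address to a slave ID using haddr[31:28] (assumed decode rule).
 * Returns -1 for addresses outside the five mapped regions. */
int ahb_decode_slave(uint32_t haddr)
{
    uint32_t top = haddr >> 28;
    return (top < NUM_SLAVES) ? (int)top : -1;
}

/* Base address for a slave ID, the inverse of the rule above. */
uint32_t ahb_slave_base(int id)
{
    assert(id >= 0 && id < NUM_SLAVES);
    return (uint32_t)id << 28;
}
```

For example, the HW IP registers at 0x40000000 to 0x40000300 all decode to ID 4, so the wrapper only needs the low address bits to tell its own registers apart.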
Target I – SW only
• AXD simulation
– Run the ‘make’ command with the given ‘jpeg.c’ file
• Directory: $SRCS/testDALRisc/sw
• The ‘main’ function is replaced with the ‘C_Entry’ function
– ‘boot.s’ contains the initialization code and a jump instruction to ‘C_Entry’
– Launch AXD with the compiled ‘jpeg.axf’ file
Target I – SW only
• ModelSim simulation
– Steps
• open project “dalrisc.mpf” (directory $SRCS/testDALRisc/modelsim)
• vsim work.tb
• mem load -i D:/works/dalrisc/sw/jpeg.mem /tb/uut/rom/memarr
• add waves
• run 104 ms
Target I – SW only
• Dump memory
– Steps
• export memory (GUI command)
– The final result is stored in the variable ‘ofp’
– The address range of ‘ofp’ is 0x98b to 0xb09
• ./hex2bin dumped.mem dumped.jpg
– This command converts the dumped hex data to a JPEG file.
Target II – HW/SW codesign
• Modify JPEG SW code
– Synchronization method : polling
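extern void waitFlag(int *flag);

/* Offload one 8x8 block to the HW IP and wait for the result by polling. */
void HWPollingQuantizeNZigzag(int *input, int *output)
{
    volatile int *inData  = (volatile int*)0x40000000;  /* HW IP input memory */
    volatile int *outData = (volatile int*)0x40000100;  /* HW IP output memory */
    volatile int *go      = (volatile int*)0x40000200;  /* start register */
    int *done             = (int*)0x40000300;           /* completion flag */
    int i;

    for (i=0; i<BLOCKSIZE; i++)
        *(inData+i) = input[i];
    *go = 1;
    waitFlag(done);
    for (i=0; i<BLOCKSIZE; i++)
        output[i] = *(outData+i);
}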
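The lab implements waitFlag in assembly (waitFlag.s), which is not reproduced in the slides. As a rough C equivalent, a polling wait can be sketched as below, assuming the done register reads as non-zero once the HW IP has finished. Both function names are hypothetical; the bounded variant is an addition for illustration so that a simulation does not hang forever if done never rises.

```c
#include <assert.h>

/* Spin until *flag becomes non-zero. The volatile qualifier forces a
 * fresh read of the register on every loop iteration. */
void wait_flag_c(volatile int *flag)
{
    assert(flag != 0);
    while (*flag == 0) {
        /* busy-wait: the processor does nothing useful here */
    }
}

/* Bounded variant: returns 1 if the flag was seen set within
 * max_polls reads, 0 on timeout. */
int wait_flag_timeout(volatile int *flag, unsigned long max_polls)
{
    unsigned long n;
    assert(flag != 0);
    for (n = 0; n < max_polls; n++) {
        if (*flag != 0)
            return 1;
    }
    return 0;
}
```

Every read in the loop is a bus transaction to the HW IP, so polling keeps both the processor and the bus busy during the wait; an interrupt-based scheme, mentioned in the guidelines, would avoid that.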
Target II – HW/SW codesign
• Compile and link modified SW code
– Cygwin make environment
• armasm -cpu arm7tdmi-s boot.s
• armasm -cpu arm7tdmi-s waitFlag.s
• armcc -cpu arm7tdmi-s -g -c -O2 jpeg_quant.c
• armlink -o jpeg_quant.axf -ro-base 0x0 -rw-base 0x10000000 -map -symbols -info Sizes -first boot.o boot.o waitFlag.o jpeg_quant.o -list jpeg_quant.info
• fromelf jpeg_quant.axf -bin -output jpeg_quant.bin
• ./bin2hex jpeg_quant.bin jpeg_quant.mem
Target II – HW/SW codesign
• ModelSim simulation
– Steps
• Open project “dalrisc.mpf”
• Add your HW IP (ahbquantize.vhd, etc.)
• Modify the “Top_module.vhd” file: replace the “dummy” module with the “ahbquant” module
Target II – HW/SW codesign
• ModelSim simulation
– Steps (cont’d)
• compile uploaded codes
• vsim work.tb
• mem load -i D:/works/dalrisc/sw/jpeg_quant.mem /tb/uut/rom/memarr
• run 98 ms
• Dump memory
• mem save ...
Target II – HW/SW codesign
• ModelSim simulation
– Debugging
• compile the modified code
• restart -f
• mem load -i D:/works/dalrisc/sw/jpeg_quant.mem /tb/uut/rom/memarr
• add the waves that you want to see
• set breakpoints where you want to stop
• run -all
Summary
• Analyze and compare the results
– Total SW time: 2,581,050 cycles (1 cycle = 40 ns)
– Number of iterations: 180
– Pure SW design: 339,547 cycles
– HW/SW codesign: 138,420 cycles
Module name   SW time (cycles)   SW 1 iteration time (cycles)   Total communication   1 iteration communication
Quantize      250,104            1,389                          0                     0
Zigzag        149,443            830                            0                     0

Module name   HW time (cycles)   HW 1 iteration time (cycles)   Total communication   1 iteration communication
Quantize      11,700             65                             126,720               704
Zigzag        0                  0                              0                     0
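The figures in the summary can be cross-checked with a few lines of arithmetic. The sketch below converts cycles to time using the stated 40 ns cycle, derives per-iteration values from the 180 iterations, and computes a speedup ratio; the function names are hypothetical. Two observations from the slide's own numbers: the HW computation and communication totals (11,700 + 126,720) add up to the stated 138,420 cycles, while the Quantize and Zigzag SW rows add up to 399,547, which differs from the 339,547 stated for the pure SW design, so the sketch takes both totals as parameters rather than fixing either one.

```c
#include <assert.h>

#define CYCLE_NS   40UL   /* 1 cycle = 40 ns, from the summary */
#define ITERATIONS 180UL  /* number of iterations, from the summary */

/* Convert a cycle count to microseconds (truncating). */
unsigned long cycles_to_us(unsigned long cycles)
{
    return cycles * CYCLE_NS / 1000UL;
}

/* Average cycles per iteration (truncating), as listed in the tables. */
unsigned long cycles_per_iteration(unsigned long total_cycles)
{
    return total_cycles / ITERATIONS;
}

/* Speedup of the codesigned part over the SW-only part. */
double speedup(unsigned long sw_cycles, unsigned long hwsw_cycles)
{
    assert(hwsw_cycles > 0);
    return (double)sw_cycles / (double)hwsw_cycles;
}
```

Calling `cycles_to_us(2581050UL)` shows that the total SW time corresponds to about 103 ms, which is consistent with the 104 ms simulation run used for the SW-only target. Most of the codesign cost is communication (704 of the 769 cycles per iteration), which is the situation the guidelines address with double buffering and task-level pipelining.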