Lab-1: Profiling/Optimizing Video Decoder Using ADS National Chiao Tung University Chun-Jen Tsai 3/3/2011


Lab-1: Profiling/Optimizing Video Decoder Using ADS

National Chiao Tung University
Chun-Jen Tsai

3/3/2011

2/22

Profiling MPEG-4 SP Decoder

Goal: Profiling and optimizing the MPEG-4 video decoder, m4v_dec

Tasks:
  Profile the video decoder under ADS
  Analyze the results and identify the hotspots
  Optimize the decoder based on the hotspot analysis
  Redraw the pie charts after your optimization

Please also write a report (two-column, no cover sheet, 4 pages at most) to summarize your analysis and optimization of the system model

3/22

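Embedded Software Design Flow

Take ARM-based systems for example:

[Figure: tool flow. C/C++ source (*.c/.cpp) and the C libraries go through the C compiler, and asm source (*.s) goes through the assembler, producing ELF object files (*.o). The linker combines the object files with object libraries (managed by the librarian) into an *.axf image. The image is debugged with axd on ARMulator, on system models, or on a development board.]

All the tools are integrated in the IDE: ARM Developer Suite (ADS)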

4/22

Generations of ARM Toolchains

ARM SDT: Software Development Toolkit
  Final version 2.5, 1998
ARM ADS: ARM Developer Suite
  Final version 1.2, 2000
  Still popular in the industry
ARM RVDS: RealView Development Suite
  Latest version 4.0
  With emphasis on Electronic System Level (ESL) design environment

5/22

Cross-Platform Development

[Figure: a host computer running the ADS IDE environment† is connected to the development board by a serial, Ethernet, or JTAG cable.]

†You can obtain a 45-day full-function ADS 1.2 trial CD image from the TA. A tutorial can be downloaded from http://www.arm.com/support/tutorials/16213.html

6/22

Verification of Your Software

During the development cycle, you often have to debug your software; for embedded firmware development, the process involves three components:
  Debugger (runs on host computer): axd
  Debug agent: interface between debugger and your code
  Target platform: the platform (simulated or emulated) that executes your code

7/22

Debug Agent

A debug agent performs the actions requested by the debugger, for example:
  setting breakpoints
  reading from memory
  writing to memory

The debug agent is not the program being debugged, or the debugger itself

Examples: Angel, JTAG circuits

8/22

Target Platform

Target platform can be real hardware or a simulator
  If a simulator platform is used, the core component is an instruction set simulator (ISS)
  For example, ARMulator is the well-known ISS in ARM ADS
  ARMulator also doubles as a platform simulator, but it is not as powerful as simulators from other vendors (such as CoWare)

9/22

ARM Debugging Setup†

[Figure: ARM debugging setup. One component is annotated "Runs on PC Host" and another "Probably runs on the same PC Host".]

†AXD and armsd Debuggers Guide, Page 1-6.

10/22

ADS Workspace

[Screenshot: ADS workspace, with the source window, build messages, and project window labeled.]

11/22

AXD Desktop

[Screenshot: AXD desktop, with the disassembly window, source window, console, system view, and system output labeled.]

12/22

Profiling and CPU Cycle Analysis

Profiling and CPU cycle analysis are two different approaches to analyze your software
  Profiling gives you a per-function complexity analysis
  CPU cycle analysis gives you more insights into the software regarding computation vs. memory accesses

Under ADS, you use AXD to do both
  For profiling, ADS generates some data and a command line tool armprof is used to analyze data
  For cycle analysis you must display ARMulator internal statistics counter in an ADS window

13/22

CPU Cycle Types of ARM

Sequential (S cycle)
  The ARM core requests a transfer to or from an address which is either the same as, or one word or one half-word greater than, the preceding address.
Non-sequential (N cycle)
  The ARM core requests a transfer to or from an address which is unrelated to the address used in the preceding cycle.
Internal (I cycle)
  The ARM core does not require a transfer, as it is performing an internal function, and no useful prefetching can be performed at the same time.
Coprocessor register transfer (C cycle)
  The ARM core wishes to use the data bus to communicate with a coprocessor, but does not require any action by the memory system.
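The S/N distinction above can be made concrete with a small sketch. It is a simplified model, not how ARMulator itself counts cycles: it looks only at addresses, assumes a 4-byte word and a 2-byte half-word, and treats the first access of a trace as non-sequential. The names classify_access and count_sequential are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CYCLE_N, CYCLE_S } CycleType;

/* Classify one memory access from its address and the preceding address.
 * Assumption: word = 4 bytes, half-word = 2 bytes. */
CycleType classify_access(unsigned long prev, unsigned long cur)
{
    if (cur == prev || cur == prev + 2 || cur == prev + 4)
        return CYCLE_S;   /* same address, or one half-word/word greater */
    return CYCLE_N;       /* unrelated to the preceding address */
}

/* Count the S cycles in an address trace.
 * The first access has no predecessor, so it is never counted as S. */
size_t count_sequential(const unsigned long *addr, size_t n)
{
    size_t i, s = 0;

    assert(addr != NULL || n == 0);
    for (i = 1; i < n; i++) {
        if (classify_access(addr[i - 1], addr[i]) == CYCLE_S)
            s++;
    }
    return s;
}
```

Calling count_sequential() on a trace of instruction fetch addresses from a straight-line loop body should report mostly S cycles, while a trace that jumps between unrelated data structures should report mostly N cycles.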

14/22

The System Model Used in Labs

For labs, we will use m4v_dec, an MPEG-4 video decoder†, as the system model
  Contains 28 files, 5212 lines of C code

Differences between m4v_dec and xvid 0.9:
  Simpler API
  Support for Simple Profile combined mode with resync marker
  Decoder-only library
  Pure C implementation (thus, can be used as a system model)

†m4v_dec is based on version 0.9 of the GNU MPEG-4 codec project, xvid (see http://www.xvid.org for latest xvid source).

15/22

About the Source Package

There are two project files in the project directory, “m4v_dec”:
  “m4v_dec.mcp” is the project workspace file for ADS; double-click this and ADS will bring up the Development IDE
  “Makefile” is the make file for the eCos/gcc toolchain

In the “tools” directory, there is a Win32 program, vidview.exe, for playing the decoded video (output.yuv)

In the “bitstream” directory, there is a sample compressed video bitstream, foreman_150.m4v

16/22

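Video Decoder Block Diagram

The functional block diagram of m4v_dec:

[Figure: block diagram. The video bitstream enters VLD, which outputs the DCT coefficient data and the macroblock mode and motion vector. The coefficient data passes through DC/AC-1, Q-1, and IDCT to an adder (+). A "Use MC?" switch (Y/N) selects whether a prediction formed by MC and Bilinear from the reference image is added. The result is the decoded image, which goes to the output (display).]

VLD: variable length decoding
DC/AC-1: inverse DC/AC prediction
Q-1: inverse quantization
IDCT: inverse transform
MC: motion compensation
Bilinear: half-pel interpolation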
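As an illustration of the "Bilinear" block, here is a minimal sketch of horizontal half-pel interpolation on an 8x8 block. It assumes the usual MPEG-4 rule of averaging two neighbouring pixels with a rounding-control bit, (a + b + 1 - rounding) >> 1. The function name and signature are hypothetical and are not taken from bilinear8x8.c.

```c
#include <assert.h>

/* Horizontal half-pel interpolation of an 8x8 block (illustrative sketch).
 * dst, src : top-left pixel of the destination and reference blocks
 * stride   : distance in bytes between rows, shared by both buffers
 * rounding : rounding-control bit, 0 or 1
 * Each src row must have 9 readable pixels, since pixel x+1 is used. */
void interp8x8_halfpel_h(unsigned char *dst, const unsigned char *src,
                         int stride, int rounding)
{
    int x, y;

    assert(rounding == 0 || rounding == 1);
    for (y = 0; y < 8; y++) {
        for (x = 0; x < 8; x++) {
            /* average of the two horizontal neighbours, integer only */
            dst[x] = (unsigned char)((src[x] + src[x + 1] + 1 - rounding) >> 1);
        }
        dst += stride;
        src += stride;
    }
}
```

The inner loop uses only additions and a shift, so a routine of this shape is a candidate for loop unrolling or word-wide loads once profiling shows interpolation to be a hotspot.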

17/22

About Optimization

A sample result of profiling is as follows

[Pie chart: CPU load by module. Slices: IDCT, Inverse Quantization, Interpolation, Boundary Extension, Color Conversion, Motion Compensation, DC/AC Prediction, VLC Decoding.]

Obviously, for optimization you want to start with the idct() function in idct.c

18/22

Main Decoder Modules

You may want to take a closer look at the following files:
  bilinear8x8.c (interpolation)
  idct.c (inverse DCT)
  mbcoding.c (VLC decoding)
  quant_h263.c (inverse quantization)
  mem_transfer.c (motion compensation)
  mbprediction.c (DC/AC prediction)

19/22

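Hint: Removing Floating Point

Floating point operations in general can be removed as follows:

Before (floating point):

    #include <math.h>

    int main(int argc, char **argv)
    {
        double a, b;
        int c;

        a = 3.14159;
        b = 1.41421;
        c = (int) floor(a + b + 0.5);   // round to nearest: c = 5
        return 0;
    }

After (fixed point, values scaled by 2048 = 2^11):

    int main(int argc, char **argv)
    {
        int a, b;
        int c;

        a = 6434;                   // 2048*3.14159
        b = 2896;                   // 2048*1.41421
        c = (a + b + 1024) >> 11;   // 1024 is 0.5 scaled by 2048: c = 5
        return 0;
    }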
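The same scaling idea extends to multiplying a sample by a constant, which is the typical floating-point operation in a transform such as the IDCT. The sketch below is illustrative only and is not the code in idct.c: the helper names to_q11 and q11_mul_round are made up, the scale of 2^11 simply follows the example above, and the constants are assumed non-negative. Right-shifting a negative int is implementation-defined in C; the rounding shown relies on an arithmetic shift, which should be checked for the compiler in use.

```c
#include <assert.h>

#define FIX_BITS 11                       /* scale factor 2^11 = 2048 */
#define FIX_ONE  (1 << FIX_BITS)
#define FIX_HALF (1 << (FIX_BITS - 1))    /* 0.5 in the scaled domain */

/* Convert a non-negative real constant to fixed point, rounding to nearest.
 * Intended to run once (or be precomputed by hand), not in the inner loop. */
int to_q11(double x)
{
    assert(x >= 0.0);
    return (int)(x * FIX_ONE + 0.5);
}

/* Multiply an integer sample by a fixed-point constant and round the
 * product back to an integer, using only integer operations. */
int q11_mul_round(int sample, int coef_q11)
{
    return (sample * coef_q11 + FIX_HALF) >> FIX_BITS;
}
```

For example, q11_mul_round(255, to_q11(0.70710678)) replaces (int) floor(255 * 0.70710678 + 0.5) with one integer multiply, one add, and one shift. The product sample * coef_q11 must fit in an int, which bounds the usable sample range for a given scale.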

20/22

After Removing FP Operations

An example of optimized result:

[Pie chart: CPU load by module after optimization. Slices: IDCT, Inverse Quantization, Interpolation, Boundary Extension, Color Conversion, Motion Compensation, DC/AC Prediction, VLC Decoding.]

21/22

Necessary Charts in Your Report

In your report, you shall provide the following information
  Draw a pie chart that shows major CPU load distribution among functions
  For the top-10 functions which consume most CPU time, draw a pie chart to show the distribution of memory cycles
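The slice sizes for both charts can be computed from per-function cycle counts. The following is a sketch under stated assumptions: the FuncCycles record and all function names are hypothetical, the counts are whatever you read off the profiler and the ARMulator statistics, and "memory cycles" is taken here to mean S plus N cycles.

```c
#include <assert.h>
#include <stdlib.h>

/* Cycle counts collected for one function (hypothetical record layout). */
typedef struct {
    const char *name;
    unsigned long s, n, i, c;   /* S, N, I, and C cycle counts */
} FuncCycles;

unsigned long total_cycles(const FuncCycles *f)
{
    return f->s + f->n + f->i + f->c;
}

static int by_total_desc(const void *a, const void *b)
{
    unsigned long x = total_cycles((const FuncCycles *)a);
    unsigned long y = total_cycles((const FuncCycles *)b);
    return (x < y) - (x > y);   /* larger totals sort first */
}

/* Order the functions so that the heaviest CPU consumers come first. */
void sort_by_load(FuncCycles *f, size_t count)
{
    qsort(f, count, sizeof f[0], by_total_desc);
}

/* Chart 1: share of all cycles spent in function idx, in percent. */
double load_percent(const FuncCycles *f, size_t count, size_t idx)
{
    unsigned long sum = 0;
    size_t k;

    assert(idx < count);
    for (k = 0; k < count; k++)
        sum += total_cycles(&f[k]);
    return sum ? 100.0 * total_cycles(&f[idx]) / sum : 0.0;
}

/* Chart 2: share of the memory (S + N) cycles of the first `top`
 * functions that belongs to function idx, in percent. */
double memory_percent(const FuncCycles *f, size_t top, size_t idx)
{
    unsigned long sum = 0;
    size_t k;

    assert(idx < top);
    for (k = 0; k < top; k++)
        sum += f[k].s + f[k].n;
    return sum ? 100.0 * (f[idx].s + f[idx].n) / sum : 0.0;
}
```

After sort_by_load(), calling memory_percent(f, 10, k) for k = 0..9 gives the ten slices of the second chart, and load_percent() over all functions gives the first.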

22/22

References for This Lab

You can find some PDF ebooks related to this lab in the document folder of your ADS installation directory:
  For General ARM Programming
    ADS Programming Guide (includes three manuals)
    Writing Efficient C for ARM (ARM App. Note 34)
  For Profiling using AXD
    AXD and armsd Debuggers Guide
  For Using ARMulator
    ADS Debug Target Guide
    Benchmarking with ARMulator (ARM App. Note 93)
    The ARMulator Configuration File (ARM App. Note 52)
  For MPEG-4 Video Decoder Knowledge
    Class slides
    “Google” or “Wikipedia”