Platform-based Design for MPEG-4 Video Encoder Presenter: Yu-Han Chen


Post on 12-Jan-2016

Page 1: Platform-based Design for MPEG-4 Video Encoder

Platform-based Design for MPEG-4 Video Encoder

Presenter: Yu-Han Chen

Page 2: Platform-based Design for MPEG-4 Video Encoder

DSP/IC Design Lab. 2

Video Coding Standards

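[Figure: video coding standards plotted by data rate (10K, 100K, 1M, 10M bps) against resolution/quality (QCIF, CIF, SDTV, HDTV). Standards shown: H.261, H.263, MPEG1, MPEG2, MPEG4. Years shown: 1990, 1992, 1994, 1995, 1999. Application labels: Telecomm, Storage, Broadcasting, Multimedia.]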

Page 3: Platform-based Design for MPEG-4 Video Encoder

Introduction

• Multimedia applications are emerging
  - Video-phone, camcorder, surveillance, and video streaming
• MPEG-4 provides a total solution for these applications
  - High compression ratio for limited bandwidth
  - Error robustness in error-prone environments
  - Content interactivity for more functionality beyond 'seeing'

Page 4: Platform-based Design for MPEG-4 Video Encoder

Proposed MPEG-4 Encoder

• MPEG-4 video encoding
• Platform-based system architecture
• Motion encoding module
• Texture encoding module

Page 5: Platform-based Design for MPEG-4 Video Encoder

MPEG-4 Simple Profile Video Encoder

[Figure: encoder block diagram. Blocks shown: Video Source, ME, MC, adder, Frame Memory, Block Engine (DCT, Q, DC/AC Prediction, IQ, IDCT), Scan, VLC, Bitstream.]

Page 6: Platform-based Design for MPEG-4 Video Encoder

Complexity Analysis of Optimized Software Model

• SPL3 foreman sequence at 30 fps
• ME: full search with half-stop algorithm
• DCT/IDCT: row-column decomposition

         Computing           Controlling         Memory Access
         (MIPS)       %      (MIPS)       %      (MBytes/Sec)   %
ME       6,142.64     75.91  2,766.08     77.33  30,668.20      80
IDCT       539.31      6.66    109.49      3.06   2,016.22      5.26
DCT        442.16      5.46     58.52      1.64   1,621.95      4.23
MC         386.12      4.77    271.45      7.59   1,987.58      5.18
Q          205.55      2.54    129.33      3.62     629.79      1.64
ACDC       112.08      1.39     64.36      1.8      387.65      1.01
SCAN        91.96      1.14     60.33      1.69     385.09      1
IQ          93.8       1.16     56.66      1.58     338.12      0.88
VLC         77.6       0.96     60.65      1.7      301.77      0.79
TOTAL    8,092        100     3,577       100     38,336        100
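The percentage columns can be recomputed from the raw figures. The sketch below is only an illustration: the dictionary `PROFILE` and the helper names are hypothetical, and the row values are copied from the table. Summing the computing and controlling columns gives roughly 11.7 GIPS, which is of the same order as the "up to 12 GIPS" quoted on the Implementation Demands slide; that reading of the 12 GIPS figure is an assumption, not something the slides state.

```python
# Rows copied from the complexity table:
# (computing MIPS, controlling MIPS, memory access MBytes/s)
PROFILE = {
    "ME":   (6142.64, 2766.08, 30668.20),
    "IDCT": (539.31, 109.49, 2016.22),
    "DCT":  (442.16, 58.52, 1621.95),
    "MC":   (386.12, 271.45, 1987.58),
    "Q":    (205.55, 129.33, 629.79),
    "ACDC": (112.08, 64.36, 387.65),
    "SCAN": (91.96, 60.33, 385.09),
    "IQ":   (93.80, 56.66, 338.12),
    "VLC":  (77.60, 60.65, 301.77),
}

def column_total(col):
    """Sum one column (0 = computing, 1 = controlling, 2 = memory)."""
    return sum(row[col] for row in PROFILE.values())

def share(name, col):
    """Percentage of a column's total spent in one coding tool."""
    return 100.0 * PROFILE[name][col] / column_total(col)

# Computing plus controlling instructions, expressed in GIPS.
total_gips = (column_total(0) + column_total(1)) / 1000.0

for tool in ("ME", "DCT", "IDCT"):
    print(f"{tool}: {share(tool, 0):.2f}% of computing MIPS")
print(f"total: {total_gips:.2f} GIPS")
```

ME alone accounts for about three quarters of every column, which is the motivation for giving it a dedicated accelerator.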

Page 7: Platform-based Design for MPEG-4 Video Encoder

Characteristics of Video Coding Tools

Processing type    Example                 Parallelism        Features                      Data type  Frequency (CIF 30 fps)  Preferred implementation
Program control    Coding mode selection,  Mostly sequential  High complexity               16-bit     Very low (10 kHz)       SW
                   predictor
Stream processing  VLC/VLD, CAE/CAD,       Mostly sequential  High complexity,              < 16-bit   Medium (1~10 MHz)       HW or SW
                   parsing, RLD                               non-word-aligned processing
Block processing   DCT/IDCT, MC, ME,       High               Low complexity, high data     8, 16-bit  High (10 MHz~10 GHz)    HW
                   filters                                    rate, regular

[Micro]

Page 8: Platform-based Design for MPEG-4 Video Encoder

Implementation Demands

• Computational power is up to 12 GIPS
  - ME is the most important key component
  - DCT/IDCT is the second one
  - Dedicated hardware accelerators are employed
• Implementation for various features of algorithms
  - Software for irregular and sequential ones
  - Hardware for high-processing-rate ones
• HW/SW co-design is the most promising solution to achieve a cost-effective system

Page 9: Platform-based Design for MPEG-4 Video Encoder

Platform for MPEG-4 Video Coding

[Figure: platform block diagram. Blocks shown: HyRISC with firmware SRAM, ME, MC, Block Engine, Bitstream Unit, DMA, and Sequencer, each behind a wrapper; Coeff. Generator, Coeff. Buffer, MEMIF, External Memory, Virtual Tools. Buses: RISC BUS (16 bits) and Data BUS (32 bits).]

• The chip is the region inside the dotted line
• The platform-based system includes HyRISC, RBUS and DBUS, DMA, and MEMIF
• Hardware accelerators include ME, MC, BE (DCT/IDCT, Q, IQ, ACDCP), Bitstream Unit, and shared memory (CG, CB)

Page 10: Platform-based Design for MPEG-4 Video Encoder

Motion Encoding Module

[Figure: motion encoding module block diagram in three stages. Pattern Generation: spiral pattern, ROM-based diamond pattern, MUX, Ctrl., range checker, with inputs mode and (pmvx, pmvy). FIFO: carries (id, u, v) candidates, with feed/full and fetch/empty handshakes. Distortion Calculation: AG, SWMEM, MBRAM, adder tree, accumulator, comparator, elimination, with a stop signal and output (mvx, mvy, SAD). Other signals: start/finish, data_in, id. Loading path: Ref. Sum RAM / MB RAM / Ref. RAM.]

Page 11: Platform-based Design for MPEG-4 Video Encoder

Summary of ME

• A low-cost, high-performance hybrid motion estimation is proposed
• Dynamic modes for various applications
  - Real-time and low-power applications: PDS (Predictive Diamond Search) mode
  - High-compression-quality applications: FFS (Fast Full Search) mode, a spiral full search with PDE (Partial Distortion Elimination)
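The FFS idea, a full search that visits candidates from the centre outwards and abandons a candidate as soon as its partial SAD can no longer win, can be sketched in software. This is a minimal illustration, not the hardware datapath of the slides: the function names, the row-wise granularity of the early exit, and the ring-sorted approximation of the spiral order are all assumptions.

```python
import numpy as np

def spiral_offsets(search_range):
    """List (u, v) offsets ordered from the window centre outwards.

    Sorting by square ring and then by city-block distance only
    approximates a true spiral scan (assumption for this sketch).
    """
    offsets = [(u, v)
               for u in range(-search_range, search_range + 1)
               for v in range(-search_range, search_range + 1)]
    offsets.sort(key=lambda p: (max(abs(p[0]), abs(p[1])),
                                abs(p[0]) + abs(p[1])))
    return offsets

def full_search_pde(cur, ref, bx, by, search_range=4, block=16):
    """Return ((u, v), SAD) of the best match for the block at (bx, by)."""
    cur_blk = cur[by:by + block, bx:bx + block].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for u, v in spiral_offsets(search_range):
        x, y = bx + u, by + v
        if x < 0 or y < 0 or x + block > ref.shape[1] or y + block > ref.shape[0]:
            continue  # candidate falls outside the reference frame
        sad = 0
        for row in range(block):
            ref_row = ref[y + row, x:x + block].astype(np.int64)
            sad += int(np.abs(cur_blk[row] - ref_row).sum())
            if best_sad is not None and sad >= best_sad:
                break  # PDE: the partial SAD already cannot beat the best
        else:
            # Loop finished without the early exit: new best candidate.
            best_mv, best_sad = (u, v), sad
    return best_mv, best_sad

# Usage: the current frame is the reference shifted by a known amount.
rng = np.random.default_rng(0)
ref_frame = rng.integers(0, 256, size=(48, 48))
cur_frame = np.roll(ref_frame, shift=(2, -3), axis=(0, 1))
mv, sad = full_search_pde(cur_frame, ref_frame, 16, 16)
print(mv, sad)
```

Starting near the centre tends to find a small SAD early, so the elimination test rejects most later candidates after only a few rows.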

Page 12: Platform-based Design for MPEG-4 Video Encoder

Texture Encoding Module

• Interleaving DCT/IDCT schedule
  - DCT and IDCT are interleaved for the same block
• Sub-structure sharing technique
  - Applied to the AC/DC prediction datapath and Q/IQ by extracting the common formula term

Page 13: Platform-based Design for MPEG-4 Video Encoder

Interleaved DCT/IDCT Processing

[Figure: interleaved schedule over time slots 1 to 13 for blocks Y1, Y2, Y3, Y4, Cb, Cr. Each block passes through two 1-D DCT passes, Q, IQ, and two 1-D IDCT passes. Datapath shown: 1-D DCT/IDCT unit, transpose memory, DMUX 1:2, MUX 2:1, with signals X, Y, Z.]

Page 14: Platform-based Design for MPEG-4 Video Encoder

Sub-structure Sharing of Q/IQ and ACDC Prediction

• Scalar operation: (QAC x QPA) / QPX
  - Share the partial result (QAC x QP = M) in the IQ module
  - Share the datapath of Q for M / QPX

[Figure: the interleaved schedule over time slots 1 to 13 for blocks Y1, Y2, Y3, Y4, Cb, Cr (two 1-D DCT passes, Q, IQ, two 1-D IDCT passes per block), with a DIV step added for each block: DIV(Y1), DIV(Y2), DIV(Y3), DIV(Y4), DIV(Cb), DIV(Cr).]
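The scalar operation (QAC x QPA) / QPX rescales a neighbouring block's quantised coefficient to the current block's quantiser. A minimal sketch of the two shared steps follows; the function names are hypothetical, and the rounding rule (nearest integer, halves away from zero) is an assumption about how the division is performed rather than something the slide states.

```python
def div_round(numerator, denominator):
    """Integer division rounded to nearest, halves away from zero.

    The rounding rule is an assumption for this sketch; `denominator`
    must be positive (a quantiser parameter).
    """
    magnitude = (2 * abs(numerator) + denominator) // (2 * denominator)
    return magnitude if numerator >= 0 else -magnitude

def shared_product(qac, qp):
    """M = QAC x QP, the partial result the slide shares with the IQ module."""
    return qac * qp

def rescale_predictor(qac_a, qp_a, qp_x):
    """(QAC x QPA) / QPX: the product M followed by the shared divide step."""
    m = shared_product(qac_a, qp_a)
    return div_round(m, qp_x)

print(rescale_predictor(10, 8, 16))
print(rescale_predictor(-3, 5, 2))
```

Splitting the operation this way mirrors the slide: the multiplication happens once, and only the divide needs a slot on the quantiser datapath.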

Page 15: Platform-based Design for MPEG-4 Video Encoder

Chip Features and Layout

Chip                 MPEG-4 Video Encoder
Specification        Simple profile @ Level 3
Encoding complexity  352 x 288 at 30 fps
Technology           TSMC 0.35 um 1P4M
Die size             5.1 x 5.1 mm^2
Logic gate count     71,459 gates
On-chip memory       39,080 bits
Off-chip memory      2,027,527 bits
Transistor count     828,692 transistors
Package              208 CQFP
Input PAD            67
Output PAD           83
Power PAD            48
Working frequency    40 MHz
Voltage              3.3 V
Power consumption    339.51 mW

Page 16: Platform-based Design for MPEG-4 Video Encoder

Hardware/Software Co-Design Flow

Page 17: Platform-based Design for MPEG-4 Video Encoder

Subjective View

• FFS (QP = 16, PSNR_Y = 32.4012, Bits = 9537)
• PDS (QP = 16, PSNR_Y = 32.0256, Bits = 9465)
• Worst case of PSNR drop (0.3962 dB) at the 69th frame
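For reference, a luma PSNR such as the PSNR_Y values quoted here is normally computed from the mean squared error between the original and the reconstructed frame. The sketch below assumes 8-bit samples (peak value 255); the function name is hypothetical and the slides do not show how their figures were measured.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# Usage: every sample off by one gives MSE = 1.
original = np.zeros((288, 352), dtype=np.uint8)
decoded = np.ones((288, 352), dtype=np.uint8)
print(round(psnr(original, decoded), 2))
```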

Page 18: Platform-based Design for MPEG-4 Video Encoder

R-D Curve for Stefan (High Activity)

[Figure: rate-distortion curves for PDS and FFS. Horizontal axis: bit rate (Kbps), 0 to 3000. Vertical axis: PSNR Y (dB), 24 to 36.]

Page 19: Platform-based Design for MPEG-4 Video Encoder

Conclusion

• A cost-effective MPEG-4 video encoder is proposed
• Hardware accelerators
  - A novel hybrid motion estimation architecture
  - A cost-effective texture block engine architecture
• Platform-based system backbone
  - A compromise between flexibility and high performance
• HW/SW co-design flow and tools

Page 20: Platform-based Design for MPEG-4 Video Encoder


Thank you

Page 21: Platform-based Design for MPEG-4 Video Encoder

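DCT/IDCT Coefficient Matrix

$$
\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \end{bmatrix}
= \sqrt{\frac{2}{N}}
\begin{bmatrix}
\cos\frac{\pi}{4} \\ \cos\frac{\pi}{16} \\ \cos\frac{\pi}{8} \\ \cos\frac{3\pi}{16} \\
\cos\frac{5\pi}{16} \\ \cos\frac{3\pi}{8} \\ \cos\frac{7\pi}{16}
\end{bmatrix},
\qquad N = 8
$$

$$
A =
\begin{bmatrix}
a & a & a & a & a & a & a & a \\
b & d & e & g & -g & -e & -d & -b \\
c & f & -f & -c & -c & -f & f & c \\
d & -g & -b & -e & e & b & g & -d \\
a & -a & -a & a & a & -a & -a & a \\
e & -b & g & d & -d & -g & b & -e \\
f & -c & c & -f & -f & c & -c & f \\
g & -e & d & -b & b & -d & e & -g
\end{bmatrix}
$$

Rows 0, 2, 4, 6 are even symmetric; rows 1, 3, 5, 7 are odd symmetric.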

Page 22: Platform-based Design for MPEG-4 Video Encoder

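1-D DCT and IDCT

1-D DCT (Y = AX), with preprocessing and data reordering:

$$
\begin{bmatrix} Y(0) \\ Y(2) \\ Y(4) \\ Y(6) \end{bmatrix}
=
\begin{bmatrix} a & a & a & a \\ c & f & -f & -c \\ a & -a & -a & a \\ f & -c & c & -f \end{bmatrix}
\begin{bmatrix} X(0)+X(7) \\ X(1)+X(6) \\ X(2)+X(5) \\ X(3)+X(4) \end{bmatrix}
$$

$$
\begin{bmatrix} Y(1) \\ Y(3) \\ Y(5) \\ Y(7) \end{bmatrix}
=
\begin{bmatrix} b & d & e & g \\ d & -g & -b & -e \\ e & -b & g & d \\ g & -e & d & -b \end{bmatrix}
\begin{bmatrix} X(0)-X(7) \\ X(1)-X(6) \\ X(2)-X(5) \\ X(3)-X(4) \end{bmatrix}
$$

1-D IDCT (Y = A^T X), with data reordering and postprocessing:

$$
\begin{bmatrix} Y(0) \\ Y(1) \\ Y(2) \\ Y(3) \end{bmatrix}
=
\begin{bmatrix} a & c & a & f \\ a & f & -a & -c \\ a & -f & -a & c \\ a & -c & a & -f \end{bmatrix}
\begin{bmatrix} X(0) \\ X(2) \\ X(4) \\ X(6) \end{bmatrix}
+
\begin{bmatrix} b & d & e & g \\ d & -g & -b & -e \\ e & -b & g & d \\ g & -e & d & -b \end{bmatrix}
\begin{bmatrix} X(1) \\ X(3) \\ X(5) \\ X(7) \end{bmatrix}
$$

$$
\begin{bmatrix} Y(7) \\ Y(6) \\ Y(5) \\ Y(4) \end{bmatrix}
=
\begin{bmatrix} a & c & a & f \\ a & f & -a & -c \\ a & -f & -a & c \\ a & -c & a & -f \end{bmatrix}
\begin{bmatrix} X(0) \\ X(2) \\ X(4) \\ X(6) \end{bmatrix}
-
\begin{bmatrix} b & d & e & g \\ d & -g & -b & -e \\ e & -b & g & d \\ g & -e & d & -b \end{bmatrix}
\begin{bmatrix} X(1) \\ X(3) \\ X(5) \\ X(7) \end{bmatrix}
$$

8 MAC operations per output are reduced to 4.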
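The even/odd split can be checked numerically. The sketch below is a software illustration only, with hypothetical names `EVEN`, `ODD`, `dct_1d`, and `idct_1d`; the coefficients a to g follow the definition on the coefficient-matrix slide, and each output needs four multiply-accumulates instead of eight.

```python
import numpy as np

N = 8
# a..g = sqrt(2/N) * cos(k*pi/16) for k = 4, 1, 2, 3, 5, 6, 7
a, b, c, d, e, f, g = (np.sqrt(2.0 / N) * np.cos(k * np.pi / 16)
                       for k in (4, 1, 2, 3, 5, 6, 7))

# 4x4 kernels for the even-indexed and odd-indexed outputs of the DCT.
EVEN = np.array([[a, a, a, a],
                 [c, f, -f, -c],
                 [a, -a, -a, a],
                 [f, -c, c, -f]])
ODD = np.array([[b, d, e, g],
                [d, -g, -b, -e],
                [e, -b, g, d],
                [g, -e, d, -b]])

def dct_1d(x):
    """8-point DCT: butterfly preprocessing, then two 4x4 products."""
    x = np.asarray(x, dtype=float)
    sums = x[:4] + x[7:3:-1]    # X(n) + X(7-n)
    diffs = x[:4] - x[7:3:-1]   # X(n) - X(7-n)
    y = np.empty(N)
    y[0::2] = EVEN @ sums
    y[1::2] = ODD @ diffs
    return y

def idct_1d(coeffs):
    """8-point IDCT: two 4x4 products, then butterfly postprocessing."""
    coeffs = np.asarray(coeffs, dtype=float)
    even_part = EVEN.T @ coeffs[0::2]
    odd_part = ODD.T @ coeffs[1::2]
    y = np.empty(N)
    y[:4] = even_part + odd_part       # Y(0)..Y(3)
    y[7:3:-1] = even_part - odd_part   # Y(7)..Y(4)
    return y

samples = np.arange(8.0) ** 2
print(np.allclose(idct_1d(dct_1d(samples)), samples))
```

The same two kernels serve both directions (transposed for the IDCT), which is what lets one datapath be multiplexed between DCT and IDCT.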

Page 23: Platform-based Design for MPEG-4 Video Encoder

DCT/IDCT Architecture

[Figure: DCT/IDCT datapath. Blocks shown: DRU (Data Reordering Unit), IDRU, transpose memory, LIFO, MUXA, MUXB, MUXC, MUXD, ADD, SUB, ACF matrix-vector multiplier, BDEG matrix-vector multiplier. Signals: X, Y, Z, INSEL.]

• Two parallel MACs
• Preprocessing and postprocessing
• Two 1-D operations multiplexed
• DCT and IDCT
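Multiplexing two 1-D passes around a transpose memory is the usual row-column way to obtain a 2-D transform. A minimal software sketch of that idea follows; `dct_matrix` and `dct_2d` are hypothetical names, and the array transpose stands in for the transpose memory.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal n-point DCT matrix."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # sample index
    m = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    m[0, :] = 1.0 / np.sqrt(n)
    return m

def dct_2d(block):
    """2-D DCT as two 1-D passes with a transpose in between."""
    A = dct_matrix(block.shape[0])
    first_pass = A @ block        # 1-D DCT applied to every column
    transposed = first_pass.T     # role of the transpose memory
    second_pass = A @ transposed  # same 1-D unit reused on the other axis
    return second_pass.T

flat_block = np.full((8, 8), 10.0)
print(round(dct_2d(flat_block)[0, 0], 6))
```

Because only one 1-D unit is needed, the same hardware can be time-shared between the two passes, and between DCT and IDCT.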

Page 24: Platform-based Design for MPEG-4 Video Encoder

Multiplication of Constant Coefficients

• Only 7 constant coefficients are used
• Signed-digit (SD) representation
  - Minimum number of nonzero terms (1, -1)
• Shift and add
  - Avoids a dedicated multiplier

Coefficient  Value    12-bit signed  SD representation                                    No. of non-zero
a            0.35355  0.35351        0.35352 = 2^-2 + 2^-4 + 2^-5 + 2^-7 + 2^-9           5
b            0.49039  0.49023        0.49036 = 2^-1 - 2^-7 - 2^-9 + 2^-13                 4
c            0.46194  0.46191        0.46191 = 2^-1 - 2^-5 - 2^-7 + 2^-10                 4
d            0.41573  0.41552        0.41577 = 2^-2 + 2^-3 + 2^-5 + 2^-7 + 2^-9 - 2^-12   6
e            0.27779  0.27734        0.27783 = 2^-2 + 2^-5 - 2^-8 + 2^-11                 4
f            0.19134  0.19091        0.19135 = 2^-3 + 2^-4 + 2^-8 - 2^-14                 4
g            0.09755  0.09716        0.09753 = 2^-4 + 2^-5 + 2^-8 - 2^-13                 4
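Shift-and-add multiplication with signed-digit terms can be sketched as follows. The signs and exponents in `SD_TERMS` are an assumption: they were chosen so that each sum matches the approximate SD value listed in the table (for example 0.35352 for a), and the names `sd_value`, `sd_multiply`, and `FRAC_BITS` are hypothetical.

```python
import math

# (sign, k) means sign * 2**-k. Assumed terms, matched to the table's
# approximate SD values; not taken verbatim from the slide.
SD_TERMS = {
    "a": [(+1, 2), (+1, 4), (+1, 5), (+1, 7), (+1, 9)],
    "b": [(+1, 1), (-1, 7), (-1, 9), (+1, 13)],
    "c": [(+1, 1), (-1, 5), (-1, 7), (+1, 10)],
    "d": [(+1, 2), (+1, 3), (+1, 5), (+1, 7), (+1, 9), (-1, 12)],
    "e": [(+1, 2), (+1, 5), (-1, 8), (+1, 11)],
    "f": [(+1, 3), (+1, 4), (+1, 8), (-1, 14)],
    "g": [(+1, 4), (+1, 5), (+1, 8), (-1, 13)],
}

FRAC_BITS = 14  # enough fractional bits for the smallest term, 2**-14

def sd_value(terms):
    """Real value represented by a list of signed-digit terms."""
    return sum(sign * 2.0 ** -k for sign, k in terms)

def sd_multiply(x, terms):
    """Multiply integer x by the constant using only shifts and adds.

    The result is a fixed-point number scaled by 2**FRAC_BITS.
    """
    acc = 0
    for sign, k in terms:
        acc += sign * (x << (FRAC_BITS - k))
    return acc

product = sd_multiply(100, SD_TERMS["a"])
print(product, product / 2 ** FRAC_BITS)
```

Each nonzero digit costs one adder or subtractor, so the "No. of non-zero" column is a direct proxy for the hardware cost of each constant.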

Page 25: Platform-based Design for MPEG-4 Video Encoder

Gate Count Distribution

Module    Gates   Share
ME        15565   23%
DCT/IDCT  14785   21%
VLC        6505    9%
WRAPPER    6215    9%
HYRISC     5785    8%
Q          4736    7%
MC         3459    5%
IQ         3278    5%
COGEN      2619    4%
DMA        2382    3%
LPSEQ      2045    3%
DCACP      1885    3%
BUS         300    0%

Page 26: Platform-based Design for MPEG-4 Video Encoder

Memory Distribution

Group         Function                     Characteristic  Depth x Width  Num.  Bits
On-chip RAM   Current MB for ME            Asynchronous    32x32          2     2,048
              Block buffers for MC/COGEN                   16x32          6     3,072
              Search window for ME                         288x8          8     18,432
              ACDC prediction              Two port        76x12          1     912
                                           Two port        386x12         1     4,632
              Scan buffer                  Two port        64x12          1     768
              Transpose mem. for DCT/IDCT                  64x16          1     1,024
              RISC data RAM                Two port        512x16         1     8,192
              Total                                                       21    39,080
Off-chip RAM  RISC instruction ROM         ROM             1024x22        1     22,528
              External RAM                 SRAM with ACK   152,416x32     1     4,877,312
              Total                                                       2     4,899,840
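Each bit count in the table is depth x width x number of instances, and the on-chip rows add up to the 39,080 bits quoted on the chip features slide. A short check, using hypothetical names and the figures copied from the table:

```python
# (depth, width, instances, bits as listed in the table)
ON_CHIP = [
    (32, 32, 2, 2048),     # current MB for ME
    (16, 32, 6, 3072),     # block buffers for MC/COGEN
    (288, 8, 8, 18432),    # search window for ME
    (76, 12, 1, 912),      # two-port RAM
    (386, 12, 1, 4632),    # two-port RAM
    (64, 12, 1, 768),      # two-port RAM
    (64, 16, 1, 1024),     # transpose memory for DCT/IDCT
    (512, 16, 1, 8192),    # RISC data RAM
]
OFF_CHIP = [
    (1024, 22, 1, 22528),       # RISC instruction ROM
    (152416, 32, 1, 4877312),   # external SRAM
]

def total_bits(rows):
    """Recompute the bit total from depth, width, and instance count."""
    return sum(depth * width * count for depth, width, count, _ in rows)

def total_instances(rows):
    return sum(count for _, _, count, _ in rows)

print(total_bits(ON_CHIP), total_instances(ON_CHIP))
print(total_bits(OFF_CHIP), total_instances(OFF_CHIP))
```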

Page 27: Platform-based Design for MPEG-4 Video Encoder


Power Estimation

Page 28: Platform-based Design for MPEG-4 Video Encoder

Power Consumption Estimation

                             Original features   Case 1         Case 2          Case 3
Technology (μm)              0.35                0.18           0.18            0.18
Spec.                        CIF at 30 fps       CIF at 30 fps  QCIF at 15 fps  QCIF at 15 fps
Encoding complexity (MBs/s)  11880               11880          11880           2970
Working frequency (MHz)      40                  40             5               5
Voltage (V)                  3.3                 1.5            1.5             1.5
Gated clock                  No                  No             No              Yes
Power estimation (mW)        339.51 (Powermill)  154.32         19.29           6.55

• Case 1: 0.18 μm
• Case 2: 0.18 μm, 1/8 computational power
• Case 3: 0.18 μm, 1/8 computational power, gated clock
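The Case 1 and Case 2 figures are numerically consistent with scaling the measured 339.51 mW by the supply-voltage ratio (1.5 V / 3.3 V) and then by the 1/8 workload. The sketch below reproduces that arithmetic; it is an observation about the numbers in the table, not a statement of the estimation method actually used, and the variable names are hypothetical.

```python
ORIGINAL_MW = 339.51           # 0.35 um, 3.3 V, 40 MHz (from the table)
V_ORIGINAL, V_SCALED = 3.3, 1.5
WORKLOAD_FRACTION = 1 / 8      # 40 MHz down to 5 MHz
CASE3_MW = 6.55                # gated-clock figure from the table

# Assumed scaling: linear in the voltage ratio, then linear in workload.
case1_mw = ORIGINAL_MW * (V_SCALED / V_ORIGINAL)
case2_mw = case1_mw * WORKLOAD_FRACTION

# Fraction of the Case 2 power that remains once clocks are gated.
gated_clock_ratio = CASE3_MW / case2_mw

print(round(case1_mw, 2), round(case2_mw, 2), round(gated_clock_ratio, 2))
```

Under this reading, clock gating removes roughly two thirds of the remaining power in the low-rate case.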