m-developer.amd.com/wordpress/media/2013/06/1721_final.pdf · jpeg images. to decode each jpeg...


M-JPEG Decoding Using OpenCL on Fusion

Guillaume de Bailliencourt | Morgan Multimedia | Manager

Session Description

This session presents a project whose goal is to decode and display an M-JPEG video stream using the GPU part of an AMD Fusion APU as much as possible. An M-JPEG video stream is composed of a sequence of JPEG images. To decode each JPEG image, the bitstream needs to pass through five stages: Huffman decoding, inverse quantization, inverse DCT (Discrete Cosine Transform), pixel upsampling, and color conversion. The CPU part of the AMD Fusion APU handles the first decoding stage. The GPU part of the APU performs all the other decoding stages and displays the image. OpenCL and DirectX are used to code the GPU part in a DirectShow environment.

Overview

M-JPEG applications
M-JPEG “standard”
DirectShow graph
Sources
Morgan M-JPEG Decoder
Video Renderers
• Common DirectShow Video Renderers
• Morgan DirectX 10 Video Renderer (supporting D3D10 Interop)
Timing & Benchmark
PSNR & SSIM
Usage examples & benefits

M-JPEG applications: Past (90’s)

H/W Capture & Editing
• Fast Screen Machine (still images)
• Targa 1000 & 2000 (Mac & PC)
• Matrox Rainbow Runner
• Miro DC10, DC20 & DC30
• Iomega Buz
• Professional solutions (Avid, GrassValley, …)

All used a H/W codec (Zoran, ST Micro, TI, …)

M-JPEG applications: Present

Non MP4 Digicam & DSLR in video mode
• Up to 1080p 30fps
• MOV or AVI container

High-end DSLR in burst mode
• JPEG file sequence

Webcams & HD Webcams (Microsoft, Logitech)
• Chip streams M-JPEG data to USB (decoded in driver)
• Up to 1080p 30fps
• Up to 2.5K x 2K 10fps

IP Cameras
• Stream M-JPEG to IP network

Video & Digital Cinema Editing
• Transcoding DSLR sources
• Low-res proxy

M-JPEG “standard”

There’s no real standard

Microsoft OpenDML AVI File Format Extensions
• ‘MJPG’ FourCC
• Missing Huffman tables
• Interlaced if height > 288 (2 JPEG per frame)
• Not respected in many Digicam ‘MJPG’ AVI files (HD progressive, Huffman tables, …)

QuickTime File Format Specification (MOV)
• Photo-JPEG
• MJPEG-A & MJPEG-B
  - Missing Huffman tables, or not …
  - Missing JPEG markers, or not …

Others
• Old h/w bitstreams ‘TVMJ’, ‘FLJP’, …
  - Little endian
  - Missing JPEG tables and markers

Complex “universal” parsing & Huffman decoding
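Because Huffman tables may or may not be present, a parser cannot assume that every frame carries its own DHT segment. The following is a minimal sketch of one way to check, by walking the marker segments up to the start of scan. The function name `has_huffman_tables` is hypothetical and this is not the Morgan decoder's code. When no DHT segment is found, decoders typically fall back to the example tables given in Annex K of the JPEG specification.

```python
import struct

SOI, EOI, DHT, SOS = 0xD8, 0xD9, 0xC4, 0xDA
# Markers that carry no length field (TEM and RST0..RST7).
STANDALONE = {0x01} | set(range(0xD0, 0xD8))


def has_huffman_tables(jpeg):
    """Return True if a DHT segment appears before the SOS marker."""
    if jpeg[:2] != bytes([0xFF, SOI]):
        raise ValueError("not a JPEG stream: missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker == 0xFF:          # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == DHT:
            return True
        if marker in (SOS, EOI):    # entropy-coded data starts, or stream ends
            return False
        if marker in STANDALONE:
            pos += 2
            continue
        # Every other segment stores a big-endian length that includes
        # the two length bytes themselves.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length
    return False
```

A decoder could call this once per frame and insert default tables only when it returns False, which covers both the OpenDML case and the digicam files that ignore it.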

DirectShow graph

Source → Decoder → Renderer

Sources

File Source
• Need Demux
  - AVI
  - MOV
  - Other container (MKV, JPEG file sequence, …)

AVStream
• Webcam / USB 2.0
• Other devices / Other buses

NetStream
• HTTP
• RTP / RTSP
• Other protocol


Morgan M-JPEG Decoder with GPU off-loading

Morgan M-JPEG Decoder: Overview

JPEG decoding & display diagram
CPU part : C++, ASM & SIMD
GPU part : OpenCL & AML (AMD Media Library)
Overlapping CPU decoding & GPU decoding
Avoid Memory Transfer (Zero Memory Copy on APU)
Multithreaded decoding
Output to mapped host mem | device mem | D3D10 Interop

Morgan M-JPEG Decoder: JPEG decoding & display diagram

Parsing + Huffman + De-zigzag → iQ + iDCT → Upsampling + Color Conversion + Scaling → Display
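The de-zigzag step in the first stage puts the 64 coefficients of each block back from zigzag scan order into raster order before iQ + iDCT. A small sketch of that reordering, assuming 8x8 blocks stored row-major; the function names are illustrative only.

```python
def zigzag_order(n=8):
    """Return the raster indices of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):               # s = row + col: one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)            # even diagonals run bottom-left to top-right
        order.extend(r * n + (s - r) for r in rows)
    return order


def dezigzag(coeffs):
    """Place 64 zigzag-ordered coefficients at their raster positions."""
    block = [0] * 64
    for scan_pos, raster_index in enumerate(zigzag_order(8)):
        block[raster_index] = coeffs[scan_pos]
    return block
```

In practice the order is a fixed 64-entry lookup table computed once, so the per-block cost is a single pass over the coefficients.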

Morgan M-JPEG Decoder: CPU part (C++, ASM & SIMD)

C++ for core decoder object wrapper (multithreaded)
C++ for JPEG parser
ASM for Huffman decoder
• Output to small temp buffer, fits in cache

SIMD for mem transfer between CPU & GPU parts
SIMD (16-bit signed integer) for CPU iQ & iDCT (benchmark GPU vs CPU)
CPU optimized decoder vs in-box M-JPEG decoder
• x3 faster / 1 core
• x6 faster / 2 cores
• x9 faster / 3 cores
• x11 faster / 4 cores

Morgan M-JPEG Decoder: GPU part (OpenCL & AML, the AMD Media Library)

“AML is a library of OpenCL-based kernels that allow codec developers to use many degrees of freedom in implementing an optimized set of video encoders, decoders, and transcoders that use combinations of CPU, GPU/APU shaders, and GPU/APU dedicated hardware.” Mike Schmit - Sr Manager, Video Software - AMD

“JpegDecode” AML kernels
• Inputs
  - Buffer of raw DCT coefficients (16-bit signed), 3 components, Planar or Interleaved supported
  - Quantization Tables
  - Buffer description
• Do iQ & iDCT (32-bit float precision)
• Output
  - YCbCr buffer (8-bit unsigned)
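To make the kernel's job concrete, here is a reference-style sketch of iQ followed by a floating-point 8x8 iDCT, taking 16-bit coefficients in and producing 8-bit samples out. It uses the textbook formula directly and is far slower than any real kernel; it is not the AML implementation, and the name `idct_8x8` is hypothetical.

```python
import math


def idct_8x8(dct_block, quant_table):
    """Dequantize one 8x8 block (raster order) and return 64 samples in 0..255."""
    # Inverse quantization: scale each coefficient by its table entry.
    f = [c * q for c, q in zip(dct_block, quant_table)]

    def norm(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    cos = [[math.cos((2 * x + 1) * u * math.pi / 16) for u in range(8)]
           for x in range(8)]
    out = []
    for y in range(8):
        for x in range(8):
            acc = 0.0
            for v in range(8):
                for u in range(8):
                    acc += norm(u) * norm(v) * f[v * 8 + u] * cos[x][u] * cos[y][v]
            sample = acc / 4.0 + 128.0       # undo the encoder's level shift
            out.append(max(0, min(255, int(round(sample)))))
    return out
```

A block whose only non-zero coefficient is the DC term decodes to a flat block, which makes a convenient sanity check for any faster implementation.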

Morgan M-JPEG Decoder: Overlapping CPU & GPU decoding

Double buffered input for kernel
Enqueue OpenCL async commands once CPU decoding is finished
Requires async sources
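The double-buffering idea can be sketched without OpenCL. In this illustration a worker thread stands in for the asynchronous GPU queue and `cpu_stage` / `gpu_stage` are hypothetical callables; the point is only that the CPU can fill one input buffer while the other is still being consumed.

```python
import queue
import threading


def run_pipeline(frames, cpu_stage, gpu_stage, num_buffers=2):
    """Overlap cpu_stage and gpu_stage using a small pool of input buffers."""
    free = queue.Queue()
    for slot in range(num_buffers):
        free.put(slot)
    ready = queue.Queue()
    buffers = [None] * num_buffers
    results = []

    def gpu_worker():
        while True:
            slot = ready.get()
            if slot is None:                 # sentinel: no more frames
                break
            results.append(gpu_stage(buffers[slot]))
            free.put(slot)                   # buffer may be refilled now

    worker = threading.Thread(target=gpu_worker)
    worker.start()
    for frame in frames:
        slot = free.get()                    # blocks while all buffers are in flight
        buffers[slot] = cpu_stage(frame)     # "Huffman decoding" into the buffer
        ready.put(slot)                      # "enqueue async commands"
    ready.put(None)
    worker.join()
    return results
```

With a single consumer and FIFO queues the output order matches the input order; with a synchronous source that waits for each frame to be delivered, the overlap disappears, which is why async sources are required.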

Morgan M-JPEG Decoder: Avoid Memory Transfer (Zero Memory Copy on APU)

[Diagrams: Memory Copy vs Zero Memory Copy]

Zero Memory Copy
• CPU decoding : CL_MEM_READ_ONLY | CL_MEM_USE_PERSISTENT_MEM_AMD
• GPU decoding : CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR

Morgan M-JPEG Decoder: Multithreaded decoding

N Cores => N Frames / N CPU decoding Threads running in parallel
1 context, 1 device
N command queues
N kernel instances
Thread synchronisation
Respect frame order (Out of order execution to In order delivery)
Requires async sources for performance boost (transcoding)
At nominal frame rate or with sync sources, balances load over N cores
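Out-of-order execution with in-order delivery needs some form of reorder buffer. A minimal sketch, assuming each decoded frame carries its sequence index; the class name `InOrderDelivery` is hypothetical and thread locking is left out for brevity.

```python
class InOrderDelivery:
    """Hold frames that finish early until all earlier frames have arrived."""

    def __init__(self, first_frame=0):
        self._next = first_frame
        self._pending = {}

    def push(self, frame_index, frame):
        """Register a finished frame and return the frames now deliverable."""
        self._pending[frame_index] = frame
        delivered = []
        while self._next in self._pending:
            delivered.append(self._pending.pop(self._next))
            self._next += 1
        return delivered
```

For example, if frame 1 finishes before frame 0, `push(1, ...)` returns nothing and the later `push(0, ...)` returns both frames in order. In a multithreaded decoder the `push` call would be guarded by a lock, since N decoding threads complete independently.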


Morgan M-JPEG Decoder: Output to mapped host mem | device mem | D3D10 Interop

Output to mapped host mem when
• Downstream filter needs host memory (CPU encoder)
• Downstream filter is using DirectX 7 or 9 surfaces (no more D3D9 Interop, DXVA Sharing undocumented)
• On APU, small impact (Zero Memory Copy)

Output to device mem when
• Downstream filter supports OpenCL (GPU encoder, GPU processing)
• Using MEDIASUBTYPE_AMLV (AML)

Output to D3D10 Interop when
• Using Morgan DirectX 10 Video Renderer
• Writing custom D3D10 processing/rendering code

Video Renderers: Common DirectShow Video Renderers (in-box)

Video Mixing Renderer 7 (VMR7)
• It allocates DDRAW 7 Surfaces
• Morgan M-JPEG Decoder side
  - Optimized Lock/Unlock using AM_GBF_NODDSURFACELOCK flag on GetBuffer()
  - Query IVMRSurface on sample
  - Call IVMRSurface::GetSurface to get LPDIRECTDRAWSURFACE7
  - Lock LPDIRECTDRAWSURFACE7
  - Copy kernel output (mapped host memory) to locked surface
  - Unlock surface

Video Renderers: Common DirectShow Video Renderers (in-box)

Video Mixing Renderer 9 (VMR9)
• It allocates D3D9 Textures
• Morgan M-JPEG Decoder side
  - Optimized Lock/Unlock using AM_GBF_NODDSURFACELOCK flag on GetBuffer()
  - Query IVMRSurface9 on sample
  - Call IVMRSurface9::GetSurface to get IDirect3DSurface9
  - Call IDirect3DSurface9::LockRect
  - Copy kernel output (mapped host memory) to locked surface
  - Unlock surface

Video Renderers: Common DirectShow Video Renderers (in-box)

Enhanced Video Renderer (EVR)
• It allocates D3D9 Textures
• Morgan M-JPEG Decoder side
  - Optimized Lock/Unlock using AM_GBF_NODDSURFACELOCK flag on GetBuffer()
  - Query IMFGetService on sample
  - Call IMFGetService::GetService to get IDirect3DSurface9
  - Call IDirect3DSurface9::LockRect
  - Copy kernel output (mapped host memory) to locked surface
  - Unlock surface

Video Renderers: Morgan DirectX 10 Video Renderer

Without D3D10 Interop
• It allocates D3D10 Textures (D3D10_USAGE_DYNAMIC, D3D10_CPU_ACCESS_WRITE)
• Morgan M-JPEG Decoder side
  - Optimized Map/Unmap using AM_GBF_NODDSURFACELOCK flag on GetBuffer()
  - Query IVMRSurface10 on sample
  - Call IVMRSurface10::GetTexture to get ID3D10Texture2D
  - Call ID3D10Texture2D::Map
  - Copy kernel output (mapped host mem) to mapped texture
  - Unmap Texture

Video Renderers: Morgan DirectX 10 Video Renderer

With D3D10 Interop
• It allocates D3D10 Textures (D3D10_USAGE_DEFAULT, D3D10_RESOURCE_MISC_SHARED)
• Morgan M-JPEG Decoder side
  - Set AM_GBF_NODDSURFACELOCK | AMD_MM_GPU_USE_D3D10_INTEROP flag on GetBuffer()
  - Query IVMRSurface10 on sample
  - Call IVMRSurface10::GetTexture to get ID3D10Texture2D
  - Call clCreateFromD3D10Texture2DKHR with ID3D10Texture2D (one time)
  - Call clEnqueueAcquireD3D10ObjectsKHR
  - Call clEnqueueCopyBufferToImage (Copy kernel output to D3D10 Texture)
  - Call clEnqueueReleaseD3D10ObjectsKHR

Video Renderers: Morgan DirectX 10 Video Renderer

GPU Processing
• Accepts inputs > 8 bpc
• One or Two pass
• Uses D3D10 Pixel Shaders
• 32-bit float precision
• Upsampler / Scaler
• YUV to RGB
• RGB range
• Chromatic adaptation (optional)
• Output to 8, 10 or 16 bpc
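The upsampling and YUV to RGB steps can be illustrated on the CPU. This sketch assumes full-range JFIF (BT.601) coefficients and nearest-neighbour chroma upsampling for 4:2:0; the slides do not state which matrix or filter the renderer's pixel shaders use, so both choices and the function names are assumptions.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range JFIF YCbCr sample to 8-bit RGB."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def upsample_420_nearest(chroma, width, height):
    """Expand a half-resolution chroma plane to width x height samples."""
    chroma_width = (width + 1) // 2
    return [chroma[(row // 2) * chroma_width + col // 2]
            for row in range(height)
            for col in range(width)]
```

A neutral chroma value of 128 leaves the luma unchanged on all three channels, so grey inputs map to grey outputs. A shader would normally use a smoother upsampling filter than nearest neighbour and work in floating point until the final output conversion.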

Timing & Benchmark: Overview

System setup & Reference clip
Timing
• Output to device mem
• Output to mapped host mem
• Output to D3D10 Interop

Benchmark
• GPU vs CPU

System setup & Reference clip

System setup
• Quad core Llano APU (no L3 cache, 4x1MB L2 cache)
• “CPU” clock 24x100MHz = 2.4GHz
• TurboCore (800MHz–2400MHz, can be even higher than 2.4GHz)
• RAM 8GB DDR3 @ 667x2 = 1333MHz
• “GPU” HD 6550D @ 594MHz (BeaverCreek)
• APU set to “Performance” mode (TurboCore policy)
• Win 7 x64
• AMD APP Profiler, use clEnqueueMarker to mark key points in timeline (CPU start/stop, Deliver, …)
• GraphStudio x64 (GraphEdit like)

Reference clip
• Shot by DSLR (Panasonic GF1, “Customized” firmware)
• 1080p 30fps 4:2:0 37Mb/s
• MOV container (Photo-JPEG)
• Played at full speed for timing & benchmark
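The reference clip figures give a feel for the data volumes involved. A short worked calculation, assuming 1 Mb/s means 10^6 bits per second, a 1920x1080 frame and 8-bit samples; these assumptions are not stated on the slide.

```python
def average_frame_bytes(bitrate_bps, fps):
    """Average compressed size of one frame, in bytes."""
    return bitrate_bps / fps / 8


def raw_420_frame_bytes(width, height):
    """Size of one 8-bit 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    return width * height * 3 // 2


compressed = average_frame_bytes(37e6, 30)
raw = raw_420_frame_bytes(1920, 1080)
compression_ratio = raw / compressed
```

Under these assumptions each JPEG frame averages roughly 154 KB, against about 3.1 MB for the decoded 4:2:0 frame, a ratio of about 20:1. The uncompressed side is what the memory-transfer optimizations are trying not to copy.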

Timing: Output to device mem (in all cases, input is mapped device mem)

1 CPU decoding Thread / 1 Core

[Timeline figure: frames (n-1), (n), (n+1); Deliver (n-1), Deliver (n)]

Timing: Output to device mem

2 CPU decoding Threads / 2 Cores

[Timeline figure: frames (n-3) to (n+2)]

Timing: Output to device mem

3 CPU decoding Threads / 3 Cores

[Timeline figure: frames (n) to (n+5)]

Timing: Output to device mem

4 CPU decoding Threads / 4 Cores

[Timeline figure: frames (n-1) to (n+6)]

Timing: Output to mapped host mem / Copy to Downstream Filter Input Buffer

1 CPU decoding Thread / 1 Core

[Timeline figure: Deliver]

Timing: Output to D3D10 Interop / Morgan DirectX 10 Video Renderer

1 CPU decoding Thread / 1 Core
So far, this is not as efficient as the two other output methods

[Timeline figure: Render (n-1), (Frame n)]

Benchmark: GPU vs CPU

CPU outputs to host mem
GPU outputs to device mem
Connected to Null Renderer
Reference clip played at full speed

Benchmark: GPU vs CPU

[Chart, values in ms (lower is better)]
• In-box MJPEG Decoder : 34.96 ms

[Chart, values in fps (higher is better)]
• In-box MJPEG Decoder : 29 fps
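The two in-box figures are consistent with each other if the millisecond value is the average time per frame, which is an assumption since the chart axes did not survive. A quick check:

```python
def fps_from_frame_time(milliseconds):
    """Frames per second achievable at a given time per frame."""
    return 1000.0 / milliseconds


def frame_budget_ms(fps):
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps


inbox_fps = fps_from_frame_time(34.96)
realtime_budget = frame_budget_ms(30)
```

Under that reading, 34.96 ms per frame is about 28.6 fps, which rounds to the 29 fps shown, and it exceeds the 33.3 ms budget of a 30 fps clip, so the in-box decoder would fall slightly short of real time on the reference clip.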

Benchmark: GPU vs CPU

[Chart, values in % Total]
• In-box MJPEG Decoder : 25 % Total

Benchmark: GPU vs CPU

vs optimized decoder
• x2.18 / 1 core
• x2.08 / 2 cores
• x2.11 / 3 cores
• x1.77 / 4 cores

vs in-box decoder
• x6.69 / 1 core
• x12.52 / 2 cores
• x18.52 / 3 cores
• x18.97 / 4 cores
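These two sets of ratios can be cross-checked against the CPU-only speedups quoted on the CPU part slide (x3, x6, x9, x11). The sketch below assumes both lists measure the same GPU-assisted configuration at each core count, so that dividing one by the other gives the optimized CPU decoder's speedup over the in-box decoder.

```python
gpu_vs_optimized = {1: 2.18, 2: 2.08, 3: 2.11, 4: 1.77}
gpu_vs_inbox = {1: 6.69, 2: 12.52, 3: 18.52, 4: 18.97}


def optimized_vs_inbox(cores):
    """Implied speedup of the optimized CPU decoder over the in-box decoder."""
    return gpu_vs_inbox[cores] / gpu_vs_optimized[cores]


implied = [round(optimized_vs_inbox(cores)) for cores in (1, 2, 3, 4)]
```

The implied values round to 3, 6, 9 and 11, matching the CPU part slide. They also show that the GPU advantage over the optimized CPU decoder narrows at 4 cores (x1.77) while the CPU-only path keeps scaling.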

PSNR & SSIM: Overview

Test setup & definitions
CPU only vs CPU+GPU decoding
Comparing to reference decoder (IJG / integer mode)
• CPU only vs Reference decoding
• CPU+GPU vs Reference decoding

PSNR & SSIM: Test setup & definitions

Same system, same reference clip
PSNR (Peak Signal-to-Noise Ratio)
• Ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation
• > 50 dB => very good

SSIM (Structural Similarity Index Metric)
• A method for measuring the similarity between two images
• Designed to improve on traditional methods like PSNR
• Near 1 => very good
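For 8-bit samples the PSNR definition reduces to 10·log10(255² / MSE). A minimal sketch over flat lists of samples; the function names are illustrative, and the presentation does not say which tool was used to produce its figures.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(test):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf                      # identical images
    return 10 * math.log10(peak ** 2 / mse)


def mse_from_psnr(psnr_db, peak=255):
    """Invert the PSNR formula to recover the mean squared error."""
    return peak ** 2 / 10 ** (psnr_db / 10)
```

As a point of reference, an error of exactly one code value on every sample gives about 48.13 dB, so the "> 50 dB" threshold corresponds to an average squared error below one code value.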

PSNR & SSIM: CPU only vs CPU+GPU decoding

[Figure: decoded frame, CPU only vs CPU+GPU]

PSNR & SSIM: CPU only vs CPU+GPU decoding (iDCT output)

[Figure: iDCT output, CPU only vs CPU+GPU]

[Figure: iDCT output difference, (CPU+GPU) - (CPU only), and ratio, (CPU+GPU) / (CPU only)]

PSNR & SSIM: CPU only vs CPU+GPU decoding

PSNR
• Overall : 51.2535 dB

SSIM
• Average : 0.9967392

PSNR & SSIM: CPU only vs Reference decoding

PSNR
• Overall : 62.4848 dB

SSIM
• Average : 0.99965868

PSNR & SSIM: CPU+GPU vs Reference decoding

PSNR
• Overall : 51.3943 dB

SSIM
• Average : 0.9968016

Usage examples & benefits: Overview

Transcoding
HD Webcam
Security & Surveillance
Heavy image processing

Usage examples & benefits: Transcoding

Connect to OpenCL Scaler (MEDIASUBTYPE_AMLV)
Connect to MP4/H264 OpenCL Encoder (AML based)
Connect to MP4 muxer & file writer
Produce video for YouTube, Media Player box, iPhone, iPad, …
Almost all transcoding done on GPU
Benefit : Fast

Usage examples & benefits: HD Webcam applications (All done on GPU)

Picture processing (denoise, luma & chroma correction, …)
Face/eyes/lips tracking
Feature detection
Fun decoration / augmented reality
• 3D domain
• Pixel shaders
• Alpha blending

Video conferencing (connected to MP4/H264 OpenCL encoder)
Benefit : Fast, saves CPU, even with 1080p @ 30fps

Usage examples & benefits: Security & Surveillance applications (All done on GPU)

Source : IP Camera, “JPEG” Security Camera, Webcam
Intrusion detection
Recognition (face, car plate, …)
Benefit : Allows heavier real-time processing, saves CPU

Usage examples & benefits: Heavy image processing applications (All done on GPU)

Medical imagery
Science & research
Benefit : Allows heavy processing, saves CPU

Thank You, Questions?

Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied.