Page 1: [IEEE 2010 Data Compression Conference - Snowbird, UT, USA (2010.03.24-2010.03.26)] 2010 Data Compression Conference - Batch-Pipelining for H.264 Decoding on Multicore Systems

Batch-Pipelining for H.264 Decoding on Multicore Systems

Tang-Hsun Tu and Chih-Wen Hsueh
Graduate Institute of Networking and Multimedia
National Taiwan University, Taipei, Taiwan 621, R.O.C.
{d98944004, cwhsueh}@csie.ntu.edu.tw

Pipelining has been applied in many areas to improve performance by overlapping the execution of computing stages. However, it is difficult to apply to H.264/AVC decoding at the frame level, because the bitstreams are encoded with many dependencies and little parallelism is left to exploit. Slice-level parallelism in H.264 is intuitive, but because there is usually only one slice in a frame, it is rarely applicable. As a result, once software improvements are exhausted, much prior work resorts to hardware assistance. Fortunately, pure software pipelining can be applied to H.264/AVC decoding at the macroblock level with reasonable performance gain.

However, the pipeline stages might need to synchronize with one another, which incurs substantial extra overhead. Moreover, this overhead becomes relatively larger as the stages themselves execute faster with better hardware and software optimization. We therefore first group multiple stages into larger groups, as "batched" pipelining, to execute concurrently on multicore systems. Stages in different groups might not need to synchronize with each other, so the grouping incurs little overhead and can be highly scalable. On this basis, we propose a novel and effective batch-pipeline (BP) approach for H.264/AVC decoding on multicore systems that combines the advantages of data and function decomposition. Moreover, because of its flexibility, BP can be used with other hardware approaches or software technologies to further improve performance. To optimize our approach, we also analyze how to group the macroblocks and derive closed-form formulas to guide the grouping.
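The grouping idea can be illustrated with a minimal sketch. It is not the authors' decoder: the stage functions (`add_one`, `double`, `minus_three`, `square`) are hypothetical arithmetic placeholders standing in for decoding stages, and `run_batched_pipeline` is a name invented here. The sketch only shows the structure: each group of stages runs in one thread, and a queue handoff (the synchronization point) happens between groups rather than between every pair of stages. Because of Python's global interpreter lock, it demonstrates the organization, not the performance gain.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def compose(stages):
    """Fuse a list of stage functions into one function (one 'batch')."""
    def run(item):
        for stage in stages:
            item = stage(item)
        return item
    return run


def run_batched_pipeline(items, stage_groups):
    """Run each group of stages in its own thread.

    Items are handed off through a queue only at group boundaries,
    so fewer groups means fewer synchronization points per item.
    """
    queues = [queue.Queue() for _ in range(len(stage_groups) + 1)]

    def worker(group_fn, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate shutdown downstream
                return
            outbox.put(group_fn(item))

    threads = [
        threading.Thread(
            target=worker, args=(compose(group), queues[i], queues[i + 1])
        )
        for i, group in enumerate(stage_groups)
    ]
    for t in threads:
        t.start()

    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical placeholder stages (not real H.264 decoding steps).
def add_one(x): return x + 1
def double(x): return x * 2
def minus_three(x): return x - 3
def square(x): return x * x


STAGES = [add_one, double, minus_three, square]
# One stage per thread: three handoffs between stages per item.
unbatched = run_batched_pipeline(range(5), [[s] for s in STAGES])
# Two batches of two stages: a single handoff between stages per item.
batched = run_batched_pipeline(range(5), [STAGES[:2], STAGES[2:]])
```

Both configurations produce the same results in the same order, since each group is a single FIFO worker; only the number of queue handoffs per item differs.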

We conduct experiments on a variety of bitstreams to verify our approach, as shown in the following table. The results show a speedup of up to 93% over a published optimized H.264 decoder, reaching up to 249 and 70 FPS for 720P and 1080P resolutions, respectively, on a 4-core machine. We believe our batch-pipelining approach creates a new and effective direction for multimedia software codec development. The detailed paper can be found at http://www.csie.ntu.edu.tw/~cwhsueh/papers/BPh264 2010 DCC.pdf.

No.  Name       Res.   Size(M)/#   i%    OPT  PD   xBP   0BP   BP-i  BP   PD+BP  All(+UD)
01.  Cornell    1080P  109.7/3598  10.5  38   47%  -14%  6%    13%   14%  73%    69 (83%)
02.  Artbeats   1080P  115.6/2850  33.9  31   44%  -12%  0%    -5%   3%   57%    50 (61%)
03.  BBC-CFB    1080P  44.4/2433   21.8  36   55%  -11%  7%    5%    7%   80%    70 (93%)
04.  Shark      1080P  81.3/1801   20.8  27   42%  4%    8%    8%    16%  71%    50 (82%)
05.  Harbour    720P   9.5/300     1.7   48   20%  22%   9%    24%   26%  80%    87 (81%)
06.  Night      720P   6.6/300     9.2   64   24%  -4%   2%    9%    18%  67%    112 (74%)
07.  Jets       720P   0.9/300     5.2   142  22%  -42%  -8%   -4%   12%  59%    249 (75%)
08.  Harbour    480P   5.6/300     5.2   101  15%  21%   3%    9%    21%  56%    168 (65%)
09.  Crew       480P   3.1/300     33.5  133  21%  -1%   -10%  -8%   11%  36%    215 (62%)
10.  Sailormen  480P   3.4/300     8.0   140  20%  1%    -7%   4%    24%  54%    239 (71%)
11.  Night      480P   3.0/230     22.0  130  21%  -4%   -10%  -5%   10%  32%    197 (52%)
12.  Mobile     CIF    2.4/300     0.5   299  10%  5%    -17%  1%    17%  22%    427 (43%)
13.  Football   CIF    1.7/260     32.4  348  7%   -14%  -24%  -21%  5%   5%     460 (32%)
14.  Bus        CIF    1.1/150     8.3   344  5%   -21%  -34%  -26%  10%  -4%    477 (39%)
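The abstract does not define the table columns, so the following check rests on an assumption: that OPT is the baseline decoder's frame rate in FPS, and that the All(+UD) column gives the final frame rate with its speedup over OPT in parentheses. Under that reading, the percentage should be roughly (All / OPT - 1) x 100. The short sketch below recomputes it from the table; because the published FPS values are rounded to integers, only approximate agreement can be expected. The names `ROWS` and `speedup_percent` are introduced here for illustration.

```python
# (name, resolution, OPT fps, All(+UD) fps, reported percent), copied from the table.
ROWS = [
    ("Cornell", "1080P", 38, 69, 83),
    ("Artbeats", "1080P", 31, 50, 61),
    ("BBC-CFB", "1080P", 36, 70, 93),
    ("Shark", "1080P", 27, 50, 82),
    ("Harbour", "720P", 48, 87, 81),
    ("Night", "720P", 64, 112, 74),
    ("Jets", "720P", 142, 249, 75),
    ("Harbour", "480P", 101, 168, 65),
    ("Crew", "480P", 133, 215, 62),
    ("Sailormen", "480P", 140, 239, 71),
    ("Night", "480P", 130, 197, 52),
    ("Mobile", "CIF", 299, 427, 43),
    ("Football", "CIF", 348, 460, 32),
    ("Bus", "CIF", 344, 477, 39),
]


def speedup_percent(base_fps, new_fps):
    """Relative speedup of new_fps over base_fps, in percent.

    Assumption: this is how the parenthesized figures were derived.
    """
    return (new_fps / base_fps - 1.0) * 100.0


# Difference between the recomputed and the reported percentage, per row.
deviations = [
    speedup_percent(opt, final) - reported
    for _, _, opt, final, reported in ROWS
]

for (name, res, opt, final, reported), dev in zip(ROWS, deviations):
    print(f"{name:10s} {res:6s} reported {reported:3d}%  "
          f"recomputed {reported + dev:5.1f}%")
```

Every recomputed value lands within a few percentage points of the reported one, which is consistent with the assumed reading and with rounding of the FPS figures.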


1068-0314/10 $26.00 © 2010 IEEE

DOI 10.1109/DCC.2010.57

