optimizing a parallel video encoder with message passing and a shared memory architecture

20
Optimizing a Parallel Video Encoder with Message Passing and a Shared Memory Architecture GU Junli SUN Yihe 1

Upload: akiko

Post on 22-Feb-2016

40 views

Category:

Documents


1 download

DESCRIPTION

Optimizing a Parallel Video Encoder with Message Passing and a Shared Memory Architecture. GU Junli SUN Yihe. Outline. Introduction & Related work Parallel encoder implementation Test results and Analysis Conclusions. Introduction & Related work #1. Parallel processing Real time - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

1

Optimizing a Parallel Video Encoder with Message Passing and a Shared

Memory ArchitectureGU Junli

SUN Yihe

Page 2: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

2

Introduction & Related work Parallel encoder implementation Test results and Analysis Conclusions

Outline

Page 3: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

3

Parallel processing◦Real time

Parallel processing type◦Cluster[5], MPP[4]◦ Shared memory[6]

Introduction & Related work #1

Page 4: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

4

MPI (message passing interface)◦Communicate by passing message

Inefficient Shared memory◦ Share the same data space

Efficient

Introduction & Related work #2

Page 5: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

5

Most MPI codes adopt master-slave standard which has one master and couples of slaves to do different jobs.◦Workload imbalance◦Communication cost is high

On a typical shared memory CMP◦ Each code has a private L1 cache◦ Shared a large L2 cache

Introduction & Related work #3

Page 6: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

6

Balanced parallel scheme◦A strip-wise balanced parallel scheme

Parallel encoder implementation #1

Page 7: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

7

◦ Each process take one strip.◦ Each strip contains a number of slices

Sn = Frame_size/P◦ If Sn is not integer -> workload problem

Data dependency◦Message passing

Parallel encoder implementation #2

Page 8: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

8

Hybrid communication

Parallel encoder implementation #3

Page 9: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

9

Hybrid communication◦Combine MPI and shared memory

To reduce the communication cost◦ Ex. It takes 54.5ms to read a file and send the data to others

process by MPI but 9ms by shared memory. The memory allocation scheme has one global shared

memory area to store the original video data from where all processes read the original strip data.

Parallel encoder implementation #4

Page 10: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

10

Three dedicated memory spaces kept by each process including one for original data, a second for reconstructed data and the last for up-sampled data.

Parallel encoder implementation #5

Page 11: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

11

Environment◦ Two Intel Xeon E5310 @1.6 GHz processors, each with 4

cores. Test case◦HD, VGA, SD, CIF and QCIF

Version◦H264 JM10.2

Test results and Analysis #1

Page 12: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

12

Test results and Analysis #2

Page 13: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

13

25% higher speed improvement for the shared memory architecture as Compared to the case of cluster[5].

Test results and Analysis #3

Page 14: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

14

Test results and Analysis #4

Page 15: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

15

Test results and Analysis #5

0.2

Page 16: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

16

Test results and Analysis #6

Page 17: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

17

Test results and Analysis #7

Page 18: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

18

Test results and Analysis #8

Page 19: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

19

Test results and Analysis #9

Page 20: Optimizing a Parallel Video Encoder with Message Passing and  a  Shared Memory Architecture

20

Upgrading legacy MPI applications to the class of shared memory architectures can provide significant performance improvements.

Optimizing the communication mechanism and further enhancements to the hybrid shared-memory and message-passing multi-core processor design can be expected to raise performance to still higher levels.

Conclusion #1