Post on 21-Jan-2018

Containerizing Hardware Accelerated Applications

Chelsea Mafrica

Data Center Systems Engineer, Intel Corporation

Motivation

Evaluate the performance impact of containers on a media stack that uses hardware acceleration

Agenda

● Hardware accelerators, applications, and media

● Media stack

● When & how to use containers

● Experiment & results

● Portability

Hardware accelerators, applications, and media

A hardware accelerator is a processor or fixed-function unit specialized to perform specific tasks (excluding a general-purpose CPU)

Examples: GPUs, FPGAs, ASICs

Applications that typically benefit from hardware acceleration are ones that can be parallelized

Examples: AI, machine learning, HPC, media

Media refers to video processing

Examples: Video compression and decompression (encode and decode), filters

Media stack

[Diagram: layered media stack. Labels: SERVER, GPU, KERNEL, DRIVER, USER SPACE, LIBS, APP (three instances)]

Components: Transcode applications, Intel® Media Server Studio, Intel® Quick Sync Video, Intel® Iris® Pro Graphics

Media stack with Docker

[Diagram: layered media stack with one app in one container. Labels: SERVER, GPU, KERNEL, DRIVER, USER SPACE, CONTAINER ENGINE, CONTAINER, LIBS, APP]

Components: Transcode applications, Intel® Media Server Studio, Intel® Quick Sync Video, Intel® Iris® Pro Graphics, Docker

Software-only app

[Diagram: layered stack without a hardware accelerator. Labels: SERVER, KERNEL, USER SPACE, CONTAINER ENGINE, CONTAINER, LIBS, APP]

Components: Applications, Libraries & dependencies, Docker

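Media stack with Docker

[Diagram: layered media stack with three apps in one container. Labels: SERVER, GPU, KERNEL, DRIVER, USER SPACE, CONTAINER ENGINE, CONTAINER, LIBS, APP (three instances)]

Components: Transcode applications, Intel® Media Server Studio, Intel® Quick Sync Video, Intel® Iris® Pro Graphics, Docker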

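Media stack with Docker

[Diagram: layered media stack with two containers, each holding its own LIBS and APP. Labels: SERVER, GPU, KERNEL, DRIVER, USER SPACE, CONTAINER ENGINE, CONTAINER (two instances), LIBS, APP]

Components: Transcode applications, Intel® Media Server Studio, Intel® Quick Sync Video, Intel® Iris® Pro Graphics, Docker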

Host requirements

• Kernel module installation

• Custom kernel build

$ ls /dev/dri
card0 card1 controlD64 controlD65 renderD128
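The directory listing above shows the render node (renderD128) that the container is later given access to. A minimal sketch of a pre-flight check for that requirement follows. The function name list_render_nodes is hypothetical and not part of the slides, and the check only tests that an entry exists; a stricter version could use `[ -c ... ]` to require a character device.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): list DRM render nodes.
# Usage: list_render_nodes [DIR]   (DIR defaults to /dev/dri)
list_render_nodes() {
    dri_dir="${1:-/dev/dri}"
    for node in "$dri_dir"/renderD*; do
        # An unmatched glob stays literal, so confirm the entry exists.
        [ -e "$node" ] && printf '%s\n' "$node"
    done
    # Always succeed; callers decide what an empty list means.
    return 0
}
```

If the function prints nothing on the host, the kernel module and custom kernel requirements are probably not met, and there is no device to pass to the container.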

Dockerfile

FROM centos:7.2.1511
MAINTAINER Chelsea Mafrica <chelsea.e.mafrica@intel.com>

# Copy the Media Server Studio package and the sample transcode app into the image
COPY intel-linux-media_generic_16.5.1-59511_64bit.tar.gz sample_multi_transcode /root/

# Install the Mesa DRI drivers, add a non-root user to the video group,
# replace the distribution libdrm and libva with the packaged versions,
# and place the sample app in the user's home directory
RUN yum -y -t install mesa-dri-drivers && \
    yum clean all && \
    useradd user && \
    usermod -a -G wheel user && \
    usermod -a -G video user && \
    find /usr -name "libdrm*" | xargs rm -rf && \
    find /usr -name "libva*" | xargs rm -rf && \
    cd /root && \
    tar -xvf intel-linux-media_generic_16.5.1-59511_64bit.tar.gz && \
    cp -r etc/* /etc && \
    cp -r lib/* /lib && \
    cp -r opt/* /opt && \
    cp -r usr/* /usr && \
    cp sample_multi_transcode /home/user && \
    chown user:user /home/user/sample_multi_transcode && \
    rm -rf *

WORKDIR /home/user

Building and running the container

docker build -t mss:centos.transcode .

docker run --device=/dev/dri/renderD128 \
    --volume=/home/user/volume/mss_content:/home/user/content \
    -i -d mss:centos.transcode bash

docker exec CONTAINER_ID su - user -c \
    "./sample_multi_transcode -i::h264 content/video_input.264 \
    -o::h264 content/video_output.264"
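The exec command uses a CONTAINER_ID placeholder. Because `docker run -d` prints the new container's ID on standard output, the two steps can be chained. The sketch below is one way to do that, under stated assumptions: the function name run_transcode and the DOCKER override variable are hypothetical (the override exists only so the function can be exercised without a Docker daemon), while the flags and the transcode command line are taken from the slide.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the slides): start a container and run one transcode in it.
# Usage: run_transcode IMAGE CONTENT_DIR
# DOCKER is an assumed override for the docker binary; it defaults to "docker".
run_transcode() {
    image="$1"
    content_dir="$2"
    # "docker run -d" prints the container ID, which is captured here.
    cid=$("${DOCKER:-docker}" run --device=/dev/dri/renderD128 \
        --volume="$content_dir":/home/user/content \
        -i -d "$image" bash) || return 1
    # Run the sample transcode as the non-root user created in the image.
    "${DOCKER:-docker}" exec "$cid" su - user -c \
        "./sample_multi_transcode -i::h264 content/video_input.264 -o::h264 content/video_output.264"
}
```

For example, `run_transcode mss:centos.transcode /home/user/volume/mss_content` would reproduce the run and exec steps shown above.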

Experiment

Test how many transcodes can run on a system before the average performance of a transcode drops below 30 frames per second
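The pass criterion for a given number of transcodes can be sketched as a small check on the measured frame rates. The slides do not describe how the per-transcode numbers were collected, so the input format here (one fps value per line on standard input) and the function names average_fps and meets_realtime are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not from the slides).

# Print the mean of the numbers on stdin (one per line), to two decimals.
average_fps() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Succeed if the mean fps on stdin is at least the threshold (default 30).
meets_realtime() {
    threshold="${1:-30}"
    awk -v t="$threshold" '{ sum += $1; n++ } END { exit !(n > 0 && sum / n >= t) }'
}
```

Under this sketch, the capacity of a system is the largest number of simultaneous transcodes for which meets_realtime still succeeds.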

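[Diagram: three test configurations, each running APP1 through APPN on one HOST. Baseline: apps run directly on the host. Single container case: all apps run in one CONTAINER. Multiple container case: apps run in CONTAINER1 through CONTAINERN.]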
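For the multiple container case, one container is started per app. The sketch below only prints the commands rather than running them, so it can be inspected first. It assumes that every container is given the same render node and content volume as in the single-container run command; the function name print_multi_container_cmds and the transcodeN container names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): print one "docker run" per container.
# Usage: print_multi_container_cmds COUNT IMAGE CONTENT_DIR
print_multi_container_cmds() {
    count="$1"
    image="$2"
    content_dir="$3"
    i=1
    while [ "$i" -le "$count" ]; do
        # Assumption: all containers share the host's renderD128 node.
        printf 'docker run --name transcode%s --device=/dev/dri/renderD128 --volume=%s:/home/user/content -i -d %s bash\n' \
            "$i" "$content_dir" "$image"
        i=$((i + 1))
    done
}
```

Piping the output to `sh` would start the containers; each transcode can then be launched with `docker exec` against the matching container name.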

Observations

● Variability in container startup time as the system reaches capacity

● Running in detached mode, negligible change in performance

Transcode Performance

[Chart: frames per second (y-axis) versus number of apps (x-axis), with a reference line at real-time (30 fps)]

Legal Disclaimer: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance. *Other names and brands may be claimed as the property of others. See backup for configuration details.

Observations (continued)

● Portability is limited due to driver and hardware requirements

Summary

● Running accelerated apps in containers uses existing Docker capabilities

● Using containers made a negligible performance difference for transcode apps in the capacity test

● Containers help reduce conflicts with the host, but this isn’t specific to hardware accelerators

● Dependency on hardware and custom kernels limits the portability of the container, but the app performs better because of the hardware

Links & current work

Intel® Media Server Studio: http://intel.ly/MediaServerStudio

Intel® Media SDK: http://github.com/Intel-Media-SDK

Intel® OTC: http://github.com/vmmqa/dockerGpuStack

Twitter: mafrica_

Email: chelsea.e.mafrica at intel dot com

Legal Information

Testing by Chelsea Mafrica, January 2017 – June 2017

System Configuration:

BASELINE: Intel® Xeon® CPU E3-1585L v5, 3.5GHz, 4 cores, turbo and HT on, BIOS AMI 1.0, 32GB total memory, 2 slots / 16GB / 2133MHz / DDR4 DIMM, 480GB total storage / 2 240GB SSDs (2.5”), Intel® I350 Gigabit Network Connection, CentOS Linux* 7.2.1511 kernel 3.10.0-327.13.1.x86_64, Media Server Studio 2017 R1

NEW: Baseline configuration, Docker* 1.12.3

