
Shaping open source compute ecosystem with NEO and compute libraries
Michal Mrozek, Intel® Corp.

International Workshop on OpenCL, 14–16 May 2018

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

IWOCL '18, May 14–16, 2018, Oxford, United Kingdom
© 2018 Copyright is held by the owner/author(s).
ACM ISBN 978-1-4503-6439-3/18/05.
https://doi.org/10.1145/3204919.3207895

Open Source Software Stack

[Diagram: software stack — Application → Compute Libraries → NEO Compute Runtime (with Graphics Compiler and Graphics Memory Management) → Linux Kernel Mode Driver → GPU.]


• All software components are available in open source.

• The driver is released every week as a validated package.

• Everything works on open source kernels, without any custom patches.

• A single unified driver supports all BDW+ Intel® GPU devices.

• A new, enhanced OpenCL™ driver.

Open Source

We are open for feedback and contributions!

https://github.com/intel/compute-runtime

Out of Order Queue accelerates Neural Networks

[Chart: "Gain with Out Of Order Queue", Iris Pro Graphics 580 — relative performance of In Order Queue vs. Out Of Order Queue (y-axis 0.6–1.5) for GoogleNet b1, GoogleNet b2, GoogleNet2 b1, GoogleNet2 b2, SSD-GoogleNet2 b8, and InceptionResnetV2 b1.]

[Diagram: GoogleNet Inception3a module — the Inception2 output feeds four independent branches (1x1, 3x3 reduction, 5x5 reduction, pool), followed by the 3x3, 5x5, and pool_proj kernels, which merge into the Inception3a output.]

• Some topologies contain steps consisting of multiple independent compute pieces.

• For small batches, those kernels may underutilize the GPU if they are executed sequentially.

• clDNN utilizes one Out Of Order Queue and groups independent kernels together (see the sketch after this list).

• clEnqueueBarrierWithWaitList with no events ensures that all previous commands are completed.

• With an Out Of Order Queue, the driver can remove synchronization points between kernels, and the GPU can execute them together in one shot.

• That gives performance boosts of up to 40%.
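A minimal sketch of this pattern in OpenCL host code, assuming a valid context and device and four pre-built kernels for the independent Inception3a branches (the helper name and work sizes are illustrative):

#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>

// Enqueue four independent kernels to one out-of-order queue, then place a
// barrier with an empty wait list so the next layer starts only after all
// previous commands have completed.
void enqueue_inception_group(cl_context ctx, cl_device_id dev,
                             cl_kernel branch[4], size_t gws[4]) {
    cl_queue_properties props[] = {
        CL_QUEUE_PROPERTIES, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, 0};
    cl_int err = CL_SUCCESS;
    cl_command_queue q =
        clCreateCommandQueueWithProperties(ctx, dev, props, &err);

    // No event dependencies between the branches: the driver may remove
    // synchronization points and let the GPU execute them in one shot.
    for (int i = 0; i < 4; ++i)
        clEnqueueNDRangeKernel(q, branch[i], 1, nullptr, &gws[i],
                               nullptr, 0, nullptr, nullptr);

    // Barrier with no events: all previous commands must complete before
    // anything enqueued after it begins.
    clEnqueueBarrierWithWaitList(q, 0, nullptr, nullptr);

    clFinish(q);
    clReleaseCommandQueue(q);
}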

Unleash the power & performance of Intel® Graphics with Compute Libraries and NEO driver!

Compute Library for Deep Neural Networks (clDNN)

[Diagram: two usage paths — (1) developers use the DL Deployment Toolkit / CV SDK: trained models (Caffe, TensorFlow, MXNet) pass through the Model Optimizer to the Inference Engine, which runs on clDNN; (2) developers use clDNN directly. Both paths execute on the GPU through the NEO Compute Runtime.]

• clDNN contains kernel primitives optimized for DNN inference acceleration on Intel® SKL+ GPU devices (see the sketch after this list).

• It supports most of the commonly known, latest neural network topologies.

• It is delivered with the Intel® Deep Learning (DL) Deployment Toolkit (DT), which supports Caffe, TensorFlow, and MXNet models.

• The Intel® DL DT contains the Model Optimizer, which converts and optimizes trained models; the Inference Engine then executes those models on the GPU using clDNN.
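A minimal sketch of path (2), building and running a tiny network directly with the clDNN C++ API. The type names follow the samples in the intel/clDNN repository, but treat the exact headers, the tensor constructor, and the layout arguments as assumptions that may differ between releases:

#include <api/CPP/engine.hpp>
#include <api/CPP/input_layout.hpp>
#include <api/CPP/memory.hpp>
#include <api/CPP/network.hpp>
#include <api/CPP/softmax.hpp>
#include <api/CPP/topology.hpp>

int main() {
    // The engine selects the Intel GPU through the OpenCL driver (NEO).
    cldnn::engine engine;

    // One input of 10 FP32 values (batch 1), bfyx layout (assumed shape).
    cldnn::layout in_layout(cldnn::data_types::f32, cldnn::format::bfyx,
                            cldnn::tensor(1, 1, 10, 1));

    // Topology: an input followed by a softmax primitive.
    cldnn::topology topology(
        cldnn::input_layout("input", in_layout),
        cldnn::softmax("prob", "input"));

    // Building the network compiles the OpenCL kernels for this topology.
    cldnn::network network(engine, topology);

    // Allocate input memory, run inference, fetch the output primitive.
    auto input_mem = cldnn::memory::allocate(engine, in_layout);
    network.set_input_data("input", input_mem);
    auto outputs = network.execute();
    auto prob = outputs.at("prob").get_memory();
    (void)prob;
    return 0;
}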

Intel® Compute Libraries for GPU

• An extensible repository-framework for compute functions.

• The first delivered part is Basic Linear Algebra Subprograms (BLAS) Levels 1, 2, and 3, real and complex, FP32 (an illustrative call sketch follows this list).

• This is a prototype with ongoing development.

• Work on other libraries (e.g. FFT, SPARSE) is currently in progress.
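For illustration only, a sketch of what a Level 3 call (single-precision GEMM) could look like. The header, handle type, and entry points below (iclBLAS.h, iclblasCreate, iclblasSgemm, iclblasDestroy) are hypothetical, modeled on a cuBLAS-style C interface; the actual API lives in the intel/clGPU repository:

#include <iclBLAS.h>  // hypothetical header name

// C = alpha * A * B + beta * C on the GPU, column-major, no transposition.
void sgemm_example(float* A, float* B, float* C, int m, int n, int k) {
    iclblasHandle_t handle;   // hypothetical handle type
    iclblasCreate(&handle);   // assumed to set up the OpenCL context/queues

    const float alpha = 1.0f, beta = 0.0f;
    iclblasSgemm(handle, ICLBLAS_OP_N, ICLBLAS_OP_N,
                 m, n, k, &alpha, A, m, B, k, &beta, C, m);

    iclblasDestroy(handle);
}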

[Diagram: Compute Libraries framework — BLAS (with FFT and SPARSE to follow) and a Primitive Database on top of an Executive Framework, layered over an OpenCL™ adapter, the NEO Compute Runtime, and the GPU.]

Driver N:1 architecture enables efficient aggregation

[Diagram: commands Cmd1–Cmd10, submitted across In Order Queue 1, In Order Queue 2, and Out Of Order Queue 3, are merged by the NEO Compute Runtime into a single Aggregated Command Buffer executed on the GPU.]

• Create at least one out-of-order queue to enable the aggregation mechanism, and use it to submit tasks without dependencies (see the sketch after this list).

• Prefer a round-robin enqueue sequence.

• Avoid user events.
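A minimal sketch of these guidelines, assuming an existing context, device, and a set of independent kernels; the helper name and the queue count are illustrative:

#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>
#include <vector>

// Two in-order queues plus one out-of-order queue; at least one out-of-order
// queue enables the driver's aggregation mechanism (N queues : 1 aggregated
// command buffer). Work is submitted in a round-robin sequence, with no
// event dependencies and no user events.
void submit_round_robin(cl_context ctx, cl_device_id dev,
                        const std::vector<cl_kernel>& kernels, size_t gws) {
    cl_queue_properties ooq_props[] = {
        CL_QUEUE_PROPERTIES, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, 0};
    cl_int err = CL_SUCCESS;

    cl_command_queue queues[3] = {
        clCreateCommandQueueWithProperties(ctx, dev, nullptr, &err),
        clCreateCommandQueueWithProperties(ctx, dev, nullptr, &err),
        clCreateCommandQueueWithProperties(ctx, dev, ooq_props, &err)};

    // Round-robin across the queues, no wait lists, no output events.
    for (size_t i = 0; i < kernels.size(); ++i)
        clEnqueueNDRangeKernel(queues[i % 3], kernels[i], 1, nullptr, &gws,
                               nullptr, 0, nullptr, nullptr);

    for (cl_command_queue q : queues) {
        clFinish(q);
        clReleaseCommandQueue(q);
    }
}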


Would you like to know MORE? https://github.com/intel/clDNN

Feel free to communicate with us: https://github.com/intel/clGPU

Check out our samples: https://github.com/intel/compute-samples