Introduction to machine learning on FPGAs Arthur Ruder ¦ Enclustra GmbH ¦ AI seminar EPFL Lausanne & ZHAW Winterthur ¦ 19 & 21/11/2019



Quick reminder: neural network

[Figure: neural network with an input layer (e.g. pixels), hidden layer 1, hidden layer 2 and an output layer (e.g. probability); one neuron is shown with inputs x1, x2, x3, weights w1, w2, w3 and activation a]
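The neuron in the figure can be written out in a few lines. This is a minimal sketch, assuming the usual convention that the activation a is a nonlinear function of the weighted sum of the inputs. The bias term, the ReLU nonlinearity and the numeric values are illustrative assumptions and are not taken from the slide.

```python
def relu(z):
    """Rectified linear unit, one common choice of activation function."""
    return max(0.0, z)


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a nonlinearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


# Hypothetical values for x1..x3 and w1..w3
x = [0.5, -1.0, 2.0]
w = [0.8, 0.2, 0.1]
a = neuron(x, w)  # 0.4 - 0.2 + 0.2, passed through ReLU
```

A full layer repeats this computation for every neuron, which is why multiply-accumulate (MAC) operations dominate the workload.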

Machine learning concepts: training phase

• Goal: obtain trained weights

[Figure: the training set is fed as input through the untrained network by forward-propagation. The output is a classification probability, e.g. 40 % dog, 60 % cat. But the label says 100 % dog, so the error is passed back through the network by back-propagation.]
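The mismatch between the network output and the label can be turned into a number that back-propagation then tries to reduce. The sketch below uses the values from the slide and assumes a cross-entropy loss with a softmax output layer; that is a common choice for classification, but the slide does not name a loss function.

```python
import math


def cross_entropy(predicted, label):
    """Cross-entropy between a predicted distribution and a one-hot label."""
    return -sum(y * math.log(p) for p, y in zip(predicted, label) if y > 0)


# Values from the slide, ordered as [dog, cat]
predicted = [0.40, 0.60]
label = [1.0, 0.0]

loss = cross_entropy(predicted, label)

# Assumption: softmax output layer. In that case the gradient of the loss
# with respect to the pre-softmax scores is simply (predicted - label).
grad = [p - y for p, y in zip(predicted, label)]
```

The gradient is negative for "dog" and positive for "cat", so a training step raises the dog score and lowers the cat score.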

Machine learning concepts: inference

[Figure: inputs (e.g. photographs) pass through the trained network by forward-propagation. Outputs: classification probability, e.g. 99.07 % dog, 0.93 % cat.]

Quick reminder: Deep Learning

[Figure: classification error [%] (axis from 0 to 30) of the image recognition challenge winner per year, 2010 to 2015, compared with human error. The early winners are shallow; AlexNet has 8 layers, VGG 19 layers, GoogleNet 22 layers and ResNet 152 layers.]


Hardware platform

What hardware do we need for this? CPUs, GPUs, FPGAs, ASICs?

• What are the requirements for…?

  a) training

  b) inference

• What type of hardware is best suited for each task?


Neural network training: computational complexity

For one picture: image classification

[Figure: labelled data is passed through an untrained neural network (ResNet50) by forward-propagation. Result: 50 % cat, 50 % dog. Label: 100 % dog. The error is passed back by back-propagation.]

• 7.7 billion operations, ~35 MB parameter storage

• 23 billion operations, ~380 MB parameter storage

* for forward propagation only, backward propagation similar


Neural network training: computational complexity

For the whole training process:

[Figure: ImageNet (1.2 million pictures) is passed through ResNet50 by forward-propagation and back-propagation. Result?]

• 7.7 billion operations, ~35 MB parameter storage

• 23 billion operations, ~380 MB for parameter storage

• 1 epoch: $1.2\,\mathrm{M} \times 30.7\,\mathrm{B} \approx 37 \times 10^{15}$ operations (majority MAC), where 30.7 B = 7.7 B + 23 B operations per picture

• ResNet50 needs 100 epochs for training…
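The arithmetic on this slide can be checked and extended to the full training run. The operation counts, data set size and epoch count below come from the slides; the sustained throughput of 10 tera-operations per second is a purely hypothetical figure, chosen only to turn the operation count into a duration.

```python
# Figures from the slides
OPS_PER_PICTURE = 7.7e9 + 23e9      # 30.7 billion operations per picture
PICTURES_PER_EPOCH = 1.2e6          # ImageNet training set
EPOCHS = 100                        # quoted for ResNet50

ops_per_epoch = PICTURES_PER_EPOCH * OPS_PER_PICTURE   # about 3.7e16
ops_total = EPOCHS * ops_per_epoch                     # about 3.7e18

# Hypothetical device: 10 tera-operations per second, sustained (assumption)
ASSUMED_OPS_PER_SECOND = 10e12
training_days = ops_total / ASSUMED_OPS_PER_SECOND / 86_400

print(f"{ops_per_epoch:.3e} operations per epoch")
print(f"{ops_total:.3e} operations in total")
print(f"{training_days:.1f} days at the assumed throughput")
```

Even at this assumed rate, the run takes several days, which fits the observation that training is typically not time-critical but needs very high floating point throughput.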


Requirements breakdown: training

• Typically not time-critical

• Compute billions of floating point calculations

• Handle large data sets (GBs to hundreds of GBs)

• Flexibility to train a wide variety of neural networks

Clear answer (for now): GPUs do the heavy lifting of neural network training


Requirements: inference

• Edge requirements

  • Low (deterministic) latency (e.g. real-time object detection)

  • Power efficiency (limited battery capacity)

  • Sensor fusion (e.g. industrial surveillance)

  • Robustness (e.g. temperature)

• Cloud requirements

  • Low latency, e.g. search engines

  • Power efficiency (heat dissipation/cooling cost)


Resource requirements overview

[Figure: resource requirements by task: image classification, object detection, semantic segmentation, OCR and speech recognition]

Main takeaway points:

• Inference is challenging

• Huge variation in compute and memory requirements (even within subgroups)

• Models typically don’t fit into local memory


Inference Accelerator

Architectural challenges

[Figure: block diagram of an inference accelerator between input and result, with the blocks DMA, External memory, Buffer, Weight Buffer, Compute Array, Partial Sums and Activation Functions, …. Annotated challenges: huge amount of computations; memory bandwidth (marked in two places).]


Qualitative hardware comparison

[Figure: hardware platforms positioned qualitatively along two axes, Performance & Power Efficiency and Flexibility & Ease of Use]


Qualitative hardware comparison

[Table: GPU, FPGA and ASIC rated against each requirement; the ratings are shown graphically on the slide]

Requirements:

• Low (deterministic) latency

• High throughput

• Power efficiency

• Sensor fusion

• Robustness

• Programmability

• Flexibility

• Ease-of-use

• (Development) cost


FPGA ML workflow

Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy

[Figure: Trained network (floating point model, FP32) → Compression: Pruning → Pruned network]
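The pruning step can be illustrated with a small example. The slide does not say which pruning criterion is used; the sketch below assumes magnitude-based pruning, a common approach in which the weights with the smallest absolute value are set to zero. The function name, the weight values and the 50 % sparsity target are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical FP32 weights of one layer
weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.4, -0.1]
pruned = magnitude_prune(weights, sparsity=0.5)
# pruned == [0.9, 0.0, 0.3, 0.0, -0.7, 0.0, 0.4, 0.0]
```

Zeroed weights need neither storage nor a multiplication, which is where the resource saving comes from. In practice the pruned network is usually retrained for a few epochs to recover accuracy.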

Quick digression


FPGA ML workflow

Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy

[Figure: Trained network (floating point model, FP32) → Compression (Pruning → Pruned network → Quantization) → Compilation → FPGA implementation (fixed point)]
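The quantization step maps floating point values onto a small set of integers. The slide does not specify a scheme; the sketch below assumes symmetric linear quantization to 8-bit signed integers with one shared scale factor, which is one common way to obtain a fixed point representation. The function names and weight values are hypothetical.

```python
def quantize(values, num_bits=8):
    """Symmetric linear quantization to signed integers with a shared scale."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate floating point values."""
    return [qi * scale for qi in q]


# Hypothetical floating point weights of one layer
weights = [0.9, 0.0, 0.3, 0.0, -0.7, 0.0, 0.4, 0.0]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now occupies 8 bits instead of 32, and the rounding error per weight is bounded by half a quantization step (`scale / 2`). Whether that error is acceptable has to be checked against the accuracy of the complete network.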


Impact of compression

https://www.hotchips.org/hc30/0tutorials/T2_Part_2_Song_Hanv3.pdf

Compression allows using significantly fewer resources when deploying a neural network, with minimal impact on network accuracy


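Hardware implementation architectures

• Streaming architecture

[Figure: a host (CPU, memory) connected to an FPGA that contains a chain of dedicated layer blocks: CONV, POOL, CONV, FC]

• Single computation engine

[Figure: a host (CPU, memory) connected to an FPGA that contains a control unit, DMA and one engine with CONV/FC, NL and POOL units. The layers of the network (CONV layer, activation, pool, CONV layer, activation, FC) are processed on this engine.]

[Table: properties rated for the streaming architecture and the single computation engine: customizability, flexibility, power efficiency; the ratings are shown graphically on the slide]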
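The difference between the two architectures can be mimicked in software. This is only an analogy under simplifying assumptions: the layer functions are toy stand-ins, and all names are hypothetical. In the streaming variant every layer has its own stage and data flows through the chain; in the single-engine variant one shared unit is scheduled layer by layer.

```python
def conv(x):
    """Toy stand-in for a convolution layer."""
    return [2 * v for v in x]


def pool(x):
    """Toy stand-in for a pooling layer: maximum over pairs."""
    return [max(x[i:i + 2]) for i in range(0, len(x), 2)]


def fc(x):
    """Toy stand-in for a fully connected layer."""
    return [sum(x)]


LAYERS = [conv, pool, conv, fc]  # CONV, POOL, CONV, FC as in the figure


def streaming(frames):
    """Streaming architecture: one dedicated stage per layer, chained."""
    def stage(layer, upstream):
        # Frames are pulled through the chain one at a time; in hardware
        # the stages would run concurrently as a pipeline.
        for x in upstream:
            yield layer(x)

    stream = iter(frames)
    for layer in LAYERS:
        stream = stage(layer, stream)
    return list(stream)


class SingleEngine:
    """Single computation engine: one shared unit, configured per layer."""

    def run(self, op, x):
        return op(x)


def single_engine(frames):
    engine = SingleEngine()
    results = []
    for x in frames:
        # The control loop schedules the layers one after another on the
        # same engine, keeping the intermediate result in between.
        for layer in LAYERS:
            x = engine.run(layer, x)
        results.append(x)
    return results


frames = [[1, 2, 3, 4], [4, 3, 2, 1]]
assert streaming(frames) == single_engine(frames)
```

Both variants compute the same result. What differs in hardware is how resources are used: dedicated stages can be tailored to each layer, while a shared engine can run any network that its operations cover.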

Toolchains for AI on FPGAs

Columns: Edge (computer vision, language processing) and Cloud (computer vision, language processing); "-" marks no entry

• Xilinx: DNNDK (Deep Neural Network Development Kit) for edge computer vision; - for edge language processing; ML (Machine Learning) Suite for cloud

• Intel: - for edge; OpenVINO for cloud

• Omnitek: DPU (Deep Learning Processing Unit) + software framework

• Lattice: sensAI for edge; - for cloud



Summary

• Neural network inference is viable on FPGAs

• Low power (~mW – W)

• Sensor integration

• Flexibility

• Low deterministic latency

• Edge and cloud examples

  • Xnor.ai: solar powered person detection

  • CERN: sensor data filtering and classification

  • Microsoft: Azure cloud AI