
Robust and Energy-Efficient Deep Learning Systems

Muhammad Abdullah Hanif¹ (Ph.D. Candidate), Muhammad Shafique² (Advisor)
¹ Technische Universität Wien (TU Wien), Vienna, Austria
² Division of Engineering, New York University Abu Dhabi (NYUAD), Abu Dhabi, UAE

PHD FORUM

Selected Publications

[DAC’17] M. A. Hanif, R. Hafiz, O. Hasan, M. Shafique, “QuAd: Design and analysis of quality-area optimal low-latency approximate adders”, Design Automation Conference (DAC), pp. 1–6, 2017. Received a HiPEAC Paper Award.

[DATE’18] M. A. Hanif, R. Hafiz, M. Shafique, “Error resilience analysis for systematically employing approximate computing in convolutional neural networks”, DATE, pp. 913–916, 2018.

[DAC’19] M. A. Hanif, F. Khalid, M. Shafique, “CANN: Curable approximations for high-performance deep neural network accelerators”, DAC, pp. 1–6, 2019. Received a HiPEAC Paper Award.

[DAC’20] M. A. Hanif, R. Hafiz, O. Hasan, M. Shafique, “PEMACx: A probabilistic error analysis methodology for adders with cascaded approximate units”, DAC, pp. 1–6, 2020. Received a HiPEAC Paper Award.

[DATE’21] M. A. Hanif, M. Shafique, “DNN-Life: An energy-efficient aging mitigation framework for improving the lifetime of on-chip weight memories in deep neural network hardware architectures”, DATE, pp. 1–6, 2021.

[JOLPE’18] M. A. Hanif, A. Marchisio, T. Arif, R. Hafiz, S. Rehman, M. Shafique, “X-DNNs: Systematic cross-layer approximations for energy-efficient deep neural networks”, Journal of Low Power Electronics (JOLPE), vol. 14, no. 4, pp. 520–534, 2018.

[IOLTS’18] M. A. Hanif, F. Khalid, R. V. W. Putra, S. Rehman, M. Shafique, “Robust machine learning systems: Reliability and security for deep neural networks”, IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS), pp. 257–260, 2018.

[IOLTS’20] M. A. Hanif, M. Shafique, “Dependable deep learning: Towards cost-efficient resilience of deep neural network accelerators against soft errors and permanent faults”, IOLTS, pp. 1–4, 2020.

[RSTA’20] M. A. Hanif, M. Shafique, “SalvageDNN: Salvaging deep neural network accelerators with permanent faults through saliency-driven fault-aware mapping”, Philosophical Transactions of the Royal Society A (RSTA), vol. 378, no. 2164, p. 20190164, 2020.

dblp: https://dblp.org/pid/191/2451.html

Overview of Our Methodology · Problems and Motivation

❑ CANN: Curable Approximations [DAC’19]
❑ SalvageDNN: Fault-Aware Mapping [RSTA’20]
❑ HW/SW Approximations [JOLPE’18]

Inputs to the methodology: datasets, the DNN architecture / pre-trained DNNs, the DNN hardware architecture, and the hardware-induced reliability threats (soft errors, permanent faults, aging).

Techniques for robust and energy-efficient DNN inference:
❑ Error Resilience Analysis of DNNs [DATE’18, IOLTS’18]
❑ Designer-Induced Faults for Energy-Efficient DNN Inference [JOLPE’18]: pruning + quantization, functional approximations
❑ Analytical Methods for Error Estimation of Approximate Modules [DAC’17, DAC’20]
❑ Design Space Exploration of Approximate Modules [DAC’17, DAC’20]
❑ Curable Approximations [DAC’19]
❑ Software-level Techniques for Mitigating Reliability Threats [IOLTS’20]: fault-aware mapping [RSTA’20], fault-aware training, fault-aware pruning, range restriction-based soft error mitigation [IOLTS’20]
❑ Hardware Modifications for Mitigating Reliability Threats [IOLTS’20]: permanent-fault mitigation support [RSTA’20], aging mitigation [DATE’21], soft error mitigation support

Outputs: optimized DNNs and optimized DNN hardware, enabling robust and energy-efficient DNN inference.
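The range-restriction idea listed above can be illustrated with a minimal sketch. It is a hypothetical example, not the exact scheme of [IOLTS’20]: the clip bounds, layer, and variable names are assumptions. Activations are clamped to a range profiled on fault-free runs, so a soft-error-induced bit flip cannot inject an arbitrarily large value into later layers.

```python
import numpy as np

def profile_bounds(activations):
    """Profile activation bounds on fault-free inputs (assumed calibration step)."""
    return float(activations.min()), float(activations.max())

def range_restrict(activations, lo, hi):
    """Clamp activations to the profiled range so a flipped high-order bit
    cannot propagate an out-of-range value to the following layers."""
    return np.clip(activations, lo, hi)

# Toy example: a bit flip turns one activation into a huge value; clipping masks it.
clean = np.array([0.1, 2.3, 0.0, 1.7], dtype=np.float32)
lo, hi = profile_bounds(clean)
faulty = clean.copy()
faulty[1] = 6.0e8                       # corrupted by a soft error in an exponent bit
print(range_restrict(faulty, lo, hi))   # corrupted value is clamped back to the profiled maximum
```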

❑ CANN: Curable Approximations [DAC’19]

Accurate system: Input → Stage 1 (Accu. Module) → O1 → Stage 2 (Accu. Module) → O2 → … → Stage N (Accu. Module) → ON

Approximate system: Input → Stage 1 (Approx. Module) → ≈O1 → Stage 2 (Approx. Module) → ≈O2 → … → Stage N (Approx. Module) → ON + ε

Proposed system: Input → Stage 1 (DAx Module) → O1 + ε1, forwarding f(ε1) → Stage 2 (C&DAx Module) → O2 + ε2, forwarding f(ε2) → … → Stage N (C&DAx Module) → ON + εN, forwarding f(εN) → Stage N+1 (Cu Module) → ON
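To make the curing idea concrete, here is a minimal numerical sketch. The stage functions (×3, +7) and the two-bit truncation are purely illustrative assumptions, not the actual DAx/C&DAx arithmetic circuits of [DAC’19]; the point is only that each stage forwards its exact error f(ε) so the next stage can cure it, and the final Cu module removes the last residual error.

```python
# Minimal numerical sketch of the curable-approximation pipeline (illustrative only).

def dax_stage(x):
    """DAx stage: approximate output plus the forwarded error signal f(eps)."""
    exact = x * 3                 # stand-in for the accurate stage-1 function
    approx = exact & ~0x3         # toy approximation: drop the two least-significant bits
    return approx, exact - approx

def cdax_stage(x, err_in):
    """C&DAx stage: cures the incoming error, applies its function, forwards its own error."""
    exact = (x + err_in) + 7      # stand-in for the accurate stage-2 function
    approx = exact & ~0x3
    return approx, exact - approx

def cu_stage(x, err_in):
    """Cu module (stage N+1): removes the last stage's error, recovering the accurate result."""
    return x + err_in

o1, e1 = dax_stage(41)            # Stage 1: O1 + eps1, forwards f(eps1)
o2, e2 = cdax_stage(o1, e1)       # Stage 2: O2 + eps2, forwards f(eps2)
out = cu_stage(o2, e2)            # Stage N+1: accurate ON
assert out == 41 * 3 + 7          # identical to the fully accurate pipeline
```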

[Figure: Conventional vs. proposed systolic array. In the conventional array, all PEs use accurate MACs; weights and activations stream in and outputs stream out. In the proposed array, the PEs use DAx and C&DAx MACs, and a row of curing (Cu) modules at the array outputs removes the residual error. Legend: PE with DAx MAC, PE with C&DAx MAC, PE with accurate MAC, curing module.]

[Figure: Accurate vs. proposed C&DAx MAC designs, shown step by step (Steps 1, 4, 5: addition stages). Legend: un-complemented partial product bits, complemented partial product bits, 1’b1, discarded bits, partial sum bits, error bit from the previous stage, and the error signal f(ε) forwarded to the next stage.]

❑ HW/SW Approximations [JOLPE’18]

[Plots: (1) Structured pruning (VGG11, Cifar10) — test accuracy [%] vs. model size reduction [%], points a–e. (2) Uniform quantization (VGG11, Cifar10) — test accuracy [%] vs. bit width (4–10 bits), curves a–e; accuracy starts decreasing at the same bit-width. (3) Hardware approximation (VGG11, Cifar10, 8-bit) — test accuracy [%] vs. multiplier configuration, for the configurations below.]

Multiplier configuration | Accurate | Config. 1 | Config. 2 | Config. 3 | Config. 4 | Config. 5
PDP [fJ]                 | 88.40    | 83.66     | 77.71     | 70.60     | 70.90     | 67.40
MSE                      | 0        | 0.3       | 9.8       | 266.3     | 3102.3    | 24806
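For reference, the kind of bit-width sweep shown in plot (2) can be reproduced with a simple symmetric uniform quantizer. This is a minimal sketch under that assumption; the exact quantization scheme used in [JOLPE’18] may differ.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a weight tensor to the given bit width."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 127 for 8-bit signed values
    scale = np.max(np.abs(w)) / qmax         # one scale factor per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                         # de-quantized weights used for inference

# Sweep the bit width as in plot (2) and watch the quantization error grow.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)
for bits in range(10, 3, -1):
    mse = float(np.mean((w - quantize_uniform(w, bits)) ** 2))
    print(f"{bits}-bit: MSE = {mse:.2e}")
```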

❑ SalvageDNN: Fault-Aware Mapping [RSTA’20]

Example filters (2×2 weights each):
  Filter 1: W11 = −0.63, W12 = 0.25, W21 = −0.22, W22 = 0.17
  Filter 2: W11 = 0.25, W12 = −0.41, W21 = −0.43, W22 = 0.37
  Filter 3: W11 = 0.59, W12 = −0.39, W21 = 0.45, W22 = 0.67
  Filter 4: W11 = −0.71, W12 = 0.55, W21 = 0.34, W22 = −0.66

Conventional mapping of the filters onto a faulty systolic array (columns = Filters 1–4, rows = W11, W12, W21, W22):
  W11: −0.63   0.25   0.59  −0.71
  W12:  0.25  −0.41  −0.39   0.55
  W21: −0.22  −0.43   0.45   0.34
  W22:  0.17   0.37   0.67  −0.66

Fault-aware mapping (filters permuted so that the least-salient weights fall on the faulty MACs):
  W11:  0.25  −0.63  −0.71   0.59
  W12: −0.41   0.25   0.55  −0.39
  W21: −0.43  −0.22   0.34   0.45
  W22:  0.37   0.17  −0.66   0.67

SalvageDNN flow:
1. Post-fabrication testing of the DNN accelerator yields a fault map of the hardware.
2. Given the deep neural network and the accelerator’s dataflow/mapping scheme, the saliency of the DNN’s weights is computed.
3. For each layer, a mapping is selected by minimizing the sum of the saliencies of the weights that must be pruned due to faulty MACs.
4. The parameters are rearranged (together with a hardware modification) to avoid extra run-time overheads, producing the modified network and an optimized DNN accelerator.
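The core of step 3 can be sketched as a tiny search problem. The brute-force permutation search, the use of |w| as the saliency measure, and the single-faulty-column fault map below are all illustrative assumptions; the actual per-layer saliency estimation and mapping algorithm are described in [RSTA’20].

```python
from itertools import permutations
import numpy as np

def fault_aware_mapping(saliency, faulty_cols):
    """saliency: array of shape [weights_per_filter, num_filters]; faulty_cols: set of
    array columns with permanently faulty MACs. Returns the filter-to-column order that
    minimizes the summed saliency of the weights that would have to be pruned."""
    n = saliency.shape[1]
    best_order, best_cost = None, float("inf")
    for order in permutations(range(n)):                 # brute force is fine for a sketch
        cost = sum(saliency[:, order[c]].sum() for c in faulty_cols)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return list(best_order), best_cost

# Toy example using the four 2x2 filters shown above; assume array column 0 is faulty.
w = np.array([[-0.63,  0.25,  0.59, -0.71],
              [ 0.25, -0.41, -0.39,  0.55],
              [-0.22, -0.43,  0.45,  0.34],
              [ 0.17,  0.37,  0.67, -0.66]])
order, pruned = fault_aware_mapping(np.abs(w), faulty_cols={0})
print("filter order:", order, "| summed saliency of pruned weights:", round(float(pruned), 2))
```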

Problems and Motivation

❑ Applications: Natural Language Processing; Image Classification, Object Detection & Localization, etc.

❑ Compute Cost [OPs]
    ❑ AlexNet → 1.5B
    ❑ VGGNet → 19.6B
    ❑ Inception → 2B
    ❑ ResNet-152 → 11B

❑ Design Methods to Improve Robustness and Energy-Efficiency of DNNs
    ❑ Cost-effective techniques for mitigating reliability threats
    ❑ HW/SW approximations for achieving high energy-efficiency

[Figure: hardware-induced reliability threats — physical-level mechanisms and their manifestations, adapted from R. Baumann (TDMR’05), K. Kang et al. (ASP-DAC’08), B. Raghunathan et al. (DATE’13), and S. Dighe et al. (ISSCC’10).]
❑ Soft Errors — particle strikes in transistors; manifest as bit flips in registers (R0–R3).
❑ Process Variation — core-to-core frequency variation, e.g., ≈7.3 GHz for the fastest vs. ≈5.7 GHz for the slowest core of Intel’s 80-core processor; manifests as performance loss.
❑ Aging — trap generation in the transistor gate oxide under negative gate bias (Vg = −Vdd); manifests as wearout, i.e., a rising failure rate over time.

[Figure: Hardware for DNN inference — off-chip memory, on-chip buffer, control unit, a processing array of PEs, and accumulators.]

[Figure: Effect of hardware faults on DNN inference — for a stop-sign input image, the expected output is “Stop Sign”, whereas the output with faults is “40 km/h”.]

❑ For VGG11 with the ImageNet dataset, our SalvageDNN method takes ≈2 minutes on a CPU, while one epoch of re-training takes ≈100 minutes on a GPU.

❑ Using the proposed MAC designs, the CANN method offers 1.5× PDP reduction with no accuracy loss.

[Image source: Google Images]
