Robust and Energy-Efficient Deep Learning Systems
Muhammad Abdullah Hanif¹ (Ph.D. Candidate), Muhammad Shafique² (Advisor)
¹Technische Universität Wien (TU Wien), Vienna, Austria
²Division of Engineering, New York University Abu Dhabi (NYUAD), Abu Dhabi, UAE
PHD FORUM
Selected Publications
[DAC’17] M. A. Hanif, R. Hafiz, O. Hasan, M. Shafique, “QuAd: Design and analysis of quality-area optimal low-latency approximate adders”, Design Automation Conference (DAC), pp. 1–6, 2017. Received a HiPEAC Paper Award.
[DATE’18] M. A. Hanif, R. Hafiz, M. Shafique, “Error resilience analysis for systematically employing approximate computing in convolutional neural networks”, DATE, pp. 913–916, 2018.
[DAC’19] M. A. Hanif, F. Khalid, M. Shafique, “CANN: Curable approximations for high-performance deep neural network accelerators”, DAC, pp. 1–6, 2019. Received a HiPEAC Paper Award.
[DAC’20] M. A. Hanif, R. Hafiz, O. Hasan, M. Shafique, “PEMACx: A probabilistic error analysis methodology for adders with cascaded approximate units”, DAC, pp. 1–6, 2020. Received a HiPEAC Paper Award.
[DATE’21] M. A. Hanif, M. Shafique, “DNN-Life: An energy-efficient aging mitigation framework for improving the lifetime of on-chip weight memories in deep neural network hardware architectures”, DATE, pp. 1–6, 2021.
[JOLPE’18] M. A. Hanif, A. Marchisio, T. Arif, R. Hafiz, S. Rehman, M. Shafique, “X-DNNs: Systematic cross-layer approximations for energy-efficient deep neural networks”, Journal of Low Power Electronics (JOLPE), vol. 14, no. 4, pp. 520–534, 2018.
[IOLTS’18] M. A. Hanif, F. Khalid, R. V. W. Putra, S. Rehman, M. Shafique, “Robust machine learning systems: Reliability and security for deep neural networks”, IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS), pp. 257–260, 2018.
[IOLTS’20] M. A. Hanif, M. Shafique, “Dependable deep learning: Towards cost-efficient resilience of deep neural network accelerators against soft errors and permanent faults”, IOLTS, pp. 1–4, 2020.
[RSTA’20] M. A. Hanif, M. Shafique, “SalvageDNN: Salvaging deep neural network accelerators with permanent faults through saliency-driven fault-aware mapping”, Philosophical Transactions of the Royal Society A (RSTA), vol. 378, no. 2164, p. 20190164, 2020.
dblp link: https://dblp.org/pid/191/2451.html

Overview of Our Methodology
❑ CANN: Curable Approximations [DAC’19]
❑ SalvageDNN: Fault-Aware Mapping [RSTA’20]
❑ HW/SW Approximations [JOLPE’18]
Overview figure: the inputs (Datasets; DNN Architecture / Pre-trained DNNs; DNN Hardware Architecture; and the Hardware-Induced Reliability Threats of Soft Errors, Permanent Faults, and Aging) feed into our techniques for robust and energy-efficient deep neural network inference:
❑ Curable Approximations [DAC’19]
❑ Error Resilience Analysis of DNNs [DATE’18, IOLTS’18]
❑ Analytical Methods for Error Estimation of Approximate Modules [DAC’17, DAC’20]
❑ Design Space Exploration of Approximate Modules [DAC’17, DAC’20]
❑ Designer-Induced Faults for Energy-Efficient DNN Inference [JOLPE’18]: Pruning + Quantization, Functional Approximations
❑ Software-level Techniques for Mitigating Reliability Threats [IOLTS’20]: Fault-Aware Mapping [RSTA’20], Fault-Aware Training, Fault-Aware Pruning, and Range Restriction-based Soft Error Mitigation [IOLTS’20] (a minimal sketch of range restriction follows this list)
❑ Hardware Modifications for Mitigating Reliability Threats [IOLTS’20]: Permanent-Fault Mitigation Support [RSTA’20], Aging Mitigation [DATE’21], and Soft Error Mitigation Support
The outputs are Optimized DNNs and Optimized DNN Hardware, which together enable Robust and Energy-Efficient DNN Inference.
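To illustrate the range-restriction idea, here is a minimal sketch (our own Python illustration, not the IOLTS’20 implementation; profile_range and restrict_range are hypothetical helper names): activation ranges are profiled on fault-free runs, and at inference time values are clipped to those ranges so that a soft-error bit flip cannot inject an unbounded value into the network.

```python
import numpy as np

def profile_range(activations):
    """Record the min/max activation values observed on fault-free runs."""
    return activations.min(), activations.max()

def restrict_range(activations, lo, hi):
    """Clip activations to the profiled range so a bit-flip-induced
    out-of-range value cannot propagate through the network."""
    return np.clip(activations, lo, hi)

# A single bit flip in a float32 exponent can turn 1.5 into ~3e38.
acts = np.array([0.2, 1.5, 0.7], dtype=np.float32)
lo, hi = profile_range(acts)             # profiling pass (fault-free)
faulty = acts.copy()
faulty[1] = np.float32(3e38)             # simulated soft-error corruption
print(restrict_range(faulty, lo, hi))    # corrupted value clamped to hi
```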
CANN: Curable Approximations [DAC’19]
Figure (system view): in the accurate system, the input passes through Stages 1…N of accurate modules, producing O1, O2, …, ON. In a conventional approximate system, every stage uses an approximate module, producing ≈O1, ≈O2, …, so the final output is ON + ε, with errors accumulating across stages. In the proposed system, Stage 1 uses a DAx module that outputs O1 + ε1 together with an error signal f(ε1); Stages 2…N use C&DAx modules, each of which cures the error signaled by the previous stage while outputting Oi + εi and f(εi); a final curing (Cu) module at Stage N+1 removes the last error term, yielding the exact ON.
Figure (array view): the conventional 4×4 systolic array of PEs with accurate MACs (weights and activations streaming in, outputs streaming out) is replaced by the proposed array, in which the PEs use DAx and C&DAx MACs and a row of curing (Cu) modules is appended at the outputs.
Figure (MAC view, accurate vs. C&DAx): in the proposed MAC designs, selected partial product bits are discarded, others are complemented (with a 1’b1 correction constant), the error bit from the previous stage is absorbed into the partial sum bits, and an error signal f(ε) is forwarded to the next stage; accumulation proceeds over addition steps (Step 1 … Step 5).
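The following toy sketch (our own Python illustration, not the DAC’19 circuit) captures the curable-approximation idea on a MAC chain: each stage drops a deterministic amount ε from its product and forwards f(ε), the next stage adds it back, and the final curing module absorbs the last ε, so the chain’s result is exact. Because every ε is known by construction, the cure is exact rather than statistical.

```python
def approx_mac(acc, x, w, eps_in):
    """One PE: accumulate an approximate product and cure the previous
    stage's error; the dropped amount is returned as this stage's error."""
    exact_prod = x * w
    approx_prod = exact_prod & ~0b1111    # drop the 4 low-order bits (toy approximation)
    eps_out = exact_prod - approx_prod    # deterministic error, known by design
    return acc + approx_prod + eps_in, eps_out

pairs = [(13, 5), (21, 7), (9, 11)]       # toy integer operands
acc, eps = 0, 0
for x, w in pairs:                        # Stages 1..N (DAx / C&DAx MACs)
    acc, eps = approx_mac(acc, x, w, eps)
result = acc + eps                        # Stage N+1: curing (Cu) module
print(result, sum(x * w for x, w in pairs))   # both print 311
```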
HW/SW Approximations [JOLPE’18]
Results (VGG11, CIFAR-10; each panel shows curves labeled a–e):
1. Structured Pruning: test accuracy [%] vs. model size reduction [%] (0–100%).
2. Uniform Quantization: test accuracy [%] vs. bit width (4–10 bits); accuracy starts decreasing at the same bit width for all curves.
3. Hardware Approximation (8-bit): test accuracy [%] vs. multiplier configuration, together with the power-delay product (PDP) and mean squared error (MSE) of each multiplier:

Configuration: Accurate | Config. 1 | Config. 2 | Config. 3 | Config. 4 | Config. 5
PDP [fJ]:      88.40    | 83.66     | 77.71     | 70.60     | 70.90     | 67.40
MSE:           0        | 0.3       | 9.8       | 266.3     | 3102.3    | 24806
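For reference, a bit-width sweep like the one in panel 2 can be reproduced with a minimal sketch (our own Python; symmetric per-tensor quantization is assumed, and the helper name is ours, not from the JOLPE’18 code):

```python
import numpy as np

def quantize_uniform(w, bits):
    """Quantize weights to signed integers of the given bit width,
    then dequantize back to the values the DNN would actually use."""
    qmax = 2 ** (bits - 1) - 1            # e.g., 127 for 8 bits
    scale = np.abs(w).max() / qmax        # map the largest |w| onto qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

w = np.random.randn(64, 64).astype(np.float32)   # stand-in weight tensor
for bits in (10, 8, 6, 4):
    mse = np.mean((w - quantize_uniform(w, bits)) ** 2)
    print(f"{bits}-bit MSE: {mse:.2e}")   # error grows as the width shrinks
```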
SalvageDNN: Fault-Aware Mapping [RSTA’20]
Example (figure): four 2×2 filters are to be mapped onto a systolic array that has faulty MACs.
Filter 1: W11 = -0.63, W12 = 0.25, W21 = -0.22, W22 = 0.17
Filter 2: W11 = 0.25, W12 = -0.41, W21 = -0.43, W22 = 0.37
Filter 3: W11 = 0.59, W12 = -0.39, W21 = 0.45, W22 = 0.67
Filter 4: W11 = -0.71, W12 = 0.55, W21 = 0.34, W22 = -0.66
Conventional mapping of the filters onto the faulty array (columns = Filters 1–4, rows = W11, W12, W21, W22):
-0.63   0.25   0.59  -0.71
 0.25  -0.41  -0.39   0.55
-0.22  -0.43   0.45   0.34
 0.17   0.37   0.67  -0.66
Fault-aware mapping (filters permuted so that the least salient weights land on the faulty MACs):
 0.25  -0.63  -0.71   0.59
-0.41   0.25   0.55  -0.39
-0.43  -0.22   0.34   0.45
 0.37   0.17  -0.66   0.67
Flow: post-fabrication testing of the DNN accelerator yields a fault map of the hardware. Given the deep neural network, the dataflow/mapping scheme, and an optional hardware modification, SalvageDNN (i) computes the saliency of the weights of the DNN, (ii) for each layer, minimizes the sum of the saliency of the weights that must be pruned because they fall on faulty MACs, and (iii) rearranges the parameters to avoid extra run-time overheads, producing a modified network for the optimized DNN accelerator.
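A minimal sketch of the mapping step (our own Python illustration: we brute-force the column permutation for clarity, whereas the RSTA’20 method is saliency-driven rather than exhaustive, and |w| is used here as a simple saliency proxy):

```python
import itertools
import numpy as np

weights = np.array([[-0.63,  0.25,  0.59, -0.71],
                    [ 0.25, -0.41, -0.39,  0.55],
                    [-0.22, -0.43,  0.45,  0.34],
                    [ 0.17,  0.37,  0.67, -0.66]])   # columns = filters
fault_map = np.zeros_like(weights, dtype=bool)
fault_map[0, 0] = True            # e.g., the MAC at position (0, 0) is faulty

saliency = np.abs(weights)        # |w| as a toy saliency measure

def pruned_saliency(perm):
    """Total saliency lost if filters are placed on columns in this order
    (weights on faulty MACs must effectively be pruned to zero)."""
    return saliency[:, list(perm)][fault_map].sum()

best = min(itertools.permutations(range(weights.shape[1])), key=pruned_saliency)
print("best filter order:", best)                 # puts Filter 2 on column 0
print("saliency lost:", pruned_saliency(best))    # 0.25 instead of 0.63
```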
Problems and Motivation
❑ Applications: Natural Language Processing; Image Classification, Object Detection & Localization, etc.
❑ Compute Cost [OPs]: AlexNet → 1.5B; VGGNet → 19.6B; Inception → 2B; ResNet-152 → 11B (a back-of-the-envelope OPs count follows below)
❑ Design Methods to Improve Robustness and Energy-Efficiency of DNNs:
    ❑ Cost-effective techniques for mitigating reliability threats
    ❑ HW/SW approximations for achieving high energy-efficiency
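OPs figures of this kind are typically obtained by counting two operations (multiply and add) per MAC across all layers; a minimal sketch for a single conv layer (our own illustration, with an arbitrary layer shape):

```python
def conv_ops(k, c_in, c_out, h_out, w_out):
    """OPs of one conv layer: 2 ops (multiply + add) per MAC."""
    return 2 * k * k * c_in * c_out * h_out * w_out

# e.g., a VGG-style 3x3 conv, 64 -> 128 channels, 112x112 output map:
print(conv_ops(3, 64, 128, 112, 112) / 1e9, "GOPs")   # ~1.85 GOPs
```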
Hardware-Induced Reliability Threats (figure):
❑ Physical level: soft errors, caused by particle strikes that deposit charge in a transistor [adapted from R. Baumann, TDMR’05]; aging, e.g., stress on the gate oxide of a PMOS transistor under Vg = -Vdd [adapted from K. Kang et al., ASP-DAC’08]; and process variation, which makes some cores of a many-core chip faster and others slower [adapted from B. Raghunathan et al., DATE’13].
❑ Manifestation: bit flips in register contents (R0–R3) for soft errors; a failure rate that rises over time, i.e., wearout, for aging; and performance loss for process variation, e.g., core-to-core frequency varying between 5.7 GHz and 7.3 GHz across Intel’s 80-core processor [adapted from S. Dighe et al., ISSCC’10].
Hardware for DNN inference (figure): an off-chip memory feeds an on-chip buffer, which supplies a processing array of PEs with accumulators under a control unit. Impact of faults: for a stop-sign input, the expected DNN output is “Stop Sign”, whereas the output with faults is “40 km/h” [image source: Google Images].
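To make the failure mode concrete, a permanent fault in one PE can be modeled as a stuck-at bit in that PE’s result; this toy sketch (our own, not from the cited papers) shows how a single stuck-at-1 bit silently corrupts a dot product:

```python
import numpy as np

def mac_row(acts, weights, stuck_bit=None):
    """Dot product of one PE row, optionally with a stuck-at-1 bit
    in the integer result to model a permanent hardware fault."""
    out = int(np.dot(acts, weights))
    if stuck_bit is not None:
        out |= 1 << stuck_bit          # permanent stuck-at-1 fault
    return out

acts = np.array([1, 2, 3, 4])
weights = np.array([1, 0, -1, 2])
print(mac_row(acts, weights))                # fault-free: 6
print(mac_row(acts, weights, stuck_bit=6))   # faulty PE: 6 | 64 = 70
```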
For VGG11 with the ImageNet dataset, our SalvageDNN method takes ≈2 minutes on a CPU, while a single epoch of re-training takes ≈100 minutes on a GPU.
CANN offers a 1.5× reduction in power-delay product (PDP) using the proposed MAC designs, with no accuracy loss.