TRANSCRIPT
An Analytical Method to Determine Minimum Per-Layer Precision of Deep
Neural Networks
Charbel Sakr, Naresh Shanbhag
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Code: https://github.com/charbel-sakr/Precision-Analysis-of-Neural-Networks
Also found on sakr2.web.engr.Illinois.edu
Machine Learning ASICs
How are they choosing these precisions?
Why is it working?
Can it be determined analytically?
• Eyeriss [Sze’16, ISSCC]: AlexNet accelerator, 16b fixed-point
• PuDianNao [Chen’15, ASPLOS]: ML accelerator, 16b fixed-point
• TPU [Google’17, ISCA]: TensorFlow accelerator, 8b fixed-point
Current Approaches
• Stochastic rounding during training [Gupta, ICML’15 – Hubara, NIPS’16]
→ Difficulty of training in a discrete space
• Trial-and-error approach [Sung, SiPS’14]
→ Exhaustive search is expensive
• SQNR based precision allocation [Lin, ICML’16]
→ Lack of precision/accuracy understanding
Fixed-point quantization: no accuracy vs. precision understanding
Precision in Neural Networks
[Sakr et al., ICML’17]
Classification Output Quantization – Mismatch Probability
[Figure: a floating-point network and its fixed-point (quantized) counterpart classify the same input, possibly into different labels]
𝑝𝑚: “Mismatch Probability”, i.e., the probability that the fixed-point network’s predicted label differs from that of the floating-point network
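In symbols (a minimal restatement; Ŷ_fl and Ŷ_fx denote the labels predicted for the same input by the floating-point and fixed-point networks, respectively):

p_m = \Pr\big( \hat{Y}_{fx} \neq \hat{Y}_{fl} \big)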
Second Order Bound on 𝒑𝒎
[Sakr et al., ICML’17]
• Input/Weight precision trade-off:
→ Optimal precision allocation by balancing the sum
• Data dependence (compute once and reuse)
→ Derivatives obtained in last step of backprop
→ Only one forward-backward pass needed
(the bound’s two terms: activation quantization noise gain, weight quantization noise gain)
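For reference, a sketch of the bound’s shape, up to notation (not copied verbatim from the slide): with Δ_A, Δ_W the activation and weight quantization step sizes and E_A, E_W the corresponding noise gains obtained from the derivatives above,

p_m \;\le\; \frac{\Delta_A^2}{24}\, E_A \;+\; \frac{\Delta_W^2}{24}\, E_W

Balancing these two terms is what drives the optimal activation/weight precision allocation mentioned above.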
Proof Sketch
• For one input, when do we have a mismatch?
→ The FL network predicts label “𝑗”
→ But the FX network predicts label “𝑖”, where 𝑖 ≠ 𝑗
→ This happens with some probability computed as follows:
→ But we already know
→ Whose variance is
→ Applying Chebyshev + LTP yields the result
(Output re-ordering due to quantization; symmetry of quantization noise)
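A compressed version of that chain of steps, up to notation and constants (the exact expressions are in [Sakr et al., ICML’17]):

\Pr(\text{mismatch}) \;\le\; \sum_{i \neq j} \Pr\!\big( z^{fx}_i > z^{fx}_j \big), \qquad z^{fx} = z + q_z, \; \mathbb{E}[q_z] = 0 \;\; \text{(symmetric quantization noise)}

\mathrm{Var}\!\big(q_{z_j} - q_{z_i}\big) \;\approx\; \frac{\Delta_A^2}{12} \sum_{a_h} \Big(\frac{\partial (z_j - z_i)}{\partial a_h}\Big)^{2} \;+\; \frac{\Delta_W^2}{12} \sum_{w_h} \Big(\frac{\partial (z_j - z_i)}{\partial w_h}\Big)^{2}

Chebyshev’s inequality bounds each probability by this variance divided by (z_j − z_i)^2, and averaging over the input distribution (law of total probability, LTP) yields the theorem.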
[Sakr et al., ICML’17]
Tighter Bound on 𝒑𝒎
𝑀: Number of Classes
𝑆: Signal to quantization noise ratio
𝑃1 & 𝑃2: Correction factors
• Mismatch probability decreases double exponentially with precision
→ Theoretically stronger than Theorem 1
→ Unfortunately, less practical
[Sakr et al., ICML’17]
Per-layer Precision Assignment
• Per-layer precision allows for more aggressive complexity reductions
• Search space is huge: a 2L-dimensional grid (one weight precision and one activation precision per layer)
• Example: 5-layer network, precision considered up to 16 bits → 16^10 ≈ 10^12 design points
• We need an analytical way to reduce the search space → maybe using the analysis of Sakr et al.
[Figure: an L-layer network (Layer 1 … Layer L) with quantized input features/activations and quantized weights at each layer]
Key idea: equalization of reflected quantization noise variances
Fine-grained Precision Analysis
[Sakr et al., ICML’17]
[Sakr et al., ICASSP’18]
per-layer quantization noise gains
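The fine-grained (per-layer) form of the bound, up to notation, simply splits the sums by layer, giving each layer ℓ its own step sizes Δ_{A_ℓ}, Δ_{W_ℓ} and noise gains E_{A_ℓ}, E_{W_ℓ}:

p_m \;\le\; \sum_{\ell=1}^{L} \frac{\Delta_{A_\ell}^2}{24}\, E_{A_\ell} \;+\; \sum_{\ell=1}^{L} \frac{\Delta_{W_\ell}^2}{24}\, E_{W_\ell}

Equalizing the per-layer terms of this sum is exactly the equalization of reflected quantization noise variances used next.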
Per-layer Precision Assignment Method
• Equalization of quantization noise variances
• Search space is reduced to only a one-dimensional axis: the reference (minimum) precision
• Step 1: compute the least quantization noise gain
• Step 2: select each layer’s precision to equalize all quantization noise variances
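A minimal sketch of the two steps in Python, assuming the per-layer noise gains (computed once from the derivatives in a single forward-backward pass) are already available; the function name, the ceiling rounding, and the step-size convention Δ_ℓ = 2^(1−B_ℓ) are illustrative assumptions, not the authors’ code:

```python
import math

def assign_per_layer_precision(gains, b_min):
    """Equalize reflected quantization noise variances across layers.

    gains : per-layer quantization noise gains E_l (e.g., for weights)
    b_min : reference precision (bits) given to the layer with the least gain

    With step size Delta_l = 2**(1 - B_l), layer l reflects a noise variance
    proportional to Delta_l**2 * E_l at the output.  Equalizing this against
    the least-gain layer gives B_l = b_min + 0.5 * log2(E_l / E_min).
    """
    e_min = min(gains)                                       # Step 1: least noise gain
    return [b_min + math.ceil(0.5 * math.log2(e / e_min))    # Step 2: equalize variances
            for e in gains]

# Hypothetical gains: layers with larger gains receive proportionally more bits.
print(assign_per_layer_precision([4.0, 1.0, 16.0, 64.0], b_min=2))  # -> [3, 2, 4, 5]
```

Sweeping b_min along this single axis (while keeping, e.g., 𝑝𝑚 ≤ 1%) then replaces the 2L-dimensional grid search.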
Comparison with related works
• Simplified but meaningful model of complexity (see the cost sketch after this list)
→ Computational cost
→ Total number of FAs used assuming folded MACs with bit growth allowed
→ Number of MACs is equal to the number of dot products computed
→ Number of FAs per MAC:
→ Representational cost
→ Total number of bits needed to represent weights and activations
→ High level measure of area and communications cost (data movement)
• Other works considered
→ Stochastic quantization (SQ) [Gupta’15, ICML]
→ 784 – 1000 – 1000 – 10 (MNIST)
→ 64C5 – MP2 – 64C5 – MP2 – 64FC – 10 (CIFAR10)
→ BinaryNet (BN) [Hubara’16, NIPS]
→ 784 – 2048 – 2048 – 2048 – 10 (MNIST) & VGG (CIFAR10)
Precision Profiles – CIFAR-10
VGG-11 ConvNet on CIFAR-10: 32C3-32C3-MP2-64C3-64C3-MP2-128C3-128C3-256FC-256FC-10
Precision chosen such that 𝑝𝑚 ≤ 1%
• Weight precision requirements are greater than those of activations [Sakr et al., ICML’17]
• Precision decreases with depth – early layers are more sensitive to perturbations [Raghu et al., ICML’17]
Precision – Accuracy Trade-offs
• Precision reduction is greatest when using the proposed fine-grained precision assignment
Precision – Accuracy – Complexity Trade-offs
• Best complexity reduction is achieved with the proposed precision assignment
• Complexity is even much smaller than a BinaryNet’s at the same accuracy, because the BinaryNet requires a much larger (higher-complexity) network
Conclusion & Future Work• Presented an analytical method to determine per-layer precision of neural
networks
• Method based on equalization of reflected quantization noise variances
• Per-layer precision assignment reveals interesting properties of neural networks
• Method leads to significant overall complexity reduction, e.g., reduction of minimum precision to 2 bits and a lower implementation cost than BinaryNets
• Future work:
• Trade-offs between precision vs. depth vs. width
• Trade-offs between precision and pruning
Thank you!
Acknowledgment:
This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six
SRC STARnet Centers, sponsored by MARCO and DARPA.