![Page 1: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/1.jpg)
Prepared for Intel Delta 7 series.
(based on Andres Rodriguez’ course “Deep Learning 101” https://software.intel.com/en-us/ai/academy )
![Page 2: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/2.jpg)
2
• Deep learning overview and usages
• Intel optimized deep learning environment
• hardware, software, tools
• Training models
• Getting started resources
Content Outline
![Page 3: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/3.jpg)
3
Deep Learning
• A branch of machine learning
• Data is passed through multiple non-linear transformations
• Objective: Learn the parameters of the transformation that minimize a cost function
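The three bullets above can be illustrated with a minimal sketch (not from the slides). It passes an input through two non-linear transformations and evaluates a cost. The layer sizes, the tanh non-linearity, the squared-error cost, and the toy data are all assumptions chosen for illustration.

```python
import math
import random

def layer(x, weights, biases):
    """One non-linear transformation: tanh(W x + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def forward(x, params):
    """Pass x through every layer in turn."""
    for weights, biases in params:
        x = layer(x, weights, biases)
    return x

def cost(params, data):
    """Sum of squared errors over (input, target) pairs."""
    return sum((forward(x, params)[0] - y) ** 2 for x, y in data)

random.seed(0)
# Hypothetical sizes: 2 inputs -> 3 hidden units -> 1 output.
params = [
    ([[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)], [0.0] * 3),
    ([[random.uniform(-1, 1) for _ in range(3)]], [0.0]),
]
data = [([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0)]
print(cost(params, data))
```

Training means adjusting the entries of `params` so that `cost` goes down.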
![Page 4: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/4.jpg)
4
Types of Deep Learning
• Supervised learning
• Data -> Labels
• Unsupervised learning
• No labels; Clustering; Reducing dimensionality
• Reinforcement learning
• Reward actions (e.g., robotics)
![Page 6: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/6.jpg)
6
Supervised Learning
Step 1: Training (Over Hours/Days/Weeks)
Input data (labeled “Person”) → Create deep network → Output classification (90% person, 8% traffic light)
The result of Step 1 is the trained model.
Step 2: Inference (Real Time)
New input from camera and sensors → Trained neural network model → Output classification (97% person)
![Page 7: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/7.jpg)
Why Now?
Bigger Data
• Image: 1000 KB / picture
• Audio: 5000 KB / song
• Video: 5,000,000 KB / movie
Better Hardware
• Transistor density doubles every 18 months
• Cost / GB in 1995: $1000.00
• Cost / GB in 2015: $0.03
Smarter Algorithms
• Advances in algorithm innovation, including neural networks, leading to better accuracy in training models
![Page 8: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/8.jpg)
Why machine learning?
• The explosive growth of smart and connected devices: agriculture, retail, industrial, and automated driving (by 2020 all cars sold in the USA will have an automated control regime)
• Computer vision, using technology like Intel® RealSense™
• Exponential growth of data generated: 1.5 GB/day/human and up to 4 TB/day/device by 2020
• Cloud services running on customer data: Amazon, Google, Picasa, etc.
![Page 9: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/9.jpg)
Classification
Label the image
• Person
• Motorcyclist
• Bike
https://people.eecs.berkeley.edu/~jhoffman/talks/lsda-baylearn2014.pdf
9
![Page 10: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/10.jpg)
Detection
Detect and label objects
https://people.eecs.berkeley.edu/~jhoffman/talks/lsda-baylearn2014.pdf
10
![Page 11: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/11.jpg)
Semantic Segmentation
Label every pixel
https://people.eecs.berkeley.edu/~jhoffman/talks/lsda-baylearn2014.pdf
11
![Page 12: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/12.jpg)
Natural Language Object Retrieval
http://arxiv.org/pdf/1511.04164v3.pdf
12
![Page 13: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/13.jpg)
Visual and Textual Question Answering
https://arxiv.org/pdf/1603.01417v1.pdf
13
![Page 14: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/14.jpg)
Visuomotor Control
https://arxiv.org/pdf/1504.00702v5.pdf
…
…
14
![Page 15: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/15.jpg)
Speech Recognition
The same architecture is used for English and Mandarin Chinese speech recognition
http://svail.github.io/mandarin/
15
![Page 16: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/16.jpg)
Q&A Natural Language Understanding
https://arxiv.org/pdf/1506.07285.pdf
16
![Page 17: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/17.jpg)
Personal Assistant
17
![Page 18: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/18.jpg)
18
Deep Learning Use Cases
Cloud Service Providers
Financial Services
Healthcare
Automotive
• Personal assistant
• Image & Video recognition/tagging
• Natural language processing
• Automatic Speech recognition
• Targeted Ads
• Fraud / face detection
• Gaming, check processing
• Computer server monitoring
• Financial forecasting and prediction
• Network intrusion detection
• Recommender Systems
![Page 19: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/19.jpg)
19
Optimized Deep Learning Environment
Fuel the development of vertical solutions
Deliver best single node and multi-node performance
Accelerate design, training, and deployment
Drive optimizations across open source deep learning frameworks
Maximum performance on Intel architecture
Intel® Deep Learning SDK (Training, Inference)
Intel® Math Kernel Library (Intel® MKL) + Intel® MKL-DNN
Intel® Omni-Path Architecture (Intel® OPA)
![Page 21: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/21.jpg)
Intel® Xeon Phi™ + Omni-Path Deep Learning Training Scalable System
• Knights Landing
• More than 6 peak single-precision (SP) TFLOPS per socket
• Binary-compatible with Intel® Xeon® processors
• Up to 72 cores, 512-bit SIMD vectors with 2 VPUs per core
• Integrated memory delivers superior bandwidth
• Integrated Intel® Omni-Path fabric (dual-port; 50 Gb/s)
• Distributes the training workload
[Diagram: scalable system topology. 48-port switches (48p Sw) at two levels (L1, L2) connect groups of 4× KNL nodes and a storage server; link groupings are labeled “12-12”, “4-4-4-4-4-4”, and “2-2”.]
![Page 22: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/22.jpg)
Knights Mill: Next Gen Intel® Xeon Phi™ (COMING 2017)
[Chart: single-precision teraflops by generation: 1st generation Xeon Phi (2013), 2nd generation Xeon Phi (2016), Knights Mill (2017)]
Knights Mill
• Optimized for Deep Learning
• Optimized for scale-out
• Flexible, high capacity memory
• Enhanced variable precision
• Improved efficiency
![Page 24: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/24.jpg)
Diversity in Deep Networks
Variety in Network Topology
• Recurrent NNs common for NLP/ASR, DAG for GoogLeNet, networks with memory…
But there are a few well-defined building blocks
• Convolutions common for image recognition tasks
• GEMMs for recurrent network layers (could be sparse)
• ReLU, tanh, softmax
[Figures: GoogLeNet, Recurrent NN, CNN - AlexNet]
24
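The building blocks named on this slide are small enough to write out directly. The following is a plain-Python sketch for illustration only. It says nothing about how Intel® MKL or MKL-DNN implement them, and the function names are hypothetical.

```python
import math

def relu(x):
    """Rectified linear unit: max(0, v) element-wise."""
    return [max(0.0, v) for v in x]

def tanh(x):
    """Hyperbolic tangent element-wise."""
    return [math.tanh(v) for v in x]

def softmax(x):
    """Turn scores into probabilities that sum to 1."""
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def gemm(a, b):
    """Dense matrix product C = A B on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

scores = gemm([[1.0, 2.0]], [[0.5, -1.0, 0.0], [0.25, 0.5, 1.0]])[0]
print(softmax(relu(scores)))
```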
![Page 25: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/25.jpg)
25
![Page 26: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/26.jpg)
26
Optimized Deep Learning Environment
Intel® Omni-Path Architecture (Intel® OPA)
Intel® Math Kernel Library (Intel® MKL)
+
Intel® MKL-DNN
Intel® MKL-DNN – free open source DNN functions designed for max Intel HW performance and high-velocity integration with DL frameworks
• Open source DNN functions included in MKL 2017
• IA optimizations contributed by community
• Binary GEMM functions
• Apache* 2 license
Targeted release: Q4’ 2016 (APIs and preview Q3’ 2016)
+
![Page 27: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/27.jpg)
Intel® Math Kernel Library (Intel® MKL)
• Optimized for AVX2 and AVX-512 instructions
• Intel® Xeon® and Intel® Xeon Phi™ processors
• Supports all common layer types
• Coming soon: Winograd-based convolutions
27
![Page 28: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/28.jpg)
Naïve Convolution
https://en.wikipedia.org/wiki/Convolutional_neural_network
28
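A naïve convolution is a direct loop nest over output channels, output positions, input channels, and kernel positions. The sketch below is a plain-Python illustration, assuming stride 1, no padding, and the cross-correlation convention common in CNNs. The data layout and the function name are assumptions, not taken from the slide.

```python
def conv2d_naive(inp, kernels):
    """Direct convolution (cross-correlation, as used in CNNs).

    inp:     [C_in][H][W] nested lists
    kernels: [C_out][C_in][KH][KW] nested lists
    Returns  [C_out][H-KH+1][W-KW+1] (stride 1, no padding).
    """
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in kernels]
    for co, kernel in enumerate(kernels):      # output channels
        for y in range(oh):                    # output rows
            for x in range(ow):                # output columns
                acc = 0.0
                for ci in range(c_in):         # input channels
                    for dy in range(kh):       # kernel rows
                        for dx in range(kw):   # kernel columns
                            acc += inp[ci][y + dy][x + dx] * kernel[ci][dy][dx]
                out[co][y][x] = acc
    return out

image = [[[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]]
box = [[[[1.0, 1.0],
         [1.0, 1.0]]]]
print(conv2d_naive(image, box))  # each output is the sum over one 2x2 window
```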
![Page 29: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/29.jpg)
Cache Friendly Convolution
arxiv.org/pdf/1602.06709v1.pdf
29
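The paper linked on this slide describes its own blocking strategy. The sketch below is not that scheme, only a generic loop-tiling illustration of the idea: work on a small tile of the output so that the input patch under it is reused by every output channel before moving on. Tile size, data layout, and function name are assumptions, and pure Python will not show the cache benefit. It only shows that the reordered loops give the same result.

```python
def conv2d_blocked(inp, kernels, tile=2):
    """Same result as a direct convolution, computed tile by tile.

    inp:     [C_in][H][W] nested lists
    kernels: [C_out][C_in][KH][KW] nested lists
    """
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in kernels]
    for y0 in range(0, oh, tile):              # tile of output rows
        for x0 in range(0, ow, tile):          # tile of output columns
            # Every output channel consumes the input patch under this
            # tile before the loops move on to the next tile.
            for co, kernel in enumerate(kernels):
                for y in range(y0, min(y0 + tile, oh)):
                    for x in range(x0, min(x0 + tile, ow)):
                        acc = 0.0
                        for ci in range(c_in):
                            for dy in range(kh):
                                for dx in range(kw):
                                    acc += inp[ci][y + dy][x + dx] * kernel[ci][dy][dx]
                        out[co][y][x] = acc
    return out

image4 = [[[float(4 * r + c + 1) for c in range(4)] for r in range(4)]]
two_kernels = [[[[1.0, 1.0], [1.0, 1.0]]],
               [[[1.0, 0.0], [0.0, 0.0]]]]
# The tile size changes the loop order, not the answer.
print(conv2d_blocked(image4, two_kernels, tile=2) ==
      conv2d_blocked(image4, two_kernels, tile=1))
```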
![Page 30: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/30.jpg)
30
Recent MKL SGEMM Improvements
• Up to 4× performance improvements for small sizes
• New APIs to eliminate data packing overheads if A or B matrices are re-used
• Pack once and use multiple times
• Up to 1.2× additional performance improvements over SGEMM
• Available in Intel MKL 2017 (released: September 6, 2016)
![Page 31: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/31.jpg)
31
Optimized Deep Learning Environment
Fuel the development of vertical solutions
Deliver best single node and multi-node performance
Accelerate design, training, and deployment
Drive optimizations across open source machine learning frameworks
Intel® Deep Learning SDK
Intel® Omni-Path Architecture (Intel® OPA)
Maximum performance on Intel architecture Intel® Math Kernel
Library (Intel® MKL)
Intel® Data Analytics Acceleration Library
(Intel® DAAL)
+
Training Inference
Intel® MKL-DNN
Drive optimizations across open source machine learning frameworks
+
![Page 32: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/32.jpg)
Deep Learning Tools
Programming languages
Top Frameworks
Caffe
C/C++
32
MLlib
![Page 33: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/33.jpg)
INTEL OPTIMIZED Caffe
• All the goodness of BVLC Caffe*, plus:
• Integrated with Intel® MKL 2017
• Multi-node distributed training
Forrest Iandola, et al., “Scaling DNN Training on Intel Platforms.” 2016
33
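Multi-node distributed training is commonly done with synchronous data parallelism: each node computes a gradient on its own shard of the batch, the gradients are summed across nodes, and every node applies the same update. The sketch below illustrates that general idea on a toy squared-error cost. It is not Intel's Caffe implementation, and `all_reduce_sum` is a hypothetical stand-in for a real network collective.

```python
def local_gradient(w, shard):
    """Gradient of sum_i (w - x_i)^2 over one worker's shard."""
    return sum(2.0 * (w - x) for x in shard)

def all_reduce_sum(values):
    """Stand-in for a network all-reduce: every worker gets the total."""
    total = sum(values)
    return [total for _ in values]

def data_parallel_step(w, shards, alpha):
    """One synchronous update across all workers."""
    grads = [local_gradient(w, shard) for shard in shards]  # parallel on real nodes
    reduced = all_reduce_sum(grads)
    return w - alpha * reduced[0]  # every worker applies the same update

shards = [[1.0, 2.0], [3.0, 4.0]]   # two workers, two examples each
w_distributed = data_parallel_step(0.0, shards, alpha=0.1)
w_single = 0.0 - 0.1 * local_gradient(0.0, [1.0, 2.0, 3.0, 4.0])
print(w_distributed, w_single)
```

Because the summed shard gradients equal the gradient over the whole batch, the distributed step matches the single-node step.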
![Page 35: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/35.jpg)
Real-time graphs to see how the accuracy rate is trending
![Page 36: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/36.jpg)
36
INFERENCE ON THE END POINT
High-Level Workflow
[Diagram: a trained model (.prototxt, .caffemodel) and validation data feed the Model Optimizer (FP Quantize, Model Compress, Model Analysis), which produces a deploy-ready model. The end-point SW developer imports it into the Inference Run-Time (MKL-DNN, OpenVX), where application logic forwards real-time data and receives the result.]
![Page 38: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/38.jpg)
38
Training
• Gradient descent and variants
• Batch sizes
• Distributed training
• Challenges
![Page 39: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/39.jpg)
Gradient Descent
$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$
Gradient at the starting point $\mathbf{w}^{(0)}$: $\frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$
Update with learning rate $\alpha$:
$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$
[Plot: $J$ versus $\mathbf{w}$, marking $\mathbf{w}^{(0)}$ and the resulting $\mathbf{w}^{(1)}$ for a learning rate that is too small, too large, and good enough]
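The effect of the learning rate can be tried on a toy problem. The sketch below assumes a one-dimensional squared-error cost, cost(w, x) = (w − x)², whose minimum over the data is at the mean. The cost, the data, and the three learning-rate values are illustrative choices, not from the slides.

```python
def cost(w, xs):
    """J(w) = sum_i (w - x_i)^2."""
    return sum((w - x) ** 2 for x in xs)

def grad(w, xs):
    """dJ/dw for the cost above."""
    return sum(2.0 * (w - x) for x in xs)

def gradient_descent(w0, xs, alpha, steps):
    """Repeat w <- w - alpha * dJ/dw for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w, xs)
    return w

xs = [1.0, 2.0, 3.0]   # J is minimized at w = 2
w_small = gradient_descent(10.0, xs, alpha=0.001, steps=20)  # too small: barely moves
w_large = gradient_descent(10.0, xs, alpha=0.4, steps=20)    # too large: overshoots and diverges
w_good = gradient_descent(10.0, xs, alpha=0.1, steps=20)     # good enough: converges
print(w_small, w_large, w_good)
```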
![Page 46: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/46.jpg)
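Gradient Descent
$$J(\mathbf{w}^{(1)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(1)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(2)} = \mathbf{w}^{(1)} - \alpha \frac{dJ(\mathbf{w}^{(1)})}{d\mathbf{w}}$$
[Plot: $J$ versus $\mathbf{w}$, marking $\mathbf{w}^{(1)}$ and $\mathbf{w}^{(2)}$]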
![Page 47: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/47.jpg)
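Gradient Descent
$$J(\mathbf{w}^{(2)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(2)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(3)} = \mathbf{w}^{(2)} - \alpha \frac{dJ(\mathbf{w}^{(2)})}{d\mathbf{w}}$$
[Plot: $J$ versus $\mathbf{w}$, marking $\mathbf{w}^{(2)}$ and $\mathbf{w}^{(3)}$]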
![Page 48: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/48.jpg)
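Gradient Descent
$$J(\mathbf{w}^{(3)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(3)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(4)} = \mathbf{w}^{(3)} - \alpha \frac{dJ(\mathbf{w}^{(3)})}{d\mathbf{w}}$$
[Plot: $J$ versus $\mathbf{w}$, marking $\mathbf{w}^{(3)}$ and $\mathbf{w}^{(4)}$]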
![Page 49: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/49.jpg)
Gradient Descent
𝒘
Saddle Point
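Why a saddle point is a problem can be seen on a toy surface. The sketch below assumes J(w1, w2) = w1² − w2², which has a saddle at (0, 0). This function is an illustrative choice, not from the slides, and it is unbounded below, so "escaping" here simply means moving away along w2.

```python
def grad(w):
    """Gradient of J(w1, w2) = w1**2 - w2**2, which has a saddle at (0, 0)."""
    w1, w2 = w
    return (2.0 * w1, -2.0 * w2)

def descend(w, alpha=0.1, steps=100):
    """Plain gradient descent from the starting point w."""
    for _ in range(steps):
        g = grad(w)
        w = (w[0] - alpha * g[0], w[1] - alpha * g[1])
    return w

stuck = descend((1.0, 0.0))     # starts exactly on the ridge: slides into the saddle
escaped = descend((1.0, 1e-6))  # a tiny offset in w2 grows until it leaves the saddle
print(stuck, escaped)
```

At the saddle the gradient is zero, so plain gradient descent stops there even though it is not a minimum. A small perturbation, such as the noise in mini-batch gradients, is enough to move away.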
![Page 56: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/56.jpg)
Gradient Descent
$$J(\mathbf{w}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}, \mathbf{x}_i)$$
![Page 63: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/63.jpg)
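Stochastic Gradient Descent (SGD)
$$J(\mathbf{w}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}, \mathbf{x}_i)$$
$$J_{b_1}(\mathbf{w}) = \sum_{i \in b_1} \mathrm{cost}(\mathbf{w}, \mathbf{x}_i)$$
$$\vdots$$
$$J_{b_M}(\mathbf{w}) = \sum_{i \in b_M} \mathrm{cost}(\mathbf{w}, \mathbf{x}_i)$$
$$M = N / (\text{batch size})$$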
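A minimal sketch of the batching above: the N examples are split into M = N / (batch size) batches, and the weights are updated once per batch using only that batch's cost. The squared-error cost, the data, the learning rate, and the shuffling each epoch are illustrative assumptions, not from the slides.

```python
import random

def batch_cost(w, batch):
    """J_b(w): sum of cost(w, x_i) over the examples in one batch."""
    return sum((w - x) ** 2 for x in batch)

def batch_grad(w, batch):
    """dJ_b/dw for the squared-error cost above."""
    return sum(2.0 * (w - x) for x in batch)

def make_batches(xs, batch_size):
    """Split N examples into M = N / batch_size batches."""
    return [xs[i:i + batch_size] for i in range(0, len(xs), batch_size)]

def sgd(w, xs, alpha, batch_size, epochs, seed=0):
    """Mini-batch SGD: one weight update per batch, reshuffled every epoch."""
    rng = random.Random(seed)
    xs = list(xs)
    for _ in range(epochs):
        rng.shuffle(xs)
        for batch in make_batches(xs, batch_size):
            w = w - alpha * batch_grad(w, batch)
    return w

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # N = 8
batches = make_batches(data, 2)                    # M = 4
w_final = sgd(0.0, data, alpha=0.05, batch_size=2, epochs=50)
print(len(batches), w_final)
```

The batch costs add up to the full cost J(w). With a constant learning rate the result keeps fluctuating around the minimizer (the mean, 4.5) instead of settling on it exactly.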
![Page 76: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/76.jpg)
Stochastic Gradient Descent (SGD)
𝐽 𝒘 = 𝑐𝑜𝑠𝑡(𝒘, 𝒙𝑖)
𝑁
𝑖=1
𝐽𝑏1 𝒘 = 𝑐𝑜𝑠𝑡(𝒘, 𝒙𝑖)
𝑖∈𝑏1
𝐽𝑏𝑀 𝒘 = 𝑐𝑜𝑠𝑡(𝒘, 𝒙𝑖)
𝑖∈𝑏𝑀
⋮
𝑀 = 𝑁/(𝑏𝑎𝑡𝑐ℎ 𝑠𝑖𝑧𝑒)
![Page 77: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/77.jpg)
Stochastic Gradient Descent (SGD)
𝐽 𝒘 = 𝑐𝑜𝑠𝑡(𝒘, 𝒙𝑖)
𝑁
𝑖=1
𝐽𝑏1 𝒘 = 𝑐𝑜𝑠𝑡(𝒘, 𝒙𝑖)
𝑖∈𝑏1
𝐽𝑏𝑀 𝒘 = 𝑐𝑜𝑠𝑡(𝒘, 𝒙𝑖)
𝑖∈𝑏𝑀
⋮
𝑀 = 𝑁/(𝑏𝑎𝑡𝑐ℎ 𝑠𝑖𝑧𝑒)
![Page 78: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/78.jpg)
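Stochastic Gradient Descent (SGD) + Momentum
$$J(\mathbf{w}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}, \mathbf{x}_i)$$
$$J_{b_1}(\mathbf{w}) = \sum_{i \in b_1} \mathrm{cost}(\mathbf{w}, \mathbf{x}_i)$$
$$\vdots$$
$$J_{b_M}(\mathbf{w}) = \sum_{i \in b_M} \mathrm{cost}(\mathbf{w}, \mathbf{x}_i)$$
$$M = N / (\text{batch size})$$
SGD:
$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \alpha \frac{dJ_{b_m}(\mathbf{w}^{(t)})}{d\mathbf{w}}$$
SGD + Momentum:
$$\mathbf{v}^{(t+1)} = \mu \mathbf{v}^{(t)} - \alpha \frac{dJ_{b_m}(\mathbf{w}^{(t)})}{d\mathbf{w}}$$
$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + \mathbf{v}^{(t+1)}$$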
![Page 79: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/79.jpg)
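Momentum
[Figure: update of $\mathbf{w}$, showing the gradient and velocity vectors]
$$\mathbf{v}^{(t+1)} = \mu \mathbf{v}^{(t)} - \alpha \frac{dJ_{b_m}(\mathbf{w}^{(t)})}{d\mathbf{w}}$$
$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + \mathbf{v}^{(t+1)}$$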
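A minimal Python sketch of the two update rules, side by side. The toy cost J(w) = 0.5 (w − 3)², the starting point, and the step count are assumptions chosen for illustration, and the gradient is exact rather than a noisy mini-batch estimate.

```python
def grad(w):
    """Gradient of the toy cost J(w) = 0.5 * (w - 3)^2 (an assumption for illustration)."""
    return w - 3.0


def sgd_step(w, alpha):
    """Plain SGD: w(t+1) = w(t) - alpha * dJ/dw."""
    return w - alpha * grad(w)


def momentum_step(w, v, alpha, mu):
    """SGD + momentum: v(t+1) = mu * v(t) - alpha * dJ/dw, then w(t+1) = w(t) + v(t+1)."""
    v_next = mu * v - alpha * grad(w)
    return w + v_next, v_next


alpha, mu, steps = 0.01, 0.9, 50
w_sgd = 0.0
w_mom, v = 0.0, 0.0
for _ in range(steps):
    w_sgd = sgd_step(w_sgd, alpha)
    w_mom, v = momentum_step(w_mom, v, alpha, mu)

print(f"SGD:          w = {w_sgd:.4f}")
print(f"SGD+momentum: w = {w_mom:.4f}")
```

With the same learning rate, the velocity term accumulates past gradients, so on this toy problem the momentum run ends closer to the minimum at w = 3 after the same number of steps.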
![Page 85: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/85.jpg)
Batch Size
$$J_{b_1}(\mathbf{w}) = \sum_{i \in b_1} \mathrm{cost}(\mathbf{w}, \mathbf{x}_i)$$
$$\vdots$$
$$J_{b_M}(\mathbf{w}) = \sum_{i \in b_M} \mathrm{cost}(\mathbf{w}, \mathbf{x}_i)$$
[Figure: time to train (TTT) plotted against batch size, with the "sweet spot" marked]
![Page 87: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/87.jpg)
Overfitting
• Larger networks have many weights
• With enough weights, the network can memorize the training data: it performs very well on the training data and very poorly on the validation data
• The network does not generalize
![Page 92: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/92.jpg)
Overfitting -- Solutions
• More training data
• Stop training when validation performance gets worse
• Penalize large weights – weight_decay
• Randomly ignore some activations in fully connected layers during training – dropout_ratio
• Training with single-precision floating-point numbers
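Two of the remedies above, penalizing large weights and stopping when validation performance gets worse, can be sketched in plain Python. The helper names, the `patience` knob, and the validation losses are hypothetical; this shows the idea and is not how any particular framework implements it.

```python
def l2_penalized_gradient(grad_w, w, weight_decay):
    """Add the gradient of the L2 penalty (weight_decay / 2) * w^2 to each weight's gradient."""
    return [g + weight_decay * wi for g, wi in zip(grad_w, w)]


def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights to keep.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs (patience is a hypothetical knob).
    """
    best_epoch, best_loss, bad_epochs = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, bad_epochs = epoch, loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch


# Made-up validation losses: they improve, then rise as the network overfits.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]
print(early_stopping_epoch(val_losses))
print(l2_penalized_gradient([0.5, -0.25], [2.0, -4.0], weight_decay=0.0002))
```

The penalty pulls every weight toward zero in proportion to its size, so larger weights are penalized more strongly.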
![Page 93: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/93.jpg)
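Caffe solver.prototxt
net: "models/bvlc_googlenet/train_val.prototxt"
base_lr: 0.01
lr_policy: "step"
stepsize: 320000
gamma: 0.96
max_iter: 10000000
momentum: 0.9
weight_decay: 0.0002
https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/solver.prototxt
batch_size: 32 (set in train_val.prototxt, not in the solver file)
In the update below, momentum is $\mu$ and base_lr is the initial value of $\alpha$:
$$\mathbf{v}^{(t+1)} = \mu \mathbf{v}^{(t)} - \alpha \frac{dJ_{b_m}(\mathbf{w}^{(t)})}{d\mathbf{w}}$$
$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + \mathbf{v}^{(t+1)}$$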
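The effect of lr_policy: "step" can be sketched as follows, assuming the usual reading of the policy: the learning rate is base_lr multiplied by gamma once every stepsize iterations. The function name `step_lr` is hypothetical; the default values are the ones from the solver file.

```python
def step_lr(iteration, base_lr=0.01, gamma=0.96, stepsize=320000):
    """Learning rate under a "step" policy: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


for it in (0, 319999, 320000, 640000, 10000000):
    print(it, step_lr(it))
```

With these settings the rate stays at 0.01 for the first 320,000 iterations and is then multiplied by 0.96 at every further multiple of 320,000, which happens 31 times before max_iter is reached.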
![Page 98: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/98.jpg)
Caffe solver.prototxt
max_iter: 10000000
batch_size: 32
1 epoch = 1 cycle through all training samples
$$\frac{10\text{M iter} \times 32\ \text{imgs/iter}}{1.28\text{M imgs/epoch}} = 250\ \text{epochs}$$
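The same arithmetic as a small Python helper, which also converts stepsize into epochs. The function name `iterations_to_epochs` is hypothetical; the numbers are the ones quoted in the solver slides.

```python
def iterations_to_epochs(iterations, batch_size, dataset_size):
    """Number of passes over the training set covered by `iterations` mini-batches."""
    return iterations * batch_size / dataset_size


TRAIN_IMAGES = 1_280_000  # 1.28M images per epoch, as on the slide

print(iterations_to_epochs(10_000_000, 32, TRAIN_IMAGES))  # max_iter expressed in epochs
print(iterations_to_epochs(320_000, 32, TRAIN_IMAGES))     # stepsize expressed in epochs
```

Expressed this way, max_iter corresponds to 250 epochs and stepsize to 8 epochs.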
![Page 101: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/101.jpg)
Caffe solver.prototxt (cont.)
test_iter: 1000
test_interval: 4000
snapshot: 40000
snapshot_prefix: "models/bvlc_googlenet/bvlc_googlenet"
solver_mode: CPU
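A rough sketch of what these settings imply over a full run, assuming test_iter counts the mini-batches evaluated per test pass, test_interval and snapshot are measured in training iterations, and max_iter is 10000000 as in the first part of the solver file. The test batch size of 50 is an assumption: it does not appear on the slide and would be set in the net definition.

```python
solver = {
    "max_iter": 10_000_000,
    "test_iter": 1000,
    "test_interval": 4000,
    "snapshot": 40000,
}

TEST_BATCH_SIZE = 50  # assumption: not given on the slide; set in the net definition

# Counts ignore any extra test pass at iteration 0 or snapshot at the end of training.
test_passes = solver["max_iter"] // solver["test_interval"]
snapshots = solver["max_iter"] // solver["snapshot"]
images_per_test_pass = solver["test_iter"] * TEST_BATCH_SIZE

print(test_passes, snapshots, images_per_test_pass)
```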
![Page 104: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/104.jpg)
Choosing hyperparams
• Experience. Experience. Experience
• Look through examples and models and practice modifying them
![Page 105: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/105.jpg)
Multi-node Distributed training
Model Parallelism
• Split the model across N nodes
• The same data is in all the nodes
Data Parallelism
• Split the dataset across N nodes
• The same model is in all the nodes
• Good for networks with few weights, e.g., GoogLeNet
• Intel Optimized Caffe uses this. Model+Data Parallelism is a work in progress.
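Data parallelism can be illustrated with a toy Python example: every node holds the same weight, computes the gradient on its own shard of the data, and the partial gradients are then combined. The cost function, dataset, and helper names are assumptions, and the "nodes" are simulated in a single process rather than run on a real cluster.

```python
def grad_sum(w, shard):
    """Sum of per-sample gradients of 0.5 * (w * x - y)^2 with respect to w."""
    return sum((w * x - y) * x for x, y in shard)


def data_parallel_gradient(w, dataset, n_nodes):
    """Each node holds the same w and a different shard; partial gradients are summed and averaged."""
    shards = [dataset[k::n_nodes] for k in range(n_nodes)]
    partial = [grad_sum(w, shard) for shard in shards]  # computed independently per node
    return sum(partial) / len(dataset)                  # combined as in an all-reduce


dataset = [(float(i), 2.0 * i) for i in range(1, 9)]  # toy data, y = 2x
w = 0.5
single_node = grad_sum(w, dataset) / len(dataset)
four_nodes = data_parallel_gradient(w, dataset, n_nodes=4)
print(single_node, four_nodes)
```

Both runs give the same gradient. What changes across nodes is only the weight gradients that must be exchanged, which is why networks with few weights suit this scheme.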
![Page 106: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/106.jpg)
Data Parallelism
Forrest Iandola, et al., “Scaling DNN Training on Intel Platforms.” 2016
![Page 107: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/107.jpg)
Scaling Efficiency: Intel® Xeon Phi™ Processor Deep Learning Image Classification Training Performance - MULTI-NODE Scaling
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance . *Other names and brands may be property of others Configurations: • Intel® Xeon Phi™ Processor 7250 (68 Cores, 1.4 GHz, 16GB MCDRAM), 128 GB memory, Red Hat* Enterprise Linux 6.7, Intel® Optimized Framework
[Figure: scaling efficiency (%), on a 0 to 100 axis, versus the number of Intel® Xeon Phi™ Processor 7250 (68 cores, 1.4 GHz, 16 GB) nodes, from 1 to 128, for OverFeat, AlexNet, VGG-A, and GoogLeNet. The data labels 62 and 87 appear on the chart.]
![Page 108: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/108.jpg)
Challenges
• Very large batch sizes train too slowly and may not reach the same accuracy
• GoogLeNet performance decreases for batch sizes > 1024
[Figure: time to train (TTT) plotted against batch size, with the "sweet spot" marked]
![Page 109: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/109.jpg)
Challenges
• Total batch size = 1024:
• 1024 nodes, each with a per-node batch size of 1 (too much communication)
• 256 nodes, each with a per-node batch size of 4
• 64 nodes, each with a per-node batch size of 16 (most communication is hidden in the computation)
• Guideline: scale the learning rate with the batch size (the relationship is not always linear, but linear scaling is a good place to start)
[Figure: time to train (TTT) plotted against batch size, with the "sweet spot" marked]
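The node and batch-size split, together with a linear learning-rate scaling rule, can be sketched in Python. The helper names are hypothetical, and the reference point of learning rate 0.01 at batch size 32 is taken from the solver slides purely as an example. The linear rule is a starting point, not a guarantee.

```python
def per_node_batch_size(total_batch_size, n_nodes):
    """Split a total batch evenly across nodes (data parallelism)."""
    if total_batch_size % n_nodes != 0:
        raise ValueError("total batch size must divide evenly across nodes")
    return total_batch_size // n_nodes


def linearly_scaled_lr(base_lr, base_batch_size, total_batch_size):
    """Linear scaling heuristic: grow the learning rate in proportion to the batch size."""
    return base_lr * total_batch_size / base_batch_size


for n_nodes in (1024, 256, 64):
    print(n_nodes, per_node_batch_size(1024, n_nodes))

print(linearly_scaled_lr(base_lr=0.01, base_batch_size=32, total_batch_size=1024))
```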
![Page 111: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/111.jpg)
https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture
![Page 112: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/112.jpg)
Resources
• Getting started with Caffe tutorial
• https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture
• Deep learning SDK
• https://software.intel.com/en-us/machine-learning/deep-learning/sdk-signup
• Intel Optimized Frameworks
• https://github.com/intel/caffe
• https://github.com/intel/theano
• https://github.com/intel/torch
• other frameworks coming soon…
For questions contact Nadya Plotnikova, Sr SW Engineer, CRT/DCG Russia
![Page 113: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/113.jpg)
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
© 2016 Intel Corporation.
Legal Notices & Disclaimers
![Page 114: (based on Andres Rodriguez’ course “Deep Learning 101 ... · 3 Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations •](https://reader034.vdocument.in/reader034/viewer/2022042417/5f325e4404de8759e501bf9b/html5/thumbnails/114.jpg)