Machine Learning and Data Mining I
DS 4400
Alina Oprea
Associate Professor, CCIS
Northeastern University
April 4, 2019
Logistics
• Final project presentations
– Thursday, April 11
– Tuesday, April 16 in ISEC 655
– 8-minute slots: 5-minute presentation plus 3 minutes of questions
• Final report due on Tuesday, April 23
– Template in Piazza
– Schedule on Piazza
Training
• Training data: (x^(1), y^(1)), …, (x^(N), y^(N))
• One training example: x^(i) = (x_1^(i), …, x_d^(i)), with label y^(i)
• One forward pass through the network
  – Compute the prediction ŷ^(i)
• Loss function for one example:
  – L(ŷ, y) = −[(1 − y) log(1 − ŷ) + y log ŷ]
• Loss function for the training data:
  – J(W, b) = (1/N) Σ_i L(ŷ^(i), y^(i)) + λ R(W, b)
Cross-entropy loss
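The two losses above can be sketched in a few lines of NumPy. The slide leaves the regularizer R(W, b) unspecified, so a squared L2 penalty on the weights is used here as an illustrative assumption.

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Per-example loss L(y_hat, y) = -[(1 - y) log(1 - y_hat) + y log y_hat]."""
    eps = 1e-12  # clip to avoid log(0)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -((1 - y) * np.log(1 - y_hat) + y * np.log(y_hat))

def training_loss(y_hat, y, W, lam=0.0):
    """J(W, b) = (1/N) sum_i L(y_hat_i, y_i) + lam * R(W).

    R(W) is assumed here to be the sum of squared weights (L2 penalty).
    """
    reg = sum(np.sum(w ** 2) for w in W)
    return np.mean(cross_entropy(y_hat, y)) + lam * reg
```

The clipping of ŷ is a practical detail, not part of the slide's formula: it keeps the log terms finite when a prediction saturates at 0 or 1.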
Mini-batch Gradient Descent
• Initialization
  – For all layers ℓ: set W^[ℓ], b^[ℓ] at random
• Backpropagation
  – Fix a learning rate α
  – For each batch b of size B, with training examples (x^(i_b), y^(i_b)):
    • For all layers ℓ (starting backwards):
      – W^[ℓ] = W^[ℓ] − α Σ_{i=1}^{B} ∂L(ŷ^(i_b), y^(i_b)) / ∂W^[ℓ]
      – b^[ℓ] = b^[ℓ] − α Σ_{i=1}^{B} ∂L(ŷ^(i_b), y^(i_b)) / ∂b^[ℓ]
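The update rule above can be sketched for the simplest possible case, a single sigmoid unit, so the layer loop disappears and the batch gradient is computed in closed form. All names and hyperparameters here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_gd(X, y, alpha=0.1, B=32, epochs=100, seed=0):
    """Mini-batch gradient descent for one sigmoid unit (one-layer sketch)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(scale=0.01, size=d)   # random initialization
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(N)       # shuffle examples each epoch
        for start in range(0, N, B):
            idx = order[start:start + B]
            y_hat = sigmoid(X[idx] @ W + b)
            err = y_hat - y[idx]         # dL/dz for cross-entropy + sigmoid
            W -= alpha * X[idx].T @ err  # W = W - alpha * sum_i dL/dW
            b -= alpha * err.sum()       # b = b - alpha * sum_i dL/db
    return W, b
```

With more layers, `err` would instead be propagated backwards through each layer to produce the per-layer gradients in the slide's update rule.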
Training NN with Backpropagation
Given training set (x_1, y_1), …, (x_N, y_N)
Initialize all parameters W^[ℓ], b^[ℓ] randomly, for all layers ℓ
Loop (each iteration is one EPOCH):
  Compute gradients Δ^[ℓ] over the training set via backpropagation
  Update weights via gradient step:
    • W_ij^[ℓ] = W_ij^[ℓ] − α Δ_ij^[ℓ] / N
    • Similarly for b^[ℓ]
Until weights converge or the maximum number of epochs is reached
Gradient Descent Variants
Training Neural Networks
• Randomly initialize weights
• Implement forward propagation to get the prediction ŷ^(i) for any training instance x^(i)
• Compute the loss function L(ŷ^(i), y^(i))
• Implement backpropagation to compute the partial derivatives ∂L(ŷ^(i), y^(i))/∂W^[ℓ] and ∂L(ŷ^(i), y^(i))/∂b^[ℓ]
• Use gradient descent with backpropagation to find parameter values that minimize the loss
• Can be applied to both feed-forward and convolutional networks
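The steps above can be sketched end-to-end for a network with one hidden layer, sigmoid activations, and the cross-entropy loss. The layer size, learning rate, and helper names are illustrative assumptions, not part of the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d, h, seed=0):
    """Step 1: randomly initialize weights, hidden layer of size h."""
    rng = np.random.default_rng(seed)
    return {"W1": rng.normal(scale=0.5, size=(h, d)), "b1": np.zeros(h),
            "W2": rng.normal(scale=0.5, size=h), "b2": 0.0}

def forward(p, x):
    """Step 2: forward propagation; keep activations for backprop."""
    a1 = sigmoid(p["W1"] @ x + p["b1"])
    y_hat = sigmoid(p["W2"] @ a1 + p["b2"])
    return a1, y_hat

def backward(p, x, a1, y_hat, y):
    """Step 4: backpropagation of dL/dW, dL/db for the cross-entropy loss."""
    dz2 = y_hat - y                          # dL/dz2 (sigmoid + cross-entropy)
    dW2, db2 = dz2 * a1, dz2
    dz1 = dz2 * p["W2"] * a1 * (1 - a1)      # chain rule back through layer 1
    dW1, db1 = np.outer(dz1, x), dz1
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def train(X, y, h=4, alpha=0.5, epochs=2000):
    """Step 5: gradient descent using the backpropagated gradients."""
    p = init_params(X.shape[1], h)
    for _ in range(epochs):
        grads = {k: 0.0 for k in p}
        for x_i, y_i in zip(X, y):
            a1, y_hat = forward(p, x_i)
            g = backward(p, x_i, a1, y_hat, y_i)
            grads = {k: grads[k] + g[k] for k in p}
        p = {k: p[k] - alpha * grads[k] / len(X) for k in p}
    return p
```

This is full-batch gradient descent (gradients averaged over all N examples per epoch), matching the update rule on the previous slide rather than the mini-batch variant.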
Neural Network Architectures
Feed-Forward Networks
• Neurons from each layer connect to neurons in the next layer
Convolutional Networks
• Include convolution layers for feature reduction
• Learn hierarchical representations
Recurrent Networks
• Keep hidden state
• Have cycles in the computational graph
Outline
• Recurrent Neural Networks (RNNs)
  – One-to-one, one-to-many, many-to-one, many-to-many
  – Blog by Andrej Karpathy: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Unsupervised learning
• Dimensionality reduction
  – PCA
• Clustering
RNN Architectures
Recurrent Neural Network
RNN: Computational Graph
One-to-Many
Many-to-Many
Many-to-One
Example: Language Model
Training RNNs
Writing poetry
Writing geometry proofs
Example RNN: LSTM
Capture long-term dependencies by using “memory cells”
LSTM
Hidden State
Memory Cell
LSTM vs Standard RNN
Summary RNNs
• RNNs maintain state and have a flexible design
– One-to-many, many-to-one, many-to-many
• Applicable to sequential data
• LSTM maintains both short-term and long-term memory
• Better and simpler architectures are a topic of active research
Unsupervised Learning
Unsupervised Learning
• Several different learning tasks
• Dimensionality reduction
  – Project the data to a lower-dimensional space
  – Example: PCA (Principal Component Analysis)
• Feature learning
  – Find feature representations
  – Example: autoencoders
• Clustering
  – Group similar data points into clusters
  – Examples: k-means, hierarchical clustering
Supervised vs Unsupervised Learning
• Supervised learning: standard metrics for evaluation
• Unsupervised learning: difficult to evaluate
How Can we Visualize High-Dimensional Data?
Data Visualization
Principal Component Analysis
The Principal Components
2D Gaussian Data
1st PCA Axis
2nd PCA Axis
PCA Algorithm
• Find the eigenvectors of the covariance matrix Σ: Σx = λx
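This eigenvector equation is the core of the PCA algorithm. A minimal NumPy sketch: center the data, form the covariance matrix, solve Σx = λx, and project onto the top-k eigenvectors.

```python
import numpy as np

def pca(X, k):
    """Project X (N x d) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)           # center the data
    Sigma = np.cov(X_centered, rowvar=False)  # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # solve Sigma x = lambda x
    order = np.argsort(eigvals)[::-1]         # sort by decreasing eigenvalue
    components = eigvecs[:, order[:k]]        # top-k eigenvectors as columns
    return X_centered @ components, components
```

`eigh` is used because Σ is symmetric; it returns eigenvalues in ascending order, hence the explicit re-sort. Note that the sign of each eigenvector is arbitrary.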
PCA
Visualizing data
PCA for image compression
Reconstructions using d = 1, 2, 4, 8, 16, 32, 64, and 100 principal components, compared with the original image
Summary: PCA
• PCA creates a lower-dimensional feature representation
  – Linear transformation
• Can be used for visualization
• Can be used with supervised or unsupervised learning
  – Very common to apply classification after the PCA transformation
• Main drawback
  – Resulting features are not interpretable
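The classification-after-PCA pattern can be sketched in plain NumPy; a simple nearest-centroid classifier stands in here for whatever classifier one would actually use, and PCA is computed via the SVD of the centered data. All names are illustrative.

```python
import numpy as np

def fit_pca(X, k):
    """Learn a k-dimensional PCA projection from training data."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T            # top-k principal directions as columns

def project(X, mean, V):
    """Apply the learned linear transformation."""
    return (X - mean) @ V

def nearest_centroid(Z_train, y_train, Z_test):
    """Classify each test point by the closest class centroid in PCA space."""
    classes = np.unique(y_train)
    centroids = np.array([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]
```

The projection is fit on training data only and then applied to any later data, which is exactly why PCA composes cleanly with a downstream supervised learner.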
Clustering
• Goal: Automatically segment data into groups of similar points
• Question: When and why would we want to do this?
• Useful for:
  – Automatically organizing data
  – Understanding hidden structure and the data distribution
  – Detecting similar points and generating representative samples
Clustering Examples
• Social networks
  – Group Facebook users according to their interests and profiles
• Image search
  – Retrieve images similar to an input image
• NLP
  – Topic discovery in articles
• Medicine
  – Patients with similar diseases and symptoms
• Cyber security
  – Machines with the same malware infection
  – New attacks have no labels
Setup
• Input: N data points in d dimensions, x_n = ⟨x_{n,1}, …, x_{n,d}⟩
• Output: an assignment from each point to a cluster index
What is a good distance function?
Partition this data into k groups
Euclidean distance: d(x, y) = √( Σ_{j=1}^{d} (x_j − y_j)² )
![Page 54: Machine Learning and Data Mining I](https://reader030.vdocument.in/reader030/viewer/2022012102/616a03ba11a7b741a34ddf13/html5/thumbnails/54.jpg)
K-means Algorithm
• Fix the number of desired clusters k
• Key insight: describe each cluster by its mean value (called the cluster representative)
• Algorithm
  – Select k cluster means at random
  – Assign each point to its closest cluster
  – Re-compute cluster means based on the new assignment
  – Refine the assignment iteratively until convergence
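The steps above (Lloyd's algorithm) can be sketched in NumPy, using the Euclidean distance from the previous slide; function and parameter names are illustrative.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """K-means: random means, assign, re-compute, repeat until convergence."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]  # random initial means
    for _ in range(iters):
        # assign each point to its closest cluster mean (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # re-compute cluster means based on the new assignment
        new_means = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                              else means[j] for j in range(k)])
        if np.allclose(new_means, means):   # converged: assignments are stable
            break
        means = new_means
    return means, assign
```

The guard for empty clusters (keep the old mean) is one common convention; the slides do not specify how that case is handled.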
Example: Start
Example: Iteration 1
Example: Iteration 2
Example: Iteration 3
Example: Iteration 4
Example: Iteration 5
Acknowledgements
• Slides made using resources from:
– Yann LeCun
– Andrew Ng
– Eric Eaton
– David Sontag
– Andrew Moore
• Thanks!