TRANSCRIPT
Joint Unsupervised Learning of Deep Representations and Image Clusters
Jianwei Yang, Devi Parikh, Dhruv Batra
Virginia Tech
IEEE 2016 Conference on Computer Vision and Pattern Recognition (CVPR)
Overview
Intuitions
a) Meaningful clusters can provide supervisory signals to learn image representations.
Recurrent Framework
Methodology
Divide the T timesteps into P partially unrolled periods. A forward pass and a backward pass are performed in each period.
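Below is a minimal, runnable sketch of this partially unrolled schedule. It uses toy stand-ins (a linear map instead of a CNN, centroid-distance merging instead of the paper's local-structure affinity, and a centroid-pulling update instead of the weighted triplet loss); all function names and hyperparameters are illustrative, not from the authors' released Torch code.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(W, X):
    # Toy linear "CNN"; the paper trains a real ConvNet by backprop.
    return X @ W

def merge_closest_pair(F, labels):
    # One forward-pass timestep: merge the two clusters whose centroids
    # are closest (a simplification of the paper's affinity measure).
    ids = np.unique(labels)
    C = np.stack([F[labels == c].mean(axis=0) for c in ids])
    D = np.linalg.norm(C[:, None] - C[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    a, b = np.unravel_index(D.argmin(), D.shape)
    labels[labels == ids[b]] = ids[a]
    return labels

def update_representation(W, X, labels, lr=0.05):
    # Backward-pass stand-in: pull each sample's feature toward its
    # cluster centroid (a crude surrogate for the weighted triplet loss).
    F = X @ W
    targets = np.stack([F[labels == c].mean(axis=0) for c in labels])
    return W - lr * (X.T @ (F - targets)) / len(X)

X = rng.normal(size=(60, 8))           # unlabeled "images"
W = rng.normal(size=(8, 4))            # representation parameters
labels = np.arange(len(X))             # start with one cluster per sample
P, steps_per_period = 5, 10            # P partially unrolled periods
for p in range(P):
    F = extract_features(W, X)
    for _ in range(steps_per_period):  # forward pass: merge clusters
        labels = merge_closest_pair(F, labels)
    W = update_representation(W, X, labels)  # backward pass: train rep.
print(len(np.unique(labels)), "clusters remain")
```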
Goal
Analysis and Visualizations
https://github.com/jwyang/joint-unsupervised-learning
[Figure: cluster visualizations on MNIST-test at the initial (1762 clusters), middle (17), and final (10) stages, and on COIL-20 at the initial (421), middle (42), and final (20) stages.]
[Diagram: cluster labels are updated by agglomerative clustering, and CNN parameters by CNN training (backprop), in an alternating loop.]
Cluster and learn deep representations for unlabeled images
1-nearest neighbor classification error for different methods on MNIST test set.
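For reference, a hedged sketch of the 1-NN evaluation protocol assumed here: fit a 1-nearest-neighbor classifier on training features and report the error on test features. The random arrays below are placeholders for features extracted by a learned representation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Placeholder features; substitute features from the learned representation.
train_feats = rng.normal(size=(1000, 64))
train_y = rng.integers(0, 10, size=1000)
test_feats = rng.normal(size=(200, 64))
test_y = rng.integers(0, 10, size=200)

clf = KNeighborsClassifier(n_neighbors=1).fit(train_feats, train_y)
error = 1.0 - clf.score(test_feats, test_y)
print(f"1-NN classification error: {error:.2%}")
```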
[Figure: visualizing the MNIST test set in 2D with PCA, Autoencoder, Parametric t-SNE, and Ours. The first three figures are copied from the parametric t-SNE paper.]
Objective Function
Quantitative Results
Forward pass: merge clusters based on local structure (right), versus the conventional merging strategy (left).
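One way to make merging respect local structure is to compute affinities only over each sample's K nearest neighbors, so merges are driven by dense local neighborhoods rather than global distances. The sketch below is an illustration under that assumption; it is not the paper's exact affinity definition.

```python
import numpy as np

def knn_affinity(F, k=5, sigma=1.0):
    # Gaussian affinity sparsified to each point's k nearest neighbors,
    # symmetrized so merges follow local structure (illustrative only).
    D = np.linalg.norm(F[:, None] - F[None, :], axis=-1)
    W = np.exp(-(D ** 2) / (sigma ** 2))
    np.fill_diagonal(W, 0.0)
    keep = np.argsort(-W, axis=1)[:, :k]   # indices of k largest affinities
    mask = np.zeros_like(W, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return np.where(mask | mask.T, W, 0.0)

A = knn_affinity(np.random.default_rng(0).normal(size=(20, 4)))
print(A.shape)  # (20, 20) sparse affinity matrix
```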
Proposed recurrent framework for unsupervised learning of
deep representations and image clusters.
Backward pass: learn the representation to further reduce dissimilarity among samples within merged clusters.
Information on test datasets and methods.
Testing generalization of our learnt (unsupervised) representation to LFW face verification.
Evaluation on CIFAR-10 classification
Minimizing the overall loss over T timesteps:
$\min_{y,\theta} \mathcal{L}(y, \theta \mid I) = \sum_{t=1}^{T} \mathcal{L}^{t}(y^{t}, \theta^{t} \mid I)$
Loss in the forward pass of period p (merge clusters): $\mathcal{L}^{p}(y^{p} \mid \theta^{p-1}, I)$
Loss in the backward pass of period p (train the CNN): $\mathcal{L}^{p}(\theta^{p} \mid y^{p}, I)$
The backward-pass loss is approximated by a weighted triplet loss that pulls same-cluster samples together and pushes different-cluster samples apart.
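A hedged sketch of a sample-weighted triplet loss under the usual hinge formulation: anchors are pulled toward positives from the same cluster and pushed from negatives in other clusters. The per-triplet weights and margin here are illustrative; see the paper for the exact weighting scheme.

```python
import numpy as np

def weighted_triplet_loss(f_anchor, f_pos, f_neg, weights, margin=0.2):
    # Standard hinge-style triplet loss with per-triplet weights:
    # encourage d(anchor, positive) + margin <= d(anchor, negative).
    d_pos = np.sum((f_anchor - f_pos) ** 2, axis=1)
    d_neg = np.sum((f_anchor - f_neg) ** 2, axis=1)
    return np.mean(weights * np.maximum(0.0, margin + d_pos - d_neg))
```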
b) Good representations help to obtain meaningful clusters.
c) Cluster first, and then learn representations (rep.).
d) Learn rep. first, and then cluster images based on that representation.
e) Cluster and learn rep. iteratively and progressively.
Iterative optimization
Our clustering performance vs. that of existing clustering approaches using raw image data.
Clustering performance when our learned representation is fed to existing clustering algorithms.
[Diagram: images are clustered, and the loss is back-propagated to learn the representation.]
From the initial stage to the final stage, the clusters become more accurate and the representations more discriminative.
Average 1.97% error reduction.
Our approach can potentially be used as a visualization tool.
The KNN purity is significantly improved compared with raw image data
(see arrows). This explains the quantitative improvements of our method.
Joint optimization: $\arg\min_{y,\theta} \mathcal{L}(y, \theta \mid I)$
Clustering with a fixed representation: $\arg\min_{y} \mathcal{L}(y \mid \theta, I)$
Representation learning with fixed clusters: $\arg\min_{\theta} \mathcal{L}(\theta \mid y, I)$
Average 0.3% lower than the supervised method; +1.96% over K-means.
Average +21.5% on NMI,
+22.2% on AC.
+14.1% on NMI.
The stats below are averages across all datasets. See the paper for details.
Metrics:
Normalized Mutual Information (NMI)
Clustering Accuracy (AC)
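Both metrics have standard definitions; a minimal sketch follows. NMI comes from scikit-learn, and AC maps predicted clusters to ground-truth classes with the Hungarian algorithm before counting correct assignments.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    # Build the contingency table, then find the cluster-to-class
    # mapping that maximizes the number of correctly assigned samples.
    n = max(y_true.max(), y_pred.max()) + 1
    table = np.zeros((n, n), dtype=int)
    for t, p in zip(y_true, y_pred):
        table[t, p] += 1
    rows, cols = linear_sum_assignment(-table)  # maximize matches
    return table[rows, cols].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])
print(normalized_mutual_info_score(y_true, y_pred))  # 1.0
print(clustering_accuracy(y_true, y_pred))           # 1.0
```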
This triplet loss can be optimized via backprop.
Visualization of clusters and learned representations at different learning stages.
KNN class purity of raw image data (left) and our learned representations (right).
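The plot's metric is assumed here to be the standard KNN class purity: for each sample, the fraction of its K nearest neighbors in feature space that share its class label, averaged over all samples. A minimal sketch under that assumption:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_class_purity(F, y, k=10):
    # For each sample, fraction of its k nearest neighbors (excluding
    # itself) that share the sample's class label, averaged over samples.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(F)
    _, idx = nn.kneighbors(F)          # idx[:, 0] is the sample itself
    return np.mean(y[idx[:, 1:]] == y[:, None])
```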
[Diagram: alternate between clustering images and learning the representation.]
Average +25.7% on AC.
+6.43% on NMI and +12.76% on AC over the best performance of existing approaches, averaged over all datasets.