
Joint Unsupervised Learning of Deep Representations and Image Clusters
Jianwei Yang, Devi Parikh, Dhruv Batra

Virginia Tech

IEEE 2016 Conference on Computer Vision and Pattern Recognition (CVPR)

Overview

Intuitions

a) Meaningful clusters can provide supervisory signals to learn image representations.

Recurrent Framework

Methodology

Divide the T timesteps into P partially unrolled periods; a forward pass and a backward pass are performed in each period (see the sketch below).
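A minimal sketch of this alternating schedule, written in PyTorch for illustration (the released code is in Torch/Lua); the merge and loss callables are placeholders for the forward-pass merging and backward-pass loss described elsewhere on this poster:

import torch

def alternate_training(model, images, init_labels, optimizer, merge_fn, loss_fn,
                       num_periods, steps_per_period):
    """Alternate cluster merging (forward pass) and CNN updates (backward pass).

    merge_fn(feats, labels) -> new labels   (agglomerative merging step)
    loss_fn(feats, labels)  -> scalar loss  (e.g. a weighted triplet loss)
    """
    labels = init_labels
    for _ in range(num_periods):
        # Forward pass of this period: merge clusters using the current deep representations.
        with torch.no_grad():
            feats = model(images)
        labels = merge_fn(feats, labels)

        # Backward pass of this period: train the CNN with the merged cluster labels held fixed.
        for _ in range(steps_per_period):
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return model, labels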

Goal

Analysis and Visualizations

https://github.com/jwyang/joint-unsupervised-learning

MNIST-test: Initial (1762 clusters) → Middle (17) → Final (10)
COIL-20: Initial (421 clusters) → Middle (42) → Final (20)

Each iteration alternates between two components: Agglomerative Clustering, which updates the cluster labels, and CNN Training (backprop), which updates the CNN parameters.

Cluster and learn deep representations for unlabeled images

1-nearest neighbor classification error for different methods on MNIST test set.
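A generic way to measure this 1-NN error in a given feature space (a scikit-learn sketch; the exact split and feature extractor used on the poster are not reproduced here):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def one_nn_error(train_feats, train_labels, test_feats, test_labels):
    """1-nearest-neighbor classification error in the chosen feature space."""
    knn = KNeighborsClassifier(n_neighbors=1).fit(train_feats, train_labels)
    return float(np.mean(knn.predict(test_feats) != np.asarray(test_labels)))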

Visualizing the MNIST test set in 2D: PCA, Autoencoder, Parametric t-SNE, and Ours. The first three figures are copied from the parametric t-SNE paper.

Objective Function

Quantitative Results

Forward pass: merge clusters based on local structure (right), compared with a conventional merging strategy (left).
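A toy illustration of the difference: conventional agglomerative clustering merges the pair of clusters with the highest raw affinity, whereas a local-structure criterion also considers how that affinity compares with each cluster's affinity to its other nearby clusters. The scoring below is a simplified stand-in, not the exact affinity measure used in the paper:

import numpy as np

def pick_merge_pair(affinity, n_neighbors=5):
    """Choose one pair of clusters to merge from a (C, C) affinity matrix.

    Rather than taking the globally largest affinity (conventional strategy),
    a pair is scored by how much its affinity stands out against each cluster's
    affinity to its other nearest neighboring clusters.
    """
    C = affinity.shape[0]
    best_pair, best_score = None, -np.inf
    for i in range(C):
        for j in range(i + 1, C):
            # Each cluster's affinity to its closest clusters, excluding i and j themselves.
            others_i = np.sort(np.delete(affinity[i], [i, j]))[-n_neighbors:]
            others_j = np.sort(np.delete(affinity[j], [i, j]))[-n_neighbors:]
            local = 0.5 * (others_i.mean() + others_j.mean()) if others_i.size else 0.0
            score = affinity[i, j] - local
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair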

Proposed recurrent framework for unsupervised learning of deep representations and image clusters.

Backward pass: learn representations to further reduce dissimilarity among samples in the merged clusters.

Information on the test datasets and compared methods.

Testing generalization of our learnt (unsupervised) representation to LFW face verification.

Evaluation on CIFAR-10 classification

Minimizing the overall loss over T timesteps, the loss in the forward pass of period p (merge clusters), and the loss in the backward pass of period p (train CNN); see the equations under Iterative optimization below. The backward-pass loss is approximated by a weighted triplet loss (see the sketch below).
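A minimal PyTorch sketch of a weighted triplet-style loss (the triplet sampling, weighting scheme, and margin below are simplified placeholders rather than the paper's exact formulation):

import torch
import torch.nn.functional as F

def weighted_triplet_loss(feats, anchors, positives, negatives, weights, margin=0.2):
    """Weighted triplet loss on L2-normalized features.

    anchors/positives/negatives are index tensors: positives share the anchor's
    cluster, negatives come from other clusters; weights gives per-triplet importance.
    """
    f = F.normalize(feats, dim=1)
    d_ap = (f[anchors] - f[positives]).pow(2).sum(dim=1)  # anchor-positive distance
    d_an = (f[anchors] - f[negatives]).pow(2).sum(dim=1)  # anchor-negative distance
    per_triplet = F.relu(margin + d_ap - d_an)            # hinge on the distance gap
    return (weights * per_triplet).sum() / weights.sum()

Since this is an ordinary differentiable function of the CNN outputs, it can be minimized with standard backprop, as noted in the analysis section.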

b) Good representations help to get meaningful clusters.

c) Cluster first, and then learn representations (rep.).

d) Learn representations first, and then cluster images based on them.

e) Cluster and learn representations iteratively and progressively (our approach).

Iterative optimization

Our clustering performance vs. that of existing clustering approaches using raw image data.

Clustering performance using our representation fed to existing clustering algorithms.


From the initial stage to the final stage, the clusters become more accurate and the representations more discriminative.

Average 1.97% error reduction.

Our approach can potentially be used as a visualization tool.

The KNN purity is significantly improved compared with raw image data (see arrows). This explains the quantitative improvements of our method.

Objective (iterative optimization):
Overall, over T timesteps: $\arg\min_{y,\theta} L(y, \theta \mid I)$
Forward pass (fix CNN parameters $\theta$, update cluster labels $y$): $\arg\min_{y} L(y \mid \theta, I)$
Backward pass (fix cluster labels $y$, update CNN parameters $\theta$): $\arg\min_{\theta} L(\theta \mid y, I)$

Average 0.3% lower than the supervised method; 1.96% over K-means.

Average +21.5% on NMI, +22.2% on AC; +14.1% on NMI.

The stats below are averages across all datasets; see the paper for details.

Metrics: Normalized Mutual Information (NMI), Clustering Accuracy (AC).
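Both metrics can be computed as follows (NMI from scikit-learn, and AC via the standard Hungarian matching between predicted cluster ids and ground-truth classes; a generic sketch, assuming integer labels starting at 0):

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """AC: accuracy under the best one-to-one mapping of cluster ids to classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    counts = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1                        # co-occurrence of cluster p and class t
    rows, cols = linear_sum_assignment(-counts)  # maximize the number of matched samples
    return counts[rows, cols].sum() / y_true.size

def nmi(y_true, y_pred):
    """NMI between ground-truth classes and predicted cluster assignments."""
    return normalized_mutual_info_score(y_true, y_pred)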

This triplet-loss can be optimized via backprop.

Visualization of clusters and learned representations at different learning stages.

KNN class purity of raw image data (left) and our learned representations (right).
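One simple way to compute KNN class purity, taken here to mean the average fraction of each sample's K nearest neighbors (in the chosen feature space) that share its ground-truth class (the exact K and distance used on the poster are not specified):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_class_purity(feats, labels, k=10):
    """Average fraction of each sample's k nearest neighbors sharing its class."""
    labels = np.asarray(labels)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(feats)  # +1: the query itself is returned
    _, idx = nn.kneighbors(feats)
    neighbor_labels = labels[idx[:, 1:]]                  # drop the sample itself
    return float(np.mean(neighbor_labels == labels[:, None]))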


Average +25.7% on AC; +6.43% on NMI and +12.76% on AC relative to the best performance of existing approaches, averaged over all datasets.