graph-based techniques for searching large-scale noisy …sfchang/papers/ucla ipam chang...
TRANSCRIPT
![Page 1: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/1.jpg)
Graph-based Techniques for Searching Large-Scale Noisy
Multimedia Data
Shih-Fu ChangDepartment of Electrical Engineering
Department of Computer Science
Columbia University
Joint work with Jun Wang (IBM), Tony Jebara (Columbia U), Wei Liu, Junfeng He, and Yu-Gang Jiang (Fudan U)
1
![Page 2: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/2.jpg)
Graph-based Semi-Supervised Learning• Given a small set of labeled data and a large number of
unlabeled data in a high-dimensional feature space– Build sparse graphs with local connectivity– Propagate information over graphs of large data sets– Hopefully robust to noise and scalable to gigantic sets
2
Input samples with sparse labelsInput samples with sparse labels Label propagation on graphLabel propagation on graph Label inference resultsLabel inference results
Unlabeled Positive NegativeNegativePositive
![Page 3: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/3.jpg)
Intuition Capture local structures via sparse graph
Linear Classifier
Nonlinear ClassifierGraph Semi-Supervised Learning
Through Spare Graph Construction (e.g., kNN)
![Page 4: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/4.jpg)
Graph Construction
Label Propagation
Image\Video dataProcessing (denoising, cropping …)
Possible Applications:Propagating Labels in Interactive Search & Auto Re‐ranking
Feature Extraction
Compute Similarity
Applications
Search, Browsing
User Interface
Interactive browse / label
Interactive Mode
4S.-F. Chang, Columbia U.
Top-rank resultsExisting
Ranking/filtering System
Automatic Mode
Re-ranking over large set
No predefined Category
![Page 5: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/5.jpg)
Example: Web Search Reranking
Keyword Search
Web Search
Top images as +Bottom imgs as ‐
Label Diagnosis Diffusion
Google Search “Statue of Liberty”
![Page 6: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/6.jpg)
Application: Web Search Reranking
Keyword Search
Web Images
Top images as +Bottom imgs as ‐
Label Diagnosis Diffusion
Rerank
![Page 7: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/7.jpg)
Application: Web Search Reranking
Keyword Search
Web Images
Top images as +Bottom imgs as ‐
Label Diagnosis Diffusion
Google Search “Tiger”
![Page 8: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/8.jpg)
Application: Web Search Reranking
Keyword Search
Web Images
Top images as +Bottom imgs as ‐
Label Diagnosis Diffusion
Rerank
How to Handle Noisy Labels before Propagation?Scalability?
![Page 9: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/9.jpg)
Background ReviewGiven a dataset of labeled samples , and unlabeled samplesundirected graph of samples as vertices and edges weighted by sample similarity
Define weight matrix ; vertex degree
![Page 10: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/10.jpg)
Example1
1
22
1
Weight matrix Node degree
Label matrix
classes
samples
1 0 00 1 00 0 10 0 1? ? ?
F
1 0 00 1 00 0 10 0 10.1 0.2 0.9
Label predictionGraph-based
SSL
D
1 0 0 0 00 3 0 0 00 0 3 0 00 0 0 4 00 0 0 0 3
![Page 11: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/11.jpg)
Some Options of Constructing Sparse Graph
Distance Threshold K-Nearest Neighbor Graph
B-Matched Graph
1 and
(Huang and Jebara, AISTATS 2007)(Jebara, Wang, and Chang, ICML 2009)
max
max
![Page 12: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/12.jpg)
Several Ways of Constructing Sparse Graphs
Distance threshold Rank threshold (kNN) B-Match
k,b=4
k,b=6
![Page 13: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/13.jpg)
Examples of Graph Construction
(KNN) (B-Matching)
k = 4 b = 4
![Page 14: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/14.jpg)
Graph Construction – Edge Weighting
Binary Weighting
Gaussian Kernel Weighting
Locally Linear Reconstruction Weighting
![Page 15: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/15.jpg)
Measure Smoothness: Graph Laplacian
Graph Laplacian , and normalized Laplacian
smoothness of function f over graph,
Multi-class
![Page 16: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/16.jpg)
Classical Methods:
• Predict a graph function (F) via cost optimization
Local and Global Consistency - LGC (Zhou et al, NIPS 04)
Gaussian Random Fields – GRF (Zhu et al, ICML03)
prediction function function smoothness empirical loss
0
(Zhu et al ICML03, Zhou et al NIPS04, Joachim ICML03)
∗→
![Page 17: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/17.jpg)
Compare method-graphs-weights
B-matching tends to outperform kNN
B-Matching particularly good for GTAM + local linear (LLR) weight
Empirical Observations
17
GTAMGTAMGTAMGTAMGTAMGTAM
(Jebara, Wang, and Chang, ICML 2009)
![Page 18: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/18.jpg)
18
Noisy Label and other Challenges
Unbalanced Labels
Ill Label Locations
Noisy Data and Labels
LGC Propagation
GRF Propagation
![Page 19: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/19.jpg)
Label Unbalance ‐ A Quick FixNormalize labels within each class based on node degrees
Example:
Node degree matrix
Label matrix
classes
samples
![Page 20: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/20.jpg)
Change uni‐variate optimization to bi‐variateformulation:
Dealing with Noisy Labels‐‐ Graph Transduction via Alternate Minimization
( GTAM, Wang, Jebara, & Chang, ICML, 2008) ( LDST, Wang, Jiang, & Chang, CVPR, 2009)
![Page 21: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/21.jpg)
Alternate Optimization
Then, search optimal integer Y given F*
First, given Y solve continuous valued
Gradient decent search
![Page 22: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/22.jpg)
Alternate Minimization for Label Tuning
Iteratively repeat the above procedure
Add label:Delete label:
Q =
0.8 0.10.23 0.250.31 0.070.17 0.04
(1,1)
(3,1)
=
1 00 10 00 0
Example:
=
0 00 11 00 0
![Page 23: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/23.jpg)
Convergence procedure(non-monotonic due to discrete step size)
Example – Toy Data
Unlabeled Positive Negative
Label propagation by GTAM
Consider adding label only
![Page 24: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/24.jpg)
Decline of the cost function Q over iterations (with vs. without label tuning)
Iteration # 2
Iteration # 6
Initial Labels
Label Diagnosis and Self Tuning( LDST, Wang, Jian, & Chang, CVPR, 2009)
Add label:
Delete label:
![Page 25: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/25.jpg)
Application: Web Search Reranking
Keyword Search
Web Images
Top images as +Bottom imgs as ‐
Label Diagnosis Diffusion
Google Search “Tiger”
![Page 26: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/26.jpg)
Application: Web Search Reranking
Keyword Search
Web Images
Top images as +Bottom imgs as ‐
Label Diagnosis Diffusion
Rerank
![Page 27: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/27.jpg)
Figure 4. Example images of text search results from flickr.com. A total of nine text queries are used: dog, tiger, panda, bird, flower, airplane, forbidden city, statue of liberty, golden bridge.
![Page 28: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/28.jpg)
Effects of Graph‐based reranking
VisualRank: Jing & Baluja, 08
![Page 29: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/29.jpg)
Graph Construction
Label Propagation
Image\Video dataProcessing (denoising, cropping …)
Possible Applications:Propagating Labels in Interactive Search & Auto Re‐ranking
Feature Extraction
Compute Similarity
Applications
Search, Browsing
User Interface
Interactive browse / label
Interactive Mode
29S.-F. Chang, Columbia U.
No predefined Category
![Page 30: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/30.jpg)
Use image graph to tune & propagate information
Use EEG brain signals to detect target of interest
(joint work with Sajda et al, ACMMM 2009, J. of Neural Engineering, May 2011)
Application: Brain Machine Interface for Image Retrieval-- denoise unreliable labels from brain signal decoding
![Page 31: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/31.jpg)
31
The ParadigmDatabase (any target that may interest users)
![Page 32: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/32.jpg)
32
Database
Neural (EEG) decoder
EEG-scores
The Paradigm
![Page 33: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/33.jpg)
33
Database
Neural (EEG) decoder
Exemplar labels (noisy)
Graph-based Semi-Supervised
Learning
The Paradigm
image features
prediction score
![Page 34: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/34.jpg)
34
Pre-triage Post-triage
The Paradigm
![Page 35: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/35.jpg)
35
Pre-triage Post-triage
The Paradigm
Human inspects only a small
sample set via BCI
Machine filters out noise and retrieves targets from very
large DB
• General: no predefined target models, no keyword
• High Throughput: neuro‐vision as bootstrap of fast computer vision
![Page 36: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/36.jpg)
36
The Neural Signatures of “Recognition”D. Linden, Neuroscientist, 2005, the Oddball Effect
Novel (P3a)
NovelTarget Standard
Target (P3b)
time
Standard
Target
Novel
![Page 37: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/37.jpg)
37
Effect of graph-based reranking (BCI test)
Top (noisy) results of Brain EEG signal detection
Top results after graph‐based label denoising
& propagation
P‐R curve significantly improved
![Page 38: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/38.jpg)
38
More Example Results
Top 20 results of EEG detection
Top 20 results of Hybrid System (BCI‐VPM)
Top 20 results of EEG detection
Top 20 results of Hybrid System (BCI‐VPM)
![Page 39: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/39.jpg)
Graph over million points and more• k-NN graph construction + label prediction
• infeasible for large-scale tasks• Idea: AnchorGraph Regularization
complexity: # anchors m << n
(W. Liu, J. He, S.-F. Chang, ICML2010) 0 1000 2000 3000 4000 5000
0
2
4
6
8
10
12
14x 10
10 Time Complexity
data size n
time
![Page 40: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/40.jpg)
Active topic in research• Large‐scale spectral analysis (Fergus et al, ‘09)
– Approximate solutions as linear combinations of a small number of eigenfucntions of graph Laplacian
– Elegant solutions with linear complexity– But only applicable to ideal data distributions(separable uniform or Gaussian)
• Matrix approximation via Nyström (Zhang et al, ‘09)
– Complexity – But may not be positive semidefinite ‐> non‐convex
W =
![Page 41: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/41.jpg)
Idea: Build low-rank graph via anchors• Use anchor points to “abstract” the graph structure• Compute data-to-anchor similarity: sparse local embedding
• Data-to-data similarity W = inner product in the embedded space
data points
anchor points
x8
x4
u1
u2
u5
u4
u6
u3
x1
Z11
Z12
Z16
W14=0
W18>0
(Liu, He, Chang, ICML10)
![Page 42: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/42.jpg)
Probabilistic Intuition• Affinity between samples i and j, Wij
= probability of two‐step Markov random walk
AnchorGraph: sparse, positive semi-definite
, where = diag( ), m<< n
![Page 43: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/43.jpg)
AnchorGraph Regression• Apply the same sparse embedding principle to labels
• The whole graph regularization process becomes low-rank
Small matrix inversion
Predicted function over graph = embedding matrix ∙ inferred labels on anchors
∗ ∗
![Page 44: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/44.jpg)
Intuition: Anchor Graph SSL
initial labelslabel mapping inlabel mapping out
Use low‐rank ARG to infer optimal labels on anchors and samples
Predict optimal labels in the anchor space (~100 labels)
Propagate to original sample space (~million labels)
![Page 45: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/45.jpg)
Performance -small data set
Method Error Rate (%) Time (seconds)1NN 20.15 0.12
LGC with 6NN graph
8.79 403.02
GFHF with 6NN graph
5.19 413.28
AGR^0 7.40 10.20AGR 6.56 16.57
40x speedup
• USPS-Train: 7,291 images of digits, 10 classes, 10 samples per class• AGR^0: K-means anchors and naïve Z
AGR: K-means anchors and optimized Z
accuracy comparable to analytical optimum
![Page 46: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/46.jpg)
Large Data Set Evaluation• 630,000 MNIST images over 10 classes, 100 labeled images only• Conventional analytical solutions infeasible• Among scalable solutions ‐ reduce error rates by 30%‐50%
Method Error Rate (%) Training Time (seconds)
1NN 39.65 5.46Eigenfunction (‘09) 36.94 44.08
PVM (‘09) 29.37 266.89AGR^0 24.71 232.37AGR 19.75 331.72
30%-50% gain
![Page 47: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/47.jpg)
Extension to Web-Scale• Techniques described above not scalable to Web‐scale or dynamic data sets– Cannot handle cases when n = ~ billions – For dynamic data, updating graph is expensive
• Preferred: learn Inductive Models to handle novel dynamic data
47
![Page 48: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/48.jpg)
Data Subsampling & Learn Inductive Model
48
Web-scale database
one million data pointsone million data points
anchor points
subsampling
seed labels
Anchor GraphConstruction
data-to-anchor map z(x)
Anc
hor G
raph
Reg
ular
izat
ion
anchors’ labels a
x
novel data point xz(x) × f(x)=z(x)aT
predict x’s label
![Page 49: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/49.jpg)
background
truck
ship
horse
frog
dog
deer
cat
bird
automobile
airplane
training images test images
ARG over 80M Tiny Images + CIFAR‐10
![Page 50: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/50.jpg)
Method 1NN LinearSVM
EigenFunction
PVM AGR
1K Anchors
2kAnchors
1K Anchors
2kAnchors
Accuracy(%)
51.66±0.28
60.14±0.34
53.86±0.35
60.55±0..32
60.95±0.41
62.39±0.33
64.23±0.28
TrainingTime (s) 0 8.00 149.83 213.88 517.82 206.60 477.61
Test Time (s)
6.29e‐4 2.66e‐6 1.39e‐4 5.79e‐5 1.27e‐4 6.20e‐5 1.39e‐4
80Million Tiny
Images1Million samples(1% labels from CIFAR-10)
ARG as inductive modelNovel test sample
Learn ARG
![Page 51: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/51.jpg)
Additional Issues
• Multi‐edge Graph– Multiple relation edges between nodes
• Multi‐feature Graph– Build graphs in multiple feature spaces– Joint optimization
• Label tuning vs. Active Learning
51
![Page 52: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/52.jpg)
Image‐Based Multi‐Edge Graph
52
two images with the same tag
dog, flower dog, bird
• one edge connecting the two regions sharing the tag, but not all
• How to propagate label over multiple edges?
Liu et al, ACM Multimedia 2010
![Page 53: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/53.jpg)
Extension to Multi-Feature Graphs
Feature 1
Feature K
0 50 100 150 200-0.1
0
0.1
0.2
0.3
0 20 40 60 80 1000
0.2
0.4
0.6
0.8
1
…
Graph 1 Graph K
User Input
Ranking list
Label Propagation
How to handle noisy labels in multiple graphs?
How to handle noisy labels in
multiple graphs?
![Page 54: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/54.jpg)
Multi-graph SSL vs. single-graph
Caltech 101 data set
Improve performance by 20%-80%
![Page 55: Graph-based Techniques for Searching Large-Scale Noisy …sfchang/papers/UCLA IPAM Chang 2012.pdf · 2012-01-20 · Graph-based Techniques for Searching Large-Scale Noisy Multimedia](https://reader034.vdocument.in/reader034/viewer/2022050523/5fa6996a845e0b29bf692801/html5/thumbnails/55.jpg)
References and Tools
1. X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi‐supervised learning using Gaussian fields and harmonic functions. ICML, 2003.
2. D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Scholkopf. Learning with local and global consistency. NIPS, 2004.
3. W. Liu, J. He, and S.‐F. Chang. Large graph construction for scalable semi‐supervised learning. ICML, 2010. Software: http://www.ee.columbia.edu/∼wliu/Anchor Graph.zip.
4. W. Liu, J. Wang, S. Kumar, and S.‐F. Chang. Hashing with graphs. ICML, 2011.5. J. Wang, T. Jebara, and S.‐F. Chang. Graph transduction via alternating minimization. ICML,
2008.6. J. Wang, Y.‐G. Jiang, and S.‐F. Chang. Label diagnosis through self tuning for web image
search. CVPR, 2009.7. W. Liu, J. Jun, and S.‐F. Chang, Robust and Scalable Graph‐Based Semi‐Supervised Learning.
In Review, IEEE Proceedings, 2012.8. J. Wang, E. Pohlmeyer, B. Hanna, Y.‐G. Jiang, P. Sajda, and S.‐F. Chang,
“Brain State Decoding for Rapid Image Retrieval,” ACM Multimedia Conference, 2009.9. J. Wang, A. Kumar, S.‐F. Chang, “Semi‐Supervised Hashing for Scalable Image Retrieval”,
CVPR 2010.
55