Dimensionality Reduction for Graph of Words Embedding
GbR’2011 Oral Talk

Jaume Gibert, Ernest Valveny
Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain

Horst Bunke
Institute of Computer Science and Applied Mathematics, University of Bern, Bern, Switzerland

May 18th, 2011
Are these features really non-informative?

Simple features:
Number of nodes, number of edges
Number of nodes with label A, or label B, ...
Number of edges between label A and label C, ...
...

→ These features might be neither informative nor discriminative.
Goals of the paper

Define a new embedding methodology exploiting features based on statistics of node labels
Adapt these features to graphs with continuous labels on nodes
→ Graph of Words Embedding

Deal with an inherent issue of the methodology: high dimensionality and sparsity
→ Dimensionality reduction
Outline

1 Graph of Words Embedding
2 Dimensionality Reduction
3 Experiments
4 Conclusions
1 Graph of Words Embedding
Motivation

Given a graph g = (V, E, µ), a vectorial representation could be
xg = (#(l1, g), #(l2, g), ..., #(ln, g)),
where #(li, g) is the number of nodes of g carrying label li.

For instance,
xg1 = xg2 = (#(A, g), #(B, g), #(C, g)) = (2, 1, 1)

Note that the two example graphs receive the same vector: node counts alone do not tell them apart.
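To make the counting concrete, here is a toy Python sketch (not from the talk); the label multiset {A, A, B, C} for g1 is read off its vector (2, 1, 1), so only the ordering of the list below is an assumption:

```python
from collections import Counter

# Node labels of g1 in the example above (two A's, one B, one C)
labels = ["A", "A", "B", "C"]

counts = Counter(labels)                    # #(l, g) for each label l
x_g = [counts[l] for l in ("A", "B", "C")]  # (#(A,g), #(B,g), #(C,g))
print(x_g)  # [2, 1, 1]
```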
Motivation

More features taking topology into account: #(li ↔ lj, g), the number of edges joining a node labelled li with a node labelled lj.

xg = (#(A, g), #(B, g), #(C, g),
      #(A ↔ A, g), #(A ↔ B, g), #(A ↔ C, g),
      #(B ↔ B, g), #(B ↔ C, g), #(C ↔ C, g)),

xg1 = (2, 1, 1, 1, 2, 1, 0, 1, 0)   (first three entries: node counts; last six: edge counts)
xg2 = (2, 1, 1, 1, 1, 2, 0, 1, 0)

Now the two graphs are mapped to different vectors.
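Again as a toy sketch (not from the talk), the edge part of xg1 can be computed from an edge list. The list below is reconstructed from the counts (1, 2, 1, 0, 1, 0) and assumes an undirected graph, so label pairs are unordered:

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical edge list for g1, reconstructed from its edge counts
edges = [("A", "A"), ("A", "B"), ("A", "B"), ("A", "C"), ("B", "C")]

pair_counts = Counter(tuple(sorted(e)) for e in edges)
order = list(combinations_with_replacement(("A", "B", "C"), 2))
# order = [(A,A), (A,B), (A,C), (B,B), (B,C), (C,C)], matching the slide
x_edges = [pair_counts[p] for p in order]
print(x_edges)  # [1, 2, 1, 0, 1, 0]
```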
Problem
What if we have graphs with continuous labels?
Idea

→ The more similar two labels are, the more they should count as the same label.
How to

We need a set of special points:
From the whole set of node labels, select a set W of representatives / words¹
Assign each node to the closest word:
λh : V → W,  v ↦ λh(v) = argmin_{wi ∈ W} ‖ µ(v) − wi ‖₂
Each edge counts as a relation between the words assigned to its nodes

Thus we have the features
Ui = #(wi, g) = | {v ∈ V | wi = λh(v)} |
and
Bij = #(wi ↔ wj, g) = | {(u, v) ∈ E | wi = λh(u) ∧ wj = λh(v)} |.

¹ Graph of Words, in analogy to Bag-of-Words
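The talk shows no implementation; here is a minimal NumPy sketch of this construction, assuming undirected graphs and counting unordered word pairs (function and variable names are mine):

```python
import numpy as np

def gow_embedding(node_labels, edges, words):
    """Sketch of the Graph of Words embedding defined above.

    node_labels : (n, d) array, continuous label mu(v) of each node
    edges       : iterable of (u, v) node-index pairs (undirected)
    words       : (k, d) array, the vocabulary W of representatives
    """
    # lambda_h: assign each node to its closest word in Euclidean distance
    dists = np.linalg.norm(node_labels[:, None, :] - words[None, :, :], axis=2)
    assign = dists.argmin(axis=1)              # assign[v] = index of lambda_h(v)

    k = len(words)
    U = np.bincount(assign, minlength=k)       # U_i = #(w_i, g)

    B = np.zeros((k, k), dtype=int)            # B_ij = #(w_i <-> w_j, g)
    for u, v in edges:
        i, j = sorted((assign[u], assign[v]))  # unordered pair: edges are undirected
        B[i, j] += 1

    # final vector: node counts followed by the upper triangle of B
    # (k + k(k+1)/2 entries, hence the quadratic growth discussed next)
    return np.concatenate([U, B[np.triu_indices(k)]])
```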
Consequences

We have some issues to take care of:
Some representatives might not be informative
Some relations between words might not appear at all in the whole set of graphs (sparsity)
We may have redundancy in the features
The size of the feature vector is quadratic in the number of words

→ Dimensionality Reduction
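For concreteness, the quadratic growth can be made explicit (a derivation, not stated on the slide): with k words the embedding has

dim(xg) = k + k(k+1)/2 = k(k+3)/2

entries, i.e. k node counts plus one count per unordered word pair. The test-set dimensionalities reported later (702, 3402 and 189) match this formula for k = 36, 81 and 18 words, respectively.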
2 Dimensionality Reduction
DR methods

In order to address the dimensionality problem, we have applied two different dimensionality reduction techniques:

Kernel Principal Component Analysis (kPCA)
Non-linear extension of PCA
Uses the kernel matrix to discover linear behaviour in the kernel space
Back-projects to find the non-linear components of the data

Independent Component Analysis (ICA)
Finds linearly uncorrelated components
Statistically independent components
Maximizes the non-Gaussianity of the data
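The talk does not name an implementation; a minimal sketch of the two reductions with scikit-learn's KernelPCA and FastICA (the RBF kernel choice and the function name are assumptions) could look like this:

```python
from sklearn.decomposition import KernelPCA, FastICA

def reduce_gow_vectors(X_train, X_test, n_components, method="kpca"):
    """Reduce GOW vectors with one of the two techniques compared above."""
    if method == "kpca":
        # kernel choice is an assumption; the slide does not specify one
        model = KernelPCA(n_components=n_components, kernel="rbf")
    else:
        model = FastICA(n_components=n_components)
    Z_train = model.fit_transform(X_train)  # fit on training vectors only
    Z_test = model.transform(X_test)        # project the test vectors
    return Z_train, Z_test
```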
3 Experiments
Aim of the experimentation

Discover the effect of the reduction by comparing classification rates on:
The raw vectors
The kPCA-reduced vectors
The ICA-reduced vectors

We use a kNN classifier and validate the meta-parameters on a validation set (a sketch of this protocol follows below).
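The talk gives no code for this protocol; as a hedged sketch with scikit-learn's kNN classifier, validating the number of neighbours on the held-out set (the parameter grid ks is a placeholder, not from the talk):

```python
from sklearn.neighbors import KNeighborsClassifier

def validate_knn(X_train, y_train, X_valid, y_valid, ks=(1, 3, 5)):
    """Pick the neighbour count on the validation set and report
    the best validation accuracy (the quantity plotted next)."""
    scores = {}
    for k in ks:
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        scores[k] = clf.score(X_valid, y_valid)
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]
```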
Databases

We use datasets from the IAM graph database repository:
Letters (low distortion)
GREC
Fingerprints
Results on validation

Selection of representatives: kMeans

[Figure: accuracy rate on the validation set (vertical axis) for every vocabulary size (horizontal axis), one panel per dataset (Letter, GREC, Fingerprint), comparing the classification of the original vectors (without reduction) against the two dimensionality reduction techniques, kPCA and ICA.]
Results on test

                    Letter        GREC          Fingerprint
                    AR     Dims   AR     Dims   AR     Dims
Original vectors    98.8   702    97.5   3402   77.7   189
kPCA                97.6   43     97.1   67     80.6   108
ICA                 82.8   251    58.9   218    63.3   184

(AR = accuracy rate in %; Dims = dimensionality of the vectors)

Results with kPCA are as good as with raw vectors while reducing dimensionality
ICA does not provide good results, due to the correlation between features
4 Conclusions
Conclusions and future work

We have presented an embedding methodology for continuously labelled graphs
It exploits statistics of node labels by similarity to a set of representatives
We have addressed the dimensionality problem the methodology exhibits
Good results when using kPCA

Future work:
More kernel functions in the kPCA framework to improve results
Other selection methods for the set of representatives
Combination of different sets of representatives
Thanks for your attention!
Questions?