Image Analysis & Retrieval
CS/EE 5590 Special Topics (Class Ids: 44873, 44874)
Fall 2016, M/W 4-5:15pm@Bloch 0012
Lec 08
Feature Aggregation II:
Fisher Vector, Super Vector and AKULA
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: [email protected], Ph: x 2346.
http://l.web.umkc.edu/lizhu
Outline
ReCap of Lecture 07
Image Retrieval System
BoW
VLAD
Dense SIFT
Fisher Vector Aggregation
AKULA
Summary
Precision, Recall, F-measure
Precision = TP/(TP + FP)
Recall (TPR) = TP/(TP + FN)
FPR = FP/(FP + TN)
F-measure
= 2*(precision*recall)/(precision + recall)
Precision is the probability that a
retrieved document is relevant.
Recall is the probability that a
relevant document is retrieved in a search.
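As a quick check, a worked example with hypothetical counts (TP = 8, FP = 2, FN = 4):

```latex
\text{Precision} = \frac{8}{8+2} = 0.8, \qquad
\text{Recall} = \frac{8}{8+4} \approx 0.667, \qquad
F = \frac{2 \cdot 0.8 \cdot 0.667}{0.8 + 0.667} \approx 0.727
```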
Why Aggregation?
Curse of Dimensionality
Decision Boundary / Indexing
Bag-of-Words: Histogram Coding
Codebook:
Feature space: R^d; k-means gives k centroids {μ_1, μ_2, …, μ_k}
BoW Hard Encoding:
For n feature points {x_1, x_2, …, x_n}, the assignment matrix is k×n, with each column having exactly one non-zero entry
Aggregated dimension: k
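A minimal BoW hard-encoding sketch in MATLAB with VL_FEAT; `feats_train` and `feats` (d×n descriptor matrices) and the codebook size are illustrative names, not from the slides:

```matlab
% Build a k-means codebook and hard-encode features into a k-dim histogram
k = 64;                                   % codebook size (illustrative)
centroids = vl_kmeans(feats_train, k);    % d x k codebook from training descriptors
D = vl_alldist2(centroids, feats);        % k x n squared L2 distances
[~, nn] = min(D, [], 1);                  % hard assignment: nearest centroid per feature
bow = histc(nn, 1:k) / size(feats, 2);    % aggregated k-dim histogram (L1-normalized)
```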
Kernel Code Book Soft Encoding
Kernel affinity: $K(x_i, m_k) = e^{-\sigma \|x_i - m_k\|^2}$
Assignment matrix: $A_{i,k} = K(x_i, m_k) / \sum_k K(x_i, m_k)$
Encoding, k-dimensional: $X(k) = \frac{1}{n} \sum_i A_{i,k}$
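A soft-encoding counterpart to the BoW sketch above; the kernel bandwidth `sigma` is an illustrative choice:

```matlab
% Kernel codebook soft encoding: affinities -> normalized assignments -> mean pooling
sigma = 0.1;                                       % kernel bandwidth (to be tuned)
K = exp(-sigma * vl_alldist2(centroids, feats));   % k x n affinities K(x_i, m_k)
A = bsxfun(@rdivide, K, sum(K, 1));                % normalize each column over centroids
Xk = mean(A, 2);                                   % k-dim soft histogram X(k)
```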
VLAD- Vector of Locally Aggregated Descriptors
Aggregate feature differences from the codebook
Hard assignment: find the NN of each feature x_j among the centroids {μ_i}
Compute aggregated differences
L2-normalize
Final feature dimension: k × d
① assign each descriptor to its nearest centroid
② compute the residuals x − μ_i
③ accumulate v_i over cell i:

$v_i = \sum_{j \,\text{s.t.}\, NN(x_j) = \mu_i} (x_j - \mu_i)$

$v_i = v_i / \|v_i\|_2$
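A VLAD sketch following the VL_FEAT API (vl_vlad expects a k×n one-hot assignment matrix; variable names are illustrative):

```matlab
% Hard-assign descriptors to centroids, then aggregate residuals with vl_vlad
kdtree = vl_kdtreebuild(centers);                       % centers: d x k codebook
nn = vl_kdtreequery(kdtree, centers, feats);            % 1 x n nearest-centroid indices
assign = zeros(k, size(feats, 2), 'single');
assign(sub2ind(size(assign), double(nn), 1:size(feats, 2))) = 1;
enc = vl_vlad(feats, centers, assign);                  % (k*d) x 1 aggregated residuals
```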
VLAD on SIFT
Example of aggregating SIFT with VLAD
K=16 codebook entries
Each cell shows one codebook centroid (blue) and its aggregated VLAD differences (red); top row: left image, bottom row: right image
Outline
ReCap of Lecture 07
Image Retrieval System
BoW
VLAD
Dense SIFT
Fisher Vector Aggregation
AKULA
Summary
One more trick
Recall that SIFT is a powerful descriptor
VL_FEAT: vl_dsift
A dense description of the image: compute SIFT descriptors on a predetermined grid, with no spatial-scale-space extrema detection
Supplements HoG as an alternative texture descriptor
VL_FEAT: vl_dsift
Compute dense SIFT as a texture descriptor for the image
[f, dsift] = vl_dsift(single(rgb2gray(im)), 'step', 2) ;
There's also a FAST option:
[f, dsift] = vl_dsift(single(rgb2gray(im)), 'fast', 'step', 2) ;
A huge amount of SIFT data will be generated
Fisher Vector
Fisher Vector and variations:
Winning in image classification
Winning in the MPEG object re-identification: SCFV (Scalable Coded Fisher Vector) in CDVS
Codebook: Gaussian Mixture Model (GMM)
GMM is a generative model for the data
Assume each data point is generated from a mixture of K Gaussians with parameters {w_k, μ_k, σ_k}:
$x_i \sim \sum_{k=1}^{K} w_k\, \mathcal{N}(\mu_k, \sigma_k)$

$\mathcal{N}(\mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}}\, e^{-\frac{1}{2}(x - \mu_k)' \Sigma_k^{-1} (x - \mu_k)}$
A bit of Theory: Fisher Kernel
Encode the deviation from the generative model
Observed feature set {x_1, x_2, …, x_n} in R^d, e.g., d = 128 for SIFT.
How do these observations deviate from the given GMM model with parameters λ = {w_k, μ_k, σ_k}? I.e., how should the parameters (e.g., the means) move to best fit the observations?
[Figure: a new observation x_1 relative to GMM components μ_1 … μ_4]
A bit of Theory: Fisher Kernel
Score function w.r.t. the likelihood function $u_\lambda(X)$: $G^X_\lambda = \nabla_\lambda \log u_\lambda(X)$, the derivative of the log-likelihood
The score function has dimension m, the number of generative-model parameters (three parameter types per GMM component: weight, mean, variance)
Given the observed data X, the score function indicates how a likelihood-function parameter (e.g., a mean) should move to better fit the data
Distance/deviation of two observations X, Y w.r.t. the generative model:
Fisher information matrix (roughly the covariance in the Mahalanobis distance):

$F_\lambda = E_X\!\left[ G^X_\lambda\, {G^X_\lambda}' \right]$

Fisher kernel distance, normalized by the Fisher information matrix:

$K_{FK}(X, Y) = {G^X_\lambda}'\, F_\lambda^{-1}\, G^Y_\lambda$
Fisher Vector
K_FK(X, Y) is a measure of similarity w.r.t. the generative model
As in the Mahalanobis-distance case, we can decompose this kernel via the Cholesky factorization $F_\lambda^{-1} = L_\lambda' L_\lambda$:

$K_{FK}(X, Y) = {G^X_\lambda}'\, F_\lambda^{-1}\, G^Y_\lambda = (L_\lambda G^X_\lambda)'\, (L_\lambda G^Y_\lambda)$

That gives a kernel feature mapping of X to the Fisher Vector, $\mathcal{G}^X_\lambda = L_\lambda G^X_\lambda$, which for observed image features {x_t} can be computed in closed form
GMM Fisher Vector
Encode the deviation from the generative model: observed feature set {x_1, x_2, …, x_n} in R^d, e.g., d = 128 for SIFT.
How do these observations deviate from the given GMM model with parameters λ = {α_k, μ_k, σ_k}?
GMM log-likelihood gradient: let $w_k = e^{\alpha_k} / \sum_j e^{\alpha_j}$ (a softmax, keeping the mixture weights normalized); then we obtain gradients w.r.t. the weight, mean, and variance parameters.
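The per-parameter gradients appeared as images on the original slide; a standard form (per Perronnin-style derivations, with $\gamma_i(k)$ the posterior of component k given $x_i$; normalization conventions vary across implementations) is:

```latex
\frac{\partial \mathcal{L}}{\partial \alpha_k} = \sum_i \big(\gamma_i(k) - w_k\big), \qquad
\frac{\partial \mathcal{L}}{\partial \mu_k} = \sum_i \gamma_i(k)\, \frac{x_i - \mu_k}{\sigma_k^2}, \qquad
\frac{\partial \mathcal{L}}{\partial \sigma_k} = \sum_i \gamma_i(k) \left[ \frac{(x_i - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k} \right]
```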
GMM Fisher Vector VL_FEAT implementation
GMM codebook
For a K-component GMM, we only allow 3K parameter sets {w_k, μ_k, σ_k}, k = 1..K, i.e., each Gaussian component is isotropic with

$\Sigma_k = \mathrm{diag}(\sigma_k, \sigma_k, \ldots, \sigma_k) = \sigma_k I$

Posterior probability of feature point x_i under GMM component k:
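The posterior itself was an image on the slide; the standard soft-assignment expression is:

```latex
q_{ik} = \frac{w_k\, \mathcal{N}(x_i;\, \mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j\, \mathcal{N}(x_i;\, \mu_j, \Sigma_j)}
```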
GMM Fisher Vector VL_FEAT implementation
FV encoding
Gradient on the mean, for GMM component k, dimension j = 1..D
In the end, we have a 2K × D aggregation of the derivatives w.r.t. the means and variances:

$FV = [u_1, u_2, \ldots, u_K, v_1, v_2, \ldots, v_K]$
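The mean and variance gradients were images on the slide; in the convention of VL_FEAT's Fisher vector documentation (dimension j of component k, posteriors $q_{ik}$, N features) they read approximately:

```latex
u_{jk} = \frac{1}{N \sqrt{\pi_k}} \sum_{i=1}^{N} q_{ik}\, \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}}, \qquad
v_{jk} = \frac{1}{N \sqrt{2 \pi_k}} \sum_{i=1}^{N} q_{ik} \left[ \left( \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}} \right)^{2} - 1 \right]
```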
VL_FEAT GMM/FV API
Compute GMM model with VL_FEAT
Prepare data:
numPoints = 1000 ; dimension = 2 ;
data = rand(dimension, numPoints) ;
Call vl_gmm:
numClusters = 30 ;
[means, covariances, priors] = vl_gmm(data, numClusters) ;
Visualize:
figure ;
hold on ;
plot(data(1,:), data(2,:), 'r.') ;
for i = 1:numClusters
    vl_plotframe([means(:,i)' covariances(1,i) 0 covariances(2,i)]) ;
end
VL_FEAT API
FV encoding
encoding = vl_fisher(datatoBeEncoded, means, covariances, priors);
Bonus points:
Encode HoG features with a Fisher Vector?
Randomly collect 2-3 images from each class
Stack all their HoG features together into an n × 36 data matrix
Compute its GMM
Use this GMM to encode every image's HoG features (instead of averaging them), as sketched below
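A sketch of that bonus pipeline, assuming VL_FEAT and the 'dalaltriggs' HoG variant (36-dim cells); image lists, cell size, and the cluster count are illustrative choices:

```matlab
% Build a GMM over pooled HoG cells, then Fisher-encode each image's HoG
hogAll = [];
for i = 1:numel(sampleImgs)                           % 2-3 sampled images per class
    im = single(rgb2gray(imread(sampleImgs{i})));
    h = vl_hog(im, 8, 'variant', 'dalaltriggs');      % H x W x 36 HoG cells
    hogAll = [hogAll; reshape(h, [], 36)];            % stack into n x 36
end
[means, covariances, priors] = vl_gmm(hogAll', 16);   % GMM codebook in HoG space
% Fisher-encode one image's HoG cells (vl_fisher expects d x n input):
fv = vl_fisher(reshape(h, [], 36)', means, covariances, priors);
```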
Super Vector Aggregation β Speaker ID
Fisher Vector: Aggregates Features against a GMM
Super Vector: Aggregates GMM against GMM
Ref:
o W. M. Campbell, D. E. Sturim, D. A. Reynolds: Support vector machines using GMM supervectors for speaker verification. IEEE Signal Processing Letters 13(5): 308-311 (2006)
Super Vector from MFCC
Motivated by speaker-ID work: speech is a continuous evolution of the vocal tract
Need to extract a sequence of spectra, or a sequence of spectral coefficients
Use a sliding window: 25 ms window, 10 ms shift
[Figure: MFCC pipeline: Log|X(ω)| → DCT → MFCC]
GMM Model from MFCC
GMM on MFCC feature
• The acoustic vectors (MFCC) of speaker s are modeled by a probability density function parameterized by $\lambda^{(s)}$:

$p(\mathbf{x} \mid \lambda^{(s)}) = \sum_{j=1}^{M} \lambda_j^{(s)}\, p(\mathbf{x} \mid \mu_j^{(s)}, \Sigma_j^{(s)})$

• Gaussian mixture model (GMM) for speaker s:

$\lambda^{(s)} = \{\lambda_j^{(s)}, \mu_j^{(s)}, \Sigma_j^{(s)}\}_{j=1}^{M}$
Universal Background Model
UBM GMM Model:
• The acoustic vectors of a general population are modeled by another GMM, called the universal background model (UBM):

$p(\mathbf{x} \mid \lambda^{(ubm)}) = \sum_{j=1}^{M} \lambda_j^{(ubm)}\, p(\mathbf{x} \mid \mu_j^{(ubm)}, \Sigma_j^{(ubm)})$

• Parameters of the UBM:

$\lambda^{(ubm)} = \{\lambda_j^{(ubm)}, \mu_j^{(ubm)}, \Sigma_j^{(ubm)}\}_{j=1}^{M}$
MAP Adaption
Given the UBM GMM, how do new observations deviate from it?
The adapted mean is given by:
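The formula itself was an image on the slide; the standard relevance-MAP form (Reynolds et al., with relevance factor r) is:

```latex
\hat{\mu}_j = \alpha_j\, E_j(\mathbf{x}) + (1 - \alpha_j)\, \mu_j^{(ubm)}, \qquad
\alpha_j = \frac{n_j}{n_j + r}, \quad
n_j = \sum_t \Pr(j \mid \mathbf{x}_t), \quad
E_j(\mathbf{x}) = \frac{1}{n_j} \sum_t \Pr(j \mid \mathbf{x}_t)\, \mathbf{x}_t
```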
Supervector Distance
Assume we have a UBM GMM model
$\lambda_{UBM} = \{\lambda_k, m_k, \Sigma_k\}$
Then for two utterance samples a and b, with GMM models sharing the UBM's priors and covariances,
$\lambda_a = \{\lambda_k, \mu_k^a, \Sigma_k\}, \qquad \lambda_b = \{\lambda_k, \mu_k^b, \Sigma_k\}$
the SV distance is

$K(\lambda_a, \lambda_b) = \sum_k \left( \sqrt{\lambda_k}\, \Sigma_k^{-\frac{1}{2}}\, \mu_k^a \right)' \left( \sqrt{\lambda_k}\, \Sigma_k^{-\frac{1}{2}}\, \mu_k^b \right)$

That is, the means of the two models are compared under the Mahalanobis metric induced by the UBM covariances
This is also a linear kernel function scaled by the UBM covariances
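A direct MATLAB sketch of this kernel; `muA`, `muB` (d×M adapted means), `pri` (UBM priors), and `sigDiag` (d×M diagonal UBM covariances) are illustrative names:

```matlab
% Supervector linear kernel: scale each mean block by sqrt(prior) * Sigma^(-1/2)
K = 0;
for k = 1:M
    a = sqrt(pri(k)) * (sigDiag(:,k).^(-0.5)) .* muA(:,k);
    b = sqrt(pri(k)) * (sigDiag(:,k).^(-0.5)) .* muB(:,k);
    K = K + a' * b;                     % accumulate over GMM components
end
```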
Supervector Performance in NIST Speaker ID
System 5: Gaussian SV
DCF (Detection Cost Function)
AKULA - Adaptive KLUster Aggregation
Document m31491, 2013/10/25
Abhishek Nagar, Zhu Li, Gaurav Srivastava and Kyungmo Park
Outline
Motivation
Adaptive Aggregation
Results with TM7
Summary
Motivation
Better Aggregation
Fisher Vector and VLAD-type aggregation depend on a global model
AKULA removes this dependence and directly codes the cluster centroids and SIFT counts
SCFV/RVD both have situations where clusters are turned off due to no assignment; this is avoided in AKULA
Pipeline: SIFT detection & selection → k-means → AKULA description
Motivation
Better Subspace Choice
Both SCFV and RVD use a fixed normalization and PCA projection based on heuristics.
What is the best possible subspace in which to aggregate?
Use a boosting scheme: keep adding subspaces and aggregations in an iterative fashion, and tune TPR-FPR toward the desired operating points on FPR.
CE2: AKULA - Adaptive KLUster Aggregation
AKULA Descriptor: cluster centroids + SIFT counts

$A_1 = \{y_{c_1}^1, y_{c_1}^2, \ldots, y_{c_1}^k;\; p_{c_1}^1, p_{c_1}^2, \ldots, p_{c_1}^k\}$
$A_2 = \{y_{c_2}^1, y_{c_2}^2, \ldots, y_{c_2}^k;\; p_{c_2}^1, p_{c_2}^2, \ldots, p_{c_2}^k\}$

Distance metric: min centroid distance, weighted by SIFT count

$d(A_1, A_2) = \frac{1}{k} \sum_{j} d_{min}^{1}(j)\, w_{min}^{1}(j) + \frac{1}{k} \sum_{j} d_{min}^{2}(j)\, w_{min}^{2}(j)$

where

$d_{min}^{1}(j) = \min_k d_{j,k}, \qquad w_{min}^{1}(j) = w_{j,k^*},\; k^* = \arg\min_k d_{j,k}$
$d_{min}^{2}(j) = \min_k d_{k,j}, \qquad w_{min}^{2}(j) = w_{k^*,j},\; k^* = \arg\min_k d_{k,j}$
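A sketch of the two-way weighted min-distance under one plausible reading of the weights (taking w as the matched cluster's normalized SIFT count; this interpretation, and all names, are assumptions rather than TM7 code):

```matlab
% Y1, Y2: k x d centroid matrices; p1, p2: k x 1 SIFT counts
D = vl_alldist2(Y1', Y2');              % k x k pairwise centroid distances
[d1, j1] = min(D, [], 2);               % dmin1(j) and its matched index in A2
[d2, j2] = min(D, [], 1);               % dmin2(j) and its matched index in A1
w1 = p2(j1) / sum(p2);                  % weight by matched cluster's count
w2 = p1(j2) / sum(p1);
d12 = mean(d1(:) .* w1(:)) + mean(d2(:) .* w2(:));
```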
AKULA implementation in TM7
Inner-loop aggregation
Dimension is fixed at 8
Number of clusters nc = 8, 16, or 32, to hit 64, 128, and 256 bytes
Quantization: scale by 1/2 and quantize to int8; the SIFT count is 8 bits; total (nc + 1) × dim bytes per aggregation, as sketched below
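A sketch of that quantization step (the clamping convention is an assumption):

```matlab
% Scale centroids by 1/2 and quantize to int8; keep SIFT counts as 8-bit values
q = int8(max(-128, min(127, round(0.5 * centroids))));
cnt = uint8(min(255, counts));
% payload per aggregation: (nc + 1) * dim bytes, per the slide
```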
AKULA implementation in TM7
Outer-loop subspace optimization by boosting
Initial set of subspace models {A_k} computed from MIR FLICKR data set SIFT extractions, by k-means clustering the space into 4096 clusters
Iterative search over subspaces to generate AKULA aggregations that improve precision-recall performance
Note that aggregation is decoupled across subspace iterations, to allow more DoF in aggregation and to find subspaces that provide complementary info
The algorithm is still being debugged, hence only 1st-iteration results in TM7
Indexing/hashing is required for AKULA: matching currently involves nc × dim multiplications and additions. A binarization scheme will be considered once performance is optimized in non-binary form.
GD Only TPR-FPR: AKULA vs SCFV
Data set 1:
AKULA (128 bytes, dim = 8, nc = 16); distance is just the 1-way dmin1 .* wt
Forcing a weighted sum on SCFV (512 bytes) Hamming distances without 2D decision fitting, i.e., count the Hamming distance between common active clusters and sum the distances
GD Only TPR-FPR: AKULA vs SCFV
Data sets 2, 3:
AKULA distance is just the 1-way dmin1 .* wt
AKULA = 128 bytes, SCFV = 512 bytes.
3D object set: 4, 5
Data sets 4, 5:
AKULA in PM
FPR performance:
AKULA rates:
| pm rate | nc | AKULA rate (bytes) |
|---------|----|--------------------|
| 512     | 8  | 64                 |
| 1K      | 16 | 128                |
| 2K      | 16 | 128                |
| 1K_4K   | 16 | 128                |
| 2K_4K   | 16 | 128                |
| 4K      | 16 | 128                |
| 8K      | 32 | 256                |
| 16K     | 32 | 256                |
TPR @ 1% FPR
[Bar charts: TPR (%) on data sets 1a, 1b, 1c, 2, 3, 4, 5, TM7 vs. AKULA, at bitrates 512 and 1k]
TPR @ 1% FPR:
[Bar charts: TPR (%) on data sets 1a, 1b, 1c, 2, 3, 4, 5, TM7 vs. AKULA, at bitrates 2k and 1k-4k]
TPR @ 1% FPR:
[Bar charts: TPR (%) on data sets 1a, 1b, 1c, 2, 3, 4, 5, TM7 vs. AKULA, at bitrates 2k-4k and 4k]
TPR @ 1% FPR:
[Bar charts: TPR (%) on data sets 1a, 1b, 1c, 2, 3, 4, 5, TM7 vs. AKULA, at bitrates 8k and 16k]
AKULA Localization
Sizable improvement in localization accuracy: 2.7%
AKULA Summary
Benefits: allows more DoF in aggregation optimization,
o via an outer-loop boosting scheme for subspace projection optimization
o and an inner-loop adaptive clustering without the constraint of a global GMM model
Simple weighted-distance-sum metric, with no need to tune a multi-dimensional decision boundary
Overall pairwise matching is on par with TM7 SCFV and its 2-dimensional decision boundary
In GD-only matching, outperforms the TM7 GD
Good improvements to localization accuracy
Light in extraction, but still heavy in pairwise matching; needs a binarization and/or indexing scheme to work for retrieval
Future improvements: Supervector AKULA?
Lec 08 Summary
Fisher Vector
Aggregate features {x_k} in R^D against a GMM
Super Vector
Aggregate GMM against a global GMM (UBM)
AKULA
Direct Aggregation