Description
Alexandre Xavier Falcao
Institute of Computing - UNICAMP
Alexandre Xavier Falcao MC940/MO445 - Image Analysis
Introduction
A descriptor is an algorithm that extracts a feature vector x(s) = (x1(s), x2(s), . . . , xn(s)) from any sample s ∈ Z, where a sample may be a pixel, a superpixel, the image of a segmented object, its shape, etc.
The descriptor may also include a distance function (e.g., d(s, t) = ‖x(t) − x(s)‖) to compare the dissimilarity between samples s and t in the feature space.
In this module, we are interested in obtaining color, shape, and texture descriptors from the previously studied image representations.
Agenda
We will focus on the following descriptors and a genetic programming strategy for their combination.
Color descriptors: Color histogram and BIC [8].
Texture descriptors (the most popular ones): LBP, HoG,BoVW, and CNN.
Shape descriptors using multiscale fractal dimension [3], saliences [2], and tensor scale [1, 7].
Descriptor combination by genetic programming [4].
Color histogram
The histogram h(i) of a grayscale image I = (DI, I) is defined as

h(i) = Σ_{∀p ∈ DI} δ(I(p) − i),

δ(I(p) − i) = { 1, when i = I(p),
              { 0, otherwise.

For color images I = (DI, I), where I(p) = (I1(p), I2(p), I3(p)) in some color space (e.g., RGB, YCbCr, Lab), this definition would lead to a sparse and, in this case, likely ineffective feature vector x(s) = h, where s = I and xi(s) = h(i).
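The grayscale definition above can be sketched in a few lines. This is a minimal illustration, assuming an 8-bit image stored as a 2D list of intensities (the sample image is hypothetical):

```python
# Grayscale histogram h(i) = sum over p of delta(I(p) - i), for an 8-bit image.
def grayscale_histogram(image, levels=256):
    h = [0] * levels
    for row in image:
        for value in row:
            h[value] += 1   # delta(I(p) - i) contributes 1 exactly when i = I(p)
    return h

image = [[0, 0, 255],
         [128, 255, 255]]
h = grayscale_histogram(image)
```

Note that the histogram entries always sum to |DI|, the number of pixels.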
Color histogram
The problem is addressed by dividing each axis of the color space into fixed intervals, called bins. For a 64-bin histogram of a 24-bit RGB image, with I1 = R, I2 = G, and I3 = B, the histogram h is

h(i) = Σ_{∀p ∈ DI} δ(V(p) − i),

V(p) = ⌊R(p)/64⌋ + 4 ⌊G(p)/64⌋ + 16 ⌊B(p)/64⌋,

where V(p) ∈ [0, 63]. It is also common to create normalized histograms by setting h(i) ← h(i)/|DI| for each bin i.
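The quantization above can be sketched directly; a minimal version, assuming 8-bit channels and a list of (R, G, B) tuples as input (the sample pixels are hypothetical):

```python
# 64-bin RGB histogram: each channel is quantized into 4 levels (value // 64),
# so V(p) = R//64 + 4*(G//64) + 16*(B//64) lies in [0, 63].
def rgb64_histogram(pixels, normalize=True):
    h = [0] * 64
    for r, g, b in pixels:
        v = (r // 64) + 4 * (g // 64) + 16 * (b // 64)
        h[v] += 1
    if normalize:
        n = len(pixels)           # |DI|
        h = [c / n for c in h]
    return h

pixels = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (64, 64, 64)]
h = rgb64_histogram(pixels, normalize=False)
```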
BIC — Border and Interior Classification
The BIC descriptor consists of three components.
A simple yet effective algorithm that classifies each pixel into one of two image regions: border (high frequency) or interior (low frequency).
A compact region representation based on color histograms.
A logarithmic distance function to compare histograms from two images.
BIC — Border and Interior Classification
Let L(p) ∈ {0, 1} indicate whether p is an interior or a border pixel.
Once V(p) is computed for each pixel p ∈ DI, a pixel p is classified as follows.

L(p) = { 1, when ∃q ∈ A1(p) | V(q) ≠ V(p),
       { 0, otherwise.

The normalized color histograms h0 (interior) and h1 (border) of each region are computed and then quantized from 0 to 255 by setting h0(i) ← 255 h0(i) and h1(i) ← 255 h1(i).
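The classification rule above can be sketched as follows, assuming V is a 2D list of quantized colors and A1(p) is the 4-neighborhood (adjacency radius 1); the sample map is hypothetical:

```python
# BIC pixel classification: L(p) = 1 (border) when some 4-neighbor of p has a
# different quantized color V, and L(p) = 0 (interior) otherwise.
def bic_labels(V):
    rows, cols = len(V), len(V[0])
    L = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # A1(p)
                qy, qx = y + dy, x + dx
                if 0 <= qy < rows and 0 <= qx < cols and V[qy][qx] != V[y][x]:
                    L[y][x] = 1     # a neighbor differs: border pixel
                    break
    return L

V = [[5, 5, 5],
     [5, 5, 9],
     [5, 5, 9]]
L = bic_labels(V)
```

The interior and border histograms h0 and h1 are then built only from pixels with L(p) = 0 and L(p) = 1, respectively.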
BIC — Border and Interior Classification
Assuming the L1 metric, the logarithmic distance (log2) is encoded in the histograms by mapping hb(i), b ∈ {0, 1}, from [0, 255] to [0, 9] (4 bits per bin) as follows.

hb(i) ← 0 if hb(i) = 0,
        1 if hb(i) < 1,
        2 if hb(i) < 2,
        3 if hb(i) < 4,
        4 if hb(i) < 8,
        5 if hb(i) < 16,
        6 if hb(i) < 32,
        7 if hb(i) < 64,
        8 if hb(i) < 128,
        9 otherwise.

Finally, the histograms h0 and h1 are concatenated into a single histogram x.
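The log-scale mapping above is a short table lookup; a minimal sketch using the power-of-two thresholds from the cases:

```python
# BIC log quantization of a bin value from [0, 255] to [0, 9] (4 bits per bin).
def dlog(v):
    if v == 0:
        return 0
    for code, limit in enumerate((1, 2, 4, 8, 16, 32, 64, 128), start=1):
        if v < limit:
            return code
    return 9

codes = [dlog(v) for v in (0, 0.5, 3, 100, 255)]
```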
Texture: Local Binary Patterns (LBP)
For a given grayscale image I = (DI, I) and adjacency set A√2(p) = {q1, q2, . . . , q8} for p ∈ DI:
A local binary pattern B(p) ∈ [0, 255] is assigned to p by setting each bit bk(z), k = 1, 2, . . . , 8, of z = B(p) as

bk(z) ← { 1 if I(p) > I(qk),
        { 0 otherwise.

The histogram of the map B can be used as the LBP feature vector.
Alternatively, DI can be divided into cells of N × N pixels, one LBP histogram can be extracted per cell, and the histograms concatenated into a single feature vector per image.
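The bit-setting rule above can be sketched for interior pixels; a minimal version, assuming the 8-neighborhood A√2(p) is enumerated in a fixed (here clockwise-from-top-left) order, which fixes which bit each neighbor controls:

```python
# LBP code B(p): bit k is set when I(p) > I(q_k) over the 8-neighborhood.
def lbp_code(I, y, x):
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                 (1, 1), (1, 0), (1, -1), (0, -1)]
    z = 0
    for k, (dy, dx) in enumerate(neighbors):
        if I[y][x] > I[y + dy][x + dx]:
            z |= 1 << k
    return z

I = [[9, 9, 9],
     [9, 5, 9],
     [9, 9, 9]]
center_code = lbp_code(I, 1, 1)   # darker than every neighbor: no bit set

I2 = [[1, 1, 1],
      [1, 5, 1],
      [1, 1, 1]]
bright_code = lbp_code(I2, 1, 1)  # brighter than every neighbor: all 8 bits set
```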
Texture: Local Binary Patterns (LBP)
However, the LBP histograms are not rotation invariant, which has inspired many variants.
Some of them treat rotation in the distance function and others incorporate rotation invariance in the feature vector.
The problem is also not critical when the images are aligned.
The extension to color images can simply concatenate the histograms from each band.
Texture: Histogram of Oriented Gradients (HoG)
In object detection, for instance, each sample is a subimage (called a window) around a candidate object.
The candidate objects reduce the number of windows for analysis, and they can be obtained by segmentation and simple component analysis.
An example is the detection of car license plates in a grayscale image I = (DI, I).
The problem can then be reduced to extracting a HoG feature vector (or its concatenation with LBP) inside each window for pattern classification as car license plate or background.
The extension to color images can simply concatenate the HoG feature vectors of each band inside the window.
Texture: Histogram of Oriented Gradients (HoG)
As a first step, the image intensities are normalized within an interval [0, K] (e.g., by gamma correction).

I′(p) = K [ I(p) / Imax ]^γ,

where Imax = max_{∀p ∈ DI} {I(p)}, γ > 0, and K = 2^b − 1 for a b-bit image.
Now, for each window of size n1 × m1 pixels around a candidate object, the HoG feature vector requires the estimation of a gradient vector g⃗(p) at each pixel p.

g⃗(p) = Σ_{∀q ∈ Ar(p)} [I(q) − I(p)] exp( −‖q − p‖² / (2σ²) ) p⃗q,

where σ = r/3, p⃗q = (q − p)/‖q − p‖, and r ≤ 1.
The magnitude ‖g⃗(p)‖ and the orientation θ(p) (the angle between g⃗(p) and the x axis) are used as follows.
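The Gaussian-weighted gradient sum above can be sketched for an interior pixel; a minimal version, assuming r = 1 (so Ar(p) is the 4-neighborhood) and σ = r/3, with a hypothetical ramp image:

```python
# Gradient g(p) = sum over q in A_r(p) of [I(q) - I(p)] * exp(-d^2/(2 sigma^2)) * pq,
# where pq is the unit vector from p to q, r = 1 and sigma = r/3.
import math

def gradient(I, y, x, r=1.0):
    sigma = r / 3.0
    gx = gy = 0.0
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # A_r(p) for r = 1
        diff = I[y + dy][x + dx] - I[y][x]
        dist = math.hypot(dx, dy)
        weight = math.exp(-dist ** 2 / (2 * sigma ** 2))
        gx += diff * weight * (dx / dist)               # component of unit pq
        gy += diff * weight * (dy / dist)
    return gx, gy

# Horizontal ramp: intensity grows with x, so the gradient points along +x.
I = [[0, 1, 2],
     [0, 1, 2],
     [0, 1, 2]]
gx, gy = gradient(I, 1, 1)
```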
Texture: Histogram of Oriented Gradients (HoG)
The window is further divided into an integer number of cells containing n2 × m2 pixels each.
[Figure: a detection window around an object candidate, with its pixel grid divided into cells.]
Texture: Histogram of Oriented Gradients (HoG)
One histogram of gradient orientations per cell is obtained with nb bins.
For nb = 9 bins, for instance, bin 0 may be used to accumulate votes from pixels whose ‖g⃗(p)‖ = 0, and the remaining bins store votes from pixels whose θ(p) falls within [0, 44], [45, 89], . . . , [315, 359] degrees, respectively.
The orientation θ(p), for hx(p) = gx(p)/‖g⃗(p)‖ and hy(p) = gy(p)/‖g⃗(p)‖, is defined as

θ(p) = { (180/π) cos⁻¹(hx(p))        if hy(p) ≥ 0,
       { 360 − (180/π) cos⁻¹(hx(p))  if hy(p) < 0.
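The two-case orientation formula above translates directly into code; a minimal sketch:

```python
# theta(p) in degrees from the gradient components, using acos of the
# normalized x component and the sign of the y component to pick the case.
import math

def orientation(gx, gy):
    mag = math.hypot(gx, gy)
    hx, hy = gx / mag, gy / mag
    theta = math.degrees(math.acos(hx))
    return theta if hy >= 0 else 360.0 - theta

t_right = orientation(1.0, 0.0)    # along +x: 0 degrees
t_up = orientation(0.0, 1.0)       # along +y: 90 degrees
t_down = orientation(0.0, -1.0)    # along -y: 270 degrees
```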
Texture: Histogram of Oriented Gradients (HoG)
Each pixel p distributes ‖g⃗(p)‖ votes by trilinear interpolation between the adjacent bins b1 and b2 of its four adjacent cells q1, q2, q3, and q4.
[Figure: a pixel p inside the window with its four adjacent cells q1, q2, q3, q4; each cell receives votes in two adjacent bins, (qi, b1) and (qi, b2).]
For θ = 30, for instance, b1 = 22 and b2 = 67, since the centers of the 8 bins with non-zero gradient magnitude are represented by 22, 67, 112, 157, 202, 247, 292, and 337.
Texture: Histogram of Oriented Gradients (HoG)
The distribution of votes aims to treat relevant pixels with high gradient magnitude that might fall in adjacent cells.
Let (xp, yp, zp), with zp = θ(p), be the coordinates of p in a 3D space.
Let (xi, yi) be the center of cell qi, i = 1, 2, 3, 4, and let (q1, b1), (q2, b1), (q3, b1), (q4, b1), (q1, b2), (q2, b2), (q3, b2), and (q4, b2) be the 8 vertices (xi, yi, zi), i = 1, 2, . . . , 8, around p.
The gradient magnitude w = ‖g⃗(p)‖ is a weight distributed among the 8 vertices by trilinear interpolation.
Texture: Histogram of Oriented Gradients (HoG)
The weight w = ‖g⃗(p)‖ is first distributed between points p1 and p2 on opposite faces; then the weights on the faces are distributed among points p3, p4, p5, p6 of opposite edges; and finally the edge weights are distributed to the vertices p7, p8, p9, p10, p11, p12, p13, and p14 of the corresponding edges.
[Figure: the cube around p in the (x, y, z) space, showing the face points p1 and p2, the edge points p3–p6, and the vertices p7–p14.]
Texture: Histogram of Oriented Gradients (HoG)
The weights wi of each point pi = (xpi, ypi, zpi), i = 1, 2, . . . , 14, are computed as

w1 = w (xp2 − xp)/(xp2 − xp1),      w2 = w (xp − xp1)/(xp2 − xp1),
w3 = w1 (yp1 − yp4)/(yp3 − yp4),    w4 = w1 (yp3 − yp1)/(yp3 − yp4),
Texture: Histogram of Oriented Gradients (HoG)
w5 = w2 (yp2 − yp6)/(yp5 − yp6),    w6 = w2 (yp5 − yp2)/(yp5 − yp6),
w7 = w3 (zp11 − zp3)/(zp11 − zp7),  w11 = w3 (zp3 − zp7)/(zp11 − zp7),
w8 = w4 (zp12 − zp4)/(zp12 − zp8),  w12 = w4 (zp4 − zp8)/(zp12 − zp8),
Texture: Histogram of Oriented Gradients (HoG)
w10 = w5 (zp14 − zp5)/(zp14 − zp10),  w14 = w5 (zp5 − zp10)/(zp14 − zp10),
w9 = w6 (zp13 − zp6)/(zp13 − zp9),    w13 = w6 (zp6 − zp9)/(zp13 − zp9).

Finally, the weights wi are accumulated in the corresponding bin of the cell represented by pi, i = 7, 8, 9, 10, 11, 12, 13, 14.
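The successive splits above can be sketched compactly. This is an illustration under simplified naming (the function returns the 8 vertex weights in its own order rather than the slide's p7–p14 indexing), with illustrative coordinates; the key property is that the splits along x, then y, then z preserve the total weight w:

```python
# Trilinear distribution of a weight w among the 8 vertices of the cube
# [x1,x2] x [y1,y2] x [z1,z2] containing p = (xp, yp, zp).
def trilinear_split(w, xp, yp, zp, x1, x2, y1, y2, z1, z2):
    # split along x between the two opposite faces (w1, w2)
    w1 = w * (x2 - xp) / (x2 - x1)
    w2 = w * (xp - x1) / (x2 - x1)
    # split each face weight along y between opposite edges (w3..w6)
    w3, w4 = w1 * (y2 - yp) / (y2 - y1), w1 * (yp - y1) / (y2 - y1)
    w5, w6 = w2 * (y2 - yp) / (y2 - y1), w2 * (yp - y1) / (y2 - y1)
    # split each edge weight along z to the 8 vertices
    out = []
    for we in (w3, w4, w5, w6):
        out += [we * (z2 - zp) / (z2 - z1), we * (zp - z1) / (z2 - z1)]
    return out

weights = trilinear_split(1.0, 0.25, 0.5, 0.75, 0, 1, 0, 1, 0, 1)
```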
Texture: Histogram of Oriented Gradients (HoG)
Now, each group of n3 × m3 cells constitutes a block.
Adjacent blocks are defined by a stride (displacement in x and y).
[Figure: a block of 2×2 cells sliding over the window with a stride of one cell.]
Texture: Histogram of Oriented Gradients (HoG)
The cell histograms in each block are concatenated from left to right, top to bottom, and normalized to treat contrast variations. Similarly, the block feature vectors are concatenated to output a HoG feature vector for the window.
Texture: Histogram of Oriented Gradients (HoG)
Let hk(i), i = 0, 1, . . . , nb − 1 and k = 1, 2, . . . , n3 × m3, be the cell histograms in a block with n3 × m3 cells.
Their concatenation from left to right, top to bottom, generates a vector with features vj, j = 1, 2, . . . , nb × n3 × m3.
These features are normalized as

vj ← vj / sqrt( Σ_{j=1}^{nb × n3 × m3} vj vj + ε ),

where ε is a small number.
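The block normalization above is an L2 norm with a small ε added for stability; a minimal sketch with an illustrative block vector:

```python
# L2 block normalization: v_j <- v_j / sqrt(sum_j v_j^2 + eps).
import math

def normalize_block(v, eps=1e-6):
    norm = math.sqrt(sum(x * x for x in v) + eps)
    return [x / norm for x in v]

block = [3.0, 4.0, 0.0]            # toy concatenated cell histograms
normed = normalize_block(block)
```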
Texture: Histogram of Oriented Gradients (HoG)
For instance, for a window with 126 × 36 pixels and cells with 6 × 6 pixels, each window contains 21 × 6 cells.
If each block is defined by 2 × 2 cells and the stride is 1 cell in x and y, each window generates 20 × 5 blocks.
The four cell histograms of 9 bins in each block are concatenated and normalized to compose a vector of 36 features per block.
The feature vectors of the blocks are then concatenated to form a HoG vector with 20 × 5 × 36 features.
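The dimensionality arithmetic above can be checked in a few lines:

```python
# HoG dimensionality for the example: 126x36 window, 6x6 cells,
# 2x2-cell blocks with a stride of one cell, 9 bins per cell.
win_h, win_w = 126, 36
cell = 6
cells_h, cells_w = win_h // cell, win_w // cell      # cells per window
blocks_h, blocks_w = cells_h - 1, cells_w - 1        # 2x2 blocks, stride 1 cell
feats_per_block = 2 * 2 * 9                          # 4 histograms of 9 bins
total = blocks_h * blocks_w * feats_per_block        # final HoG length
```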
Texture: Bag of Visual Words (BoVW)
Let A, B, and C be examples of texts about different subjects.
A Bag of Words (BoW) is a dictionary of keywords identified as the most frequent ones in texts from different subjects.
Texture: Bag of Visual Words (BoVW)
By using the dictionary to determine the frequency of its words in A, B, and C, each text is represented by a histogram.
Texture: Bag of Visual Words (BoVW)
We expect higher similarity between the histograms of A and B than between either of them and the histogram of C.
Texture: Bag of Visual Words (BoVW)
There are a couple of drawbacks in BoW that need more attention.
For a given subject (category), the choice of its most frequent keywords is crucial, but BoW is unsupervised. How can we incorporate supervision into BoW?
Such keywords might not always occur in texts from that subject. How can we avoid errors of similarity with other subjects?
Texture: Bag of Visual Words (BoVW)
In a Bag of Visual Words (BoVW), the problem is the same, but
the words are the local features of image patches extracted from key locations, and
the most frequent ones are the keywords, which are determined by patch clustering as the group representatives.
[Figure from MathWorks.com.]
Texture: Bag of Visual Words (BoVW)
[Figure from MathWorks.com.]
One can separate training images from each category, build and merge the dictionaries, encode the training images, and train a classifier, but...
Texture: Bag of Visual Words (BoVW)
Can we identify the most discriminative visual words when merging dictionaries, and take this into account when encoding the images?
How do we choose the key patches, local features, clustering technique, similarity function, and image encoding rule?
By building a single dictionary, the image code becomes sparse and high-dimensional. What are the most suitable classifiers for this case?
Texture: Bag of Visual Words (BoVW)
Some visual words might be common to different categories, yet be discriminative when used together with the other words.
The discriminative power of such words must be determined by some feature selection technique.
Sampling techniques, such as the Scale Invariant Feature Transform (SIFT), grid sampling, and random sampling, have been used to locate key patches.
The local features are usually color and texture features of those patches.
Texture: Bag of Visual Words (BoVW)
The most commonly used clustering technique is k-means, where k defines the size of the dictionary.

The similarity function between patches (and visual words) is inversely proportional to a distance function (e.g., cosine, Euclidean) suitable for the type of local description.

The images are usually coded by either hard or soft assignment.

Hard assignment: each patch counts for its closest visual word only.

Soft assignment: each patch counts for all visual words, with count directly proportional to the similarity between patch and word.
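The two encoding rules can be sketched in code. A minimal NumPy sketch, assuming a precomputed dictionary of visual words and a Gaussian-of-distance similarity for soft assignment; the function name `encode_bovw` and the parameter `beta` are illustrative choices, not from the slides:

```python
import numpy as np

def encode_bovw(patches, words, mode="hard", beta=1.0):
    """Encode an image as a histogram over a visual-word dictionary.

    patches: (P, d) array of local feature vectors.
    words:   (k, d) array of visual words (e.g., k-means centroids).
    mode:    'hard' -> each patch votes for its closest word only;
             'soft' -> each patch votes for all words, weighted by a
                       similarity that decreases with Euclidean distance.
    """
    # Pairwise Euclidean distances between patches and words: (P, k)
    dists = np.linalg.norm(patches[:, None, :] - words[None, :, :], axis=2)
    k = words.shape[0]
    hist = np.zeros(k)
    if mode == "hard":
        for w in np.argmin(dists, axis=1):
            hist[w] += 1.0
    else:  # soft assignment
        sim = np.exp(-beta * dists)            # similarity inversely related to distance
        sim /= sim.sum(axis=1, keepdims=True)  # each patch contributes a total of 1
        hist = sim.sum(axis=0)
    return hist / hist.sum()                   # normalized histogram (feature vector)

# Toy example: 6 patches near a dictionary of 3 words in a 2-D feature space
rng = np.random.default_rng(0)
words = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
patches = np.vstack([w + 0.1 * rng.standard_normal((2, 2)) for w in words])
h_hard = encode_bovw(patches, words, mode="hard")
h_soft = encode_bovw(patches, words, mode="soft")
```

With two patches near each word, hard assignment yields a uniform histogram, while soft assignment spreads small counts to the other words.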
Texture: Bag of Visual Words (BoVW)
The resulting feature vectors (histograms) ask for classification techniques such as support vector machines and multi-layer perceptrons.

The histograms lose the spatial information (localization) of the visual words in each image.

The convolution between the image and a filter bank (dictionary) is equivalent to selecting patches for all pixels, computing the similarity to all kernels (visual words), as in soft assignment, and storing the result in a multiband image, preserving all spatial and texture information.

This is the strategy in Convolutional Networks (CNNs or ConvNets) for image description [6].
Texture: Convolutional Network (ConvNets)
Each layer of a ConvNet may consist of four operations:

convolution between the input image and a kernel bank,

neuron activation,

pooling, and

normalization.

By stacking multiple layers, one after the other, low-level texture features are combined into high-level features for image description.
Texture: Convolutional Network (ConvNets)
[Figure: an input image L0 followed by three layers L1, L2, and L3.]

The feature vector can be represented by the features of each pixel, from left to right and top to bottom, after the third layer of the CNN for classification by SVM, or as the last hidden layer of a multi-layer perceptron (MLP) classifier.
Texture: Convolutional Network (ConvNets)
Each of the main operations in a CNN layer,

convolution between the input image and a kernel bank,

neuron activation,

pooling, and

normalization,

can be explained as follows.
Texture: Convolutional Network (ConvNets)
First recall that the convolution between a grayscale image I = (D_I, I) and a symmetric kernel (A_r, W) outputs a grayscale image J = (D_J, J), with

J(p) = Σ_{k=1}^{K} I(q_k) w_k, ∀p ∈ D_I,

where A_r(p) = {q_1, q_2, . . . , q_K}, r ≥ 1, and W = [w_1, w_2, . . . , w_K].

However, the adjacency relation A_r is usually defined as a box

A_r = {(p, q) ∈ D_I × D_I | |x_q − x_p| ≤ r/2 and |y_q − y_p| ≤ r/2}.

This operation is equivalent to the inner product 〈v(p), w〉 between the vectors v(p) = (I(q_1), I(q_2), . . . , I(q_K)) and w = (w_1, w_2, . . . , w_K).
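A minimal NumPy sketch of this convolution as an inner product over a box adjacency; since the kernel is symmetric, the inner product and the convolution coincide (no kernel flip is needed). The function name `box_convolve` is an illustrative choice:

```python
import numpy as np

def box_convolve(I, W):
    """Convolve grayscale image I with a symmetric (2a+1) x (2a+1) kernel W,
    computed as the inner product <v(p), w> at every valid pixel p."""
    a = W.shape[0] // 2
    w = W.ravel()
    H, Wd = I.shape
    # restrict the output domain so the box fits inside the image
    J = np.zeros((H - 2 * a, Wd - 2 * a))
    for y in range(a, H - a):
        for x in range(a, Wd - a):
            v = I[y - a:y + a + 1, x - a:x + a + 1].ravel()  # v(p)
            J[y - a, x - a] = v @ w                          # <v(p), w>
    return J

I = np.arange(25, dtype=float).reshape(5, 5)
W = np.ones((3, 3)) / 9.0   # mean filter as a simple symmetric kernel
J = box_convolve(I, W)
```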
Texture: Convolutional Network (ConvNets)
One can also think of it as part of a perceptron (artificial neuron) operation at location p:

J(p) = 〈v(p), w〉,

O(p) = J(p) + b, if J(p) + b > θ; 0, otherwise,

where b is the bias and θ ≥ 0 is the neuron activation threshold.
Texture: Convolutional Network (ConvNets)
Graphically, v(p) is a point in ℝ^K and w is the normal vector of a hyperplane in ℝ^K. The bias b affects the position of the hyperplane, and θ selects points v(p) with some margin on its positive side.

Convolution is then followed by activation.
Texture: Convolutional Network (ConvNets)
Some of the many alternatives for the activation function ψ(x), where x = J(p) + b, are:

ψ(x) = max{0, x − θ} (rectified linear unit, ReLU)

ψ(x) = ln(1 + exp(x)) (softplus)

ψ(x) = 1 / (1 + exp(−x)) (logistic)

ψ(x) = arctan(x) (arctan)

The same activation function (e.g., ReLU) for all pixels p ∈ D_I in a given layer is a common choice.

It is also common that ‖w‖ = 1. A CNN may also adopt kernels with mean weight equal to 0, especially when they are randomly chosen. In this case, the bias can be zero for all pixels.
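The listed activation functions are straightforward to implement; a NumPy sketch (function names are illustrative):

```python
import numpy as np

def relu(x, theta=0.0):
    """Rectified linear unit with activation threshold theta."""
    return np.maximum(0.0, x - theta)

def softplus(x):
    """Smooth approximation of ReLU: ln(1 + exp(x))."""
    return np.log1p(np.exp(x))

def logistic(x):
    """Sigmoid squashing the response into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def arctan_act(x):
    """Arctangent activation, squashing into (-pi/2, pi/2)."""
    return np.arctan(x)

x = np.array([-2.0, 0.0, 3.0])
y = relu(x)   # negative responses are zeroed out
```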
Texture: Convolutional Network (ConvNets)
For a kernel bank, W is an N × K matrix, the input image is represented by a K × m matrix X_I, m = |D_I|, and their convolution becomes the N × m matrix X_J = W X_I, where

W = [ w_11  w_12  . . .  w_1K
      w_21  w_22  . . .  w_2K
       ⋮     ⋮            ⋮
      w_N1  w_N2  . . .  w_NK ],

X_I = [ I(q_11)  I(q_12)  . . .  I(q_1m)
        I(q_21)  I(q_22)  . . .  I(q_2m)
          ⋮        ⋮                ⋮
        I(q_K1)  I(q_K2)  . . .  I(q_Km) ], and
Texture: Convolutional Network (ConvNets)
X_J = [ J(p_11)  J(p_12)  . . .  J(p_1m)
        J(p_21)  J(p_22)  . . .  J(p_2m)
          ⋮        ⋮                ⋮
        J(p_N1)  J(p_N2)  . . .  J(p_Nm) ].

Each row in W contains a kernel i = 1, 2, . . . , N of the bank,

each column in X_I contains the intensities of the adjacent pixels q_kj, k = 1, 2, . . . , K, of each pixel p_j, j = 1, 2, . . . , m, and

each column in X_J represents the feature vector J(p_j) of the resulting multiband image J = (D_I, J). It is also common to eliminate pixels at distance r to the image border, reducing the domain to D_J ⊂ D_I.
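The matrix form can be checked numerically with an im2col-style construction of X_I ("im2col" is a common name for this layout, used here as an assumption, not taken from the slides):

```python
import numpy as np

def im2col(I, a):
    """Build the K x m matrix X_I: each column holds the (2a+1)^2 box
    neighborhood of one pixel (border pixels are dropped, so D_J ⊂ D_I)."""
    H, Wd = I.shape
    cols = []
    for y in range(a, H - a):
        for x in range(a, Wd - a):
            cols.append(I[y - a:y + a + 1, x - a:x + a + 1].ravel())
    return np.array(cols).T  # shape (K, m)

rng = np.random.default_rng(1)
I = rng.standard_normal((6, 6))
N, a = 4, 1                              # 4 kernels of size 3x3 (K = 9)
W = rng.standard_normal((N, (2 * a + 1) ** 2))
XI = im2col(I, a)                        # K x m, here m = 4 * 4 = 16
XJ = W @ XI                              # N x m: one row per band of J
```

Each entry of X_J equals the inner product between one kernel (a row of W) and one pixel neighborhood (a column of X_I).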
Texture: Convolutional Network (ConvNets)
When I = (D_I, I) is a multiband image,

the feature vectors I(q_kj) are expanded along their respective columns in X_I,

each kernel i = 1, 2, . . . , N must be multiband, with vectorial weights w_ik, k = 1, 2, . . . , K, expanded along their respective rows in W, and

J_i(p_j) = Σ_{k=1}^{K} 〈I(q_kj), w_ik〉.

The bias and activation are then applied to J = (D_I, J) to create the output O = (D_J, O).
Texture: Convolutional Network (ConvNets)
For a given box adjacency A_r of size r ≥ 1 around pixel p, the pooling operation aims at aggregating for p the activations that might have occurred nearby.

This operation can also be applied only at every d pixels, r ≥ d ≥ 1 (the stride).

It makes the process robust to slight object translations and, for d > 1, reduces the output image domain.
Texture: Convolutional Network (ConvNets)
The pooling applied to image O = (D_J, O) creates an image P = (D_P, P), D_P ⊂ D_J, P = (P_1, P_2, . . . , P_N), with

P_i(p) = ( Σ_{∀q ∈ A_r(p)} O_i(q)^α )^{1/α},

for i = 1, 2, . . . , N, where α ≥ 1 controls the operation from additive to max-pooling.
Texture: Convolutional Network (ConvNets)
For a given box adjacency set A_r(p) = {q_1, q_2, . . . , q_K} of size r ≥ 1 around pixel p, the normalization enhances the isolated (relevant) activations in detriment of the others and outputs an image Q = (D_Q, Q), D_Q ⊂ D_P, Q = (Q_1, Q_2, . . . , Q_N), with

Q_i(p) = P_i(p) / sqrt( Σ_{i=1}^{N} Σ_{k=1}^{K} P_i(q_k) P_i(q_k) ).
The process continues by having Q as input of the next layer.
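A NumPy sketch of this divisive normalization for a multiband image P of shape (N, H, W): each output value is the input activation divided by the root of the total squared activation over all bands and the box neighborhood. The small `eps` guarding against division by zero is an implementation choice, not in the slides:

```python
import numpy as np

def normalize(P, a=1, eps=1e-12):
    """Divisive normalization across bands and a box neighborhood:
    Q_i(p) = P_i(p) / sqrt(sum over bands i and neighbors q of P_i(q)^2).
    Border pixels are dropped from the output domain (D_Q ⊂ D_P)."""
    N, H, Wd = P.shape
    Q = np.zeros((N, H - 2 * a, Wd - 2 * a))
    for y in range(a, H - a):
        for x in range(a, Wd - a):
            box = P[:, y - a:y + a + 1, x - a:x + a + 1]
            norm = np.sqrt((box ** 2).sum()) + eps
            Q[:, y - a, x - a] = P[:, y, x] / norm
    return Q

P = np.ones((2, 3, 3))   # two constant bands: nothing stands out
Q = normalize(P, a=1)    # every value is divided by sqrt(2 * 9)
```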
Shape: Multiscale fractal dimension
The fractal dimension of a 2D point set S (contour, skeleton) by Minkowski-Bouligand is a number F ∈ [0, 2],

F = 2 − lim_{r→0} ln(A(r)) / ln(r),

where A(r) is the number of propagated points (area) when S is dilated by a disk of radius r.

The fractal dimension represents the self-similarity of S when r tends to zero.

Note that A can be obtained from the cumulative histogram of the Euclidean distance map of S up to the distance r.
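A sketch of this estimate in NumPy: build the cumulative histogram A(r) from a Euclidean distance map of the point set (computed by brute force here; a real EDT algorithm would be used in practice) and fit a line to ln(A) versus ln(r). The point set, grid size, and radii below are illustrative assumptions:

```python
import numpy as np

def fractal_dimension(points, size, r_max=6.0):
    """Estimate the Minkowski-Bouligand fractal dimension of a 2-D point
    set: A(r) is the number of grid points within distance r of the set
    (cumulative histogram of the distance map), and F = 2 - slope of the
    ln(A) x ln(r) line."""
    ys, xs = np.mgrid[0:size, 0:size]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    # brute-force Euclidean distance map of the point set
    d = np.min(np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2),
               axis=1)
    radii = np.arange(1.0, r_max + 1.0)
    A = np.array([(d <= r).sum() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(A), 1)
    return 2.0 - slope

# A straight line segment should have dimension close to 1
line = np.array([[32.0, x] for x in range(8, 56)])
F = fractal_dimension(line, size=64)
```

On a discrete grid with a short range of radii, the estimate is only approximate, which is one motivation for the multiscale analysis in the next slides.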
Shape: Multiscale fractal dimension
It is known, for instance, that the fractal dimension of a Koch star (on the left) is F ≈ 1.26 [3].
[Plot: ln(A) versus ln(r), showing the observed values and the fitted straight line.]
One can then fit a line to the logarithmic curve of the cumulative histogram of the EDT (on the right) and use the slope m of the line to estimate F = 2 − m ≈ 1.23.
Alexandre Xavier Falcao MC940/MO445 - Image Analysis
Shape: Multiscale fractal dimension
By fitting a polynomial curve (on the left) and computing its first derivative, the resulting curve F (on the right) is a feature vector of the shape, called multiscale fractal dimension.
[Plots: on the left, ln(A) versus ln(r) with the observed values and the polynomial regression; on the right, the multiscale fractal dimension F versus ln(r).]
Note that its maximum value (on the right) is ≈ 1.25 for some value ln(r) ∈ [3.5, 4].
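The multiscale version can be sketched with a polynomial fit and its analytic derivative; the polynomial degree and the number of samples are illustrative choices:

```python
import numpy as np

def multiscale_fractal_dimension(log_r, log_A, degree=5, n_samples=20):
    """Fit a polynomial to the ln(A) x ln(r) curve and sample the
    multiscale fractal dimension F(r) = 2 - d ln(A) / d ln(r)."""
    coeffs = np.polyfit(log_r, log_A, degree)
    dcoeffs = np.polyder(coeffs)                  # analytic first derivative
    t = np.linspace(log_r.min(), log_r.max(), n_samples)
    return 2.0 - np.polyval(dcoeffs, t)           # the shape feature vector

# Toy curve: ln(A) = 0.8 ln(r) + 5 gives a constant F = 1.2 at all scales
log_r = np.linspace(1.0, 6.0, 30)
log_A = 0.8 * log_r + 5.0
F = multiscale_fractal_dimension(log_r, log_A)
```

For real shapes the curve is not a straight line, so F varies with ln(r) and the sampled values form the descriptor.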
Shape: Point/segment saliences
The dilation of a contour S by a disk of small radius r (e.g., r = 10) shows that the outside area A_out(p) of the influence zone of a point p ∈ S is higher than its inside area A_in(p) when p is convex, and the other way around when it is concave [2].
[Figure: influence zones of a convex point A and a concave point B on a contour.]
Such areas come from the root histogram of the EDT, which can be normalized for the purpose of shape description.
Shape: Point/segment saliences
By considering only the salience points p ∈ S, a point-salience feature vector can encode the respective positive (convex) and negative (concave) areas.
[Plot: influence area versus contour point position, with salience points labeled A to K.]
Shape: Point/segment saliences
A similar idea works for a segment-salience feature vector, when replacing points by contour segments around the salience points.

Note that, in any case, the distance function must account for possible shape rotations.
Shape: Tensor scale
The tensor scale of each pixel contains orientation θ(p) and anisotropy α(p) values. One can divide the contour S into segments and compute a weighted angular mean θ̄ for each segment, using α(p) as weight and the points p of its internal influence zone R [1]:

θ̄ = arctan( [Σ_{∀p∈R} α(p) sin(2θ(p))] / [Σ_{∀p∈R} α(p) cos(2θ(p))] ).

The tensor-scale feature vector is composed of these weighted angular mean values.
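A NumPy sketch of the weighted angular mean; doubling the angles treats orientations as axial data (θ and θ + π are the same orientation), and using arctan2 plus a factor 1/2 to map back to the original range is a common implementation choice, slightly more robust than a plain arctan:

```python
import numpy as np

def weighted_angular_mean(theta, alpha):
    """Weighted angular mean of tensor-scale orientations theta (radians),
    weighted by the anisotropy values alpha."""
    s = np.sum(alpha * np.sin(2.0 * theta))   # weighted sine of doubled angles
    c = np.sum(alpha * np.cos(2.0 * theta))   # weighted cosine of doubled angles
    return 0.5 * np.arctan2(s, c)             # map back to the original range

theta = np.array([0.10, 0.12, 0.11])   # nearly parallel orientations
alpha = np.array([1.0, 0.5, 0.8])      # anisotropy weights
mean = weighted_angular_mean(theta, alpha)
```

For nearly parallel orientations, the result is close to the arithmetic weighted mean, but unlike it, the angular mean handles orientations that wrap around π.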
Shape Descriptor based on tensor scale
The distance function must consider possible rotations, requiring a shape matching for distance computation.
Combining descriptors by genetic programming
Let {D_1, D_2, . . . , D_k} be a set of descriptors such that D_i = (v_i, d_i), i = 1, 2, . . . , k, consists of

an algorithm v_i that extracts feature vectors v_i(s) and v_i(t) from samples s and t, and

a distance function d_i that assigns a dissimilarity value d_i(s, t) in the feature space between samples s and t (e.g., ‖v_i(t) − v_i(s)‖).

The distance functions d_i, i = 1, 2, . . . , k, can be combined into a single distance function d using, for instance, Genetic Programming (GP) [4].

The resulting descriptor D is called a composite descriptor.
Combining descriptors by genetic programming
Illustration of a composite descriptor D* with the GP combiner C (one may use any other optimization technique).
[Figure: (a) a simple descriptor D_i extracts v_i(s) and v_i(t) from samples s and t and computes d_i(s, t); (b) the composite descriptor D* feeds d_1(s, t), d_2(s, t), . . . , d_k(s, t) into the combiner C to produce d*(s, t).]
Combining descriptors by genetic programming
GP is an Artificial Intelligence technique based on biological principles of heritage and evolution.

Each candidate solution is an individual of a population, represented by a data structure (e.g., tree, list, stack) whose nodes are mathematical operations, rather than a sequence of numbers as in genetic algorithms.

From some initial random population, the most promising individuals pass through genetic transformations (e.g., mutations) that make the population more diverse and suitable to solve the problem.
Combining descriptors by genetic programming
Example of an individual, where the distances d_i, i = 1, 2, . . . , k, are the terminal nodes of a binary tree and the remaining nodes are other mathematical operations.
[Figure: a binary tree computing the combined distance d, with terminals d_1, d_2, d_3 and internal nodes sqrt, +, /, and *.]
Combining descriptors by genetic programming
In addition to the mathematical operations (see examples in [5]),

reproduction selects the most effective individuals for the next population,

crossover exchanges subtrees between selected individuals to increase diversity, generating new trees (offspring), and

mutation replaces a subtree of a selected individual by another randomly chosen subtree.

The individuals are assessed by a fitness function, which can be the accuracy of classification on a validation set.
Combining descriptors by genetic programming
The algorithm can be sketched as follows.

1. Generate an initial random population (first generation).
2. For each generation, up to a given maximum number:
3. Evaluate each individual by the fitness function.
4. Select a number of the most effective ones.
5. Generate the next population by reproduction, crossover, and mutation of the selected individuals.
6. Select the best individual as the final solution.
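The steps above can be sketched as a minimal, illustrative GP loop over distance-combination trees; crossover is omitted for brevity, and the toy fitness function, the chosen operations, and all names are assumptions for illustration only:

```python
import random
import operator

OPS = {"+": operator.add, "*": operator.mul, "max": max}

def random_tree(depth, k, rng):
    """Random expression tree over terminals d_0..d_{k-1} and binary OPS."""
    if depth == 0 or rng.random() < 0.3:
        return ("d", rng.randrange(k))                 # terminal node: d_i
    op = rng.choice(list(OPS))
    return (op, random_tree(depth - 1, k, rng), random_tree(depth - 1, k, rng))

def evaluate(tree, d):
    """Evaluate a tree on a vector d of descriptor distances."""
    if tree[0] == "d":
        return d[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], d), evaluate(tree[2], d))

def mutate(tree, k, rng):
    """Replace a random subtree by a new random subtree."""
    if rng.random() < 0.5 or tree[0] == "d":
        return random_tree(2, k, rng)
    return (tree[0], mutate(tree[1], k, rng), tree[2])

def gp(fitness, k, pop_size=30, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(3, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]                # reproduction (selection)
        pop = survivors + [mutate(rng.choice(survivors), k, rng)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fitness)

# Toy fitness: a similar pair should get a small combined distance and a
# dissimilar pair a large one.
similar, dissimilar = [0.1, 0.2, 0.1], [0.9, 0.8, 0.7]
fitness = lambda t: evaluate(t, dissimilar) - evaluate(t, similar)
best = gp(fitness, k=3)
```

In the actual framework [4], the fitness would be the retrieval or classification effectiveness of the combined distance on a validation set.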
[1] F.A. Andalo, P.A.V. Miranda, R. da S. Torres, and A.X. Falcao. Shape feature extraction and description based on tensor scale. Pattern Recognition, 43(1):26-36, 2010.

[2] R. da S. Torres and A.X. Falcao. Contour salience descriptors for effective image retrieval and analysis. Image and Vision Computing, 25(1):3-13, 2007.

[3] R. da S. Torres, A.X. Falcao, and L. da F. Costa. A graph-based approach for multiscale shape analysis. Pattern Recognition, 37(6):1163-1174, 2004.

[4] R. da S. Torres, A.X. Falcao, M.A. Goncalves, J.P. Papa, B. Zhang, W. Fan, and E.A. Fox. A genetic programming framework for content-based image retrieval. Pattern Recognition, 42(2):283-292, 2009 (Learning Semantics from Multimedia Content).

[5] R. da S. Torres, A.X. Falcao, M.A. Goncalves, B. Zhang, W. Fan, and E.A. Fox. A new framework to combine descriptors for content-based image retrieval. Technical Report IC-05-21, Institute of Computing, University of Campinas, September 2005.

[6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[7] P.A.V. Miranda, R. da S. Torres, and A.X. Falcao. TSD: A shape descriptor based on a distribution of tensor scale local orientation. In XVIII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI'05), pages 139-146, Oct 2005.

[8] Renato O. Stehling, Mario A. Nascimento, and Alexandre X. Falcao. A compact and efficient image retrieval approach based on border/interior pixel classification. In Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 102-109, 2002.