Description
Alexandre Xavier Falcao
Institute of Computing - UNICAMP
Alexandre Xavier Falcao MC940/MO445 - Image Analysis
Introduction
A descriptor is an algorithm that extracts a feature vector x(s) = (x1(s), x2(s), . . . , xn(s)) from any sample s ∈ Z, where a sample may be a pixel, a superpixel, the image of a segmented object, its shape, etc.
The descriptor may also include a distance function (e.g., d(s, t) = ‖x(t) − x(s)‖) to compare the dissimilarity between samples s and t in the feature space.
In this module, we are interested in obtaining color, shape, and texture descriptors from the previously studied image representations.
Agenda
We will focus on the following descriptors and a genetic programming strategy for their combination.
Color descriptors: Color histogram and BIC [8].
Texture descriptors (the most popular ones): LBP, HoG,BoVW, and CNN.
Shape descriptors using multiscale fractal dimension [3], saliences [2], and tensor scale [1, 7].
Descriptor combination by genetic programming [4].
Color histogram
The histogram h(i) of a grayscale image I = (DI, I) is defined as

h(i) = Σ_{∀p ∈ DI} δ(I(p) − i),

δ(I(p) − i) = { 1, when i = I(p),
              { 0, otherwise.

For color images I = (DI, I), where I(p) = (I1(p), I2(p), I3(p)) in some color space (e.g., RGB, YCbCr, Lab), this definition would lead to a sparse and, in this case, likely ineffective feature vector x(s) = h, where s = I and xi(s) = h(i).
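The grayscale definition above can be sketched in a few lines. This is a minimal illustration, assuming an 8-bit image stored as a 2D list of intensities (the sample image is hypothetical):

```python
# Grayscale histogram h(i) = sum over p of delta(I(p) - i), for an 8-bit image.
def grayscale_histogram(image, levels=256):
    h = [0] * levels
    for row in image:
        for value in row:
            h[value] += 1   # delta(I(p) - i) contributes 1 exactly when i = I(p)
    return h

image = [[0, 0, 255],
         [128, 255, 255]]
h = grayscale_histogram(image)
```

Note that the histogram entries always sum to |DI|, the number of pixels.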
Color histogram
The problem is addressed by dividing each axis of the color space into fixed intervals, called bins. For a 64-bin histogram of a 24-bit RGB image, with I1 = R, I2 = G, and I3 = B, the histogram h is

h(i) = Σ_{∀p ∈ DI} δ(V(p) − i),

V(p) = ⌊R(p)/64⌋ + 4 ⌊G(p)/64⌋ + 16 ⌊B(p)/64⌋,

where V(p) ∈ [0, 63]. It is also common to create normalized histograms by setting h(i) ← h(i)/|DI| for each bin i.
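The quantization above can be sketched directly; a minimal version, assuming 8-bit channels and a list of (R, G, B) tuples as input (the sample pixels are hypothetical):

```python
# 64-bin RGB histogram: each channel is quantized into 4 levels (value // 64),
# so V(p) = R//64 + 4*(G//64) + 16*(B//64) lies in [0, 63].
def rgb64_histogram(pixels, normalize=True):
    h = [0] * 64
    for r, g, b in pixels:
        v = (r // 64) + 4 * (g // 64) + 16 * (b // 64)
        h[v] += 1
    if normalize:
        n = len(pixels)           # |DI|
        h = [c / n for c in h]
    return h

pixels = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (64, 64, 64)]
h = rgb64_histogram(pixels, normalize=False)
```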
BIC — Border and Interior Classification
The BIC descriptor consists of three components.
A simple yet effective algorithm that classifies each pixel into one of two image regions: border (high frequency) or interior (low frequency).
A compact region representation based on color histograms.
A logarithmic distance function to compare histograms from two images.
BIC — Border and Interior Classification
Let L(p) ∈ {0, 1} indicate whether p is an interior or a border pixel.
Once V(p) is computed for each pixel p ∈ DI, a pixel p is classified as follows.

L(p) = { 1, when ∃q ∈ A1(p) | V(q) ≠ V(p),
       { 0, otherwise.

The normalized color histograms h0 (interior) and h1 (border) of each region are computed and then quantized from 0 to 255 by setting h0(i) ← 255 h0(i) and h1(i) ← 255 h1(i).
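The classification rule above can be sketched as follows, assuming V is a 2D list of quantized colors and A1(p) is the 4-neighborhood (adjacency radius 1); the sample map is hypothetical:

```python
# BIC pixel classification: L(p) = 1 (border) when some 4-neighbor of p has a
# different quantized color V, and L(p) = 0 (interior) otherwise.
def bic_labels(V):
    rows, cols = len(V), len(V[0])
    L = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # A1(p)
                qy, qx = y + dy, x + dx
                if 0 <= qy < rows and 0 <= qx < cols and V[qy][qx] != V[y][x]:
                    L[y][x] = 1     # a neighbor differs: border pixel
                    break
    return L

V = [[5, 5, 5],
     [5, 5, 9],
     [5, 5, 9]]
L = bic_labels(V)
```

The interior and border histograms h0 and h1 are then built only from pixels with L(p) = 0 and L(p) = 1, respectively.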
BIC — Border and Interior Classification
Assuming the L1 metric, the logarithmic distance (log2) is encoded in the histograms by mapping hb(i), b ∈ {0, 1}, from [0, 255] to [0, 9] (4 bits per bin) as follows.

hb(i) ← 0 if hb(i) = 0,
        1 if hb(i) < 1,
        2 if hb(i) < 2,
        3 if hb(i) < 4,
        4 if hb(i) < 8,
        5 if hb(i) < 16,
        6 if hb(i) < 32,
        7 if hb(i) < 64,
        8 if hb(i) < 128,
        9 otherwise.

Finally, the histograms h0 and h1 are concatenated into a single histogram x.
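The log-scale mapping above is a short table lookup; a minimal sketch using the power-of-two thresholds from the cases:

```python
# BIC log quantization of a bin value from [0, 255] to [0, 9] (4 bits per bin).
def dlog(v):
    if v == 0:
        return 0
    for code, limit in enumerate((1, 2, 4, 8, 16, 32, 64, 128), start=1):
        if v < limit:
            return code
    return 9

codes = [dlog(v) for v in (0, 0.5, 3, 100, 255)]
```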
Texture: Local Binary Patterns (LBP)
For a given grayscale image I = (DI, I) and adjacency set A√2(p) = {q1, q2, . . . , q8} for p ∈ DI:
A local binary pattern B(p) ∈ [0, 255] is assigned to p by setting each bit bk(z), k = 1, 2, . . . , 8, of z = B(p) as

bk(z) ← { 1 if I(p) > I(qk),
        { 0 otherwise.

The histogram of the map B can be used as the LBP feature vector.
Alternatively, DI can be divided into cells of N × N pixels, one LBP histogram can be extracted per cell, and the histograms concatenated into a single feature vector per image.
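The bit-setting rule above can be sketched for interior pixels; a minimal version, assuming the 8-neighborhood A√2(p) is enumerated in a fixed (here clockwise-from-top-left) order, which fixes which bit each neighbor controls:

```python
# LBP code B(p): bit k is set when I(p) > I(q_k) over the 8-neighborhood.
def lbp_code(I, y, x):
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                 (1, 1), (1, 0), (1, -1), (0, -1)]
    z = 0
    for k, (dy, dx) in enumerate(neighbors):
        if I[y][x] > I[y + dy][x + dx]:
            z |= 1 << k
    return z

I = [[9, 9, 9],
     [9, 5, 9],
     [9, 9, 9]]
center_code = lbp_code(I, 1, 1)   # darker than every neighbor: no bit set

I2 = [[1, 1, 1],
      [1, 5, 1],
      [1, 1, 1]]
bright_code = lbp_code(I2, 1, 1)  # brighter than every neighbor: all 8 bits set
```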
Texture: Local Binary Patterns (LBP)
However, the LBP histograms are not rotation invariant, which has inspired many variants.
Some of them treat rotation in the distance function and others incorporate rotation invariance in the feature vector.
The problem is also not critical when the images are aligned.
The extension to color images can simply concatenate the histograms from each band.
Texture: Histogram of Oriented Gradients (HoG)
In object detection, for instance, each sample is a subimage (called a window) around a candidate object.
The candidate objects reduce the number of windows for analysis, and they can be obtained by segmentation and simple component analysis.
An example is the detection of car license plates in a grayscale image I = (DI, I).
The problem can then be reduced to extracting a HoG feature vector (or its concatenation with LBP) inside each window for pattern classification as car license plate or background.
The extension to color images can simply concatenate the HoG feature vectors of each band inside the window.
Texture: Histogram of Oriented Gradients (HoG)
As a first step, the image intensities are normalized within an interval [0, K] (e.g., by gamma correction).

I′(p) = K [ I(p) / Imax ]^γ,

where Imax = max_{∀p ∈ DI} {I(p)}, γ > 0, and K = 2^b − 1 for a b-bit image.
Now, for each window of size n1 × m1 pixels around a candidate object, the HoG feature vector requires the estimation of a gradient vector g⃗(p) at each pixel p.

g⃗(p) = Σ_{∀q ∈ Ar(p)} [I(q) − I(p)] exp( −‖q − p‖² / (2σ²) ) p⃗q,

where σ = r/3, p⃗q = (q − p)/‖q − p‖, and r ≤ 1.
The magnitude ‖g⃗(p)‖ and the orientation θ(p) (the angle between g⃗(p) and the x axis) are used as follows.
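The Gaussian-weighted gradient sum above can be sketched for an interior pixel; a minimal version, assuming r = 1 (so Ar(p) is the 4-neighborhood) and σ = r/3, with a hypothetical ramp image:

```python
# Gradient g(p) = sum over q in A_r(p) of [I(q) - I(p)] * exp(-d^2/(2 sigma^2)) * pq,
# where pq is the unit vector from p to q, r = 1 and sigma = r/3.
import math

def gradient(I, y, x, r=1.0):
    sigma = r / 3.0
    gx = gy = 0.0
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # A_r(p) for r = 1
        diff = I[y + dy][x + dx] - I[y][x]
        dist = math.hypot(dx, dy)
        weight = math.exp(-dist ** 2 / (2 * sigma ** 2))
        gx += diff * weight * (dx / dist)               # component of unit pq
        gy += diff * weight * (dy / dist)
    return gx, gy

# Horizontal ramp: intensity grows with x, so the gradient points along +x.
I = [[0, 1, 2],
     [0, 1, 2],
     [0, 1, 2]]
gx, gy = gradient(I, 1, 1)
```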
Texture: Histogram of Oriented Gradients (HoG)
The window is further divided into an integer number of cells containing n2 × m2 pixels each.
[Figure: a detection window around an object candidate, with its pixel grid divided into cells.]
Texture: Histogram of Oriented Gradients (HoG)
One histogram of gradient orientations per cell is obtained with nb bins.
For nb = 9 bins, for instance, bin 0 may be used to accumulate votes from pixels whose ‖g⃗(p)‖ = 0, and the remaining bins store votes from pixels whose θ(p) falls within [0, 44], [45, 89], . . . , [315, 359] degrees, respectively.
The orientation θ(p), for hx(p) = gx(p)/‖g⃗(p)‖ and hy(p) = gy(p)/‖g⃗(p)‖, is defined as

θ(p) = { (180/π) cos⁻¹(hx(p))        if hy(p) ≥ 0,
       { 360 − (180/π) cos⁻¹(hx(p))  if hy(p) < 0.
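The two-case orientation formula above translates directly into code; a minimal sketch:

```python
# theta(p) in degrees from the gradient components, using acos of the
# normalized x component and the sign of the y component to pick the case.
import math

def orientation(gx, gy):
    mag = math.hypot(gx, gy)
    hx, hy = gx / mag, gy / mag
    theta = math.degrees(math.acos(hx))
    return theta if hy >= 0 else 360.0 - theta

t_right = orientation(1.0, 0.0)    # along +x: 0 degrees
t_up = orientation(0.0, 1.0)       # along +y: 90 degrees
t_down = orientation(0.0, -1.0)    # along -y: 270 degrees
```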
Texture: Histogram of Oriented Gradients (HoG)
Each pixel p distributes ‖g⃗(p)‖ votes by trilinear interpolation between the adjacent bins b1 and b2 of its four adjacent cells q1, q2, q3, and q4.
[Figure: a pixel p inside the window with its four adjacent cells q1, q2, q3, q4; each cell receives votes in two adjacent bins, (qi, b1) and (qi, b2).]
For θ = 30, for instance, b1 = 22 and b2 = 67, since the centers of the 8 bins with non-zero gradient magnitude are represented by 22, 67, 112, 157, 202, 247, 292, and 337.
Texture: Histogram of Oriented Gradients (HoG)
The distribution of votes aims to treat relevant pixels with high gradient magnitude that might fall in adjacent cells.
Let (xp, yp, zp), with zp = θ(p), be the coordinates of p in a 3D space.
Let (xi, yi) be the center of cell qi, i = 1, 2, 3, 4, and let (q1, b1), (q2, b1), (q3, b1), (q4, b1), (q1, b2), (q2, b2), (q3, b2), and (q4, b2) be the 8 vertices (xi, yi, zi), i = 1, 2, . . . , 8, around p.
The gradient magnitude w = ‖g⃗(p)‖ is a weight distributed among the 8 vertices by trilinear interpolation.
Texture: Histogram of Oriented Gradients (HoG)
The weight w = ‖g⃗(p)‖ is first distributed between points p1 and p2 on opposite faces; then the weights on the faces are distributed among points p3, p4, p5, p6 of opposite edges; and finally the edge weights are distributed to the vertices p7, p8, p9, p10, p11, p12, p13, and p14 of the corresponding edges.
[Figure: the cube around p in the (x, y, z) space, showing the face points p1 and p2, the edge points p3–p6, and the vertices p7–p14.]
Texture: Histogram of Oriented Gradients (HoG)
The weights wi of each point pi = (xpi, ypi, zpi), i = 1, 2, . . . , 14, are computed as

w1 = w (xp2 − xp)/(xp2 − xp1),      w2 = w (xp − xp1)/(xp2 − xp1),
w3 = w1 (yp1 − yp4)/(yp3 − yp4),    w4 = w1 (yp3 − yp1)/(yp3 − yp4),
Texture: Histogram of Oriented Gradients (HoG)
w5 = w2 (yp2 − yp6)/(yp5 − yp6),    w6 = w2 (yp5 − yp2)/(yp5 − yp6),
w7 = w3 (zp11 − zp3)/(zp11 − zp7),  w11 = w3 (zp3 − zp7)/(zp11 − zp7),
w8 = w4 (zp12 − zp4)/(zp12 − zp8),  w12 = w4 (zp4 − zp8)/(zp12 − zp8),
Texture: Histogram of Oriented Gradients (HoG)
w10 = w5 (zp14 − zp5)/(zp14 − zp10),  w14 = w5 (zp5 − zp10)/(zp14 − zp10),
w9 = w6 (zp13 − zp6)/(zp13 − zp9),    w13 = w6 (zp6 − zp9)/(zp13 − zp9).

Finally, the weights wi are accumulated in the corresponding bin of the cell represented by pi, i = 7, 8, 9, 10, 11, 12, 13, 14.
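The successive splits above can be sketched compactly. This is an illustration under simplified naming (the function returns the 8 vertex weights in its own order rather than the slide's p7–p14 indexing), with illustrative coordinates; the key property is that the splits along x, then y, then z preserve the total weight w:

```python
# Trilinear distribution of a weight w among the 8 vertices of the cube
# [x1,x2] x [y1,y2] x [z1,z2] containing p = (xp, yp, zp).
def trilinear_split(w, xp, yp, zp, x1, x2, y1, y2, z1, z2):
    # split along x between the two opposite faces (w1, w2)
    w1 = w * (x2 - xp) / (x2 - x1)
    w2 = w * (xp - x1) / (x2 - x1)
    # split each face weight along y between opposite edges (w3..w6)
    w3, w4 = w1 * (y2 - yp) / (y2 - y1), w1 * (yp - y1) / (y2 - y1)
    w5, w6 = w2 * (y2 - yp) / (y2 - y1), w2 * (yp - y1) / (y2 - y1)
    # split each edge weight along z to the 8 vertices
    out = []
    for we in (w3, w4, w5, w6):
        out += [we * (z2 - zp) / (z2 - z1), we * (zp - z1) / (z2 - z1)]
    return out

weights = trilinear_split(1.0, 0.25, 0.5, 0.75, 0, 1, 0, 1, 0, 1)
```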
Texture: Histogram of Oriented Gradients (HoG)
Now, each group of n3 × m3 cells constitutes a block.
Adjacent blocks are defined by a stride (displacement in x and y).
[Figure: a block of 2×2 cells sliding over the window with a stride of one cell.]
Texture: Histogram of Oriented Gradients (HoG)
The cell histograms in each block are concatenated from left to right, top to bottom, and normalized to treat contrast variations. Similarly, the block feature vectors are concatenated to output a HoG feature vector for the window.
Texture: Histogram of Oriented Gradients (HoG)
Let hk(i), i = 0, 1, . . . , nb − 1 and k = 1, 2, . . . , n3 × m3, be the cell histograms in a block with n3 × m3 cells.
Their concatenation from left to right, top to bottom, generates a vector with features vj, j = 1, 2, . . . , nb × n3 × m3.
These features are normalized as

vj ← vj / sqrt( Σ_{j=1}^{nb × n3 × m3} vj vj + ε ),

where ε is a small number.
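The block normalization above is an L2 norm with a small ε added for stability; a minimal sketch with an illustrative block vector:

```python
# L2 block normalization: v_j <- v_j / sqrt(sum_j v_j^2 + eps).
import math

def normalize_block(v, eps=1e-6):
    norm = math.sqrt(sum(x * x for x in v) + eps)
    return [x / norm for x in v]

block = [3.0, 4.0, 0.0]            # toy concatenated cell histograms
normed = normalize_block(block)
```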
Texture: Histogram of Oriented Gradients (HoG)
For instance, for a window with 126 × 36 pixels and cells with 6 × 6 pixels, each window contains 21 × 6 cells.
If each block is defined by 2 × 2 cells and the stride is 1 cell in x and y, each window generates 20 × 5 blocks.
The four cell histograms of 9 bins in each block are concatenated and normalized to compose a vector of 36 features per block.
The feature vectors of the blocks are then concatenated to form a HoG vector with 20 × 5 × 36 features.
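The dimensionality arithmetic above can be checked in a few lines:

```python
# HoG dimensionality for the example: 126x36 window, 6x6 cells,
# 2x2-cell blocks with a stride of one cell, 9 bins per cell.
win_h, win_w = 126, 36
cell = 6
cells_h, cells_w = win_h // cell, win_w // cell      # cells per window
blocks_h, blocks_w = cells_h - 1, cells_w - 1        # 2x2 blocks, stride 1 cell
feats_per_block = 2 * 2 * 9                          # 4 histograms of 9 bins
total = blocks_h * blocks_w * feats_per_block        # final HoG length
```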
Texture: Bag of Visual Words (BoVW)
Let A, B, and C be examples of texts about different subjects.
A Bag of Words (BoW) is a dictionary of keywords identified as the most frequent ones in texts from different subjects.
Texture: Bag of Visual Words (BoVW)
By using the dictionary to determine the frequency of its words in A, B, and C, each text is represented by a histogram.
Texture: Bag of Visual Words (BoVW)
We expect higher similarity between the histograms of A and B than between either of them and the histogram of C.
Texture: Bag of Visual Words (BoVW)
There are a couple of drawbacks in BoW that need more attention.
For a given subject (category), the choice of its most frequent keywords is crucial, but BoW is unsupervised. How can we incorporate supervision into BoW?
Such keywords might not always occur in texts from that subject. How can we avoid errors of similarity with other subjects?
Texture: Bag of Visual Words (BoVW)
In a Bag of Visual Words (BoVW), the problem is the same, but
the words are the local features of image patches extracted from key locations, and
the most frequent ones are the keywords, which are determined by patch clustering as the group representatives.
[Figure from MathWorks.com.]
Texture: Bag of Visual Words (BoVW)
[Figure from MathWorks.com.]
One can separate training images from each category, build and merge the dictionaries, encode the training images, and train a classifier, but...
Texture: Bag of Visual Words (BoVW)
Can we identify the most discriminative visual words when merging dictionaries, and take this into account when encoding the images?
How do we choose the key patches, local features, clustering technique, similarity function, and image encoding rule?
By building a single dictionary, the image code becomes sparse and high-dimensional. What are the most suitable classifiers for this case?
Texture: Bag of Visual Words (BoVW)
Some visual words might be common to different categories, yet be discriminative when used together with the other words.
The discriminative power of such words must be determined by some feature selection technique.
Sampling techniques, such as the Scale Invariant Feature Transform (SIFT), grid sampling, and random sampling, have been used to locate key patches.
The local features are usually color and texture features of those patches.
Texture: Bag of Visual Words (BoVW)
The most commonly used clustering technique is k-means, where k defines the size of the dictionary.

The similarity function between patches (and visual words) is inversely proportional to a distance function (e.g., cosine, Euclidean) suitable for the type of local description.

The images are usually coded by either hard or soft assignment.

Hard assignment: each patch counts for its closest visual word only.

Soft assignment: each patch counts for all visual words, with count directly proportional to the similarity between patch and word.
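The two encoding rules can be sketched in code. A minimal NumPy sketch, assuming a precomputed dictionary of visual words and a Gaussian-of-distance similarity for soft assignment; the function name `encode_bovw` and the parameter `beta` are illustrative choices, not from the slides:

```python
import numpy as np

def encode_bovw(patches, words, mode="hard", beta=1.0):
    """Encode an image as a histogram over a visual-word dictionary.

    patches: (P, d) array of local feature vectors.
    words:   (k, d) array of visual words (e.g., k-means centroids).
    mode:    'hard' -> each patch votes for its closest word only;
             'soft' -> each patch votes for all words, weighted by a
                       similarity that decreases with Euclidean distance.
    """
    # Pairwise Euclidean distances between patches and words: (P, k)
    dists = np.linalg.norm(patches[:, None, :] - words[None, :, :], axis=2)
    k = words.shape[0]
    hist = np.zeros(k)
    if mode == "hard":
        for w in np.argmin(dists, axis=1):
            hist[w] += 1.0
    else:  # soft assignment
        sim = np.exp(-beta * dists)            # similarity inversely related to distance
        sim /= sim.sum(axis=1, keepdims=True)  # each patch contributes a total of 1
        hist = sim.sum(axis=0)
    return hist / hist.sum()                   # normalized histogram (feature vector)

# Toy example: 6 patches near a dictionary of 3 words in a 2-D feature space
rng = np.random.default_rng(0)
words = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
patches = np.vstack([w + 0.1 * rng.standard_normal((2, 2)) for w in words])
h_hard = encode_bovw(patches, words, mode="hard")
h_soft = encode_bovw(patches, words, mode="soft")
```

With two patches near each word, hard assignment yields a uniform histogram, while soft assignment spreads small counts to the other words.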
Texture: Bag of Visual Words (BoVW)
The resulting feature vectors (histograms) ask for classification techniques such as support vector machines and multi-layer perceptrons.

The histograms lose the spatial information (localization) of the visual words in each image.

The convolution between the image and a filter bank (dictionary) is equivalent to selecting patches for all pixels, computing the similarity to all kernels (visual words), as in soft assignment, and storing the result in a multiband image, preserving all spatial and texture information.

This is the strategy in Convolutional Networks (CNNs or ConvNets) for image description [6].
Texture: Convolutional Network (ConvNets)
Each layer of a ConvNet may consist of four operations:

convolution between the input image and a kernel bank,

neuron activation,

pooling, and

normalization.

By stacking multiple layers, one after the other, low-level texture features are combined into high-level features for image description.
Texture: Convolutional Network (ConvNets)
[Figure: an input image L0 followed by three layers L1, L2, and L3.]

The feature vector can be represented by the features of each pixel, from left to right and top to bottom, after the third layer of the CNN for classification by SVM, or as the last hidden layer of a multi-layer perceptron (MLP) classifier.
Texture: Convolutional Network (ConvNets)
Each of the main operations in a CNN layer,

convolution between the input image and a kernel bank,

neuron activation,

pooling, and

normalization,

can be explained as follows.
Texture: Convolutional Network (ConvNets)
First recall that the convolution between a grayscale image I = (D_I, I) and a symmetric kernel (A_r, W) outputs a grayscale image J = (D_J, J), with

J(p) = Σ_{k=1}^{K} I(q_k) w_k, ∀p ∈ D_I,

where A_r(p) = {q_1, q_2, . . . , q_K}, r ≥ 1, and W = [w_1, w_2, . . . , w_K].

However, the adjacency relation A_r is usually defined as a box

A_r = {(p, q) ∈ D_I × D_I | |x_q − x_p| ≤ r/2 and |y_q − y_p| ≤ r/2}.

This operation is equivalent to the inner product 〈v(p), w〉 between the vectors v(p) = (I(q_1), I(q_2), . . . , I(q_K)) and w = (w_1, w_2, . . . , w_K).
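A minimal NumPy sketch of this convolution as an inner product over a box adjacency; since the kernel is symmetric, the inner product and the convolution coincide (no kernel flip is needed). The function name `box_convolve` is an illustrative choice:

```python
import numpy as np

def box_convolve(I, W):
    """Convolve grayscale image I with a symmetric (2a+1) x (2a+1) kernel W,
    computed as the inner product <v(p), w> at every valid pixel p."""
    a = W.shape[0] // 2
    w = W.ravel()
    H, Wd = I.shape
    # restrict the output domain so the box fits inside the image
    J = np.zeros((H - 2 * a, Wd - 2 * a))
    for y in range(a, H - a):
        for x in range(a, Wd - a):
            v = I[y - a:y + a + 1, x - a:x + a + 1].ravel()  # v(p)
            J[y - a, x - a] = v @ w                          # <v(p), w>
    return J

I = np.arange(25, dtype=float).reshape(5, 5)
W = np.ones((3, 3)) / 9.0   # mean filter as a simple symmetric kernel
J = box_convolve(I, W)
```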
Texture: Convolutional Network (ConvNets)
One can also think of it as part of a perceptron (artificial neuron) operation at location p:

J(p) = 〈v(p), w〉,

O(p) = J(p) + b, if J(p) + b > θ; 0, otherwise,

where b is the bias and θ ≥ 0 is the neuron activation threshold.
Texture: Convolutional Network (ConvNets)
Graphically, v(p) is a point in ℝ^K and w is the normal vector of a hyperplane in ℝ^K. The bias b affects the position of the hyperplane, and θ selects points v(p) with some margin on its positive side.

Convolution is then followed by activation.
Texture: Convolutional Network (ConvNets)
Some of the many alternatives for the activation function ψ(x), where x = J(p) + b, are:

ψ(x) = max{0, x − θ} (rectified linear unit, ReLU)

ψ(x) = ln(1 + exp(x)) (softplus)

ψ(x) = 1 / (1 + exp(−x)) (logistic)

ψ(x) = arctan(x) (arctan)

The same activation function (e.g., ReLU) for all pixels p ∈ D_I in a given layer is a common choice.

It is also common that ‖w‖ = 1. A CNN may also adopt kernels with mean weight equal to 0, especially when they are randomly chosen. In this case, the bias can be zero for all pixels.
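The listed activation functions are straightforward to implement; a NumPy sketch (function names are illustrative):

```python
import numpy as np

def relu(x, theta=0.0):
    """Rectified linear unit with activation threshold theta."""
    return np.maximum(0.0, x - theta)

def softplus(x):
    """Smooth approximation of ReLU: ln(1 + exp(x))."""
    return np.log1p(np.exp(x))

def logistic(x):
    """Sigmoid squashing the response into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def arctan_act(x):
    """Arctangent activation, squashing into (-pi/2, pi/2)."""
    return np.arctan(x)

x = np.array([-2.0, 0.0, 3.0])
y = relu(x)   # negative responses are zeroed out
```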
Texture: Convolutional Network (ConvNets)
For a kernel bank, W is an N × K matrix, the input image is represented by a K × m matrix X_I, m = |D_I|, and their convolution becomes the N × m matrix X_J = W X_I, where

W = [ w_11  w_12  . . .  w_1K
      w_21  w_22  . . .  w_2K
       ⋮     ⋮            ⋮
      w_N1  w_N2  . . .  w_NK ],

X_I = [ I(q_11)  I(q_12)  . . .  I(q_1m)
        I(q_21)  I(q_22)  . . .  I(q_2m)
          ⋮        ⋮                ⋮
        I(q_K1)  I(q_K2)  . . .  I(q_Km) ], and
Texture: Convolutional Network (ConvNets)
X_J = [ J(p_11)  J(p_12)  . . .  J(p_1m)
        J(p_21)  J(p_22)  . . .  J(p_2m)
          ⋮        ⋮                ⋮
        J(p_N1)  J(p_N2)  . . .  J(p_Nm) ].

Each row in W contains a kernel i = 1, 2, . . . , N of the bank,

each column in X_I contains the intensities of the adjacent pixels q_kj, k = 1, 2, . . . , K, of each pixel p_j, j = 1, 2, . . . , m, and

each column in X_J represents the feature vector J(p_j) of the resulting multiband image J = (D_I, J). It is also common to eliminate pixels at distance r to the image border, reducing the domain to D_J ⊂ D_I.
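The matrix form can be checked numerically with an im2col-style construction of X_I ("im2col" is a common name for this layout, used here as an assumption, not taken from the slides):

```python
import numpy as np

def im2col(I, a):
    """Build the K x m matrix X_I: each column holds the (2a+1)^2 box
    neighborhood of one pixel (border pixels are dropped, so D_J ⊂ D_I)."""
    H, Wd = I.shape
    cols = []
    for y in range(a, H - a):
        for x in range(a, Wd - a):
            cols.append(I[y - a:y + a + 1, x - a:x + a + 1].ravel())
    return np.array(cols).T  # shape (K, m)

rng = np.random.default_rng(1)
I = rng.standard_normal((6, 6))
N, a = 4, 1                              # 4 kernels of size 3x3 (K = 9)
W = rng.standard_normal((N, (2 * a + 1) ** 2))
XI = im2col(I, a)                        # K x m, here m = 4 * 4 = 16
XJ = W @ XI                              # N x m: one row per band of J
```

Each entry of X_J equals the inner product between one kernel (a row of W) and one pixel neighborhood (a column of X_I).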
Texture: Convolutional Network (ConvNets)
When I = (D_I, I) is a multiband image,

the feature vectors I(q_kj) are expanded along their respective columns in X_I,

each kernel i = 1, 2, . . . , N must be multiband, with vectorial weights w_ik, k = 1, 2, . . . , K, expanded along their respective rows in W, and

J_i(p_j) = Σ_{k=1}^{K} 〈I(q_kj), w_ik〉.

The bias and activation are then applied to J = (D_I, J) to create the output O = (D_J, O).
Texture: Convolutional Network (ConvNets)
For a given box adjacency A_r of size r ≥ 1 around pixel p, the pooling operation aims at aggregating for p the activations that might have occurred nearby.

This operation can also be applied only at every d pixels, r ≥ d ≥ 1 (the stride).

It makes the process robust to slight object translations and, for d > 1, reduces the output image domain.
Texture: Convolutional Network (ConvNets)
The pooling applied to image O = (D_J, O) creates an image P = (D_P, P), D_P ⊂ D_J, P = (P_1, P_2, . . . , P_N), with

P_i(p) = ( Σ_{∀q ∈ A_r(p)} O_i(q)^α )^{1/α},

for i = 1, 2, . . . , N, where α ≥ 1 controls the operation from additive to max-pooling.
Texture: Convolutional Network (ConvNets)
For a given box adjacency set A_r(p) = {q_1, q_2, . . . , q_K} of size r ≥ 1 around pixel p, the normalization enhances the isolated (relevant) activations in detriment of the others and outputs an image Q = (D_Q, Q), D_Q ⊂ D_P, Q = (Q_1, Q_2, . . . , Q_N), with

Q_i(p) = P_i(p) / sqrt( Σ_{i=1}^{N} Σ_{k=1}^{K} P_i(q_k) P_i(q_k) ).
The process continues by having Q as input of the next layer.
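A NumPy sketch of this divisive normalization for a multiband image P of shape (N, H, W): each output value is the input activation divided by the root of the total squared activation over all bands and the box neighborhood. The small `eps` guarding against division by zero is an implementation choice, not in the slides:

```python
import numpy as np

def normalize(P, a=1, eps=1e-12):
    """Divisive normalization across bands and a box neighborhood:
    Q_i(p) = P_i(p) / sqrt(sum over bands i and neighbors q of P_i(q)^2).
    Border pixels are dropped from the output domain (D_Q ⊂ D_P)."""
    N, H, Wd = P.shape
    Q = np.zeros((N, H - 2 * a, Wd - 2 * a))
    for y in range(a, H - a):
        for x in range(a, Wd - a):
            box = P[:, y - a:y + a + 1, x - a:x + a + 1]
            norm = np.sqrt((box ** 2).sum()) + eps
            Q[:, y - a, x - a] = P[:, y, x] / norm
    return Q

P = np.ones((2, 3, 3))   # two constant bands: nothing stands out
Q = normalize(P, a=1)    # every value is divided by sqrt(2 * 9)
```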
Shape: Multiscale fractal dimension
The fractal dimension of a 2D point set S (contour, skeleton) by Minkowski-Bouligand is a number F ∈ [0, 2],

F = 2 − lim_{r→0} ln(A(r)) / ln(r),

where A(r) is the number of propagated points (area) when S is dilated by a disk of radius r.

The fractal dimension represents the self-similarity of S when r tends to zero.

Note that A can be obtained from the cumulative histogram of the Euclidean distance map of S up to the distance r.
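A sketch of this estimate in NumPy: build the cumulative histogram A(r) from a Euclidean distance map of the point set (computed by brute force here; a real EDT algorithm would be used in practice) and fit a line to ln(A) versus ln(r). The point set, grid size, and radii below are illustrative assumptions:

```python
import numpy as np

def fractal_dimension(points, size, r_max=6.0):
    """Estimate the Minkowski-Bouligand fractal dimension of a 2-D point
    set: A(r) is the number of grid points within distance r of the set
    (cumulative histogram of the distance map), and F = 2 - slope of the
    ln(A) x ln(r) line."""
    ys, xs = np.mgrid[0:size, 0:size]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    # brute-force Euclidean distance map of the point set
    d = np.min(np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2),
               axis=1)
    radii = np.arange(1.0, r_max + 1.0)
    A = np.array([(d <= r).sum() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(A), 1)
    return 2.0 - slope

# A straight line segment should have dimension close to 1
line = np.array([[32.0, x] for x in range(8, 56)])
F = fractal_dimension(line, size=64)
```

On a discrete grid with a short range of radii, the estimate is only approximate, which is one motivation for the multiscale analysis in the next slides.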
Shape: Multiscale fractal dimension
It is known, for instance, that the fractal dimension of a Koch star (on the left) is F ≈ 1.26 [3].
[Plot: ln(A) versus ln(r), showing the observed values and the fitted straight line.]
One can then fit a line to the logarithmic curve of the cumulative histogram of the EDT (on the right) and use the slope m of the line to estimate F = 2 − m ≈ 1.23.
Alexandre Xavier Falcao MC940/MO445 - Image Analysis
Shape: Multiscale fractal dimension
By fitting a polynomial curve (on the left) and computing its first derivative, the resulting curve F (on the right) is a feature vector of the shape, called multiscale fractal dimension.
[Plots: on the left, ln(A) versus ln(r) with the observed values and the polynomial regression; on the right, the multiscale fractal dimension F versus ln(r).]
Note that its maximum value (on the right) is ≈ 1.25 for some value ln(r) ∈ [3.5, 4].
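The multiscale version can be sketched with a polynomial fit and its analytic derivative; the polynomial degree and the number of samples are illustrative choices:

```python
import numpy as np

def multiscale_fractal_dimension(log_r, log_A, degree=5, n_samples=20):
    """Fit a polynomial to the ln(A) x ln(r) curve and sample the
    multiscale fractal dimension F(r) = 2 - d ln(A) / d ln(r)."""
    coeffs = np.polyfit(log_r, log_A, degree)
    dcoeffs = np.polyder(coeffs)                  # analytic first derivative
    t = np.linspace(log_r.min(), log_r.max(), n_samples)
    return 2.0 - np.polyval(dcoeffs, t)           # the shape feature vector

# Toy curve: ln(A) = 0.8 ln(r) + 5 gives a constant F = 1.2 at all scales
log_r = np.linspace(1.0, 6.0, 30)
log_A = 0.8 * log_r + 5.0
F = multiscale_fractal_dimension(log_r, log_A)
```

For real shapes the curve is not a straight line, so F varies with ln(r) and the sampled values form the descriptor.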
Shape: Point/segment saliences
The dilation of a contour S by a disk of small radius r (e.g., r = 10) shows that the outside area A_out(p) of the influence zone of a point p ∈ S is higher than its inside area A_in(p) when p is convex, and the other way around when it is concave [2].
[Figure: influence zones of a convex point A and a concave point B on a contour.]
Such areas come from the root histogram of the EDT, which can be normalized for the purpose of shape description.
Shape: Point/segment saliences
By considering only the salience points p ∈ S, a point-salience feature vector can encode the respective positive (convex) and negative (concave) areas.
[Plot: influence area versus contour point position, with salience points labeled A to K.]
Shape: Point/segment saliences
A similar idea works for a segment-salience feature vector, when replacing points by contour segments around the salience points.

Note that, in any case, the distance function must account for possible shape rotations.
Shape: Tensor scale
The tensor scale of each pixel contains orientation θ(p) and anisotropy α(p) values. One can divide the contour S into segments and compute a weighted angular mean θ̄ for each segment, using α(p) as weight and the points p of its internal influence zone R [1]:

θ̄ = arctan( [Σ_{∀p∈R} α(p) sin(2θ(p))] / [Σ_{∀p∈R} α(p) cos(2θ(p))] ).

The tensor-scale feature vector is composed of these weighted angular mean values.
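A NumPy sketch of the weighted angular mean; doubling the angles treats orientations as axial data (θ and θ + π are the same orientation), and using arctan2 plus a factor 1/2 to map back to the original range is a common implementation choice, slightly more robust than a plain arctan:

```python
import numpy as np

def weighted_angular_mean(theta, alpha):
    """Weighted angular mean of tensor-scale orientations theta (radians),
    weighted by the anisotropy values alpha."""
    s = np.sum(alpha * np.sin(2.0 * theta))   # weighted sine of doubled angles
    c = np.sum(alpha * np.cos(2.0 * theta))   # weighted cosine of doubled angles
    return 0.5 * np.arctan2(s, c)             # map back to the original range

theta = np.array([0.10, 0.12, 0.11])   # nearly parallel orientations
alpha = np.array([1.0, 0.5, 0.8])      # anisotropy weights
mean = weighted_angular_mean(theta, alpha)
```

For nearly parallel orientations, the result is close to the arithmetic weighted mean, but unlike it, the angular mean handles orientations that wrap around π.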
Shape Descriptor based on tensor scale
The distance function must consider possible rotations, requiring a shape matching for distance computation.
Combining descriptors by genetic programming
Let {D_1, D_2, . . . , D_k} be a set of descriptors such that D_i = (v_i, d_i), i = 1, 2, . . . , k, consists of

an algorithm v_i that extracts feature vectors v_i(s) and v_i(t) from samples s and t, and

a distance function d_i that assigns a dissimilarity value d_i(s, t) in the feature space between samples s and t (e.g., ‖v_i(t) − v_i(s)‖).

The distance functions d_i, i = 1, 2, . . . , k, can be combined into a single distance function d using, for instance, Genetic Programming (GP) [4].

The resulting descriptor D is called a composite descriptor.
Combining descriptors by genetic programming
Illustration of a composite descriptor D* with the GP combiner C (one may use any other optimization technique).
[Figure: (a) a simple descriptor D_i extracts v_i(s) and v_i(t) from samples s and t and computes d_i(s, t); (b) the composite descriptor D* feeds d_1(s, t), d_2(s, t), . . . , d_k(s, t) into the combiner C to produce d*(s, t).]
Combining descriptors by genetic programming
GP is an Artificial Intelligence technique based on biological principles of heritage and evolution.

Each candidate solution is an individual of a population, represented by a data structure (e.g., tree, list, stack) whose nodes are mathematical operations, rather than a sequence of numbers as in genetic algorithms.

From some initial random population, the most promising individuals pass through genetic transformations (e.g., mutations) that make the population more diverse and suitable to solve the problem.
Combining descriptors by genetic programming
Example of an individual, where the distances d_i, i = 1, 2, . . . , k, are the terminal nodes of a binary tree and the remaining nodes are other mathematical operations.
[Figure: a binary tree computing the combined distance d, with terminals d_1, d_2, d_3 and internal nodes sqrt, +, /, and *.]
Combining descriptors by genetic programming
In addition to the mathematical operations (see examples in [5]),

reproduction selects the most effective individuals for the next population,

crossover exchanges subtrees between selected individuals to increase diversity, generating new trees (offspring), and

mutation replaces a subtree of a selected individual by another randomly chosen subtree.

The individuals are assessed by a fitness function, which can be the accuracy of classification on a validation set.
Combining descriptors by genetic programming
The algorithm can be sketched as follows.

1. Generate an initial random population (first generation).
2. For each generation, up to a given maximum number:
3. Evaluate each individual by the fitness function.
4. Select a number of the most effective ones.
5. Generate the next population by reproduction, crossover, and mutation of the selected individuals.
6. Select the best individual as the final solution.
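The steps above can be sketched as a minimal, illustrative GP loop over distance-combination trees; crossover is omitted for brevity, and the toy fitness function, the chosen operations, and all names are assumptions for illustration only:

```python
import random
import operator

OPS = {"+": operator.add, "*": operator.mul, "max": max}

def random_tree(depth, k, rng):
    """Random expression tree over terminals d_0..d_{k-1} and binary OPS."""
    if depth == 0 or rng.random() < 0.3:
        return ("d", rng.randrange(k))                 # terminal node: d_i
    op = rng.choice(list(OPS))
    return (op, random_tree(depth - 1, k, rng), random_tree(depth - 1, k, rng))

def evaluate(tree, d):
    """Evaluate a tree on a vector d of descriptor distances."""
    if tree[0] == "d":
        return d[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], d), evaluate(tree[2], d))

def mutate(tree, k, rng):
    """Replace a random subtree by a new random subtree."""
    if rng.random() < 0.5 or tree[0] == "d":
        return random_tree(2, k, rng)
    return (tree[0], mutate(tree[1], k, rng), tree[2])

def gp(fitness, k, pop_size=30, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(3, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]                # reproduction (selection)
        pop = survivors + [mutate(rng.choice(survivors), k, rng)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fitness)

# Toy fitness: a similar pair should get a small combined distance and a
# dissimilar pair a large one.
similar, dissimilar = [0.1, 0.2, 0.1], [0.9, 0.8, 0.7]
fitness = lambda t: evaluate(t, dissimilar) - evaluate(t, similar)
best = gp(fitness, k=3)
```

In the actual framework [4], the fitness would be the retrieval or classification effectiveness of the combined distance on a validation set.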
[1] F.A. Andalo, P.A.V. Miranda, R. da S. Torres, and A.X. Falcao. Shape feature extraction and description based on tensor scale. Pattern Recognition, 43(1):26-36, 2010.

[2] R. da S. Torres and A.X. Falcao. Contour salience descriptors for effective image retrieval and analysis. Image and Vision Computing, 25(1):3-13, 2007.

[3] R. da S. Torres, A.X. Falcao, and L. da F. Costa. A graph-based approach for multiscale shape analysis. Pattern Recognition, 37(6):1163-1174, 2004.

[4] R. da S. Torres, A.X. Falcao, M.A. Goncalves, J.P. Papa, B. Zhang, W. Fan, and E.A. Fox. A genetic programming framework for content-based image retrieval. Pattern Recognition, 42(2):283-292, 2009 (Learning Semantics from Multimedia Content).

[5] R. da S. Torres, A.X. Falcao, M.A. Goncalves, B. Zhang, W. Fan, and E.A. Fox. A new framework to combine descriptors for content-based image retrieval. Technical Report IC-05-21, Institute of Computing, University of Campinas, September 2005.

[6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[7] P.A.V. Miranda, R. da S. Torres, and A.X. Falcao. TSD: A shape descriptor based on a distribution of tensor scale local orientation. In XVIII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI'05), pages 139-146, Oct 2005.

[8] Renato O. Stehling, Mario A. Nascimento, and Alexandre X. Falcao. A compact and efficient image retrieval approach based on border/interior pixel classification. In Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 102-109, 2002.