


JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1

A Bio-inspired Monogenic CNN Layer for Illumination-Contrast Invariance

E. Ulises Moya-Sánchez, Member, IEEE, Sebastián Salazar Colores, Sebastià Xambó-Descamps, Abraham Sánchez and Ulises Cortés

Abstract—Deep learning (DL) is attracting considerable interest as it currently achieves remarkable performance in many branches of science and technology. However, current DL cannot guarantee capabilities of the mammalian visual system such as invariance to lighting changes and rotation equivariance. This paper proposes an entry layer capable of classifying images even under low-contrast conditions. We achieve this by means of an improved version of monogenic wavelets inspired by some physiological experimental results (such as the stronger response of the primary visual cortex to oriented lines and edges). We have simulated atmospheric degradation of the CIFAR-10 and the Dogs and Cats datasets to generate realistic illumination degradations of the images. The most important result is that the accuracy of a net using our layer is substantially more robust to illumination changes than that of nets without such a layer.

Index Terms—Monogenic Signal, Convolutional Neural Networks, Bio-inspired models, Geometric Deep Learning.

I. INTRODUCTION

MANY authors argue that currently no suitable theory of CNN design is available [1], [2], [3], [4], [5].

Although some evidence supports that the depth (number of layers) [6] and the data augmentation during the training process [7] can occasionally provide invariance or equivariance relative to some class of transformations, the reasons for that behaviour do not seem to be well understood. Some investigations indicate that the learning of an invariant response may fail even with very deep CNNs or with large data augmentation during training [1], [3], [5], [8].

To overcome these shortcomings, one idea is to embrace suitable geometric methods, as in [9], where the main techniques are real algebraic varieties and methods of computer algebra, and in [1], [10], [3], [4], [11], [2], [12], in which methods of differential geometry are used. In this regard, another strategy to progress in “Geometric Deep Learning” is to use Geometric Calculus (GC) in the sense of [13], [14], [15], [16]. The main strong points for this advance are the long history of achievements in a great variety of fields (see [14], §6.4, and the references therein); that it includes the complex

E. Ulises Moya-Sánchez is a postdoctoral researcher at the Barcelona Supercomputing Center, Barcelona, Spain, and a researcher at Universidad Autónoma de Guadalajara, Guadalajara, México.

Sebastián Salazar is a PhD student at Universidad Autónoma de Querétaro, Facultad de Informática, Querétaro, México.

Sebastià Xambó-Descamps is Emeritus Full Professor of Mathematics, UPC-Barcelona Tech and Barcelona Supercomputing Center, Barcelona, Spain.

Abraham Sánchez Pérez is an artificial intelligence analyst at the Jalisco government.

Ulises Cortés is Full Professor, UPC-Barcelona Tech and Barcelona Supercomputing Center, Barcelona, Spain.

Manuscript received , 2019; revised , 2019

numbers and the quaternions as very special cases; and the fact that there is a well-developed theory of GC wavelets with the potential to be applied to DL much as scalar wavelets are used in current DL techniques [17], [18], [19]. An additional bonus of GC is that the representation of the signals occurs in a higher-dimensional space, and hence they naturally provide a stronger discrimination capacity.

In this work, as the first step in this general strategy, we work with Hamilton’s quaternions H, the most straightforward geometric calculus beyond the complex numbers C (see Appendix A). The main results are the design and implementation of a CNN layer (M6) based on the monogenic signal proposed by Felsberg et al. [20]. This layer substantially enhances the invariant response to illumination-contrast changes. As we will see, it reproduces to a good extent characteristic properties of the mammalian primary visual cortex V1.

Up till now, quaternions have been used with fully connected NNs [21], [22], [23], [24] and, more recently, with CNNs [25], [26], [27], [28]. In this context, the method proposed in this paper is, to the best of our knowledge, the first that combines a CNN with local phase computations using quaternions.

On the experimental side, to evaluate the predictive performance of M6, we have simulated illumination-contrast changes using the atmospheric degradation model (see Appendix B) over two image datasets, CIFAR-10 [29] and Dogs and Cats [30].

The rest of the paper is organised into six additional sections and three appendices. Section II gives a brief overview of the theoretical background used in the rest of the paper. The core of the paper consists of sections III and IV, which describe the new monogenic layer, M6, and the experimental setup, respectively. The experiments, results and analyses are presented in section V. Our conclusions are drawn in section VI. Finally, in appendices A and B we summarize what we need about the quaternion algebra H and about the atmospheric scattering model of light, respectively.

II. BACKGROUND

In this section, we explain three issues that are relevant for this paper: equivariance and invariance, some properties of the mammalian visual system, and the monogenic signal.

A. Equivariance and Invariance

The term equivariance tends to be used to refer to the predictable way in which features of a signal change under certain transformations [5]. More formally, a function f : X → X is


equivariant with respect to a group of transformations G of X if

f(g(x)) = g(f(x)), (1)

for all x ∈ X and g ∈ G. For instance, one of the most important equivariant properties of the mammalian visual system (measured by Hubel et al. [31]) is its equivariance under rotations. Subsequently, many authors have been interested in extending this property to NNs and CNNs [1], [4], [32], [5].

On the other hand, invariance of a feature map f : X → Y means that [16]

f(g(x)) = f(x) for all x ∈ X. (2)

One of the key points on which this work is based is that the monogenic signal has nice invariant and equivariant properties (see [20], [16]). Of those properties, in this paper we emphasize its remarkable capacity to produce invariance under illumination changes.
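Definitions (1) and (2) can be checked numerically. A minimal NumPy sketch (the rotation g and the maps f below are illustrative choices of ours, not taken from the paper):

```python
import numpy as np

def is_equivariant(f, g, x, tol=1e-9):
    """Check the equivariance property f(g(x)) == g(f(x)) on a sample x."""
    return np.allclose(f(g(x)), g(f(x)), atol=tol)

def is_invariant(f, g, x, tol=1e-9):
    """Check the invariance property f(g(x)) == f(x) on a sample x."""
    return np.allclose(f(g(x)), f(x), atol=tol)

g = lambda x: np.rot90(x)          # transformation: 90-degree rotation
x = np.random.rand(8, 8)

# Pointwise scaling commutes with rotation -> equivariant.
assert is_equivariant(lambda a: 2.0 * a, g, x)
# The global mean ignores pixel positions -> invariant under rotation.
assert is_invariant(np.mean, g, x)
```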

B. Bio-inspired CNN

Investigations by many researchers, including the Nobel laureates Santiago Ramón y Cajal [33] and Hubel et al. [31], have established that one of the major features of the mammalian visual cortex is its built-in capacity to recognize objects independently of size, contrast, illumination, spatial frequency, position on the retina, angle of view, and lighting, among others [34].

One of the most important processes for object recognition occurs in the visual cortex [35], [34]. The visual cortex is organised as a set of hierarchically connected cortical areas, consisting at least of V1, V2, and V4 cells [34]. The cortical area V1 contains two types of orientation-sensitive cells: simple cells and complex cells. The preferred orientation-sensitive response was observed in two-dimensional maps as highly ordered regions named iso-orientation patches or pinwheel patterns. Figure 1 depicts the original image¹ of the pinwheel-like structures and a close-up (see [36]).

Fig. 1. Image from [36] (with permission). Left: color-coded orientation preference map (pinwheel-like structures) in the cat brain area 18. Right: zoom on the selected region.

The main properties of the V1 simple cells are (see [35], [37], [34], [38], [36]):

1) The V1 cells form the first layer of the hierarchical cortical processing.

¹License number 4431311414773, Springer Nature.

2) They are insensitive to the colour of the light falling in their receptive fields.

3) These neurons respond vigorously only to edges (odd signal) and lines (even signal) at a particular spatial direction through the orientation columns [36].

In addition, the visual system exhibits translation and rotation equivariance in the recognition of objects [35], [34], [32].

Inspired by these findings, we propose a simplified model of V1 as a monogenic layer that filters elementary geometric patterns and achieves their learning coupled with a standard CNN. This combined architecture features one of the distinguishing properties of the primary cortex: illumination-contrast invariance.

C. Monogenic Signal

The terminology we are going to use is as follows (cf. Felsberg et al. [20]). We define 1D (resp. 2D) multivectorial signals as C¹ maps U → G from an interval U ⊂ R (a region U ⊂ R²) into a geometric algebra G (see [14]). For G = R (G = C, G = H) we say that the signal is scalar (complex, quaternionic). For technical reasons we also assume that signals are in L² (that is, the modulus is square-integrable).

The Riesz-Felsberg transform (RF) maps 2D scalar signals to 2D quaternionic signals. Among the signals obtained in this way, our interest lies in the (quaternionic) monogenic signals (see [20] for details). Some applications of the monogenic signal are: visual perception measurements [39], [40]; local feature detection, such as lines (even signal) and edges (odd signal) [16], [41]; estimation of the disparity of stereo images and blending of images [42]; or computation of fast phase-based video magnification [43].

The monogenic signal IM = IM(x, y) ∈ H is associated to an image² I = I(x, y) ∈ R (where x, y ∈ U, U a region of R²). The definition of IM is as follows (cf. [20]):

IM = I + IM′, IM′ = iI1 + jI2, (3)

where, denoting by ∗ the convolutional product,

I1 = I ∗ h1, I2 = I ∗ h2,

h1(x, y) = −x / [2π(x² + y²)^(3/2)], (4)

h2(x, y) = −y / [2π(x² + y²)^(3/2)].

The signals I1 and I2 are the Riesz transforms (quadrature filters) of I in the x and y directions [20]. Note that IM ∈ ⟨1, i, j⟩ ⊂ H.

The local amplitude signal |IM| is defined by |IM|(x, y) = |IM(x, y)|, where the last expression is the modulus of the quaternion IM(x, y) [20]. Notice that we have

|IM|² = |I|² + |IM′|² = |I|² + |I1|² + |I2|², (5)

where |I|(x, y) = |I(x, y)|, and similarly for |I1| and |I2|.

²Here it is to be noted that I is not the source image we are interested in, but a filtered version of it in the sense explained in Section IV.
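Equations (3)-(5) can be sketched in NumPy by applying the Riesz kernels in the frequency domain, a standard route for computing quadrature filters. The transfer-function sign convention below is one common choice, and the function name is ours; this is not the authors' implementation:

```python
import numpy as np

def monogenic(I):
    """Riesz pair (I1, I2) and local amplitude |IM| of a 2D image I,
    computed in the frequency domain (cf. eqs. (3)-(5))."""
    u1 = np.fft.fftfreq(I.shape[0])[:, None]
    u2 = np.fft.fftfreq(I.shape[1])[None, :]
    radius = np.hypot(u1, u2)
    radius[0, 0] = 1.0                                 # avoid division by zero at DC
    F = np.fft.fft2(I)
    I1 = np.fft.ifft2(F * (-1j * u1 / radius)).real    # Riesz transform in x
    I2 = np.fft.ifft2(F * (-1j * u2 / radius)).real    # Riesz transform in y
    amplitude = np.sqrt(I ** 2 + I1 ** 2 + I2 ** 2)    # eq. (5)
    return I1, I2, amplitude

I = np.random.rand(32, 32)
I1, I2, A = monogenic(I)
# eq. (5) forces the local amplitude to dominate the scalar part pointwise.
assert A.shape == I.shape and np.all(A >= np.abs(I) - 1e-9)
```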


The local phase Iφ and the local orientation Iθ associated to I are defined, following [20], by the relations

Iφ = atan2(I / |IR|), (6)

Iθ = atan(−I2 / I1), (7)

where IR = IM′ = iI1 + jI2 and the quotients of signals are taken point-wise. For the geometric interpretation of these signals see Figure 2.

Fig. 2. Geometry of the monogenic signal.

The local phase can distinguish lines and edges [37], [41], [16], [40], [22], whereas the local orientation appears as a pinwheel picture resembling the behaviour of V1 simple cells and orientation columns.

Figure 3 illustrates the monogenic transform of the image of a white circle. From left to right, we display the original image, the filtered image (in the sense of Section IV), and the corresponding local magnitude, phase and orientation signals. The highest local energy values take place at the circle boundary, whereas the dominant values of the local orientation are −π/2 (blue), 0 (white), and π/2 (red).

Fig. 3. Original image, filtered image, local energy |IM|, local phase Iφ, and local orientation Iθ.

III. MONOGENIC CONVOLUTIONAL NEURAL NETWORK LAYER

What we call a Monogenic Convolutional Neural Network (MCNN) is the coupling of a deterministic layer based on the monogenic signal, which we call the monogenic layer (ML), with a conventional convolutional neural network (CNN). What we accomplish in this way, as shown by the experiments in subsequent sections, is a system for classifying images that not

only outperforms the usual CNNs in speed but is also resilient to severe changes in illumination or contrast. The compound system is inspired, as said in the introduction, by the described behaviour of the V1 layer of the mammalian neocortex.

A. Description of the monogenic layer

We report on work about one monogenic layer, which we call M6. As we will see, it shares promising features with the V1 layer by which it is inspired.

The purpose of M6 is to perform the following set of operations on the input image J. If J is formed by different channels indexed by c = 0, 1, . . . , N−1, we denote by Jc the image corresponding to channel c. For instance, in an RGB (HSV) image, we would have the images IR, IG, and IB (IH, IS, and IV).

1) Get the average J̄ of J over the channels forming J, that is, with the previous notations,

J̄ = (1/N) ∑_{c=0}^{N−1} Jc. (8)

2) Get the image I obtained by filtering J̄ in the sense explained in section IV.

3) Calculate the monogenic components |IM |, Iφ, Iθ of Ias defined by the equations (5), (6), and (7), respectively.

4) Construct two HSV images as follows:

HSVφ = (H, S, V) = (Iφ, |IR|, 1), (9)
HSVθ = (H, S, V) = (Iθ, |IR|, 1). (10)

5) Transform the HSV images into RGB images RGBφ and RGBθ according to the standard conventions (see page 304 of [44]).

The use of the HSV colour space has been handy for encoding in different colour hues the values of the monogenic angular components. Moreover, the further transformation to the RGB colour space enhances the visibility of the regions in the original image in which the local amplitude is significant, which translates into a sensitivity to sharp edges.
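Steps 1)-5) can be sketched as follows. This is an illustrative NumPy version under several assumptions of ours: the bandpass filtering of step 2) is reduced to a single log-Gabor filter in the spirit of Section IV-D, the angle-to-hue normalizations and the parameter values `w0` and `sigma` are placeholder choices, quadrant-aware arctangents stand in for (6)-(7), and a textbook HSV→RGB conversion stands in for the conventions of [44]:

```python
import numpy as np

def hsv_to_rgb(h, s, v):
    """Vectorized textbook HSV -> RGB conversion; h, s, v in [0, 1]."""
    i = np.floor(h * 6.0).astype(int) % 6
    f = h * 6.0 - np.floor(h * 6.0)
    p, q, t = v * (1 - s), v * (1 - f * s), v * (1 - (1 - f) * s)
    cases = [(v, t, p), (q, v, p), (p, v, t), (p, q, v), (t, p, v), (v, p, q)]
    conds = [i == k for k in range(6)]
    return np.stack([np.select(conds, [c[ch] for c in cases])
                     for ch in range(3)], axis=-1)

def m6_layer(J, w0=0.1, sigma=0.5):
    """Sketch of M6: channel mean (8), log-Gabor bandpass, monogenic
    phase and orientation, and the two HSV images (9)-(10)."""
    Jbar = J.mean(axis=-1)                              # step 1, eq. (8)
    u1 = np.fft.fftfreq(J.shape[0])[:, None]
    u2 = np.fft.fftfreq(J.shape[1])[None, :]
    radius = np.hypot(u1, u2)
    radius[0, 0] = 1.0                                  # avoid log(0) and 0-division
    G = np.exp(-np.log(radius / w0) ** 2 / (2 * np.log(sigma) ** 2))
    G[0, 0] = 0.0                                       # bandpass: no DC response
    F = np.fft.fft2(Jbar) * G                           # step 2: filtered image
    I = np.fft.ifft2(F).real
    I1 = np.fft.ifft2(F * (-1j * u1 / radius)).real     # Riesz pair
    I2 = np.fft.ifft2(F * (-1j * u2 / radius)).real
    IR = np.hypot(I1, I2)                               # |IR|
    phase = np.arctan2(I, IR)                           # quadrant-aware form of (6)
    theta = np.arctan2(-I2, I1)                         # quadrant-aware form of (7)
    sat = IR / (IR.max() + 1e-12)                       # saturation = |IR|, rescaled
    ones = np.ones_like(sat)                            # value = 1
    rgb_phi = hsv_to_rgb((phase + np.pi / 2) / np.pi, sat, ones)    # steps 4-5
    rgb_theta = hsv_to_rgb((theta + np.pi) / (2 * np.pi), sat, ones)
    return np.concatenate([rgb_phi, rgb_theta], axis=-1)  # the 6 channels of M6

out = m6_layer(np.random.rand(32, 32, 3))
assert out.shape == (32, 32, 6)
```

Each input image thus yields six channels (RGBφ plus RGBθ) that are fed to the downstream CNN.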

The six components of M6, namely RGBθ plus RGBφ, together with the 3 RGB components of the input signal J, are illustrated in Figure 4.

Fig. 4. (a) RGB input image. (b) RGBθ. (c) RGBφ. M6 is defined by (b) and (c).


IV. EXPERIMENTAL SETUP

A. Datasets

We have used two datasets, CIFAR-10 [29] (Figure 5 (a)) and Dogs and Cats (Figure 5 (b)). Each dataset was split into three sets: training, validation and test. Table I shows their main characteristics.

                   CIFAR-10       Dogs and Cats
Training set       36 000         2 400
Validation set     12 000         800
Test set           12 000         800
Total of images    60 000         4 000
Input shape        [32, 32, 3]    [150, 150, 3]

TABLE I
CHARACTERISTICS OF THE CIFAR-10 AND DOGS AND CATS DATASETS.

Fig. 5. Examples of: (a) 100 RGB images from the CIFAR-10 dataset; (b) 100 images from the Dogs and Cats dataset.

B. Degrading procedures

In our experiments, each original image is degraded by the addition of fog according to the McCartney atmospheric scattering model [45] (summarized in Appendix B) and further modified by the addition of independent random changes in the illumination colour (in the range 0.8-1.0 for each colour component) of the atmospheric light A(r, g, b). In addition, the transmission map t(x, y) was computed with the random parameters summarized in Table II.

Degradation levels
A = ([0.8, 1], [0.8, 1], [0.8, 1])

Level    Parameters
d0       zero degradation
d1       t(x, y) ∈ [0.5, 0.8]
d2       t(x, y) ∈ [0.3, 0.5]
d3       t(x, y) ∈ [0.0, 0.15]

TABLE II
DEGRADATION PARAMETERS OF THE COLOUR OF THE ATMOSPHERIC LIGHT A(r, g, b) AND OF THE TRANSMISSION MAP t(x, y).
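A minimal sketch of this degradation procedure, assuming the usual form of the atmospheric scattering model summarized in Appendix B, I_deg = I·t + A·(1 − t); the spatially constant transmission drawn per image is our simplification of the transmission map t(x, y):

```python
import numpy as np

def degrade(I, t_range, rng=None):
    """Fog degradation I_deg = I*t + A*(1 - t): A is a random atmospheric
    light with each colour component in [0.8, 1], and the transmission t
    is drawn uniformly from the range of the level (d1, d2 or d3)."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.uniform(0.8, 1.0, size=3)        # atmospheric light A(r, g, b)
    t = rng.uniform(*t_range)                # constant stand-in for t(x, y)
    return I * t + A * (1.0 - t)

levels = {"d1": (0.5, 0.8), "d2": (0.3, 0.5), "d3": (0.0, 0.15)}
I = np.random.rand(32, 32, 3)                # image with values in [0, 1]
I_d2 = degrade(I, levels["d2"], rng=np.random.default_rng(0))
assert I_d2.shape == I.shape and np.all(I_d2 >= 0.0) and np.all(I_d2 <= 1.0)
```

Note that lower transmission ranges (d3) push every pixel toward the atmospheric light A, which is exactly the contrast loss the experiments probe.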

An account of the details is deferred to Appendix B, while Figure 6 provides an illustration of a degradation run of the images in Figure 5. For concreteness, we consider three degradation levels (d1, d2, d3) for each dataset; for an illustration, see Figure 7.

Fig. 6. Degraded versions (d1) of the images in Figure 5: (a) From CIFAR-10; (b) From Dogs and Cats.

Fig. 7. (a) Original image, level d0; (b) contrast level d1; (c) contrast level d2; (d) contrast level d3. For the meaning of these labels see subsection IV-E.

C. CNN and MCNN Architectures

Functionally, an NN layer takes an input x and produces an output x′. The map f : x ↦ x′ depends on parameters associated to the layer, whose nature depends on the kind of layer. In general, x, x′, and the layer parameters are multidimensional arrays whose nature is chosen according to the processing that has to be achieved.

Write [n1, n2, . . . , nd] to denote the type of a d-dimensional (real) array with axis dimensions n1, . . . , nd. Thus [n] is the type of n-dimensional vectors and [n1, n2] the type of matrices with n1 rows and n2 columns. Matrices are useful to represent monochrome images, but for RGB images we need arrays of type [n1, n2, 3], or [n1, n2, n3] if it is required that the image be represented by n3 channels, as for example n3 = 6 for a pair of colour stereoscopic images.

The parameters associated to convolutional and fully connected layers are represented by a filter array of weights W³ and a bias array b. In these cases, the expression of f has the form

fπ(x) = g(x ⋆π W + b), (11)

where ⋆π is a pairing specific to the layer and g is an activation function, ReLU in this paper, that is applied component-wise to arrays. For convolutional layers, ⋆π = ⋆ is array cross-correlation, to be described below, while for fully connected

³Filters are also called kernels.


layers, ⋆π is the matrix product, denoted by juxtaposition of its factors, xW. For a max-pooling layer, the parameters are represented by a triple of positive integers (w1, w2, s), where (w1, w2) is the shape of the pooling window and s is the stride (1 by default). In this case ⋆π = ⋆mp is given by the rule

(x ⋆mp W)[i, j, k] = max( x[is : is + w1 − 1, js : js + w2 − 1, k] ), (12)

where we use the standard slicing conventions for arrays. The shape of the array x ⋆mp W is [n′1, n′2, n3], where n′1 = ⌊(n1 − w1)/s⌋ + 1 and n′2 = ⌊(n2 − w2)/s⌋ + 1.
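Rule (12) translates directly into code; a NumPy sketch (loop-based for clarity; the pooling "filter" carries no weights, only the window shape and the stride):

```python
import numpy as np

def max_pool(x, w1, w2, s=1):
    """Max pooling per eq. (12): x has type [n1, n2, n3]; the output has
    type [n1', n2', n3] with ni' = (ni - wi) // s + 1."""
    n1, n2, n3 = x.shape
    m1, m2 = (n1 - w1) // s + 1, (n2 - w2) // s + 1
    y = np.empty((m1, m2, n3))
    for i in range(m1):
        for j in range(m2):
            for k in range(n3):
                # maximum over the (w1, w2) window of channel k
                y[i, j, k] = x[i*s:i*s + w1, j*s:j*s + w2, k].max()
    return y

x = np.arange(16.0).reshape(4, 4, 1)
assert max_pool(x, 2, 2, s=2)[:, :, 0].tolist() == [[5.0, 7.0], [13.0, 15.0]]
```

With w1 = w2 = s = 2 this halves each spatial dimension, matching the MP steps of Tables III and IV (e.g. [30, 30, 32] → [15, 15, 32]).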

In the cross-correlation product y = x ⋆ W, x is an array of type [n1, n2, n3] and W (the filter) is an array of type [w1, w2, n3, m3]. The pair (n1, n2) is the shape of the space dimensions of x and n3 the number of channels. The pair (w1, w2) denotes the window dimensions of the filter and m3 the number of channels of the array y. The definition is given by the following formula:

y[i, j, k] = ∑_{m=0}^{w1−1} ∑_{n=0}^{w2−1} ∑_{r=0}^{n3−1} x[i + m, j + n, r] W[m, n, r, k], (13)

which can be expressed more compactly as

y[i, j, k] = ∑_{r=0}^{n3−1} x[i : i + w1 − 1, j : j + w2 − 1, r] ∗ W[:, :, r, k], (14)

where ∗ denotes the ordinary scalar product of matrices. Notice that the shape of y is [n1 − w1 + 1, n2 − w2 + 1, m3].

There is also a downsampled cross-correlation y = x ⋆s W by a stride s:

y[i, j, l] = ∑_{k,m,n} x[is + m, js + n, k] W[m, n, k, l]
           = ∑_{k} x[is : is + w1 − 1, js : js + w2 − 1, k] ∗ W[:, :, k, l]. (15)

The shape of the array x ⋆s W is [n′1, n′2, m3], where n′1 = ⌊(n1 − w1)/s⌋ + 1 and n′2 = ⌊(n2 − w2)/s⌋ + 1.
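Equations (13) and (15) can be sketched as a single function with a stride parameter (s = 1 recovers the plain cross-correlation; loop-based NumPy for clarity, not an efficient implementation):

```python
import numpy as np

def cross_correlate(x, W, s=1):
    """Strided cross-correlation y = x ?s W per eqs. (13) and (15).
    x: input of type [n1, n2, n3]; W: filter of type [w1, w2, n3, m3]."""
    n1, n2, n3 = x.shape
    w1, w2, _, m3 = W.shape
    m1, m2 = (n1 - w1) // s + 1, (n2 - w2) // s + 1
    y = np.zeros((m1, m2, m3))
    for i in range(m1):
        for j in range(m2):
            for l in range(m3):
                # sum over the window and over all input channels
                y[i, j, l] = np.sum(
                    x[i*s:i*s + w1, j*s:j*s + w2, :] * W[:, :, :, l])
    return y

x = np.ones((4, 4, 2))
W = np.ones((3, 3, 2, 5))
y = cross_correlate(x, W)        # s = 1: shape [4 - 3 + 1, 4 - 3 + 1, 5]
assert y.shape == (2, 2, 5) and np.allclose(y, 18.0)
```

The all-ones check follows from (13): each output entry sums w1 · w2 · n3 = 3 · 3 · 2 = 18 unit products.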

In this work, we have used two architectures: CNN-1 (for CIFAR-10) and CNN-2 (for Dogs and Cats). See Figure 8 for a schematic representation of MLCNN-1, which consists of CNN-1 with the M6 layer (MLCNN-2 is defined similarly). The computation flow of these networks is summarized in Table III for CIFAR-10 and in Table IV for Cats and Dogs. In these two tables, the input I is processed by M6, and the resulting output is processed by a sequence of convolutional (C),⁴ max-pooling (MP), flatten (FL), fully connected (FC), and softmax (SMAX) layers. If the monogenic step is omitted, the flow agrees with fairly standard CNNs (here called CNN-1 and CNN-2; see below for further details). The value n3 is equal to 3 for I and 6 for M6, respectively. The W column specifies the filter of the current step. It is to be understood that the action of the layers C and FC is completed with a ReLU activation function.

CIFAR-10
I     [32, 32, n3 = 3]
M6    [32, 32, n3 = 6]

x → x′    W                 x′
C*-1      [3, 3, n3, 32]    [32, 32, 32]
C-2       [3, 3, 32, 32]    [30, 30, 32]
MP-1      [2, 2, s = 2]     [15, 15, 32]
C*-3      [3, 3, 32, 64]    [15, 15, 64], Dropout (0.25)
C-4       [3, 3, 64, 64]    [13, 13, 64]
MP-2      [2, 2, s = 2]     [6, 6, 64], Dropout (0.25)
FL                          [2304]
FC-1      [2304, 512]       [512]
FC-2      [512, 10]         [10], Dropout (0.5)
SMAX                        [10]

TABLE III
FLOW OF THE MONOGENIC CNN AND OF CNN-1 FOR THE CIFAR-10 DATASET. CNN-1 HAS 1,250,858 TRAINABLE PARAMETERS.

Cats and Dogs
I     [224, 224, n3 = 3]
M6    [224, 224, n3 = 6]

x → x′    W                  x′
C*-1      [3, 3, n3, 32]     [224, 224, 32]
MP-1      [2, 2, s = 2]      [112, 112, 32]
C-2       [3, 3, 32, 32]     [110, 110, 32]
MP-2      [2, 2, s = 2]      [55, 55, 32]
C-3       [3, 3, 32, 64]     [53, 53, 64]
MP-3      [2, 2, s = 2]      [26, 26, 64]
FL                           [43264]
FC-1      [43264, 64]        [64], Dropout (0.5)
SMAX                         [2]

TABLE IV
FLOW OF THE MONOGENIC CNN AND OF CNN-2 FOR THE CATS AND DOGS DATASET. CNN-2 HAS 2,797,730 TRAINABLE PARAMETERS.

Initially, we tested our monogenic layer on top of two CNN architectures, CNN-1 and CNN-2, both with 9 hidden layers. The aim of these tests was to carry out a relatively fast search for adequate hyper-parameter values that guarantee a classification accuracy close to a baseline mark. Additionally, we have tested our layer on top of the well-known RESNET-20 v2 architecture (with 571,034 trainable parameters) [46], with 18 hidden layers, to ascertain that it also produces gains similar to those observed with the simpler architectures.

D. Monogenic Filter Bank

In practice, the local phase and orientation need bandpass filtering in order to define the local region of the signal [47], [16]. For all the computations of the filtered version of the image I we have used a radial (isotropic) bandpass Log-Gabor function in the frequency domain, G(u1, u2), defined as follows:

G(u1, u2) = exp( −[log(√(u1² + u2²) / ω0)]² / [2 log(σ)²] ), (16)

⁴C* is a convolution with zero padding.


Fig. 8. CNN-1 architecture with 1,250,858 trainable parameters, including the monogenic layer, 4 convolutional layers, 2 dropout layers, 2 max-pooling layers, 3 fully connected layers and one softmax function.

where u1, u2 are frequency components, ω0 is the central frequency of the filter, and σ is a bandwidth parameter (see [48] for more details). The filtering process is described in the following steps:

1) Compute the 2D Fourier transform F(J̄) of the mean value image J̄ as in [49], namely

F(J̄)[u1, u2] = ∑_{m1} ∑_{m2} J̄[m1, m2] e^{−i2π(u1 m1 + u2 m2)}. (17)

2) We have modified P. Kovesi’s Python implementation [50] in order to compute a monogenic filter bank based on G(u1, u2) at different scales. The bank uses the central frequencies ω0^s = 1 / (minwl · sf^(s−1)), where minwl is the minimum wavelength, sf is a scale factor, and s = 1, 2, . . . , ns is the current scale:

J̄s(u1, u2) = F(J̄)[u1, u2] exp( −[log(√(u1² + u2²) / ω0^s)]² / [2 log(σ)²] ). (18)

Figure 9 presents various views of the Log-Gabor function G(u1, u2).

Fig. 9. (a) Log-Gabor filters at different scales and bandwidths in the frequency domain.
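The construction of equations (16)-(18) amounts to the following filter-bank sketch (NumPy; the parameter names `minwl`, `sf`, `ns`, `sigma` follow the notation of step 2, the default values are placeholders of ours, and zeroing the DC term is our choice since the log is undefined at the origin):

```python
import numpy as np

def log_gabor_bank(shape, minwl=3.0, sf=1.1, ns=3, sigma=0.5):
    """Bank of radial log-Gabor filters G_s(u1, u2), eq. (16), with
    centre frequencies w0_s = 1 / (minwl * sf**(s - 1)) as in step 2."""
    u1 = np.fft.fftfreq(shape[0])[:, None]
    u2 = np.fft.fftfreq(shape[1])[None, :]
    radius = np.hypot(u1, u2)
    radius[0, 0] = 1.0                    # placeholder; DC is zeroed below
    bank = []
    for s in range(1, ns + 1):
        w0 = 1.0 / (minwl * sf ** (s - 1))
        G = np.exp(-np.log(radius / w0) ** 2 / (2.0 * np.log(sigma) ** 2))
        G[0, 0] = 0.0                     # no DC response (bandpass filter)
        bank.append(G)
    return bank

bank = log_gabor_bank((32, 32))
# Each filter is bounded by 1, attained where the radius equals w0_s.
assert len(bank) == 3 and all(G.max() <= 1.0 + 1e-12 for G in bank)
```

Multiplying F(J̄) by each filter of the bank, as in (18), yields the band-limited images on which the Riesz components are computed.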

In order to find the best parameters of the monogenic layer, we evaluated the validation accuracy of CNN-1 on the CIFAR-10 dataset up to 100 epochs, with scales s = [3, 4, 5], minimum wavelengths minwl = [3, 4, 5] (i.e. smallest-scale filter), scaling factors sf = [1.1, 1.2, . . . , 2.1], and standard deviations σ = [0.3, 0.4, 0.48, 0.6]. Finally, we tried the learning rate values lr = [0.0001, 0.001, 0.005]. Altogether this amounts to 1584 combinations. The outcome was that the best parameters are lr = 0.0001, s = 1, minwl = 3, σ = 0.25, sf = 1.1, for a maximum of test accuracy and minimum processing time.

E. Experiments

We trained and tested six nets: CNN-1, M6CNN-1, CNN-2, M6CNN-2, RESNET-20 v2, and M6-RESNET-20 v2 [46], following the scheme summarized in Table V, where di (i = 0, 1, 2, 3) denotes a degradation degree (see Figure 7 for an intuitive view of the significance of these values). The computational codes are available at https://github.com/asp1420/monogenic-cnn-illumination-contrast.

In order to test the generalization capacity of each trained model, the models were run not only on the original test set, but also on the three modified versions of it obtained by adding the same three levels of haze.

Trained    Tested
d0         d0, d1, d2, d3
d1         d0, d1, d2, d3
d2         d0, d1, d2, d3
d3         d0, d1, d2, d3

TABLE V
EXPERIMENTAL ARRANGEMENTS FOR EACH CNN.

All the experiments were carried out for 100 epochs, with a learning rate of lr = 0.0001 and no data augmentation, on the CTE-Power 9 cluster of the Barcelona Supercomputing Center with one Tesla V100 GPU.

V. RESULTS AND ANALYSIS

A synopsis of the experimental results for CNN-1 and CNN-2 is reported in Figure 10 for CIFAR-10 and in Figure 11 for Cats and Dogs. Similarly, Figure 12 summarizes the findings for RESNET-20 in the case of CIFAR-10.

The general idea of the experimental design has been to train the nets under four different degradation levels. These trained systems are represented on the horizontal axes and labeled by the degradation labels dj (j = 0, 1, 2, 3). Each dj is run on a set of images not seen before, also presented in four degradation levels dk (k = 0, 1, 2, 3). The observed accuracies are represented by colored circles in the case of the basic net and by colored squares in the case of the corresponding monogenic-enhanced net. Thus each of the three graphics quotes 32 accuracies, 16 for the basic net and 16 for the monogenic net.


The main finding is the resilience of each of the monogenic systems dj with respect to any of the degradations dk, for the squares above dj are clustered around 0.70 accuracy for all dk. This contrasts with the wide dispersion of the circles above dj, with a (to be expected) maximum when k = j and substantially lower values for k ≠ j. Note, however, that the basic nets dj perform slightly better just for the degradation dj, as shown by the top position of several of the corresponding circles (in Fig. 10, for instance, the blue circle for d0−d0, green for d1−d1, and magenta for d2−d2).

Fig. 10. CIFAR-10 test data with different degradations using CNN-1 models. See text for details.

Fig. 11. Dogs and Cats test data with different degradation models using CNN-2. See text for details.

VI. CONCLUSIONS

The context of this paper is the idea that geometric calculus has the potential to articulate novel and promising research in deep learning. A suitable name for this avenue is Geometric Deep Learning (GDL).

Fig. 12. CIFAR-10 test data with different degradation models using RESNET-20. See text for details.

In the explorations reported in this paper, we have used the quaternion calculus, which is the simplest geometric calculus beyond the complex calculus. More specifically, we have designed a bio-inspired front layer for CNNs that processes a monogenic signal by extracting phase and orientation signals and assembling them in an HSV space.

The experimental results with two different datasets and three CNNs confirm that the accuracy obtained with our layer is substantially more robust to severe illumination changes than that of the same nets without such a front layer.

We plan to continue the trail walked in this research by developing a front layer for CNNs that is resilient when faced with other transformations of the images, such as rotations or even small deformations.

APPENDIX A
QUATERNION ALGEBRA

The quaternion algebra H is a four-dimensional real vector space with basis 1, i, j, k,

H = R1 ⊕ Ri ⊕ Rj ⊕ Rk, (19)

endowed with the bilinear product (multiplication) defined by Hamilton's relations, namely

i² = j² = k² = ijk = −1. (20)

As is easily seen, these relations imply that

ij = −ji = k, jk = −kj = i, ki = −ik = j. (21)

The elements of H are called quaternions, and i, j, k quaternionic units. By definition, a quaternion q can be written in a unique way in the form

q = a + bi + cj + dk, a, b, c, d ∈ R. (22)

Its conjugate, q̄, is defined as

q̄ = a − (bi + cj + dk). (23)


Note that (q + q̄)/2 = a, which is called the real part or scalar part of q, and (q − q̄)/2 = q − a = bi + cj + dk the vector part of q.

Since the conjugates of i, j, k are −i, −j, −k, the relations (20) and (21) imply that conjugation is an antiautomorphism of H, which means that it is a linear automorphism such that the conjugate of qq′ equals q̄′q̄.

Using Hamilton's relations again, we easily conclude that

qq̄ = a² + b² + c² + d². (24)

This allows us to define the modulus of q, |q|, as the unique non-negative real number such that

|q|² = qq̄. (25)

Observe that |qq′| = |q||q′|. Indeed, |qq′|² = (qq′)(q̄′q̄) = q|q′|²q̄ = |q|²|q′|².

Finally, for q ≠ 0, |q| > 0 and q(q̄/|q|²) = 1, which shows that any non-zero quaternion has an inverse and therefore that H is a (skew) field.
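The identities of this appendix can be checked numerically with a minimal quaternion class. This is a sketch for illustration; dedicated libraries (e.g. numpy-quaternion) provide production implementations.

```python
class Quaternion:
    """q = a + b i + c j + d k with real components a, b, c, d."""

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

    def __mul__(self, o):
        # Hamilton product, derived from i^2 = j^2 = k^2 = ijk = -1 (Eq. 20).
        return Quaternion(
            self.a*o.a - self.b*o.b - self.c*o.c - self.d*o.d,
            self.a*o.b + self.b*o.a + self.c*o.d - self.d*o.c,
            self.a*o.c - self.b*o.d + self.c*o.a + self.d*o.b,
            self.a*o.d + self.b*o.c - self.c*o.b + self.d*o.a)

    def conj(self):
        # Eq. (23): negate the vector part.
        return Quaternion(self.a, -self.b, -self.c, -self.d)

    def norm2(self):
        # Eq. (24): q * conj(q) = a^2 + b^2 + c^2 + d^2 (a real number).
        return self.a**2 + self.b**2 + self.c**2 + self.d**2

    def inverse(self):
        # q^{-1} = conj(q) / |q|^2, valid for q != 0.
        n = self.norm2()
        return Quaternion(self.a/n, -self.b/n, -self.c/n, -self.d/n)

i = Quaternion(0, 1, 0, 0)
j = Quaternion(0, 0, 1, 0)
k = Quaternion(0, 0, 0, 1)
```

For instance, multiplying `i * j` reproduces relation (21), ij = k, and `q * q.inverse()` returns the unit quaternion, confirming that H is a skew field.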

APPENDIX B
THE ATMOSPHERIC SCATTERING MODEL

We have used the atmospheric scattering model to represent the illumination and contrast degradation of the images. The formation of a degraded image is modeled with the atmospheric scattering model proposed by McCartney et al. [45], defined as follows:

I(x, y) = J(x, y) t(x, y) + A (1 − t(x, y)), (26)

where I(x, y) is the measured image, J(x, y) is the original scene without affectations, A(r, g, b) is the illumination colour of the atmospheric light, and t(x, y) is the transmission map, which in a homogeneous atmosphere can be defined as:

t(x, y) = e^{−βd(x,y)}, (27)

where β is the scattering coefficient of the atmosphere and d(x, y) is the scene depth. An example of the estimated transmission map is presented in Figure 13.

Fig. 13. Transmission map estimation. (a) Transmission maps over 100 images from CIFAR-10; (b) transmission maps over 100 images from the Dogs and Cats dataset.
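Equations (26) and (27) can be applied directly to synthesize hazy images. The constant depth map and the value of β below are illustrative assumptions, not the settings used to degrade the datasets:

```python
import numpy as np

def add_haze(J, depth, beta=1.0, A=1.0):
    """Degrade a clean image J via I = J*t + A*(1 - t), Eq. (26)."""
    t = np.exp(-beta * depth)             # Eq. (27), homogeneous atmosphere
    if J.ndim == 3:                       # broadcast t over colour channels
        t = t[..., None]
    return J * t + A * (1.0 - t)

rng = np.random.default_rng(0)
J = rng.random((32, 32, 3))               # a clean CIFAR-sized image in [0, 1]
depth = np.ones((32, 32))                 # constant depth => uniform haze
I = add_haze(J, depth, beta=0.8)
```

Increasing β (or the depth) drives t toward 0, so the output converges to the atmospheric light A, which is exactly the contrast loss the degradation levels d1, d2, d3 emulate.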

ACKNOWLEDGMENT

The authors would like to thank CONACYT and the Barcelona Supercomputing Center. Sebastián Salazar-Colores (CVU 477758) would like to thank CONACYT (Consejo Nacional de Ciencia y Tecnología) for the financial support of his PhD studies under Scholarship 285651.

REFERENCES

[1] T. Cohen and M. Welling, “Group equivariant convolutional networks,” in International Conference on Machine Learning, pp. 2990–2999, 2016.

[2] T. Cohen, M. Geiger, J. Köhler, and M. Welling, “Convolutional networks for spherical signals,” arXiv preprint arXiv:1709.04893, 2017.

[3] S. Mallat, “Understanding deep convolutional networks,” Phil. Trans. R. Soc. A, vol. 374, no. 2065, p. 20150203, 2016.

[4] J. Bruna and S. Mallat, “Invariant scattering convolution networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1872–1886, 2013.

[5] S. Dieleman, J. De Fauw, and K. Kavukcuoglu, “Exploiting cyclic symmetry in convolutional neural networks,” arXiv preprint arXiv:1602.02660, 2016.

[6] F. Anselmi, J. Z. Leibo, L. Rosasco, J. Mutch, A. Tacchetti, and T. Poggio, “Unsupervised learning of invariant representations,” Theoretical Computer Science, vol. 633, pp. 112–121, 2016.

[7] P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” p. 958, IEEE, 2003.

[8] K. Lenc and A. Vedaldi, “Understanding image representations by measuring their equivariance and equivalence,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 991–999, 2015.

[9] M. Weinstein, P. Breiding, B. Sturmfels, and S. Kalisnik Verovsek, “Learning algebraic varieties from samples,” Revista Matemática Complutense, no. 31, pp. 545–593, 2018.

[10] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric deep learning: going beyond Euclidean data,” IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.

[11] X. Wang, X. Jin, G. Xu, and X. Xu, “A multi-scale decomposition based haze removal algorithm,” in International Conference on Remote Sensing, Environment and Transportation Engineering (RSETE), vol. 2, (Nanjing, China), pp. 1–4, Jun. 2012.

[12] L. Sifre and S. Mallat, “Rotation, scaling and deformation invariant scattering for texture discrimination,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1233–1240, 2013.

[13] D. Hestenes and G. Sobczyk, Clifford algebra to geometric calculus: a unified language for mathematics and physics, vol. 5. Springer Science & Business Media, 2012.

[14] S. Xambó-Descamps, Real spinorial groups—a short mathematical introduction. SBMA/Springerbrief, Springer, 2018.

[15] C. Lavor, S. Xambó-Descamps, and I. Zaplana, A Geometric Algebra Invitation to Space-Time Physics, Robotics and Molecular Geometry. SBMA/Springerbrief, Springer, 2018.

[16] M. Felsberg, Low-level image processing with the structure multivector, vol. 203. Inst. für Informatik und Praktische Mathematik, 2002.

[17] M. Mitrea, Clifford wavelets, singular integrals, and Hardy spaces. Springer, 2006.

[18] W. L. Chan, H. Choi, and R. Baraniuk, “Quaternion wavelets for image analysis and processing,” in Image Processing, 2004. ICIP’04. 2004 International Conference on, vol. 5, pp. 3057–3060, IEEE, 2004.

[19] E. Hitzer and S. J. Sangwine, Quaternion and Clifford Fourier transforms and wavelets. Springer Science & Business Media, 2013.

[20] M. Felsberg and G. Sommer, “The monogenic signal,” IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 3136–3144, 2001.

[21] E. U. Moya-Sánchez and E. Bayro-Corrochano, “Quaternion atomic function wavelet for applications in image processing,” in Iberoamerican Congress on Pattern Recognition, pp. 346–353, Springer, 2010.

[22] E. Bayro-Corrochano, E. Vázquez-Santacruz, E. Moya-Sánchez, and E. Castillo-Munis, “Geometric bioinspired networks for recognition of 2-D and 3-D low-level structures and transformations,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 10, pp. 2020–2034, 2016.


[23] T. Isokawa, N. Matsui, and H. Nishimura, “Quaternionic neural networks: Fundamental properties and applications,” in Complex-valued neural networks: utilizing high-dimensional parameters, pp. 411–439, IGI Global, 2009.

[24] S. Buchholz and G. Sommer, “Quaternionic spinor MLP,” 2000.

[25] Y. Kominami, H. Ogawa, and K. Murase, “Convolutional neural networks with multi-valued neurons,” in Neural Networks (IJCNN), 2017 International Joint Conference on, pp. 2673–2678, IEEE, 2017.

[26] T. Parcollet, Y. Zhang, M. Morchid, C. Trabelsi, G. Linares, R. De Mori, and Y. Bengio, “Quaternion convolutional neural networks for end-to-end automatic speech recognition,” arXiv preprint arXiv:1806.07789, 2018.

[27] X. Zhu, Y. Xu, H. Xu, and C. Chen, “Quaternion convolutional neural networks,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 631–647, 2018.

[28] C. Gaudet and A. Maida, “Deep quaternion networks,” arXiv preprint arXiv:1712.04604, 2017.

[29] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” tech. rep., Citeseer, 2009.

[30] J. Elson, J. J. Douceur, J. Howell, and J. Saul, “Asirra: a captcha that exploits interest-aligned manual image categorization,” 2007.

[31] D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex,” The Journal of Physiology, vol. 160, no. 1, pp. 106–154, 1962.

[32] G. Hinton, A. Krizhevsky, N. Jaitly, T. Tieleman, and Y. Tang, “Does the brain do inverse graphics?,” in Brain and Cognitive Sciences Fall Colloquium, vol. 2, 2012.

[33] Santiago Ramón y Cajal, The structure of the retina. Charles C. Thomas Publisher, 1972.

[34] E. T. Rolls and S. M. Stringer, “Invariant visual object recognition: a model, with lighting invariance,” Journal of Physiology-Paris, vol. 100, no. 1-3, pp. 43–62, 2006.

[35] J. Bigun, Vision with direction. Springer, 2006.

[36] T. Bonhoeffer and A. Grinvald, “Iso-orientation domains in cat visual cortex are arranged in pinwheel-like patterns,” Nature, vol. 353, no. 6343, p. 429, 1991.

[37] G. H. Granlund and H. Knutsson, Signal processing for computer vision. Springer Science & Business Media, 2013.

[38] L. Spillmann, B. Dresp-Langley, and C.-h. Tseng, “Beyond the classical receptive field: the effect of contextual stimuli,” Journal of Vision, vol. 15, no. 9, pp. 7–7, 2015.

[39] Z. Wang and E. P. Simoncelli, “Local phase coherence and the perception of blur,” in Advances in Neural Information Processing Systems, pp. 1435–1442, 2004.

[40] V. Sierra-Vazquez and I. Serrano-Pedraza, “Application of Riesz transforms to the isotropic AM-PM decomposition of geometrical-optical illusion images,” J. Opt. Soc. Am. A, vol. 27, pp. 781–796, Apr. 2010.

[41] E. U. Moya-Sánchez and E. Vázquez-Santacruz, “A geometric bio-inspired model for recognition of low-level structures,” in International Conference on Artificial Neural Networks, pp. 429–436, Springer, 2011.

[42] A. Tewari, “Image blending using local phase,” 2015.

[43] N. Wadhwa, M. Rubinstein, F. Durand, and W. T. Freeman, “Riesz pyramids for fast phase-based video magnification,” in Computational Photography (ICCP), 2014 IEEE International Conference on, pp. 1–10, IEEE, 2014.

[44] M. K. Agoston, Computer graphics and geometric modeling, vol. 1. Springer, 2005.

[45] E. J. McCartney and F. F. Hall, “Optics of the atmosphere: scattering by molecules and particles,” Physics Today, vol. 30, no. 5, pp. 76–77, 1977.

[46] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European Conference on Computer Vision, pp. 630–645, Springer, 2016.

[47] M. Unser, D. Sage, and D. Van De Ville, “Multiresolution monogenic signal analysis using the Riesz–Laplace wavelet transform,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2402–2418, 2009.

[48] M. Felsberg and G. Sommer, “A new extension of linear signal processing for estimating local properties and detecting features,” in Mustererkennung 2000, pp. 195–202, Springer, 2000.

[49] R. C. González, R. E. Woods, and S. L. Eddins, Digital image processing using MATLAB (2nd edition). MATLAB examples, Tata McGraw-Hill, 2010.

[50] P. D. Kovesi, “MATLAB and Octave functions for computer vision and image processing,” 2018.

E. Ulises Moya-Sánchez was a postdoc of the High-Performance Artificial Intelligence group at Barcelona Supercomputing Center and is a researcher at Universidad Autónoma de Guadalajara. He received his PhD degree in 2014 from CINVESTAV Unidad Guadalajara. He is a member of Sistema Nacional de Investigadores (SNI-C), CONACyT, Mexico. He is the Artificial Intelligence director of the Jalisco Government.

Sebastià Xambó-Descamps is an Emeritus Full Professor at the Department of Mathematics of the Universitat Politècnica de Catalunya (UPC). He holds a Ph.D. in Mathematics from the University of Barcelona and an M.Sc. degree in Mathematics from Brandeis University, USA. He has been Full Professor at the Department of Algebra of the Universidad Complutense of Madrid, President of the Catalan Mathematical Society, Dean of the Faculty of Mathematics and Statistics of the UPC, and President of the Spanish Conference of Deans of Mathematics. He has led various R+D+I projects, including the development of the Wiris mathematical platform. He authored Block Error-Correcting Codes: A Computational Primer and Real Spinorial Groups, co-authored An Invitation to Geometric Algebra through Spacetime Physics, Robotics and Molecular Geometry, and co-edited Cosmology, Quantum Vacuum and Zeta Functions.

Sebastián Salazar-Colores received his B.S. degree in Computer Science from Universidad Autónoma Benito Juárez de Oaxaca and his M.S. degree in Electrical Engineering at Universidad de Guanajuato. He is a PhD candidate in Computer Science at the Universidad Autónoma de Querétaro. His research interests are image processing and computer vision.

Abraham Sánchez Pérez is an artificial intelligence analyst at the Jalisco government. He received his M.S. degree in Computer Sciences at Universidad Autónoma de Guadalajara.

Ulises Cortés is a Full Professor and Researcher at the Universitat Politècnica de Catalunya (UPC). He is the scientific coordinator of the High-Performance Artificial Intelligence group at Barcelona Supercomputing Center (BSC). He is a member of Sistema Nacional de Investigadores (SNI-III), CONACyT, Mexico.