

Pattern Recognition 63 (2017) 56–70


Eye tracking data guided feature selection for image classification

Xuan Zhou a,1, Xin Gao b,1, Jiajun Wang a,*, Hui Yu a, Zhiyong Wang c, Zheru Chi d,e

a School of Electronic and Information Engineering, Soochow University, Suzhou 215006, PR China
b Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou 215163, PR China
c School of Information Technologies, The University of Sydney, NSW 2006, Australia
d Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
e PolyU Shenzhen Research Institute, Shenzhen, PR China

Article info

Article history:
Received 20 September 2014
Received in revised form 9 September 2016
Accepted 10 September 2016
Available online 12 September 2016

Keywords: Eye tracking; Feature selection; Quantum genetic algorithm (QGA); mRMR; SVM-RFE

http://dx.doi.org/10.1016/j.patcog.2016.09.007
0031-3203/© 2016 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail address: [email protected] (J. Wang).
1 Xuan Zhou and Xin Gao contributed equally to this study.
2 Feature selection is often referred to as gene selection in the field of bioinformatics.

Abstract

Feature selection has played a critical role in image classification, since it is able to remove irrelevant and redundant features and thereby reduce the dimensionality of the feature space. Although existing feature selection methods have made promising progress, human factors have seldom been taken into account. To tackle this problem, a novel two-stage feature selection method is proposed for image classification that takes human factors into account by leveraging eye tracking data. In the coarse selection stage, with the help of eye tracking data, Regions of Interest (ROIs) from the human perspective are first identified to represent an image with visual features. Then, with an improved quantum genetic algorithm (IQGA) that incorporates a novel mutation strategy for alleviating premature convergence, a subset of features is obtained for the subsequent fine selection. In the fine selection stage, a hybrid method is proposed to integrate the efficiency of the minimal-Redundancy-Maximal-Relevance (mRMR) criterion and the effectiveness of Support Vector Machine based Recursive Feature Elimination (SVM-RFE). In particular, the ranking criterion of the SVM-RFE is improved by incorporating the ranking information obtained from the mRMR. Comprehensive experimental results on two benchmark datasets demonstrate that eye tracking data are of great importance for improving the performance of feature selection for image classification.

© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Feature selection has been one of the key components of many pattern recognition systems, such as image classification [1,2] and cancer classification systems2 [3], as more and more diverse information becomes available to characterize the entities, such as images and objects, to be classified. Since more features do not always lead to better classification performance, feature selection aims to identify a set of relevant and necessary features and to reduce the dimensionality of the feature space for improving classification performance [4]. It also reduces storage and computational costs.

There are two types of feature selection methods: filters and wrappers [5,6]. Filter type methods utilize the general characteristics of the feature data and select the top-ranked features according to a criterion. In general, they aim to maximize the relevancy of a set of features while minimizing the redundancy among features [3,7–9]. For example, in [8], with a minimal-Redundancy-Maximal-Relevance (mRMR) criterion defined based on mutual information, near-optimal features were identified with an incremental search method. However, the mRMR method does not allow a flexible trade-off between the relevancy and redundancy of features, although a suitable trade-off could be useful for improving classification performance [2]. Greedy algorithms and simulated annealing algorithms [3,10] are typical examples that take this trade-off into account and attempt to determine the optimal trade-off between the relevancy and the redundancy of a set of genes. The most important merit of filter methods lies in their efficiency. However, since no feedback process is involved, it is difficult to ensure improved classification performance when the resulting feature subset is used for model training and learning.
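The incremental mRMR search described above can be sketched as follows. This is a minimal illustration (not the authors' code), using scikit-learn's mutual-information estimators and the difference form of the criterion; names such as `mrmr_select` are illustrative:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k, random_state=0):
    """Greedy mRMR (difference form): repeatedly add the feature that
    maximizes relevance(f; y) minus its mean redundancy with the
    already-selected features, both estimated via mutual information."""
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]          # seed with the most relevant feature
    remaining = set(range(X.shape[1])) - set(selected)
    while len(selected) < k and remaining:
        scores = {}
        for f in remaining:
            # mean MI between candidate f and each already-selected feature
            red = np.mean([mutual_info_regression(X[:, [s]], X[:, f],
                                                  random_state=random_state)[0]
                           for s in selected])
            scores[f] = relevance[f] - red          # mRMR criterion
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the redundancy term penalizes features that duplicate already-selected ones, a near-copy of a selected feature is skipped in favor of a less redundant candidate.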

Wrapper type methods are usually able to achieve higher classification accuracy than filter type ones, since the characteristics of classifiers are taken into account in the feature selection process. Among the wrapper type methods, the support vector machine (SVM) is one of the most widely used classifiers.


For example, the SVM recursive feature elimination (SVM-RFE) method [11] has attracted more and more attention. In this method, the weights of the trained SVM classifier are used as the ranking measures of genes, and the genes with the poorest measures are removed. Different from the SVM-RFE [11], which recursively eliminates individual genes, an SVM recursive cluster elimination (SVM-RCE) method [12] was proposed to remove gene clusters according to a score defined in terms of the SVM accuracy for each gene cluster. Duan et al. proposed a multiple SVM-RFE (MSVM-RFE) method [13] where the SVM was trained on multiple subsets of the training data and genes were ranked through a statistical analysis of the gene weights across multiple runs. In [14], Wahde and Szallasi reviewed evolutionary-algorithm-based wrapper methods where gene selection was achieved with genetic-operation-based optimization. In general, wrapper type methods outperform filter type ones, while suffering from high computational cost and limited robustness.
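The elimination loop of SVM-RFE can be sketched as below: a minimal illustration assuming a linear SVM and the standard squared-weight ranking criterion from [11]; the helper name `svm_rfe_ranking` is illustrative, not the authors' implementation:

```python
import numpy as np
from sklearn.svm import SVC

def svm_rfe_ranking(X, y, C=1.0):
    """SVM-RFE: repeatedly train a linear SVM and eliminate the feature
    with the smallest squared weight w_i^2 (the RFE ranking criterion)."""
    active = list(range(X.shape[1]))
    eliminated = []                                  # least important first
    while active:
        clf = SVC(kernel="linear", C=C).fit(X[:, active], y)
        w = clf.coef_.ravel()                        # weights of the trained SVM
        eliminated.append(active.pop(int(np.argmin(w ** 2))))
    return eliminated[::-1]                          # most important first
```

The feature that survives longest in the loop receives the top rank, since each round discards the feature contributing least to the decision boundary.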

Recently, some researchers have also proposed to combine filter type and wrapper type feature selection methods to achieve both efficiency and effectiveness. For example, Mundra et al. proposed to minimize the redundancy among selected genes by incorporating the mutual-information-based mRMR method into the SVM-RFE method [15]. The drawback of such single-filter-single-wrapper (SFSW) approaches is that their classification accuracy depends on the choice of the specific filter and wrapper. In [16], Leung et al. proposed a multiple-filter-multiple-wrapper (MFMW) approach that used multiple filters and multiple wrappers to identify promising biomarker genes for improving classification accuracy and robustness. Their experimental results demonstrated that the MFMW approach outperforms SFSW methods in all cases generated by all combinations of the filters and wrappers used in the MFMW approach.

Similarly, there have been many studies on feature selection specifically for image classification [17–19]. Although much promising progress has been achieved, little attention has been paid to human factors in the feature selection process. On the contrary, most existing feature selection methods aim to mathematically identify a subset from a given set of low-level visual features such as color and texture [20,21]. Considering that human beings are very good at interpreting visual information for various tasks such as object recognition and image classification, it is particularly valuable to leverage the cognitive process of human beings.

In order to explore the mechanisms by which human eyes process visual information, we propose to use eye tracking technology. Since the 1960s, this technology has been used as an objective tool for the analysis of visual attention and perception by determining fixations. In [22], eye tracking experiments were carried out to explore the relationship between gaze behavior and a visual attention model that identifies Regions of Interest (ROIs) in images. The results demonstrated that eye gaze behavior on images with clear ROIs differs from that on images without clear ROIs. In [23], the feasibility of using an eye tracker as an image retrieval interface was explored, which showed that eye tracking data could be used to retrieve target images in fewer steps than random selection. In [24], fixation data were used to locate human-focused ROIs (hROIs) for image matching, while the relative gaze duration of each hROI was used to weight the similarity measure for image retrieval. This method outperforms conventional content-based image retrieval methods, especially when important regions in images are difficult to locate based on visual features. Therefore, it is anticipated that eye tracking data will help identify important features for image classification by leveraging the cognitive process of human beings.

Based on the above observations, in this paper we propose a two-stage feature selection method. In the coarse selection stage, an eye tracking device is employed to acquire eye tracking data for identifying hROIs as sample images. With the hROIs identified, an improved quantum genetic algorithm (IQGA) is proposed to select an initial subset of features by utilizing the QGA's efficient search capability and its effectiveness for optimization problems of a combinatorial nature. In order to alleviate the premature convergence problem of the traditional QGA [25], an adaptive mutation strategy is employed. In the fine selection stage, we propose a hybrid method that takes the complementary advantages of both filter type and wrapper type feature selection methods. This method operates on the components of the coarsely selected features by combining the filter type mRMR method and the wrapper type SVM-RFE method sequentially, and is called mRMR-SVM-RFE. The former is used to select a near-optimal subset of feature components efficiently, while the latter is used to select more effective feature components from the subset obtained in the former step.

In summary, the key contributions of our work are:

1. Different from most existing feature selection methods, which aim to improve feature selection by devising mathematically sound algorithms, our method is one of the first studies to take human factors into account in the feature selection process for image classification by using eye tracking data. In addition, instead of utilizing only several visual features, we investigate 75 visual features. To the best of our knowledge, this is the largest number of visual features studied in the image classification literature.

2. We propose a two-stage feature selection method. In the coarse selection stage, a subset of visual features is identified with the help of eye tracking data and the quantum genetic algorithm (QGA). We also propose an improved mutation strategy to alleviate the premature convergence issue of the QGA.

3. We propose a hybrid method, namely mRMR-SVM-RFE, for the fine feature selection stage, which has both the efficiency merit of the mRMR and the effectiveness merit of the SVM-RFE. We devise an improved SVM-RFE method by integrating the ranking information of individual feature components obtained from the mRMR method into the ranking criterion of the original SVM-RFE method. Therefore, our proposed mRMR-SVM-RFE method performs better than the one proposed in [15], where the ratio between the relevancy and redundancy in mRMR was directly used to devise the ranking criterion in SVM-RFE.

The rest of this paper is organized as follows. In Section 2, we briefly introduce the 75 visual features used in our study. In Section 3, we introduce the three aspects of our feature selection method: the acquisition of eye tracking data, the QGA-based coarse feature selection, and the mRMR-SVM-RFE-based fine feature selection. In Section 4, we present and discuss our experimental results on image classification over two benchmark datasets. In Section 5, we conclude our work together with discussions of future work.

2. Feature extraction

Low-level visual features such as color, texture and shape are fundamental for characterizing visual content such as images [26–29]. In this paper, we investigate 75 descriptors of these three types of visual features, which have been widely used for visual content representation in various tasks such as image classification and image retrieval.

Color features include the color histogram [30,31], the dominant color [32], color moments [33], the color set [34], the color structure descriptor [32], the color layout [32], and the scalable color descriptor [35]. In combination with different color spaces and quantization methods, 54 color descriptors in total (indexed from 1 to 54) are extracted for each image, as shown in Table 1.

Table 1. Summary of color features.

Feature index | Feature name                                 | Dimensions
1–10          | General histogram (in 10 color spaces)       | 48
11–20         | Accumulated histogram (in 10 color spaces)   | 48
21            | HSV histogram (8:3:3 uniform quantization)   | 72
22            | HSV histogram (16:4:4 uniform quantization)  | 256
23            | HSV histogram (non-uniform quantization)     | 35
24–36         | Dominant colors (in 13 color spaces)         | 16
37–49         | Color moments (in 13 color spaces)           | 9
50            | HSV color set (8:3:3 uniform quantization)   | 8
51            | Color structure (in RGB space)               | 256
52            | Color layout (in YCbCr space)                | 12
53            | Scalable color (in 16:4:4 HSV space)         | 66
54            | Scalable color (in 8:3:3 HSV space)          | 20

For the color histogram features, the order for all the 13 color spaces is HSL (Hue, Saturation, Lightness), HSV (Hue, Saturation, Value), JPEG/YCbCr, Lab, Lch (Lightness, Chroma, Hue), Luv, RGB, XYZ, YCbCr, YDbDr, YIQ, YPbPr and YUV. The general histogram and the accumulated histogram are derived in color spaces other than HSL, HSV and Lch.

Table 3. Summary of shape features.

Feature index | Feature name               | Dimensions
66            | Area                       | 1
67            | Euler number               | 1
68            | Horizontal area projection | 127
69            | Vertical area projection   | 127
70            | Eccentricity               | 1
71            | Principal direction        | 1
72            | Geometric invariant moments | 7
73            | Legendre moments           | 25
74            | Zernike moments            | 10
75            | Pseudo-Zernike moments     | 10

In total, 11 texture descriptors (indexed from 55 to 65) [27], as shown in Table 2, are studied in this work: the three texture descriptors recommended in MPEG-7, namely the local edge histogram descriptor (EHD) [36], the texture browsing descriptor (TBD) and the homogeneous texture descriptor (HTD) [37]; the Tamura texture [38]; the Gabor feature [39]; the primitive length descriptor; the auto-correlation descriptor [40]; the edge frequency descriptor [41]; the Fourier descriptor; and the co-occurrence matrix descriptor [42].

In this work, 10 widely used shape features (indexed from 66 to 75) [29], as shown in Table 3, are used: the area, the Euler number, area projections (horizontal and vertical), the eccentricity, the principal direction, geometric invariant moments, Legendre moments, Zernike moments [43], and pseudo-Zernike moments [44].

For convenience of the following discussions, each feature is assigned a unique index. That is, all features in the feature set are indexed by $I_0 = \{m \mid m = 1, 2, \ldots, M\}$, where M is the number of image features and equals 75 in this paper. Note that other visual features can also be utilized and integrated in our feature selection framework.

3. Feature selection

Our proposed feature selection method consists of two stages: a coarse selection stage and a fine selection stage. In the coarse selection stage, guided by eye tracking data, an improved quantum genetic algorithm (IQGA) is proposed to select a subset of visual features from the total of 75 features so as to achieve the best classification performance on the dataset from which the eye tracking data were obtained. In the fine selection stage, a hybrid feature selection method, mRMR-SVM-RFE, is proposed to take the advantages of both the mRMR and the SVM-RFE.

Table 2. Summary of texture features.

Feature index | Feature name               | Dimensions
55            | EHD                        | 8
56            | TBD                        | 5
57            | HTD                        | 62
58            | Tamura texture descriptor  | 3
59            | Gabor texture descriptor   | 24
60            | Primitive length descriptor | 5
61            | Autocorrelation descriptor | 8
62            | Edge frequency descriptor  | 385
63            | Fourier texture (Energy)   | 256
64            | Fourier texture (Amplitude) | 256
65            | Co-occurrence matrix       | 8

3.1. Coarse selection with eye tracking data and IQGA

The coarse selection stage consists of three components: the eye tracking data acquisition, the hROI identification, and the coarse feature selection with the IQGA.

3.1.1. Eye tracking data acquisition

Eye tracking data were acquired from a purposely prepared image dataset, namely Dataset1. This dataset comprises two categories of images: one with distinct objects and the other without distinct objects. Each category contains 50 images, which were randomly picked from the 7346-image Hemera color image database [45]. Each image in this dataset has a resolution of around 1545 × 1024 pixels, which covers most of a screen with a resolution of 1920 × 1280 pixels. Such a setting guarantees that almost all eye tracking data are located inside the image area displayed on the screen.

A non-intrusive table-mounted eye tracker, Tobii X120, was



Fig. 1. Illustration of eye tracking data and hROIs, where blue asterisks indicate raw gaze samples and green squares indicate fixation points and hROIs. (a) Eye tracking data and hROIs for object-distinctive images. (b) Eye tracking data and hROIs for non-object-distinctive images. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)


used in a user-friendly environment to obtain eye tracking data for the images in Dataset1. A high accuracy of 0.5° (with 0.3° drift) can be achieved with this eye tracker. The experiment was conducted at a sampling rate of 120 Hz. The freedom of head movement was 30 × 22 × 30 cm [24], and the highest supported head movement speed is 35 cm/s. A calibration was carried out with a grid of nine calibration points to minimize errors in the eye tracking data.

In the experiment, one participant was invited to view each of the 100 images for 5 s under a free-view condition. The participant was a proficient computer user with normal vision and new to eye tracking devices. The participant sat at a viewing distance of about 68 cm in front of the computer screen, and the corresponding subtended visual angle was about 41.5° × 26.8°.

In total, nearly 600 samples of raw gaze data were collected for each image with the Tobii X120 eye tracker. As shown in Fig. 1, four sample images are overlaid with raw gaze samples (marked with blue asterisks) to illustrate the eye tracking patterns of object-distinctive and non-object-distinctive images. It is clearly observed that the eye gaze data of object-distinctive images often concentrate in the regions of objects while those of non-object-distinctive images scatter broadly, which indicates that not all image content is necessary for human perception. As a result, visual features extracted from eye gaze regions would be more helpful for image classification.

3.1.2. hROI identification

In order to identify hROIs from the raw gaze data, gaze samples are first clustered to form fixation points. The clustering was performed with the Tobii X120 Studio software, where a fixation radius of 35 pixels and a minimum fixation duration of 100 ms are set to extract fixation data from the raw gaze data. Next, square regions around individual fixation points are identified as hROIs. To guarantee a proper coverage of visual content in an hROI, a suitable region size should be chosen. On one hand, larger regions tend to include unnecessary or even noisy information, which may compromise the performance of image classification. On the other hand, smaller regions tend to miss necessary information which could be discriminative for image classification. In this work, a size of 127 × 127 pixels is chosen for hROIs after different sizes were tried. As shown in Fig. 1, fixation points and the corresponding hROIs are marked with green squares, while the red dotted lines connect temporally adjacent fixation points. Note that the number of fixation points varies from image to image. Fig. 2 gives the enlarged versions of the hROIs shown in Fig. 1 for the object-distinctive and non-object-distinctive cases, respectively.
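The cropping step can be sketched as follows with a hypothetical helper (`extract_hrois`) that clamps each 127 × 127 window to the image boundary; the boundary handling is an assumption, since the paper does not specify it:

```python
import numpy as np

def extract_hrois(image, fixations, size=127):
    """Crop a size-by-size square hROI around each fixation point,
    clamping the window so it never falls outside the image."""
    half = size // 2
    h, w = image.shape[:2]
    rois = []
    for x, y in fixations:                     # fixation in (col, row) pixel coords
        left = int(np.clip(x - half, 0, w - size))
        top = int(np.clip(y - half, 0, h - size))
        rois.append(image[top:top + size, left:left + size])
    return rois
```

Clamping (rather than padding) keeps every hROI filled with real image content, at the cost of slightly off-center windows near the borders.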

After identifying hROIs for the images in Dataset1, all 75 features listed in Tables 1–3 are extracted for each hROI. Therefore, a concatenated feature vector x with M features can be obtained to represent each hROI,

$$\mathbf{x} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \ldots, \mathbf{x}_M^T]^T, \qquad (1)$$

where $\mathbf{x}_m$, $m = 1, 2, \ldots, M$, is a column vector denoting the m-th feature and T denotes transposition. In our case, since 15–20 fixation points can be derived from each image in Dataset1, we select 15 hROIs from each image as samples for the following coarse selection procedure. As a result, the 100 images in Dataset1 produce 1500 such samples. Upon extracting features for all L = 1500 hROI samples as in Eq. (1), a pool $P_x = \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(L)}\}$ of feature vectors can be obtained.

3.1.3. Coarse feature selection with the IQGA

Our coarse feature selection aims to best classify the L hROIs represented as $P_x$ into their corresponding class labels $P_{lb} = \{y_1, y_2, \ldots, y_L\}$ (either the object-distinctive class or the non-object-distinctive class) with a subset of the M = 75 visual features. In this section, we introduce our proposed IQGA algorithm from five aspects: the encoding strategy, the observation operator, the fitness function, the improved rotation gate for the mutation, and the quantum crossover. Note that in this stage, the selection is


Fig. 2. Illustration of the enlarged version of hROIs as shown in Fig. 1. (a) hROIs for object-distinctive images. (b) hROIs for non-object-distinctive images.


performed on individual visual features, instead of individual components of each visual feature.

(1) Encoding strategy for feature selection: Rather than encoding a feature as a binary value (0 or 1) as in a traditional genetic algorithm (GA), the quantum genetic algorithm (QGA) encodes the selection status of a feature with a Q-bit, which gives the probability of a feature being selected [25,46]. In this way, the selection task for M features can be encoded as a vector of M Q-bits (i.e., a chromosome with M genes),

$$\mathbf{q} = (q_1, q_2, \ldots, q_M) = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_M \\ \beta_1 & \beta_2 & \cdots & \beta_M \end{pmatrix}, \qquad (2)$$

where $q_m = (\alpha_m, \beta_m)^T$ ($m = 1, 2, \ldots, M$) represents the Q-bit (selection status) corresponding to the m-th feature. Here, both $\alpha_m$ and $\beta_m$ are continuous-valued variables in $[0, 1]$, with $|\alpha_m|^2$ being the probability of the m-th feature not being selected and $|\beta_m|^2 = 1 - |\alpha_m|^2$ the probability of it being selected. With such a coding strategy, the quantum genetic algorithm searches the sample space, through mutation and crossover operations, for the chromosome that is optimal in terms of the fitness function. Since all $\alpha$ and $\beta$ in Eq. (2) take continuous values, the population of the quantum chromosome sample space is much larger than that of the traditional binary-valued chromosome sample space. Therefore, the QGA has a greater probability of finding a globally optimal chromosome in such a space.
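A common QGA initialization consistent with this encoding sets every Q-bit to $\alpha = \beta = 1/\sqrt{2}$, so that each feature is initially selected with probability 0.5; a minimal sketch, where the population size and the helper name `init_population` are illustrative:

```python
import numpy as np

def init_population(pop_size, n_features):
    """One chromosome per row; every Q-bit starts at alpha = beta = 1/sqrt(2),
    i.e. each feature is selected with probability 0.5 on observation."""
    alpha = np.full((pop_size, n_features), 1.0 / np.sqrt(2.0))
    beta = np.full((pop_size, n_features), 1.0 / np.sqrt(2.0))
    return alpha, beta

alpha, beta = init_population(pop_size=4, n_features=75)
```

Every Q-bit satisfies the unit-norm constraint $|\alpha_m|^2 + |\beta_m|^2 = 1$ by construction.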

(2) Observation operator: In order to complete the feature selection task, the quantum chromosome must be converted to conventional binary bits which indicate whether the corresponding features are selected. In the QGA, such a conversion is usually referred to as the observation procedure. In this paper, the observation procedure can be mathematically formulated as:

$$o_m = \begin{cases} 0, & \text{if } |\alpha_m|^2 > r \\ 1, & \text{otherwise} \end{cases} \qquad m = 1, 2, \ldots, M, \qquad (3)$$

where r is a random number uniformly distributed in $[0, 1]$; $o_m = 1$ implies that the m-th feature is selected, and otherwise the m-th feature is discarded.

(3) Fitness function: The fitness function is used to evaluate the effectiveness of a chromosome sample. In our case, the purpose of feature selection is to select an optimal subset of features which achieves the best classification accuracy. Hence, the fitness function fit(o) is defined as the classification accuracy P achieved with the subset of features selected using the chromosome-guided selection scheme o,

$$\mathrm{fit}(\mathbf{o}) \triangleq P(\mathbf{o}) = \frac{N_{\mathrm{correct}}(\mathbf{o})}{N_{\mathrm{total}}}, \qquad (4)$$

where $N_{\mathrm{correct}}(\mathbf{o})$ and $N_{\mathrm{total}}$ are the number of correctly classified hROIs under the selection scheme o and the total number of hROIs in the test subset of Dataset1, respectively.

In the coarse selection stage, the following support vector machine (SVM) classifier is employed [47] for evaluating different feature selection schemes:

$$f(\mathbf{x}_o^{(l)}) = \mathbf{w} \cdot \Phi(\mathbf{x}_o^{(l)}) + b = \sum_j \alpha_j y_j K(\mathbf{x}_o^{(j)}, \mathbf{x}_o^{(l)}) + b, \qquad (5)$$

where $\mathbf{x}_o^{(l)}$ is the vector of selected features under the selection scheme o for the l-th hROI sample to be classified, $\mathbf{x}_o^{(j)}$ is the vector of selected features of the j-th training sample, $y_j$ denotes the


class label of the j-th sample, α_j is the Lagrange parameter which can be obtained from the training sample set, K(·,·) is a kernel function, and b is a classification threshold. In our case, the radial basis function (RBF) is selected as the kernel function of the SVM classifier:

$$K(\mathbf{x}_o^{(l)}, \mathbf{x}_o^{(j)}) = \exp\left(-g \left\| \mathbf{x}_o^{(l)} - \mathbf{x}_o^{(j)} \right\|^2\right), \tag{6}$$

where g is set to 1 according to a ten-fold cross validation of the classification results when setting g to different values in the set {10^{-5}, 10^{-4}, …, 10^{0}, …, 10^{5}}.
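For illustration, Eqs. (5) and (6) can be sketched as a small dual-form decision function in Python; the helper names and the toy support vectors below are ours, not the paper's:

```python
import math

def rbf_kernel(u, v, g=1.0):
    """RBF kernel of Eq. (6): K(u, v) = exp(-g * ||u - v||^2)."""
    return math.exp(-g * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def decision(x, support, g=1.0, b=0.0):
    """Dual-form SVM decision value of Eq. (5). `support` is a list of
    (alpha_j, y_j, x_j) triples for the training samples; only the
    feature components kept by the selection scheme o are assumed to be
    present in each vector."""
    return sum(a * y * rbf_kernel(xj, x, g) for a, y, xj in support) + b

# Toy example: one positive and one negative support vector.
sv = [(1.0, +1, [0.0, 0.0]), (1.0, -1, [2.0, 2.0])]
```

The sign of `decision(x, sv)` then gives the predicted class of x.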

(4) Improved rotation gate for mutation: In the QGA, the mutation operation is usually implemented with an invertible normalized matrix referred to as a quantum rotation gate. The mutation operation with respect to the m-th gene in a chromosome can then be expressed as:

$$\begin{pmatrix} \alpha_m' \\ \beta_m' \end{pmatrix} = \begin{pmatrix} \cos\theta_m & -\sin\theta_m \\ \sin\theta_m & \cos\theta_m \end{pmatrix} \begin{pmatrix} \alpha_m \\ \beta_m \end{pmatrix}, \tag{7}$$

where (α_m, β_m)^T and (α'_m, β'_m)^T are, respectively, the Q-bit of the m-th gene before and after the mutation operation, and θ_m = s(α_m, β_m)·Δθ_m is the rotation angle. Here, Δθ_m determines the magnitude of the rotation and s(α_m, β_m) controls the rotation direction. In most existing QGAs, these two parameters are determined in the manner shown in Table 4 [48], where o_m is the observed binary value of the m-th gene in the chromosome being considered, o_{m,opt} is the observed binary value of the m-th gene in the optimal chromosome in terms of the fitness function, and fit(o) and fit(o_opt) are the fitness values of the above two chromosomes as defined in Eq. (4).

From Table 4, we can see that Δθ can only take a few fixed discrete values. This limited set of values for Δθ does not help increase the diversity of the chromosome population, and hence the search is prone to being trapped in a local optimum. To tackle this problem, upon comprehensive consideration of the relationship between Δθ and the evolution generation, and that between Δθ and the fitness values, the rotation angle is alternatively computed as follows:

$$\theta_m = s(\alpha_m, \beta_m)\cdot\Delta\theta_m\cdot\exp\left\{-\left[fit(\mathbf{o}) - fit(\mathbf{o}_{opt})\right]\frac{t}{t_{max}}\right\}, \tag{8}$$

where Δθ_m and s(α_m, β_m) still represent the base rotation angle and its direction, whose values can be found in Table 4, fit(o) − fit(o_opt) is the difference between the fitness value of the chromosome being considered and that of the optimal one, and t and t_max are the current and the maximum generation numbers, respectively. With such an improved mutation strategy, the amount of rotation is adaptively adjusted with the evolution generation and the fitness value. As a result, the diversity of the chromosome population is increased, which effectively avoids the premature convergence

Table 4. Rotation angles and rotation directions [48].

                                                s(α_m, β_m)
o_m  o_{m,opt}  fit(o) ≥ fit(o_opt)  Δθ_m      α_m·β_m > 0  α_m·β_m < 0  α_m = 0  β_m = 0
0    0          False                0         0            0            0        0
0    0          True                 0         0            0            0        0
0    1          False                0         0            0            0        0
0    1          True                 0.05π     −1           +1           ±1       0
1    0          False                0.01π     −1           +1           ±1       0
1    0          True                 0.025π    +1           −1           0        ±1
1    1          False                0.005π    +1           −1           0        ±1
1    1          True                 0.025π    +1           −1           0        ±1

problem of the conventional QGA.

Although the diversity in the population can be increased by revising the rotation scheme, such diversity will be lost when the Q-bit components α and β prematurely converge to a value in the vicinity of 0 or 1 upon quantum rotation. To address this problem, the H_ε gate [49] is employed to modify the mutation result as follows:

$$\begin{pmatrix} \alpha_m'' \\ \beta_m'' \end{pmatrix} = H_\varepsilon(\alpha_m', \beta_m')^T = \begin{cases} \left(\sqrt{\varepsilon}, \sqrt{1-\varepsilon}\right)^T, & \text{if } |\alpha_m'|^2 < \varepsilon; \\ \left(\sqrt{1-\varepsilon}, \sqrt{\varepsilon}\right)^T, & \text{if } |\alpha_m'|^2 > 1-\varepsilon; \\ \left(\alpha_m', \beta_m'\right)^T, & \text{if } \varepsilon \le |\alpha_m'|^2 \le 1-\varepsilon, \end{cases} \tag{9}$$

where 0 < ε ≪ 1 is a threshold set to 0.01 in this paper [50]. Obviously, through such a modification, (α''_m, β''_m)^T will depart from the vicinity of 0 or 1, avoiding being forced to these two values during the observation procedure. Hence, the diversity of the observed Q-bit values can be increased and the premature convergence problem can be alleviated.
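A compact Python sketch of the mutation machinery of Eqs. (7)–(9) follows, under the reconstructed form of Eq. (8) above (with `fit_gap` standing for fit(o) − fit(o_opt)); all function names are ours, not the paper's:

```python
import math

def rotate(alpha, beta, s, d_theta, fit_gap, t, t_max):
    """One Q-bit mutation. The rotation angle follows the adaptive rule
    of Eq. (8): theta = s * d_theta * exp(-fit_gap * t / t_max), where
    s and d_theta come from the Table 4 lookup; the gate itself is the
    standard rotation matrix of Eq. (7)."""
    theta = s * d_theta * math.exp(-fit_gap * t / t_max)
    return (math.cos(theta) * alpha - math.sin(theta) * beta,
            math.sin(theta) * alpha + math.cos(theta) * beta)

def h_eps(alpha, beta, eps=0.01):
    """H-epsilon gate of Eq. (9): pull |alpha|^2 back inside
    [eps, 1 - eps] so that observation never becomes deterministic."""
    if alpha * alpha < eps:
        return math.sqrt(eps), math.sqrt(1 - eps)
    if alpha * alpha > 1 - eps:
        return math.sqrt(1 - eps), math.sqrt(eps)
    return alpha, beta
```

Note that the rotation gate is norm-preserving, so |α'|² + |β'|² = 1 is maintained automatically.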

(5) Quantum crossover: In this paper, the crossover operation is performed with a probability P_c on the population after observation. Suppose that o^k = (o_1^k, o_2^k, …, o_M^k) and o_{cr}^k = (o_{cr,1}^k, o_{cr,2}^k, …, o_{cr,M}^k) are the observed states of the k-th chromosome before and after the crossover operation, respectively. The crossover operation can be formulated as:

$$o_{cr,m}^{k} = o_m^{((k+m-1))_K}, \tag{10}$$

where ((k+m−1))_K = (k+m−1) mod K and K is the number of samples in the population. This operation simulates the quantum interference procedure and can make the best use of the information contained in the population. The crossover operation is very helpful for increasing the diversity of the population and avoiding prematurity of the algorithm.
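The index arithmetic of Eq. (10) is easy to get wrong; below is a minimal sketch with 0-based indices (the function name is ours):

```python
def quantum_crossover(observed):
    """Population-wide crossover of Eq. (10). With 1-based indices the
    source chromosome is ((k + m - 1)) mod K; with 0-based indices this
    becomes (k + m) mod K. Every offspring thus mixes genes from the
    whole observed population at once."""
    K, M = len(observed), len(observed[0])
    return [[observed[(k + m) % K][m] for m in range(M)]
            for k in range(K)]
```

With a 3 x 3 population of distinct gene values, offspring k takes its m-th gene diagonally from chromosome (k + m) mod K.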

In summary, the algorithm for coarse feature selection is described in Algorithm 1.

Algorithm 1. Coarse feature selection with IQGA.

Input: Feature vector pool P_x (half for training and half for testing), class label pool P_lb, crossover probability P_c, maximum generation number t_max
Output: Feature subset S_f^{(1)} and its indices I_f^{(1)} for the selected features
Initialize: The generation number t = 0 and the population Q = {q_1, q_2, …, q_K}
Begin
  Obtain O = {o_1, o_2, …, o_K} for Q from Eq. (3);
  while (t < t_max) do
    for (k ← 1 to K) do
      Perform feature selection according to o_k;
      Train the SVM on the training set with the selected features;
      Perform classification and compute fit(o_k) with the testing set;
    end
    Find the chromosome index k_opt with the highest classification accuracy;
    for (k ← 1 to K and k ≠ k_opt) do
      Perform mutation for q_k according to Eqs. (7)–(9);
      Obtain o_k for q_k from Eq. (3);
    end
    for (k ← 1 to K and (r = rand(0, 1)) ≤ P_c) do
      Perform the crossover operation according to Eq. (10);
    end
    t ← t + 1;
  end
  Find o_opt with the highest classification accuracy;
  Output subsets I_f^{(1)} = {m | o_{opt,m} = 1}, S_f^{(1)} = {x_m | m ∈ I_f^{(1)}};
End
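A highly simplified Python skeleton of the coarse-selection loop is given below. The SVM-based fitness of Eq. (4) is abstracted into an arbitrary callable, and the Table 4 / H_ε machinery is replaced by a crude amplitude nudge toward the best observed mask, so this is a structural sketch only, not the paper's algorithm:

```python
import math
import random

def coarse_select(fitness, M=75, K=20, t_max=50, seed=0):
    """Structural skeleton of Algorithm 1. `fitness(mask) -> float`
    stands in for training/testing the SVM; Q-bit alpha amplitudes
    start at 1/sqrt(2) and are nudged toward the best mask, clamped
    away from 0 and 1 (a crude stand-in for Eqs. (7)-(9))."""
    rng = random.Random(seed)
    amp = [[1 / math.sqrt(2)] * M for _ in range(K)]
    best_mask, best_fit = None, float("-inf")
    for t in range(t_max):
        # Observation step (Eq. (3)) for every chromosome.
        masks = [[1 if rng.random() <= a * a else 0 for a in row]
                 for row in amp]
        fits = [fitness(m) for m in masks]
        k_opt = max(range(K), key=fits.__getitem__)
        if fits[k_opt] > best_fit:
            best_fit, best_mask = fits[k_opt], masks[k_opt]
        # Mutation stand-in: move amplitudes toward the best mask.
        for k in range(K):
            if k == k_opt:
                continue
            for m in range(M):
                step = 0.02 if best_mask[m] else -0.02
                amp[k][m] = min(0.995, max(0.1, amp[k][m] + step))
    return best_mask, best_fit

# Toy fitness: reward selecting the first 10 of 75 features.
mask, score = coarse_select(lambda m: sum(m[:10]) / 10.0)
```

Even this crude variant converges toward masks covering the rewarded features, illustrating the observe-evaluate-rotate cycle.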


3.2. Fine selection with mRMR-SVM-RFE

After the coarse feature selection, the M_coarse selected features in S_f^{(1)} are utilized to characterize image content. That is, each image will be represented with a concatenated feature vector x_coarse = [x_{i_1}^T, x_{i_2}^T, …, x_{i_{M_coarse}}^T]^T with i_1, i_2, …, i_{M_coarse} ∈ I_f^{(1)}, where x_{i_k} is the i_k-th selected feature with n_{i_k} (≥ 1) components, i.e., x_{i_k} = [x_{i_k,1}, x_{i_k,2}, …, x_{i_k,n_{i_k}}]^T.

We perform fine selection with respect to individual components of x_coarse, as opposed to the feature-wise approach used in the coarse selection. Therefore, in the following discussion, the coarsely selected and concatenated feature vector x_coarse will be rewritten in a component-wise manner as x_coarse = [x_1, x_2, …, x_N]^T, with N = Σ_{i_k ∈ I_f^{(1)}} n_{i_k}. As a result, the correspondences between I_f^{(1)} and I_c^{(1)} = {i | i = 1, 2, …, N} and between S_f^{(1)} and S_c^{(1)} = {x_i | i = 1, 2, …, N} can be established. In terms of the component-wise representation, the purpose of the fine selection is to select n_fine (< N) feature components for effective image content representation.

Our fine feature selection process consists of two steps: the mRMR based feature selection to further reduce the number of feature components, and an improved SVM-RFE based feature selection for deriving the best-performing feature components. Therefore, we name our proposed fine feature selection method mRMR-SVM-RFE.

3.2.1. mRMR based feature selection

In the mRMR based feature selection method [8], the redundancy R and the relevance D of a feature subset are measured in terms of the mutual information (MI) defined as follows:

$$R(S) = \frac{1}{|S|^2}\sum_{x_i, x_j \in S} I(x_i; x_j), \tag{11}$$

and

$$D(S, c) = \frac{1}{|S|}\sum_{x_i \in S} I(x_i; c), \tag{12}$$

where R(S) is the redundancy of the feature subset S, D(S, c) is the relevance of the feature subset S to the target classes c, I(x_i; x_j) and I(x_i; c) are the mutual information between the feature components x_i and x_j and that between x_i and c, respectively, and |S| is the number of elements in the subset S.

Therefore, the objective of maximizing the relevance and minimizing the redundancy simultaneously can be expressed as the following optimization problem:

$$S_c^{(2)} = \arg\max_{S \subset S_c^{(1)}} \varphi(S, c), \qquad \varphi(S, c) = \frac{D(S, c)}{R(S)}, \tag{13}$$

where the subset S_c^{(2)} is the optimally selected subset of feature components in the subset S_c^{(1)}. In our work, the incremental search scheme [9] is employed to find the near-optimal solution to the problem in Eq. (13). With this approach, n_mM feature components are finally selected to construct the feature subset S_c^{(2)}.
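A small self-contained sketch of the incremental mRMR search on discrete features follows; note that it uses the common difference-form surrogate (relevance minus mean redundancy to the already-picked set) rather than the quotient of Eq. (13), and all names are ours:

```python
import math
from collections import Counter

def mutual_info(xs, ys):
    """Discrete mutual information I(X; Y) in bits."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mrmr_incremental(features, labels, k):
    """Greedy incremental search: at each step add the feature that
    maximizes I(x; c) minus the mean redundancy I(x; x_j) over the
    features already chosen."""
    chosen = []
    while len(chosen) < k:
        def score(i):
            rel = mutual_info(features[i], labels)
            red = (sum(mutual_info(features[i], features[j])
                       for j in chosen) / len(chosen)) if chosen else 0.0
            return rel - red
        best = max((i for i in range(len(features)) if i not in chosen),
                   key=score)
        chosen.append(best)
    return chosen
```

On toy data, a feature identical to the labels has I = 1 bit and is picked first, while an independent feature contributes nothing.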

3.2.2. Improved SVM-RFE based feature selection

After the near-optimal subset S_c^{(2)} of feature components is obtained with the filter type mRMR based feature selection method, the search space has been reduced to cater for the computationally expensive wrapper type SVM-RFE based feature selection method.

(1) Traditional SVM-RFE method: The SVM-RFE based feature selection method [51] starts with all feature components and recursively removes the feature component with the least importance for classification in a backward elimination manner. The measure of the importance of a feature component is computed from the weight vector of the SVM [15]:

$$\mathbf{w} = \sum_l \alpha_l \, y_l \, \mathbf{x}_{mM}^{(l)}, \tag{14}$$

where α_l is the Lagrange multiplier, y_l is the class label of the l-th sample, and x_mM^{(l)} is the feature vector resulting from the mRMR selection for the l-th sample. With such a weight vector, the importance c_i of the i-th component can be determined as w_i^2, and hence the feature components can be selected according to their importance as determined above.

(2) Improved SVM-RFE method: Since the mRMR criterion aims to simultaneously maximize the relevance and minimize the redundancy of a feature component, it would be helpful to integrate such a criterion into ranking the importance of a feature component in the SVM-RFE method. Therefore, we devise a new ranking criterion through a convex combination as follows:

$$c_i = (1-\beta) \times I'(i) + \beta \times |w_i|, \tag{15}$$

where c_i is the importance of the i-th component of the feature, β is a constant satisfying β ∈ [0, 1], w_i is as defined in Eq. (14), and I'(i) indicates the relevance-redundancy factor of the i-th feature component in terms of the mRMR criterion.

Now there are two problems to be solved: how to define I'(i) and how to choose a suitable β. For defining I'(i), we employ a simple yet effective solution: ranking the feature components in terms of the mRMR criterion in decreasing order and assigning a decreasing order number to each feature component. As a result, an ordered set I' = {n_mM, n_mM − 1, …, 1} can be obtained for S_c^{(2)} using the mRMR.
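The order-number construction and the fused criterion of Eq. (15) can be sketched directly (function names are ours):

```python
def mrmr_order_numbers(mrmr_ranking):
    """Decreasing order numbers I'(i): the top-ranked mRMR feature
    receives n_mM, the last receives 1."""
    n = len(mrmr_ranking)
    return {feat: n - pos for pos, feat in enumerate(mrmr_ranking)}

def fused_importance(i, order, w, beta):
    """Ranking criterion of Eq. (15):
    c_i = (1 - beta) * I'(i) + beta * |w_i|."""
    return (1 - beta) * order[i] + beta * abs(w[i])
```

With β = 0 the criterion degenerates to the pure mRMR ranking; with β = 1 it recovers plain SVM-RFE.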

Algorithm 2. Fine feature selection algorithm mRMR-SVM-RFE.

Input: Pool of features P_x = {x_coarse^{(1)}, x_coarse^{(2)}, …, x_coarse^{(L)}}, pool of class labels P_lb = {y_1, y_2, …, y_L}, subset S_c^{(1)} of feature components after coarse selection, the number n_mM of feature components from mRMR, and the number n_fine of the final selected feature components
Output: Subset S_c^{(3)} of the n_fine selected feature components
Initialize: Feature rank list r = [], feature subset obtained from the mRMR selection method S_c^{(2)} = ∅, I_c^{(2)} = ∅
Begin
  while (|S_c^{(2)}| ≠ n_mM) do
    Search for the best feature component x̂ in S_c^{(1)} − S_c^{(2)} with the incremental search scheme;
    S_c^{(2)} ← S_c^{(2)} ∪ {x̂}, I_c^{(2)} ← I_c^{(2)} ∪ {index(x̂)};
  end
  Set S = S_c^{(2)}, I = I_c^{(2)};
  Obtain the set I′ of the order numbers for features in S;
  Set X = [x_coarse^{(1)}, x_coarse^{(2)}, …, x_coarse^{(L)}]^T, X_t = X(:, I);
  Compute the combination coefficient β from Eq. (18);
  while (S ≠ ∅) do
    Train the SVM classifier α = SVM-train(X_t, y);
    Compute the weight vector w from Eq. (14);
    Compute the ranking criterion c_i from Eq. (15) for all feature components;
    Find the feature component î = arg min_{i∈I} c_i;
    r ← [î, r], S ← S − {x_î}, I ← I − {î};
    Set X_t = X(:, I);
  end
  Output S_c^{(3)} = {x_i | i = r[k], k = 1, 2, …, n_fine};
End

Fig. 3. Sample images of each category in Dataset2: (a) airplanes, (b) cars, (c) faces, (d) guitars, (e) leaves, (f) motorbikes, and (g) background.

3 http://wang.ist.psu.edu/docs/home.shtml
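The backward-elimination loop at the heart of Algorithm 2 can be sketched with the SVM training abstracted into a weight callback (all names below are ours, not the paper's):

```python
def rfe_rank(features, weight_fn, order=None, beta=0.0):
    """Backward-elimination core of Algorithm 2. `weight_fn(alive)`
    returns {i: w_i} for the surviving components (standing in for
    retraining the SVM of Eq. (14) on each pass); `order` holds the
    mRMR order numbers I'(i); scores follow Eq. (15). The component
    removed last ends up ranked first (most important)."""
    alive, ranked = list(features), []
    while alive:
        w = weight_fn(alive)
        score = {i: (1 - beta) * (order[i] if order else 0.0)
                 + beta * abs(w[i]) for i in alive}
        worst = min(alive, key=score.__getitem__)
        alive.remove(worst)
        ranked.insert(0, worst)
    return ranked
```

Keeping the first n_fine entries of the returned list then yields S_c^{(3)}.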

Rather than choosing the combination coefficient β empirically, we propose to obtain its value through a mathematical model that characterizes the dependency of the classification accuracy on β. This model is formulated by fitting data to a polynomial function of the k-th order as follows:

$$P(\beta) = d_0 + d_1\beta + \cdots + d_k\beta^k, \tag{16}$$

where d_i (i = 0, 1, 2, …, k) is one of the k+1 fitting coefficients, which can be determined with the following least squares fitting strategy.

Suppose that P_j (j = 1, 2, …, J) is one of the J classification accuracies achieved with J different choices of β. With these discrete data, the fitting coefficients can be determined as follows:

$$(d_0, d_1, \ldots, d_k) = \arg\min_{(d_0, d_1, \ldots, d_k)} \sum_{j=1}^{J} \left[P_j - P(\beta_j)\right]^2, \tag{17}$$

where P(β_j) is the classification accuracy computed from Eq. (16) when β = β_j. With this model, the value of β can be obtained by solving the following optimization problem:

$$\beta = \arg\max_{\beta} P(\beta). \tag{18}$$

In summary, the fine feature selection algorithm is described in Algorithm 2.

4. Experimental results and discussions

4.1. Datasets

Three image datasets were used in our experiments. The first dataset, namely Dataset1, was used for coarse selection. As discussed in Section 3.1.1, under the guidance of eye tracking data, a total of 1500 hROIs were extracted from images in this dataset. Among these hROIs, half were used for training the SVM classifier and the other half were used to evaluate the performance of the selected features. In order to perform statistical comparisons, 25 rounds of experiments were conducted with different initial populations.

The second dataset, namely Dataset2, is composed of 5306 images randomly picked from the Caltech image database [52]. Images in this dataset were divided into seven categories: airplanes, cars, faces, guitars, leaves, motorbikes, and background. Our fine feature selection process was first performed on one half of the images (i.e., the training set) from this dataset, and then performance evaluations of different feature selection algorithms were conducted on the other half (i.e., the testing set). Sample images of each category in Dataset2 are shown in Fig. 3. A detailed description of Dataset2 is provided in Table 5.

The third dataset, namely Dataset3, is composed of 1000 images in 10 categories (100 images per category) picked from the COREL image database.3 This dataset was constructed with the same protocol as that in [53]. Sample images of each category in Dataset3 are shown in Fig. 4.

In order to perform statistical comparisons among different algorithms in the fine selection stage, 10 rounds of experiments were conducted in both Dataset2 and Dataset3 by randomly partitioning the datasets into two parts, one for feature selection and one for performance evaluation, in 10 different ways. Unless otherwise specified, the experimental results discussed below correspond to one of the 10 rounds.


Table 5. Statistics of Dataset2.

Category     Number of images  Number for training  Number for testing
airplanes    800               400                  400
cars         1155              578                  577
faces        435               217                  218
guitars      1030              515                  515
leaves       186               93                   93
motorbikes   800               400                  400
background   900               450                  450


4.2. Results of coarse feature selection

The coarse feature selection using eye tracking data was performed in Dataset1 with our proposed improved QGA algorithm (namely IQGA-E, as referred to in Algorithm 1). For the implementation of this algorithm, the population size K and the crossover probability P_c were set to 20 and 0.1, respectively, for maintaining the diversity of the population. The maximum generation number t_max was set to 200 to ensure convergence of the evolution algorithm and meanwhile to keep a sufficient number of features for the subsequent fine selection. The number of genes was set to 75, which is equal to the number of features in our study.

Fig. 4. Sample images of each category in Dataset3: (a) African people and villages, (b) beach, (c) buildings, (d) buses, (e) dinosaurs, (f) elephants, (g) flowers, (h) horses, (i) mountains and glaciers, and (j) foods.

Since our IQGA-E algorithm is an evolution algorithm, different initial populations will result in slightly different results, one of which is presented in Table 6. As can be seen from this table, there are 18 visual features selected: 13 color features, 3 texture features, and 2 shape features, which indicates that color features play important roles in human vision when discriminating different types of images. Our classification evaluation shows that, with these features, we can achieve the best accuracy of 92.4% in the test set of the hROIs. The 18 selected features form a feature subset S_f^{(1)} for the next-stage fine selection. By concatenating all the 18 features, the feature subset S_f^{(1)} is converted to a new set S_c^{(1)} with 864 feature components.

We also compared our proposed IQGA-E coarse selection algorithm with two other algorithms: the traditional QGA and the improved QGA without using eye tracking data (denoted as IQGA). When eye tracking data are not utilized, global feature extraction was conducted on each image in Dataset1. All the algorithms were


Table 6. Results of coarse feature selection with IQGA-E.

Index  Feature name                                  Dimensions
1      General histogram (in JPEG/YCbCr space)       48
4      General histogram (in RGB color space)        48
5      General histogram (in XYZ color space)        48
8      General histogram (in YIQ color space)        48
14     Accumulative histogram (in RGB color space)   48
16     Accumulative histogram (in YCbCr color space) 48
21     HSV histogram (8:3:3 uniform quantization)    72
22     HSV histogram (16:4:4 uniform quantization)   256
26     Dominant colors (in JPEG/YCbCr space)         16
30     Dominant colors (in RGB color space)          16
43     Color moment (in RGB color space)             9
52     Color layout (in YCbCr space)                 12
53     Scalable color in 16:4:4 HSV space            66
57     HTD                                           62
59     Gabor texture descriptor                      24
65     Co-occurrence matrix                          8
73     Legendre moment                               25
74     Zernike moment                                10

Table 7. Classification performance of three coarse feature selection algorithms (CI represents confidence interval; 95% CIs in parentheses).

Algorithm  Accuracy%             Precision%            Recall%               F1 score              Feature number  p-value
QGA        87.6 (87.02, 88.18)   87.56 (87.10, 88.02)  87.20 (86.79, 87.61)  87.38 (86.96, 87.80)  16              p < 0.05
IQGA       89.2 (88.22, 90.18)   89.59 (89.23, 89.95)  89.20 (88.79, 89.61)  89.39 (89.01, 89.77)  20              p < 0.05
IQGA-E     91.8 (91.53, 92.07)   92.47 (92.28, 92.66)  91.28 (91.01, 91.55)  91.87 (91.65, 92.09)  18              p < 0.05


run for 25 rounds with different initial populations under the same parametric settings as given above. As shown in Table 7, with the help of eye tracking data, IQGA-E clearly outperforms both IQGA and QGA. It is also observed that IQGA performs better than QGA, which demonstrates the effectiveness of our improved mutation strategy.

In order to investigate whether the features selected with our proposed IQGA-E algorithm are the best for image classification, 10 rounds of experiments were performed using bootstrapping techniques, where a subset of 18 features was chosen randomly for training and classification in each round. The ten rounds of classification results based on 18 randomly selected features show that we can only achieve an average accuracy of 82.98%, which is significantly lower than the 91.8% achieved with the 18 features selected by our proposed IQGA-E.

4.3. Results of fine feature selection

In this section, we report the fine feature selection results from our proposed mRMR-SVM-RFE method (referred to as Algorithm 2) for the training set of Dataset2. This method sequentially implements the mRMR method and the improved SVM-RFE method, which integrates the mRMR ranking information into the SVM-RFE ranking criterion. Considering that the number of feature components in S_c^{(1)} is 864 (see Section 4.2), we kept 300 feature components (i.e., n_mM = 300) after the mRMR selection in our experiments to balance the computational efficiency of the filter model against that of the wrapper model. The number of finally selected feature components with mRMR-SVM-RFE, n_fine, was then set to 120, because no clear improvement in classification accuracy was observed in our experiments when it was increased further.

4.3.1. Parameter estimation

In our experiments, the combination coefficient β is determined in the manner described in Eqs. (16) and (17). We used a fourth-order (i.e., k = 4) polynomial function to characterize the dependence of classification performance on β. For the robustness of our proposed algorithm, the above-mentioned polynomial function was constructed under three different settings of the parameter g in the RBF kernel of the SVM: g = 1×10^{-6}, 5×10^{-6}, and 8×10^{-6}. With each value of g, classification performance was obtained at 20 different values of β in the training set. By fitting these data to Eq. (17), three dependence equations were obtained as follows:

$$\begin{aligned} P_1(\beta) &= -1.07\beta^4 + 1.93\beta^3 - 0.99\beta^2 + 0.16\beta + 0.90, \\ P_2(\beta) &= -0.79\beta^4 + 1.48\beta^3 - 0.82\beta^2 + 0.15\beta + 0.90, \\ P_3(\beta) &= -0.97\beta^4 + 1.93\beta^3 - 1.20\beta^2 + 0.26\beta + 0.88, \end{aligned} \tag{19}$$

where P_j(β) (j = 1, 2, 3) is the classification accuracy under the j-th setting of g. From these three equations, the three β values maximizing the three accuracies in Eq. (19) can be obtained by performing the optimization procedure of Eq. (18). The three optimal β values obtained are 0.848, 0.854, and 0.877. The final optimal β value is determined as the arithmetic mean of these three values, i.e., β = 0.859, while the parameter g in the final experiment is set to 1×10^{-6}.
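Eq. (18) has no convenient closed form for a quartic accuracy model, so a dense grid search suffices in practice. Below, `p1` re-types the first polynomial of Eq. (19) as printed; with the rounded coefficients, the grid argmax lands near, though not exactly at, the 0.848 reported in the text (the names are ours):

```python
def best_beta(p, steps=1000):
    """Grid-search solution of Eq. (18): argmax of P(beta) on [0, 1]."""
    return max((i / steps for i in range(steps + 1)), key=p)

def p1(b):
    # First fitted polynomial of Eq. (19), coefficients as printed.
    return -1.07 * b**4 + 1.93 * b**3 - 0.99 * b**2 + 0.16 * b + 0.90

beta_star = best_beta(p1)
```

The same routine applied to P_2 and P_3 yields the other two maximizers, whose arithmetic mean gives the final β.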

4.3.2. Features from fine selection

Table 8 gives the feature components selected from the 864-dimension subset S_c^{(1)} of 18 different features. It is observed that a total of 13 features with 120 feature components survive our fine selection procedure: 11 color features and 2 texture features. The absence of shape features indicates that general region-based shape features are not necessary for differentiating image categories. The top 4 most frequently selected features out of the 18 coarsely selected ones are the 16:4:4 uniformly quantized histogram in the


Table 8. Results of fine feature selection with mRMR-SVM-RFE in Dataset2.

Index  Feature name                                  Selected dimensions
1      General histogram (in JPEG/YCbCr space)       18
4      General histogram (in RGB color space)        13
8      General histogram (in YIQ color space)        5
14     Accumulative histogram (in RGB color space)   4
16     Accumulative histogram (in YCbCr color space) 6
21     HSV histogram (8:3:3 uniform quantization)    7
22     HSV histogram (16:4:4 uniform quantization)   24
26     Dominant colors (in JPEG/YCbCr space)         5
30     Dominant colors (in RGB color space)          2
52     Color layout (in YCbCr space)                 1
53     Scalable color in 16:4:4 HSV space            16
57     HTD                                           18
59     Gabor texture descriptor                      1

Fig. 5. Classification accuracies with different numbers of feature components selected with different fine selection algorithms (after coarse selection with IQGA-E).

Fig. 6. Classification accuracies for different image categories with the features selected using IQGA-E&mRMR-SVM-RFE (with eye tracking data) and IQGA&mRMR-SVM-RFE (without eye tracking data).


HSV space, the homogeneous texture descriptor (HTD), the general histogram in the JPEG/YCbCr color space, and the scalable color in the 16:4:4 HSV color space. This result further indicates the important role of colors in discriminating different types of images. It is also observed that a large proportion of the components of the color features are from the HSV space, since the HSV color space complies better with the human vision system.

4.4. Performance evaluation

4.4.1. Classification performance

In this subsection, we present image classification results in the test set of Dataset2 with selected features obtained using different combinations of methods, such as with or without the coarse selection and with or without eye tracking data. The SVM with the RBF kernel is also used as the classifier for performance evaluation. This classifier was trained with the training set of Dataset2, and the parameter g in the RBF kernel was set to 1×10^{-6} according to a ten-fold cross validation of the classification results when setting g to different values in the set {10^{-10}, 10^{-9}, …, 10^{0}, …, 10^{10}}.

The classification results are presented in Fig. 5, where mRMR&SVM-RFE denotes the hybrid method that performs feature selection by simply implementing the filter type mRMR algorithm and the wrapper type SVM-RFE algorithm sequentially in the fine selection stage. As can be observed from this figure, the mRMR&SVM-RFE method generally performs better than either the mRMR based or the SVM-RFE based selection method alone, which indicates the effectiveness of the hybrid approach. The mRMR-SVM-RFE method performs the best among all the compared methods, owing to the fusion of the ranking information of the mRMR and SVM-RFE methods. In particular, the performance improvement is more significant when a small number of feature components are selected.

In order to study the impact of the eye tracking data on improving the classification performance with selected features, coarse selection was performed with and without eye tracking data involved. As shown in Fig. 6 for testing images in Dataset2, eye tracking data are almost always helpful for improving image classification performance. The average accuracy of the IQGA-E&mRMR-SVM-RFE method is 94.21%, which is clearly higher than the 92.88% of the IQGA&mRMR-SVM-RFE method. In particular, significant improvement has been achieved for some categories such as guitars and motorbikes. However, an exception exists for the leaf class, where the classification accuracy with eye tracking data is lower than that without. This may be due to two reasons: first, the eye tracking data were only obtained from an image dataset with two classes of images; second, some key features for human vision in recognizing the leaf class may not be included in our initial feature pool of only 75 features, which deteriorates the final classification performance.

For investigating the performance improvement gained by using our proposed IQGA rather than the traditional QGA in the coarse selection stage, the features selected with these two methods, in combination with our proposed mRMR-SVM-RFE fine selection algorithm, were used for the classification of testing images in Dataset2. Classification accuracies with respect to testing images


Fig. 7. Classification accuracies with the features selected using IQGA&mRMR-SVM-RFE and QGA&mRMR-SVM-RFE.

Fig. 9. Comparison of different methods in terms of the area under the receiver operating characteristic curve (AUC) in Dataset2.

Fig. 10. Comparison of different methods in terms of the area under the receiver operating characteristic curve (AUC) in Dataset3.


in different categories in Dataset2 are shown in Fig. 7. From this figure, we can observe that using our proposed IQGA as the coarse selection method also improves the final classification accuracies.

In order to validate the advantage of incorporating the ranking information of mRMR into that of the SVM-RFE method, the features selected with our proposed IQGA-E coarse selection method, in combination with either the simple hybrid mRMR&SVM-RFE algorithm or the rank-fused mRMR-SVM-RFE algorithm, were used for the classification of testing images in Dataset2. Results of these two combinations are shown in Fig. 8. From this figure, it can be observed that incorporating the ranking information of the mRMR method into the ranking process of the SVM-RFE method is helpful for the feature selection and eventually improves the image classification accuracy.

To further evaluate different feature selection algorithms, the Receiver Operating Characteristic (ROC) curve metric is also used for performance comparison. Fig. 9 provides the values of the Area Under the receiver operating Characteristic curve (AUC) for different feature selection methods in Dataset2. From this figure, we can see that the AUC corresponding to our proposed IQGA-E&mRMR-SVM-RFE is the largest (0.9815), indicating the best classification performance among the compared methods. Similarly, Fig. 10 shows the AUC values for different feature selection methods in Dataset3, which also demonstrates that our proposed IQGA-E&mRMR-SVM-RFE method clearly outperforms the other methods.
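The AUC itself can be computed without constructing the ROC curve explicitly, via the rank-sum (Mann-Whitney) identity; the following is a minimal sketch, not the paper's evaluation code:

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity: the
    probability that a random positive outscores a random negative,
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking gives 1.0, a reversed ranking 0.0, and a constant score 0.5.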

T-tests were performed to assess whether the improvement of our proposed method is statistically significant. We randomly partitioned Dataset2 into two parts in ten different ways. In each way, half of the images were used for training the classifier with

Fig. 8. Classification accuracies with the features selected using IQGA-E&mRMR&SVM-RFE and IQGA-E&mRMR-SVM-RFE.

the features selected from different methods and the other half were used for testing. Performance evaluations of different feature selection algorithms in Dataset2 are presented in Table 9, where three other metrics (i.e., precision, recall, and F1 score) are listed along with the average classification accuracy. From this table, it can be seen that, at the 0.05 level, the proposed IQGA-E&mRMR-SVM-RFE method performs the best among the compared methods. Similar evaluations were also conducted in Dataset3 based on our eye tracking data guided coarse selection results. As shown in Table 10, we can also conclude that, at the 0.05 level, our proposed IQGA-E&mRMR-SVM-RFE method performs the best in Dataset3, which indicates a consistent superiority of this algorithm over the others.

Similar to the coarse selection, in order to examine whether the final 13 features selected with our proposed IQGA-E&mRMR-SVM-RFE are the best ones, 10 rounds of experiments were performed using bootstrapping techniques, where a subset of 13 features was chosen randomly for training and classification in each round. The ten rounds of classification results based on 13 randomly selected features show that we can only achieve average accuracies of 79.68% in Dataset2 and 59.4% in Dataset3, which are clearly lower than those (94.21% and 85.04%) achieved based on the 13 features selected by our proposed IQGA-E&mRMR-SVM-RFE.

4.4.2. Computational time

Table 11 shows the computational times of different feature selection methods when 120 feature components are selected. A single-core 2 GHz computer running Matlab 2009b was used in the


Table 9
Comparison of different feature selection methods in Dataset2. Each cell gives the mean with its 95% CI.

Selection method      | Accuracy%            | Precision%           | Recall%              | F1 score             | p-value
mRMR-SVM-RFE          | 90.66 (90.51, 90.81) | 88.98 (88.22, 89.73) | 89.01 (88.32, 89.70) | 88.99 (88.30, 89.69) | p < 0.05
QGA&mRMR-SVM-RFE      | 91.21 (91.00, 91.42) | 89.91 (89.30, 90.53) | 90.21 (89.59, 90.84) | 90.06 (89.46, 90.66) | p < 0.05
IQGA&mRMR-SVM-RFE     | 92.88 (92.67, 93.09) | 92.25 (91.61, 92.90) | 92.37 (92.04, 92.69) | 92.31 (91.86, 92.77) | p < 0.05
IQGA-E&mRMR-SVM-RFE   | 94.21 (94.01, 94.41) | 93.31 (92.86, 93.76) | 93.53 (93.04, 94.01) | 93.42 (93.01, 93.83) | p < 0.05
IQGA-E&mRMR&SVM-RFE   | 93.15 (92.80, 93.51) | 92.17 (91.01, 93.33) | 92.56 (91.75, 93.37) | 92.36 (91.40, 93.33) | p < 0.05
IQGA-E&mRMR           | 91.39 (91.08, 91.70) | 90.70 (89.82, 91.57) | 90.72 (89.76, 91.67) | 90.71 (89.81, 91.60) | p < 0.05
IQGA-E&SVM-RFE        | 92.52 (92.27, 92.77) | 91.64 (90.90, 92.37) | 92.10 (91.79, 92.41) | 91.87 (91.36, 92.37) | p < 0.05

Table 10
Comparison of different feature selection methods in Dataset3. Each cell gives the mean with its 95% CI.

Selection method      | Accuracy%            | Precision%           | Recall%              | F1 score             | p-value
mRMR-SVM-RFE          | 80.80 (79.96, 81.64) | 81.37 (80.63, 82.11) | 80.88 (80.06, 81.70) | 81.12 (80.56, 81.68) | p < 0.05
QGA&mRMR-SVM-RFE      | 81.88 (81.34, 82.42) | 82.10 (81.51, 82.69) | 81.60 (81.25, 81.95) | 81.85 (81.42, 82.28) | p < 0.05
IQGA&mRMR-SVM-RFE     | 83.12 (82.62, 83.62) | 83.51 (83.04, 83.98) | 82.72 (82.30, 83.14) | 83.11 (82.73, 83.49) | p < 0.05
IQGA-E&mRMR-SVM-RFE   | 85.04 (84.65, 85.43) | 85.83 (85.37, 86.29) | 85.44 (84.87, 86.01) | 85.63 (85.17, 86.09) | p < 0.05
IQGA-E&mRMR&SVM-RFE   | 83.47 (83.03, 83.91) | 84.36 (83.68, 85.04) | 83.92 (83.50, 84.34) | 84.14 (83.62, 84.66) | p < 0.05
IQGA-E&mRMR           | 81.04 (80.59, 81.49) | 80.91 (80.38, 81.44) | 80.56 (79.72, 81.39) | 80.74 (80.11, 81.37) | p < 0.05
IQGA-E&SVM-RFE        | 82.52 (82.01, 83.03) | 82.46 (82.04, 82.88) | 82.24 (81.97, 82.51) | 82.35 (82.01, 82.69) | p < 0.05

Table 11
Comparison of computational time.

Method        | Computational time (s)
              | With coarse selection | Without coarse selection
mRMR          | 202                   | 747
SVM-RFE       | 5293                  | 19,584
mRMR-SVM-RFE  | 472                   | 1094

X. Zhou et al. / Pattern Recognition 63 (2017) 56–70

experiments. From this table, we can see that the filter-type mRMR is the least computationally expensive, while the wrapper-type SVM-RFE is the most expensive. Although the computational cost of the hybrid method is higher than that of the mRMR, it is much lower than that of the SVM-RFE. In addition, when the mRMR-SVM-RFE was applied directly to the original 75 image features with 3268 components, 1094 s was required to select a 120-dimension feature set, which achieved an average classification accuracy of 90.66%. In contrast, it took only 472 s for our proposed two-stage IQGA-E&mRMR-SVM-RFE method to select a 120-dimension feature set, which achieved a classification accuracy of 94.21%. Therefore, eye tracking data guided feature selection is both more efficient and more effective than the existing methods.
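The benefit of coarse pre-selection for a wrapper like SVM-RFE can be sketched with a simple cost count: an RFE-style loop re-ranks every remaining feature each time one is eliminated, so the total work grows roughly quadratically with the starting dimensionality. The 3268 components below mirror the paper's setup, while the post-coarse size of 800 is a hypothetical illustration, not a figure from the paper:

```python
def rfe_ranking_evaluations(d, keep=120):
    """Number of per-feature ranking evaluations for an RFE-style run that
    eliminates one feature per step, from d features down to `keep`.
    A cost proxy only, not the paper's measured timings."""
    return sum(k for k in range(keep + 1, d + 1))

full = rfe_ranking_evaluations(3268)   # wrapper applied to all 3268 components
coarse = rfe_ranking_evaluations(800)  # after a hypothetical coarse cut to 800
assert coarse < full                   # the pre-cut shrinks the search space
```

Under this proxy the pre-cut reduces the work by more than an order of magnitude, consistent with the roughly 4x and 2x wall-clock savings reported in Table 11 (where each ranking step also carries SVM training cost).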

5. Conclusions and future work

In this paper, we have presented a two-stage feature selection method for image classification that takes human factors into account. Rather than relying solely on mathematically plausible techniques, we identify a number of features with the help of eye tracking data, which reveal how humans perceive visual content. In the coarse feature selection stage, an improved quantum genetic algorithm (IQGA) is proposed to utilize the eye tracking data. In the fine feature selection stage, a novel hybrid feature selection method is proposed to combine the efficiency of the filter-type mRMR method and the effectiveness of the wrapper-type SVM-RFE method by integrating the ranking information of the two methods. Comprehensive experiments have been conducted with 75 visual features in three image datasets. The experimental results consistently demonstrate that eye tracking data are clearly helpful in improving the performance of image classification. In addition, the coarse-to-fine selection strategy greatly improves the efficiency of the whole feature selection process.

Based on the promising results achieved, we envision three directions for future work. First, we will investigate advanced algorithms to better model eye tracking data from both static and dynamic perspectives. In this work, the eye tracking data were only used to guide the determination of hROIs. Since both the order of the fixation points and the gaze duration are highly important for human vision, taking these two measures into account in our feature selection procedure will no doubt further improve the performance of image classification. Second, we will investigate how different types of images affect eye tracking data. In our current



experiments, the features for coarse selection were simply determined according to a two-class (object-distinctive vs. non-object-distinctive) classification problem under the guidance of the eye tracking data. However, such a simple setting would be limited in classification problems with a large number of classes. Last, we will investigate more features, such as Scale Invariant Feature Transform (SIFT) descriptors and Local Binary Pattern (LBP) features, since the features used in our work are mainly global features.

Acknowledgments

This work is supported by the National Natural Science Foundation of China, No. 60871086 and No. 61473243, the Natural Science Foundation of Jiangsu Province, China, No. BK2008159, and the Natural Science Foundation of Suzhou, No. SYG201113. The authors thank the anonymous reviewers for their constructive comments and valuable suggestions.

References

[1] A. Jain, D. Zongker, Feature selection: evaluation, application, and small sample performance, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 153–158.

[2] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res. 3 (2003) 1157–1182.

[3] X. Liu, A. Mondry, An entropy-based gene selection method for cancer classification using microarray data, BMC Bioinform. 6 (2005) 76.

[4] Z. Xu, I. King, M.-T. Lyu, R. Jin, Discriminative semi-supervised feature selection via manifold regularization, IEEE Trans. Neural Netw. 21 (2010) 1033–1047.

[5] M. Dash, K. Choi, P. Scheuermann, H. Liu, Feature selection for clustering: a filter solution, in: Proceedings of the Second International Conference on Data Mining, pp. 115–122.

[6] R. Caruana, D. Freitag, Greedy attribute selection, in: Proceedings of the 11th International Conference on Machine Learning, pp. 28–36.

[7] J. Zhang, H. Deng, Gene selection for classification of microarray data based on Bayes error, BMC Bioinform. 8 (2007) 370.

[8] H. Peng, F. Long, C. Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 1226–1238.

[9] C. Ding, H. Peng, Minimum redundancy feature selection from microarray gene expression data, in: Proceedings of the Second IEEE Computational Systems Bioinformatics Conference, pp. 523–528.

[10] R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Trans. Neural Netw. 5 (1994) 537–550.

[11] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Mach. Learn. 46 (2002) 389–422.

[12] M. Yousef, S. Jung, L. Showe, M. Showe, Recursive cluster elimination (RCE) for classification and feature selection from gene expression data, BMC Bioinform. 8 (2007) 114.

[13] K.B. Duan, J.C. Rajapakse, H. Wang, F. Azuaje, Multiple SVM-RFE for gene selection in cancer classification with expression data, IEEE Trans. Nanobiosci. 4 (2005) 228–234.

[14] M. Wahde, Z. Szallasi, A survey of methods for classification of gene expression data using evolutionary algorithms, Expert Rev. Mol. Diagn. 6 (2006) 101–110.

[15] P.A. Mundra, J.C. Rajapakse, SVM-RFE with MRMR filter for gene selection, IEEE Trans. Nanobiosci. 9 (2010) 31–37.

[16] Y. Leung, Y. Hung, A multiple-filter-multiple-wrapper approach to gene selection and microarray data classification, IEEE/ACM Trans. Comput. Biol. Bioinform. 7 (2010) 108–117.

[17] C. Shang, D. Barnes, Fuzzy-rough feature selection aided support vector machines for Mars image classification, Comput. Vis. Image Underst. 117 (2013) 202–213.

[18] A. Vavilin, K.-H. Jo, Automatic context analysis for image classification and retrieval based on optimal feature subset selection, Neurocomputing 116 (2013) 201–207.

[19] C.-Y. Chang, S.-J. Chen, M.-F. Tsai, Application of support-vector-machine-based method for feature selection and classification of thyroid nodules in ultrasound images, Pattern Recognit. 43 (2010) 3494–3506.

[20] S. Zhong, Y. Liu, Y. Liu, F. Lai Chung, A semantic no-reference image sharpness metric based on top-down and bottom-up saliency map modeling, in: IEEE 17th International Conference on Image Processing, pp. 1553–1556.

[21] L. Wang, Feature selection with kernel class separability, IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008) 1534–1546.

[22] O. Oyekoya, F. Stentiford, Exploring human eye behaviour using a model of visual attention, in: Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), vol. 4, IEEE Computer Society, Washington, DC, USA, pp. 945–948.

[23] O. Oyekoya, F. Stentiford, Perceptual image retrieval using eye movements, in: Advances in Machine Vision, Image Processing, and Pattern Analysis, 2006, pp. 281–289.

[24] Z. Liang, H. Fu, Y. Zhang, Z. Chi, D.D. Feng, Content-based image retrieval using a combination of visual features and eye tracking data, in: Proceedings of the 2010 Symposium on Eye-Tracking Research and Applications (ETRA '10), pp. 41–44.

[25] A. Draa, S. Meshoul, H. Talbi, M. Batouche, A quantum-inspired differential evolution algorithm for solving the n-queens problem, Int. Arab J. Inf. Technol. 7 (2010) 21–27.

[26] K.E.A. van de Sande, T. Gevers, C.G.M. Snoek, Evaluating color descriptors for object and scene recognition, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 1582–1596.

[27] Y.D. Chun, N.C. Kim, I.H. Jang, Content-based image retrieval using multiresolution color and texture features, IEEE Trans. Multimed. 10 (2008) 1073–1084.

[28] L. Nanni, J. Shi, S. Brahnam, A. Lumini, Protein classification using texture descriptors extracted from the protein backbone image, J. Theor. Biol. 24 (2010) 1024–1032.

[29] M. Carlin, Measuring the performance of shape similarity retrieval methods, Comput. Vis. Image Underst. 84 (2001) 44–61.

[30] Text of ISO/IEC 15938-3 Multimedia Content Description Interface, Part 3: Visual, Final Committee Draft, ISO/IEC/JTC1/SC29/WG11, 2001, Doc. N4062.

[31] M. Swain, D. Ballard, Color indexing, Int. J. Comput. Vis. 7 (1991) 11–32.

[32] L. Cieplinski, MPEG-7 color descriptors and their applications, in: Proceedings of the 9th International Conference on Computer Analysis of Images and Patterns, Seville, pp. 11–20.

[33] M. Stricker, M. Orengo, Similarity of color images, in: Proceedings of SPIE Storage and Retrieval for Image and Video Databases, pp. 381–392.

[34] J. Smith, S. Chang, Single color extraction and image query, in: IEEE International Conference on Image Processing, vol. 3, pp. 528–531.

[35] J. Hafner, H. Sawhney, W. Equitz, M. Flickner, W. Niblack, Efficient color histogram indexing for quadratic form distance functions, IEEE Trans. Pattern Anal. Mach. Intell. 17 (1995) 729–736.

[36] C.S. Won, D.K. Park, Image block classification and variable block size segmentation using a model-fitting criterion, Opt. Eng. 36 (1997) 2204–2209.

[37] B.S. Manjunath, J. Ohm, V.V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Trans. Circuits Syst. Video Technol. 11 (2001) 703–715.

[38] H. Tamura, S. Mori, T. Yamawaki, Texture features corresponding to visual perception, IEEE Trans. Syst. Man Cybern. 8 (1978) 460–473.

[39] J. Han, K. Ma, Rotation-invariant and scale-invariant Gabor features for texture image retrieval, Image Vis. Comput. 25 (2007) 1474–1481.

[40] M. Kreutz, H.B. Völpel, Scale-invariant image recognition based on higher-order autocorrelation features, Pattern Recognit. 29 (1996) 19–26.

[41] R.M. Haralick, Edge and region analysis for digital image data, Comput. Graph. Image Process. 12 (1980) 60–73.

[42] R.C. Gonzalez, R.E. Woods, Digital Image Processing, 2nd edition, Prentice Hall, New Jersey, 2002.

[43] R. Mukundan, K.R. Ramakrishnan, Fast computation of Legendre and Zernike moments, Pattern Recognit. 28 (1995) 1433–1442.

[44] C.-W. Chong, P. Raveendran, R. Mukundan, An efficient algorithm for fast computation of pseudo-Zernike moments, Int. J. Pattern Recognit. Artif. Intell. 17 (2003) 1011–1023.

[45] Z. Liang, H. Fu, Z. Chi, et al., Image pre-classification based on saliency map for image retrieval, in: IEEE 7th International Conference on Information, Communications and Signal Processing (ICICS 2009), pp. 1–5.

[46] Y.W. Jeong, J.B. Park, S.H. Jang, A new quantum-inspired binary PSO: application to unit commitment problems for power systems, IEEE Trans. Power Syst. 25 (2010) 1486–1495.

[47] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, V. Vapnik, Feature selection for SVMs, in: Proceedings of NIPS 2000, pp. 668–674.

[48] K.H. Han, J.H. Kim, Genetic quantum algorithm and its application to combinatorial optimization problem, in: Proceedings of the 2000 IEEE Congress on Evolutionary Computation, pp. 1354–1360.

[49] T.C. Lu, G.R. Yu, An adaptive population multi-objective quantum-inspired evolutionary algorithm for multi-objective 0/1 knapsack problems, Inf. Sci. (2013) 39–56.

[50] K.-H. Han, J.-H. Kim, Quantum-inspired evolutionary algorithms with a new termination criterion, Hε gate, and two-phase scheme, IEEE Trans. Evol. Comput. 8 (2004) 156–169.

[51] M. Luo, L. Luo, Feature selection for text classification using OR+SVM-RFE, in: Chinese Control and Decision Conference (CCDC), pp. 1648–1652.

[52] G. Griffin, A. Holub, P. Perona, Caltech-256 Object Category Dataset, Technical Report, California Institute of Technology, 2007.

[53] R. Kachouri, K. Djemal, H. Maaref, Adaptive feature selection for heterogeneous image databases, in: Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 26–31.



Xuan Zhou received his B.Sc. from Nanjing University Jinling College, China, in 2012, and is now a Master's student in Electronic Engineering at Soochow University, Suzhou, China. His current research focuses on image classification.

Xin Gao received his Ph.D. from Zhejiang University, China, in 2004, and is now a Researcher at the Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science. His research is mainly focused on medical imaging, evaluation of radiotherapy, and interventional diagnosis and treatment.

Jiajun Wang received his B.Sc. and M.Sc. degrees, both in Physics, from Soochow University, China, in 1992 and 1995, and his Ph.D. in Biomedical Engineering from Zhejiang University, in 1999. He is currently a Professor with the School of Electronic and Information Engineering, Soochow University, China. His research is mainly focused on medical imaging, image processing, pattern recognition, and bioinformatics. He has published more than 40 scientific journal and conference papers.

Hui Yu received her B.Sc. from Soochow University, China, in 2009, and is now a Master's student in Electronic Engineering at Soochow University, Suzhou, China. Her current research focuses on image classification.

Zhiyong Wang received his B.Eng. and M.Eng. degrees in Electronic Engineering from South China University of Technology, Guangzhou, China, and his Ph.D. degree from Hong Kong Polytechnic University, Hong Kong. He is a Senior Lecturer in the School of Information Technologies, The University of Sydney, which he joined as a Postdoctoral Research Fellow. His research interests include multimedia information processing, retrieval and management, Internet multimedia computing, human-centered multimedia computing, pattern recognition, and machine learning.

Zheru Chi received the B.Eng. and M.Eng. degrees from Zhejiang University, in 1982 and 1985, respectively, and the Ph.D. degree from the University of Sydney, in March 1994, all in Electrical Engineering. Between 1985 and 1989, he was a Faculty Member of the Department of Scientific Instruments, Zhejiang University. He worked as a Senior Research Assistant/Research Fellow in the Laboratory for Imaging Science and Engineering, University of Sydney, from April 1993 to January 1995. Since February 1995, he has been with Hong Kong Polytechnic University, where he is now an Associate Professor in the Department of Electronic and Information Engineering. Since 1997, he has served on the organization or program committees of a number of international conferences. He was an Associate Editor of the IEEE Transactions on Fuzzy Systems between 2008 and 2010, and is currently an editor of the International Journal of Information Acquisition. His research interests include image processing, pattern recognition, and computational intelligence. He has authored/co-authored one book and 11 book chapters, and published more than 190 technical papers. He is a member of the IEEE.