
Research Article
Multi-Information Flow CNN and Attribute-Aided Reranking for Person Reidentification

Haifeng Sang,1 Chuanzheng Wang,1 Dakuo He,2 and Qing Liu2

1School of Information Science and Engineering, Shenyang University of Technology, Shenyang, Liaoning 110870, China
2College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China

Correspondence should be addressed to Chuanzheng Wang; qing_0304@outlook.com

Received 7 November 2018; Accepted 8 January 2019; Published 6 February 2019

Academic Editor: Elio Masciari

Copyright © 2019 Haifeng Sang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper presents a multi-information flow convolutional neural network (MiF-CNN) model for person reidentification (re-id). It contains several specific multilayer convolutional structures, where the input and output of a convolutional layer are concatenated together on the channel dimension. With this idea, layers of the model can go deeper and feature maps can be reused by each subsequent layer. Inspired by image captioning, a person attribute recognition network is proposed based on a long-short-term memory network and an attention mechanism. By fusing the identification results of MiF-CNN and attribute recognition, this paper introduces the attribute-aided reranking algorithm to improve the accuracy of person re-id further. Experiments on the VIPeR, CUHK01, and Market1501 datasets verify that the proposed MiF-CNN can be trained sufficiently with small-scale datasets and obtains outstanding person re-id accuracy. Contrast experiments also confirm the availability of the attribute-aided reranking algorithm.

1. Introduction

Person reidentification (re-id) refers to matching and recognizing the identities of pedestrians captured by multiple cameras with nonoverlapping views, which is significant for improving the efficiency of security systems. Owing to the low resolution of cameras, it is hard to obtain discriminative face features, so current person re-id methods are mainly based on visual features of pedestrians, such as color and texture [1]. In practice, changes in viewpoint, pose, and illumination among different camera views, as well as partial occlusions and background clutter, pose a great challenge to person re-id [2].

Two principal classes of person re-id methods are feature representation and metric learning. Feature representation seeks to find features with stronger discrimination and better robustness to represent pedestrians. Many kinds of features have been utilized for this, among which appearance features are the simplest and the most popular. Color, texture, and shape features can be extracted to describe human appearance [3], such as the HSV color histogram, LBP texture, and Gabor features, and then used to reidentify people through the similarity among pedestrian features. Attribute features are also widely used in person re-id. Common attributes include gender, length of hair, and clothing. These attributes are highly intuitive and understandable descriptors which have proved to be successful in several tasks, such as face recognition and activity recognition [4]. Although attribute features are complicated in terms of extraction and expression, they contain rich semantic information and are more robust to illumination and viewpoint changes. Therefore, the combination of attribute features and low-level features can effectively improve the accuracy of person re-id [5]. Metric learning methods employ machine learning algorithms to learn a good similarity metric, which makes the feature similarity of the same pedestrian greater than that of different pedestrians.

In recent years, deep learning has shown great success in a variety of tasks in image classification and the frequency domain [6], where CNN is particularly outstanding. Compared with traditional methods, CNN has a stronger feature learning ability, and the learned features are more intrinsically representative of the original data, so it performs better in extracting image features. Two types of CNN models are commonly employed in the community. The first type is the classification model as used in image classification, and the second is the Siamese model using image pairs or triplets as input [7]. Most of the existing public datasets for person re-id contain only thousands of pedestrian image samples; such a small number of training samples can easily lead to overfitting, which limits the performance of the person re-id model. In addition, the deep neural networks for person re-id are similar in structure; that is, the feature maps extracted by a convolutional layer are directly fed into the next convolutional layer [8–12]. Such a structure usually ignores the correlation among the features of each layer, thus reducing the mobility of feature information to some extent. In the process of back propagation, as the number of layers in the neural network deepens, the gradient update information may attenuate exponentially and cause the vanishing gradient problem.

This work proposes a modified deep neural network model for person re-id that could reduce the overfitting caused by the lack of training samples. Moreover, this work aims to improve the identification accuracy of the person re-id network with the assistance of pedestrian attribute recognition. To this end, the contribution of this paper is three-fold. First, this paper designs a multi-information flow convolutional neural network (MiF-CNN) to solve the person re-id problem. The network contains a series of multi-information flow convolutional structures, which concatenate the input and output of each convolutional layer, realize the reuse of features, and enhance the feature information flow and gradient back propagation of the entire network. Second, this paper designs a person attribute recognition network (PARN) based on a long-short-term memory (LSTM) network and an attention mechanism. The PARN decodes pedestrian visual features extracted by MiF-CNN into attribute features and outputs the attribute words of each person. Third, this paper presents an attribute-aided reranking algorithm, which rematches attribute features among samples to help more positive samples rank higher in the rank list, so as to improve the identification accuracy further.

The rest of this paper is organized as follows. Section 2 reviews the state of the art for person re-id. Section 3 introduces the details of MiF-CNN. Section 4 shows the principle of the PARN. The proposed attribute-aided reranking algorithm is detailed in Section 5. The experimental results and analysis are given in Section 6. Finally, the conclusion and future works are discussed in Section 7.

2. Related Works

The early person re-id methods extract manually designed features to represent pedestrians. Farenzena et al. divided pedestrian images into multiple areas and extracted three complementary kinds of features: weighted color histograms, maximally stable color regions, and recurrent high-structured patches; these features were then matched to measure the similarity between pedestrian pairs [13]. Yang et al. proposed a novel salient color names based color descriptor (SCNCD), which was utilized to guarantee that a higher probability will be assigned to the color name nearest to the color. Based on SCNCD, color distributions in different color spaces were fused into the feature representation of a person [14]. Bazzani et al. proposed the asymmetry-based HPE descriptor, which accumulated HSV histograms of multiple pedestrian images as a global appearance feature and detected patches portraying highly informative recurrent ingredients in local regions as a local feature [15]. Wu et al. designed a novel gradient self-similarity (GSS) feature based on HOG to capture the patterns of pairwise similarities of local gradient patches; the combination of HOG and GSS achieved an improvement in person re-id accuracy [16].

Apart from manually designed low-level features, attribute features that represent mid-level semantic information apply to person re-id as well. Compared with low-level descriptors, attributes are more robust to image translations [7]. Layne et al. labeled 15 binary attributes for the VIPeR dataset and trained SVMs to detect attributes; they also learned a weighted L2-norm distance metric to fix each attribute and fused them with low-level visual features [17]. Wang et al. predicted a complete attribute vector by exploiting both visual features and marked attributes and obtained the overall ranking list by fusing the rank results from visual features and attribute vectors separately [18]. Chen et al. learned attributes of a person by a part-specific CNN and merged them with another identification CNN embedding in a triplet structure for the person re-id task [19]. Wang et al. proposed a deep neural network that contains an auto-encoder model to learn hidden attributes of a person from visual features in an unsupervised manner, which alleviated the requirement of massive annotation [20].

Deep learning has become popular for solving person re-id problems in recent years. Ahmed et al. presented a deep CNN architecture for person re-id; the architecture computed differences in feature values across the two views around a neighborhood of each feature location to add robustness to positional differences in corresponding features of the two input images [12]. Cheng et al. proposed a novel multichannel CNN: after the first layer of the CNN, features were divided into four equal parts so as to learn features for the respective body parts, and the proposed CNN was trained with an improved triplet loss function [21]. Lin et al. used ResNet [22] as the base network to learn low-level features and attributes jointly and trained the network by combining the person re-id loss and the attribute prediction loss [23]. Yan et al. proposed an attention block which learned part-level attention on different local regions and integrated the proposed block into existing CNN structures for training with the identity loss [24]. Inspired by the above works, this paper proposes a multi-information flow convolutional neural network to extract discriminative pedestrian features. In addition, this paper designs a person attribute recognition network based on LSTM and an attention mechanism for improving person re-id results with the assistance of attribute recognition.


3. Multi-Information Flow Convolutional Neural Network

The proposed MiF-CNN solves the person re-id problem from a classification perspective. The overall network structure is shown in Figure 1. The structure includes 2 shallow convolutional layers, 3 multi-information flow convolutional structures with a novel connection pattern, fully connected layers, max pooling layers, and a classification output layer. Low-level features of pedestrian images are first extracted by the 2 shallow convolutional layers. With the deeper multi-information flow convolutional structures, MiF-CNN extracts higher-level features. The final discriminative pedestrian feature vectors are obtained after reducing dimensions with pooling layers and integrating with fully connected layers.

3.1. Features Extraction. In MiF-CNN, all convolutional filters are 3 × 3 with stride 1. Batch normalization and the ReLU activation function are applied after each convolutional layer. The operation of a convolutional layer can be formulated as

$$z_j^{(l)} = \sum_i x_i^{(l-1)} \otimes w_j^{(l)}, \qquad x_j^{(l)} = \sigma\left(z_j^{(l)}\right), \qquad l > 1, \qquad (1)$$

where $x_i^{(l-1)}$ is the i-th feature map from the (l − 1)-th convolutional layer, $z_j^{(l)}$ is the convolutional output of $x_i^{(l-1)}$, $x_j^{(l)}$ is the j-th feature map from the l-th convolutional layer, $w_j^{(l)}$ is the filter of the j-th feature map in the l-th convolutional layer, and ⊗ represents the convolution operation. In other words, a neuron on the j-th feature map in the l-th convolutional layer sums the connected feature maps after convolution with the filter $w_j^{(l)}$ and maps the extracted features onto the j-th feature map of the l-th convolutional layer. σ(·) is the ReLU activation function, formulated as σ(x) = max(0, x). Because of batch normalization, the bias is ignored.
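As a concrete illustration of this building block, the following minimal sketch (assuming PyTorch; the padding and channel counts are illustrative choices, not values given in the paper) implements one 3 × 3 stride-1 convolution followed by batch normalization and ReLU, i.e., one application of equation (1):

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_channels: int, out_channels: int) -> nn.Sequential:
    """One MiF-CNN convolutional step: 3x3 conv (stride 1, no bias because of BN),
    batch normalization, then the ReLU activation sigma(x) = max(0, x)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1,
                  padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )

# Example: map a batch of 3-channel pedestrian crops to 32 feature maps
# (the input size 160 x 64 is hypothetical).
layer = conv_bn_relu(3, 32)
x = torch.randn(4, 3, 160, 64)
print(layer(x).shape)   # torch.Size([4, 32, 160, 64])
```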

3.2. Multi-Information Flow Convolutional Structure. In this structure, both the output and the input of the current convolutional layer are concatenated together and fed into the next convolutional layer; i.e., the input of each layer is the concatenation of the outputs of all previous layers. The details of the multi-information flow convolutional structure are shown in Figure 2.

This connection pattern makes the feature maps of each layer reusable by all subsequent layers in the forward propagation process, which lets the whole CNN model learn more feature information from pedestrian images. It can be considered as a special "data augmentation" on feature maps, so as to enhance the information mobility of the network. In the back propagation process, the gradient of the input of each layer contains the derivative of the loss function with respect to that input, which makes the propagation of the gradient more effective and the network easier to train. In the multi-information flow convolutional structure, the number of feature maps that each layer outputs is a constant value ρ, so the l-th layer takes $\rho_0 + \rho(l - 1)$ feature maps as input, where $\rho_0$ is the number of feature maps in the initial layer. Supposing the feature map of the i-th channel in the initial layer is $x_i^{(0)}$, where $i \in (1, \rho_0)$, the feature map of the j-th channel that the initial layer outputs can be expressed as

$$z_j^{(1)} = \sum_i x_i^{(0)} \otimes w_j^{(1)}, \qquad (2)$$

where $w_j^{(1)}$ is the weight of the initial layer. The output after the activation function is

$$a_j^{(1)} = \sigma\left(z_j^{(1)}\right), \qquad (3)$$

where $j \in (1, \rho)$. The feature maps that the l-th layer outputs can be expressed as

$$z_q^{(l)} = \sum_p x_p^{(l-1)} \otimes w_q^{(l)}, \qquad a_q^{(l)} = \sigma\left(z_q^{(l)}\right), \qquad l > 0, \qquad (4)$$

where $p \in (1, \rho_0 + \rho(l-1))$ and $q \in (1, \rho)$. The output of the l-th layer after the concatenate operation is

$$x_r^{(l)} = \left[x_p^{(l-1)}, a_q^{(l)}\right], \qquad l > 0, \qquad (5)$$

where $r \in (1, \rho_0 + \rho \cdot l)$ and [·, ·] represents the concatenate operation on the channel dimension.
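The following minimal sketch (assuming PyTorch; the values of ρ0, ρ, the number of layers, and the padding are illustrative assumptions) implements one multi-information flow convolutional structure following equations (2)-(5): every layer produces ρ feature maps, and its output is concatenated with its input on the channel dimension.

```python
import torch
import torch.nn as nn

class MiFBlock(nn.Module):
    """Sketch of one multi-information flow convolutional structure:
    every layer emits rho feature maps, and its input and output are
    concatenated on the channel dimension, so layer l receives
    rho0 + rho * (l - 1) channels (equations (2)-(5))."""

    def __init__(self, rho0: int, rho: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = rho0
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, rho, kernel_size=3, stride=1,
                          padding=1, bias=False),
                nn.BatchNorm2d(rho),
                nn.ReLU(inplace=True),
            ))
            channels += rho          # concatenation grows the channel count
        self.out_channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            a = layer(x)                     # a_q^(l) = sigma(z_q^(l))
            x = torch.cat([x, a], dim=1)     # x_r^(l) = [x_p^(l-1), a_q^(l)]
        return x

block = MiFBlock(rho0=32, rho=16, num_layers=4)
print(block(torch.randn(2, 32, 40, 16)).shape)   # 32 + 4*16 = 96 channels
```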

In the process of back propagation, suppose $\Delta x_r^{(l)}$ is the derivative of the loss function with respect to $x_r^{(l)}$. Because $x_r^{(l)}$ contains $x_p^{(l-1)}$ and $a_q^{(l)}$, it produces two parts of the gradient, as shown below:

$$\Delta a_q^{(l)} = \Delta x_r^{(l)} \cdot \frac{\partial x_r^{(l)}}{\partial a_q^{(l)}}, \qquad \Delta x_p^{(l-1)} = \Delta x_r^{(l)} \cdot \frac{\partial x_r^{(l)}}{\partial x_p^{(l-1)}}, \qquad (6)$$

where $\Delta a_q^{(l)}$ is the gradient of the output of the l-th layer after the activation function and $\Delta x_p^{(l-1)}$ is the gradient of the output of the (l − 1)-th layer. The gradient of the weights in the l-th layer is

$$\Delta z_q^{(l)} = \Delta a_q^{(l)} \cdot \frac{\partial a_q^{(l)}}{\partial z_q^{(l)}} = \Delta x_r^{(l)} \cdot \frac{\partial x_r^{(l)}}{\partial a_q^{(l)}} \cdot \sigma'\left(z_q^{(l)}\right), \qquad \Delta w_q^{(l)} = \Delta z_q^{(l)} \cdot \frac{\partial z_q^{(l)}}{\partial w_q^{(l)}} = \Delta z_q^{(l)} \cdot x_p^{(l-1)}, \qquad (7)$$

where $\Delta z_q^{(l)}$ is the gradient of the convolution result in the l-th layer, $\sigma'(z_q^{(l)})$ is the derivative of the activation function with respect to $z_q^{(l)}$, and $\Delta w_q^{(l)}$ is the gradient of the weights in the l-th layer. The network utilizes $\Delta w_q^{(l)}$ to update the weights of each layer, which is formulated as

$$\left(w_q^{(l)}\right)_{\text{new}} = w_q^{(l)} - \eta \cdot \Delta w_q^{(l)}, \qquad (8)$$

where η is the learning rate. The gradient keeps back propagating to the (l − 1)-th layer; the gradient of $x_p^{(l-1)}$ is

$$\Delta x_p^{(l-1)} = \Delta z_q^{(l)} \cdot \frac{\partial z_q^{(l)}}{\partial x_p^{(l-1)}} = \Delta z_q^{(l)} \cdot w_q^{(l)}. \qquad (9)$$

As shown in equations (6) and (9), the loss function produces two flows of gradient with respect to the output of each convolutional layer, which makes error information propagate more effectively through the network and restrains the vanishing gradient to a certain extent.
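The two gradient flows of equations (6) and (9) can be observed directly with automatic differentiation; in the short sketch below (assuming PyTorch; the shapes are arbitrary), the gradient reaching x accumulates both the direct path through the concatenation and the path through the convolutional layer.

```python
import torch

# x feeds the concatenation both directly and through the layer output a,
# so the loss gradient reaches x along two paths (equations (6) and (9)).
x = torch.randn(1, 4, 8, 8, requires_grad=True)
conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False)
a = torch.relu(conv(x))
out = torch.cat([x, a], dim=1)      # x_r^(l) = [x, a]
loss = out.pow(2).mean()
loss.backward()
# x.grad accumulates the direct term d(loss)/dx plus the term routed
# back through a and the convolution weights.
print(x.grad.shape)                  # torch.Size([1, 4, 8, 8])
```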

The proposed MiF-CNN includes three multi-information flow convolutional structures. Between every two multi-information flow convolutional structures, a middle pooling layer is applied to compress the redundant features. The hyperparameter ρ is set to a small value; when a new multi-information flow convolutional structure begins, ρ is doubled. Such a design makes each convolutional layer learn a small quantity of features and reduces the redundant features, so as to optimize the efficiency of the network. With deeper layers, the network can learn more high-level and complex pedestrian features and improve the final identification accuracy.
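Putting the pieces together, the sketch below (assuming PyTorch) assembles a network in the spirit of Figure 1: two shallow convolutions, three multi-information flow structures with pooling between them and a doubled ρ for each new structure, a 1024-dimensional fully connected layer, and a k-way classifier. The specific depths, widths, and the global pooling choice are assumptions for illustration, not the paper's exact hyperparameters.

```python
import torch
import torch.nn as nn

class MiFBlock(nn.Module):
    """Compact multi-information flow structure (see the earlier sketch)."""
    def __init__(self, in_ch, rho, num_layers):
        super().__init__()
        self.layers, ch = nn.ModuleList(), in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, rho, 3, padding=1, bias=False),
                nn.BatchNorm2d(rho), nn.ReLU(inplace=True)))
            ch += rho
        self.out_channels = ch

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)
        return x

class MiFCNN(nn.Module):
    """Sketch of the overall architecture in Figure 1: two shallow convolutions,
    three MiF structures with 2x2 pooling in between (rho doubled per structure),
    a 1024-d fully connected layer, and a k-way classifier."""
    def __init__(self, num_ids, rho=16, layers_per_block=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True))
        blocks, ch = [], 32
        for i in range(3):
            block = MiFBlock(ch, rho * (2 ** i), layers_per_block)
            blocks.append(block)
            ch = block.out_channels
            if i < 2:
                blocks.append(nn.MaxPool2d(2))   # middle pooling layer
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveMaxPool2d(1)
        self.fc = nn.Linear(ch, 1024)
        self.classifier = nn.Linear(1024, num_ids)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        feat = self.fc(self.pool(x).flatten(1))   # pedestrian feature vector
        return feat, self.classifier(feat)

model = MiFCNN(num_ids=751)
feat, logits = model(torch.randn(2, 3, 160, 64))
print(feat.shape, logits.shape)   # torch.Size([2, 1024]) torch.Size([2, 751])
```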

3.3. Loss Function. Current deep learning algorithms usually use the cross-entropy loss as the cost function, which is formulated as

$$L_S = -\frac{1}{M} \sum_{i=1}^{M} y^{(i)} \log \frac{e^{\theta_{y^{(i)}}^{T} x^{(i)}}}{\sum_{j=1}^{k} e^{\theta_j^{T} x^{(i)}}}, \qquad (10)$$

where $y^{(i)}$ is the ground-truth pedestrian category in the training set, θ is the parameter of the last fully connected layer, $x^{(i)}$ is the feature vector of a training sample, k is the number of pedestrian categories in the training set, and M is the batch size.

However, in practice, when using the cross-entropy loss alone, if the quality of the extracted features is not good enough, the intraclass distance can become greater than the interclass distance. Aiming at this problem, Wen et al. proposed the center loss in 2016 [25]. The combination of cross-entropy loss and center loss can enhance the discrimination and generalization ability of the network. The center loss is defined as follows:

$$L_C = \frac{1}{2M} \sum_{i=1}^{M} \left\| x_i - c_j \right\|_2^2, \qquad (11)$$

where $c_j$ is the center of the j-th pedestrian's features and $x_i$ is the feature vector of a pedestrian. The center loss minimizes the distance between a feature and its center in order to reduce the intraclass distance. The center $c_j$ is updated with equation (12):

$$\Delta c_j^{t} = \frac{\sum_{i=1}^{M} \delta\left(y_i = j\right) \cdot \left(c_j^{t} - x_i\right)}{1 + \sum_{i=1}^{M} \delta\left(y_i = j\right)}, \qquad c_j^{t+1} = c_j^{t} - \beta \cdot \Delta c_j^{t}, \qquad (12)$$

where β is the update rate and δ(y_i = j) is 1 if the prediction equals the ground truth and 0 otherwise. That is to say, a center is updated only when the network predicts correctly.
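A minimal sketch of the center loss of equation (11) and the center update of equation (12) is given below (assuming PyTorch; the weighting between the two losses and the update rate β are illustrative, and for simplicity the update uses all batch features of a class rather than gating on correct predictions):

```python
import torch
import torch.nn.functional as F

def center_loss(features, labels, centers):
    """L_C = 1/(2M) * sum_i ||x_i - c_{y_i}||_2^2  (equation (11))."""
    return 0.5 * (features - centers[labels]).pow(2).sum(dim=1).mean()

def update_centers(features, labels, centers, beta=0.5):
    """Equation (12): move each class center toward the batch features
    assigned to that class (prediction-correctness gating omitted here)."""
    new_centers = centers.clone()
    for j in labels.unique():
        mask = labels == j
        delta = (centers[j] - features[mask]).sum(dim=0) / (1 + mask.sum())
        new_centers[j] = centers[j] - beta * delta
    return new_centers

features = torch.randn(8, 1024)           # batch of MiF-CNN feature vectors
labels = torch.randint(0, 751, (8,))      # pedestrian identities
centers = torch.zeros(751, 1024)
loss = F.cross_entropy(torch.randn(8, 751), labels) \
       + 0.01 * center_loss(features, labels, centers)   # weight assumed
centers = update_centers(features, labels, centers)
print(loss.item())
```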

Figure 2: Multi-information flow convolutional structure, where "&" represents the concatenate operation on the channel dimension (the l-th 3 × 3 convolutional layer receives ρ0 + ρ(l − 1) feature maps and outputs ρ).

Figure 1: Architecture of MiF-CNN (input image, two 3 × 3 convolutions, three multi-information flow structures with 2 × 2 pooling layers between them, FC 1024, and softmax over k), where the green rectangles denote the multi-information flow convolutional structures, the orange rectangles denote the middle pooling layers, and k is the number of pedestrian categories in the training set.


4. Person Attributes Recognition Network

The rank list of MiF-CNN is shown in Figure 3. In the incorrect rank-1 identification results (pedestrians B and C), there is a big difference in attribute features between the top-ranking negative samples and the query image, including gender, clothing, whether a handbag is carried, and so on. Hence, this paper recognizes person attributes to improve the accuracy of person re-id.

Based on the encoder-decoder idea, Xu et al. [26] proposed a neural network model that can learn and generate captions for images. The model utilized a CNN as an encoder to extract image features, which were then fed into a recurrent neural network (RNN) to be decoded into language captions of the images. Inspired by that, this paper presents a person attribute recognition network (PARN) with LSTM and an attention mechanism. The proposed PARN takes the pedestrian features extracted by MiF-CNN as input and outputs the attribute information of pedestrian images. The architecture of PARN is demonstrated in Figure 4.

4.1. Input of PARN. In PARN, the input is the feature maps before the last fully connected layer of MiF-CNN. The input feature is split into n feature vectors, each of which corresponds to a part of the image. Each feature vector is a $D_X$-dimensional vector, which is represented as

$$X = \{x_1, x_2, \ldots, x_n\}, \qquad x_i \in \mathbb{R}^{D_X}. \qquad (13)$$

Referring to natural language processing methods, PARN transforms the words in person attribute labels into word embeddings. As a part of the input of PARN, each word embedding is a $D_Y$-dimensional vector:

$$y = \{y_1, y_2, \ldots, y_m\}, \qquad y_i \in \mathbb{R}^{D_Y}, \qquad (14)$$

where m is the number of attribute words for each pedestrian image and $y_i$ is the word embedding corresponding to each attribute word.

For attribute words, common one-hot encoding considers each word as an isolated unit, which ignores the correlation among words. In contrast, a word embedding represents each word as a continuous dense vector, which places correlated words closer in the embedding space.
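A small sketch of this input representation (assuming PyTorch; the vocabulary and the embedding dimension D_Y = 128 are hypothetical) maps attribute words to dense, trainable vectors:

```python
import torch
import torch.nn as nn

# Hypothetical attribute vocabulary; each attribute word becomes a dense
# D_Y-dimensional vector instead of a sparse one-hot code, so related words
# ("male", "short hair") can end up close in the embedding space.
vocab = ["male", "female", "short hair", "long hair", "pants", "dress"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=128)
y = embedding(torch.tensor([word_to_idx["male"], word_to_idx["pants"]]))
print(y.shape)   # torch.Size([2, 128])
```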

4.2. Attention Mechanism. Attention mechanisms have been widely used in natural language processing and computer vision. By measuring the correlation between the output and different parts of the input, an attention mechanism gives different weights to different parts of the input, enabling the network to use the more important feature information for prediction and reducing the dimension of the input data [27].

In practice, a certain attribute of a pedestrian corresponds only to a certain part of the image. For example, when recognizing whether a pedestrian is wearing a hat, people usually pay attention only to the area above the head of the pedestrian instead of other areas irrelevant to the attribute. Therefore, before feeding the image features into the LSTM network, an attention mechanism is introduced to calculate the correlation between different positions of the image features and the hidden state of the LSTM at the previous time. The schematic diagram of the attention mechanism is shown in Figure 5.

At time t, a fully connected layer and the tanh function are used to integrate the information of the input feature vector and the hidden state of the LSTM at the previous time, which is formulated as

$$v_i = \tanh\left(W_{attX} x_i + W_{atth} h_{t-1}\right), \qquad (15)$$

where $W_{attX}$ and $W_{atth}$ represent the fully connected layer weights for $x_i$ and $h_{t-1}$, respectively. The attention weight $\alpha_i$ is obtained by applying the softmax function to the score vector $v_i$:

$$\alpha_i = \mathrm{softmax}\left(W_{attv} v_i\right), \qquad (16)$$

where $W_{attv}$ represents the fully connected layer weights for $v_i$. The output Z is the weighted sum over all $\alpha_i$:

$$Z = \sum_{i=1}^{n} \alpha_i x_i. \qquad (17)$$

The feature Z highlights the local features that are helpful for prediction and suppresses the local features that contribute little, which reduces the dimension of the features to some extent and makes the LSTM network focus on the parts of the input features that have greater correlation with the prediction while recognizing person attributes, thus improving the efficiency and prediction accuracy of the network.
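The following minimal sketch (assuming PyTorch; the feature, hidden, and attention dimensions are illustrative) implements equations (15)-(17) as a soft additive attention module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    """Additive attention of equations (15)-(17): score every spatial feature
    vector x_i against the previous LSTM hidden state, softmax the scores,
    and return the weighted sum Z."""

    def __init__(self, feat_dim: int, hidden_dim: int, att_dim: int):
        super().__init__()
        self.W_attX = nn.Linear(feat_dim, att_dim, bias=False)
        self.W_atth = nn.Linear(hidden_dim, att_dim, bias=False)
        self.W_attv = nn.Linear(att_dim, 1, bias=False)

    def forward(self, X: torch.Tensor, h_prev: torch.Tensor):
        # X: (batch, n, feat_dim) part features; h_prev: (batch, hidden_dim)
        v = torch.tanh(self.W_attX(X) + self.W_atth(h_prev).unsqueeze(1))  # eq. (15)
        alpha = F.softmax(self.W_attv(v).squeeze(-1), dim=1)               # eq. (16)
        Z = (alpha.unsqueeze(-1) * X).sum(dim=1)                           # eq. (17)
        return Z, alpha

att = SoftAttention(feat_dim=256, hidden_dim=512, att_dim=256)
Z, alpha = att(torch.randn(2, 40, 256), torch.randn(2, 512))
print(Z.shape, alpha.shape)   # torch.Size([2, 256]) torch.Size([2, 40])
```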

4.3. LSTM. A normal RNN updates network parameters with back propagation, which easily suffers from the vanishing gradient when there is a long gap between the relevant information and the current position to predict. The LSTM network solves this problem well because of its structural advantage. The architecture of an LSTM unit is shown in Figure 6.

Here, c is the cell state for storing and transferring information; f is the forget gate, which decides what should be abandoned in the cell state; i is the input gate, which decides what should be stored in the cell state; g represents a vector of new candidate values that should be added to the cell state; o is the output gate, which decides what parts of the cell state should be output to the next time; h is the hidden state of the LSTM; x is the input of the LSTM; and t denotes the current time.

In PARN, at any time t, the input of the LSTM consists of two parts: the word embedding $y_{t-1}$ from the previous time and the image feature $Z_t$ at the current time. The operation of the LSTM in PARN is formulated as

$$\begin{aligned}
f &= \mathrm{sigmoid}\left(W_f\left[h_{t-1}, y_{t-1}, Z_t\right] + b_f\right),\\
i &= \mathrm{sigmoid}\left(W_i\left[h_{t-1}, y_{t-1}, Z_t\right] + b_i\right),\\
g &= \mathrm{sigmoid}\left(W_g\left[h_{t-1}, y_{t-1}, Z_t\right] + b_g\right),\\
o &= \mathrm{sigmoid}\left(W_o\left[h_{t-1}, y_{t-1}, Z_t\right] + b_o\right),\\
c_t &= f \odot c_{t-1} + i \odot g,\\
h_t &= o \odot \tanh\left(c_t\right),
\end{aligned} \qquad (18)$$

where W and b represent the weights and biases of each gate, respectively. The sigmoid function outputs a number between 0 and 1 to decide the quantity of values that pass through these gates, and the tanh function is the activation function of the input. This design makes the LSTM give different weights to information at different times, so that it can choose which part of the information to remember or forget in a long-term sequence.

The time step of the LSTM in PARN is set to the number of attributes of each pedestrian image, m. The output hidden state $h_t$ is passed through a fully connected layer and a softmax function at each time step to predict the pedestrian attribute word.

It is worth noting that at each time step, the input word embedding of the LSTM is the one that the LSTM predicted at the previous time step, which takes advantage of the LSTM in solving long-term sequence problems.

Figure 4: Architecture of PARN: the feature maps X from MiF-CNN are fed through the attention module into the LSTM at each time step t = 1, ..., m, together with the previous hidden state $h_{t-1}$ and word embedding $y_{t-1}$, to output the attribute predictions $u_1, \ldots, u_m$.

Figure 3: Identification results of MiF-CNN (the green bounding boxes represent positive samples in the gallery).


Because of the correlation among the attributes of a pedestrian (for example, a pedestrian with the attribute "male" generally also has the attributes "short hair" and "pants"), the LSTM can utilize the previous attribute result to predict the current attribute more accurately.
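One decoding step of the PARN LSTM, written directly from equation (18) as printed (sigmoid on all four gate pre-activations), can be sketched as follows (assuming PyTorch; the dimensions are illustrative, and in a full model the attribute word would be predicted from h_t by a fully connected layer and softmax):

```python
import torch
import torch.nn as nn

class PARNLSTMStep(nn.Module):
    """One decoding step of equation (18): the gate inputs are the
    concatenation [h_{t-1}, y_{t-1}, Z_t]."""

    def __init__(self, hidden_dim: int, embed_dim: int, feat_dim: int):
        super().__init__()
        in_dim = hidden_dim + embed_dim + feat_dim
        self.W_f = nn.Linear(in_dim, hidden_dim)
        self.W_i = nn.Linear(in_dim, hidden_dim)
        self.W_g = nn.Linear(in_dim, hidden_dim)
        self.W_o = nn.Linear(in_dim, hidden_dim)

    def forward(self, h_prev, c_prev, y_prev, Z_t):
        u = torch.cat([h_prev, y_prev, Z_t], dim=1)
        f = torch.sigmoid(self.W_f(u))      # forget gate
        i = torch.sigmoid(self.W_i(u))      # input gate
        g = torch.sigmoid(self.W_g(u))      # candidate values, as printed
        o = torch.sigmoid(self.W_o(u))      # output gate
        c = f * c_prev + i * g              # cell state update
        h = o * torch.tanh(c)               # hidden state
        return h, c

step = PARNLSTMStep(hidden_dim=512, embed_dim=128, feat_dim=256)
h, c = step(torch.zeros(2, 512), torch.zeros(2, 512),
            torch.zeros(2, 128), torch.randn(2, 256))
print(h.shape)   # torch.Size([2, 512])
```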

4.4. Network Optimization. The loss function of PARN includes two parts. One is the cross-entropy between the prediction and the ground truth of the network, which is expressed as

$$L_{sa} = \sum_{i=1}^{m} u' \log \frac{e^{\theta_{ai}^{T} u}}{\sum_{j=1}^{n_w} e^{\theta_{aj}^{T} u}}, \qquad (19)$$

where u is the prediction value of the network, u′ is the ground truth of the pedestrian labels, m is the time step of the LSTM, and $n_w$ is the total number of attribute words in each dataset. As shown in equation (19), the loss of a single image in a single epoch is the loss value accumulated over m iterations. The other part is the loss function of the attention mechanism, as described in [40]:

$$L_{\alpha} = \frac{1}{2n} \sum_{i=1}^{n} \left(1 - \sum_{j=1}^{m} \alpha_i^{j}\right)^2, \qquad (20)$$

where $\alpha_i^{j}$ represents the attention weight of the image feature $x_i$ at time step j and n is the number of parts in the image feature. The objective function of PARN to optimize is

$$J = -\frac{1}{M} \sum_{i=1}^{M} \left(L_{sa}^{i} + \lambda \cdot L_{\alpha}^{i}\right), \qquad (21)$$

where M is the batch size and λ is the weight of $L_{\alpha}$ in the objective function.
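A minimal sketch of the objective of equations (19)-(21) is given below (assuming PyTorch; the standard negated cross-entropy is used, so the minus sign of equation (21) is already folded in, and the tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def parn_objective(logits, targets, alphas, lam=1.0):
    """Sketch of equations (19)-(21) for one batch.
    logits:  (M, m, n_w) attribute-word scores over the m time steps
    targets: (M, m)      ground-truth attribute-word indices
    alphas:  (M, m, n)   attention weights per time step and image part
    """
    M, m, n_w = logits.shape
    n = alphas.shape[2]
    # L_sa: cross-entropy accumulated over the m time steps of each image.
    L_sa = F.cross_entropy(logits.reshape(M * m, n_w),
                           targets.reshape(M * m), reduction='sum') / M
    # L_alpha: pushes the attention on each part to sum to about 1 over time
    # (the doubly stochastic regularizer described in [40] and used in [26]).
    L_alpha = (((1.0 - alphas.sum(dim=1)) ** 2).sum(dim=1) / (2 * n)).mean()
    return L_sa + lam * L_alpha

loss = parn_objective(torch.randn(4, 8, 30),
                      torch.randint(0, 30, (4, 8)),
                      torch.rand(4, 8, 40))
print(loss.item())
```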

5. Attribute-Aided Reranking Algorithm

Given a probe image $q_0$ and a gallery image set $G = \{g_1, g_2, \ldots, g_N\}$ including N pedestrian images, the initial similarity distance between $q_0$ and $g_i$ is computed with the Euclidean distance as

$$d\left(q_0, g_i\right) = \sqrt{\sum_{i=1}^{N} \left(x_{q_0} - x_{g_i}\right)^2}, \qquad (22)$$

where $x_{q_0}$ and $x_{g_i}$ represent the features of $q_0$ and $g_i$, respectively, extracted by MiF-CNN. The initial rank list $R_0 = \{g_1^{0'}, g_2^{0'}, \ldots, g_N^{0'}\}$ is obtained by sorting the similarity distances $d(q_0, g_i)$ in ascending order, where $d(q_0, g_i^{0'}) < d(q_0, g_{i+1}^{0'})$. If the top-1 gallery image $g_1^{0'}$ is the positive sample, rank 1 is correct. If the positive sample is not the top-1 gallery image but lies within the top 5, rank 5 is correct.

The attribute-aided reranking algorithm reranks the rank list $R_0$ according to the attribute feature similarity between $q_0$ and $g_i$, so that more positive gallery images get a higher ranking in $R_0$, which can improve the performance of person re-id. To be specific, when rank 5 is correct but rank 1 is not, the proposed algorithm distributes an initial feature score $s_f$ to the top-5 gallery images $g_1^{0'} \sim g_5^{0'}$ according to their ranking in $R_0$: the higher the ranking, the higher $s_f$. Then the proposed algorithm distributes an attribute score $s_a$ to $g_1^{0'} \sim g_5^{0'}$ according to the number of their attributes that are the same as those of $q_0$: the more identical attributes, the higher $s_a$. The total score of each gallery image is $s = s_f(1 - c) + s_a c$, where c is the score weight. The reranked list $R_0^{*}$ is obtained by reranking $g_1^{0'} \sim g_5^{0'}$ in descending order of their total score s. For M query images, the whole processing of the attribute-aided reranking algorithm is shown as Algorithm 1.
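The rescoring step at the core of the algorithm can be sketched as follows (plain Python/NumPy; the exact score scales for s_f and s_a and the value of the weight are not specified in the paper and are assumptions here):

```python
import numpy as np

def attribute_aided_rerank(rank_list, query_attrs, gallery_attrs, gamma=0.5, top_k=5):
    """Re-sort the top-k gallery entries by s = s_f*(1 - gamma) + s_a*gamma,
    where s_f comes from the current ranking and s_a from the number of
    attributes shared with the query."""
    top = rank_list[:top_k]
    s_f = np.linspace(1.0, 0.0, num=len(top))                  # higher rank -> higher s_f
    s_a = np.array([len(query_attrs & gallery_attrs[g]) for g in top], dtype=float)
    if s_a.max() > 0:
        s_a = s_a / s_a.max()                                   # normalize to [0, 1]
    s = s_f * (1.0 - gamma) + s_a * gamma
    reranked_top = [g for _, g in sorted(zip(-s, top))]         # descending total score
    return reranked_top + rank_list[top_k:]

rank_list = ["g3", "g7", "g1", "g9", "g2", "g5"]
query_attrs = {"male", "short hair", "backpack"}
gallery_attrs = {"g3": {"female", "handbag"}, "g7": {"male", "short hair", "backpack"},
                 "g1": {"male"}, "g9": {"male", "short hair"}, "g2": set(), "g5": {"male"}}
print(attribute_aided_rerank(rank_list, query_attrs, gallery_attrs))
```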

6. Experiments

This paper first evaluates the attribute recognition accuracy of PARN on public person re-id datasets and compares the results with other person attribute recognition methods. Then this paper evaluates the performance of MiF-CNN on three person re-id datasets and the availability of the proposed attribute-aided reranking algorithm for improving the accuracy of person re-id. The analysis of the experimental results is given after comparing our results with state-of-the-art person re-id methods.

Figure 5: Schematic diagram of the attention mechanism. The feature vectors $x_1, x_2, \ldots, x_n$ and the previous hidden state $h_{t-1}$ are combined through tanh layers with weights $W_{att}$ to produce the score vectors $v_1, \ldots, v_n$; softmax layers then yield the attention weights $\alpha_1, \ldots, \alpha_n$, which weight X into the output Z.

Figure 6: Architecture of the LSTM unit, showing the gates f, i, g, and o with weights $W_f$, $W_i$, $W_g$, and $W_o$ (three sigmoid activations and one tanh), the cell state update $c_{t-1} \to c_t$, and the hidden state update $h_{t-1} \to h_t$ for input $x_t$.

6.1. Datasets and Evaluation Protocol. This paper evaluates the proposed methods on three challenging public person re-id datasets, including VIPeR [28], CUHK01 [29], and Market1501 [30].

VIPeR Dataset. It contains 1264 images of 632 persons. Because of the lack of image samples, the low resolution, and the great changes in pose, view, and illumination, it is one of the most difficult person re-id datasets. In the experiments, the VIPeR dataset is randomly split into two parts: the images of 316 persons for training and the images of the remaining 316 persons for testing.

CUHK01 Dataset. It consists of 3884 images of 971 persons, captured by two cameras with different views. 485 pedestrians are randomly chosen for training and the other 486 pedestrians for testing.

Market1501 Dataset. It is one of the largest person re-id datasets and includes 32268 images of 1501 persons. Each person has a number of images captured by six cameras with different views. The dataset is divided into two parts: 12936 images of 751 persons for training and 19732 images of 750 persons for testing.

Evaluation Protocol. For the single-shot datasets VIPeR and CUHK01, the cumulative match characteristic (CMC) is used to record the ranks of correct identification [31]. For the multishot dataset Market1501, apart from CMC, mean Average Precision (mAP) is utilized to evaluate the performance of the proposed methods.
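A minimal sketch of the evaluation protocol (NumPy; the camera-id filtering used on Market1501 is omitted, and the toy data are hypothetical) computes CMC rank-k accuracy and mAP from a query-gallery distance matrix:

```python
import numpy as np

def cmc_and_map(dist, query_ids, gallery_ids, ranks=(1, 5, 10)):
    """dist: (num_query, num_gallery) distance matrix. CMC rank-k counts
    queries whose first correct match appears within the top k; mAP averages
    the per-query average precision over all correct matches."""
    order = np.argsort(dist, axis=1)
    matches = gallery_ids[order] == query_ids[:, None]        # boolean hit matrix
    cmc = {k: float(matches[:, :k].any(axis=1).mean()) for k in ranks}
    aps = []
    for row in matches:
        hits = np.where(row)[0]
        if hits.size == 0:
            continue
        precision = np.arange(1, hits.size + 1) / (hits + 1)  # precision at each hit
        aps.append(precision.mean())
    return cmc, float(np.mean(aps))

dist = np.random.rand(4, 10)
query_ids = np.array([0, 1, 2, 3])
gallery_ids = np.arange(10) % 4
cmc, mAP = cmc_and_map(dist, query_ids, gallery_ids)
print(cmc, mAP)
```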

6.2. Experimental Results on Attribute Recognition. The attributes in PARN refer to pedestrian identity-level attributes. For the VIPeR dataset, the attributes include "gender", "length of hair", "lower clothing", "upper clothing", "backpack or not", and "carrying anything or not". For the CUHK01 dataset, the attributes include "gender", "length of hair", "backpack or not", "handbag or not", "color of upper clothing", and "color of lower clothing". For the Market1501 dataset, the attributes contain "gender", "length of hair", "wearing hat or not", "lower clothing", "backpack or not", "handbag or not", "length of sleeve", and "length of lower clothing". Each person in each dataset owns an attribute word list, as shown in Figure 7.

This paper evaluates the attribute recognition accuracy of PARN on the Market1501 dataset and compares the results with the outstanding person attribute recognition method APR [23]. The experimental results are reported in Table 1, where "Lslv" represents the "length of sleeve" and "Llow" represents the "length of lower clothing".

It can be observed from Table 1 that PARN obtains competitive recognition accuracy on the 8 attributes: the accuracies of "wearing hat or not" and "handbag or not" and the mean accuracy are higher than those of APR, whereas the accuracy of the other attributes is close to that of APR. The comparison with APR demonstrates the outstanding attribute recognition performance of PARN, which plays an important role in improving the accuracy of person re-id.

6.3. Experimental Results on Person Re-id. This paper evaluates the performance of the proposed methods on the three datasets. The experimental results of the proposed methods and other state-of-the-art methods are presented in Table 2.

As shown in Table 2, the proposed MiF-CNN model obtains a great performance among state-of-the-art methods; moreover, the comparison between MiF and MiF + PARN demonstrates that the proposed attribute-aided reranking algorithm helps increase the person re-id accuracy. The improvement is especially obvious on the VIPeR and CUHK01 datasets, where the identification accuracy at rank 1, rank 5, and rank 10 improves by 6.02%, 6.33%, and 2.22% and by 4.94%, 4.94%, and 2.26%, respectively. The proposed MiF-CNN model with the attribute-aided reranking algorithm gets the best rank 5 accuracy on the VIPeR dataset, the best rank 1 and rank 5 accuracy on the CUHK01 dataset, and the best mAP on the Market1501 dataset among the compared methods.

6.4. Analysis of Experimental Results. Because of the small quantity of training samples in the VIPeR and CUHK01 datasets, a deep CNN model is hard to train and easily suffers from overfitting. To handle this, the FT-JSTL + DGD method, based on a deep CNN, learned deep features from multiple domains jointly by merging all the datasets together and fine-tuned the pretrained model on VIPeR and CUHK01 separately. The structure and hyperparameters of the deep CNN model in the M3TCP method need to be adjusted manually to adapt to the scale of different datasets, so as to overcome the training problem of the deep neural network.

In contrast, the proposed MiF-CNN has a fixed structure and can be trained on smaller datasets directly without any fine-tuning. In spite of its deep structure, the model converges quickly and well. MiF-CNN without the attribute-aided reranking algorithm outperforms FT-JSTL + DGD and M3TCP by 0.27% and 13.17%, respectively, at rank 1 accuracy on the CUHK01 dataset. It also outperforms FT-JSTL + DGD by 2.22% at rank 1 accuracy on the VIPeR dataset, which indicates that the proposed MiF-CNN model has an excellent ability to reduce overfitting, generalizes well, and achieves an outstanding performance in person re-id.

Figure 8 demonstrates the rank lists of MiF and MiF + PARN. It can be seen that MiF-CNN with the attribute-aided reranking method enables lower-ranked positive samples in the rank list of MiF-CNN to obtain a higher ranking, so as to improve the accuracy of person re-id, which verifies the effectiveness of the attribute-aided reranking algorithm for improving the performance of the person re-id model.

7. Conclusion

This paper studies and discusses the training problem of deep neural networks in the person re-id task and the use of pedestrian attributes for further improving the accuracy of person re-id. The proposed MiF-CNN model realizes the reuse of feature maps and gradient information, which enhances the feature mobility of the network and improves the efficiency of gradient propagation. The designed person attribute recognition network uses an attention mechanism to measure the correlation between the input feature maps and the hidden state of the LSTM at the previous time in order to reduce the dimension of the features. It also employs LSTM to decode image features into pedestrian attributes. On the basis of the attribute recognition results, the attribute-aided reranking algorithm is presented, which rematches attribute features among samples to help more positive samples rank higher in the rank list, so as to improve the identification accuracy further.

The experimental results on three public person re-id datasets indicate the outstanding performance of the MiF-CNN model in person re-id. The attribute-aided reranking algorithm makes a major contribution to improving the accuracy of person re-id. In the future, more improvement and optimization will be done on the pedestrian attribute recognition network. Moreover, captions of person images or videos can be obtained by the LSTM model with natural language processing ideas, which will make person re-id methods more significant.

Input: Initial rank list R0
Output: The reranked list R*

(1) for i = 1, 2, ..., M do
(2)   if rank 1 is correct
(3)     R* = R0
(4)   end if
(5)   elif rank 5 is correct
(6)     Distribute the initial feature score s_f to the top-5 gallery images g_1^{i'} ~ g_5^{i'} according to their ranking in R0
(7)     Distribute the attribute score s_a to g_1^{i'} ~ g_5^{i'} according to the number of their attributes that are the same as those of q_i
(8)     s = s_f (1 − c) + s_a c
(9)     Rerank g_1^{i'} ~ g_5^{i'} in descending order of their total score s and obtain R*
(10)  end if
(11)  elif rank 10 is correct
(12)    Change g_1^{i'} ~ g_5^{i'} to g_1^{i'} ~ g_10^{i'} and repeat steps 6–10
(13)  end if
(14)  else
(15)    Change g_1^{i'} ~ g_5^{i'} to g_1^{i'} ~ g_M^{i'} and repeat steps 6–10
(16)  end if
(17) end for

Algorithm 1: The attribute-aided reranking algorithm.

Figure 7: Attribute labels of pedestrians in the datasets. (a) Example label list: male, hat, pants, stripes, backpack, carrying nothing. (b) Example label list: male, short hair, pants, no hat, no backpack, no handbag, short sleeve, short lower.

Table 1: Recognition accuracy of different attributes in PARN and APR (%).

Method   Gender   Hair    Clothes   Hat     Backpack   Handbag   Lslv    Llow    Mean
APR      85.78    83.46   91.36     88.21   86.32      76.01     94.12   92.64   87.23
PARN     84.93    79.10   89.35     96.67   83.86      85.18     92.84   88.45   87.55



Data Availability

The data used to support the findings of this study (attribute recognition accuracy on the Market1501 dataset and person re-id accuracy on the VIPeR, CUHK01, and Market1501 datasets) are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61773105), the Natural Science Foundation of Liaoning Province (no. 20170540675), and the Scientific Research Project of Liaoning Educational Department (no. LQGD2017023).

Table 2: Comparison of various methods with the proposed methods on three datasets (%). For each dataset the columns are Rank 1 / Rank 5 / Rank 10 (and mAP for Market1501); "—" means not reported.

Methods              VIPeR (R1/R5/R10)        CUHK01 (R1/R5/R10)       Market1501 (R1/R5/R10/mAP)
DNS [32]             42.28 / 71.46 / 82.94    64.98 / 84.96 / 89.92    67.96 / — / — / 41.89
DLDA [33]            44.11 / 72.59 / 81.66    67.12 / 89.45 / 91.68    48.15 / — / — / 29.94
FT-JSTL + DGD [11]   38.6 / — / —             66.60 / — / —            — / — / — / —
PDC [34]             51.27 / 74.05 / 84.18    — / — / —                84.14 / 92.73 / 94.92 / 63.41
K-means-CNN [35]     46.50 / 69.30 / 80.70    53.50 / 82.50 / 91.20    — / — / — / —
Spindle [9]          53.80 / 74.10 / 83.20    — / — / —                76.90 / 91.50 / 94.60 / —
PersonNet [36]       — / — / —                71.14 / 90.07 / 95.00    37.21 / — / — / 18.57
CSBT [37]            36.60 / 66.20 / 88.30    51.20 / 76.30 / 91.80    42.90 / — / — / 20.30
SDH-CNN [38]         — / — / —                — / — / —                58.12 / 68.50 / 80.82 / 48.20
M3TCP [21]           — / — / —                53.70 / 84.30 / 91.00    — / — / — / —
DM3 [39]             42.70 / 74.30 / 85.10    49.70 / 77.30 / 86.10    75.80 / 89.10 / 92.40 / 53.20
MiF                  40.82 / 68.35 / 81.01    66.87 / 85.39 / 89.51    78.92 / 86.46 / 89.28 / 66.00
MiF + PARN           46.84 / 74.68 / 83.23    71.81 / 90.33 / 91.77    79.39 / 88.95 / 91.36 / 66.46

"MiF" represents the MiF-CNN method, and "MiF + PARN" represents MiF-CNN with the attribute-aided reranking method.

Figure 8: Rank list comparison between MiF and MiF + PARN (for each query, the top-10 rank lists of MiF and MiF + PARN are shown).


References

[1] L. Li and J. Guo, "Pedestrian re-identification based on reinforced deep feature fusion," Information Technology, vol. 42, no. 7, pp. 15–19, 2018, in Chinese.
[2] J. Dai, Y. Zhang, H. Lu, and H. Wang, "Cross-view semantic projection learning for person re-identification," Pattern Recognition, vol. 75, pp. 63–76, 2018.
[3] T. T. T. Pham, T.-L. Le, H. Vu et al., "Fully-automated person re-identification in multi-camera surveillance system with a robust kernel descriptor and effective shadow removal method," Image and Vision Computing, vol. 59, pp. 44–62, 2017.
[4] P. DaoDao and H. J. Lee, "Deep multi-task network for learning person identity and attributes," IEEE Access, vol. 6, pp. 60801–60811, 2018.
[5] J. Li, L. Zhuo, J. Zhang, F. Li, and H. Zhang, "A survey of person re-identification," Acta Automatica Sinica, vol. 44, no. 9, pp. 1554–1568, 2018, in Chinese.
[6] L. Wu, Y. Wang, J. Gao et al., "Deep adaptive feature embedding with local sample distributions for person re-identification," Pattern Recognition, vol. 73, pp. 275–288, 2018.
[7] L. Li, Y. Yang, and A. G. Hauptmann, "Person re-identification: past, present and future," 2016, http://arxiv.org/abs/1610.02984.
[8] D. Wu, S. J. Zheng, C. A. Yuan, and D. S. Huang, "A deep model with combined losses for person re-identification," Cognitive Systems Research, vol. 54, pp. 74–82, 2018.
[9] H. Zhao, M. Tian, S. Sun et al., "Spindle net: person re-identification with human body region guided feature decomposition and fusion," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1077–1085, Honolulu, HI, USA, 2017.
[10] W. Chen, X. Chen, J. Zhang, and K. Huang, "Beyond triplet loss: a deep quadruplet network for person re-identification," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 28, Honolulu, HI, USA, July 2017.
[11] T. Xiao, H. Li, W. Ouyang et al., "Learning deep feature representations with domain guided dropout for person re-identification," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1249–1258, Las Vegas, NV, USA, June 2016.
[12] E. Ahmed, M. Jones, and T. K. Marks, "An improved deep learning architecture for person re-identification," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3908–3916, Boston, MA, USA, June 2015.
[13] M. Farenzena, L. Bazzani, A. Perina et al., "Person re-identification by symmetry-driven accumulation of local features," in Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2360–2367, San Francisco, CA, USA, June 2010.
[14] Y. Yang, J. Yang, J. Yan et al., "Salient color names for person re-identification," in Proceedings of Computer Vision-ECCV 2014, pp. 536–551, Zurich, Switzerland, September 2014.
[15] L. Liao, M. Cristani, A. Perina et al., "Multiple-shot person re-identification by chromatic and epitomic analyses," Pattern Recognition Letters, vol. 33, no. 7, pp. 898–903, 2012.
[16] S. Murino, R. Laganiere, and P. Payeur, "Improving pedestrian detection with selective gradient self-similarity feature," Pattern Recognition, vol. 48, no. 8, pp. 2364–2376, 2015.
[17] R. Layne, T. M. Hospedales, S. Gong et al., "Person re-identification by attributes," in Proceedings of the 23rd British Machine Vision Conference (BMVC), vol. 2, no. 3, p. 83, Surrey, UK, September 2012.
[18] Z. Wang, R. Hu, Y. Yu, C. Liang, and W. Huang, "Multi-level fusion for person re-identification with incomplete marks," in Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1267–1270, Brisbane, Australia, March 2018.
[19] Y. Chen, S. Duffner, A. Stoian et al., "Deep and low-level feature based attribute learning for person re-identification," Image and Vision Computing, vol. 79, pp. 25–34, 2018.
[20] Z. Dufour, X. Bai, M. Ye, and S. Satoh, "Incremental deep hidden attribute learning," in Proceedings of the 26th ACM International Conference on Multimedia, pp. 72–80, Seoul, South Korea, October 2018.
[21] D. Cheng, Y. Gong, S. Zhou et al., "Person re-identification by multi-channel parts-based CNN with improved triplet loss function," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1335–1344, Las Vegas, NV, USA, June 2016.
[22] K. He, X. Zhang, S. Ren et al., "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
[23] Y. Lin, L. Zheng, Z. Zheng et al., "Improving person re-identification by attribute and identity learning," 2017, http://arxiv.org/abs/1703.07220.
[24] Y. Yan, B. Ni, J. Liu, and X. Yang, "Multi-level attention model for person re-identification," Pattern Recognition Letters, 2018, in press.
[25] Y. Wen, K. Zhang, Z. Li et al., "A discriminative feature learning approach for deep face recognition," in Proceedings of the 14th European Conference on Computer Vision, pp. 499–515, Amsterdam, Netherlands, October 2016.
[26] K. Xu, J. Ba, R. Kiros et al., "Show, attend and tell: neural image caption generation with visual attention," in Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 2048–2057, Lille, France, July 2015.
[27] H. Choi, K. Cho, and Y. Bengio, "Fine-grained attention mechanism for neural machine translation," Neurocomputing, vol. 284, pp. 171–176, 2018.
[28] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in Proceedings of the European Conference on Computer Vision, Lecture Notes in Computer Science, pp. 262–275, Marseille, France, October 2008.
[29] W. Li, R. Zhao, and X. Wang, "Human reidentification with transferred metric learning," in Proceedings of the 2012 Asian Conference on Computer Vision (ACCV), pp. 31–44, Daejeon, Korea, November 2012.
[30] L. Zheng, L. Shen, L. Tian et al., "Scalable person re-identification: a benchmark," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1116–1124, Boston, MA, USA, December 2015.
[31] W. Li, R. Zhao, T. Xiao et al., "DeepReID: deep filter pairing neural network for person re-identification," in Proceedings of the 2014 Conference on Computer Vision and Pattern Recognition (CVPR), pp. 152–159, Columbus, OH, USA, June 2014.
[32] L. Zhang, T. Xiang, and S. Gong, "Learning a discriminative null space for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1239–1248, Las Vegas, NV, USA, June 2016.
[33] L. Wu, C. Shen, and A. van den Hengel, "Deep linear discriminant analysis on Fisher networks: a hybrid architecture for person re-identification," Pattern Recognition, vol. 65, pp. 238–250, 2017.
[34] C. Su, J. Li, S. Zhang et al., "Pose-driven deep convolutional model for person re-identification," in Proceedings of the 2017 International Conference on Computer Vision (ICCV), pp. 3980–3989, Venice, Italy, October 2017.
[35] Z. Zhao, B. Zhao, and F. Su, "Person re-identification via integrating patch-based metric learning and local salience learning," Pattern Recognition, vol. 75, pp. 90–98, 2018.
[36] L. Wu, C. Shen, and A. Hengel, "PersonNet: person re-identification with deep convolutional neural networks," 2016, http://arxiv.org/abs/1601.07255.
[37] J. Chen, Y. Wang, J. Qin et al., "Fast person re-identification via cross-camera semantic binary transformation," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5330–5339, Honolulu, HI, USA, July 2017.
[38] L. Wu, Y. Wang, Z. Ge et al., "Structured deep hashing with convolutional neural networks for fast person re-identification," Computer Vision and Image Understanding, vol. 167, no. 10, pp. 63–73, 2018.
[39] Z. Hu, R. Hu, C. Chen, Y. Yu et al., "Person reidentification via discrepancy matrix and matrix metric," IEEE Transactions on Cybernetics, vol. 48, no. 10, pp. 3006–3020, 2018.
[40] D. Jiang, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.



Page 2: Multi-InformationFlowCNNandAttribute …downloads.hindawi.com/journals/cin/2019/7028107.pdfe first type is the classification model as used in image classification and the second

intrinsically representative to the original data so it hasbetter performance in extracting image features Two typesof CNNmodels are commonly employed in the community0e first type is the classification model as used in imageclassification and the second is the Siamese model usingimage pairs or triplets as input [7] Most of the existingpublic datasets of person re-id only contain thousands ofpedestrian image samples a small number of trainingsamples can easily lead to overfitting which limits theperformance of person re-id model In addition the deepneural networks for person re-id are similar in structure thatis the feature maps extracted by convolutional layer aredirectly fed into the next convolutional layer [8ndash12] Suchstructure usually ignores the correlation among features ofeach layer thus reducing the mobility of feature informationto some extent In the process of back propagation as thenumber of layers in neural network deepens the gradientupdate information may attenuate in exponential form andcause vanishing gradient problem

0is work proposes to develop a modified deep neuralnetwork model for person re-id that could reduce overfittingcaused by the lack of training samples Moreover this workaims to improve the identify accuracy of person re-id net-work with assistance of pedestrian attribute recognition Tothis end contribution of this paper is three-fold first thispaper designs a multi-information flow convolutional neuralnetwork (MiF-CNN) to solve the person re-id problem 0enetwork contains a series of multi-information flow con-volution structures which connect the input and output ofeach convolutional layer together realizes the reuse offeatures and enhances the feature information flow andgradient back propagation of the entire network Secondthis paper designs a person attribute recognition network(PARN) based on long-short-term memory (LSTM) net-work and attention mechanism 0e PARN decodes pe-destrian visual features extracted by MiF-CNN into attributefeatures and outputs the attribute words of each person0ird this paper presents an attribute-aided reranking al-gorithm which rematches attribute features among samplesto aid more positive samples rank higher in rank list so as toimprove the identify accuracy further

0e rest of this paper is organized as follows Section 2reviews the state of the art for person re-id Section 3 in-troduces the details of MiF-CNN Section 4 shows theprinciple of the PARN 0e proposed attribute-aidedreranking algorithm is detailed in Section 5 0e experi-mental results and analysis are given in Section 6 Finallyconclusion and future works are discussed in Section 7

2 Related Works

0e early person re-id methods extract the manuallydesigned features to represent pedestrians Farenzenaet al divided pedestrian images into multiple areas andextracted three complementary kinds of featuresweighted color histograms maximally stable color re-gions and recurrent high-structured patches 0enmatch these features and measure the similarity betweenpedestrian pair [13] Yang et al proposed a novel salient

color names based color descriptor (SCNCD) which wasutilized to guarantee that a higher probability will beassigned to the color name near to the color Based onSCNCD color distributions in different color spaceswere fused into feature representation for person [14]Bazzani et al proposed asymmetry-based HPE de-scriptor which accumulated HSV histogram of multiplepedestrian images as a global appearance feature anddetected patches portraying highly informative recurrentingredient in local regions as local feature [15] Wu et aldesigned a novel gradient self-similarity (GSS) featurebased on HOG to capture the patterns of pairwisesimilarities of local gradient patches 0e combination ofHOG and GSS achieved improvement in person re-idaccuracy [16]

Apart from manually designed low-level features, attribute features that represent mid-level semantic information apply to person re-id as well. Compared with low-level descriptors, attributes are more robust to image translations [7]. Layne et al. labeled 15 binary attributes for the VIPeR dataset and trained SVMs to detect attributes; they also learned a weighted L2-norm distance metric over the attributes and fused them with low-level visual features [17]. Wang et al. predicted a complete attribute vector by exploiting both visual features and marked attributes and obtained the overall ranking list by fusing the rank results from visual features and attribute vectors separately [18]. Chen et al. learned person attributes with a part-specific CNN and merged them with another identification CNN embedded in a triplet structure for the person re-id task [19]. Wang et al. proposed a deep neural network that contains an autoencoder model to learn hidden attributes of a person from visual features in an unsupervised manner, which alleviates the requirement of massive annotation [20].

Deep learning has become popular for solving person re-id problems in recent years. Ahmed et al. presented a deep CNN architecture for person re-id; the architecture computes differences in feature values across the two views around a neighborhood of each feature location to add robustness to positional differences in corresponding features of the two input images [12]. Cheng et al. proposed a novel multichannel CNN: after the first layer, the CNN features are divided into four equal parts that aim to learn features for the respective body parts, and the network is trained with an improved triplet loss function [21]. Lin et al. used ResNet [22] as the base network to learn low-level features and attributes jointly and trained the network by combining the person re-id loss and the attribute prediction loss [23]. Yan et al. proposed an attention block that learns part-level attention on different local regions and integrated the proposed block into existing CNN structures trained with the identification loss [24]. Inspired by the above works, this paper proposes a multi-information flow convolutional neural network to extract discriminative pedestrian features. In addition, this paper designs a person attribute recognition network based on LSTM and an attention mechanism to improve person re-id results with the assistance of attribute recognition.


3. Multi-Information Flow Convolutional Neural Network

The proposed MiF-CNN treats person re-id as a classification problem. The overall network structure is shown in Figure 1. The structure includes 2 shallow convolutional layers, 3 multi-information flow convolutional structures with a novel connection pattern, fully connected layers, max pooling layers, and a classification output layer. Low-level features of pedestrian images are first extracted by the 2 shallow convolutional layers; the deeper multi-information flow convolutional structures then extract higher-level features. The final discriminative pedestrian feature vectors are obtained after reducing dimensions with pooling layers and integrating with fully connected layers.

3.1. Feature Extraction. In MiF-CNN, all convolutional filters are 3 × 3 with stride 1. Batch normalization and the ReLU activation function are applied after each convolutional layer. The operation of a convolutional layer can be formulated as

$$z_j^{(l)} = \sum_i x_i^{(l-1)} \otimes w_j^{(l)}, \qquad x_j^{(l)} = \sigma\left(z_j^{(l)}\right), \qquad l > 1, \tag{1}$$

where $x_i^{(l-1)}$ is the $i$-th feature map from the $(l-1)$-th convolutional layer, $z_j^{(l)}$ is the convolutional output of $x_i^{(l-1)}$, $x_j^{(l)}$ is the $j$-th feature map of the $l$-th convolutional layer, $w_j^{(l)}$ is the filter producing the $j$-th feature map in the $l$-th convolutional layer, and $\otimes$ represents the convolution operation. In other words, each neuron on the $j$-th feature map of the $l$-th layer sums the responses of all input feature maps convolved with filter $w_j^{(l)}$ and maps the result onto the $j$-th feature map of the $l$-th layer. $\sigma(\cdot)$ is the ReLU activation function, formulated as $\sigma(x) = \max(0, x)$. Because of batch normalization, the bias term is omitted.

3.2. Multi-Information Flow Convolutional Structure. In this structure, the output and input of the current convolutional layer are concatenated together and fed into the next convolutional layer; i.e., the input of each layer is the channel-wise combination of the outputs of all previous layers. The detail of the multi-information flow convolutional structure is shown in Figure 2.

This connection pattern lets the feature maps of each layer be reused by all subsequent layers during forward propagation, so the whole CNN model learns more feature information from pedestrian images. It can be considered a special kind of "data augmentation" on feature maps that enhances the information mobility of the network. During back propagation, the gradient of the input of each layer contains the derivative of the loss function with respect to that input, which makes gradient propagation more effective and the network easier to train. In a multi-information flow convolutional structure, the number of feature maps that each layer outputs is a constant value $\rho$, so the number of feature maps that the $l$-th layer receives is $\rho_0 + \rho(l-1)$, where $\rho_0$ is the number of feature maps fed to the initial layer. Supposing the feature map of the $i$-th channel of the initial layer's input is $x_i^{(0)}$, where $i \in (1, \rho_0)$, the feature map of the $j$-th channel that the initial layer outputs can be expressed as

$$z_j^{(1)} = \sum_i x_i^{(0)} \otimes w_j^{(1)}, \tag{2}$$

where $w_j^{(1)}$ is the weight of the initial layer. The output after the activation function is

$$a_j^{(1)} = \sigma\left(z_j^{(1)}\right), \tag{3}$$

where $j \in (1, \rho)$. The feature map that the $l$-th layer outputs can be expressed as

$$z_q^{(l)} = \sum_p x_p^{(l-1)} \otimes w_q^{(l)}, \qquad a_q^{(l)} = \sigma\left(z_q^{(l)}\right), \qquad l > 0, \tag{4}$$

where $p \in (1, \rho_0 + \rho(l-1))$ and $q \in (1, \rho)$. The output of the $l$-th layer after the concatenate operation is

$$x_r^{(l)} = \left[x_p^{(l-1)}, a_q^{(l)}\right], \qquad l > 0, \tag{5}$$

where $r \in (1, \rho_0 + \rho \cdot l)$ and $[\cdot, \cdot]$ represents the concatenate operation on the channel dimension.
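To make the connection pattern concrete, the following PyTorch-style sketch implements one multi-information flow convolutional structure under the assumptions stated above (3 × 3 convolutions with batch normalization and ReLU, and a growth rate ρ per layer). The class and argument names are ours, not the paper's, and the number of layers per structure is an assumption.

```python
import torch
import torch.nn as nn

class MiFBlock(nn.Module):
    """Sketch of one multi-information flow convolutional structure.

    Each 3x3 conv produces `rho` feature maps; its input is the channel-wise
    concatenation of the block input and the outputs of all previous layers,
    so the l-th layer sees rho0 + rho * (l - 1) channels (cf. equations (2)-(5)).
    """

    def __init__(self, rho0, rho, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = rho0
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, rho, kernel_size=3, stride=1, padding=1, bias=False),
                nn.BatchNorm2d(rho),
                nn.ReLU(inplace=True),
            ))
            channels += rho  # every later layer also sees this layer's rho maps

    def forward(self, x):
        for layer in self.layers:
            a = layer(x)                  # a^(l): rho new feature maps
            x = torch.cat([x, a], dim=1)  # x^(l) = [x^(l-1), a^(l)] on channels
        return x
```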

During back propagation, suppose $\Delta x_r^{(l)}$ is the derivative of the loss function with respect to $x_r^{(l)}$. Because $x_r^{(l)}$ contains both $x_p^{(l-1)}$ and $a_q^{(l)}$, it produces two parts of the gradient, as shown below:

$$\Delta a_q^{(l)} = \Delta x_r^{(l)} \cdot \frac{\partial x_r^{(l)}}{\partial a_q^{(l)}}, \qquad \Delta x_p^{(l-1)} = \Delta x_r^{(l)} \cdot \frac{\partial x_r^{(l)}}{\partial x_p^{(l-1)}}, \tag{6}$$

where $\Delta a_q^{(l)}$ is the gradient of the output of the $l$-th layer after the activation function and $\Delta x_p^{(l-1)}$ is the gradient of the output of the $(l-1)$-th layer. The gradient of the weights in the $l$-th layer is

$$\Delta z_q^{(l)} = \Delta a_q^{(l)} \cdot \frac{\partial a_q^{(l)}}{\partial z_q^{(l)}} = \Delta x_r^{(l)} \cdot \frac{\partial x_r^{(l)}}{\partial a_q^{(l)}} \cdot \sigma'\left(z_q^{(l)}\right), \qquad \Delta w_q^{(l)} = \Delta z_q^{(l)} \cdot \frac{\partial z_q^{(l)}}{\partial w_q^{(l)}} = \Delta z_q^{(l)} \cdot x_p^{(l-1)}, \tag{7}$$

where $\Delta z_q^{(l)}$ is the gradient of the convolution result in the $l$-th layer, $\sigma'(z_q^{(l)})$ is the derivative of the activation function with respect to $z_q^{(l)}$, and $\Delta w_q^{(l)}$ is the gradient of the weights in the $l$-th layer. The network uses $\Delta w_q^{(l)}$ to update the weights of each layer:

$$\left(w_q^{(l)}\right)_{\text{new}} = w_q^{(l)} - \eta \cdot \Delta w_q^{(l)}, \tag{8}$$

where $\eta$ is the learning rate. The gradient keeps propagating back to the $(l-1)$-th layer; the gradient of $x_p^{(l-1)}$ is

$$\Delta x_p^{(l-1)} = \Delta z_q^{(l)} \cdot \frac{\partial z_q^{(l)}}{\partial x_p^{(l-1)}} = \Delta z_q^{(l)} \cdot w_q^{(l)}. \tag{9}$$

As shown in equations (6) and (9), the loss function produces two flows of gradient with respect to the outputs of each convolutional layer, which makes error information propagate more effectively through the network and restrains the vanishing gradient to a certain extent.

The proposed MiF-CNN includes three multi-information flow convolutional structures. Between every two of them, a middle pooling layer is applied to compress the redundant features. The hyperparameter ρ is set to a small value; when a new multi-information flow convolutional structure starts, ρ is doubled. Such a design makes each convolutional layer learn a small quantity of features and reduces redundant features, optimizing the efficiency of the network. With deeper layers, the network can learn more high-level and complex pedestrian features and improve the final identification accuracy, as sketched below.
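Continuing the sketch above (and reusing its imports and `MiFBlock` class), the three structures could be assembled roughly as follows. The stem width, growth rate, number of layers per block, the 1024-dimensional feature layer, and the global pooling before the fully connected layer are assumptions for illustration, since exact values are not listed here.

```python
class MiFCNN(nn.Module):
    """Rough sketch of MiF-CNN: 2 shallow convs, 3 MiF blocks with 2x2 pooling
    between them (growth rate doubled per block), then FC features + classifier."""

    def __init__(self, num_classes, rho=16, layers_per_block=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
        )
        blocks, channels = [], 32
        for i in range(3):                      # three MiF structures
            growth = rho * (2 ** i)             # rho is doubled for each new structure
            blocks.append(MiFBlock(channels, growth, layers_per_block))
            channels += growth * layers_per_block
            if i < 2:
                blocks.append(nn.MaxPool2d(2))  # middle pooling compresses redundancy
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, 1024)
        self.classifier = nn.Linear(1024, num_classes)  # k-way softmax output

    def forward(self, x):
        x = self.blocks(self.stem(x))
        feat = self.fc(self.pool(x).flatten(1))  # discriminative pedestrian feature
        return feat, self.classifier(feat)
```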

3.3. Loss Function. Current deep learning algorithms usually use the cross-entropy loss as the cost function, which is formulated as

$$L_S = -\frac{1}{M} \sum_{i=1}^{M} y^{(i)} \log \frac{e^{\theta_{y^{(i)}}^{T} x^{(i)}}}{\sum_{j=1}^{k} e^{\theta_j^{T} x^{(i)}}}, \tag{10}$$

where $y^{(i)}$ is the ground-truth pedestrian category of the $i$-th training sample, $\theta$ is the parameter of the last fully connected layer, $x^{(i)}$ is the feature vector of the training sample, $k$ is the number of pedestrian categories in the training set, and $M$ is the batch size.

However, in practice, when the cross-entropy loss is used alone and the quality of the extracted features is not good enough, the intraclass distance can become greater than the interclass distance. Aiming at this problem, Wen et al. proposed the center loss in 2016 [25]. Combining the cross-entropy loss and the center loss can enhance the discrimination and generalization ability of the network. The center loss is defined as follows:

$$L_C = \frac{1}{2M} \sum_{i=1}^{M} \left\| x_i - c_j \right\|_2^2, \tag{11}$$

where $c_j$ is the center of the $j$-th pedestrian's features and $x_i$ is the feature vector of a pedestrian. The center loss minimizes the distance between a feature and its center in order to reduce the intraclass distance. The center $c_j$ is updated with equation (12):

$$\Delta c_j^{t} = \frac{\sum_{i=1}^{M} \delta\left(y_i = j\right) \cdot \left(c_j^{t} - x_i\right)}{1 + \sum_{i=1}^{M} \delta\left(y_i = j\right)}, \qquad c_j^{t+1} = c_j^{t} - \beta \cdot \Delta c_j^{t}, \tag{12}$$

where $\beta$ is the update rate and $\delta(y_i = j)$ is 1 if the prediction equals the ground truth and 0 otherwise; that is, a center is updated only when the network predicts correctly.
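A compact sketch of how the center loss of equations (11) and (12) could be combined with the cross-entropy loss of equation (10) is shown below. The class name, the update rate `beta`, and the weighting `lam` between the two terms are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

class CenterLoss(torch.nn.Module):
    """Center loss (eq. (11)) with the manual center update of eq. (12)."""

    def __init__(self, num_classes, feat_dim, beta=0.5):
        super().__init__()
        self.register_buffer("centers", torch.zeros(num_classes, feat_dim))
        self.beta = beta

    def forward(self, feats, labels):
        # L_C = 1/(2M) * sum_i ||x_i - c_{y_i}||^2
        return 0.5 * (feats - self.centers[labels]).pow(2).sum(dim=1).mean()

    @torch.no_grad()
    def update_centers(self, feats, labels, preds):
        correct = preds == labels           # delta(y_i = j): only correct predictions count
        for j in labels[correct].unique():
            mask = (labels == j) & correct
            delta = (self.centers[j] - feats[mask]).sum(dim=0) / (1 + mask.sum())
            self.centers[j] -= self.beta * delta

def combined_loss(logits, feats, labels, center_loss, lam=0.01):
    """Cross-entropy (eq. (10)) plus an assumed weight `lam` times the center loss."""
    ls = F.cross_entropy(logits, labels)
    lc = center_loss(feats, labels)
    center_loss.update_centers(feats.detach(), labels, logits.argmax(dim=1))
    return ls + lam * lc
```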

Figure 1: Architecture of MiF-CNN, where the green rectangles denote the multi-information flow convolutional structures, the orange rectangles denote the middle pooling layers, and k is the number of pedestrian categories in the training set.

Figure 2: Multi-information flow convolutional structure, where "&" represents the concatenate operation on the channel dimension.


4. Person Attribute Recognition Network

The rank list of MiF-CNN is shown in Figure 3. In the incorrect rank-1 identification results (pedestrians B and C), there is a big difference in attribute features between the top-ranking negative samples and the query image, including gender, clothing, whether a handbag is carried, and so on. Hence, this paper recognizes person attributes to improve the accuracy of person re-id.

Based on the encoder-decoder idea, Xu et al. [26] proposed a neural network model that can learn and generate descriptions of image content. The model uses a CNN as the encoder to extract image features, which are then fed into a recurrent neural network (RNN) that decodes them into language captions. Inspired by this, this paper presents a person attribute recognition network (PARN) with LSTM and an attention mechanism. The proposed PARN takes the pedestrian features extracted by MiF-CNN as input and outputs the attribute information of pedestrian images. The architecture of PARN is shown in Figure 4.

4.1. Input of PARN. In PARN, the input is the feature maps before the last fully connected layer of MiF-CNN. The input feature is split into n feature vectors, each of which corresponds to a part of the image. Each feature vector is a $D_X$-dimensional vector, represented as

$$X = \{x_1, x_2, \ldots, x_n\}, \qquad x_i \in \mathbb{R}^{D_X}. \tag{13}$$

Following common practice in natural language processing, PARN transforms the words in person attribute labels into word embeddings. As part of the input of PARN, each word embedding is a $D_Y$-dimensional vector:

$$y = \{y_1, y_2, \ldots, y_m\}, \qquad y_i \in \mathbb{R}^{D_Y}, \tag{14}$$

where m is the number of attribute words for each pedestrian image and $y_i$ is the word embedding corresponding to each attribute word.

For attribute words, common one-hot encoding treats each word as an isolated symbol, which ignores the correlation among words. Word embeddings, in contrast, represent each word as a continuous dense vector, which places correlated words closer together in the embedding space.

4.2. Attention Mechanism. Attention mechanisms have been widely used in natural language processing and computer vision. By measuring the correlation between the output and different parts of the input, an attention mechanism assigns different weights to different parts of the input, enabling the network to use the more important feature information for prediction and reducing the dimension of the input data [27].

In practice, a certain attribute of a pedestrian corresponds only to a certain part of the image. For example, when recognizing whether a pedestrian is wearing a hat, people usually pay attention only to the area above the pedestrian's head, rather than to other areas irrelevant to the attribute. Therefore, before the image features are fed into the LSTM network, an attention mechanism is introduced to calculate the correlation between different positions of the image features and the hidden state of the LSTM at the previous time step. The schematic diagram of the attention mechanism is shown in Figure 5.

At time t, a fully connected layer and the tanh function are used to integrate the information of the input feature vector and the hidden state of the LSTM at the previous time step, which is formulated as

$$v_i = \tanh\left(W_{attX} x_i + W_{atth} h_{t-1}\right), \tag{15}$$

where $W_{attX}$ and $W_{atth}$ are the fully connected layer weights applied to $x_i$ and $h_{t-1}$, respectively. The attention weight $\alpha_i$ is obtained by applying the softmax function to the score vector $v_i$:

$$\alpha_i = \operatorname{softmax}\left(W_{attv} v_i\right), \tag{16}$$

where $W_{attv}$ is the fully connected layer weight applied to $v_i$. The output Z is the weighted sum over all parts:

$$Z = \sum_{i=1}^{n} \alpha_i x_i. \tag{17}$$

Feature Z highlights the local features that are helpful for prediction and suppresses the local features that contribute little, which reduces the dimension of the features to some extent and makes the LSTM network focus on the parts of the input features most correlated with the prediction while recognizing person attributes, improving the efficiency and prediction accuracy of the network.
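The following sketch shows how the additive attention of equations (15)–(17) could be implemented; the module name and the dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Computes Z = sum_i alpha_i * x_i with alpha from eqs. (15)-(16)."""

    def __init__(self, feat_dim, hidden_dim, att_dim):
        super().__init__()
        self.w_x = nn.Linear(feat_dim, att_dim, bias=False)    # W_attX
        self.w_h = nn.Linear(hidden_dim, att_dim, bias=False)  # W_atth
        self.w_v = nn.Linear(att_dim, 1, bias=False)           # W_attv

    def forward(self, X, h_prev):
        # X: (batch, n, feat_dim) image-part features; h_prev: (batch, hidden_dim)
        v = torch.tanh(self.w_x(X) + self.w_h(h_prev).unsqueeze(1))  # eq. (15)
        alpha = torch.softmax(self.w_v(v).squeeze(-1), dim=1)        # eq. (16)
        Z = (alpha.unsqueeze(-1) * X).sum(dim=1)                     # eq. (17)
        return Z, alpha
```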

4.3. LSTM. A normal RNN updates its parameters with back propagation and easily suffers from the vanishing gradient when there is a long gap between the relevant information and the current position to predict. The LSTM network solves this problem well because of its structural advantages. The architecture of the LSTM unit is shown in Figure 6.

Here, c is the cell state for storing and transferring information; f is the forget gate, which decides what should be discarded from the cell state; i is the input gate, which decides what should be stored in the cell state; g represents a vector of new candidate values to be added to the cell state; o is the output gate, which decides what parts of the cell state should be output to the next time step; h is the hidden state of the LSTM; x is the input of the LSTM; and t denotes the current time step.

In PARN, at any time step t, the input of the LSTM consists of two parts: the word embedding $y_{t-1}$ from the previous time step and the image feature $Z_t$ at the current time step. The operation of the LSTM in PARN is formulated as

$$\begin{aligned}
f &= \operatorname{sigmoid}\left(W_f\left[h_{t-1}, y_{t-1}, Z_t\right] + b_f\right), \\
i &= \operatorname{sigmoid}\left(W_i\left[h_{t-1}, y_{t-1}, Z_t\right] + b_i\right), \\
g &= \tanh\left(W_g\left[h_{t-1}, y_{t-1}, Z_t\right] + b_g\right), \\
o &= \operatorname{sigmoid}\left(W_o\left[h_{t-1}, y_{t-1}, Z_t\right] + b_o\right), \\
c_t &= f \odot c_{t-1} + i \odot g, \\
h_t &= o \odot \tanh\left(c_t\right),
\end{aligned} \tag{18}$$


where W and b represent the weights and biases of each gate, respectively. The sigmoid function outputs a number between 0 and 1 to decide how much of each value passes through the corresponding gate, and the tanh function is the activation of the input (consistent with Figure 6). This design lets the LSTM assign different weights to information at different times, so that it can choose which parts of the information to remember or forget over a long sequence.

The time step of the LSTM in PARN is set to the number of attributes of each pedestrian image, m. At each time step, the output hidden state $h_t$ is passed through a fully connected layer and a softmax function to predict a pedestrian attribute word.

It is worth noting that at each time step the input word embedding of the LSTM is the one the LSTM predicted at the previous time step, which exploits the advantage of the LSTM in handling long sequences.
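Putting the pieces together, the decoding loop of PARN could look roughly like the sketch below, which reuses the `SoftAttention` module from the previous sketch. The class name, dimensions, the initial word/state, and the greedy feedback of the predicted word are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PARNDecoder(nn.Module):
    """Sketch of the PARN decoding loop: at each of m steps the LSTM consumes the
    previous attribute word embedding y_{t-1} and the attended feature Z_t (eq. (18)),
    and a softmax head predicts the current attribute word."""

    def __init__(self, vocab_size, emb_dim, feat_dim, hidden_dim, att_dim, m_steps):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attention = SoftAttention(feat_dim, hidden_dim, att_dim)
        self.lstm = nn.LSTMCell(emb_dim + feat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, vocab_size)
        self.m_steps = m_steps

    def forward(self, X, start_token):
        # X: (batch, n, feat_dim) split MiF-CNN feature maps
        # start_token: (batch,) long tensor holding an assumed initial word index y_0
        batch = X.size(0)
        h = X.new_zeros(batch, self.lstm.hidden_size)
        c = X.new_zeros(batch, self.lstm.hidden_size)
        y_prev, logits, alphas = start_token, [], []
        for _ in range(self.m_steps):
            Z, alpha = self.attention(X, h)                                  # eqs. (15)-(17)
            h, c = self.lstm(torch.cat([self.embed(y_prev), Z], dim=1), (h, c))
            u = self.head(h)                                                 # attribute word scores
            y_prev = u.argmax(dim=1)                                         # feed prediction back
            logits.append(u)
            alphas.append(alpha)
        return torch.stack(logits, dim=1), torch.stack(alphas, dim=1)
```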

Figure 4: Architecture of PARN.

Figure 3: Identification results of MiF-CNN (green bounding boxes represent positive samples in the gallery).


Because of the correlation among the attributes of a pedestrian (for example, a pedestrian with the attribute "male" generally also has the attributes "short hair" and "pants"), the LSTM can utilize the previous attribute result to predict the current attribute more accurately.

4.4. Network Optimization. The loss function of PARN includes two parts. One is the cross-entropy between the prediction of the network and the ground truth, which is expressed as

$$L_{sa} = \sum_{i=1}^{m} u' \log\left(\frac{e^{\theta_{ai}^{T} u}}{\sum_{j=1}^{n_w} e^{\theta_{aj}^{T} u}}\right), \tag{19}$$

where u is the prediction of the network, u′ is the ground-truth pedestrian attribute label, m is the time step of the LSTM, and $n_w$ is the total number of attribute words in each dataset. As shown in equation (19), the loss of a single image in a single epoch is the loss value accumulated over m iterations. The other part is the loss function of the attention mechanism, as described in [40]:

$$L_{\alpha} = \frac{1}{2n} \sum_{i=1}^{n} \left(1 - \sum_{j=1}^{m} \alpha_i^{j}\right)^2, \tag{20}$$

where $\alpha_i^{j}$ represents the attention weight on the image feature $x_i$ at the $j$-th time step and n is the number of parts in the image feature. The objective function of PARN to optimize is

$$J = -\frac{1}{M} \sum_{i=1}^{M} \left(L_{sa}^{i} + \lambda \cdot L_{\alpha}^{i}\right), \tag{21}$$

where M is the batch size and λ is the weight of $L_{\alpha}$ in the objective function.
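Under the convention that the cross-entropy term is accumulated over the m time steps and the attention regularizer of [40] pushes the attention weights summed over time toward 1, the PARN objective of equations (19)–(21) could be computed as in the sketch below; the sign convention and the value of `lam` are assumptions.

```python
import torch.nn.functional as F

def parn_loss(logits, alphas, targets, lam=1.0):
    """Sketch of the PARN objective (eqs. (19)-(21)).

    logits:  (batch, m, n_w) attribute-word scores for the m time steps
    alphas:  (batch, m, n)   attention weights over the n feature parts
    targets: (batch, m)      ground-truth attribute word indices
    """
    # L_sa: cross-entropy accumulated over the m time steps (eq. (19))
    l_sa = F.cross_entropy(logits.flatten(0, 1), targets.flatten(),
                           reduction="sum") / logits.size(0)
    # L_alpha: doubly stochastic regularizer of eq. (20); encourages
    # sum_j alpha_i^j to be close to 1 for every image part i
    l_alpha = 0.5 * ((1.0 - alphas.sum(dim=1)) ** 2).mean()
    return l_sa + lam * l_alpha
```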

5. Attribute-Aided Reranking Algorithm

Given a probe image $q_0$ and a gallery set $G = \{g_1, g_2, \ldots, g_N\}$ containing N pedestrian images, the initial similarity distance between $q_0$ and $g_i$ is computed with the Euclidean distance:

$$d\left(q_0, g_i\right) = \left\| x_{q_0} - x_{g_i} \right\|_2 = \sqrt{\sum_{j} \left(x_{q_0, j} - x_{g_i, j}\right)^2}, \tag{22}$$

where the sum runs over the feature dimensions,

and $x_{q_0}$ and $x_{g_i}$ are the features of $q_0$ and $g_i$, respectively, extracted by MiF-CNN. The initial rank list $R_0 = \{g'_1, g'_2, \ldots, g'_N\}$ is obtained by sorting the similarity distances $d(q_0, g_i)$ in ascending order, so that $d(q_0, g'_i) < d(q_0, g'_{i+1})$. If the top-1 gallery image $g'_1$ is a positive sample, rank 1 is correct; if the positive sample is not the top-1 gallery image but is among the top 5, rank 5 is correct.

The attribute-aided reranking algorithm reranks the list $R_0$ according to the attribute feature similarity between $q_0$ and $g_i$, so that more positive gallery images obtain a higher ranking in $R_0$, which improves the performance of person re-id. Specifically, when rank 5 is correct but rank 1 is not, the proposed algorithm assigns an initial feature score $s_f$ to the top-5 gallery images $g'_1 \sim g'_5$ according to their ranking in $R_0$: the higher the ranking, the higher $s_f$. The algorithm then assigns an attribute score $s_a$ to $g'_1 \sim g'_5$ according to the number of their attributes that are the same as those of $q_0$: the more matching attributes, the higher $s_a$. The total score of each gallery image is $s = s_f(1 - c) + s_a c$, where c is the score weight. The reranked list $R_0^*$ is obtained by reordering $g'_1 \sim g'_5$ in descending order of their total score s. For M query images, the whole procedure of the attribute-aided reranking algorithm is shown in Algorithm 1.
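A minimal Python sketch of this reranking step for a single query is given below. The paper does not specify the numeric feature scores assigned to each rank position, so the linear scheme, the normalization of the attribute score, and the function signature are assumptions.

```python
import numpy as np

def attribute_aided_rerank(rank_list, query_attrs, gallery_attrs, top_k=5, c=0.5):
    """Rerank the top-k entries of an initial rank list using attribute agreement.

    rank_list:     gallery indices sorted by ascending distance to the query (R0)
    query_attrs:   sequence of attribute words/codes of the query
    gallery_attrs: mapping from gallery index to its attribute sequence
    c:             score weight in s = s_f * (1 - c) + s_a * c
    """
    top = list(rank_list[:top_k])
    # feature score: higher for a higher initial ranking (assumed linear scheme)
    s_f = np.linspace(1.0, 0.0, num=len(top))
    # attribute score: fraction of attributes identical to the query's
    matches = np.array([sum(a == b for a, b in zip(query_attrs, gallery_attrs[g]))
                        for g in top], dtype=float)
    s_a = matches / max(len(query_attrs), 1)
    s = s_f * (1.0 - c) + s_a * c
    order = np.argsort(-s, kind="stable")     # descending total score
    return [top[i] for i in order] + list(rank_list[top_k:])
```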

6. Experiments

This paper first evaluates the attribute recognition accuracy of PARN on public person re-id datasets and compares the results with other person attribute recognition methods. It then evaluates the performance of MiF-CNN on three person re-id datasets and the effectiveness of the proposed attribute-aided reranking algorithm for improving the accuracy of person re-id. An analysis of the experimental results is also given after comparing our results with state-of-the-art person re-id methods.

6.1. Datasets and Evaluation Protocol. This paper evaluates the proposed methods on three challenging public person re-id datasets: VIPeR [28], CUHK01 [29], and Market1501 [30].

Figure 5: Schematic diagram of the attention mechanism.

Figure 6: Architecture of the LSTM unit.


VIPeR Dataset. It contains 1264 images of 632 persons. Because of the small number of image samples, the low resolution, and the great changes in pose, view, and illumination, it is one of the most difficult person re-id datasets. In the experiments, the VIPeR dataset is randomly split into two parts: images of 316 persons for training and images of the remaining 316 persons for testing.

CUHK01 Dataset. It consists of 3884 images of 971 persons captured by two cameras with different views. 485 pedestrians are randomly chosen for training and the other 486 pedestrians for testing.

Market1501 Dataset. It is one of the largest person re-id datasets, including 32268 images of 1501 persons. Each person has a number of images captured by six cameras with different views. The dataset is divided into two parts: 12936 images of 751 persons for training and 19732 images of 750 persons for testing.

Evaluation Protocol. For the single-shot datasets VIPeR and CUHK01, the cumulative match characteristic (CMC) is used to record the ranks of correct identification [31]. For the multishot dataset Market1501, apart from CMC, mean Average Precision (mAP) is used to evaluate the performance of the proposed methods.

6.2. Experimental Results on Attribute Recognition. The attributes in PARN are pedestrian identity-level attributes. For the VIPeR dataset, the attributes include "gender", "length of hair", "lower clothing", "upper clothing", "backpack or not", and "carrying anything or not". For the CUHK01 dataset, the attributes include "gender", "length of hair", "backpack or not", "handbag or not", "color of upper clothing", and "color of lower clothing". For the Market1501 dataset, the attributes contain "gender", "length of hair", "wearing hat or not", "lower clothing", "backpack or not", "handbag or not", "length of sleeve", and "length of lower clothing". Each person in each dataset has an attribute word list, as shown in Figure 7.

This paper evaluates the attribute recognition accuracy of PARN on the Market1501 dataset and compares the results with the outstanding person attribute recognition method APR [23]. The experimental results are reported in Table 1, where "Lslv" represents the "length of sleeve" and "Llow" represents the "length of lower clothing".

It can be observed from Table 1 that PARN obtains competitive recognition accuracy across the 8 attributes; in particular, the accuracy on "wearing hat or not" and "handbag or not" and the mean accuracy are higher than those of APR, whereas the accuracy on the other attributes is close to that of APR. The comparison with APR demonstrates the strong attribute recognition performance of PARN, which plays an important role in improving the accuracy of person re-id.

6.3. Experimental Results on Person re-id. This paper evaluates the performance of the proposed methods on the three datasets. The experimental results of the proposed methods and other state-of-the-art methods are presented in Table 2.

As shown in Table 2, the proposed MiF-CNN model achieves strong performance among state-of-the-art methods. Moreover, the comparison between MiF and MiF + PARN demonstrates that the proposed attribute-aided reranking algorithm helps to increase the person re-id accuracy. The improvement is especially obvious on the VIPeR and CUHK01 datasets, where the identification accuracy at rank 1, rank 5, and rank 10 improves by 6.02%, 6.33%, and 2.22% and by 4.94%, 4.94%, and 2.26%, respectively. The proposed MiF-CNN model with the attribute-aided reranking algorithm obtains the best rank-5 accuracy on the VIPeR dataset, the best rank-1 and rank-5 accuracy on the CUHK01 dataset, and the best mAP on the Market1501 dataset among the compared methods.

6.4. Analysis of Experimental Results. Because of the small number of training samples in the VIPeR and CUHK01 datasets, a deep CNN model is hard to train and easily suffers from overfitting. To handle this, the FT-JSTL + DGD method, which is based on a deep CNN, learned deep features from multiple domains jointly by merging all the datasets together and fine-tuned the pretrained model on VIPeR and CUHK01 separately. The structure and hyperparameters of the deep CNN model in the M3TCP method need to be adjusted manually to adapt to the scale of different datasets, so as to overcome the training problem of the deep neural network.

In contrast, the proposed MiF-CNN has a fixed structure and can be trained on smaller datasets directly without any fine-tuning. In spite of its deep structure, the model converges quickly and well. MiF-CNN without the attribute-aided reranking algorithm outperforms FT-JSTL + DGD and M3TCP by 0.27% and 13.17%, respectively, in rank-1 accuracy on the CUHK01 dataset. It also outperforms FT-JSTL + DGD by 2.22% in rank-1 accuracy on the VIPeR dataset, which indicates that the proposed MiF-CNN model has an excellent ability to reduce overfitting, generalizes well, and delivers outstanding person re-id performance.

Figure 8 shows the rank lists of MiF and MiF + PARN. It can be seen that MiF-CNN with the attribute-aided reranking method enables lower-ranked positive samples in the rank list of MiF-CNN to obtain a higher ranking, improving the accuracy of person re-id and verifying the effectiveness of the attribute-aided reranking algorithm for improving the performance of the person re-id model.

7. Conclusion

This paper studies the training problem of deep neural networks in the person re-id task and the use of pedestrian attributes for further improving the accuracy of


person re-id. The proposed MiF-CNN model realizes the reuse of feature maps and gradient information, which enhances the feature mobility of the network and improves the efficiency of gradient propagation. The designed person attribute recognition network uses an attention mechanism to measure the correlation between the input feature maps and the hidden state of the LSTM at the previous time step, reducing the dimension of the features; it also employs an LSTM to decode image features into pedestrian attributes. On the basis of the attribute recognition results, the attribute-aided reranking algorithm is presented, which rematches attribute features among samples so that more positive samples rank higher in the rank list, improving the identification accuracy further.

The experimental results on three public person re-id datasets indicate the outstanding performance of the MiF-CNN model in person re-id, and the attribute-aided reranking algorithm makes a major contribution to improving the accuracy of person re-id. In the future, more improvement and optimization will be done on the pedestrian attribute recognition network. Moreover, captions of person images or videos can be obtained with the LSTM model and natural language processing ideas, which would make person re-id methods more significant.

Input: Initial rank list R0
Output: The reranked list R*
(1) for i = 1, 2, ..., M do
(2)   if rank 1 is correct then
(3)     R* = R0
(4)   elif rank 5 is correct then
(5)     Distribute the initial feature score sf for the top-5 gallery images g'_{i,1} ~ g'_{i,5} according to their ranking in R0
(6)     Distribute the attribute score sa for g'_{i,1} ~ g'_{i,5} according to the number of their attributes that are the same as those of qi
(7)     s = sf (1 − c) + sa c
(8)     Rerank g'_{i,1} ~ g'_{i,5} in descending order of total score s and obtain R*
(9)   elif rank 10 is correct then
(10)    Change g'_{i,1} ~ g'_{i,5} to g'_{i,1} ~ g'_{i,10} and repeat steps (5)–(8)
(11)  else
(12)    Change g'_{i,1} ~ g'_{i,5} to g'_{i,1} ~ g'_{i,M} and repeat steps (5)–(8)
(13)  end if
(14) end for

Algorithm 1: The attribute-aided reranking algorithm.

Figure 7: Attribute labels of pedestrians in the datasets: (a) male, hat, pants, stripes, backpack, carrying nothing; (b) male, short hair, pants, no hat, no backpack, no handbag, short sleeve, short lower clothing.

Table 1: Recognition accuracy of different attributes of PARN and APR (%).

Method  Gender  Hair   Clothes  Hat    Backpack  Handbag  Lslv   Llow   Mean
APR     85.78   83.46  91.36    88.21  86.32     76.01    94.12  92.64  87.23
PARN    84.93   79.10  89.35    96.67  83.86     85.18    92.84  88.45  87.55



Data Availability

The data used to support the findings of this study (attribute recognition accuracy on the Market1501 dataset and person re-id accuracy on the VIPeR, CUHK01, and Market1501 datasets) are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61773105), the Natural Science Foundation of Liaoning Province (no. 20170540675), and the Scientific Research Project of the Liaoning Educational Department (no. LQGD2017023).

Table 2: Comparison of various methods with the proposed methods on three datasets (%).

                      VIPeR                    CUHK01                   Market1501
Methods               Rank 1  Rank 5  Rank 10  Rank 1  Rank 5  Rank 10  Rank 1  Rank 5  Rank 10  mAP
DNS [32]              42.28   71.46   82.94    64.98   84.96   89.92    67.96   —       —        41.89
DLDA [33]             44.11   72.59   81.66    67.12   89.45   91.68    48.15   —       —        29.94
FT-JSTL + DGD [11]    38.60   —       —        66.60   —       —        —       —       —        —
PDC [34]              51.27   74.05   84.18    —       —       —        84.14   92.73   94.92    63.41
K-means-CNN [35]      46.50   69.30   80.70    53.50   82.50   91.20    —       —       —        —
Spindle [9]           53.80   74.10   83.20    —       —       —        76.90   91.50   94.60    —
PersonNet [36]        —       —       —        71.14   90.07   95.00    37.21   —       —        18.57
CSBT [37]             36.60   66.20   88.30    51.20   76.30   91.80    42.90   —       —        20.30
SDH-CNN [38]          —       —       —        —       —       —        58.12   68.50   80.82    48.20
M3TCP [21]            —       —       —        53.70   84.30   91.00    —       —       —        —
DM3 [39]              42.70   74.30   85.10    49.70   77.30   86.10    75.80   89.10   92.40    53.20
MiF                   40.82   68.35   81.01    66.87   85.39   89.51    78.92   86.46   89.28    66.00
MiF + PARN            46.84   74.68   83.23    71.81   90.33   91.77    79.39   88.95   91.36    66.46

"MiF" represents the MiF-CNN method and "MiF + PARN" represents MiF-CNN with the attribute-aided reranking method.

Figure 8: Rank list comparison between MiF and MiF + PARN.


References

[1] L. Li and J. Guo, "Pedestrian re-identification based on reinforced deep feature fusion," Information Technology, vol. 42, no. 7, pp. 15–19, 2018, in Chinese.
[2] J. Dai, Y. Zhang, H. Lu, and H. Wang, "Cross-view semantic projection learning for person re-identification," Pattern Recognition, vol. 75, pp. 63–76, 2018.
[3] T. T. T. Pham, T.-L. Le, H. Vu et al., "Fully-automated person re-identification in multi-camera surveillance system with a robust kernel descriptor and effective shadow removal method," Image and Vision Computing, vol. 59, pp. 44–62, 2017.
[4] P. DaoDao and H. J. Lee, "Deep multi-task network for learning person identity and attributes," IEEE Access, vol. 6, pp. 60801–60811, 2018.
[5] J. Li, L. Zhuo, J. Zhang, F. Li, and H. Zhang, "A survey of person re-identification," Acta Automatica Sinica, vol. 44, no. 9, pp. 1554–1568, 2018, in Chinese.
[6] L. Wu, Y. Wang, J. Gao et al., "Deep adaptive feature embedding with local sample distributions for person re-identification," Pattern Recognition, vol. 73, pp. 275–288, 2018.
[7] L. Li, Y. Yang, and A. G. Hauptmann, "Person re-identification: past, present and future," 2016, http://arxiv.org/abs/1610.02984.
[8] D. Wu, S. J. Zheng, C. A. Yuan, and D. S. Huang, "A deep model with combined losses for person re-identification," Cognitive Systems Research, vol. 54, pp. 74–82, 2018.
[9] H. Zhao, M. Tian, S. Sun et al., "Spindle net: person re-identification with human body region guided feature decomposition and fusion," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1077–1085, Honolulu, HI, USA, 2017.
[10] W. Chen, X. Chen, J. Zhang, and K. Huang, "Beyond triplet loss: a deep quadruplet network for person re-identification," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 28, Honolulu, HI, USA, July 2017.
[11] T. Xiao, H. Li, W. Ouyang et al., "Learning deep feature representations with domain guided dropout for person re-identification," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1249–1258, Las Vegas, NV, USA, June 2016.
[12] E. Ahmed, M. Jones, and T. K. Marks, "An improved deep learning architecture for person re-identification," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3908–3916, Boston, MA, USA, June 2015.
[13] M. Farenzena, L. Bazzani, A. Perina et al., "Person re-identification by symmetry-driven accumulation of local features," in Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2360–2367, San Francisco, CA, USA, June 2010.
[14] Y. Yang, J. Yang, J. Yan et al., "Salient color names for person re-identification," in Proceedings of Computer Vision–ECCV 2014, pp. 536–551, Zurich, Switzerland, September 2014.
[15] L. Liao, M. Cristani, A. Perina et al., "Multiple-shot person re-identification by chromatic and epitomic analyses," Pattern Recognition Letters, vol. 33, no. 7, pp. 898–903, 2012.
[16] S. Murino, R. Laganiere, and P. Payeur, "Improving pedestrian detection with selective gradient self-similarity feature," Pattern Recognition, vol. 48, no. 8, pp. 2364–2376, 2015.
[17] R. Layne, T. M. Hospedales, S. Gong et al., "Person re-identification by attributes," in Proceedings of the 23rd British Machine Vision Conference (BMVC), vol. 2, no. 3, p. 83, Surrey, UK, September 2012.
[18] Z. Wang, R. Hu, Y. Yu, C. Liang, and W. Huang, "Multi-level fusion for person re-identification with incomplete marks," in Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1267–1270, Brisbane, Australia, March 2018.
[19] Y. Chen, S. Duffner, A. Stoian et al., "Deep and low-level feature based attribute learning for person re-identification," Image and Vision Computing, vol. 79, pp. 25–34, 2018.
[20] Z. Dufour, X. Bai, M. Ye, and S. Satoh, "Incremental deep hidden attribute learning," in Proceedings of the 26th ACM International Conference on Multimedia, pp. 72–80, Seoul, South Korea, October 2018.
[21] D. Cheng, Y. Gong, S. Zhou et al., "Person re-identification by multi-channel parts-based CNN with improved triplet loss function," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1335–1344, Las Vegas, NV, USA, June 2016.
[22] K. He, X. Zhang, S. Ren et al., "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
[23] Y. Lin, L. Zheng, Z. Zheng et al., "Improving person re-identification by attribute and identity learning," 2017, http://arxiv.org/abs/1703.07220.
[24] Y. Yan, B. Ni, J. Liu, and X. Yang, "Multi-level attention model for person re-identification," Pattern Recognition Letters, 2018, in press.
[25] Y. Wen, K. Zhang, Z. Li et al., "A discriminative feature learning approach for deep face recognition," in Proceedings of the 14th European Conference on Computer Vision, pp. 499–515, Amsterdam, Netherlands, October 2016.
[26] K. Xu, J. Ba, R. Kiros et al., "Show, attend and tell: neural image caption generation with visual attention," in Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 2048–2057, Lille, France, July 2015.
[27] H. Choi, K. Cho, and Y. Bengio, "Fine-grained attention mechanism for neural machine translation," Neurocomputing, vol. 284, pp. 171–176, 2018.
[28] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in Proceedings of the European Conference on Computer Vision, Lecture Notes in Computer Science, pp. 262–275, Marseille, France, October 2008.
[29] W. Li, R. Zhao, and X. Wang, "Human reidentification with transferred metric learning," in Proceedings of the Asian Conference on Computer Vision (ACCV), pp. 31–44, Daejeon, Korea, November 2012.
[30] L. Zheng, L. Shen, L. Tian et al., "Scalable person re-identification: a benchmark," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1116–1124, Boston, MA, USA, December 2015.
[31] W. Li, R. Zhao, T. Xiao et al., "DeepReID: deep filter pairing neural network for person re-identification," in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 152–159, Columbus, OH, USA, June 2014.
[32] L. Zhang, T. Xiang, and S. Gong, "Learning a discriminative null space for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1239–1248, Las Vegas, NV, USA, June 2016.
[33] L. Wu, C. Shen, and A. van den Hengel, "Deep linear discriminant analysis on Fisher networks: a hybrid architecture for person re-identification," Pattern Recognition, vol. 65, pp. 238–250, 2017.
[34] C. Su, J. Li, S. Zhang et al., "Pose-driven deep convolutional model for person re-identification," in Proceedings of the International Conference on Computer Vision (ICCV), pp. 3980–3989, Venice, Italy, October 2017.
[35] Z. Zhao, B. Zhao, and F. Su, "Person re-identification via integrating patch-based metric learning and local salience learning," Pattern Recognition, vol. 75, pp. 90–98, 2018.
[36] L. Wu, C. Shen, and A. Hengel, "PersonNet: person re-identification with deep convolutional neural networks," 2016, http://arxiv.org/abs/1601.07255.
[37] J. Chen, Y. Wang, J. Qin et al., "Fast person re-identification via cross-camera semantic binary transformation," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5330–5339, Honolulu, HI, USA, July 2017.
[38] L. Wu, Y. Wang, Z. Ge et al., "Structured deep hashing with convolutional neural networks for fast person re-identification," Computer Vision and Image Understanding, vol. 167, no. 10, pp. 63–73, 2018.
[39] Z. Hu, R. Hu, C. Chen, Y. Yu et al., "Person reidentification via discrepancy matrix and matrix metric," IEEE Transactions on Cybernetics, vol. 48, no. 10, pp. 3006–3020, 2018.
[40] D. Jiang, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.


Page 3: Multi-InformationFlowCNNandAttribute …downloads.hindawi.com/journals/cin/2019/7028107.pdfe first type is the classification model as used in image classification and the second

3 Multi-Information Flow ConvolutionalNeural Network

0e proposed MiF-CNN solves the person re-id problemwith classification thought 0e overall network structure isshown in Figure 1 0e structure includes 2 shallow con-volutional layers 3 multi-information flow convolutionalstructures with novel connection pattern fully connectedlayers max pooling layers and classification output layersLow-level features of pedestrian images are extracted first by2 shallow convolutional layers After deeper multi-information flow convolutional structures MiF-CNNextracted higher level features 0e final discriminativepedestrian feature vectors are obtained after reducing di-mensions by pooling layers and integrating by fully con-nected layers

31 Features Extraction In MiF-CNN all convolutionalfilters are 3times 3 with stride 1 Batch normalization and ReLUactivation function are applied after each convolutionallayer 0e operation process of convolutional layer can beformulated as

z(l)j 1113944

i

x(lminus1)i otimesw

(l)j

x(l)j σ zl

j1113872 1113873

⎧⎪⎪⎨

⎪⎪⎩lgt 1 (1)

where x(lminus1)i is the i-th feature map from the (lminus 1)-th

convolutional layer z(l)j is the convolutional output of x

(lminus1)i

x(l)j is the j-th feature map from the l-th convolutional layer

w(l)j is the filter on the j-th feature map in the l-th con-

volutional layer and otimes represents the convolutional opera-tion0e process of convolutional layers extracting features isthat neuron on the j-th feature map in the l-th convolutionallayer sum each feature map after connecting and convolutionby filter w

(l)j and map the extracted features on j-th feature

map in the l-th convolutional layer σ(middot) is the ReLU acti-vation function which is formulated as σ(x) max(0 x)Because of batch normalization bias is ignored

32Multi-Information Flow Convolutional Structure In thisstructure both output and input of the current convolu-tional layer are concatenated together and fed into the nextconvolutional layer ie the input of each layer is theconnection combination of outputs from all previous layers0e detail of multi-information flow convolutional struc-tures is shown in Figure 2

0is connection pattern makes feature maps of eachlayers be reused by all subsequent layers in forward prop-agation process which makes the whole CNN model learnmore feature information of pedestrian images It can beconsidered as a special ldquoData Augmentationrdquo in featuremaps so as to enhance the information mobility of thenetwork In back propagation process gradient of input ineach layer contains derivative of loss function with respect toinput which makes propagation of gradient more effectiveand network easier to be trained In multi-information flow

convolutional structure the number of feature maps thateach layer outputs is a constant value ρ so the number offeature maps that l-th layers outputs is ρ0 + ρ(lminus 1) where ρ0is the number of feature maps in the initial layer Supposingthe feature map of l-th channel in the initial layer is x

(0)i

where i isin (1 ρ0) then the feature map of j-th channel thatinitial layer outputs can be expressed as

z(1)j 1113944

i

x(0)i otimesw

(1)j (2)

where w(1)j is the weight of the initial layer 0e output after

activation function is

a(1)j σ z

(1)j1113872 1113873 (3)

where j isin (1 ρ) 0e feature map that the l-th layer outputscan be expressed as

z(l)q 1113944

p

x(lminus1)p otimesw(l)

q

a(l)q σ z(l)

q1113872 1113873

⎧⎪⎪⎨

⎪⎪⎩lgt 0 (4)

where p isin (1 ρ0 + ρ(lminus 1)) q isin (1 ρ) 0e output of thel-th layer after concatenate operation is

x(l)r x

(lminus1)p a

(l)q1113960 1113961 lgt 0 (5)

where r isin (1 ρ0 + ρ middot l) [middot middot] represents the concatenateoperation on channel dimension

In the process of back propagation supposing Δx(l)r is

the derivative of loss function with respect to x(l)r Due to x(l)

r

containing x(lminus1)p and a(l)

q it produces two parts of gradientas shown below

Δa(l)q Δx(l)

r middotzx(l)

r

za(l)q

Δx(lminus1)p Δx(l)

r middotzx(l)

r

zx(lminus1)p

⎧⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎩

(6)

where Δa(l)q is the gradient of output from the l-th layer after

activation functionΔx(lminus1)p is the gradient of output from the

(lminus 1)-th layer 0e gradient of weight in the l-th layer is

Δz(l)q Δa(l)

q middotza(l)

q

zz(l)q

Δx(l)r middot

zx(l)r

za(l)q

middot σprime z(l)q1113872 1113873

Δw(l)q Δz(l)

q middotzz(l)

q

zw(l)q

Δz(l)q middot x

(lminus1)p

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎩

(7)

where Δz(l)q is the gradient of the convolution result in the

l-th layer σprime(z(l)q ) is the derivative of the activation function

with respect to z(l)q andΔw(l)

q is the gradient of weights in thel-th layer 0e network utilizes Δw(l)

q to update weights ofeach layer which is formulated as

w(l)q1113872 1113873new w

(l)q minus η middot Δw(l)

q (8)

Computational Intelligence and Neuroscience 3

where η is the learning rate Gradient keeps back propa-gation to the (lminus 1)-th layer 0e gradient of x(lminus1)

p is

Δx(lminus1)p Δz(l)

q middotzz(l)

q

zx(lminus1)p

Δz(l)q middot w

(l)q (9)

As shown in equations (6) and (9) the loss functionproduces two flows of gradient with respect to outputs ofeach convolutional layer which makes error informationpropagating more effective in network and restrains thevanishing gradient to a certain extent

0e proposed MiF-CNN includes three multi-information flow convolutional structures Between everytwo multi-information flow convolutional structures amiddle pooling layer is applied to compares the redundancyfeatures Hyperparameter ρ is set with a small value When itcomes to a new multi-information flow convolutionalstructure ρ is doubled Such a design makes each convo-lution layer learn a small quantity of features and reduce theredundant features so as to optimize the efficiency of net-work With deeper layers the network can learn more high-level and complex pedestrian features and improve the finalidentification accuracy

33 Loss Function Current deep learning algorithms usuallyuse cross-entropy loss as cost function which is formulatedas

LS minus1

M1113944

M

i1y

(i) logeθ

Ty(i)x

(i)

1113936kj1e

θTj x(i)

(10)

where y(i) is the ground truth of pedestrian categories intraining set θ is the parameter of the last fully connectedlayer x(i) is the feature vector of training samples k is thenumber of pedestrian categories in the training set andM isthe batch size

However in practice when using cross-entropy lossmerely if the quality of extracted features is not goodenough it will lead to intraclass distance being greater thaninterclass distance Aiming at this problem Wen et alproposed center loss in 2016 [25] Combination of cross-entropy loss and center loss can enhance the discriminationand generalization ability of the network Center loss isdefined as follows

LC 12M

1113944

M

i1xi minus cj

2

2 (11)

where cj is the center of the j-th pedestrian feature and xi isthe feature vector of pedestrian Center loss minimizes thedistance between feature and its center in order to reduce theintraclass distance Center cj is updated with equation (12)

Δctj

1113936Mi1δ yi j( 1113857 middot ct

j minus xi1113872 1113873

1 + 1113936Mi1δ yi j( 1113857

ct+1j ct

j minus β middot Δctj

⎧⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎩

(12)

where β is the update rate δ(yi j) is 1 if prediction equalsto ground truth otherwise is 0 0at is to say center isupdated only when network predicts correctly

(ρ0) (ρ) (ρ0 + ρ) (ρ) (ρ0 + 2ρ) (ρ0 + l middot ρ)(ρ)

Input Conv3 times 3

Conv3 times 3

Conv3 times 3amp amp amp

1st layer 2nd layer lndashth layer

Figure 2 Multi-information flow convolutional structures where ldquoamprdquo represents concatenate operation on channel dimension

MiF P MiF P MiF

Input image

Pooling2 times 2

FC1024

Somaxk

Conv3 times 3

Conv3 times 3

Figure 1 Architecture for MiF-CNN where green rectangle denotes the multi-information flow convolutional structures orange rectangledenotes the middle pooling layer and k is the number of pedestrian categories in the training set

4 Computational Intelligence and Neuroscience

4 Person Attributes Recognition Network

0e rank list of MiF-CNN is shown in Figure 3 In theincorrect identification results of rank 1 (pedestrian B andpedestrian C) there is a big difference in attribute featuresbetween the top-ranking negative samples and the queryimage including gender clothing whether carrying hand-bag or not and so on Hence this paper recognizes personattributes for improving accuracy of person re-id

Based on Encoder-Decoder idea Xu et al [26] proposeda neural network model that can learn and generate thecontent of images 0e model utilized CNN as encoder toextract features of images which were then fed into a re-current neural network (RNN) for decoding into languagecaptions of images Inspired by that this paper presents aperson attribute recognition network (PARN) with LSTMand attention mechanism 0e proposed PARN takes pe-destrian features that are extracted by MiF-CNN as inputand outputs the attributes information of pedestrian images0e architecture of PARN is demonstrated in Figure 4

41 Input of PARN In PARN the input is the feature mapsthat are before the last fully connected layer in the MiF-CNN0e input feature is split into n feature vectors each ofwhich corresponds to a part of the image Each feature vectoris a DX-dimensional vector which is represented as

X x1 x2 xn1113864 1113865 xi isin RDX (13)

Referring to the natural language processing methodPARN transforms words in person attribute labels into wordembedding As a part of the input of PARN each wordembedding is a DY-dimensional vector

y y1 y2 ym1113864 1113865 yi isin RDY (14)

where m is the number of attribute words in each pedestrianimage and yi is the word embedding corresponding to eachattribute word

For attribute words common one-hot encoding con-siders each word as an individual which ignores the cor-relation among words However word embeddingrepresents each word as a continuous dense vector whichmakes those correlative words closer in space

42 Attention Mechanism Attention mechanism has beenwidely used in natural language processing and computer visionBy measuring the correlation between the output and differentparts of the input attention mechanism gives different weightsto different parts of the input enabling the network to use moreimportant feature information for prediction and reduce thedimension of input data [27]

In practice a certain attribute of pedestrian is onlycorresponding to a certain part of the image For examplewhen recognizing whether a pedestrian is wearing a hatpeople only pay attention to the area above the head of thepedestrian usually instead of other areas irrelevant to theattribute 0erefore before feeding the image features intothe LSTM network attention mechanism is introduced to

calculate the correlation between different positions ofimage features and the hidden state of LSTM at the previoustime 0e schematic diagram of attention mechanism isshown in Figure 5

At time t the fully connected layer and tanh function areused to integrate the information of the input feature vectorand the hidden state of LSTM at the previous time which isformulated as

vi tanh WattXxi + Watthhtminus1( 1113857 (15)

where WattX and Watth represent the fully connected layerweights of xi and htminus1 respectively 0e attention weights αi

is obtained by supplying softmax function on score vector vi

αi softmax Wattvvi( 1113857 (16)

where Wattv represents the fully connected layer weights ofvi 0e output Z is the weighted sums of all αi

Z 1113944n

i1αixi (17)

Feature Z highlights the local features that are helpful topredict and suppress the other local features that makesmall contribution to prediction which reduces the di-mension of features to some extent and makes LSTMnetwork focus on the part of input features that havegreater correlation with prediction while recognizingperson attributes so that improves the efficiency andprediction accuracy of the network

43 LSTM Normal RNN updates network parameters withback propagation which easily suffers from the vanishinggradient when there is a long gap between relevant informationand the current position to predict LSTM network solves thisproblem well because of its own structure advantage 0e ar-chitecture of LSTM unit is shown in Figure 6

Here c is the cell state for storing and transferring in-formation f is the forget gate which decides what should beabandoned in the cell state i is the input gate which decideswhat should be stored in the cell state g represents a vector ofnew candidate values that should be added to the cell state o isthe output gate which decides what parts of the cell stateshould be output to the next time h is the hidden state ofLSTM x is the input of LSTM and t denotes the current time

In PARN at any time t the input of LSTM consists oftwo parts word embedding ytminus1 at the previous time andimage features Zt at the current time 0e operation pro-cessing of LSTM in PARN is formulated as

f sigmoid Wf htminus1 ytminus1 Zt1113858 1113859 + bf1113872 1113873

i sigmoid Wi htminus1 ytminus1 Zt1113858 1113859 + bi( 1113857

g sigmoid Wg htminus1 ytminus1 Zt1113858 1113859 + bg1113872 1113873

o sigmoid Wo htminus1 ytminus1 Zt1113858 1113859 + bo( 1113857

ct f⊙ ctminus1 + i⊙g

ht o⊙ tanh ct( 1113857

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(18)

Computational Intelligence and Neuroscience 5

where W and b represent the weights and biases of eachgate respectively Sigmoid function outputs a numberbetween 0 and 1 to decide the quantity scale of values thatgo through these gates Tanh function is the activationfunction of input 0is design makes LSTM give differentweights for information at different times so that it canchoose what part of information to remember or forget in along-term sequence

0e time step of LSTM in PARN is set as the number ofattributes of each pedestrian imagem 0e output hidden stateht is calculated by a fully connected layer and softmax functionat each time step to predict pedestrian attribute words

It is worth noting that at each time step the input wordembedding of LSTM is the one that LSTM learned at theprevious time step which takes advantage of the LSTM whensolving the long-term sequence problem Because of the

LSTMt = 1

LSTMt = 2

LSTMt = 3

LSTMt = m

Attention Attention Attention Attention

MiF-CNN

X

y1h1

y2h2

y3h3

ymndash1hmndash1

u1 u2 u3 um

y0h0

Figure 4 Architecture of PARN

Query Rank list1 2 3 4 5 6 7 8 9 10

A

B

C

Figure 3 Identification results of MiF-CNN (green bounding box represents positive samples in gallery)

6 Computational Intelligence and Neuroscience

correlation among attributes of pedestrian like the pedestrianowning attribute ldquomalerdquo generally own attribute ldquoshort hairrdquoand attribute ldquopantsrdquo LSTM can utilize the previous attributeresult to predict the current attribute more accurately

44 Network Optimization 0e loss function of PARN in-cludes two parts one is the cross-entropy between pre-diction and ground truth of network which is expressed as

Lsa 1113944m

i1uprime log

eθTaiu

1113936nw

j1eθT

aju⎛⎝ ⎞⎠ (19)

where u is the prediction value of the network uprime is theground truth of pedestrian labels m is the time step ofLSTM and nw is the total number of attribute words in eachdataset As shown in equation (19) the loss of a single imagein single epoch is the cumulated loss value afterm iterations0e other one is the loss function of attention mechanism asdescribed in [40]

Lα 12n

1113944

n

i11minus 1113944

m

i1αj

i⎛⎝ ⎞⎠

2

(20)

where αji represents the attention weights of the image

feature xi and n is the number of parts in the image feature0e object function of PARN to optimize is

J minus1

M1113944

M

i1L

isa + λ middot L

iα1113872 1113873 (21)

whereM is the batch size and λ is the rate of Lα in the objectfunction

5. Attribute-Aided Reranking Algorithm

Given a probe image q_0 and a gallery image set G = {g_1, g_2, ..., g_N} containing N pedestrian images, the initial similarity distance between q_0 and g_i is computed with the Euclidean distance as

\[
d\!\left(q_0, g_i\right) = \sqrt{\sum_{k}\left(x_{q_0,k} - x_{g_i,k}\right)^{2}}
\tag{22}
\]

where x_{q_0} and x_{g_i} represent the features of q_0 and g_i extracted by MiF-CNN, respectively, and k indexes the feature dimensions. The initial rank list R_0 = {g_1^{0'}, g_2^{0'}, ..., g_N^{0'}} is obtained by sorting the similarity distances d(q_0, g_i) in ascending order, so that d(q_0, g_i^{0'}) < d(q_0, g_{i+1}^{0'}). If the top-1 gallery image g_1^{0'} is the positive sample, rank 1 is correct; if the positive sample is not the top-1 gallery image but lies within the top 5, rank 5 is correct.

The attribute-aided reranking algorithm reranks the list R_0 according to the attribute-feature similarity between q_0 and g_i, so that more positive gallery images obtain higher rankings in R_0, which improves the performance of person re-id. Specifically, when rank 5 is correct but rank 1 is not, the algorithm assigns an initial feature score s_f to the top-5 gallery images g_1^{0'} ~ g_5^{0'} according to their ranking in R_0: the higher the ranking, the higher s_f. It then assigns an attribute score s_a to g_1^{0'} ~ g_5^{0'} according to the number of their attributes that match those of q_0: the more matching attributes, the higher s_a. The total score of each gallery image is s = s_f(1 - c) + s_a c, where c is the score weight. The reranked list R_0* is obtained by sorting g_1^{0'} ~ g_5^{0'} in descending order of their total score s. For M query images, the whole processing of the attribute-aided reranking algorithm is shown in Algorithm 1, and a minimal code sketch is given below.
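As a complement to Algorithm 1, the following is a minimal Python sketch of the scoring scheme for the rank-5 case. The exact score assignments are assumptions (reversed rank position for s_f, count of shared attribute words for s_a), since the paper specifies only that a higher ranking gives a higher s_f and more matching attributes give a higher s_a.

```python
import numpy as np

def attribute_aided_rerank(dist, query_attrs, gallery_attrs, c=0.5, k=5):
    """Rerank the top-k gallery entries of one query.
    dist:          (N,) distances between the query and the N gallery features
    query_attrs:   list of attribute words of the query image
    gallery_attrs: list of N attribute-word lists, one per gallery image
    c:             score weight between feature score s_f and attribute score s_a
    """
    order = np.argsort(dist)                 # initial rank list R0 (ascending distance)
    top = order[:k]

    # Feature score s_f: higher initial ranking -> higher score (k, k-1, ..., 1)
    s_f = np.arange(k, 0, -1, dtype=float)

    # Attribute score s_a: number of attributes shared with the query
    s_a = np.array([len(set(query_attrs) & set(gallery_attrs[g])) for g in top],
                   dtype=float)

    total = s_f * (1.0 - c) + s_a * c        # s = s_f(1 - c) + s_a * c
    reranked_top = top[np.argsort(-total)]   # descending total score

    return np.concatenate([reranked_top, order[k:]])
```

The dist argument corresponds to equation (22) and can be computed from MiF-CNN features, e.g. dist = np.linalg.norm(gallery_feats - query_feat, axis=1); in Algorithm 1 the top-5 window is widened to the top 10 or to the whole gallery when the positive sample is ranked lower.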

6. Experiments

This paper first evaluates the attribute recognition accuracy of PARN on public person re-id datasets and compares the results with other person attribute recognition methods. It then evaluates the performance of MiF-CNN on three person re-id datasets and the effectiveness of the proposed attribute-aided reranking algorithm in improving person re-id accuracy. An analysis of the experimental results is given after comparing our results with state-of-the-art person re-id methods.

6.1. Datasets and Evaluation Protocol. This paper evaluates the proposed methods on three challenging public person re-id datasets: VIPeR [28], CUHK01 [29], and Market1501 [30].

Figure 5: Schematic diagram of the attention mechanism.

Figure 6: Architecture of the LSTM unit.

VIPeR Dataset. It contains 1,264 images of 632 persons. Because of the small number of image samples, the low resolution, and the great changes in pose, viewpoint, and illumination, it is one of the most difficult person re-id datasets. In the experiments, the VIPeR dataset is randomly split into two parts: images of 316 persons for training and images of the remaining 316 persons for testing.

CUHK01 Dataset. It consists of 3,884 images of 971 persons captured by two cameras with different views. 485 pedestrians are randomly chosen for training and the other 486 pedestrians for testing.

Market1501 Dataset. It is one of the largest person re-id datasets, including 32,668 images of 1,501 persons. Each person has a number of images captured by six cameras with different views. The dataset is divided into two parts: 12,936 images of 751 persons for training and 19,732 images of 750 persons for testing.

Evaluation Protocol. For the single-shot datasets VIPeR and CUHK01, the cumulative match characteristic (CMC) curve is used to record the ranks of correct matches [31]. For the multishot dataset Market1501, apart from CMC, the mean Average Precision (mAP) is also used to evaluate the performance of the proposed methods; a small sketch of both metrics is given below.
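For reference, the following is a small sketch of single-gallery-shot CMC rank-k accuracy and a simplified mAP computation under the standard definitions; it omits the Market1501 rule of excluding same-camera matches and is not the evaluation code used in this paper.

```python
import numpy as np

def cmc_rank_k(dist, query_ids, gallery_ids, k=1):
    """Fraction of queries whose correct identity appears in the top-k gallery matches.
    dist: (num_query, num_gallery) distance matrix; ids are NumPy integer arrays."""
    hits = 0
    for q, d in enumerate(dist):
        top_k = np.argsort(d)[:k]
        hits += int(np.any(gallery_ids[top_k] == query_ids[q]))
    return hits / len(dist)

def mean_average_precision(dist, query_ids, gallery_ids):
    """Simplified mAP: average precision over all positive gallery images per query."""
    aps = []
    for q, d in enumerate(dist):
        order = np.argsort(d)
        matches = (gallery_ids[order] == query_ids[q]).astype(float)
        if matches.sum() == 0:
            continue                          # skip queries with no positive sample
        precision_at_hit = np.cumsum(matches) / (np.arange(len(matches)) + 1)
        aps.append((precision_at_hit * matches).sum() / matches.sum())
    return float(np.mean(aps))
```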

6.2. Experimental Results on Attribute Recognition. The attributes in PARN are pedestrian identity-level attributes. For the VIPeR dataset, the attributes include "gender", "length of hair", "lower clothing", "upper clothing", "backpack or not", and "carrying anything or not". For the CUHK01 dataset, the attributes include "gender", "length of hair", "backpack or not", "handbag or not", "color of upper clothing", and "color of lower clothing". For the Market1501 dataset, the attributes contain "gender", "length of hair", "wearing hat or not", "lower clothing", "backpack or not", "handbag or not", "length of sleeve", and "length of lower clothing". Each person in each dataset has an attribute word list, as shown in Figure 7.

This paper evaluates the attribute recognition accuracy of PARN on the Market1501 dataset and compares the results with the strong person attribute recognition method APR [23]. The experimental results are reported in Table 1, where "Lslv" represents the "length of sleeve" and "Llow" represents the "length of lower clothing".

It can be observed from Table 1 that PARN obtains competitive recognition accuracy across the 8 attributes: the accuracy of "wearing hat or not" and "handbag or not" and the mean accuracy are higher than those of APR, while the accuracy of the other attributes is close to that of APR. The comparison with APR demonstrates the strong attribute recognition performance of PARN, which plays an important role in improving the accuracy of person re-id.

6.3. Experimental Results on Person Re-id. This paper evaluates the performance of the proposed methods on the three datasets. The experimental results of the proposed methods and other state-of-the-art methods are presented in Table 2.

As shown in Table 2, the proposed MiF-CNN model obtains strong performance among state-of-the-art methods; moreover, the comparison between MiF and MiF+PARN demonstrates that the proposed attribute-aided reranking algorithm helps to increase person re-id accuracy. The improvement is especially obvious on the VIPeR and CUHK01 datasets, where the identification accuracy at rank 1, rank 5, and rank 10 improves by 6.02%, 6.33%, and 2.22% and by 4.94%, 4.94%, and 2.26%, respectively. Among the compared methods, the proposed MiF-CNN model with the attribute-aided reranking algorithm obtains the best rank 5 accuracy on the VIPeR dataset, the best rank 1 and rank 5 accuracy on the CUHK01 dataset, and the best mAP on the Market1501 dataset.

6.4. Analysis of Experimental Results. Because of the small number of training samples in the VIPeR and CUHK01 datasets, a deep CNN model is hard to train and easily suffers from overfitting. To handle this, the FT-JSTL+DGD method, based on a deep CNN, learns deep features from multiple domains jointly by merging all the datasets together and fine-tunes the pretrained model on VIPeR and CUHK01 separately. The structure and hyperparameters of the deep CNN model in the M3TCP method need to be adjusted manually to adapt to the scale of different datasets, so as to overcome the training problem of deep neural networks.

In contrast, the proposed MiF-CNN has a fixed structure and can be trained on smaller datasets directly without any fine-tuning. Despite its deep structure, the model converges quickly and well. MiF-CNN without the attribute-aided reranking algorithm outperforms FT-JSTL+DGD and M3TCP by 0.27% and 13.17%, respectively, at rank 1 accuracy on the CUHK01 dataset. It also outperforms FT-JSTL+DGD by 2.22% at rank 1 accuracy on the VIPeR dataset, which indicates that the proposed MiF-CNN model reduces overfitting, generalizes well, and delivers outstanding performance in person re-id.

Figure 8 shows the rank lists of MiF and MiF+PARN. It can be seen that MiF-CNN with the attribute-aided reranking method enables lower-ranked positive samples in the rank list of MiF-CNN to obtain higher rankings, so as to improve the accuracy of person re-id, which verifies the effectiveness of the attribute-aided reranking algorithm for improving the performance of the person re-id model.

7. Conclusion

This paper studies the training problem of deep neural networks in the person re-id task and the use of pedestrian attributes to further improve the accuracy of person re-id. The proposed MiF-CNN model realizes the reuse of feature maps and gradient information, which enhances the feature mobility of the network and improves the efficiency of gradient propagation. The designed person attribute recognition network uses an attention mechanism to measure the correlation between the input feature maps and the hidden state of the LSTM at the previous time step, which reduces the dimension of the features; it then employs the LSTM to decode image features into pedestrian attributes. On the basis of the attribute recognition results, the attribute-aided reranking algorithm is presented, which rematches attribute features among samples so that more positive samples rank higher in the rank list, further improving identification accuracy.

The experimental results on three public person re-id datasets indicate the outstanding performance of the MiF-CNN model in person re-id, and the attribute-aided reranking algorithm makes a major contribution to improving its accuracy. In the future, more improvement and optimization will be done on the pedestrian attribute recognition network. Moreover, captions of person images or videos can be generated by the LSTM model with natural language processing ideas, which would make person re-id methods more meaningful.

Input: Initial rank list R0
Output: The reranked list R*

(1) for i = 1, 2, ..., M do
(2)   if rank 1 is correct then
(3)     R* = R0
(4)   elif rank 5 is correct then
(5)     distribute the initial feature score s_f to the top-5 gallery g_1^{i'} ~ g_5^{i'} according to their ranking in R0
(6)     distribute the attribute score s_a to g_1^{i'} ~ g_5^{i'} according to the number of their attributes that match q_i
(7)     s = s_f(1 - c) + s_a c
(8)     rerank g_1^{i'} ~ g_5^{i'} in descending order of the total score s to obtain R*
(9)   elif rank 10 is correct then
(10)    replace g_1^{i'} ~ g_5^{i'} with g_1^{i'} ~ g_10^{i'} and repeat steps (5)-(8)
(11)  else
(12)    replace g_1^{i'} ~ g_5^{i'} with g_1^{i'} ~ g_N^{i'} and repeat steps (5)-(8)
(13)  end if
(14) end for

Algorithm 1: The attribute-aided reranking algorithm.

Figure 7: Attribute labels of pedestrians in the datasets. (a) Male, hat, pants, stripes, backpack, carrying nothing. (b) Male, short hair, pants, no hat, no backpack, no handbag, short sleeve, short lower clothing.

Table 1: Recognition accuracy of different attributes for PARN and APR (%).

Method   Gender   Hair    Clothes   Hat     Backpack   Handbag   Lslv    Llow    Mean
APR      85.78    83.46   91.36     88.21   86.32      76.01     94.12   92.64   87.23
PARN     84.93    79.10   89.35     96.67   83.86      85.18     92.84   88.45   87.55



Data Availability

The data used to support the findings of this study (attribute recognition accuracy on the Market1501 dataset and person re-id accuracy on the VIPeR, CUHK01, and Market1501 datasets) are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61773105), the Natural Science Foundation of Liaoning Province (no. 20170540675), and the Scientific Research Project of Liaoning Educational Department (no. LQGD2017023).

Table 2: Comparison of various methods with the proposed methods on the three datasets (%).

Methods             VIPeR (R1/R5/R10)     CUHK01 (R1/R5/R10)    Market1501 (R1/R5/R10/mAP)
DNS [32]            42.28/71.46/82.94     64.98/84.96/89.92     67.96/—/—/41.89
DLDA [33]           44.11/72.59/81.66     67.12/89.45/91.68     48.15/—/—/29.94
FT-JSTL+DGD [11]    38.6/—/—              66.60/—/—             —/—/—/—
PDC [34]            51.27/74.05/84.18     —/—/—                 84.14/92.73/94.92/63.41
K-means-CNN [35]    46.50/69.30/80.70     53.50/82.50/91.20     —/—/—/—
Spindle [9]         53.80/74.10/83.20     —/—/—                 76.90/91.50/94.60/—
PersonNet [36]      —/—/—                 71.14/90.07/95.00     37.21/—/—/18.57
CSBT [37]           36.60/66.20/88.30     51.20/76.30/91.80     42.90/—/—/20.30
SDH-CNN [38]        —/—/—                 —/—/—                 58.12/68.50/80.82/48.20
M3TCP [21]          —/—/—                 53.70/84.30/91.00     —/—/—/—
DM3 [39]            42.70/74.30/85.10     49.70/77.30/86.10     75.80/89.10/92.40/53.20
MiF                 40.82/68.35/81.01     66.87/85.39/89.51     78.92/86.46/89.28/66.00
MiF+PARN            46.84/74.68/83.23     71.81/90.33/91.77     79.39/88.95/91.36/66.46

"MiF" denotes the MiF-CNN method and "MiF+PARN" denotes MiF-CNN with the attribute-aided reranking method. R1/R5/R10 denote rank 1, rank 5, and rank 10 accuracy.

Figure 8: Rank list comparison between MiF and MiF+PARN.


References

[1] L. Li and J. Guo, "Pedestrian re-identification based on reinforced deep feature fusion," Information Technology, vol. 42, no. 7, pp. 15-19, 2018, in Chinese.
[2] J. Dai, Y. Zhang, H. Lu, and H. Wang, "Cross-view semantic projection learning for person re-identification," Pattern Recognition, vol. 75, pp. 63-76, 2018.
[3] T. T. T. Pham, T.-L. Le, H. Vu et al., "Fully-automated person re-identification in multi-camera surveillance system with a robust kernel descriptor and effective shadow removal method," Image and Vision Computing, vol. 59, pp. 44-62, 2017.
[4] P. DaoDao and H. J. Lee, "Deep multi-task network for learning person identity and attributes," IEEE Access, vol. 6, pp. 60801-60811, 2018.
[5] J. Li, L. Zhuo, J. Zhang, F. Li, and H. Zhang, "A survey of person re-identification," Acta Automatica Sinica, vol. 44, no. 9, pp. 1554-1568, 2018, in Chinese.
[6] L. Wu, Y. Wang, J. Gao et al., "Deep adaptive feature embedding with local sample distributions for person re-identification," Pattern Recognition, vol. 73, pp. 275-288, 2018.
[7] L. Li, Y. Yang, and A. G. Hauptmann, "Person re-identification: past, present and future," 2016, http://arxivorgabs161002984.
[8] D. Wu, S. J. Zheng, C. A. Yuan, and D. S. Huang, "A deep model with combined losses for person re-identification," Cognitive Systems Research, vol. 54, pp. 74-82, 2018.
[9] H. Zhao, M. Tian, S. Sun et al., "Spindle net: person re-identification with human body region guided feature decomposition and fusion," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1077-1085, Honolulu, HI, USA, 2017.
[10] W. Chen, X. Chen, J. Zhang, and K. Huang, "Beyond triplet loss: a deep quadruplet network for person re-identification," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 28, Honolulu, HI, USA, July 2017.
[11] T. Xiao, H. Li, W. Ouyang et al., "Learning deep feature representations with domain guided dropout for person re-identification," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1249-1258, Las Vegas, NV, USA, June 2016.
[12] E. Ahmed, M. Jones, and T. K. Marks, "An improved deep learning architecture for person re-identification," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3908-3916, Boston, MA, USA, June 2015.
[13] M. Farenzena, L. Bazzani, A. Perina et al., "Person re-identification by symmetry-driven accumulation of local features," in Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2360-2367, San Francisco, CA, USA, June 2010.
[14] Y. Yang, J. Yang, J. Yan et al., "Salient color names for person re-identification," in Proceedings of Computer Vision-ECCV 2014, pp. 536-551, Zurich, Switzerland, September 2014.
[15] L. Liao, M. Cristani, A. Perina et al., "Multiple-shot person re-identification by chromatic and epitomic analyses," Pattern Recognition Letters, vol. 33, no. 7, pp. 898-903, 2012.
[16] S. Murino, R. Laganiere, and P. Payeur, "Improving pedestrian detection with selective gradient self-similarity feature," Pattern Recognition, vol. 48, no. 8, pp. 2364-2376, 2015.
[17] R. Layne, T. M. Hospedales, S. Gong et al., "Person re-identification by attributes," in Proceedings of the 23rd British Machine Vision Conference (BMVC), vol. 2, no. 3, p. 83, Surrey, UK, September 2012.
[18] Z. Wang, R. Hu, Y. Yu, C. Liang, and W. Huang, "Multi-level fusion for person re-identification with incomplete marks," in Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1267-1270, Brisbane, Australia, March 2018.
[19] Y. Chen, S. Duffner, A. Stoian et al., "Deep and low-level feature based attribute learning for person re-identification," Image and Vision Computing, vol. 79, pp. 25-34, 2018.
[20] Z. Dufour, X. Bai, M. Ye, and S. Satoh, "Incremental deep hidden attribute learning," in Proceedings of the 26th ACM International Conference on Multimedia, pp. 72-80, Seoul, South Korea, October 2018.
[21] D. Cheng, Y. Gong, S. Zhou et al., "Person re-identification by multi-channel parts-based CNN with improved triplet loss function," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1335-1344, Las Vegas, NV, USA, June 2016.
[22] K. He, X. Zhang, S. Ren et al., "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, Las Vegas, NV, USA, June 2016.
[23] Y. Lin, L. Zheng, Z. Zheng et al., "Improving person re-identification by attribute and identity learning," 2017, http://arxivorgabs170307220.
[24] Y. Yan, B. Ni, J. Liu, and X. Yang, "Multi-level attention model for person re-identification," Pattern Recognition Letters, 2018, in press.
[25] Y. Wen, K. Zhang, Z. Li et al., "A discriminative feature learning approach for deep face recognition," in Proceedings of the 14th European Conference on Computer Vision (ECCV), pp. 499-515, Amsterdam, Netherlands, October 2016.
[26] K. Xu, J. Ba, R. Kiros et al., "Show, attend and tell: neural image caption generation with visual attention," in Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 2048-2057, Lille, France, July 2015.
[27] H. Choi, K. Cho, and Y. Bengio, "Fine-grained attention mechanism for neural machine translation," Neurocomputing, vol. 284, pp. 171-176, 2018.
[28] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in Proceedings of the European Conference on Computer Vision, Lecture Notes in Computer Science, pp. 262-275, Marseille, France, October 2008.
[29] W. Li, R. Zhao, and X. Wang, "Human reidentification with transferred metric learning," in Proceedings of the 2012 Asian Conference on Computer Vision (ACCV), pp. 31-44, Daejeon, Korea, November 2012.
[30] L. Zheng, L. Shen, L. Tian et al., "Scalable person re-identification: a benchmark," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1116-1124, Boston, MA, USA, December 2015.
[31] W. Li, R. Zhao, T. Xiao et al., "DeepReID: deep filter pairing neural network for person re-identification," in Proceedings of the 2014 Conference on Computer Vision and Pattern Recognition (CVPR), pp. 152-159, Columbus, OH, USA, June 2014.
[32] L. Zhang, T. Xiang, and S. Gong, "Learning a discriminative null space for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1239-1248, Las Vegas, NV, USA, June 2016.
[33] L. Wu, C. Shen, and A. van den Hengel, "Deep linear discriminant analysis on Fisher networks: a hybrid architecture for person re-identification," Pattern Recognition, vol. 65, pp. 238-250, 2017.


[34] C. Su, J. Li, S. Zhang et al., "Pose-driven deep convolutional model for person re-identification," in Proceedings of the 2017 International Conference on Computer Vision (ICCV), pp. 3980-3989, Venice, Italy, October 2017.
[35] Z. Zhao, B. Zhao, and F. Su, "Person re-identification via integrating patch-based metric learning and local salience learning," Pattern Recognition, vol. 75, pp. 90-98, 2018.
[36] L. Wu, C. Shen, and A. Hengel, "PersonNet: person re-identification with deep convolutional neural networks," 2016, http://arxivorgabs160107255.
[37] J. Chen, Y. Wang, J. Qin et al., "Fast person re-identification via cross-camera semantic binary transformation," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5330-5339, Honolulu, HI, USA, July 2017.
[38] L. Wu, Y. Wang, Z. Ge et al., "Structured deep hashing with convolutional neural networks for fast person re-identification," Computer Vision and Image Understanding, vol. 167, no. 10, pp. 63-73, 2018.
[39] Z. Hu, R. Hu, C. Chen, Y. Yu et al., "Person reidentification via discrepancy matrix and matrix metric," IEEE Transactions on Cybernetics, vol. 48, no. 10, pp. 3006-3020, 2018.
[40] D. Jiang, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxivorgabs14090473.




Here c is the cell state for storing and transferring in-formation f is the forget gate which decides what should beabandoned in the cell state i is the input gate which decideswhat should be stored in the cell state g represents a vector ofnew candidate values that should be added to the cell state o isthe output gate which decides what parts of the cell stateshould be output to the next time h is the hidden state ofLSTM x is the input of LSTM and t denotes the current time

In PARN at any time t the input of LSTM consists oftwo parts word embedding ytminus1 at the previous time andimage features Zt at the current time 0e operation pro-cessing of LSTM in PARN is formulated as

f sigmoid Wf htminus1 ytminus1 Zt1113858 1113859 + bf1113872 1113873

i sigmoid Wi htminus1 ytminus1 Zt1113858 1113859 + bi( 1113857

g sigmoid Wg htminus1 ytminus1 Zt1113858 1113859 + bg1113872 1113873

o sigmoid Wo htminus1 ytminus1 Zt1113858 1113859 + bo( 1113857

ct f⊙ ctminus1 + i⊙g

ht o⊙ tanh ct( 1113857

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(18)

Computational Intelligence and Neuroscience 5

where W and b represent the weights and biases of eachgate respectively Sigmoid function outputs a numberbetween 0 and 1 to decide the quantity scale of values thatgo through these gates Tanh function is the activationfunction of input 0is design makes LSTM give differentweights for information at different times so that it canchoose what part of information to remember or forget in along-term sequence

0e time step of LSTM in PARN is set as the number ofattributes of each pedestrian imagem 0e output hidden stateht is calculated by a fully connected layer and softmax functionat each time step to predict pedestrian attribute words

It is worth noting that at each time step the input wordembedding of LSTM is the one that LSTM learned at theprevious time step which takes advantage of the LSTM whensolving the long-term sequence problem Because of the

LSTMt = 1

LSTMt = 2

LSTMt = 3

LSTMt = m

Attention Attention Attention Attention

MiF-CNN

X

y1h1

y2h2

y3h3

ymndash1hmndash1

u1 u2 u3 um

y0h0

Figure 4 Architecture of PARN

Query Rank list1 2 3 4 5 6 7 8 9 10

A

B

C

Figure 3 Identification results of MiF-CNN (green bounding box represents positive samples in gallery)

6 Computational Intelligence and Neuroscience

correlation among attributes of pedestrian like the pedestrianowning attribute ldquomalerdquo generally own attribute ldquoshort hairrdquoand attribute ldquopantsrdquo LSTM can utilize the previous attributeresult to predict the current attribute more accurately

44 Network Optimization 0e loss function of PARN in-cludes two parts one is the cross-entropy between pre-diction and ground truth of network which is expressed as

Lsa 1113944m

i1uprime log

eθTaiu

1113936nw

j1eθT

aju⎛⎝ ⎞⎠ (19)

where u is the prediction value of the network uprime is theground truth of pedestrian labels m is the time step ofLSTM and nw is the total number of attribute words in eachdataset As shown in equation (19) the loss of a single imagein single epoch is the cumulated loss value afterm iterations0e other one is the loss function of attention mechanism asdescribed in [40]

Lα 12n

1113944

n

i11minus 1113944

m

i1αj

i⎛⎝ ⎞⎠

2

(20)

where αji represents the attention weights of the image

feature xi and n is the number of parts in the image feature0e object function of PARN to optimize is

J minus1

M1113944

M

i1L

isa + λ middot L

iα1113872 1113873 (21)

whereM is the batch size and λ is the rate of Lα in the objectfunction

5 Attribute-Aided Reranking Algorithm

Given a probe image q0 and gallery image setG g1 g2 gN1113864 1113865 including N pedestrian images theinitial similarity distance between q0 and gi is computedwith the Euclidean distance as

d q0 gi( 1113857

1113944

N

i1xq0minusxgi

1113872 11138732

11139741113972

(22)

where xq0and xgi

represent the features of q0 and gi re-spectively that extracted by MiF-CNN 0e initial ranklist R0 g

0prime1 g

0prime2 g

0primeN is obtained by sorting similarity

distance d(q0 gi) in ascending order whered(q0 g

0primei )ltd(q0 g

0primei+1) If the top-1 gallery g

0prime1 is the positive

sample it means rank 1 is correct If the positive sample isnot the top-1 gallery but the top-5 gallery it means rank 5 iscorrect

0e attribute-aided reranking algorithm reranks therank list R0 according to the attribute feature similaritybetween q0 and gi so that more positive gallery getshigher ranking in R0 which could improve the perfor-mance of person re-id To be specific when rank 5 iscorrect but rank 1 is not the proposed algorithm dis-tributes the initial feature score sf for top-5 gallery g

0prime1 simg

0prime5

according to their ranking in R0 the higher the rankingthe higher the sf 0en the proposed algorithm dis-tributes attribute score sa for g

0prime1simg

0prime5 according to the

number of their attributes that are the same as q0 themore same attributes the higher sa 0e total score ofeach gallery is s sf(1minus c) + sac where c is the scoreweight 0e reranking rank list Rlowast0 is obtained byreranking g

0prime1simg

0prime5 in descending order of their total score

s For M query images the whole processing of theattribute-aided reranking algorithm is shown asAlgorithm 1

6 Experiments

0is paper evaluates the attribute recognition accuracy ofPARN on person re-id public datasets first and then com-pares our results with other person attribute recognitionmethods 0en this paper evaluates the performance ofMiF-CNN on three people re-id datasets and the availabilityof the proposed attribute-aided reranking algorithm forimproving the accuracy of person re-id 0e analysis ofexperiment results is also given after comparing our resultswith the state-of-the-art person re-id methods

61 Datasets and Evaluation Protocol 0is paper evaluatesthe proposed methods on three challenging person re-id

α3α2α1

Z

Xx1 x2 x3 xn

Softmax Softmax Softmax Softmax

tanh tanh tanh tanh

Watt Watt Watt Watt

htndash1 htndash1 htndash1 htndash1

v1 v2 v3 vn

αn

Figure 5 Schematic diagram of attention mechanism

Wf Wi Wg Wo

f i g o

htndash1 ht

tanh

tanh

ctndash1 ct

xt

σ σ σ

Figure 6 Architecture of the LSTM unit

Computational Intelligence and Neuroscience 7

public datasets including VIPeR [28] CUHK01 [29] andMarket1501 [30]

VIPeR Dataset It contains 1264 images of 632 per-sons Because of the lack of image samples low-resolution a great change in pose view and illu-mination it is one of the hardest difficult person re-iddatasets In experiment the VIPeR dataset is ran-domly split into two parts images of 316 persons fortraining and the images of remaining 316 persons fortestingCUHK01 Dataset It consists of 3884 images of 971persons which are captured by two cameras withdifferent views 485 pedestrians are randomlychosen for training and other 486 pedestrians fortestingMarket1501 Dataset It is one of the largest person re-id datasets that include 32268 images of 1501 personsEach person has a number of images that are capturedby six cameras with different views 0e dataset isdivided into two parts 12936 images of 751 personsfor training and 19732 images of 750 persons fortestingEvaluation Protocol For the single-shot datasets VI-PeR and CUHK01 cumulative match characteristic(CMC) is used to record the ranks of correct identify[31] For the multishot dataset Market1501 apartfrom CMC mean Average Precision (mAP) is uti-lized to evaluate the performance of the proposedmethods

62 Experimental Results on Attribute Recognition 0e at-tributes in PARN refer to pedestrian identity-level attri-butes For the VIPeR dataset attributes include ldquogenderrdquoldquolength of hairrdquo ldquolower clothingrdquo ldquoupper clothingrdquo ldquoback-pack or notrdquo and ldquocarrying anything or notrdquo For theCUHK01 dataset attributes include ldquogenderrdquo ldquolength ofhairrdquo ldquobackpack or notrdquo ldquohandbag or notrdquo ldquocolor of upperrdquoand ldquocolor of lowerrdquo For the Market1501 dataset attributescontain ldquogenderrdquo ldquolength of hairrdquo ldquowearing hat or notrdquoldquolower clothingrdquo ldquobackpack or notrdquo ldquohandbag or notrdquoldquolength of sleeverdquo and ldquolength of lower clothingrdquo Eachperson in each dataset owns an attribute word list as shownin Figure 7

0is paper evaluates the attribute recognition accuracy ofPARN on the Market1501 dataset and compares our resultswith outstanding person attribute recognition method APR[22] 0e experiment results are reported in Table 1 whereldquoLslvrdquo represents the ldquolength of sleeverdquo and ldquoLlowrdquo repre-sents the ldquolength of lower clothingrdquo

It can be observed from Table 1 that PARN obtainssuperior recognition accuracy of 8 attributes especiallyldquolength of hairrdquo ldquohandbag or notrdquo and mean accuracy arehigher than APR whereas accuracy of other attributes iscloser with APR Comparison with APR demonstrates theoutstanding attribute recognition performance of PARNwhich plays an important role in improving the accuracy ofperson re-id

63 Experimental Results on Person re-id 0is paper eval-uates the performance of the proposed methods on threedatasets 0e experimental results of the proposed methodsand other state-of-the-art methods are presented inTable 2

As shown in Table 2 the proposed MiF-CNN modelobtains a great performance among state-of-the-artmethods and on the contrary comparison between MiFand MiF + PARN demonstrates that the proposed attribute-aided reranking algorithm is helpful to increase the personre-id accuracy 0e improvement effect is especially obviouson VIPeR and CUHK01 datasets where the identificationaccuracy of rank 1 rank 5 and rank 10 improves by 602633 and 222 and 494 494 and 226 respectively0e proposed MiF-CNN model with the attribute-aidedreranking algorithm gets the best rank 5 accuracy on theVIPeR dataset the best rank 1 and rank 5 accuracy on theCUHK01 dataset and the best mAP on the Market1501dataset among various methods

64 Analysis of Experimental Results Because of the smallquantity of training samples in VIPeR and CUHK01 data-sets a deep CNN model is hard to be trained and easilysuffers from overfitting To handle this the FT-JSTL+DGDmethod based on deep CNN learned deep features frommultiple domains jointly by merging all the datasets togetherand fine-tuned the pretrained model on VIPeR andCUHK01 separately 0e structure and hyperparameters ofthe deep CNN model in the M3TCP method need to beadjusted manually for adapting the scale of different datasetsso as to overcome the training problem of the deep neuralnetwork

In contrast the proposed MiF-CNN has an immobilestructure and can be trained on smaller datasets directlywithout any fine tuning In spite of the deep structure themodel can converge quickly and well MiF-CNNwithout theattribute-aided reranking algorithm outperforms FT-JSTL +DGD andM3TCP by 027 and 1317 respectivelyat rank 1 accuracy on the CUHK01 dataset It also out-performs FT-JSTL+DGD by 222 at rank 1 accuracy onthe VIPeR dataset which indicates that the proposed MiF-CNN model has an excellent ability of reducing overfittingand generalization and an outstanding performance inperson re-id

Figure 8 demonstrates the rank list of MiF andMiF + PARN It can be seen that the MiF-CNN with theattribute-aided reranking method enables the lower rankedpositive sample in rank list of MiF-CNN obtain a higherranking so as to improve the accuracy of person re-idwhich verifies the effectiveness of the attribute-aidedreranking algorithm for improving performance of per-son re-id model

7 Conclusion

0is paper studies and discusses the training problem ofdeep neural network in person re-id task and the using ofpedestrian attributes for further improving the accuracy of

8 Computational Intelligence and Neuroscience

person re-id 0e proposed MiF-CNN model realizes thereuse of feature maps and gradient information whichenhances the feature mobility of network and improves theefficiency of gradient propagation 0e designed person at-tribute recognition network uses an attention mechanism tomeasure the correlation between input feature maps and thehidden state of LSTM at previous time for reducing the di-mension of features It also employs LSTM to decode imagefeatures into pedestrian attributes On the basis of the at-tribute recognition results the attribute-aided reranking

algorithm is presented which rematches attribute featuresamong samples to aid more positive samples rank higher inrank list so as to improve the identify accuracy further

0e experimental results on three public person re-iddatasets indicate the outstanding performance of MiF-CNNmodel in person re-id 0e attribute-aided reranking algo-rithm makes a major contribution to improve the accuracyof person re-id In the future more improvement and op-timization will be done on pedestrian attribute recognitionnetwork Moreover the caption of person images or videos

Input Initial rank list R0Output 0e reranking rank list Rlowast

(1) for i 1 2 M do(2) if rank 1 correct(3) Rlowast R0(4) end if(5) elif rank 5 correct(6) Distributing initial feature score sf for top-5 gallery g

iprime1simg

iprime5 according to their ranking in R0

(7) Distributing attribute score sa for giprime1simg

iprime5 according to the number of their attributes that are the same as qi

(8) s sf(1minus c) + sac

(9) Reranking giprime1simg

iprime5 in descending order of their total score s and obtain Rlowast

(10) end if(11) elif rank 10 correct(12) Changing g

iprime1simg

iprime5 to g

iprime1simg

iprime10 repeat 6ndash10

(13) end if(14) else(15) Changing g

iprime1simg

iprime5 to g

iprime1simg

iprimeM repeat 6ndash10

(16) end if(17) end for

ALGORITHM 1 0e attribute-aided reranking algorithm

Male

Hat

Pants

Stripes

Backpack

Carrying nothing

(a)

Male

Short hair

Pants

No hat

No backpack

No handbag

Short sleeve

Short lower

(b)

Figure 7 Attribute label of the pedestrian in datasets

Table 1 Recognition accuracy of different attributes in PARN and APR ()

Method Gender Hair Clothes Hat Backpack Handbag Lslv Llow MeanAPR 8578 8346 9136 8821 8632 7601 9412 9264 8723PARN 8493 7910 8935 9667 8386 8518 9284 8845 8755

Computational Intelligence and Neuroscience 9

can be obtained by the LSTM model with natural languageprocess ideas which make person re-id methods moresignificant

Data Availability

0e (attributes recognition accuracy on the Market1501dataset and person re-id accuracy on VIPeR CUHK01 andMarket1501 datasets) data used to support the findings ofthis study are included within the article

Conflicts of Interest

0e authors declare that they have no conflicts of interest

Acknowledgments

0is work was supported by the National Natural ScienceFoundation of China (no 61773105) the Natural ScienceFoundation of Liaoning Province (no 20170540675) andthe Scientific Research Project of Liaoning EducationalDepartment (no LQGD2017023)

Table 2 Comparison of various methods with the proposed methods on three datasets ()

MethodsVIPeR CUHK01 Market1501

Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 mAPDNS [32] 4228 7146 8294 6498 8496 8992 6796 mdash mdash 4189DLDA [33] 4411 7259 8166 6712 8945 9168 4815 mdash mdash 2994FT-JSTL +DGD [11] 386 mdash mdash 6660 mdash mdash mdash mdash mdash mdashPDC [34] 5127 7405 8418 mdash mdash mdash 8414 9273 9492 6341K-means-CNN [35] 4650 6930 8070 5350 8250 9120 mdash mdash mdash mdashSpindle [9] 5380 7410 8320 mdash mdash mdash 7690 9150 9460 mdashPersonNet [36] mdash mdash mdash 7114 9007 9500 3721 mdash mdash 1857CSBT [37] 3660 6620 8830 5120 7630 9180 4290 mdash mdash 2030SDH-CNN [38] mdash mdash mdash mdash mdash mdash 5812 6850 8082 4820M3TCP [21] mdash mdash mdash 5370 8430 9100 mdash mdash mdash mdashDM3 [39] 4270 7430 8510 4970 7730 8610 7580 8910 9240 5320MiF 4082 6835 8101 6687 8539 8951 7892 8646 8928 6600MiF + PARN 4684 7468 8323 7181 9033 9177 7939 8895 9136 6646ldquoMiFrdquo represents the MiF-CNN method and ldquoMiF + PARNrdquo represents the MiF-CNN with the attribute-aided reranking method

Query

1 2 3 4 5 6 7 8 9 10

Query

MiF

MiF + PARN

MiF

MiF + PARN

Rank list

Figure 8 Rank list comparison between MiF and MiF + PARN

10 Computational Intelligence and Neuroscience

References

[1] L Li and J Guo ldquoPedestrian re-identification based onreinforced deep feature fusionrdquo Information Technologyvol 42 no 7 pp 15ndash19 2018 in Chinese

[2] J Dai Y Zhang H Lu and H Wang ldquoCross-view semanticprojection learning for person re-identificationrdquo PatternRecognition vol 75 pp 63ndash76 2018

[3] T T T Pham T-L Le H Vu et al ldquoFully-automated personre-identification in multi-camera surveillance system with arobust kernel descriptor and effective shadow removalmethodrdquo Image and Vision Computing vol 59 pp 44ndash622017

[4] P DaoDao and H J Lee ldquoDeep multi-task network forlearning person identity and attributesrdquo IEEE Access vol 6pp 60801ndash60811 2018

[5] J Li L Zhuo J Zhang F Li and H Zhang ldquoA survey ofperson re-identificationrdquo Acta Automatica Sinica vol 44no 9 pp 1554ndash1568 2018 in Chinese

[6] L Wu Y Wang J Gao et al ldquoDeep adaptive feature em-bedding with local sample distributions for person re-iden-tificationrdquo Pattern Recognition vol 73 pp 275ndash288 2018

[7] L Li Y Yang and A G Hauptmann ldquoPerson re-identification past present and futurerdquo 2016 httparxivorgabs161002984

[8] D Wu S J Zheng C A Yuan and D S Huang ldquoA deepmodel with combined losses for person re-identificationrdquoCognitive Systems Research vol 54 pp 74ndash82 2018

[9] H Zhao M Tian S Sun et al ldquoSpindle net person re-identification with human body region guided feature de-composition and fusionrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 1077ndash1085 Honolulu HI USA 2017

[10] W Chen X Chen J Zhang and K Huang ldquoBeyond tripletloss a deep quadruplet network for person re-identificationrdquoin Proceedings of the 2017 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) vol 28 Honolulu HIUSA July 2017

[11] T Xiao H Li W Ouyang et al ldquoLearning deep featurerepresentations with domain guided dropout for person re-identificationrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 1249ndash1258 Las Vegas NV USA June 2016

[12] E Ahmed M Jones and T K Marks ldquoAn improved deeplearning architecture for person re-identificationrdquo in Pro-ceedings of the 2015 IEEE Conference on Computer Vision andPattern Recognition (CVPR) pp 3908ndash3916 Boston MAUSA June 2015

[13] M Farenzena L Bazzani A Perina et al ldquoPerson re-identification by symmetry-driven accumulation of localfeaturesrdquo in Proceedings of the 2010 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 2360ndash2367 San Francisco CA USA June 2010

[14] Y Yang J Yang J Yan et al ldquoSalient color names for personre-identificationrdquo in Proceedings of Computer Vision-ECCV2014 pp 536ndash551 Zurich Switzerland September 2014

[15] L Liao M Cristani A Perina et al ldquoMultiple-shot person re-identification by chromatic and epitomic analysesrdquo PatternRecognition Letters vol 33 no 7 pp 898ndash903 2012

[16] S Murino R Laganiere and P Payeur ldquoImproving pedes-trian detection with selective gradient self-similarity featurerdquoPattern Recognition vol 48 no 8 pp 2364ndash2376 2015

[17] R Layne T M Hospedales S Gong et al ldquoPerson re-identification by attributesrdquo in Proceedings of the 23th

BritishMachine Vision Conference (BMVC) vol 2 no 3 p 83Surrey UK September 2012

[18] Z Wang R Hu Y Yu C Liang and W Huang ldquoMulti-levelfusion for person Re-identification with incomplete marksrdquoin Proceedings of the 23rd ACM international conference onMultimedia pp 1267ndash1270 Brisbane Australia March 2018

[19] Y Chen S Duffner A Stoian et al ldquoDeep and low-levelfeature based attribute learning for person re-identificationrdquoImage and Vision Computing vol 79 pp 25ndash34 2018

[20] Z Dufour X Bai M Ye and S Satoh ldquoIncremental deephidden attribute learningrdquo in Proceedings of the 26th ACMinternational conference on Multimedia pp 72ndash80 SeoulSouth Korea October 2018

[21] D Cheng Y Gong S Zhou et al ldquoPerson re-identification bymulti-channel parts-based cnn with improved triplet lossfunctionrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 1335ndash1344 Las Vegas NV USA June 2016

[22] K He X Zhang S Ren et al ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the 2016 IEEE Confer-ence on Computer Vision and Pattern Recognition (CVPR)pp 770ndash778 Las Vegas NV USA June 2016

[23] Y Lin L Zheng Z Zheng et al ldquoImproving person re-identification by attribute and identity learningrdquo 2017 httparxivorgabs170307220

[24] Y Yan B Ni J Liu and X Yang ldquoMulti-level attentionmodelfor person re-identificationrdquo Pattern Recognition Letters2018 In press

[25] Y Wen K Zhang Z Li et al ldquoA discriminative featurelearning approach for deep face recognitionrdquo in Proceedingsof 14th European Conference pp 499ndash515 AmsterdamNetherlands October 2016

[26] K Xu J Ba R Kiros et al ldquoShow attend and tell neuralimage caption generation with visual attentionrdquo in Pro-ceedings of the 32nd International Conference on MachineLearning (ICML) pp 2048ndash2057 Lille France July 2015

[27] H Choi K Cho and Y Bengio ldquoFine-grained attentionmechanism for neural machine translationrdquoNeurocomputingvol 284 pp 171ndash176 2018

[28] D Gray and H Tao ldquoViewpoint invariant pedestrian rec-ognition with an ensemble of localized featuresrdquo in Pro-ceedings of European Conference on Computer Vision LectureNotes in Computer Science pp 262ndash275 Marseille FranceOctober 2008

[29] W Li R Zhao and X Wang ldquoHuman reidentification withtransferred metric learningrdquo in Proceedings of the 2008 AsianConference on Computer Vision (ACCV) pp 31ndash44 DaejeonKorea November 2012

[30] L Zheng L Shen L Tian et al ldquoScalable person re-identification a benchmarkrdquo in Proceedings of the 2015IEEE Conference on Computer Vision and Pattern Recognition(CVPR) pp 1116ndash1124 Boston MA USA December 2015

[31] W Li R Zhao T Xiao et al ldquoDeepReID deep filter pairingneural network for person re-identificationrdquo in Proceedings ofthe 2014 Computer Vision and Pattern Recognition (CVPR)pp 152ndash159 Columbus OH USA June 2014

[32] L Zhang T Xiang and S Gong ldquoLearning a discriminativenull space for person re-identificationrdquo in Proceedings of theIEEE Conference on Computer Cision and Pattern Recognition(CVPR) pp 1239ndash1248 Las Vegas NV USA June 2016

[33] L Wu C Shen and A van den Hengel ldquoDeep linear dis-criminant analysis on Fisher networks a hybrid architecturefor person re-identificationrdquo Pattern Recognition vol 65pp 238ndash250 2017

Computational Intelligence and Neuroscience 11

[34] C Su J Li S Zhang et al ldquoPose-driven deep convolutionalmodel for person re-identificationrdquo in Proceedings of the 2016International Conference on Computer Vision (ICCV)pp 3980ndash3989 Venice Italy October 2017

[35] Z Zhao B Zhao and F Su ldquoPerson re-identification viaintegrating patch-based metric learning and local saliencelearningrdquo Pattern Recognition vol 75 pp 90ndash98 2018

[36] L Wu C Shen and A Hengel ldquoPersonnet person re-identification with deep convolutional neural networksrdquo2016 httparxivorgabs160107255

[37] J Chen Y Wang J Qin et al ldquoFast person re-identificationvia cross-camera semantic binary transformationrdquo in Pro-ceedings of the 2017 IEEE Conference on Computer Vision andPattern Recognition (CVPR) pp 5330ndash5339 Honolulu HIUSA July 2017

[38] L Wu Y Wang Z Ge et al ldquoStructured deep hashing withconvolutional neural networks for fast person re-identifica-tionrdquo Computer Vision and Image Understanding vol 167no 10 pp 63ndash73 2018

[39] Z Hu R Hu C Chen Y Yu et al ldquoPerson reidentification viadiscrepancy matrix and matrix metricrdquo IEEE Transactions onCybernetics vol 48 no 10 pp 3006ndash3020 2018

[40] D Jiang K Cho and Y Bengio ldquoNeural machine translationby jointly learning to align and translaterdquo 2014 httparxivorgabs14090473

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 5: Multi-InformationFlowCNNandAttribute …downloads.hindawi.com/journals/cin/2019/7028107.pdfe first type is the classification model as used in image classification and the second

4 Person Attributes Recognition Network

0e rank list of MiF-CNN is shown in Figure 3 In theincorrect identification results of rank 1 (pedestrian B andpedestrian C) there is a big difference in attribute featuresbetween the top-ranking negative samples and the queryimage including gender clothing whether carrying hand-bag or not and so on Hence this paper recognizes personattributes for improving accuracy of person re-id

Based on Encoder-Decoder idea Xu et al [26] proposeda neural network model that can learn and generate thecontent of images 0e model utilized CNN as encoder toextract features of images which were then fed into a re-current neural network (RNN) for decoding into languagecaptions of images Inspired by that this paper presents aperson attribute recognition network (PARN) with LSTMand attention mechanism 0e proposed PARN takes pe-destrian features that are extracted by MiF-CNN as inputand outputs the attributes information of pedestrian images0e architecture of PARN is demonstrated in Figure 4

4.1. Input of PARN. In PARN, the input is the feature map before the last fully connected layer of MiF-CNN. The input feature is split into $n$ feature vectors, each of which corresponds to a part of the image. Each feature vector is a $D_X$-dimensional vector, so the input is represented as

$X = \{x_1, x_2, \ldots, x_n\},\quad x_i \in \mathbb{R}^{D_X}$.  (13)

Referring to natural language processing methods, PARN transforms the words in person attribute labels into word embeddings. As the other part of the input of PARN, each word embedding is a $D_Y$-dimensional vector:

$Y = \{y_1, y_2, \ldots, y_m\},\quad y_i \in \mathbb{R}^{D_Y}$,  (14)

where $m$ is the number of attribute words for each pedestrian image and $y_i$ is the word embedding corresponding to each attribute word.

For attribute words, common one-hot encoding treats each word as an isolated symbol, which ignores the correlation among words. In contrast, a word embedding represents each word as a continuous dense vector, which places correlated words closer together in the embedding space.
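As an illustration of this point (not taken from the authors' implementation), the sketch below contrasts one-hot vectors with a learned embedding lookup; the attribute vocabulary and the embedding dimension are made-up values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical attribute vocabulary (illustrative only).
vocab = ["male", "female", "short hair", "long hair", "pants", "dress", "backpack", "no backpack"]
word_to_id = {w: i for i, w in enumerate(vocab)}

D_Y = 16  # assumed embedding dimension
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=D_Y)

ids = torch.tensor([word_to_id["male"], word_to_id["short hair"], word_to_id["pants"]])

# One-hot encoding: sparse vectors that treat every word as unrelated to every other word.
one_hot = F.one_hot(ids, num_classes=len(vocab)).float()   # shape (3, 8)

# Word embeddings: dense, trainable vectors in which correlated words can end up close together.
dense = embedding(ids)                                      # shape (3, 16)
print(one_hot.shape, dense.shape)
```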

4.2. Attention Mechanism. Attention mechanisms have been widely used in natural language processing and computer vision. By measuring the correlation between the output and different parts of the input, an attention mechanism assigns different weights to different parts of the input, enabling the network to use the most relevant feature information for prediction and reducing the dimension of the input data [27].

In practice, a certain attribute of a pedestrian corresponds to only a certain part of the image. For example, when recognizing whether a pedestrian is wearing a hat, people usually pay attention only to the area above the pedestrian's head, not to other areas irrelevant to the attribute. Therefore, before the image features are fed into the LSTM network, an attention mechanism is introduced to calculate the correlation between different positions of the image features and the hidden state of the LSTM at the previous time step. The schematic diagram of the attention mechanism is shown in Figure 5.

At time step $t$, a fully connected layer and the tanh function are used to integrate the information of the input feature vector and the hidden state of the LSTM at the previous time step, which is formulated as

$v_i = \tanh\left(W_{attX} x_i + W_{atth} h_{t-1}\right)$,  (15)

where $W_{attX}$ and $W_{atth}$ represent the fully connected layer weights for $x_i$ and $h_{t-1}$, respectively. The attention weight $\alpha_i$ is obtained by applying the softmax function to the score vector $v_i$:

$\alpha_i = \mathrm{softmax}\left(W_{attv} v_i\right)$,  (16)

where $W_{attv}$ represents the fully connected layer weights for $v_i$. The output $Z$ is the weighted sum over all the $x_i$:

$Z = \sum_{i=1}^{n} \alpha_i x_i$.  (17)

The feature $Z$ highlights the local features that are helpful for prediction and suppresses the local features that contribute little. This reduces the dimension of the features to some extent and makes the LSTM network focus on the parts of the input features most correlated with the prediction while recognizing person attributes, thereby improving the efficiency and prediction accuracy of the network.
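A minimal sketch of equations (15)-(17) is given below, assuming PyTorch and made-up sizes for the part features and the LSTM hidden state; the layer names mirror $W_{attX}$, $W_{atth}$, and $W_{attv}$, but all dimensions are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Sketch of equations (15)-(17): additive (soft) attention over n image parts."""
    def __init__(self, d_x: int, d_h: int, d_att: int = 128):
        super().__init__()
        self.w_att_x = nn.Linear(d_x, d_att, bias=False)   # W_attX
        self.w_att_h = nn.Linear(d_h, d_att, bias=True)    # W_atth
        self.w_att_v = nn.Linear(d_att, 1, bias=False)     # W_attv

    def forward(self, x, h_prev):
        # x: (n, d_x) image part features; h_prev: (d_h,) previous LSTM hidden state
        v = torch.tanh(self.w_att_x(x) + self.w_att_h(h_prev))     # eq. (15), shape (n, d_att)
        scores = self.w_att_v(v).squeeze(-1)                       # shape (n,)
        alpha = torch.softmax(scores, dim=0)                       # eq. (16), attention weights
        z = (alpha.unsqueeze(-1) * x).sum(dim=0)                   # eq. (17), weighted sum, shape (d_x,)
        return z, alpha

# Usage with assumed sizes: n = 49 parts, D_X = 512, D_h = 256.
att = SoftAttention(d_x=512, d_h=256)
z, alpha = att(torch.randn(49, 512), torch.randn(256))
```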

4.3. LSTM. A standard RNN updates its parameters with backpropagation and easily suffers from vanishing gradients when there is a long gap between the relevant information and the current position to predict. The LSTM network handles this problem well because of its structural design. The architecture of the LSTM unit is shown in Figure 6.

Here, $c$ is the cell state used for storing and transferring information; $f$ is the forget gate, which decides what should be discarded from the cell state; $i$ is the input gate, which decides what should be stored in the cell state; $g$ represents a vector of new candidate values to be added to the cell state; $o$ is the output gate, which decides what parts of the cell state should be output to the next time step; $h$ is the hidden state of the LSTM; $x$ is the input of the LSTM; and $t$ denotes the current time step.

In PARN, at any time step $t$, the input of the LSTM consists of two parts: the word embedding $y_{t-1}$ from the previous time step and the image feature $Z_t$ at the current time step. The operation of the LSTM in PARN is formulated as

$f = \mathrm{sigmoid}\left(W_f [h_{t-1}, y_{t-1}, Z_t] + b_f\right)$,
$i = \mathrm{sigmoid}\left(W_i [h_{t-1}, y_{t-1}, Z_t] + b_i\right)$,
$g = \tanh\left(W_g [h_{t-1}, y_{t-1}, Z_t] + b_g\right)$,
$o = \mathrm{sigmoid}\left(W_o [h_{t-1}, y_{t-1}, Z_t] + b_o\right)$,
$c_t = f \odot c_{t-1} + i \odot g$,
$h_t = o \odot \tanh(c_t)$,  (18)


where $W$ and $b$ represent the weights and biases of each gate, respectively. The sigmoid function outputs a number between 0 and 1 to decide how much of each value passes through the corresponding gate, and the tanh function is the activation applied to the candidate input. This design lets the LSTM assign different weights to information at different time steps, so it can choose which parts of the information to remember or forget over a long sequence.
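The gate equations in (18) correspond to a standard LSTM cell whose external input is the concatenation $[y_{t-1}, Z_t]$. A minimal sketch using PyTorch's built-in LSTMCell is shown below; all sizes are assumptions made for illustration, not the paper's settings.

```python
import torch
import torch.nn as nn

class PARNDecoderStep(nn.Module):
    """Sketch of one step of equation (18).

    nn.LSTMCell applies the f/i/g/o gates internally and already feeds h_{t-1}
    through its recurrent weights, so only [y_{t-1}, Z_t] is concatenated as input.
    """
    def __init__(self, d_y: int, d_x: int, d_h: int):
        super().__init__()
        self.cell = nn.LSTMCell(input_size=d_y + d_x, hidden_size=d_h)

    def forward(self, y_prev, z_t, h_prev, c_prev):
        step_input = torch.cat([y_prev, z_t], dim=-1)   # [y_{t-1}, Z_t]
        h_t, c_t = self.cell(step_input, (h_prev, c_prev))
        return h_t, c_t

# Usage with assumed sizes: D_Y = 16, D_X = 512, D_h = 256, batch of 1.
step = PARNDecoderStep(d_y=16, d_x=512, d_h=256)
h, c = step(torch.randn(1, 16), torch.randn(1, 512), torch.zeros(1, 256), torch.zeros(1, 256))
```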

The number of time steps of the LSTM in PARN is set to the number of attributes of each pedestrian image, $m$. At each time step, the output hidden state $h_t$ is passed through a fully connected layer and a softmax function to predict the pedestrian attribute word.

It is worth noting that, at each time step, the input word embedding of the LSTM is the word that the LSTM predicted at the previous time step, which takes advantage of the LSTM's strength in modeling long sequences.

Figure 4: Architecture of PARN.

Figure 3: Identification results of MiF-CNN (green bounding boxes represent positive samples in the gallery).


Because of the correlation among pedestrian attributes (e.g., a pedestrian with the attribute "male" generally also has the attributes "short hair" and "pants"), the LSTM can use the previously predicted attribute to predict the current attribute more accurately.
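To make the decoding process concrete, the sketch below rolls attention and the LSTM over $m$ time steps and feeds each predicted attribute word back in as the next input embedding. The simplified attention scorer, the start token, and all sizes are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Assumed sizes; the <start> token id 0 is hypothetical.
n_words, d_y, d_x, d_h, m, n_parts = 30, 16, 512, 256, 8, 49
embed = nn.Embedding(n_words, d_y)
attend = nn.Sequential(nn.Linear(d_x + d_h, 128), nn.Tanh(), nn.Linear(128, 1))  # simplified scorer
cell = nn.LSTMCell(d_y + d_x, d_h)
classify = nn.Linear(d_h, n_words)

x = torch.randn(n_parts, d_x)                      # image part features from MiF-CNN (random stand-in)
h, c = torch.zeros(1, d_h), torch.zeros(1, d_h)
word = torch.tensor([0])                           # hypothetical <start> token
predicted = []
for t in range(m):
    scores = attend(torch.cat([x, h.expand(n_parts, d_h)], dim=-1))   # score each part against h_{t-1}
    alpha = torch.softmax(scores, dim=0)
    z = (alpha * x).sum(dim=0, keepdim=True)                          # attended feature Z_t
    h, c = cell(torch.cat([embed(word), z], dim=-1), (h, c))          # one step of eq. (18)
    word = classify(h).argmax(dim=-1)                                 # predicted attribute word
    predicted.append(word.item())                                     # fed back as the next embedding
print(predicted)
```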

4.4. Network Optimization. The loss function of PARN includes two parts. The first is the cross-entropy between the network prediction and the ground truth, which is expressed as

$L_{sa} = \sum_{i=1}^{m} u' \log\left( \dfrac{e^{\theta_{a_i}^{T} u}}{\sum_{j=1}^{n_w} e^{\theta_{a_j}^{T} u}} \right)$,  (19)

where $u$ is the prediction of the network, $u'$ is the ground-truth pedestrian label, $m$ is the number of LSTM time steps, and $n_w$ is the total number of attribute words in each dataset. As shown in equation (19), the loss of a single image in a single epoch is the loss value accumulated over $m$ time steps. The second part is the loss function of the attention mechanism, as described in [40]:

$L_{\alpha} = \dfrac{1}{2n} \sum_{i=1}^{n} \left( 1 - \sum_{j=1}^{m} \alpha_i^{j} \right)^{2}$,  (20)

where $\alpha_i^{j}$ represents the attention weight of the image feature $x_i$ at time step $j$ and $n$ is the number of parts in the image feature. The objective function of PARN to be optimized is

$J = -\dfrac{1}{M} \sum_{i=1}^{M} \left( L_{sa}^{i} + \lambda \cdot L_{\alpha}^{i} \right)$,  (21)

where $M$ is the batch size and $\lambda$ is the weight of $L_{\alpha}$ in the objective function.
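A sketch of how the combined objective in equations (19)-(21) could be computed for a mini-batch is given below. Tensor shapes and the value of λ are assumptions, and note that `F.cross_entropy` already applies the negative log that equation (21) makes explicit, so the returned quantity is the value to minimize.

```python
import torch
import torch.nn.functional as F

def parn_loss(logits, targets, alphas, lam=1.0):
    """Sketch of the combined PARN objective for one mini-batch.

    logits:  (M, m, n_w) attribute-word scores for M images over m time steps
    targets: (M, m) ground-truth attribute-word indices
    alphas:  (M, m, n) attention weights over n image parts at each time step
    lam:     weight of the attention regularizer (lambda); the value is an assumption
    """
    # Equation (19): cross-entropy accumulated over the m time steps of each image.
    l_sa = F.cross_entropy(logits.flatten(0, 1), targets.flatten(0, 1), reduction="none")
    l_sa = l_sa.view(targets.shape).sum(dim=1)                                      # (M,)

    # Equation (20): encourage each image part to receive roughly one unit of attention in total.
    l_alpha = ((1.0 - alphas.sum(dim=1)) ** 2).sum(dim=1) / (2 * alphas.shape[-1])  # (M,)

    # Equation (21): batch-averaged objective (sign folded into cross_entropy above).
    return (l_sa + lam * l_alpha).mean()
```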

5. Attribute-Aided Reranking Algorithm

Given a probe image $q_0$ and a gallery set $G = \{g_1, g_2, \ldots, g_N\}$ containing $N$ pedestrian images, the initial similarity distance between $q_0$ and $g_i$ is computed with the Euclidean distance as

$d(q_0, g_i) = \sqrt{\sum_{k} \left( x_{q_0,k} - x_{g_i,k} \right)^{2}}$,  (22)

where $x_{q_0}$ and $x_{g_i}$ represent the features of $q_0$ and $g_i$, respectively, extracted by MiF-CNN, and $k$ indexes the feature dimensions. The initial rank list $R_0 = \{g'_1, g'_2, \ldots, g'_N\}$ is obtained by sorting the similarity distances $d(q_0, g_i)$ in ascending order, so that $d(q_0, g'_i) < d(q_0, g'_{i+1})$. If the top-1 gallery image $g'_1$ is the positive sample, rank 1 is correct; if the positive sample is not the top-1 gallery image but appears in the top 5, rank 5 is correct.

The attribute-aided reranking algorithm reranks the list $R_0$ according to the attribute similarity between $q_0$ and $g_i$, so that more positive gallery images obtain a higher ranking in $R_0$, which improves the performance of person re-id. Specifically, when rank 5 is correct but rank 1 is not, the algorithm first assigns an initial feature score $s_f$ to the top-5 gallery images $g'_1 \sim g'_5$ according to their ranking in $R_0$: the higher the ranking, the higher $s_f$. It then assigns an attribute score $s_a$ to $g'_1 \sim g'_5$ according to the number of their attributes that match those of $q_0$: the more matching attributes, the higher $s_a$. The total score of each gallery image is $s = s_f(1 - c) + s_a c$, where $c$ is the score weight. The reranked list $R_0^{*}$ is obtained by sorting $g'_1 \sim g'_5$ in descending order of the total score $s$. For $M$ query images, the whole procedure of the attribute-aided reranking algorithm is shown in Algorithm 1.
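The following sketch illustrates the procedure for one query: the initial list follows equation (22), and the top-k candidates are rescored with $s = s_f(1 - c) + s_a c$. The linear score assignments and the value of $c$ are assumptions; the paper only specifies that higher-ranked and better-matching gallery images receive higher scores.

```python
import torch

def initial_rank(query_feat, gallery_feats, gallery_ids):
    # Equation (22): sort gallery ids by ascending Euclidean distance between MiF-CNN features.
    d = torch.cdist(query_feat.unsqueeze(0), gallery_feats).squeeze(0)
    return [gallery_ids[int(i)] for i in torch.argsort(d)]

def attribute_rerank(rank_list, query_attrs, gallery_attrs, top_k=5, c=0.5):
    """Rescore the top_k candidates of one query with feature and attribute scores."""
    top = rank_list[:top_k]
    scored = []
    for rank, gid in enumerate(top):
        s_f = (top_k - rank) / top_k                      # feature score: higher rank -> higher s_f (assumed scheme)
        matches = len(query_attrs & gallery_attrs[gid])   # number of attributes shared with the query
        s_a = matches / max(len(query_attrs), 1)          # attribute score: more matches -> higher s_a (assumed scheme)
        scored.append((s_f * (1 - c) + s_a * c, gid))     # total score s = s_f(1 - c) + s_a * c
    reranked = [gid for _, gid in sorted(scored, reverse=True)]
    return reranked + rank_list[top_k:]

# Hypothetical usage: rerank the top 5 of an initial list.
r0 = ["g3", "g7", "g1", "g9", "g4", "g8"]
q_attrs = {"male", "short hair", "backpack"}
g_attrs = {g: ({"male", "short hair"} if g == "g1" else {"female"}) for g in r0}
print(attribute_rerank(r0, q_attrs, g_attrs))   # g1 moves up because it shares more attributes
```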

6. Experiments

This paper first evaluates the attribute recognition accuracy of PARN on public person re-id datasets and compares the results with other person attribute recognition methods. It then evaluates the performance of MiF-CNN on three person re-id datasets and the effectiveness of the proposed attribute-aided reranking algorithm for improving re-id accuracy. An analysis of the experimental results is given after comparing our results with state-of-the-art person re-id methods.

6.1. Datasets and Evaluation Protocol. This paper evaluates the proposed methods on three challenging public person re-id datasets: VIPeR [28], CUHK01 [29], and Market1501 [30].

Figure 5: Schematic diagram of the attention mechanism.

Figure 6: Architecture of the LSTM unit.



VIPeR Dataset. It contains 1264 images of 632 persons. Because of the small number of image samples, the low resolution, and the great changes in pose, view, and illumination, it is one of the most difficult person re-id datasets. In the experiments, the VIPeR dataset is randomly split into two parts: images of 316 persons for training and images of the remaining 316 persons for testing.

CUHK01 Dataset. It consists of 3884 images of 971 persons captured by two cameras with different views. 485 pedestrians are randomly chosen for training and the other 486 pedestrians for testing.

Market1501 Dataset. It is one of the largest person re-id datasets, including 32,268 images of 1501 persons. Each person has a number of images captured by six cameras with different views. The dataset is divided into two parts: 12,936 images of 751 persons for training and 19,732 images of 750 persons for testing.

Evaluation Protocol. For the single-shot datasets VIPeR and CUHK01, the cumulative match characteristic (CMC) is used to record the ranks of correct identities [31]. For the multishot dataset Market1501, mean Average Precision (mAP) is used in addition to CMC to evaluate the performance of the proposed methods.
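For reference, a simple way to compute CMC rank-k scores from rank lists is sketched below; this is an illustrative computation under the stated protocol, not the authors' evaluation code.

```python
import numpy as np

def cmc_scores(rank_lists, ground_truth, ks=(1, 5, 10)):
    """Fraction of queries whose first correct match appears within the top k of the rank list."""
    first_hits = []
    for q, ranked_gallery in enumerate(rank_lists):
        hits = [r for r, gid in enumerate(ranked_gallery) if gid in ground_truth[q]]
        first_hits.append(hits[0] if hits else len(ranked_gallery))
    first_hits = np.asarray(first_hits)
    return {k: float((first_hits < k).mean()) for k in ks}

# Hypothetical toy example: 2 queries, galleries ranked by ascending distance.
rank_lists = [["g5", "g2", "g9"], ["g7", "g1", "g3"]]
ground_truth = {0: {"g2"}, 1: {"g7"}}
print(cmc_scores(rank_lists, ground_truth))   # {1: 0.5, 5: 1.0, 10: 1.0}
```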

6.2. Experimental Results on Attribute Recognition. The attributes in PARN are pedestrian identity-level attributes. For the VIPeR dataset, the attributes include "gender," "length of hair," "lower clothing," "upper clothing," "backpack or not," and "carrying anything or not." For the CUHK01 dataset, the attributes include "gender," "length of hair," "backpack or not," "handbag or not," "color of upper clothing," and "color of lower clothing." For the Market1501 dataset, the attributes include "gender," "length of hair," "wearing hat or not," "lower clothing," "backpack or not," "handbag or not," "length of sleeve," and "length of lower clothing." Each person in each dataset has an attribute word list, as shown in Figure 7.

This paper evaluates the attribute recognition accuracy of PARN on the Market1501 dataset and compares the results with the strong person attribute recognition method APR [23]. The experimental results are reported in Table 1, where "Lslv" denotes "length of sleeve" and "Llow" denotes "length of lower clothing."

It can be observed from Table 1 that PARN obtains competitive recognition accuracy over the 8 attributes: the accuracies of "wearing hat or not" and "handbag or not" and the mean accuracy are higher than those of APR, whereas the accuracies of the other attributes are close to APR's. The comparison with APR demonstrates the strong attribute recognition performance of PARN, which plays an important role in improving the accuracy of person re-id.

6.3. Experimental Results on Person Re-id. This paper evaluates the performance of the proposed methods on the three datasets. The experimental results of the proposed methods and other state-of-the-art methods are presented in Table 2.

As shown in Table 2, the proposed MiF-CNN model performs well among state-of-the-art methods, and the comparison between MiF and MiF + PARN demonstrates that the proposed attribute-aided reranking algorithm helps to increase person re-id accuracy. The improvement is especially obvious on the VIPeR and CUHK01 datasets, where the identification accuracy at rank 1, rank 5, and rank 10 improves by 6.02%, 6.33%, and 2.22% and by 4.94%, 4.94%, and 2.26%, respectively. Among the compared methods, the proposed MiF-CNN model with the attribute-aided reranking algorithm achieves the best rank 5 accuracy on the VIPeR dataset, the best rank 1 and rank 5 accuracy on the CUHK01 dataset, and the best mAP on the Market1501 dataset.

6.4. Analysis of Experimental Results. Because of the small number of training samples in the VIPeR and CUHK01 datasets, a deep CNN model is hard to train and easily suffers from overfitting. To handle this, the FT-JSTL + DGD method, based on a deep CNN, learned deep features from multiple domains jointly by merging all the datasets together and then fine-tuned the pretrained model on VIPeR and CUHK01 separately. The structure and hyperparameters of the deep CNN model in the M3TCP method need to be adjusted manually to adapt to the scale of each dataset so as to overcome the training problem of the deep neural network.

In contrast, the proposed MiF-CNN has a fixed structure and can be trained on smaller datasets directly without any fine-tuning. In spite of its deep structure, the model converges quickly and well. MiF-CNN without the attribute-aided reranking algorithm outperforms FT-JSTL + DGD and M3TCP by 0.27% and 13.17%, respectively, at rank 1 accuracy on the CUHK01 dataset. It also outperforms FT-JSTL + DGD by 2.22% at rank 1 accuracy on the VIPeR dataset, which indicates that the proposed MiF-CNN model has an excellent ability to reduce overfitting, good generalization, and outstanding performance in person re-id.

Figure 8 shows the rank lists of MiF and MiF + PARN. It can be seen that MiF-CNN with the attribute-aided reranking method enables lower-ranked positive samples in the MiF-CNN rank list to obtain a higher ranking, thereby improving the accuracy of person re-id, which verifies the effectiveness of the attribute-aided reranking algorithm for improving the performance of the person re-id model.

7. Conclusion

This paper studies the training problem of deep neural networks in the person re-id task and the use of pedestrian attributes for further improving re-id accuracy.


The proposed MiF-CNN model realizes the reuse of feature maps and gradient information, which enhances the feature mobility of the network and improves the efficiency of gradient propagation. The designed person attribute recognition network uses an attention mechanism to measure the correlation between the input feature maps and the hidden state of the LSTM at the previous time step in order to reduce the dimension of the features, and it employs LSTM to decode image features into pedestrian attributes. On the basis of the attribute recognition results, the attribute-aided reranking algorithm is presented, which rematches attribute features among samples to help more positive samples rank higher in the rank list and thus further improves the identification accuracy.

The experimental results on three public person re-id datasets indicate the outstanding performance of the MiF-CNN model in person re-id, and the attribute-aided reranking algorithm makes a major contribution to improving re-id accuracy. In the future, further improvement and optimization of the pedestrian attribute recognition network will be carried out.

Input: the initial rank list R_0
Output: the reranked list R*

(1)  for i = 1, 2, ..., M do
(2)    if rank 1 is correct then
(3)      R* = R_0
(4)    else if rank 5 is correct then
(5)      distribute the initial feature score s_f to the top-5 gallery images g'_1 ~ g'_5 according to their ranking in R_0
(6)      distribute the attribute score s_a to g'_1 ~ g'_5 according to the number of their attributes that match those of q_i
(7)      s = s_f(1 - c) + s_a c
(8)      rerank g'_1 ~ g'_5 in descending order of the total score s and obtain R*
(9)    else if rank 10 is correct then
(10)     change g'_1 ~ g'_5 to g'_1 ~ g'_10 and repeat steps (5)-(8)
(11)   else
(12)     change g'_1 ~ g'_5 to g'_1 ~ g'_N and repeat steps (5)-(8)
(13)   end if
(14) end for

Algorithm 1: The attribute-aided reranking algorithm.

Figure 7: Attribute labels of pedestrians in the datasets. (a) Male, hat, pants, stripes, backpack, carrying nothing. (b) Male, short hair, pants, no hat, no backpack, no handbag, short sleeve, short lower clothing.

Table 1: Recognition accuracy of different attributes in PARN and APR (%).

Method   Gender   Hair    Clothes   Hat     Backpack   Handbag   Lslv    Llow    Mean
APR      85.78    83.46   91.36     88.21   86.32      76.01     94.12   92.64   87.23
PARN     84.93    79.10   89.35     96.67   83.86      85.18     92.84   88.45   87.55


Moreover, captions of person images or videos can be obtained with the LSTM model using natural language processing ideas, which would make person re-id methods even more valuable.

Data Availability

The data used to support the findings of this study (attribute recognition accuracy on the Market1501 dataset and person re-id accuracy on the VIPeR, CUHK01, and Market1501 datasets) are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61773105), the Natural Science Foundation of Liaoning Province (no. 20170540675), and the Scientific Research Project of Liaoning Educational Department (no. LQGD2017023).

Table 2: Comparison of various methods with the proposed methods on the three datasets (%).

Methods              VIPeR                     CUHK01                    Market1501
                     Rank 1  Rank 5  Rank 10   Rank 1  Rank 5  Rank 10   Rank 1  Rank 5  Rank 10  mAP
DNS [32]             42.28   71.46   82.94     64.98   84.96   89.92     67.96   —       —        41.89
DLDA [33]            44.11   72.59   81.66     67.12   89.45   91.68     48.15   —       —        29.94
FT-JSTL + DGD [11]   38.60   —       —         66.60   —       —         —       —       —        —
PDC [34]             51.27   74.05   84.18     —       —       —         84.14   92.73   94.92    63.41
K-means-CNN [35]     46.50   69.30   80.70     53.50   82.50   91.20     —       —       —        —
Spindle [9]          53.80   74.10   83.20     —       —       —         76.90   91.50   94.60    —
PersonNet [36]       —       —       —         71.14   90.07   95.00     37.21   —       —        18.57
CSBT [37]            36.60   66.20   88.30     51.20   76.30   91.80     42.90   —       —        20.30
SDH-CNN [38]         —       —       —         —       —       —         58.12   68.50   80.82    48.20
M3TCP [21]           —       —       —         53.70   84.30   91.00     —       —       —        —
DM3 [39]             42.70   74.30   85.10     49.70   77.30   86.10     75.80   89.10   92.40    53.20
MiF                  40.82   68.35   81.01     66.87   85.39   89.51     78.92   86.46   89.28    66.00
MiF + PARN           46.84   74.68   83.23     71.81   90.33   91.77     79.39   88.95   91.36    66.46

"MiF" denotes the MiF-CNN method, and "MiF + PARN" denotes MiF-CNN with the attribute-aided reranking method.

Figure 8: Rank list comparison between MiF and MiF + PARN.


References

[1] L. Li and J. Guo, "Pedestrian re-identification based on reinforced deep feature fusion," Information Technology, vol. 42, no. 7, pp. 15–19, 2018, in Chinese.
[2] J. Dai, Y. Zhang, H. Lu, and H. Wang, "Cross-view semantic projection learning for person re-identification," Pattern Recognition, vol. 75, pp. 63–76, 2018.
[3] T. T. T. Pham, T.-L. Le, H. Vu et al., "Fully-automated person re-identification in multi-camera surveillance system with a robust kernel descriptor and effective shadow removal method," Image and Vision Computing, vol. 59, pp. 44–62, 2017.
[4] P. DaoDao and H. J. Lee, "Deep multi-task network for learning person identity and attributes," IEEE Access, vol. 6, pp. 60801–60811, 2018.
[5] J. Li, L. Zhuo, J. Zhang, F. Li, and H. Zhang, "A survey of person re-identification," Acta Automatica Sinica, vol. 44, no. 9, pp. 1554–1568, 2018, in Chinese.
[6] L. Wu, Y. Wang, J. Gao et al., "Deep adaptive feature embedding with local sample distributions for person re-identification," Pattern Recognition, vol. 73, pp. 275–288, 2018.
[7] L. Zheng, Y. Yang, and A. G. Hauptmann, "Person re-identification: past, present and future," 2016, http://arxiv.org/abs/1610.02984.
[8] D. Wu, S. J. Zheng, C. A. Yuan, and D. S. Huang, "A deep model with combined losses for person re-identification," Cognitive Systems Research, vol. 54, pp. 74–82, 2018.
[9] H. Zhao, M. Tian, S. Sun et al., "Spindle net: person re-identification with human body region guided feature decomposition and fusion," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1077–1085, Honolulu, HI, USA, 2017.
[10] W. Chen, X. Chen, J. Zhang, and K. Huang, "Beyond triplet loss: a deep quadruplet network for person re-identification," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017.
[11] T. Xiao, H. Li, W. Ouyang et al., "Learning deep feature representations with domain guided dropout for person re-identification," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1249–1258, Las Vegas, NV, USA, June 2016.
[12] E. Ahmed, M. Jones, and T. K. Marks, "An improved deep learning architecture for person re-identification," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3908–3916, Boston, MA, USA, June 2015.
[13] M. Farenzena, L. Bazzani, A. Perina et al., "Person re-identification by symmetry-driven accumulation of local features," in Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2360–2367, San Francisco, CA, USA, June 2010.
[14] Y. Yang, J. Yang, J. Yan et al., "Salient color names for person re-identification," in Proceedings of Computer Vision–ECCV 2014, pp. 536–551, Zurich, Switzerland, September 2014.
[15] L. Liao, M. Cristani, A. Perina et al., "Multiple-shot person re-identification by chromatic and epitomic analyses," Pattern Recognition Letters, vol. 33, no. 7, pp. 898–903, 2012.
[16] S. Murino, R. Laganiere, and P. Payeur, "Improving pedestrian detection with selective gradient self-similarity feature," Pattern Recognition, vol. 48, no. 8, pp. 2364–2376, 2015.
[17] R. Layne, T. M. Hospedales, S. Gong et al., "Person re-identification by attributes," in Proceedings of the 23rd British Machine Vision Conference (BMVC), vol. 2, no. 3, p. 83, Surrey, UK, September 2012.
[18] Z. Wang, R. Hu, Y. Yu, C. Liang, and W. Huang, "Multi-level fusion for person re-identification with incomplete marks," in Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1267–1270, Brisbane, Australia, March 2018.
[19] Y. Chen, S. Duffner, A. Stoian et al., "Deep and low-level feature based attribute learning for person re-identification," Image and Vision Computing, vol. 79, pp. 25–34, 2018.
[20] Z. Dufour, X. Bai, M. Ye, and S. Satoh, "Incremental deep hidden attribute learning," in Proceedings of the 26th ACM International Conference on Multimedia, pp. 72–80, Seoul, South Korea, October 2018.
[21] D. Cheng, Y. Gong, S. Zhou et al., "Person re-identification by multi-channel parts-based CNN with improved triplet loss function," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1335–1344, Las Vegas, NV, USA, June 2016.
[22] K. He, X. Zhang, S. Ren et al., "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
[23] Y. Lin, L. Zheng, Z. Zheng et al., "Improving person re-identification by attribute and identity learning," 2017, http://arxiv.org/abs/1703.07220.
[24] Y. Yan, B. Ni, J. Liu, and X. Yang, "Multi-level attention model for person re-identification," Pattern Recognition Letters, 2018, in press.
[25] Y. Wen, K. Zhang, Z. Li et al., "A discriminative feature learning approach for deep face recognition," in Proceedings of the 14th European Conference on Computer Vision (ECCV), pp. 499–515, Amsterdam, Netherlands, October 2016.
[26] K. Xu, J. Ba, R. Kiros et al., "Show, attend and tell: neural image caption generation with visual attention," in Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 2048–2057, Lille, France, July 2015.
[27] H. Choi, K. Cho, and Y. Bengio, "Fine-grained attention mechanism for neural machine translation," Neurocomputing, vol. 284, pp. 171–176, 2018.
[28] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in Proceedings of the European Conference on Computer Vision, Lecture Notes in Computer Science, pp. 262–275, Marseille, France, October 2008.
[29] W. Li, R. Zhao, and X. Wang, "Human reidentification with transferred metric learning," in Proceedings of the 2012 Asian Conference on Computer Vision (ACCV), pp. 31–44, Daejeon, Korea, November 2012.
[30] L. Zheng, L. Shen, L. Tian et al., "Scalable person re-identification: a benchmark," in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1116–1124, Santiago, Chile, December 2015.
[31] W. Li, R. Zhao, T. Xiao et al., "DeepReID: deep filter pairing neural network for person re-identification," in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 152–159, Columbus, OH, USA, June 2014.
[32] L. Zhang, T. Xiang, and S. Gong, "Learning a discriminative null space for person re-identification," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1239–1248, Las Vegas, NV, USA, June 2016.
[33] L. Wu, C. Shen, and A. van den Hengel, "Deep linear discriminant analysis on Fisher networks: a hybrid architecture for person re-identification," Pattern Recognition, vol. 65, pp. 238–250, 2017.
[34] C. Su, J. Li, S. Zhang et al., "Pose-driven deep convolutional model for person re-identification," in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3980–3989, Venice, Italy, October 2017.
[35] Z. Zhao, B. Zhao, and F. Su, "Person re-identification via integrating patch-based metric learning and local salience learning," Pattern Recognition, vol. 75, pp. 90–98, 2018.
[36] L. Wu, C. Shen, and A. van den Hengel, "PersonNet: person re-identification with deep convolutional neural networks," 2016, http://arxiv.org/abs/1601.07255.
[37] J. Chen, Y. Wang, J. Qin et al., "Fast person re-identification via cross-camera semantic binary transformation," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5330–5339, Honolulu, HI, USA, July 2017.
[38] L. Wu, Y. Wang, Z. Ge et al., "Structured deep hashing with convolutional neural networks for fast person re-identification," Computer Vision and Image Understanding, vol. 167, no. 10, pp. 63–73, 2018.
[39] Z. Hu, R. Hu, C. Chen, Y. Yu et al., "Person reidentification via discrepancy matrix and matrix metric," IEEE Transactions on Cybernetics, vol. 48, no. 10, pp. 3006–3020, 2018.
[40] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 6: Multi-InformationFlowCNNandAttribute …downloads.hindawi.com/journals/cin/2019/7028107.pdfe first type is the classification model as used in image classification and the second

where W and b represent the weights and biases of eachgate respectively Sigmoid function outputs a numberbetween 0 and 1 to decide the quantity scale of values thatgo through these gates Tanh function is the activationfunction of input 0is design makes LSTM give differentweights for information at different times so that it canchoose what part of information to remember or forget in along-term sequence

0e time step of LSTM in PARN is set as the number ofattributes of each pedestrian imagem 0e output hidden stateht is calculated by a fully connected layer and softmax functionat each time step to predict pedestrian attribute words

It is worth noting that at each time step the input wordembedding of LSTM is the one that LSTM learned at theprevious time step which takes advantage of the LSTM whensolving the long-term sequence problem Because of the

LSTMt = 1

LSTMt = 2

LSTMt = 3

LSTMt = m

Attention Attention Attention Attention

MiF-CNN

X

y1h1

y2h2

y3h3

ymndash1hmndash1

u1 u2 u3 um

y0h0

Figure 4 Architecture of PARN

Query Rank list1 2 3 4 5 6 7 8 9 10

A

B

C

Figure 3 Identification results of MiF-CNN (green bounding box represents positive samples in gallery)

6 Computational Intelligence and Neuroscience

correlation among attributes of pedestrian like the pedestrianowning attribute ldquomalerdquo generally own attribute ldquoshort hairrdquoand attribute ldquopantsrdquo LSTM can utilize the previous attributeresult to predict the current attribute more accurately

44 Network Optimization 0e loss function of PARN in-cludes two parts one is the cross-entropy between pre-diction and ground truth of network which is expressed as

Lsa 1113944m

i1uprime log

eθTaiu

1113936nw

j1eθT

aju⎛⎝ ⎞⎠ (19)

where u is the prediction value of the network uprime is theground truth of pedestrian labels m is the time step ofLSTM and nw is the total number of attribute words in eachdataset As shown in equation (19) the loss of a single imagein single epoch is the cumulated loss value afterm iterations0e other one is the loss function of attention mechanism asdescribed in [40]

Lα 12n

1113944

n

i11minus 1113944

m

i1αj

i⎛⎝ ⎞⎠

2

(20)

where αji represents the attention weights of the image

feature xi and n is the number of parts in the image feature0e object function of PARN to optimize is

J minus1

M1113944

M

i1L

isa + λ middot L

iα1113872 1113873 (21)

whereM is the batch size and λ is the rate of Lα in the objectfunction

5 Attribute-Aided Reranking Algorithm

Given a probe image q0 and gallery image setG g1 g2 gN1113864 1113865 including N pedestrian images theinitial similarity distance between q0 and gi is computedwith the Euclidean distance as

d q0 gi( 1113857

1113944

N

i1xq0minusxgi

1113872 11138732

11139741113972

(22)

where xq0and xgi

represent the features of q0 and gi re-spectively that extracted by MiF-CNN 0e initial ranklist R0 g

0prime1 g

0prime2 g

0primeN is obtained by sorting similarity

distance d(q0 gi) in ascending order whered(q0 g

0primei )ltd(q0 g

0primei+1) If the top-1 gallery g

0prime1 is the positive

sample it means rank 1 is correct If the positive sample isnot the top-1 gallery but the top-5 gallery it means rank 5 iscorrect

0e attribute-aided reranking algorithm reranks therank list R0 according to the attribute feature similaritybetween q0 and gi so that more positive gallery getshigher ranking in R0 which could improve the perfor-mance of person re-id To be specific when rank 5 iscorrect but rank 1 is not the proposed algorithm dis-tributes the initial feature score sf for top-5 gallery g

0prime1 simg

0prime5

according to their ranking in R0 the higher the rankingthe higher the sf 0en the proposed algorithm dis-tributes attribute score sa for g

0prime1simg

0prime5 according to the

number of their attributes that are the same as q0 themore same attributes the higher sa 0e total score ofeach gallery is s sf(1minus c) + sac where c is the scoreweight 0e reranking rank list Rlowast0 is obtained byreranking g

0prime1simg

0prime5 in descending order of their total score

s For M query images the whole processing of theattribute-aided reranking algorithm is shown asAlgorithm 1

6 Experiments

0is paper evaluates the attribute recognition accuracy ofPARN on person re-id public datasets first and then com-pares our results with other person attribute recognitionmethods 0en this paper evaluates the performance ofMiF-CNN on three people re-id datasets and the availabilityof the proposed attribute-aided reranking algorithm forimproving the accuracy of person re-id 0e analysis ofexperiment results is also given after comparing our resultswith the state-of-the-art person re-id methods

61 Datasets and Evaluation Protocol 0is paper evaluatesthe proposed methods on three challenging person re-id

α3α2α1

Z

Xx1 x2 x3 xn

Softmax Softmax Softmax Softmax

tanh tanh tanh tanh

Watt Watt Watt Watt

htndash1 htndash1 htndash1 htndash1

v1 v2 v3 vn

αn

Figure 5 Schematic diagram of attention mechanism

Wf Wi Wg Wo

f i g o

htndash1 ht

tanh

tanh

ctndash1 ct

xt

σ σ σ

Figure 6 Architecture of the LSTM unit

Computational Intelligence and Neuroscience 7

public datasets including VIPeR [28] CUHK01 [29] andMarket1501 [30]

VIPeR Dataset It contains 1264 images of 632 per-sons Because of the lack of image samples low-resolution a great change in pose view and illu-mination it is one of the hardest difficult person re-iddatasets In experiment the VIPeR dataset is ran-domly split into two parts images of 316 persons fortraining and the images of remaining 316 persons fortestingCUHK01 Dataset It consists of 3884 images of 971persons which are captured by two cameras withdifferent views 485 pedestrians are randomlychosen for training and other 486 pedestrians fortestingMarket1501 Dataset It is one of the largest person re-id datasets that include 32268 images of 1501 personsEach person has a number of images that are capturedby six cameras with different views 0e dataset isdivided into two parts 12936 images of 751 personsfor training and 19732 images of 750 persons fortestingEvaluation Protocol For the single-shot datasets VI-PeR and CUHK01 cumulative match characteristic(CMC) is used to record the ranks of correct identify[31] For the multishot dataset Market1501 apartfrom CMC mean Average Precision (mAP) is uti-lized to evaluate the performance of the proposedmethods

62 Experimental Results on Attribute Recognition 0e at-tributes in PARN refer to pedestrian identity-level attri-butes For the VIPeR dataset attributes include ldquogenderrdquoldquolength of hairrdquo ldquolower clothingrdquo ldquoupper clothingrdquo ldquoback-pack or notrdquo and ldquocarrying anything or notrdquo For theCUHK01 dataset attributes include ldquogenderrdquo ldquolength ofhairrdquo ldquobackpack or notrdquo ldquohandbag or notrdquo ldquocolor of upperrdquoand ldquocolor of lowerrdquo For the Market1501 dataset attributescontain ldquogenderrdquo ldquolength of hairrdquo ldquowearing hat or notrdquoldquolower clothingrdquo ldquobackpack or notrdquo ldquohandbag or notrdquoldquolength of sleeverdquo and ldquolength of lower clothingrdquo Eachperson in each dataset owns an attribute word list as shownin Figure 7

0is paper evaluates the attribute recognition accuracy ofPARN on the Market1501 dataset and compares our resultswith outstanding person attribute recognition method APR[22] 0e experiment results are reported in Table 1 whereldquoLslvrdquo represents the ldquolength of sleeverdquo and ldquoLlowrdquo repre-sents the ldquolength of lower clothingrdquo

It can be observed from Table 1 that PARN obtainssuperior recognition accuracy of 8 attributes especiallyldquolength of hairrdquo ldquohandbag or notrdquo and mean accuracy arehigher than APR whereas accuracy of other attributes iscloser with APR Comparison with APR demonstrates theoutstanding attribute recognition performance of PARNwhich plays an important role in improving the accuracy ofperson re-id

63 Experimental Results on Person re-id 0is paper eval-uates the performance of the proposed methods on threedatasets 0e experimental results of the proposed methodsand other state-of-the-art methods are presented inTable 2

As shown in Table 2 the proposed MiF-CNN modelobtains a great performance among state-of-the-artmethods and on the contrary comparison between MiFand MiF + PARN demonstrates that the proposed attribute-aided reranking algorithm is helpful to increase the personre-id accuracy 0e improvement effect is especially obviouson VIPeR and CUHK01 datasets where the identificationaccuracy of rank 1 rank 5 and rank 10 improves by 602633 and 222 and 494 494 and 226 respectively0e proposed MiF-CNN model with the attribute-aidedreranking algorithm gets the best rank 5 accuracy on theVIPeR dataset the best rank 1 and rank 5 accuracy on theCUHK01 dataset and the best mAP on the Market1501dataset among various methods

64 Analysis of Experimental Results Because of the smallquantity of training samples in VIPeR and CUHK01 data-sets a deep CNN model is hard to be trained and easilysuffers from overfitting To handle this the FT-JSTL+DGDmethod based on deep CNN learned deep features frommultiple domains jointly by merging all the datasets togetherand fine-tuned the pretrained model on VIPeR andCUHK01 separately 0e structure and hyperparameters ofthe deep CNN model in the M3TCP method need to beadjusted manually for adapting the scale of different datasetsso as to overcome the training problem of the deep neuralnetwork

In contrast the proposed MiF-CNN has an immobilestructure and can be trained on smaller datasets directlywithout any fine tuning In spite of the deep structure themodel can converge quickly and well MiF-CNNwithout theattribute-aided reranking algorithm outperforms FT-JSTL +DGD andM3TCP by 027 and 1317 respectivelyat rank 1 accuracy on the CUHK01 dataset It also out-performs FT-JSTL+DGD by 222 at rank 1 accuracy onthe VIPeR dataset which indicates that the proposed MiF-CNN model has an excellent ability of reducing overfittingand generalization and an outstanding performance inperson re-id

Figure 8 demonstrates the rank list of MiF andMiF + PARN It can be seen that the MiF-CNN with theattribute-aided reranking method enables the lower rankedpositive sample in rank list of MiF-CNN obtain a higherranking so as to improve the accuracy of person re-idwhich verifies the effectiveness of the attribute-aidedreranking algorithm for improving performance of per-son re-id model

7 Conclusion

0is paper studies and discusses the training problem ofdeep neural network in person re-id task and the using ofpedestrian attributes for further improving the accuracy of

8 Computational Intelligence and Neuroscience

person re-id 0e proposed MiF-CNN model realizes thereuse of feature maps and gradient information whichenhances the feature mobility of network and improves theefficiency of gradient propagation 0e designed person at-tribute recognition network uses an attention mechanism tomeasure the correlation between input feature maps and thehidden state of LSTM at previous time for reducing the di-mension of features It also employs LSTM to decode imagefeatures into pedestrian attributes On the basis of the at-tribute recognition results the attribute-aided reranking

algorithm is presented which rematches attribute featuresamong samples to aid more positive samples rank higher inrank list so as to improve the identify accuracy further

0e experimental results on three public person re-iddatasets indicate the outstanding performance of MiF-CNNmodel in person re-id 0e attribute-aided reranking algo-rithm makes a major contribution to improve the accuracyof person re-id In the future more improvement and op-timization will be done on pedestrian attribute recognitionnetwork Moreover the caption of person images or videos

Input Initial rank list R0Output 0e reranking rank list Rlowast

(1) for i 1 2 M do(2) if rank 1 correct(3) Rlowast R0(4) end if(5) elif rank 5 correct(6) Distributing initial feature score sf for top-5 gallery g

iprime1simg

iprime5 according to their ranking in R0

(7) Distributing attribute score sa for giprime1simg

iprime5 according to the number of their attributes that are the same as qi

(8) s sf(1minus c) + sac

(9) Reranking giprime1simg

iprime5 in descending order of their total score s and obtain Rlowast

(10) end if(11) elif rank 10 correct(12) Changing g

iprime1simg

iprime5 to g

iprime1simg

iprime10 repeat 6ndash10

(13) end if(14) else(15) Changing g

iprime1simg

iprime5 to g

iprime1simg

iprimeM repeat 6ndash10

(16) end if(17) end for

ALGORITHM 1 0e attribute-aided reranking algorithm

Male

Hat

Pants

Stripes

Backpack

Carrying nothing

(a)

Male

Short hair

Pants

No hat

No backpack

No handbag

Short sleeve

Short lower

(b)

Figure 7 Attribute label of the pedestrian in datasets

Table 1 Recognition accuracy of different attributes in PARN and APR ()

Method Gender Hair Clothes Hat Backpack Handbag Lslv Llow MeanAPR 8578 8346 9136 8821 8632 7601 9412 9264 8723PARN 8493 7910 8935 9667 8386 8518 9284 8845 8755

Computational Intelligence and Neuroscience 9

can be obtained by the LSTM model with natural languageprocess ideas which make person re-id methods moresignificant

Data Availability

0e (attributes recognition accuracy on the Market1501dataset and person re-id accuracy on VIPeR CUHK01 andMarket1501 datasets) data used to support the findings ofthis study are included within the article

Conflicts of Interest

0e authors declare that they have no conflicts of interest

Acknowledgments

0is work was supported by the National Natural ScienceFoundation of China (no 61773105) the Natural ScienceFoundation of Liaoning Province (no 20170540675) andthe Scientific Research Project of Liaoning EducationalDepartment (no LQGD2017023)

Table 2 Comparison of various methods with the proposed methods on three datasets ()

MethodsVIPeR CUHK01 Market1501

Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 mAPDNS [32] 4228 7146 8294 6498 8496 8992 6796 mdash mdash 4189DLDA [33] 4411 7259 8166 6712 8945 9168 4815 mdash mdash 2994FT-JSTL +DGD [11] 386 mdash mdash 6660 mdash mdash mdash mdash mdash mdashPDC [34] 5127 7405 8418 mdash mdash mdash 8414 9273 9492 6341K-means-CNN [35] 4650 6930 8070 5350 8250 9120 mdash mdash mdash mdashSpindle [9] 5380 7410 8320 mdash mdash mdash 7690 9150 9460 mdashPersonNet [36] mdash mdash mdash 7114 9007 9500 3721 mdash mdash 1857CSBT [37] 3660 6620 8830 5120 7630 9180 4290 mdash mdash 2030SDH-CNN [38] mdash mdash mdash mdash mdash mdash 5812 6850 8082 4820M3TCP [21] mdash mdash mdash 5370 8430 9100 mdash mdash mdash mdashDM3 [39] 4270 7430 8510 4970 7730 8610 7580 8910 9240 5320MiF 4082 6835 8101 6687 8539 8951 7892 8646 8928 6600MiF + PARN 4684 7468 8323 7181 9033 9177 7939 8895 9136 6646ldquoMiFrdquo represents the MiF-CNN method and ldquoMiF + PARNrdquo represents the MiF-CNN with the attribute-aided reranking method

Query

1 2 3 4 5 6 7 8 9 10

Query

MiF

MiF + PARN

MiF

MiF + PARN

Rank list

Figure 8 Rank list comparison between MiF and MiF + PARN

10 Computational Intelligence and Neuroscience

References

[1] L Li and J Guo ldquoPedestrian re-identification based onreinforced deep feature fusionrdquo Information Technologyvol 42 no 7 pp 15ndash19 2018 in Chinese

[2] J Dai Y Zhang H Lu and H Wang ldquoCross-view semanticprojection learning for person re-identificationrdquo PatternRecognition vol 75 pp 63ndash76 2018

[3] T T T Pham T-L Le H Vu et al ldquoFully-automated personre-identification in multi-camera surveillance system with arobust kernel descriptor and effective shadow removalmethodrdquo Image and Vision Computing vol 59 pp 44ndash622017

[4] P DaoDao and H J Lee ldquoDeep multi-task network forlearning person identity and attributesrdquo IEEE Access vol 6pp 60801ndash60811 2018

[5] J Li L Zhuo J Zhang F Li and H Zhang ldquoA survey ofperson re-identificationrdquo Acta Automatica Sinica vol 44no 9 pp 1554ndash1568 2018 in Chinese

[6] L Wu Y Wang J Gao et al ldquoDeep adaptive feature em-bedding with local sample distributions for person re-iden-tificationrdquo Pattern Recognition vol 73 pp 275ndash288 2018

[7] L Li Y Yang and A G Hauptmann ldquoPerson re-identification past present and futurerdquo 2016 httparxivorgabs161002984

[8] D Wu S J Zheng C A Yuan and D S Huang ldquoA deepmodel with combined losses for person re-identificationrdquoCognitive Systems Research vol 54 pp 74ndash82 2018

[9] H Zhao M Tian S Sun et al ldquoSpindle net person re-identification with human body region guided feature de-composition and fusionrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 1077ndash1085 Honolulu HI USA 2017

[10] W Chen X Chen J Zhang and K Huang ldquoBeyond tripletloss a deep quadruplet network for person re-identificationrdquoin Proceedings of the 2017 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) vol 28 Honolulu HIUSA July 2017

[11] T Xiao H Li W Ouyang et al ldquoLearning deep featurerepresentations with domain guided dropout for person re-identificationrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 1249ndash1258 Las Vegas NV USA June 2016

[12] E Ahmed M Jones and T K Marks ldquoAn improved deeplearning architecture for person re-identificationrdquo in Pro-ceedings of the 2015 IEEE Conference on Computer Vision andPattern Recognition (CVPR) pp 3908ndash3916 Boston MAUSA June 2015

[13] M Farenzena L Bazzani A Perina et al ldquoPerson re-identification by symmetry-driven accumulation of localfeaturesrdquo in Proceedings of the 2010 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 2360ndash2367 San Francisco CA USA June 2010

[14] Y Yang J Yang J Yan et al ldquoSalient color names for personre-identificationrdquo in Proceedings of Computer Vision-ECCV2014 pp 536ndash551 Zurich Switzerland September 2014

[15] L Liao M Cristani A Perina et al ldquoMultiple-shot person re-identification by chromatic and epitomic analysesrdquo PatternRecognition Letters vol 33 no 7 pp 898ndash903 2012

[16] S Murino R Laganiere and P Payeur ldquoImproving pedes-trian detection with selective gradient self-similarity featurerdquoPattern Recognition vol 48 no 8 pp 2364ndash2376 2015

[17] R Layne T M Hospedales S Gong et al ldquoPerson re-identification by attributesrdquo in Proceedings of the 23th

BritishMachine Vision Conference (BMVC) vol 2 no 3 p 83Surrey UK September 2012

[18] Z Wang R Hu Y Yu C Liang and W Huang ldquoMulti-levelfusion for person Re-identification with incomplete marksrdquoin Proceedings of the 23rd ACM international conference onMultimedia pp 1267ndash1270 Brisbane Australia March 2018

[19] Y Chen S Duffner A Stoian et al ldquoDeep and low-levelfeature based attribute learning for person re-identificationrdquoImage and Vision Computing vol 79 pp 25ndash34 2018

[20] Z Dufour X Bai M Ye and S Satoh ldquoIncremental deephidden attribute learningrdquo in Proceedings of the 26th ACMinternational conference on Multimedia pp 72ndash80 SeoulSouth Korea October 2018

[21] D Cheng Y Gong S Zhou et al ldquoPerson re-identification bymulti-channel parts-based cnn with improved triplet lossfunctionrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 1335ndash1344 Las Vegas NV USA June 2016

[22] K He X Zhang S Ren et al ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the 2016 IEEE Confer-ence on Computer Vision and Pattern Recognition (CVPR)pp 770ndash778 Las Vegas NV USA June 2016

[23] Y Lin L Zheng Z Zheng et al ldquoImproving person re-identification by attribute and identity learningrdquo 2017 httparxivorgabs170307220

[24] Y Yan B Ni J Liu and X Yang ldquoMulti-level attentionmodelfor person re-identificationrdquo Pattern Recognition Letters2018 In press

[25] Y Wen K Zhang Z Li et al ldquoA discriminative featurelearning approach for deep face recognitionrdquo in Proceedingsof 14th European Conference pp 499ndash515 AmsterdamNetherlands October 2016

[26] K Xu J Ba R Kiros et al ldquoShow attend and tell neuralimage caption generation with visual attentionrdquo in Pro-ceedings of the 32nd International Conference on MachineLearning (ICML) pp 2048ndash2057 Lille France July 2015

[27] H Choi K Cho and Y Bengio ldquoFine-grained attentionmechanism for neural machine translationrdquoNeurocomputingvol 284 pp 171ndash176 2018

[28] D Gray and H Tao ldquoViewpoint invariant pedestrian rec-ognition with an ensemble of localized featuresrdquo in Pro-ceedings of European Conference on Computer Vision LectureNotes in Computer Science pp 262ndash275 Marseille FranceOctober 2008

[29] W Li R Zhao and X Wang ldquoHuman reidentification withtransferred metric learningrdquo in Proceedings of the 2008 AsianConference on Computer Vision (ACCV) pp 31ndash44 DaejeonKorea November 2012

[30] L Zheng L Shen L Tian et al ldquoScalable person re-identification a benchmarkrdquo in Proceedings of the 2015IEEE Conference on Computer Vision and Pattern Recognition(CVPR) pp 1116ndash1124 Boston MA USA December 2015

[31] W Li R Zhao T Xiao et al ldquoDeepReID deep filter pairingneural network for person re-identificationrdquo in Proceedings ofthe 2014 Computer Vision and Pattern Recognition (CVPR)pp 152ndash159 Columbus OH USA June 2014

[32] L Zhang T Xiang and S Gong ldquoLearning a discriminativenull space for person re-identificationrdquo in Proceedings of theIEEE Conference on Computer Cision and Pattern Recognition(CVPR) pp 1239ndash1248 Las Vegas NV USA June 2016

[33] L Wu C Shen and A van den Hengel ldquoDeep linear dis-criminant analysis on Fisher networks a hybrid architecturefor person re-identificationrdquo Pattern Recognition vol 65pp 238ndash250 2017

Computational Intelligence and Neuroscience 11

[34] C Su J Li S Zhang et al ldquoPose-driven deep convolutionalmodel for person re-identificationrdquo in Proceedings of the 2016International Conference on Computer Vision (ICCV)pp 3980ndash3989 Venice Italy October 2017

[35] Z Zhao B Zhao and F Su ldquoPerson re-identification viaintegrating patch-based metric learning and local saliencelearningrdquo Pattern Recognition vol 75 pp 90ndash98 2018

[36] L Wu C Shen and A Hengel ldquoPersonnet person re-identification with deep convolutional neural networksrdquo2016 httparxivorgabs160107255

[37] J Chen Y Wang J Qin et al ldquoFast person re-identificationvia cross-camera semantic binary transformationrdquo in Pro-ceedings of the 2017 IEEE Conference on Computer Vision andPattern Recognition (CVPR) pp 5330ndash5339 Honolulu HIUSA July 2017

[38] L Wu Y Wang Z Ge et al ldquoStructured deep hashing withconvolutional neural networks for fast person re-identifica-tionrdquo Computer Vision and Image Understanding vol 167no 10 pp 63ndash73 2018

[39] Z Hu R Hu C Chen Y Yu et al ldquoPerson reidentification viadiscrepancy matrix and matrix metricrdquo IEEE Transactions onCybernetics vol 48 no 10 pp 3006ndash3020 2018

[40] D Jiang K Cho and Y Bengio ldquoNeural machine translationby jointly learning to align and translaterdquo 2014 httparxivorgabs14090473

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 7: Multi-InformationFlowCNNandAttribute …downloads.hindawi.com/journals/cin/2019/7028107.pdfe first type is the classification model as used in image classification and the second

correlation among attributes of pedestrian like the pedestrianowning attribute ldquomalerdquo generally own attribute ldquoshort hairrdquoand attribute ldquopantsrdquo LSTM can utilize the previous attributeresult to predict the current attribute more accurately

44 Network Optimization 0e loss function of PARN in-cludes two parts one is the cross-entropy between pre-diction and ground truth of network which is expressed as

Lsa 1113944m

i1uprime log

eθTaiu

1113936nw

j1eθT

aju⎛⎝ ⎞⎠ (19)

where u is the prediction value of the network uprime is theground truth of pedestrian labels m is the time step ofLSTM and nw is the total number of attribute words in eachdataset As shown in equation (19) the loss of a single imagein single epoch is the cumulated loss value afterm iterations0e other one is the loss function of attention mechanism asdescribed in [40]

Lα 12n

1113944

n

i11minus 1113944

m

i1αj

i⎛⎝ ⎞⎠

2

(20)

where αji represents the attention weights of the image

feature xi and n is the number of parts in the image feature0e object function of PARN to optimize is

J minus1

M1113944

M

i1L

isa + λ middot L

iα1113872 1113873 (21)

whereM is the batch size and λ is the rate of Lα in the objectfunction

5 Attribute-Aided Reranking Algorithm

Given a probe image q0 and gallery image setG g1 g2 gN1113864 1113865 including N pedestrian images theinitial similarity distance between q0 and gi is computedwith the Euclidean distance as

d q0 gi( 1113857

1113944

N

i1xq0minusxgi

1113872 11138732

11139741113972

(22)

where xq0and xgi

represent the features of q0 and gi re-spectively that extracted by MiF-CNN 0e initial ranklist R0 g

0prime1 g

0prime2 g

0primeN is obtained by sorting similarity

distance d(q0 gi) in ascending order whered(q0 g

0primei )ltd(q0 g

0primei+1) If the top-1 gallery g

0prime1 is the positive

sample it means rank 1 is correct If the positive sample isnot the top-1 gallery but the top-5 gallery it means rank 5 iscorrect

The attribute-aided reranking algorithm reranks the rank list $R_0$ according to the attribute feature similarity between $q_0$ and $g_i$, so that more positive galleries obtain a higher ranking in $R_0$, which improves the performance of person re-id. Specifically, when rank 5 is correct but rank 1 is not, the proposed algorithm distributes an initial feature score $s_f$ to the top-5 galleries $g'^0_1 \sim g'^0_5$ according to their ranking in $R_0$: the higher the ranking, the higher $s_f$. Then the algorithm distributes an attribute score $s_a$ to $g'^0_1 \sim g'^0_5$ according to the number of their attributes that are the same as those of $q_0$: the more matching attributes, the higher $s_a$. The total score of each gallery is $s = s_f(1-c) + s_a c$, where $c$ is the score weight. The reranked list $R_0^{\ast}$ is obtained by reranking $g'^0_1 \sim g'^0_5$ in descending order of their total score $s$. For $M$ query images, the whole processing of the attribute-aided reranking algorithm is shown in Algorithm 1.

6. Experiments

This paper first evaluates the attribute recognition accuracy of PARN on public person re-id datasets and compares the results with other person attribute recognition methods. It then evaluates the performance of MiF-CNN on three person re-id datasets and the effectiveness of the proposed attribute-aided reranking algorithm for improving the accuracy of person re-id. An analysis of the experimental results is also given after comparing them with state-of-the-art person re-id methods.

6.1. Datasets and Evaluation Protocol. This paper evaluates the proposed methods on three challenging public person re-id datasets: VIPeR [28], CUHK01 [29], and Market1501 [30].

Figure 5: Schematic diagram of the attention mechanism (the image feature parts $x_1, \ldots, x_n$ and the previous hidden state $h_{t-1}$ pass through $W_{att}$, tanh, and softmax to produce the attention weights $\alpha_1, \ldots, \alpha_n$ and the attended feature $Z$).
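For orientation, a standard soft-attention step of the kind depicted in Figure 5 can be written, in the figure's notation, as follows; the scoring vector $w$ and the exact parameterization are assumptions rather than the paper's own definition:

$$e_i = w^{\top}\tanh\!\left(W_{att}\,[x_i;\,h_{t-1}]\right), \qquad
\alpha_i = \frac{\exp(e_i)}{\sum_{k=1}^{n}\exp(e_k)}, \qquad
Z = \sum_{i=1}^{n}\alpha_i\,x_i.$$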

Figure 6: Architecture of the LSTM unit (gates $f$, $i$, $g$, and $o$ with weights $W_f$, $W_i$, $W_g$, and $W_o$, sigmoid and tanh activations, cell state $c_{t-1} \rightarrow c_t$, hidden state $h_{t-1} \rightarrow h_t$, and input $x_t$).
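For reference, the standard LSTM update corresponding to the gates labeled in Figure 6 is (bias terms omitted):

$$\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1};\,x_t]\right), &\quad i_t &= \sigma\!\left(W_i\,[h_{t-1};\,x_t]\right),\\
g_t &= \tanh\!\left(W_g\,[h_{t-1};\,x_t]\right), &\quad o_t &= \sigma\!\left(W_o\,[h_{t-1};\,x_t]\right),\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t, &\quad h_t &= o_t \odot \tanh(c_t).
\end{aligned}$$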


VIPeR Dataset. It contains 1264 images of 632 persons. Because of the small number of image samples, the low resolution, and the large changes in pose, viewpoint, and illumination, it is one of the most difficult person re-id datasets. In the experiments, the VIPeR dataset is randomly split into two parts: the images of 316 persons for training and the images of the remaining 316 persons for testing.

CUHK01 Dataset. It consists of 3884 images of 971 persons, captured by two cameras with different views. 485 pedestrians are randomly chosen for training and the other 486 pedestrians for testing.

Market1501 Dataset. It is one of the largest person re-id datasets, including 32,268 images of 1501 persons. Each person has a number of images captured by six cameras with different views. The dataset is divided into two parts: 12,936 images of 751 persons for training and 19,732 images of 750 persons for testing.

Evaluation Protocol. For the single-shot datasets VIPeR and CUHK01, the cumulative match characteristic (CMC) is used to record the ranks of correct identities [31]. For the multishot dataset Market1501, apart from CMC, mean Average Precision (mAP) is used to evaluate the performance of the proposed methods; a per-query computation sketch of both metrics is given below.
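The following sketch is illustrative only; the function name is hypothetical and this is not the evaluation code used in the paper.

```python
import numpy as np

def cmc_and_ap(ranked_gallery_ids, query_id):
    """Single-query CMC curve and average precision.

    ranked_gallery_ids: gallery identity labels sorted by ascending distance
    query_id          : identity label of the probe image
    """
    matches = (np.asarray(ranked_gallery_ids) == query_id).astype(int)
    # CMC: rank-k is 1 if a correct match appears within the top k results.
    first_hit = np.argmax(matches) if matches.any() else len(matches)
    cmc = np.zeros(len(matches), dtype=float)
    cmc[first_hit:] = 1.0
    # AP: precision averaged over the positions of the correct matches.
    hits = np.where(matches == 1)[0]
    ap = 0.0 if len(hits) == 0 else np.mean([(k + 1) / (pos + 1) for k, pos in enumerate(hits)])
    return cmc, ap

# mAP is the mean of the per-query AP values over all probe images;
# rank-1/rank-5 accuracy is the mean of cmc[0]/cmc[4] over all probes.
```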

6.2. Experimental Results on Attribute Recognition. The attributes in PARN refer to pedestrian identity-level attributes. For the VIPeR dataset, the attributes include "gender", "length of hair", "lower clothing", "upper clothing", "backpack or not", and "carrying anything or not". For the CUHK01 dataset, the attributes include "gender", "length of hair", "backpack or not", "handbag or not", "color of upper", and "color of lower". For the Market1501 dataset, the attributes contain "gender", "length of hair", "wearing hat or not", "lower clothing", "backpack or not", "handbag or not", "length of sleeve", and "length of lower clothing". Each person in each dataset owns an attribute word list, as shown in Figure 7.

This paper evaluates the attribute recognition accuracy of PARN on the Market1501 dataset and compares the results with the outstanding person attribute recognition method APR [22]. The experimental results are reported in Table 1, where "Lslv" represents the "length of sleeve" and "Llow" represents the "length of lower clothing".

It can be observed from Table 1 that PARN obtains competitive recognition accuracy over the 8 attributes: in particular, the accuracy of "wearing hat or not", "handbag or not", and the mean accuracy are higher than those of APR, whereas the accuracy of the other attributes is close to that of APR. The comparison with APR demonstrates the strong attribute recognition performance of PARN, which plays an important role in improving the accuracy of person re-id.

6.3. Experimental Results on Person Re-id. This paper evaluates the performance of the proposed methods on the three datasets. The experimental results of the proposed methods and of other state-of-the-art methods are presented in Table 2.

As shown in Table 2, the proposed MiF-CNN model obtains strong performance among state-of-the-art methods. Moreover, the comparison between MiF and MiF + PARN demonstrates that the proposed attribute-aided reranking algorithm helps increase the person re-id accuracy. The improvement is especially obvious on the VIPeR and CUHK01 datasets, where the identification accuracy at rank 1, rank 5, and rank 10 improves by 6.02%, 6.33%, and 2.22% and by 4.94%, 4.94%, and 2.26%, respectively. The proposed MiF-CNN model with the attribute-aided reranking algorithm obtains the best rank 5 accuracy on the VIPeR dataset, the best rank 1 and rank 5 accuracy on the CUHK01 dataset, and the best mAP on the Market1501 dataset among the compared methods.

6.4. Analysis of Experimental Results. Because of the small quantity of training samples in the VIPeR and CUHK01 datasets, a deep CNN model is hard to train and easily suffers from overfitting. To handle this, the FT-JSTL+DGD method, based on a deep CNN, learned deep features from multiple domains jointly by merging all the datasets together and fine-tuned the pretrained model on VIPeR and CUHK01 separately. The structure and hyperparameters of the deep CNN model in the M3TCP method need to be adjusted manually to adapt to the scale of different datasets, so as to overcome the training problem of the deep neural network.

In contrast, the proposed MiF-CNN has a fixed structure and can be trained on smaller datasets directly without any fine-tuning. In spite of the deep structure, the model converges quickly and well. MiF-CNN without the attribute-aided reranking algorithm outperforms FT-JSTL+DGD and M3TCP by 0.27% and 13.17%, respectively, at rank 1 accuracy on the CUHK01 dataset. It also outperforms FT-JSTL+DGD by 2.22% at rank 1 accuracy on the VIPeR dataset, which indicates that the proposed MiF-CNN model has an excellent ability to reduce overfitting, good generalization, and outstanding performance in person re-id.

Figure 8 shows the rank lists of MiF and MiF + PARN. It can be seen that MiF-CNN with the attribute-aided reranking method enables lower-ranked positive samples in the rank list of MiF-CNN to obtain a higher ranking, so as to improve the accuracy of person re-id, which verifies the effectiveness of the attribute-aided reranking algorithm for improving the performance of the person re-id model.

7. Conclusion

This paper studies the training problem of deep neural networks in the person re-id task and the use of pedestrian attributes for further improving the accuracy of


person re-id. The proposed MiF-CNN model realizes the reuse of feature maps and gradient information, which enhances the feature mobility of the network and improves the efficiency of gradient propagation. The designed person attribute recognition network uses an attention mechanism to measure the correlation between the input feature maps and the hidden state of the LSTM at the previous time step, reducing the dimension of the features. It also employs an LSTM to decode image features into pedestrian attributes. On the basis of the attribute recognition results, the attribute-aided reranking

algorithm is presented, which rematches attribute features among samples to help more positive samples rank higher in the rank list, so as to further improve the identification accuracy.

The experimental results on three public person re-id datasets indicate the outstanding performance of the MiF-CNN model in person re-id. The attribute-aided reranking algorithm makes a major contribution to improving the accuracy of person re-id. In the future, more improvement and optimization will be done on the pedestrian attribute recognition network. Moreover, captions of person images or videos can be obtained by the LSTM model with natural language processing ideas, which would make person re-id methods more significant.

Input: Initial rank list $R_0$
Output: The reranked rank list $R^{\ast}$
(1) for $i = 1, 2, \ldots, M$ do
(2)   if rank 1 is correct
(3)     $R^{\ast} = R_0$
(4)   end if
(5)   elif rank 5 is correct
(6)     Distribute the initial feature score $s_f$ to the top-5 galleries $g'^i_1 \sim g'^i_5$ according to their ranking in $R_0$
(7)     Distribute the attribute score $s_a$ to $g'^i_1 \sim g'^i_5$ according to the number of their attributes that are the same as those of $q_i$
(8)     $s = s_f(1-c) + s_a c$
(9)     Rerank $g'^i_1 \sim g'^i_5$ in descending order of their total score $s$ and obtain $R^{\ast}$
(10)   end if
(11)   elif rank 10 is correct
(12)     Change $g'^i_1 \sim g'^i_5$ to $g'^i_1 \sim g'^i_{10}$ and repeat steps 6–10
(13)   end if
(14)   else
(15)     Change $g'^i_1 \sim g'^i_5$ to $g'^i_1 \sim g'^i_M$ and repeat steps 6–10
(16)   end if
(17) end for

Algorithm 1: The attribute-aided reranking algorithm.
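A runnable sketch of Algorithm 1 for a single query is given below. It is an interpretation under assumptions: `positive_rank` stands in for the rank-1/rank-5/rank-10 checks of steps (2), (5), and (11), the linearly spaced feature scores and the normalized attribute-overlap score are illustrative choices, and `gamma` plays the role of the score weight $c$.

```python
import numpy as np

def attribute_aided_rerank(order, query_attrs, gallery_attrs, positive_rank, gamma=0.5):
    """Illustrative sketch of Algorithm 1 for a single query.

    order        : gallery indices of the initial rank list R0, e.g. the array
                   produced by the `initial_rank_list` sketch above
    query_attrs  : attribute vector of the query (e.g. 0/1 per attribute)
    gallery_attrs: (N, A) attribute vectors of the gallery images
    positive_rank: 1-based position of the positive sample in R0
                   (corresponds to the rank-1/5/10 checks in the algorithm)
    gamma        : score weight c between the feature score and the attribute score
    """
    if positive_rank == 1:                 # rank 1 already correct: keep R0
        return order
    # Choose how many top galleries to rerank, mirroring the rank-5/rank-10 cases.
    k = 5 if positive_rank <= 5 else (10 if positive_rank <= 10 else len(order))
    k = min(k, len(order))
    top = order[:k]
    # Feature score: the higher the initial ranking, the higher s_f.
    s_f = np.linspace(1.0, 0.0, num=k)
    # Attribute score: fraction of attributes shared with the query.
    s_a = np.array([(gallery_attrs[g] == query_attrs).mean() for g in top])
    s = s_f * (1.0 - gamma) + s_a * gamma
    reranked_top = top[np.argsort(-s)]     # descending total score
    return np.concatenate([reranked_top, order[k:]])
```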

Figure 7: Attribute labels of pedestrians in the datasets. (a) Example attribute word list: male, hat, pants, stripes, backpack, carrying nothing. (b) Example attribute word list: male, short hair, pants, no hat, no backpack, no handbag, short sleeve, short lower.

Table 1: Recognition accuracy of different attributes for PARN and APR (%).

Method   Gender   Hair    Clothes   Hat     Backpack   Handbag   Lslv    Llow    Mean
APR      85.78    83.46   91.36     88.21   86.32      76.01     94.12   92.64   87.23
PARN     84.93    79.10   89.35     96.67   83.86      85.18     92.84   88.45   87.55



Data Availability

The data (attribute recognition accuracy on the Market1501 dataset and person re-id accuracy on the VIPeR, CUHK01, and Market1501 datasets) used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61773105), the Natural Science Foundation of Liaoning Province (no. 20170540675), and the Scientific Research Project of Liaoning Educational Department (no. LQGD2017023).

Table 2: Comparison of various methods with the proposed methods on the three datasets (%).

Methods             | VIPeR                     | CUHK01                    | Market1501
                    | Rank 1  Rank 5  Rank 10   | Rank 1  Rank 5  Rank 10   | Rank 1  Rank 5  Rank 10  mAP
DNS [32]            | 42.28   71.46   82.94     | 64.98   84.96   89.92     | 67.96   —       —        41.89
DLDA [33]           | 44.11   72.59   81.66     | 67.12   89.45   91.68     | 48.15   —       —        29.94
FT-JSTL+DGD [11]    | 38.6    —       —         | 66.60   —       —         | —       —       —        —
PDC [34]            | 51.27   74.05   84.18     | —       —       —         | 84.14   92.73   94.92    63.41
K-means-CNN [35]    | 46.50   69.30   80.70     | 53.50   82.50   91.20     | —       —       —        —
Spindle [9]         | 53.80   74.10   83.20     | —       —       —         | 76.90   91.50   94.60    —
PersonNet [36]      | —       —       —         | 71.14   90.07   95.00     | 37.21   —       —        18.57
CSBT [37]           | 36.60   66.20   88.30     | 51.20   76.30   91.80     | 42.90   —       —        20.30
SDH-CNN [38]        | —       —       —         | —       —       —         | 58.12   68.50   80.82    48.20
M3TCP [21]          | —       —       —         | 53.70   84.30   91.00     | —       —       —        —
DM3 [39]            | 42.70   74.30   85.10     | 49.70   77.30   86.10     | 75.80   89.10   92.40    53.20
MiF                 | 40.82   68.35   81.01     | 66.87   85.39   89.51     | 78.92   86.46   89.28    66.00
MiF + PARN          | 46.84   74.68   83.23     | 71.81   90.33   91.77     | 79.39   88.95   91.36    66.46

"MiF" represents the MiF-CNN method, and "MiF + PARN" represents MiF-CNN with the attribute-aided reranking method.

Figure 8: Rank list comparison between MiF and MiF + PARN (top-10 rank lists for two example queries).


References

[1] L. Li and J. Guo, "Pedestrian re-identification based on reinforced deep feature fusion," Information Technology, vol. 42, no. 7, pp. 15–19, 2018, in Chinese.
[2] J. Dai, Y. Zhang, H. Lu, and H. Wang, "Cross-view semantic projection learning for person re-identification," Pattern Recognition, vol. 75, pp. 63–76, 2018.
[3] T. T. T. Pham, T.-L. Le, H. Vu et al., "Fully-automated person re-identification in multi-camera surveillance system with a robust kernel descriptor and effective shadow removal method," Image and Vision Computing, vol. 59, pp. 44–62, 2017.
[4] P. DaoDao and H. J. Lee, "Deep multi-task network for learning person identity and attributes," IEEE Access, vol. 6, pp. 60801–60811, 2018.
[5] J. Li, L. Zhuo, J. Zhang, F. Li, and H. Zhang, "A survey of person re-identification," Acta Automatica Sinica, vol. 44, no. 9, pp. 1554–1568, 2018, in Chinese.
[6] L. Wu, Y. Wang, J. Gao et al., "Deep adaptive feature embedding with local sample distributions for person re-identification," Pattern Recognition, vol. 73, pp. 275–288, 2018.
[7] L. Li, Y. Yang, and A. G. Hauptmann, "Person re-identification: past, present and future," 2016, http://arxiv.org/abs/1610.02984.
[8] D. Wu, S. J. Zheng, C. A. Yuan, and D. S. Huang, "A deep model with combined losses for person re-identification," Cognitive Systems Research, vol. 54, pp. 74–82, 2018.
[9] H. Zhao, M. Tian, S. Sun et al., "Spindle Net: person re-identification with human body region guided feature decomposition and fusion," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1077–1085, Honolulu, HI, USA, 2017.
[10] W. Chen, X. Chen, J. Zhang, and K. Huang, "Beyond triplet loss: a deep quadruplet network for person re-identification," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 28, Honolulu, HI, USA, July 2017.
[11] T. Xiao, H. Li, W. Ouyang et al., "Learning deep feature representations with domain guided dropout for person re-identification," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1249–1258, Las Vegas, NV, USA, June 2016.
[12] E. Ahmed, M. Jones, and T. K. Marks, "An improved deep learning architecture for person re-identification," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3908–3916, Boston, MA, USA, June 2015.
[13] M. Farenzena, L. Bazzani, A. Perina et al., "Person re-identification by symmetry-driven accumulation of local features," in Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2360–2367, San Francisco, CA, USA, June 2010.
[14] Y. Yang, J. Yang, J. Yan et al., "Salient color names for person re-identification," in Proceedings of Computer Vision – ECCV 2014, pp. 536–551, Zurich, Switzerland, September 2014.
[15] L. Liao, M. Cristani, A. Perina et al., "Multiple-shot person re-identification by chromatic and epitomic analyses," Pattern Recognition Letters, vol. 33, no. 7, pp. 898–903, 2012.
[16] S. Murino, R. Laganiere, and P. Payeur, "Improving pedestrian detection with selective gradient self-similarity feature," Pattern Recognition, vol. 48, no. 8, pp. 2364–2376, 2015.
[17] R. Layne, T. M. Hospedales, S. Gong et al., "Person re-identification by attributes," in Proceedings of the 23rd British Machine Vision Conference (BMVC), vol. 2, no. 3, p. 83, Surrey, UK, September 2012.
[18] Z. Wang, R. Hu, Y. Yu, C. Liang, and W. Huang, "Multi-level fusion for person re-identification with incomplete marks," in Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1267–1270, Brisbane, Australia, March 2018.
[19] Y. Chen, S. Duffner, A. Stoian et al., "Deep and low-level feature based attribute learning for person re-identification," Image and Vision Computing, vol. 79, pp. 25–34, 2018.
[20] Z. Dufour, X. Bai, M. Ye, and S. Satoh, "Incremental deep hidden attribute learning," in Proceedings of the 26th ACM International Conference on Multimedia, pp. 72–80, Seoul, South Korea, October 2018.
[21] D. Cheng, Y. Gong, S. Zhou et al., "Person re-identification by multi-channel parts-based CNN with improved triplet loss function," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1335–1344, Las Vegas, NV, USA, June 2016.
[22] K. He, X. Zhang, S. Ren et al., "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
[23] Y. Lin, L. Zheng, Z. Zheng et al., "Improving person re-identification by attribute and identity learning," 2017, http://arxiv.org/abs/1703.07220.
[24] Y. Yan, B. Ni, J. Liu, and X. Yang, "Multi-level attention model for person re-identification," Pattern Recognition Letters, 2018, in press.
[25] Y. Wen, K. Zhang, Z. Li et al., "A discriminative feature learning approach for deep face recognition," in Proceedings of the 14th European Conference on Computer Vision (ECCV), pp. 499–515, Amsterdam, Netherlands, October 2016.
[26] K. Xu, J. Ba, R. Kiros et al., "Show, attend and tell: neural image caption generation with visual attention," in Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 2048–2057, Lille, France, July 2015.
[27] H. Choi, K. Cho, and Y. Bengio, "Fine-grained attention mechanism for neural machine translation," Neurocomputing, vol. 284, pp. 171–176, 2018.
[28] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in Proceedings of the European Conference on Computer Vision, Lecture Notes in Computer Science, pp. 262–275, Marseille, France, October 2008.
[29] W. Li, R. Zhao, and X. Wang, "Human reidentification with transferred metric learning," in Proceedings of the Asian Conference on Computer Vision (ACCV), pp. 31–44, Daejeon, Korea, November 2012.
[30] L. Zheng, L. Shen, L. Tian et al., "Scalable person re-identification: a benchmark," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1116–1124, Boston, MA, USA, December 2015.
[31] W. Li, R. Zhao, T. Xiao et al., "DeepReID: deep filter pairing neural network for person re-identification," in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 152–159, Columbus, OH, USA, June 2014.
[32] L. Zhang, T. Xiang, and S. Gong, "Learning a discriminative null space for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1239–1248, Las Vegas, NV, USA, June 2016.
[33] L. Wu, C. Shen, and A. van den Hengel, "Deep linear discriminant analysis on Fisher networks: a hybrid architecture for person re-identification," Pattern Recognition, vol. 65, pp. 238–250, 2017.
[34] C. Su, J. Li, S. Zhang et al., "Pose-driven deep convolutional model for person re-identification," in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3980–3989, Venice, Italy, October 2017.
[35] Z. Zhao, B. Zhao, and F. Su, "Person re-identification via integrating patch-based metric learning and local salience learning," Pattern Recognition, vol. 75, pp. 90–98, 2018.
[36] L. Wu, C. Shen, and A. Hengel, "PersonNet: person re-identification with deep convolutional neural networks," 2016, http://arxiv.org/abs/1601.07255.
[37] J. Chen, Y. Wang, J. Qin et al., "Fast person re-identification via cross-camera semantic binary transformation," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5330–5339, Honolulu, HI, USA, July 2017.
[38] L. Wu, Y. Wang, Z. Ge et al., "Structured deep hashing with convolutional neural networks for fast person re-identification," Computer Vision and Image Understanding, vol. 167, no. 10, pp. 63–73, 2018.
[39] Z. Hu, R. Hu, C. Chen, Y. Yu et al., "Person reidentification via discrepancy matrix and matrix metric," IEEE Transactions on Cybernetics, vol. 48, no. 10, pp. 3006–3020, 2018.
[40] D. Jiang, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.


Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 8: Multi-InformationFlowCNNandAttribute …downloads.hindawi.com/journals/cin/2019/7028107.pdfe first type is the classification model as used in image classification and the second

public datasets including VIPeR [28] CUHK01 [29] andMarket1501 [30]

VIPeR Dataset It contains 1264 images of 632 per-sons Because of the lack of image samples low-resolution a great change in pose view and illu-mination it is one of the hardest difficult person re-iddatasets In experiment the VIPeR dataset is ran-domly split into two parts images of 316 persons fortraining and the images of remaining 316 persons fortestingCUHK01 Dataset It consists of 3884 images of 971persons which are captured by two cameras withdifferent views 485 pedestrians are randomlychosen for training and other 486 pedestrians fortestingMarket1501 Dataset It is one of the largest person re-id datasets that include 32268 images of 1501 personsEach person has a number of images that are capturedby six cameras with different views 0e dataset isdivided into two parts 12936 images of 751 personsfor training and 19732 images of 750 persons fortestingEvaluation Protocol For the single-shot datasets VI-PeR and CUHK01 cumulative match characteristic(CMC) is used to record the ranks of correct identify[31] For the multishot dataset Market1501 apartfrom CMC mean Average Precision (mAP) is uti-lized to evaluate the performance of the proposedmethods

62 Experimental Results on Attribute Recognition 0e at-tributes in PARN refer to pedestrian identity-level attri-butes For the VIPeR dataset attributes include ldquogenderrdquoldquolength of hairrdquo ldquolower clothingrdquo ldquoupper clothingrdquo ldquoback-pack or notrdquo and ldquocarrying anything or notrdquo For theCUHK01 dataset attributes include ldquogenderrdquo ldquolength ofhairrdquo ldquobackpack or notrdquo ldquohandbag or notrdquo ldquocolor of upperrdquoand ldquocolor of lowerrdquo For the Market1501 dataset attributescontain ldquogenderrdquo ldquolength of hairrdquo ldquowearing hat or notrdquoldquolower clothingrdquo ldquobackpack or notrdquo ldquohandbag or notrdquoldquolength of sleeverdquo and ldquolength of lower clothingrdquo Eachperson in each dataset owns an attribute word list as shownin Figure 7

0is paper evaluates the attribute recognition accuracy ofPARN on the Market1501 dataset and compares our resultswith outstanding person attribute recognition method APR[22] 0e experiment results are reported in Table 1 whereldquoLslvrdquo represents the ldquolength of sleeverdquo and ldquoLlowrdquo repre-sents the ldquolength of lower clothingrdquo

It can be observed from Table 1 that PARN obtainssuperior recognition accuracy of 8 attributes especiallyldquolength of hairrdquo ldquohandbag or notrdquo and mean accuracy arehigher than APR whereas accuracy of other attributes iscloser with APR Comparison with APR demonstrates theoutstanding attribute recognition performance of PARNwhich plays an important role in improving the accuracy ofperson re-id

63 Experimental Results on Person re-id 0is paper eval-uates the performance of the proposed methods on threedatasets 0e experimental results of the proposed methodsand other state-of-the-art methods are presented inTable 2

As shown in Table 2 the proposed MiF-CNN modelobtains a great performance among state-of-the-artmethods and on the contrary comparison between MiFand MiF + PARN demonstrates that the proposed attribute-aided reranking algorithm is helpful to increase the personre-id accuracy 0e improvement effect is especially obviouson VIPeR and CUHK01 datasets where the identificationaccuracy of rank 1 rank 5 and rank 10 improves by 602633 and 222 and 494 494 and 226 respectively0e proposed MiF-CNN model with the attribute-aidedreranking algorithm gets the best rank 5 accuracy on theVIPeR dataset the best rank 1 and rank 5 accuracy on theCUHK01 dataset and the best mAP on the Market1501dataset among various methods

64 Analysis of Experimental Results Because of the smallquantity of training samples in VIPeR and CUHK01 data-sets a deep CNN model is hard to be trained and easilysuffers from overfitting To handle this the FT-JSTL+DGDmethod based on deep CNN learned deep features frommultiple domains jointly by merging all the datasets togetherand fine-tuned the pretrained model on VIPeR andCUHK01 separately 0e structure and hyperparameters ofthe deep CNN model in the M3TCP method need to beadjusted manually for adapting the scale of different datasetsso as to overcome the training problem of the deep neuralnetwork

In contrast the proposed MiF-CNN has an immobilestructure and can be trained on smaller datasets directlywithout any fine tuning In spite of the deep structure themodel can converge quickly and well MiF-CNNwithout theattribute-aided reranking algorithm outperforms FT-JSTL +DGD andM3TCP by 027 and 1317 respectivelyat rank 1 accuracy on the CUHK01 dataset It also out-performs FT-JSTL+DGD by 222 at rank 1 accuracy onthe VIPeR dataset which indicates that the proposed MiF-CNN model has an excellent ability of reducing overfittingand generalization and an outstanding performance inperson re-id

Figure 8 demonstrates the rank list of MiF andMiF + PARN It can be seen that the MiF-CNN with theattribute-aided reranking method enables the lower rankedpositive sample in rank list of MiF-CNN obtain a higherranking so as to improve the accuracy of person re-idwhich verifies the effectiveness of the attribute-aidedreranking algorithm for improving performance of per-son re-id model

7 Conclusion

0is paper studies and discusses the training problem ofdeep neural network in person re-id task and the using ofpedestrian attributes for further improving the accuracy of

8 Computational Intelligence and Neuroscience

person re-id 0e proposed MiF-CNN model realizes thereuse of feature maps and gradient information whichenhances the feature mobility of network and improves theefficiency of gradient propagation 0e designed person at-tribute recognition network uses an attention mechanism tomeasure the correlation between input feature maps and thehidden state of LSTM at previous time for reducing the di-mension of features It also employs LSTM to decode imagefeatures into pedestrian attributes On the basis of the at-tribute recognition results the attribute-aided reranking

algorithm is presented which rematches attribute featuresamong samples to aid more positive samples rank higher inrank list so as to improve the identify accuracy further

0e experimental results on three public person re-iddatasets indicate the outstanding performance of MiF-CNNmodel in person re-id 0e attribute-aided reranking algo-rithm makes a major contribution to improve the accuracyof person re-id In the future more improvement and op-timization will be done on pedestrian attribute recognitionnetwork Moreover the caption of person images or videos

Input Initial rank list R0Output 0e reranking rank list Rlowast

(1) for i 1 2 M do(2) if rank 1 correct(3) Rlowast R0(4) end if(5) elif rank 5 correct(6) Distributing initial feature score sf for top-5 gallery g

iprime1simg

iprime5 according to their ranking in R0

(7) Distributing attribute score sa for giprime1simg

iprime5 according to the number of their attributes that are the same as qi

(8) s sf(1minus c) + sac

(9) Reranking giprime1simg

iprime5 in descending order of their total score s and obtain Rlowast

(10) end if(11) elif rank 10 correct(12) Changing g

iprime1simg

iprime5 to g

iprime1simg

iprime10 repeat 6ndash10

(13) end if(14) else(15) Changing g

iprime1simg

iprime5 to g

iprime1simg

iprimeM repeat 6ndash10

(16) end if(17) end for

ALGORITHM 1 0e attribute-aided reranking algorithm

Male

Hat

Pants

Stripes

Backpack

Carrying nothing

(a)

Male

Short hair

Pants

No hat

No backpack

No handbag

Short sleeve

Short lower

(b)

Figure 7 Attribute label of the pedestrian in datasets

Table 1 Recognition accuracy of different attributes in PARN and APR ()

Method Gender Hair Clothes Hat Backpack Handbag Lslv Llow MeanAPR 8578 8346 9136 8821 8632 7601 9412 9264 8723PARN 8493 7910 8935 9667 8386 8518 9284 8845 8755

Computational Intelligence and Neuroscience 9

can be obtained by the LSTM model with natural languageprocess ideas which make person re-id methods moresignificant

Data Availability

0e (attributes recognition accuracy on the Market1501dataset and person re-id accuracy on VIPeR CUHK01 andMarket1501 datasets) data used to support the findings ofthis study are included within the article

Conflicts of Interest

0e authors declare that they have no conflicts of interest

Acknowledgments

0is work was supported by the National Natural ScienceFoundation of China (no 61773105) the Natural ScienceFoundation of Liaoning Province (no 20170540675) andthe Scientific Research Project of Liaoning EducationalDepartment (no LQGD2017023)

Table 2 Comparison of various methods with the proposed methods on three datasets ()

MethodsVIPeR CUHK01 Market1501

Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 mAPDNS [32] 4228 7146 8294 6498 8496 8992 6796 mdash mdash 4189DLDA [33] 4411 7259 8166 6712 8945 9168 4815 mdash mdash 2994FT-JSTL +DGD [11] 386 mdash mdash 6660 mdash mdash mdash mdash mdash mdashPDC [34] 5127 7405 8418 mdash mdash mdash 8414 9273 9492 6341K-means-CNN [35] 4650 6930 8070 5350 8250 9120 mdash mdash mdash mdashSpindle [9] 5380 7410 8320 mdash mdash mdash 7690 9150 9460 mdashPersonNet [36] mdash mdash mdash 7114 9007 9500 3721 mdash mdash 1857CSBT [37] 3660 6620 8830 5120 7630 9180 4290 mdash mdash 2030SDH-CNN [38] mdash mdash mdash mdash mdash mdash 5812 6850 8082 4820M3TCP [21] mdash mdash mdash 5370 8430 9100 mdash mdash mdash mdashDM3 [39] 4270 7430 8510 4970 7730 8610 7580 8910 9240 5320MiF 4082 6835 8101 6687 8539 8951 7892 8646 8928 6600MiF + PARN 4684 7468 8323 7181 9033 9177 7939 8895 9136 6646ldquoMiFrdquo represents the MiF-CNN method and ldquoMiF + PARNrdquo represents the MiF-CNN with the attribute-aided reranking method

Query

1 2 3 4 5 6 7 8 9 10

Query

MiF

MiF + PARN

MiF

MiF + PARN

Rank list

Figure 8 Rank list comparison between MiF and MiF + PARN

10 Computational Intelligence and Neuroscience

References

[1] L Li and J Guo ldquoPedestrian re-identification based onreinforced deep feature fusionrdquo Information Technologyvol 42 no 7 pp 15ndash19 2018 in Chinese

[2] J Dai Y Zhang H Lu and H Wang ldquoCross-view semanticprojection learning for person re-identificationrdquo PatternRecognition vol 75 pp 63ndash76 2018

[3] T T T Pham T-L Le H Vu et al ldquoFully-automated personre-identification in multi-camera surveillance system with arobust kernel descriptor and effective shadow removalmethodrdquo Image and Vision Computing vol 59 pp 44ndash622017

[4] P DaoDao and H J Lee ldquoDeep multi-task network forlearning person identity and attributesrdquo IEEE Access vol 6pp 60801ndash60811 2018

[5] J Li L Zhuo J Zhang F Li and H Zhang ldquoA survey ofperson re-identificationrdquo Acta Automatica Sinica vol 44no 9 pp 1554ndash1568 2018 in Chinese

[6] L Wu Y Wang J Gao et al ldquoDeep adaptive feature em-bedding with local sample distributions for person re-iden-tificationrdquo Pattern Recognition vol 73 pp 275ndash288 2018

[7] L Li Y Yang and A G Hauptmann ldquoPerson re-identification past present and futurerdquo 2016 httparxivorgabs161002984

[8] D Wu S J Zheng C A Yuan and D S Huang ldquoA deepmodel with combined losses for person re-identificationrdquoCognitive Systems Research vol 54 pp 74ndash82 2018

[9] H Zhao M Tian S Sun et al ldquoSpindle net person re-identification with human body region guided feature de-composition and fusionrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 1077ndash1085 Honolulu HI USA 2017

[10] W Chen X Chen J Zhang and K Huang ldquoBeyond tripletloss a deep quadruplet network for person re-identificationrdquoin Proceedings of the 2017 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) vol 28 Honolulu HIUSA July 2017

[11] T Xiao H Li W Ouyang et al ldquoLearning deep featurerepresentations with domain guided dropout for person re-identificationrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 1249ndash1258 Las Vegas NV USA June 2016

[12] E Ahmed M Jones and T K Marks ldquoAn improved deeplearning architecture for person re-identificationrdquo in Pro-ceedings of the 2015 IEEE Conference on Computer Vision andPattern Recognition (CVPR) pp 3908ndash3916 Boston MAUSA June 2015

[13] M Farenzena L Bazzani A Perina et al ldquoPerson re-identification by symmetry-driven accumulation of localfeaturesrdquo in Proceedings of the 2010 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 2360ndash2367 San Francisco CA USA June 2010

[14] Y Yang J Yang J Yan et al ldquoSalient color names for personre-identificationrdquo in Proceedings of Computer Vision-ECCV2014 pp 536ndash551 Zurich Switzerland September 2014

[15] L Liao M Cristani A Perina et al ldquoMultiple-shot person re-identification by chromatic and epitomic analysesrdquo PatternRecognition Letters vol 33 no 7 pp 898ndash903 2012

[16] S Murino R Laganiere and P Payeur ldquoImproving pedes-trian detection with selective gradient self-similarity featurerdquoPattern Recognition vol 48 no 8 pp 2364ndash2376 2015

[17] R Layne T M Hospedales S Gong et al ldquoPerson re-identification by attributesrdquo in Proceedings of the 23th

BritishMachine Vision Conference (BMVC) vol 2 no 3 p 83Surrey UK September 2012

[18] Z Wang R Hu Y Yu C Liang and W Huang ldquoMulti-levelfusion for person Re-identification with incomplete marksrdquoin Proceedings of the 23rd ACM international conference onMultimedia pp 1267ndash1270 Brisbane Australia March 2018

[19] Y Chen S Duffner A Stoian et al ldquoDeep and low-levelfeature based attribute learning for person re-identificationrdquoImage and Vision Computing vol 79 pp 25ndash34 2018

[20] Z Dufour X Bai M Ye and S Satoh ldquoIncremental deephidden attribute learningrdquo in Proceedings of the 26th ACMinternational conference on Multimedia pp 72ndash80 SeoulSouth Korea October 2018

[21] D Cheng Y Gong S Zhou et al ldquoPerson re-identification bymulti-channel parts-based cnn with improved triplet lossfunctionrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 1335ndash1344 Las Vegas NV USA June 2016

[22] K He X Zhang S Ren et al ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the 2016 IEEE Confer-ence on Computer Vision and Pattern Recognition (CVPR)pp 770ndash778 Las Vegas NV USA June 2016

[23] Y Lin L Zheng Z Zheng et al ldquoImproving person re-identification by attribute and identity learningrdquo 2017 httparxivorgabs170307220

[24] Y Yan B Ni J Liu and X Yang ldquoMulti-level attentionmodelfor person re-identificationrdquo Pattern Recognition Letters2018 In press

[25] Y Wen K Zhang Z Li et al ldquoA discriminative featurelearning approach for deep face recognitionrdquo in Proceedingsof 14th European Conference pp 499ndash515 AmsterdamNetherlands October 2016

[26] K Xu J Ba R Kiros et al ldquoShow attend and tell neuralimage caption generation with visual attentionrdquo in Pro-ceedings of the 32nd International Conference on MachineLearning (ICML) pp 2048ndash2057 Lille France July 2015

[27] H Choi K Cho and Y Bengio ldquoFine-grained attentionmechanism for neural machine translationrdquoNeurocomputingvol 284 pp 171ndash176 2018

[28] D Gray and H Tao ldquoViewpoint invariant pedestrian rec-ognition with an ensemble of localized featuresrdquo in Pro-ceedings of European Conference on Computer Vision LectureNotes in Computer Science pp 262ndash275 Marseille FranceOctober 2008

[29] W Li R Zhao and X Wang ldquoHuman reidentification withtransferred metric learningrdquo in Proceedings of the 2008 AsianConference on Computer Vision (ACCV) pp 31ndash44 DaejeonKorea November 2012

[30] L Zheng L Shen L Tian et al ldquoScalable person re-identification a benchmarkrdquo in Proceedings of the 2015IEEE Conference on Computer Vision and Pattern Recognition(CVPR) pp 1116ndash1124 Boston MA USA December 2015

[31] W Li R Zhao T Xiao et al ldquoDeepReID deep filter pairingneural network for person re-identificationrdquo in Proceedings ofthe 2014 Computer Vision and Pattern Recognition (CVPR)pp 152ndash159 Columbus OH USA June 2014

[32] L Zhang T Xiang and S Gong ldquoLearning a discriminativenull space for person re-identificationrdquo in Proceedings of theIEEE Conference on Computer Cision and Pattern Recognition(CVPR) pp 1239ndash1248 Las Vegas NV USA June 2016

[33] L Wu C Shen and A van den Hengel ldquoDeep linear dis-criminant analysis on Fisher networks a hybrid architecturefor person re-identificationrdquo Pattern Recognition vol 65pp 238ndash250 2017

Computational Intelligence and Neuroscience 11

[34] C Su J Li S Zhang et al ldquoPose-driven deep convolutionalmodel for person re-identificationrdquo in Proceedings of the 2016International Conference on Computer Vision (ICCV)pp 3980ndash3989 Venice Italy October 2017

[35] Z Zhao B Zhao and F Su ldquoPerson re-identification viaintegrating patch-based metric learning and local saliencelearningrdquo Pattern Recognition vol 75 pp 90ndash98 2018

[36] L Wu C Shen and A Hengel ldquoPersonnet person re-identification with deep convolutional neural networksrdquo2016 httparxivorgabs160107255

[37] J Chen Y Wang J Qin et al ldquoFast person re-identificationvia cross-camera semantic binary transformationrdquo in Pro-ceedings of the 2017 IEEE Conference on Computer Vision andPattern Recognition (CVPR) pp 5330ndash5339 Honolulu HIUSA July 2017

[38] L Wu Y Wang Z Ge et al ldquoStructured deep hashing withconvolutional neural networks for fast person re-identifica-tionrdquo Computer Vision and Image Understanding vol 167no 10 pp 63ndash73 2018

[39] Z Hu R Hu C Chen Y Yu et al ldquoPerson reidentification viadiscrepancy matrix and matrix metricrdquo IEEE Transactions onCybernetics vol 48 no 10 pp 3006ndash3020 2018

[40] D Jiang K Cho and Y Bengio ldquoNeural machine translationby jointly learning to align and translaterdquo 2014 httparxivorgabs14090473

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 9: Multi-InformationFlowCNNandAttribute …downloads.hindawi.com/journals/cin/2019/7028107.pdfe first type is the classification model as used in image classification and the second

person re-id 0e proposed MiF-CNN model realizes thereuse of feature maps and gradient information whichenhances the feature mobility of network and improves theefficiency of gradient propagation 0e designed person at-tribute recognition network uses an attention mechanism tomeasure the correlation between input feature maps and thehidden state of LSTM at previous time for reducing the di-mension of features It also employs LSTM to decode imagefeatures into pedestrian attributes On the basis of the at-tribute recognition results the attribute-aided reranking

algorithm is presented which rematches attribute featuresamong samples to aid more positive samples rank higher inrank list so as to improve the identify accuracy further

0e experimental results on three public person re-iddatasets indicate the outstanding performance of MiF-CNNmodel in person re-id 0e attribute-aided reranking algo-rithm makes a major contribution to improve the accuracyof person re-id In the future more improvement and op-timization will be done on pedestrian attribute recognitionnetwork Moreover the caption of person images or videos

Input Initial rank list R0Output 0e reranking rank list Rlowast

(1) for i 1 2 M do(2) if rank 1 correct(3) Rlowast R0(4) end if(5) elif rank 5 correct(6) Distributing initial feature score sf for top-5 gallery g

iprime1simg

iprime5 according to their ranking in R0

(7) Distributing attribute score sa for giprime1simg

iprime5 according to the number of their attributes that are the same as qi

(8) s sf(1minus c) + sac

(9) Reranking giprime1simg

iprime5 in descending order of their total score s and obtain Rlowast

(10) end if(11) elif rank 10 correct(12) Changing g

iprime1simg

iprime5 to g

iprime1simg

iprime10 repeat 6ndash10

(13) end if(14) else(15) Changing g

iprime1simg

iprime5 to g

iprime1simg

iprimeM repeat 6ndash10

(16) end if(17) end for

ALGORITHM 1 0e attribute-aided reranking algorithm

Male

Hat

Pants

Stripes

Backpack

Carrying nothing

(a)

Male

Short hair

Pants

No hat

No backpack

No handbag

Short sleeve

Short lower

(b)

Figure 7 Attribute label of the pedestrian in datasets

Table 1 Recognition accuracy of different attributes in PARN and APR ()

Method Gender Hair Clothes Hat Backpack Handbag Lslv Llow MeanAPR 8578 8346 9136 8821 8632 7601 9412 9264 8723PARN 8493 7910 8935 9667 8386 8518 9284 8845 8755

Computational Intelligence and Neuroscience 9

can be obtained by the LSTM model with natural languageprocess ideas which make person re-id methods moresignificant

Data Availability

0e (attributes recognition accuracy on the Market1501dataset and person re-id accuracy on VIPeR CUHK01 andMarket1501 datasets) data used to support the findings ofthis study are included within the article

Conflicts of Interest

0e authors declare that they have no conflicts of interest

Acknowledgments

0is work was supported by the National Natural ScienceFoundation of China (no 61773105) the Natural ScienceFoundation of Liaoning Province (no 20170540675) andthe Scientific Research Project of Liaoning EducationalDepartment (no LQGD2017023)

Table 2 Comparison of various methods with the proposed methods on three datasets ()

MethodsVIPeR CUHK01 Market1501

Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 mAPDNS [32] 4228 7146 8294 6498 8496 8992 6796 mdash mdash 4189DLDA [33] 4411 7259 8166 6712 8945 9168 4815 mdash mdash 2994FT-JSTL +DGD [11] 386 mdash mdash 6660 mdash mdash mdash mdash mdash mdashPDC [34] 5127 7405 8418 mdash mdash mdash 8414 9273 9492 6341K-means-CNN [35] 4650 6930 8070 5350 8250 9120 mdash mdash mdash mdashSpindle [9] 5380 7410 8320 mdash mdash mdash 7690 9150 9460 mdashPersonNet [36] mdash mdash mdash 7114 9007 9500 3721 mdash mdash 1857CSBT [37] 3660 6620 8830 5120 7630 9180 4290 mdash mdash 2030SDH-CNN [38] mdash mdash mdash mdash mdash mdash 5812 6850 8082 4820M3TCP [21] mdash mdash mdash 5370 8430 9100 mdash mdash mdash mdashDM3 [39] 4270 7430 8510 4970 7730 8610 7580 8910 9240 5320MiF 4082 6835 8101 6687 8539 8951 7892 8646 8928 6600MiF + PARN 4684 7468 8323 7181 9033 9177 7939 8895 9136 6646ldquoMiFrdquo represents the MiF-CNN method and ldquoMiF + PARNrdquo represents the MiF-CNN with the attribute-aided reranking method

Query

1 2 3 4 5 6 7 8 9 10

Query

MiF

MiF + PARN

MiF

MiF + PARN

Rank list

Figure 8 Rank list comparison between MiF and MiF + PARN

10 Computational Intelligence and Neuroscience

References

[1] L Li and J Guo ldquoPedestrian re-identification based onreinforced deep feature fusionrdquo Information Technologyvol 42 no 7 pp 15ndash19 2018 in Chinese

[2] J Dai Y Zhang H Lu and H Wang ldquoCross-view semanticprojection learning for person re-identificationrdquo PatternRecognition vol 75 pp 63ndash76 2018

[3] T T T Pham T-L Le H Vu et al ldquoFully-automated personre-identification in multi-camera surveillance system with arobust kernel descriptor and effective shadow removalmethodrdquo Image and Vision Computing vol 59 pp 44ndash622017

[4] P DaoDao and H J Lee ldquoDeep multi-task network forlearning person identity and attributesrdquo IEEE Access vol 6pp 60801ndash60811 2018

[5] J Li L Zhuo J Zhang F Li and H Zhang ldquoA survey ofperson re-identificationrdquo Acta Automatica Sinica vol 44no 9 pp 1554ndash1568 2018 in Chinese

[6] L Wu Y Wang J Gao et al ldquoDeep adaptive feature em-bedding with local sample distributions for person re-iden-tificationrdquo Pattern Recognition vol 73 pp 275ndash288 2018

[7] L Li Y Yang and A G Hauptmann ldquoPerson re-identification past present and futurerdquo 2016 httparxivorgabs161002984

[8] D Wu S J Zheng C A Yuan and D S Huang ldquoA deepmodel with combined losses for person re-identificationrdquoCognitive Systems Research vol 54 pp 74ndash82 2018

[9] H Zhao M Tian S Sun et al ldquoSpindle net person re-identification with human body region guided feature de-composition and fusionrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 1077ndash1085 Honolulu HI USA 2017

[10] W Chen X Chen J Zhang and K Huang ldquoBeyond tripletloss a deep quadruplet network for person re-identificationrdquoin Proceedings of the 2017 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) vol 28 Honolulu HIUSA July 2017

[11] T Xiao H Li W Ouyang et al ldquoLearning deep featurerepresentations with domain guided dropout for person re-identificationrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 1249ndash1258 Las Vegas NV USA June 2016

[12] E Ahmed M Jones and T K Marks ldquoAn improved deeplearning architecture for person re-identificationrdquo in Pro-ceedings of the 2015 IEEE Conference on Computer Vision andPattern Recognition (CVPR) pp 3908ndash3916 Boston MAUSA June 2015

[13] M Farenzena L Bazzani A Perina et al ldquoPerson re-identification by symmetry-driven accumulation of localfeaturesrdquo in Proceedings of the 2010 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 2360ndash2367 San Francisco CA USA June 2010

[14] Y Yang J Yang J Yan et al ldquoSalient color names for personre-identificationrdquo in Proceedings of Computer Vision-ECCV2014 pp 536ndash551 Zurich Switzerland September 2014

[15] L Liao M Cristani A Perina et al ldquoMultiple-shot person re-identification by chromatic and epitomic analysesrdquo PatternRecognition Letters vol 33 no 7 pp 898ndash903 2012

[16] S Murino R Laganiere and P Payeur ldquoImproving pedes-trian detection with selective gradient self-similarity featurerdquoPattern Recognition vol 48 no 8 pp 2364ndash2376 2015

[17] R Layne T M Hospedales S Gong et al ldquoPerson re-identification by attributesrdquo in Proceedings of the 23th

BritishMachine Vision Conference (BMVC) vol 2 no 3 p 83Surrey UK September 2012

[18] Z Wang R Hu Y Yu C Liang and W Huang ldquoMulti-levelfusion for person Re-identification with incomplete marksrdquoin Proceedings of the 23rd ACM international conference onMultimedia pp 1267ndash1270 Brisbane Australia March 2018

[19] Y Chen S Duffner A Stoian et al ldquoDeep and low-levelfeature based attribute learning for person re-identificationrdquoImage and Vision Computing vol 79 pp 25ndash34 2018

[20] Z Dufour X Bai M Ye and S Satoh ldquoIncremental deephidden attribute learningrdquo in Proceedings of the 26th ACMinternational conference on Multimedia pp 72ndash80 SeoulSouth Korea October 2018

[21] D Cheng Y Gong S Zhou et al ldquoPerson re-identification bymulti-channel parts-based cnn with improved triplet lossfunctionrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR)pp 1335ndash1344 Las Vegas NV USA June 2016

[22] K He X Zhang S Ren et al ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the 2016 IEEE Confer-ence on Computer Vision and Pattern Recognition (CVPR)pp 770ndash778 Las Vegas NV USA June 2016

[23] Y Lin L Zheng Z Zheng et al ldquoImproving person re-identification by attribute and identity learningrdquo 2017 httparxivorgabs170307220

[24] Y Yan B Ni J Liu and X Yang ldquoMulti-level attentionmodelfor person re-identificationrdquo Pattern Recognition Letters2018 In press

[25] Y Wen K Zhang Z Li et al ldquoA discriminative featurelearning approach for deep face recognitionrdquo in Proceedingsof 14th European Conference pp 499ndash515 AmsterdamNetherlands October 2016

[26] K Xu J Ba R Kiros et al ldquoShow attend and tell neuralimage caption generation with visual attentionrdquo in Pro-ceedings of the 32nd International Conference on MachineLearning (ICML) pp 2048ndash2057 Lille France July 2015

[27] H Choi K Cho and Y Bengio ldquoFine-grained attentionmechanism for neural machine translationrdquoNeurocomputingvol 284 pp 171ndash176 2018

[28] D Gray and H Tao ldquoViewpoint invariant pedestrian rec-ognition with an ensemble of localized featuresrdquo in Pro-ceedings of European Conference on Computer Vision LectureNotes in Computer Science pp 262ndash275 Marseille FranceOctober 2008

[29] W Li R Zhao and X Wang ldquoHuman reidentification withtransferred metric learningrdquo in Proceedings of the 2008 AsianConference on Computer Vision (ACCV) pp 31ndash44 DaejeonKorea November 2012

[30] L Zheng L Shen L Tian et al ldquoScalable person re-identification a benchmarkrdquo in Proceedings of the 2015IEEE Conference on Computer Vision and Pattern Recognition(CVPR) pp 1116ndash1124 Boston MA USA December 2015

[31] W Li R Zhao T Xiao et al ldquoDeepReID deep filter pairingneural network for person re-identificationrdquo in Proceedings ofthe 2014 Computer Vision and Pattern Recognition (CVPR)pp 152ndash159 Columbus OH USA June 2014

[32] L Zhang T Xiang and S Gong ldquoLearning a discriminativenull space for person re-identificationrdquo in Proceedings of theIEEE Conference on Computer Cision and Pattern Recognition(CVPR) pp 1239ndash1248 Las Vegas NV USA June 2016

[33] L Wu C Shen and A van den Hengel ldquoDeep linear dis-criminant analysis on Fisher networks a hybrid architecturefor person re-identificationrdquo Pattern Recognition vol 65pp 238ndash250 2017

Computational Intelligence and Neuroscience 11

[34] C Su J Li S Zhang et al ldquoPose-driven deep convolutionalmodel for person re-identificationrdquo in Proceedings of the 2016International Conference on Computer Vision (ICCV)pp 3980ndash3989 Venice Italy October 2017

[35] Z Zhao B Zhao and F Su ldquoPerson re-identification viaintegrating patch-based metric learning and local saliencelearningrdquo Pattern Recognition vol 75 pp 90ndash98 2018

[36] L Wu C Shen and A Hengel ldquoPersonnet person re-identification with deep convolutional neural networksrdquo2016 httparxivorgabs160107255

[37] J Chen Y Wang J Qin et al ldquoFast person re-identificationvia cross-camera semantic binary transformationrdquo in Pro-ceedings of the 2017 IEEE Conference on Computer Vision andPattern Recognition (CVPR) pp 5330ndash5339 Honolulu HIUSA July 2017

[38] L Wu Y Wang Z Ge et al ldquoStructured deep hashing withconvolutional neural networks for fast person re-identifica-tionrdquo Computer Vision and Image Understanding vol 167no 10 pp 63ndash73 2018

[39] Z Hu R Hu C Chen Y Yu et al ldquoPerson reidentification viadiscrepancy matrix and matrix metricrdquo IEEE Transactions onCybernetics vol 48 no 10 pp 3006ndash3020 2018

[40] D Jiang K Cho and Y Bengio ldquoNeural machine translationby jointly learning to align and translaterdquo 2014 httparxivorgabs14090473

12 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 10: Multi-InformationFlowCNNandAttribute …downloads.hindawi.com/journals/cin/2019/7028107.pdfe first type is the classification model as used in image classification and the second

can be obtained by the LSTM model with natural languageprocess ideas which make person re-id methods moresignificant

Data Availability

0e (attributes recognition accuracy on the Market1501dataset and person re-id accuracy on VIPeR CUHK01 andMarket1501 datasets) data used to support the findings ofthis study are included within the article

Conflicts of Interest

0e authors declare that they have no conflicts of interest

Acknowledgments

0is work was supported by the National Natural ScienceFoundation of China (no 61773105) the Natural ScienceFoundation of Liaoning Province (no 20170540675) andthe Scientific Research Project of Liaoning EducationalDepartment (no LQGD2017023)

Table 2 Comparison of various methods with the proposed methods on three datasets ()

MethodsVIPeR CUHK01 Market1501

Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 Rank 1 Rank 5 Rank 10 mAPDNS [32] 4228 7146 8294 6498 8496 8992 6796 mdash mdash 4189DLDA [33] 4411 7259 8166 6712 8945 9168 4815 mdash mdash 2994FT-JSTL +DGD [11] 386 mdash mdash 6660 mdash mdash mdash mdash mdash mdashPDC [34] 5127 7405 8418 mdash mdash mdash 8414 9273 9492 6341K-means-CNN [35] 4650 6930 8070 5350 8250 9120 mdash mdash mdash mdashSpindle [9] 5380 7410 8320 mdash mdash mdash 7690 9150 9460 mdashPersonNet [36] mdash mdash mdash 7114 9007 9500 3721 mdash mdash 1857CSBT [37] 3660 6620 8830 5120 7630 9180 4290 mdash mdash 2030SDH-CNN [38] mdash mdash mdash mdash mdash mdash 5812 6850 8082 4820M3TCP [21] mdash mdash mdash 5370 8430 9100 mdash mdash mdash mdashDM3 [39] 4270 7430 8510 4970 7730 8610 7580 8910 9240 5320MiF 4082 6835 8101 6687 8539 8951 7892 8646 8928 6600MiF + PARN 4684 7468 8323 7181 9033 9177 7939 8895 9136 6646ldquoMiFrdquo represents the MiF-CNN method and ldquoMiF + PARNrdquo represents the MiF-CNN with the attribute-aided reranking method

Figure 8: Rank list comparison between MiF and MiF + PARN (for each example query, the top 10 ranked gallery images returned by MiF and by MiF + PARN are shown).


References

[1] L. Li and J. Guo, "Pedestrian re-identification based on reinforced deep feature fusion," Information Technology, vol. 42, no. 7, pp. 15–19, 2018, in Chinese.

[2] J. Dai, Y. Zhang, H. Lu, and H. Wang, "Cross-view semantic projection learning for person re-identification," Pattern Recognition, vol. 75, pp. 63–76, 2018.

[3] T. T. T. Pham, T.-L. Le, H. Vu et al., "Fully-automated person re-identification in multi-camera surveillance system with a robust kernel descriptor and effective shadow removal method," Image and Vision Computing, vol. 59, pp. 44–62, 2017.

[4] P. DaoDao and H. J. Lee, "Deep multi-task network for learning person identity and attributes," IEEE Access, vol. 6, pp. 60801–60811, 2018.

[5] J. Li, L. Zhuo, J. Zhang, F. Li, and H. Zhang, "A survey of person re-identification," Acta Automatica Sinica, vol. 44, no. 9, pp. 1554–1568, 2018, in Chinese.

[6] L. Wu, Y. Wang, J. Gao et al., "Deep adaptive feature embedding with local sample distributions for person re-identification," Pattern Recognition, vol. 73, pp. 275–288, 2018.

[7] L. Li, Y. Yang, and A. G. Hauptmann, "Person re-identification: past, present and future," 2016, http://arxiv.org/abs/1610.02984.

[8] D. Wu, S. J. Zheng, C. A. Yuan, and D. S. Huang, "A deep model with combined losses for person re-identification," Cognitive Systems Research, vol. 54, pp. 74–82, 2018.

[9] H. Zhao, M. Tian, S. Sun et al., "Spindle Net: person re-identification with human body region guided feature decomposition and fusion," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1077–1085, Honolulu, HI, USA, 2017.

[10] W. Chen, X. Chen, J. Zhang, and K. Huang, "Beyond triplet loss: a deep quadruplet network for person re-identification," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 28, Honolulu, HI, USA, July 2017.

[11] T. Xiao, H. Li, W. Ouyang et al., "Learning deep feature representations with domain guided dropout for person re-identification," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1249–1258, Las Vegas, NV, USA, June 2016.

[12] E. Ahmed, M. Jones, and T. K. Marks, "An improved deep learning architecture for person re-identification," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3908–3916, Boston, MA, USA, June 2015.

[13] M. Farenzena, L. Bazzani, A. Perina et al., "Person re-identification by symmetry-driven accumulation of local features," in Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2360–2367, San Francisco, CA, USA, June 2010.

[14] Y. Yang, J. Yang, J. Yan et al., "Salient color names for person re-identification," in Proceedings of Computer Vision – ECCV 2014, pp. 536–551, Zurich, Switzerland, September 2014.

[15] L. Liao, M. Cristani, A. Perina et al., "Multiple-shot person re-identification by chromatic and epitomic analyses," Pattern Recognition Letters, vol. 33, no. 7, pp. 898–903, 2012.

[16] S. Murino, R. Laganiere, and P. Payeur, "Improving pedestrian detection with selective gradient self-similarity feature," Pattern Recognition, vol. 48, no. 8, pp. 2364–2376, 2015.

[17] R. Layne, T. M. Hospedales, S. Gong et al., "Person re-identification by attributes," in Proceedings of the 23rd British Machine Vision Conference (BMVC), vol. 2, no. 3, p. 83, Surrey, UK, September 2012.

[18] Z. Wang, R. Hu, Y. Yu, C. Liang, and W. Huang, "Multi-level fusion for person re-identification with incomplete marks," in Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1267–1270, Brisbane, Australia, October 2015.

[19] Y. Chen, S. Duffner, A. Stoian et al., "Deep and low-level feature based attribute learning for person re-identification," Image and Vision Computing, vol. 79, pp. 25–34, 2018.

[20] Z. Dufour, X. Bai, M. Ye, and S. Satoh, "Incremental deep hidden attribute learning," in Proceedings of the 26th ACM International Conference on Multimedia, pp. 72–80, Seoul, South Korea, October 2018.

[21] D. Cheng, Y. Gong, S. Zhou et al., "Person re-identification by multi-channel parts-based CNN with improved triplet loss function," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1335–1344, Las Vegas, NV, USA, June 2016.

[22] K. He, X. Zhang, S. Ren et al., "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.

[23] Y. Lin, L. Zheng, Z. Zheng et al., "Improving person re-identification by attribute and identity learning," 2017, http://arxiv.org/abs/1703.07220.

[24] Y. Yan, B. Ni, J. Liu, and X. Yang, "Multi-level attention model for person re-identification," Pattern Recognition Letters, 2018, in press.

[25] Y. Wen, K. Zhang, Z. Li et al., "A discriminative feature learning approach for deep face recognition," in Proceedings of the 14th European Conference on Computer Vision (ECCV), pp. 499–515, Amsterdam, Netherlands, October 2016.

[26] K. Xu, J. Ba, R. Kiros et al., "Show, attend and tell: neural image caption generation with visual attention," in Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 2048–2057, Lille, France, July 2015.

[27] H. Choi, K. Cho, and Y. Bengio, "Fine-grained attention mechanism for neural machine translation," Neurocomputing, vol. 284, pp. 171–176, 2018.

[28] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in Proceedings of the European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, pp. 262–275, Marseille, France, October 2008.

[29] W. Li, R. Zhao, and X. Wang, "Human reidentification with transferred metric learning," in Proceedings of the Asian Conference on Computer Vision (ACCV), pp. 31–44, Daejeon, Korea, November 2012.

[30] L. Zheng, L. Shen, L. Tian et al., "Scalable person re-identification: a benchmark," in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1116–1124, Santiago, Chile, December 2015.

[31] W. Li, R. Zhao, T. Xiao et al., "DeepReID: deep filter pairing neural network for person re-identification," in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 152–159, Columbus, OH, USA, June 2014.

[32] L. Zhang, T. Xiang, and S. Gong, "Learning a discriminative null space for person re-identification," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1239–1248, Las Vegas, NV, USA, June 2016.

[33] L. Wu, C. Shen, and A. van den Hengel, "Deep linear discriminant analysis on Fisher networks: a hybrid architecture for person re-identification," Pattern Recognition, vol. 65, pp. 238–250, 2017.


[34] C. Su, J. Li, S. Zhang et al., "Pose-driven deep convolutional model for person re-identification," in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3980–3989, Venice, Italy, October 2017.

[35] Z. Zhao, B. Zhao, and F. Su, "Person re-identification via integrating patch-based metric learning and local salience learning," Pattern Recognition, vol. 75, pp. 90–98, 2018.

[36] L. Wu, C. Shen, and A. Hengel, "PersonNet: person re-identification with deep convolutional neural networks," 2016, http://arxiv.org/abs/1601.07255.

[37] J. Chen, Y. Wang, J. Qin et al., "Fast person re-identification via cross-camera semantic binary transformation," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5330–5339, Honolulu, HI, USA, July 2017.

[38] L. Wu, Y. Wang, Z. Ge et al., "Structured deep hashing with convolutional neural networks for fast person re-identification," Computer Vision and Image Understanding, vol. 167, no. 10, pp. 63–73, 2018.

[39] Z. Hu, R. Hu, C. Chen, Y. Yu et al., "Person reidentification via discrepancy matrix and matrix metric," IEEE Transactions on Cybernetics, vol. 48, no. 10, pp. 3006–3020, 2018.

[40] D. Jiang, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.
