
Automatic Vehicle Classification using Center Strengthened Convolutional Neural Network

Kuan-Chung Wang, Yoga Dwi Pranata, and Jia-Ching Wang

Department of Computer Science and Information Engineering, National Central University, Taiwan

Abstract—Vehicle classification is a major component of smart road management and traffic management systems, and the choice of algorithm has a significant impact on classification performance. In this paper, we propose a deep neural network named the center strengthened convolutional neural network (CS-CNN). It enhances features from the central part of the image and accepts input images of non-fixed size. The main hallmark of the proposed architecture is center enhancement, which extracts additional features from the central region of the image by ROI pooling. In addition, CS-CNN is based on the VGG network architecture and is combined with an ROI pooling layer to obtain detailed feature maps. We compare the proposed method with typical deep learning architectures such as VGG-s and VGG-Verydeep-16. In the experiments, CS-CNN achieves more than 97% vehicle classification accuracy with only a few training images from the Caltech-256 dataset.

Keywords: deep learning, convolutional neural network, ROI pooling, vehicle classification

I. INTRODUCTION

Nowadays, many motorists pay little attention to existing traffic signs and take shortcuts to reach their destinations quickly. This behavior can unintentionally harm both the drivers themselves and others, even though drivers already know the rules of the road. Transportation systems have become increasingly intelligent in response to the rapid growth of traffic demand in recent years. As a result, applying Intelligent Transportation System (ITS) technology has become a fundamental measure for making reasonable and efficient use of existing transportation infrastructure. Vehicle detection and classification technology is an important component of ITS, since it provides the initial traffic information that the rest of the system depends on.

To date, many approaches to vehicle classification have been proposed [1]–[5]. Most of them can be categorized as either sensor-based or visual-based. The sensor-based approach requires installing particular sensors in the road network. Such methods seem easy to implement, but factors such as high cost, limited system flexibility, and weather conditions must be considered. These methods use sensors installed in the road network, such as magnetic sensors [1], piezoelectric sensors [2], traffic inductive sensors [3], infrared transceivers, or other devices. They measure physical parameters of the vehicle, such as its width, height, and number of tires, and then use this information to classify the vehicle type directly.

The visual-based approach classifies a vehicle from its visual appearance. Its advantages are low cost and high classification accuracy. The visual information about the vehicle is represented in a form the computer can process, and the vehicle type is then obtained by running a classification algorithm. Gupte et al. [4] proposed vision-based vehicle classification using segmentation and blob tracking. Lai et al. [5] represent the vehicle by a rectangle in the image and estimate its dimensions. Ng and Tay [9] proposed a two-step vehicle classification consisting of inter-class and intra-class vehicle classification.

Some algorithms used in visual-based approaches have limitations. Methods based on the Canny edge detector cannot recognize vehicles at night or in the dark and have a long response time. Methods based on the Roberts and Prewitt operators perform poorly at recognizing moving objects.

In recent years, deep learning methods have achieved many successes in classification tasks such as speech and image recognition. In particular, methods based on convolutional neural networks achieve state-of-the-art performance on many image classification tasks. Since the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), many influential architectures have emerged [6]–[8]. Because deep learning is data-driven, it also makes problems easier to solve than designing an algorithm by hand. For these reasons, our proposal builds on deep learning, and we develop CS-CNN.

In this paper, we propose an end-to-end convolutional neural network based on VGGNet to classify vehicles. We use the VGG network architecture as the pre-trained model, combine it with ROI pooling from the SPP network [12], and develop the center strengthened network. Before the fully connected layers, we split the feature maps into two streams. The first stream extracts features from the whole image, and the second extracts features from the center of the image (centroid). Resizing images to a fixed size deforms objects, so our input is a two-dimensional RGB image of non-fixed size. We choose the vehicle classes of the Caltech-256 dataset as training and testing data and compare the results with other visual-based classification methods.

Figure 1. CS-CNN object classification architecture. Our system (1) takes an input image of non-fixed size, (2) computes the VGG feature representation, (3) computes full and center ROI features, and (4) classifies using the combination of the two ROI features.

Proceedings of APSIPA Annual Summit and Conference 2017, 12–15 December 2017, Malaysia
978-1-5386-1542-3 ©2017 APSIPA, APSIPA ASC 2017

The rest of this paper is organized as follows. Section II describes the methods. Section III presents and discusses the experiments on the six vehicle classes from the Caltech-256 dataset. Section IV concludes the paper.

II. METHODS

Recently, deep learning has become a popular method for many image processing tasks. Several papers have reported strong results when applying it to vehicle classification [9], [10].

Automatic vehicle classification is important for determining vehicle types quickly and accurately in intelligent transportation systems. Its purpose is to help the system analyze the type of vehicle from the input images. The proposed method uses a convolutional neural network to classify the vehicle type. It uses ROI pooling from the spatial pyramid pooling (SPP) network [12] to extract fixed-size features for regions of interest in the input images. Each step of the process is explained in detail in the following subsections.

A. The Proposed Architecture

Our CS-CNN is illustrated in Figure 1. In this work, we propose a robust CNN architecture that achieves strong classification results. Deep learning methods from the ILSVRC competition generally take fixed-size inputs obtained by cropping or resizing. Because this can badly deform the image, our work instead rescales input images by the same factor along both axes, preserving their aspect ratio. However, extreme sizes degrade the network. Small images yield feature maps that are too coarse for classification, whereas large images can run out of memory. We therefore resize the shorter side to 400 pixels and limit the longer side to at most 800 pixels. Given the excellent performance of VGGNet [8] in the ILSVRC challenge, we choose it as the base model of our architecture.
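The resizing rule above can be sketched in Python as follows. This is a minimal sketch, not the authors' code. The function name `resized_shape` is hypothetical. Rounding to the nearest pixel and applying the rule to small images (which are therefore upscaled) are assumptions. When the 800-pixel cap is active, the shorter side ends up below 400.

```python
def resized_shape(height, width, short_side=400, max_long_side=800):
    """Return (new_height, new_width) for an aspect-ratio-preserving resize.

    The shorter side is scaled to `short_side`. If that would push the longer
    side past `max_long_side`, the scale is reduced so the longer side equals
    `max_long_side` instead (the shorter side then ends up below `short_side`).
    """
    short, long_ = min(height, width), max(height, width)
    scale = short_side / short
    if long_ * scale > max_long_side:
        scale = max_long_side / long_
    # Rounding to the nearest pixel is an assumption; the paper does not say.
    return round(height * scale), round(width * scale)
```

For example, `resized_shape(300, 500)` gives `(400, 667)`. A wide 300 × 900 image is capped at `(267, 800)`, so both images keep their proportions.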

B. Center Strengthened RoI pooling

The center strengthened ROI is illustrated in Figure 2. After computing the feature maps through multiple convolution and pooling layers, we use ROI pooling from the SPP network [12] to obtain fixed-size feature maps. One stream pools an ROI covering the entire feature maps. A second stream pools an ROI that focuses on the central region of the same feature maps. We follow the ROI idea of SPPnet [12] and add it to our architecture. Unlike a normal pooling operator, ROI pooling performs dynamic max pooling over a × b output bins, so its output has a fixed size regardless of the input size. In our work, we choose 7×7 as the ROI output size. ROI pooling produces a fixed-size map that connects easily to the FC layers, and it is also fast and simple to compute. These characteristics make it popular in difficult tasks such as object bounding-box detection. Unlike SPPnet, we use only a single 7×7 level. SPPnet instead combines several small levels (4×4, 2×2, and 1×1), which may lose too much information.
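The dynamic max pooling described above can be sketched with NumPy. This is a minimal illustration, not the authors' implementation. The names `roi_max_pool` and `spp_pool`, the `(y0, x0, y1, x1)` ROI convention, and the floor/ceil bin boundaries are assumptions. The sketch also contrasts the single 7×7 level with an SPP-style 4×4 + 2×2 + 1×1 pyramid, which keeps 21 bins per channel instead of 49.

```python
import numpy as np


def _ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(features, roi, out_h=7, out_w=7):
    """Max-pool the region roi = (y0, x0, y1, x1) of a (C, H, W) feature map
    into a fixed (C, out_h, out_w) grid, whatever the region's size.

    Bin i along an axis of length n covers [floor(i*n/out), ceil((i+1)*n/out)),
    so neighboring bins may share a cell but no bin is ever empty.
    """
    y0, x0, y1, x1 = roi
    region = features[:, y0:y1, x0:x1]
    c, h, w = region.shape
    out = np.empty((c, out_h, out_w), dtype=features.dtype)
    for i in range(out_h):
        hs, he = (i * h) // out_h, _ceil_div((i + 1) * h, out_h)
        for j in range(out_w):
            ws, we = (j * w) // out_w, _ceil_div((j + 1) * w, out_w)
            # Dynamic max pooling: one maximum per channel per bin
            out[:, i, j] = region[:, hs:he, ws:we].max(axis=(1, 2))
    return out


def spp_pool(features, roi, levels=(4, 2, 1)):
    """SPP-style pyramid for comparison: pool at several grid sizes and
    concatenate the flattened results (C * 21 values for levels 4, 2, 1)."""
    parts = [roi_max_pool(features, roi, n, n).reshape(-1) for n in levels]
    return np.concatenate(parts)
```

For a 512-channel map, `roi_max_pool` yields 512 × 49 = 25,088 values, whereas `spp_pool` yields 512 × 21 = 10,752.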

Figure 2. Center Enhanced RoI pooling

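Figure 2 illustrates the proposed center-enhanced (CE) ROI. After the last convolutional layer of VGG-16, we crop the central region of the feature maps, because we observe that most objects carry meaningful information in the central part of the image. The cropped center-enhanced region spans from W/8 to 7W/8 in width and from H/8 to 7H/8 in height:

$$\left[\tfrac{1}{8}W,\ \tfrac{7}{8}W\right] \times \left[\tfrac{1}{8}H,\ \tfrac{7}{8}H\right] \qquad (1)$$

where W and H are the width and height of the input image. On the feature maps, the crop covers the corresponding central region.

ILSVRC: http://www.image-net.org/challenges/LSVRC/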

The two pooled features are concatenated and flattened into a one-dimensional vector, which becomes the input of the fully connected (FC) layers. In the last stage, three FC layers followed by a softmax perform multi-class classification over six classes (Fire truck, Motorbikes, School bus, Segway, Bike, and Car).
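A minimal NumPy sketch of the two-stream center strengthened feature follows. It is not the authors' code. The function names are hypothetical. Two choices are assumptions: rounding the crop boundaries outward and concatenating along the channel axis. The three FC layers and softmax are omitted, and a small fixed-grid max-pooling helper is included so the block runs on its own.

```python
import numpy as np


def _ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def _max_pool_grid(region, n=7):
    """Max-pool a (C, h, w) array into a fixed (C, n, n) grid of bins."""
    c, h, w = region.shape
    out = np.empty((c, n, n), dtype=region.dtype)
    for i in range(n):
        hs, he = (i * h) // n, _ceil_div((i + 1) * h, n)
        for j in range(n):
            ws, we = (j * w) // n, _ceil_div((j + 1) * w, n)
            out[:, i, j] = region[:, hs:he, ws:we].max(axis=(1, 2))
    return out


def center_box(h, w):
    """Central region [h/8, 7h/8) x [w/8, 7w/8) of an h x w feature map as
    (y0, x0, y1, x1). Rounding outward (floor start, ceil end) is an assumption."""
    return h // 8, w // 8, _ceil_div(7 * h, 8), _ceil_div(7 * w, 8)


def center_strengthened_feature(conv_maps, n=7):
    """Pool the whole map and its central crop to n x n each, stack them along
    the channel axis, and flatten into one vector for the FC layers."""
    c, h, w = conv_maps.shape
    y0, x0, y1, x1 = center_box(h, w)
    full = _max_pool_grid(conv_maps, n)                         # full-image stream
    center = _max_pool_grid(conv_maps[:, y0:y1, x0:x1], n)      # center stream
    return np.concatenate([full, center], axis=0).reshape(-1)
```

For a 400 × 600 input, the last convolutional layer of VGG-16 (stride 16, 512 channels) gives roughly a 512 × 25 × 37 map. `center_box(25, 37)` returns `(3, 4, 22, 33)`, and the concatenated vector has 2 × 512 × 7 × 7 = 50,176 entries.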

C. Testing Step

The testing process uses the weights and biases obtained from the training process. It has two steps. First, the trained model is applied to the test images; second, the classification accuracy is calculated. The process is similar to training, except that no backpropagation follows the feedforward pass. The results of this process are the classification accuracy, the images that failed to be classified, and their image numbers, together with the network outputs produced by the feedforward pass.

Using the trained weights and biases, the feedforward pass generates the output layer, whose units correspond to the class labels. Comparing these outputs with the ground-truth labels identifies which images were classified successfully and which failed.
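The forward-only evaluation described above can be sketched as follows. This is an illustrative NumPy version, not the authors' code. It assumes the network outputs for the test images have already been computed as a score matrix, and the name `evaluate` is hypothetical. It returns the accuracy in percent, the indices of the misclassified images, and the softmax probabilities.

```python
import numpy as np


def evaluate(logits, labels):
    """Forward-only evaluation on precomputed network outputs (no backprop).

    logits: (N, num_classes) scores from the feedforward pass
    labels: (N,) ground-truth class indices
    Returns (accuracy in percent, indices of failed images, probabilities).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # Numerically stable softmax over the class scores
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    preds = probs.argmax(axis=1)
    failed = np.flatnonzero(preds != labels)  # image numbers that failed
    total = len(labels)
    accuracy = 100.0 * (total - len(failed)) / total
    return accuracy, failed, probs
```

Suppose four test images are scored and only the third is misclassified. In that case `evaluate` reports 75.0% accuracy and failed indices `[2]`.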

III. EXPERIMENT AND DISCUSSION

A. Experiment on Caltech-256

Our CNN training procedure follows [8] (learning on ILSVRC-2012) and uses gradient descent with momentum. The parameter settings are:

- momentum 0.9;
- weight decay 1×10⁻⁴;
- initial learning rate 5×10⁻⁴, decreased to 5×10⁻⁵ after 20 epochs;
- training batch size 10.

Some modified layers are initialized from a normal distribution with zero mean and standard deviation 1×10⁻².
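The optimizer settings above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' training code. The function names are hypothetical, and epochs are counted from 0, so epochs 0 to 19 use 5×10⁻⁴. The update follows one common (Caffe-style) momentum formulation with weight decay added to the gradient. Other frameworks arrange the same terms slightly differently.

```python
import numpy as np


def learning_rate(epoch, base_lr=5e-4, decayed_lr=5e-5, decay_epoch=20):
    """Step schedule: base_lr for the first decay_epoch epochs (0-indexed), then decayed_lr."""
    return base_lr if epoch < decay_epoch else decayed_lr


def init_modified_layer(shape, std=1e-2, seed=0):
    """Draw initial weights from a normal distribution N(0, std**2)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=std, size=shape)


def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, weight_decay=1e-4):
    """One SGD-with-momentum update; L2 weight decay is added to the gradient.

    velocity <- momentum * velocity - lr * (grad + weight_decay * w)
    w        <- w + velocity
    """
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity
```

In a 50-epoch run, a training loop would call `learning_rate(epoch)` once per epoch. It would then apply `sgd_momentum_step` to every parameter after each mini-batch of 10 images.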

We evaluate CS-CNN on six classes of the Caltech-256 dataset and compare its performance with other methods [8], [13], [14]. Our task focuses on six classes: bicycle, school bus, car, motorbike, segway, and fire truck, with 1,422 images in total. The classes contain different numbers of images, and the images have different sizes. In this work, we run experiments with three different numbers of training images per class (10, 20, and 30). Our testing speed is about 13 images per second on one NVIDIA GTX 1080 GPU.
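The per-class sampling of training images can be sketched as below. This is a hypothetical protocol, not one stated in the paper. The paper gives the per-class training sizes but not how the images are drawn or whether all remaining images are used for testing. Random sampling with a fixed seed, testing on the remainder, and the name `split_per_class` are therefore all assumptions.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train, seed=0):
    """Return (train_indices, test_indices): n_train random images per class
    for training and all remaining images of that class for testing."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        idxs = list(by_class[label])
        rng.shuffle(idxs)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)
```

Calling it with `n_train` set to 10, 20, and 30 produces the three training-set sizes used in the experiments.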

Figure 3. Examples from the vehicle subset of the Caltech-256 dataset.

B. Performance

The proposed network is trained for 50 epochs on the Caltech-256 data. We use the weights from the last training epoch for testing and classification, because the training curves show convergence. Figures 4 and 5 show the objective and accuracy at each training epoch on the Caltech-256 data.

Figure 4. Objective on training data over 50 epochs. Left: 10 images per class; middle: 20 images per class; right: 30 images per class.

Figure 5. Accuracy on training data over 50 epochs. Left: 10 images per class; middle: 20 images per class; right: 30 images per class.

The classification accuracy of each architecture is calculated as

$$y = \frac{T - E}{T} \times 100\%$$

where y is the accuracy, E is the number of test images that failed to be classified, and T is the total number of test images. Table I shows the classification results of each architecture on the Caltech-256 data. The proposed architecture achieves the highest accuracy in every setting: 93.75% with 10 training images per class, 96.93% with 20, and 97.75% with 30.

TABLE I. CLASSIFICATION RESULT (TEST ACCURACY BY NUMBER OF TRAINING IMAGES PER CLASS)

Method             10 images   20 images   30 images
VGG-s              88.74%      92.70%      96.38%
VGG-Verydeep-16    91.18%      94.55%      90.18%
CS-CNN             93.75%      96.93%      97.75%

Besides the above experiments, we also show that deep learning is a better feature extraction method by comparing it with the non-deep-learning method CE-SPM [14], which was tested on the same dataset. As Figure 6 shows, the deep learning methods achieve recognition rates more than 10% higher than CE-SPM.

Figure 6. Testing results of the proposed method with different numbers of training images, compared with VGG-s [13], VGG-Verydeep-16 [8], and CE-SPM [14].

IV. CONCLUSIONS

In this work, we have presented CS-CNN, an end-to-end deep convolutional neural network architecture combined with a center strengthened method. The center strengthened method enhances the central-region features of the feature maps from the last convolutional layer. In addition, by incorporating ROI pooling, CS-CNN accepts input of non-fixed size, which avoids object deformation in the images. In the experiments, our method outperforms VGG-s, VGG-Verydeep-16, and CE-SPM. It also achieves over 97% testing accuracy with only a few training images from the Caltech-256 dataset.

REFERENCES

[1] Y. He, Y. Du, and L. Sun, "Vehicle classification method based on single-point magnetic sensor," in International Conference on Traffic and Transportation Studies, Changsha, 2012.

[2] S. A. Rajab, A. S. Othman, and H. H. Refai, "Novel vehicle and motorcycle classification using single element piezoelectric sensor," in Proceedings of the IEEE Conference on Intelligent Transportation Systems (ITSC), 2012, pp. 496-501.

[3] J. J. Lamas-Seco, P. M. Castro, A. Dapena, and F. J. Vazquez-Araujo, "Vehicle classification using the discrete Fourier transform with traffic inductive sensors," Sensors (open access), 2015.

[4] S. Gupte, O. Masoud, and N. P. Papanikolopoulos, "Vision-based vehicle classification," in IEEE Intelligent Transportation Systems Conference Proceedings, Dearborn, MI, USA, 2000.

[5] A. H. S. Lai, G. S. K. Fung, and N. H. C. Yung, "Vehicle type classification from visual-based dimension estimation," in IEEE Intelligent Transportation Systems Conference Proceedings, Oakland, CA, USA, 2001.

[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012.

[7] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[8] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.

[9] J. Y. Ng and Y. H. Tay, "Image-based vehicle classification system," in 11th Asia-Pacific ITS Forum & Exhibition, 2011.

[10] A. Dehghan, S. Z. Masood, G. Shu, and E. G. Ortiz, "View independent vehicle, make, model, and color recognition using convolutional neural network," arXiv, 2017.

[11] Y. Gao and H. J. Lee, "Local tiled deep networks for recognition of vehicle make and model," Sensors, vol. 16, 226, 2016, doi:10.3390/s16020226.

[12] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," arXiv:1406.4729v4, 2015.

[13] K. Chatfield et al., "Return of the devil in the details: Delving deep into convolutional nets," arXiv:1405.3531, 2014.

[14] A. Santoso et al., "Kernel sparse representation classifier with center enhanced SPM for vehicle classification," in 2015 IEEE 39th Annual Computer Software and Applications Conference (COMPSAC), vol. 2, IEEE, 2015.

[Figure 6 plot: recognition rate (%) versus number of training images N; series: CE-SPM + K-SRC, CE-SPM + Libsvm, VGG-s, VGG-Verydeep-16, CS-CNN]


