
Research Article

An Intelligent Ship Image/Video Detection and Classification Method with Improved Regressive Deep Convolutional Neural Network

Zhijian Huang,1,2 Bowen Sui,1 Jiayi Wen,1 and Guohe Jiang1

1Lab of Intelligent Control and Computation, Shanghai Maritime University, Shanghai 201306, China
2Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA

Correspondence should be addressed to Zhijian Huang (zjhuang@shmtu.edu.cn) and Guohe Jiang (ghjiang@shmtu.edu.cn)

Received 22 December 2019; Revised 6 March 2020; Accepted 12 March 2020; Published 9 April 2020

Academic Editor: Átila Bueno

Copyright © 2020 Zhijian Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The shipping industry is developing towards intelligence rapidly. An accurate and fast method for ship image/video detection and classification is of great significance for not only the port management, but also the safe driving of the Unmanned Surface Vehicle (USV). Thus, this paper makes a self-built dataset for the ship image/video detection and classification, and its method based on an improved regressive deep convolutional neural network is presented. This method promotes the regressive convolutional neural network from four aspects. First, the feature extraction layer is lightweighted by referring to YOLOv2. Second, a new feature pyramid network layer is designed by improving its structure in YOLOv3. Third, a proper frame and scale suitable for ships are designed with a clustering algorithm, reducing the number of anchors by 60%. Last, the activation function is verified and optimized. Then, the detecting experiment on 7 types of ships shows that the proposed method has an advantage compared with the YOLO series networks and other intelligent methods. This method can solve the problem of low recognition rate and real-time performance for ship image/video detection and classification with a small dataset. On the testing-set, the final mAP is 0.9209, the Recall is 0.9818, the AIOU is 0.7991, and the FPS is 78–80 in video detection. Thus, this method provides a highly accurate and real-time ship detection method for the intelligent port management and visual processing of the USV. In addition, the proposed regressive deep convolutional network also has a better comprehensive performance than that of YOLOv2/v3.

1. Introduction

In the age of artificial intelligence, the shipping industry is developing towards intelligence rapidly. The ship image/video detection and classification with the help of computer vision have been applied in the port supervision service and Unmanned Surface Vehicle (USV) technology. An accurate and rapid detection method is of great significance to not only the port management, but also the safe operation of the USV.

The traditional methods of ship detection and classification fall into the following two categories: (1) the methods based on the structure and shape characteristics of ships. In 2012, Fefilatyev et al. presented a novel algorithm for the open sea.

A ship detection precision of 88% is achieved on a large dataset collected from a prototype system [1]. In 2013, Chen et al. improved an RCS density-coding method when acquiring ship features and completed the ship identification task with a high-resolution Synthetic Aperture Radar (SAR) dataset [2]. The accuracy of this method reached 91.54%. In 2016, Yüksel et al. extracted ship features from the contour image of a 3D ship model, as well as from optical images, for ship recognition [3]. Also in 2016, Li et al. proposed a novel method for inshore ship detection via ship head classification and body boundary determination [4]. In 2017, Zhang et al. developed a new ship target-detection algorithm for visual maritime surveillance. The three main steps, including the horizon detection,


background modeling, and background subtraction, are all based on the discrete cosine transform [5]. (2) The methods based on a threshold. It is usually very practical to detect ships directly with the threshold method. In 1996, Eldhuset proposed a method based on a local threshold, which takes the ship out of the background and uses a filtering-window method in detection [6]. In 1999, Zhou et al. designed a global threshold algorithm, which can complete the adaptive calculation and ship detection using the statistical characteristics of the dataset images, that is, the adaptive threshold method [7]. In 2013, Rey used statistical data to derive features when calculating the overall threshold value of ship images, which is a method based on the probability density function to detect ships on water [8]. In 2018, Li and Li proposed a method based on high and low thresholds to detect ship edge features and achieved a high accuracy of ship edge detection [9].

Although the above studies have achieved good results, the traditional methods are mostly based on the ship structure and shape for manual feature design. Even if the best nonlinear classifier is used to classify these manually designed features, the accuracy of ship detection cannot meet the practical needs. Therefore, these methods cannot achieve good results in the case of complex backgrounds and small hull differences in a real environment, and the recognition rate of multiple-ship classification is also not ideal.

Fortunately, after a development of more than ten years, target detection based on the deep Convolutional Neural Network (CNN) has made great progress in applications to human faces, pedestrians, and other scenes. The CNN was first proposed by Professor Yann LeCun. The depth and width of the CNN have been continuously increased, and its accuracy for image recognition has also continuously improved. The commonly used CNNs include the LeNet-5 [10], AlexNet [11], VGG [12], GoogLeNet [13], ResNet [14], and DenseNet [15]. At the same time, there are some researches on the application of the deep CNN for ship recognition and detection. The deep convolutional networks for target detection can be divided into two categories: (1) the region-based methods, such as the R-CNN [16], Fast-RCNN [17], and Faster-RCNN [18]; (2) the regression-based methods, such as the SSD [19], YOLO [20], YOLOv2 [21], and YOLOv3 [22]. The regression-based deep convolutional network uses the CNN as a regressor: it returns the position information of the target in the image through an end-to-end training and gets the final bounding box and classification results.

In 2017, Kang et al. presented a contextual region-based CNN with multilayer fusion for SAR ship detection [23]. In 2018, Wang et al. proposed a ship detection algorithm combining the CFAR and CNN. This algorithm is more accurate and faster on remote-sensing ocean satellite images with complex distributions [24]. In 2018, Li et al. developed the HSF-Net. This net performs multiscale deep feature embedding for ship detection in optical remote-sensing imagery [25]. Also in 2018, Yang et al. proposed an automatic ship detection of remote-sensing images from Google Earth based on multiscale rotation dense feature pyramid networks [26]. In 2019, Gao et al. applied the Faster R-CNN to detect ships without the need for land masking by incorporating a large number of images containing only terrestrial regions as negative samples without any manual marking [27]. Also in 2019, Lin et al. proposed a squeeze-and-excitation rank Faster R-CNN for ship detection in SAR images, which shows a much better detection effect and speed than the traditional state-of-the-art methods [28].

The above detection methods, which are mainly based on remote-sensing or radar images, hardly meet the real-time requirement due to the timeliness of image acquisition. Thus, in 2016, Zhao et al. proposed a real-time algorithm based on the deep CNN, combined with the HOG and HSV algorithms, to achieve a good ship identification effect [29]. In 2017, Yang et al. used the Faster R-CNN to achieve the video detection of river vessels [30]. In 2018, Shao et al. built a new large-scale dataset of ships, which is designed for training and evaluating ship object detection algorithms. The dataset currently consists of 31,455 images and covers six common ship types [31]. In 2019, Shao et al. proposed to use visual images captured by an on-land surveillance camera network to achieve real-time detection based on a saliency-aware CNN framework [32].

However, with the increasing accuracy and real-time requirements of ship detection and classification in practical applications, it is necessary to propose a ship image/video detection and classification method based on an improved regressive deep convolutional network. Thus, this paper makes a self-built dataset for 7 kinds of ship image/video detection and classification, and its method based on an improved regressive deep CNN is presented. This method promotes the regressive CNN from four aspects. First, the feature extraction layer is lightweighted by referring to YOLOv2. Second, a new Feature Pyramid Network (FPN) layer is designed by improving its network structure in YOLOv3. Third, a proper frame and scale suitable for the ships are designed with the clustering algorithm, reducing the number of anchors by 60%. Last, the optimal activation function is verified and optimized. Then, this method can solve the problem of low recognition rate and real-time performance for ship image/video detection and classification through an end-to-end training. The experiment on 7 types of ships shows that the proposed method is better in ship image/video detection and classification compared with the YOLO series networks and other intelligent methods. On the testing-set, the final mAP is 0.9209, the Recall is 0.9818, the AIOU is 0.7991, and the FPS is 78–80 in video detection, which takes into account both the accuracy and real-time performance for the ship detection. Thus, this method provides a highly accurate and real-time ship detection method for the intelligent port management and visual processing of the USV. In addition, this paper also proposes a regressive deep convolutional network with a better comprehensive performance than YOLOv2 and YOLOv3.


2. The Regressive Deep Convolutional Neural Network (RDCNN)

The basic structure of the regressive deep CNN mainly consists of the input layer, convolution layer, pooling layer, fully connected layer, and output layer.

2.1. The Input Layer. The function of the input layer is to receive the input image and store it in matrix form. Assuming that the regressive deep CNN has a structure of $L$ layers, $x^l$ represents the feature of layer $l$, $l = 1, 2, \ldots, L$. Here, $x^l$ is composed of multiple feature maps, which can be represented as $x^l = \{x^l_1, \ldots, x^l_j\}$, where $j$ is the number of feature maps in layer $l$. Thus, the corresponding feature of a color input image can be represented as $x^1 = \{x^1_1, x^1_2, x^1_3\}$, where $x^1_1$, $x^1_2$, and $x^1_3$ represent the data of the red, green, and blue channels, respectively.
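As a concrete illustration of this matrix representation, here is a minimal NumPy/OpenCV sketch; the file name, the resize to the network input size, and the [0, 1] normalization are illustrative assumptions, not details given by the paper:

```python
import cv2
import numpy as np

# Read one color ship image; OpenCV returns an H x W x 3 uint8 array (BGR order).
img = cv2.imread("ship.jpg")
img = cv2.resize(img, (416, 416))        # the network input size used in the paper
x1 = img.astype(np.float32) / 255.0      # illustrative normalization to [0, 1]
x1 = np.transpose(x1, (2, 0, 1))         # 3 x 416 x 416, i.e., x1 = {x1_1, x1_2, x1_3}
blue, green, red = x1[0], x1[1], x1[2]   # the three channel feature maps
```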

2.2. The Convolutional Layer. The function of the convolution layer is to extract features through the convolution operation. With a proper design, the feature expression ability of the regressive deep CNN is strengthened as convolution layers are added. The feature map of convolution layer $l$ can be calculated as

$$x^l_j = f\Biggl(\sum_{i=1}^{j-1} G^l_{ij}\bigl(k^l_{ij} \otimes x^{l-1}_i\bigr) + b^l_j\Biggr), \qquad (1)$$

where $k^l_{ij}$ and $b^l_j$ are the weights of the convolution kernel and the biases of the convolution layer, respectively; $G^l_{ij}$ is the connection matrix between convolution layer $l$ and the feature maps of the previous layer $l-1$; the symbol $\otimes$ represents the convolution operation; and $f(x)$ is the activation function. When $G^l_{ij}$ is 1, $x^{l-1}_i$ is associated with $x^l_j$; when $G^l_{ij}$ is 0, they are uncorrelated.
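A minimal NumPy transcription of equation (1) for one output feature map is sketched below. This is a direct, unoptimized valid-mode correlation (the operation CNN frameworks call "convolution"); the Leaky-ReLU default and the equal kernel sizes are our assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def conv2d_valid(x, k):
    """Direct 2-D valid-mode correlation of one input map x with kernel k."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out

def conv_layer_map(x_prev, G_j, K_j, b_j, f=lambda z: np.maximum(0.1 * z, z)):
    """Equation (1) for one output map j:
    x_j^l = f( sum_i G_ij * (k_ij (x) x_i^{l-1}) + b_j^l ).
    x_prev: list of same-sized maps from layer l-1; G_j[i] in {0, 1} is the
    connectivity; K_j[i] are equal-sized kernels; f defaults to Leaky-ReLU."""
    acc = sum(G_j[i] * conv2d_valid(x_prev[i], K_j[i]) for i in range(len(x_prev)))
    return f(acc + b_j)
```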

2.3. The Pooling Layer. The function of the pooling layer is to reduce the feature dimension. The pooling layer is generally located behind the convolutional layer, and the pooling operation can maintain a certain spatial invariance. The feature map $x^l_j$ of the pooling operation in layer $l$ can be calculated as

$$x^l_j = p\bigl(x^{l-1}_j\bigr), \qquad (2)$$

where $p(x)$ represents the pooling operation.

2.4. The Fully Connected Layer. The function of the fully connected layer is to transform the deep features obtained in the front layers into a feature vector. Thus, this layer is usually set behind the feature extraction layers. The feature vector $x^l$ in the fully connected layer can be calculated as

$$x^l = f\bigl(w^l x^{l-1} + b^l\bigr), \qquad (3)$$

where $w^l$ is the connecting weight between two adjacent network layers, $b^l$ is the offset, and $f(x)$ is the activation function.

2.5. The Loss Function. The regressive deep CNN obtains the predicted value through a forward propagation. Then, the error between the predicted value and the real value is usually calculated with the following cross-entropy loss function:

$$\mathrm{Loss} = -\frac{1}{n}\sum_{x}\bigl[y \ln \tilde{y} + (1 - y)\ln(1 - \tilde{y})\bigr], \qquad (4)$$

where $x$ are the input samples, $y$ is the predicted output, $\tilde{y}$ is the actual output, and $n$ represents the total number of input samples in one batch.
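A direct NumPy transcription of equation (4), assuming $y$ and $\tilde{y}$ are arrays of per-sample values in (0, 1) for one batch (the clipping epsilon is our own numerical safeguard):

```python
import numpy as np

def cross_entropy_loss(y, y_hat, eps=1e-12):
    """Equation (4): Loss = -(1/n) * sum_x [ y ln(y~) + (1 - y) ln(1 - y~) ].
    y, y_hat: arrays of length n (one batch); eps keeps the logarithms finite."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```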

2.6. The Network Performance Indexes. For the regressive deep CNN, the IOU represents the overlap rate between the detection window ($B_{dt}$) generated by the network model and the actually marked ground-truth window ($B_{gt}$), that is, the ratio of their intersection and union areas, where $\mathrm{area}(\cdot)$ denotes the area:

$$\mathrm{IOU} = \frac{\mathrm{area}\bigl(B_{dt} \cap B_{gt}\bigr)}{\mathrm{area}\bigl(B_{dt} \cup B_{gt}\bigr)}. \qquad (5)$$

For the experiments of this paper, a detection result with $\mathrm{IOU} \ge 0.5$ is counted as a true positive sample, and a detection result with $\mathrm{IOU} < 0.5$ is counted as a false negative sample.

As there are many kinds of targets detected in this paper, the AIOU (the average value of the IOU) is used, that is, the average ratio between the intersection and union areas of the predicted and actual bounding boxes on the testing-set:

$$\mathrm{AIOU} = \frac{1}{n}\sum_{i=0}^{n-1} \mathrm{IOU}_i, \qquad (6)$$

where $n$ represents the number of detected targets.

The Recall (R) rate represents the percentage of the positive samples that are correctly predicted:

$$\mathrm{Recall} = \frac{t_p}{t_p + f_n}, \qquad (7)$$

where $t_p$ represents a true positive sample and $f_n$ represents a false negative sample.

The Precision (P) indicates how many samples of the positive predictions are truly positive samples:

$$\mathrm{Precision} = \frac{t_p}{t_p + f_p}, \qquad (8)$$

where $f_p$ represents a false positive sample.

The AP is an index used to measure the network identification accuracy, which is generally represented by the area enclosed by the Recall and Precision curves. Assuming that the curve of the precision rate against the recall rate is $P(R)$, then

$$\mathrm{AP} = \int_0^1 P(R)\,\mathrm{d}R. \qquad (9)$$


As there are 7 kinds of targets detected in this paper, the mAP is used to represent the network identification accuracy, that is, the average value of the AP over all categories:

$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N}\int_0^1 P_i(R)\,\mathrm{d}R, \qquad (10)$$

where $N$ represents the number of the predicted categories, that is, 7.

In addition, in order to measure the network speed for video detection, the frames per second (FPS) is also used as a performance index.
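The indexes of this section can be sketched as follows. The corner-form $(x_1, y_1, x_2, y_2)$ boxes and the trapezoidal integration over sampled precision-recall points are our assumptions; the paper does not state its integration scheme:

```python
import numpy as np

def iou(box_a, box_b):
    """Equation (5): intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def aiou(pred_boxes, gt_boxes):
    """Equation (6): mean IOU over n matched prediction/ground-truth pairs."""
    return float(np.mean([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]))

def ap_from_pr(recalls, precisions):
    """Equation (9): area under the precision-recall curve, approximated by
    the trapezoid rule over sampled (recall, precision) points."""
    order = np.argsort(recalls)
    return np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order])

def map_score(ap_per_class):
    """Equation (10): mean AP over the N = 7 ship categories."""
    return float(np.mean(ap_per_class))
```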

3. The Improved RDCNN Based on YOLOv2/v3

This research presents an improved RDCNN based mainly on the YOLO series, which also draws on the advantages of the currently popular regressive deep convolutional networks. By promoting the feature extraction layer of YOLOv2 and the FPN of YOLOv3, the improved network overcomes the detection shortcomings of YOLOv2 and the training and recognition speed shortcomings of YOLOv3. The improved network also redesigns the anchors with the clustering algorithm and optimizes the choice of the activation function, both according to the ship image/video detection and classification task. Finally, this algorithm achieves a good accuracy and real-time performance in the ship image/video detection and classification.

The improved network structure built in this research is shown in Figure 1. This network structure mainly consists of three parts: the feature extraction layer, the FPN layer, and the prediction layer, which are specifically described below.

3.1. The Lightweighted Feature Extraction Layer. The feature extraction layer is very important in building the network structure. If the feature extraction layer is too large, it may extract better deep features, but it will also slow down the speed of the whole network. For example, in YOLOv3, the Darknet-53 is used as the feature extraction layer. This extraction layer is relatively slow in training and detection speed due to its deep layer count. In order to equip the presented network with a lightweight feature extraction layer, this network first adopts the Darknet-19 feature extraction layer of YOLOv2, whose structure is shown on the left of Figure 1. This feature extraction layer has the advantage of relatively few network layers and a faster calculation speed, and it can still extract deep features well when inputting a color ship image or video frame of 416 × 416 × 3 size.

In addition, with the increase of the feature extraction layer count, the network generally obtains deeper features with more expressive power. However, simply increasing the number of network layers will result in gradient dispersion or explosion phenomena. In order to solve this problem, in the later experiment, a batch normalization strategy is added between the convolution (Conv2d) and activation (Leaky-ReLU) of each convolution operation in the Darknet-19 feature extraction layer, which is shown in Figure 2. This strategy can effectively control the gradient problem caused by the network deepening.
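As an illustration of this building block, a PyTorch-style sketch is given below. The paper does not specify an implementation framework, and the channel counts here are illustrative assumptions:

```python
import torch.nn as nn

def darknet_conv(in_ch, out_ch, kernel_size=3):
    """One convolutional unit as in Figure 2: Conv2d -> BatchNorm2d -> Leaky-ReLU.
    The convolution bias is dropped because batch normalization supplies a shift."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size,
                  stride=1, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

# For example, the first units of a Darknet-19-like stack on a 416 x 416 x 3 input:
stem = nn.Sequential(darknet_conv(3, 32), nn.MaxPool2d(2, 2), darknet_conv(32, 64))
```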

3.2. The New FPN Layer with a Clustering Algorithm. For the feature extraction, the feature information in a shallow layer is relatively limited, but its localization is accurate. This is an advantage for predicting small objects. On the contrary, the feature information in a deep layer is rich, but its localization is relatively rough. This is suitable for predicting large objects.

Thus, in order to make the network obtain a better detection result, the improved network promotes the multiscale prediction idea of YOLOv3 to design a new FPN layer, which is shown on the right of Figure 1. This method upsamples the deep feature map to a 26 × 26 size after predicting on a deep feature map of 13 × 13 size from the feature extraction layer, and then merges the upsampled 26 × 26 feature map with the shallow 26 × 26 feature map. Finally, the network can detect and forecast the input image at two scales.

In addition, to get a better network structure, the clustering algorithm is also used, and the anchors are fine-tuned and optimized on the collected data for the ship image/video detection and classification. Finally, the obtained anchor values are shown in Table 1: the network predicts on the feature maps of the 13 × 13 and 26 × 26 scales, setting 5 different anchor frames on each scale.

Therefore, for a 416 × 416 image, the improved network predicts a total of 4,225 fixed prediction frames, compared with YOLOv3, which has 9 anchor frames on 3 scales and 10,647 fixed prediction frames in total. Obviously, the number of anchor frames in the improved network is reduced by 6,422, that is, about 60%.
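The paper does not spell out the clustering details. A common choice for anchor design, introduced with YOLOv2, is k-means over the labeled box sizes with the distance d(box, centroid) = 1 − IOU(box, centroid); below is a minimal sketch under that assumption, with k = 5 as in Table 1. The centroid (w, h) values would come out in grid-cell units, which appears consistent with the magnitudes in Table 1:

```python
import numpy as np

def wh_iou(wh, centroids):
    """IOU between one (w, h) box and each centroid, with all boxes
    aligned at a common corner (only sizes matter for anchor design)."""
    inter = np.minimum(wh[0], centroids[:, 0]) * np.minimum(wh[1], centroids[:, 1])
    union = wh[0] * wh[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes_wh, k=5, iters=100, seed=0):
    """K-means over labeled (w, h) pairs with distance d = 1 - IOU."""
    rng = np.random.default_rng(seed)
    centroids = boxes_wh[rng.choice(len(boxes_wh), size=k, replace=False)]
    for _ in range(iters):
        # Maximizing IOU is equivalent to minimizing d = 1 - IOU.
        assign = np.array([np.argmax(wh_iou(wh, centroids)) for wh in boxes_wh])
        new = np.array([boxes_wh[assign == i].mean(axis=0)
                        if np.any(assign == i) else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```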

3.3. The Prediction Layer. Through the prediction on the convolution layer, the spatial information can be well preserved. For the improved network, the prediction method of YOLOv2 is adopted in the prediction layer. Each predicting frame predicts 7 ship categories and 5 frame parameters $(t_x, t_y, t_w, t_h, t_o)$, of which the first four parameters are the detecting object coordinates and $t_o$ is the predicting confidence. In this paper, the loss function of YOLOv2 is also used in the prediction layer.
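Since the prediction method of YOLOv2 is adopted, the predicted tuple presumably decodes into a bounding box in the standard YOLOv2 way; here is a sketch under that assumption for one grid cell and one anchor:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_prediction(t, cell_xy, anchor_wh, grid_size=13):
    """YOLOv2-style decoding of one frame prediction t = (tx, ty, tw, th, to)
    for the cell offset (cx, cy) and anchor (pw, ph), in units of grid cells;
    grid_size is 13 or 26 for the two scales of this network."""
    tx, ty, tw, th, to = t
    cx, cy = cell_xy
    pw, ph = anchor_wh
    bx = (sigmoid(tx) + cx) / grid_size   # box center, normalized to [0, 1]
    by = (sigmoid(ty) + cy) / grid_size
    bw = pw * np.exp(tw) / grid_size      # box size, normalized to [0, 1]
    bh = ph * np.exp(th) / grid_size
    return (bx, by, bw, bh), sigmoid(to)  # box and confidence
```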

3.4. The Optimization of the Activation Function for the Improved RDCNN. In order to optimize the influence of the activation function on the network structure proposed in this paper, the ELU and Leaky-ReLU activation functions of equations (11) and (12) are also tested in addition to the commonly used ReLU:

$$\mathrm{ELU}(x) = \begin{cases} e^x - 1, & \text{if } x \le 0, \\ x, & \text{if } x > 0, \end{cases} \qquad (11)$$

$$\text{Leaky-ReLU}(x) = \begin{cases} 0.1x, & \text{if } x \le 0, \\ x, & \text{if } x > 0. \end{cases} \qquad (12)$$
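All three functions are one-liners in NumPy; the sketch below follows equations (11) and (12) directly, with the ReLU included for completeness:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def elu(x):
    # Equation (11): e^x - 1 for x <= 0, x itself for x > 0.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

def leaky_relu(x):
    # Equation (12): 0.1x for x <= 0, x itself for x > 0.
    return np.where(x > 0, x, 0.1 * x)
```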


Through the experimental comparison, the activation function with the best ship image/video detection and classification effect can be selected. The results obtained on the testing-set are shown in Table 2.

In the experiment, the Leaky-ReLU activation function has the best comprehensive detection effect, at no higher operation cost than the ReLU and ELU activation functions. Thus, the Leaky-ReLU is selected as the optimized activation function.

4. The Making of the Ship Dataset and Experimental Environment

4.1. The Making of the Ship Dataset. At present, the popular target-detection datasets are VOC and COCO, but these datasets classify ships as only one kind. In a specific application, it is often necessary to classify ships more precisely. Therefore, in this research, the dataset of ship images is built after collecting and labeling by ourselves.

The main way to collect the ship images is the Internet. As the images are found on the Internet, their pixel resolutions are different, and the sizes of the images are also different, such as 500 × 400 × 3 and 500 × 318 × 3. The images containing the ships are cut roughly according to a length-to-width ratio of 1:1. The proportion of the ship to the whole image also differs from image to image, even very much, which can be seen from Figures 3–5 of the database images or the detected images. These naturally produced images of different specifications and quality are more conducive to the training effect and generalization ability. Before training, they were all resized to 416 × 416 × 3 images.

After the dataset is collected, it needs to be labeled before being used as the network input. The labeling tool used in this paper is LabelImg. In LabelImg, the target object can be selected in the image with a rectangle box and saved with a label. Then, a file with the suffix .xml is obtained. This file contains the path, name, and resolution of the original image, as well as the coordinates and name information of the target object in the image.
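LabelImg writes one Pascal VOC-style XML file per image; a minimal parsing sketch is given below. The tag names follow the VOC convention used by LabelImg, and the helper itself is ours:

```python
import xml.etree.ElementTree as ET

def parse_labelimg_xml(path):
    """Read one LabelImg annotation: image size plus (name, box) per object."""
    root = ET.parse(path).getroot()
    size = root.find("size")
    w = int(size.find("width").text)
    h = int(size.find("height").text)
    objects = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        coords = tuple(int(box.find(t).text)
                       for t in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return (w, h), objects
```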

There are many types of ships in real applications. In order to facilitate the research and save costs, this paper only collects 7 representative types of ships: the sailing ship, container ship, yacht, cruise ship, ferry, coast guard ship, and fishing boat. After filtering and classification, the final dataset size is 4200 manually selected images, which includes 600 images in each category. In each category, 480 images are randomly selected as the training-set, and the remaining 120 images are set as the testing-set. In this way, the total size of the training-set is 3360 images, and the total size of the testing-set is 840 images. The typical images of each category in the dataset are shown in Figure 3.
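The 480/120 per-category split can be reproduced with a simple random partition; here is a sketch assuming the images are grouped into one folder per class (the folder layout, file extension, and seed are illustrative assumptions):

```python
import random
from pathlib import Path

def split_dataset(root, n_train=480, seed=42):
    """Randomly pick n_train images per class for training; the rest are testing."""
    rng = random.Random(seed)
    train, test = [], []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = sorted(class_dir.glob("*.jpg"))
        rng.shuffle(images)
        train += images[:n_train]   # 480 per class -> 3360 in total
        test += images[n_train:]    # 120 per class -> 840 in total
    return train, test
```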

4.2. The Experimental Environment Configuration. The experimental environment of this research is configured as follows. The CPU: an Intel i7-7700 with 4.2 GHz main frequency; the memory: 16 GB; the GPU: two Nvidia GTX 1080 Ti; the operating system: Ubuntu 16.04.

Figure 1: The network structure of the improved RDCNN by promoting YOLOv2 and v3, consisting of the feature extraction layers, the FPN layer, and the prediction layer (convolution, max pooling, reorg, and upsampling operations; feature map sizes from 416 × 416 down to 13 × 13, with prediction at the 13 × 13 and 26 × 26 scales).

Figure 2: The addition of batch normalization between the convolution (Conv2d) and activation (Leaky-ReLU) operations in the Darknet-19 and the other convolution layers.

Table 1: The anchor values of the different scales proper for the ship image/video detection and classification.

ID | 13 × 13 | 26 × 26
1 | (1.34221, 1.52115) | (1.34221, 1.52115)
2 | (3.42217, 4.30406) | (3.42217, 4.30406)
3 | (5.05587, 8.09892) | (5.05587, 8.09892)
4 | (9.76561, 4.30316) | (9.76561, 4.30316)
5 | (11.52521, 1.11835) | (11.52521, 1.11835)

Table 2: The experimental effects of the different activation functions.

Activation function | AIOU | mAP | Recall
ReLU | 0.7247 | 0.7932 | 0.9492
ELU | 0.7306 | 0.8005 | 0.9559
Leaky-ReLU | 0.7329 | 0.7987 | 0.9616


In order to make full use of the GPU to accelerate the network training, the CUDA 9.0 and its matching cuDNN are installed in the system. In addition, the OpenCV 3.4 is also installed in the environment to display the results of the network detection and classification.

During the experiment, the setting of the experimental parameters is very important. There are many parameters to be set in our improved RDCNN and YOLOv2/v3, such as the batch number, downsampling size, momentum parameter, and learning rate. The setting of these parameters will affect not only the normal operation of the network but also the training effect. For example, when the batch number is set too large, the network will not run if the memory of the workstation is not big enough.

Considering the conditions of our experimental environment, and also for comparison convenience, the same parameters are set for the improved RDCNN and YOLOv2/v3. The network parameters are set as follows: the small-batch number is 64 and is divided into 8 sub-batches, the iteration number is 8000, the momentum parameter is 0.9, the weight attenuation is 0.0005, and the learning rate is 0.001, as shown in Table 3.

5. The Training and Detection Based on YOLOv2/v3

5.1. The Iterative Convergence Training. Generally, whether a network meets the training requirements is judged by the convergence of the loss function. In this experiment, due to the small size of the dataset and the sufficient computing ability, the convergence of YOLOv2 is achieved with only 8000 iterations, which takes only about 1 hour and 40 minutes. The Loss and AIOU curves of the feedforward training process are shown in Figures 6 and 7, respectively. It can be seen from Figures 6 and 7 that the training has converged steadily when the number of network training iterations reaches 8000.

The training time of YOLOv3 is relatively long, and it takes about 3 hours and 40 minutes for the 8000-iteration convergence process. The Loss and AIOU curves of the feedforward training process are shown in Figures 8 and 9, respectively. It can also be seen from Figures 8 and 9 that, after 8000 iterations of training, the Loss and AIOU of the network have converged steadily.

Finally, the weight parameters obtained through the 8000 network iterations in the feedforward training are saved in the experiment.

5.2. The Detection Performance Testing. After the network training is stable, it is necessary to verify its detection effect on the testing-set, especially to rule out a decline of the detection effect caused by overfitting. First, the network indexes obtained with the weights of the 8000th iteration on the testing-set are taken as the evaluation criteria. The specific values are shown in Table 4.

Figure 3: The dataset image demonstration: the coast guard ship, container ship, cruise ship, ferry, fishing boat, sailboat, and yacht.

Figure 4: The representative detection results of the improved RDCNN network method.

Figure 5: The comparison of detection effect among the methods of YOLOv2 (a), YOLOv3 (b), and the improved RDCNN (c).


As the network cannot measure its weight parameters in real time on the training-set during its feedforward running, the network parameters generated in the training iterations Nos. 400, 600, 800, 1000, 2000, 3000, 4000, 5000, 6000, 7000, and 8000 are also loaded into the network for a later test and verification. In order to better analyze the detection effect of YOLOv2/v3 in this task, the AIOU and mAP parameters of the networks are compared over the different testing iterations on the testing-set, which are shown in Figures 10 and 11.

From the AIOU and mAP curves on the testing-set, it can be seen that the performance indexes of the networks on the testing-set have become stable. There is also no overfitting phenomenon caused by too many training iterations. Through the comparison, we can see that YOLOv3, as an improved version of YOLOv2, has advantages in the AIOU and mAP performance indexes. That is, its AIOU is 0.0057 higher and its mAP is 0.0115 higher than those of YOLOv2. However, as the advantages of YOLOv3 are obtained by deepening and enlarging its network structure, its detection speed is 49 FPS lower than that of YOLOv2.

6. The Experiment and Analysis of the Improved RDCNN

6.1. The Network Performance Experiment. The improved RDCNN takes 20 minutes more than YOLOv2 to complete the 8000-iteration convergence process. However, its training time is much lower than that of YOLOv3. The Loss and AIOU curves of the feedforward training process are shown in Figures 12 and 13, respectively. It can also be seen from Figures 12 and 13 that, after 8000 iterations of training, the Loss and AIOU of the improved network have converged steadily.

In order to verify the detection effect of the RDCNN on the testing-set, the network weight parameters generated in the training iterations Nos. 400, 600, 800, 1000, 2000, 3000, 4000, 5000, 6000, 7000, and 8000 are loaded into the improved network for a later test and verification. Then, the AIOU and mAP parameter curves are tested over the different testing iterations on the testing-set, which are shown in Figures 14 and 15. This paper applies the two editions of the YOLO networks, as well as the presented improved RDCNN based on YOLO, to the ship image/video detection and classification.

Table 3: The same training parameters for our improved RDCNN and YOLOv2/v3.

Batch number | Down sampling size | Image size | Momentum parameter | Weight attenuation | Learning rate | Iteration number
64 | 8 | 416 × 416 × 3 | 0.9 | 0.0005 | 0.001 | 8000

Figure 6: The iterative convergence process of the Loss curve for the YOLOv2 network training (Loss versus iteration, 0–8000).

Figure 7: The iterative convergence process of the AIOU curve for the YOLOv2 network training (AIOU versus iteration, 0–8000).

Figure 8: The iterative convergence process of the Loss curve for the YOLOv3 network training (Loss versus iteration, 0–8000).

Figure 9: The iterative convergence process of the AIOU curve for the YOLOv3 network training (AIOU versus iteration, 0–8000).

Table 4: The comparison of performance indexes between YOLOv2/v3 after the 8000th iteration.

Network | AIOU | mAP | Recall | FPS | Training time
YOLOv2 | 0.7838 | 0.9165 | 0.9425 | 94–95 | 1 h 40 min
YOLOv3 | 0.7895 | 0.9280 | 0.9674 | 45–46 | 3 h 40 min


Thus, the comparing performance of the AIOU and mAP for YOLOv2/v3 and the improved RDCNN network structure is also shown in Figures 14 and 15. The comparison of the evaluation indexes of each network is also shown in Figure 16.

According to the comparisons, it can be seen that the improved RDCNN network has surpassed YOLOv2 and YOLOv3 in the AIOU index of positioning accuracy. That is, its AIOU is 0.0153 higher than that of YOLOv2 and 0.0096 higher than that of YOLOv3, respectively. In addition, the improved network is 0.0044 higher than YOLOv2 in the mAP index. Due to the simplified network structure, the mAP index of the improved network is 0.0071 lower than that of YOLOv3, but its detecting FPS index is 33 higher than that of YOLOv3. Therefore, it can be concluded that the overall effect of the improved network is better than that of YOLOv2/v3 on the collected dataset of this experiment.

Therefore, the experimental results show that the improved RDCNN network structure designed in this paper surpasses the two YOLO networks in the three evaluation indexes.

6.2. The Effect Demonstration of the Improved Network. For the testing-set, the representative detection results of the improved RDCNN network are shown in Figure 4. In order to achieve a better network effect, the weight parameters of the feature extraction layer extracted in the ImageNet [33] pretraining are loaded to train the improved RDCNN of this paper.

Figure 10: The comparison of the AIOU index between YOLOv2/v3 on the testing-set.

Figure 11: The comparison of the mAP index between YOLOv2/v3 on the testing-set.

Figure 12: The iterative convergence process of the Loss curve for the improved RDCNN training (Loss versus iteration, 0–8000).

Figure 13: The iterative convergence process of the AIOU curve for the improved RDCNN training (AIOU versus iteration, 0–8000).

Figure 14: The comparison of the AIOU among YOLOv2/v3 and the improved RDCNN on the testing-set.


Through the test on the testing-set and the video, the final results are shown in Table 5.

It can be seen that the mAP index of the improved RDCNN is slightly lower than that of YOLOv3 when using the pretraining weights. However, the other indicators are all better than those of YOLOv3, especially the video detection speed in FPS.

In order to better display the comparison of the network effects, the YOLOv2/v3 and the improved RDCNN are used to detect an image with multiple fishing boats. The representative results of the detection effect of the three networks are shown in Figure 5. The improved network of this paper accurately detects more ships. Obviously, the presented network in this paper has achieved a better result, which fully proves the effectiveness of the improved RDCNN network.

6.3. Comparison with Other Intelligent Detection and Classification Methods. The proposed method is also compared with other intelligent methods, such as the Fast R-CNN, Faster R-CNN, and SSD, or compared with YOLOv2, under different dataset images and hardware configurations. The work in the earlier published IEEE Transactions paper [32] is very similar to this paper, so its experiment results can be used for the comparison. The comparing results are shown in Table 6.

Figure 15: The comparison of the mAP among YOLOv2/v3 and the improved RDCNN on the testing-set.

Figure 16: The comparison of the network performance indexes (AIOU, mAP, and Recall) among YOLOv2/v3 and the improved RDCNN: the method in this paper scores 0.7329, 0.7987, and 0.9616; YOLOv3 scores 0.7092, 0.7922, and 0.9156; YOLOv2 scores 0.7087, 0.7525, and 0.9425.

Table 5: The loading test results of the pretraining weights for the improved RDCNN.

Network | AIOU | mAP | Recall | FPS | Training time
Proposed method | 0.7991 | 0.9209 | 0.9818 | 78–80 | 2 h 0 min


The proposed method has an advantage over the other intelligent methods in precision and speed, that is, in mAP and FPS, and it can also satisfy the detection and classification requirements in the video scene. However, our dataset size is smaller than that of Shao's work, and our hardware configuration is also weaker than that of Shao's work.

Table 6: The comparison with other intelligent detection and classification methods.

Methods | Ship category | Dataset image | Hardware configuration | IOU threshold | mAP | FPS
Fast R-CNN (VGG) | 6 | 11,126 | Four Titan Xp | >0.5 | 0.710 | 0.5
Faster R-CNN (ZF) | 6 | 11,126 | Four Titan Xp | >0.5 | 0.892 | 15
Faster R-CNN (VGG) | 6 | 11,126 | Four Titan Xp | >0.5 | 0.901 | 6
SSD | 6 | 11,126 | Four Titan Xp | >0.5 | 0.794 | 7
YOLOv2 | 6 | 11,126 | Four Titan Xp | >0.5 | 0.830 | 83
Shao's method | 6 | 11,126 | Four Titan Xp | >0.5 | 0.874 | 49
Proposed method | 7 | 4,200 | Two GTX 1080 Ti | >0.5 | 0.9209 | 78–80

7. Discussion and Conclusions

In this paper, the improved RDCNN network is presented to achieve the ship image/video detection and classification task. This network does not need to extract features manually; it improves the regressive CNN from four aspects based on the advantages of the currently popular regressive deep convolutional networks, especially YOLOv2/v3. Thus, this network only needs the dataset of the ship images and a successful training.

This paper makes a self-built dataset for the ship image/video detection and classification, and the method based on an improved regressive deep CNN is researched. The feature extraction layer is lightweighted. A new FPN layer is redesigned. A proper anchor frame and size suitable for the ships are redesigned, which reduces the number of anchors by 60% compared with YOLOv3. The activation function is also optimized with the Leaky-ReLU. After a successful training, the method can complete the ship image detection task and can also be applied to the video detection. After 8000 iterations of training, the Loss and AIOU of the improved RDCNN network have converged steadily.

The experiment on 7 types of ships shows that the proposed method is better in the ship image/video detection and classification compared with the YOLO series networks. The improved RDCNN network has surpassed YOLOv2/v3 in the AIOU index of positioning accuracy. That is, its AIOU is 0.0153 higher than that of YOLOv2 and 0.0096 higher than that of YOLOv3, respectively. In addition, the improved network is 0.0044 higher than YOLOv2 in the mAP index. Due to the simplified network structure, the mAP index of the improved network is 0.0071 lower than that of YOLOv3, but its detecting FPS index is 33 higher than that of YOLOv3. Therefore, it can be concluded that the overall effect of the improved network is better than that of YOLOv2/v3 on the collected dataset of this experiment.

Then, this method can solve the problem of low recognition rate and real-time performance for the ship image/video detection and classification. Thus, this method provides a highly accurate and real-time ship detection method for the intelligent port management and visual processing of the USV. In addition, the proposed regressive deep convolutional network also has a better comprehensive performance than YOLOv2/v3.

The proposed method is also compared with the Fast R-CNN, Faster R-CNN, SSD, YOLOv2, etc., under different datasets and hardware configurations. The results show that the method has an advantage in precision and speed, and it can also satisfy the video scene. However, our dataset size is smaller. Thus, the detection on a much larger dataset can be the future work.

Data Availability

The [SELF-BUILT SHIP DATASET and SIMULATION] data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the NSFC Projects of China under Grant Nos. 61403250, 51779136, and 51509151.

References

[1] S. Fefilatyev, D. Goldgof, M. Shreve, and C. Lembke, "Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system," Ocean Engineering, vol. 54, no. 1, pp. 1–12, 2012.

[2] W. T. Chen, K. F. Ji, X. W. Xing, H. X. Zou, and H. Sun, "Ship recognition in high resolution SAR imagery based on feature selection," in Proceedings of the IEEE International Conference on Computer Vision in Remote Sensing, pp. 301–305, Xiamen, China, December 2013.

[3] G. K. Yüksel, B. Yalıtuna, F. Tartar, O. F. C. Adlı, K. Eker, and O. Yoruk, "Ship recognition and classification using silhouettes extracted from optical images," in Proceedings of the IEEE Signal Processing and Communication Application Conference, pp. 1617–1620, Zonguldak, Turkey, May 2016.

[4] S. Li, Z. Zhou, B. Wang, and F. Wu, "A novel inshore ship detection via ship head classification and body boundary determination," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 12, pp. 1920–1924, 2016.

[5] Y. Zhang, Q.-Z. Li, and F.-N. Zang, "Ship detection for visual maritime surveillance from non-stationary platforms," Ocean Engineering, vol. 141, pp. 53–63, 2017.

[6] K. Eldhuset, "An automatic ship and ship wake detection system for spaceborne SAR images in coastal regions," IEEE Transactions on Geoscience and Remote Sensing, vol. 34, no. 4, pp. 1010–1019, 1996.


[7] H. J. Zhou, X. Y. Li, X. H. Peng, Z. Z. Wang, and T. Bo, "Detect ship targets from satellite SAR imagery," Journal of National University of Defense Technology, vol. 21, no. 1, pp. 67–70, 1999.

[8] M. T. Rey, A. Drosopoulos, and D. Petrovic, "A search procedure for ships in RADARSAT imagery," Defence Research Establishment Ottawa, Ottawa, ON, Canada, Report No. 1305, 2013.

[9] X. Li and S. Li, "The ship edge feature detection based on high and low threshold for remote sensing image," in Proceedings of the 6th International Conference on Computer-Aided Design, Manufacturing, Modeling and Simulation, Busan, South Korea, May 2018.

[10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the International Conference on Neural Information Processing Systems, pp. 1097–1105, Curran Associates Inc., Lake Tahoe, NV, USA, 2012.

[12] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2015, https://arxiv.org/abs/1409.1556.

[13] C. Szegedy, W. Liu, Y. Jia et al., "Going deeper with convolutions," pp. 1–9, 2014, https://arxiv.org/abs/1409.4842.

[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, IEEE Computer Society, Las Vegas, NV, USA, June 2016.

[15] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," pp. 2261–2269, 2017, https://arxiv.org/abs/1608.06993.

[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," pp. 580–587, 2013, https://arxiv.org/abs/1311.2524.

[17] R. Girshick, "Fast R-CNN," 2015, https://arxiv.org/abs/1504.08083.

[18] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.

[19] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, "SSD: single shot MultiBox detector," pp. 21–37, 2016, https://arxiv.org/abs/1512.02325.

[20] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, IEEE, Las Vegas, NV, USA, June 2016.

[21] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525, IEEE, Honolulu, HI, USA, July 2017.

[22] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.

[23] M. Kang, K. Ji, X. Leng, and Z. Lin, "Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection," Remote Sensing, vol. 9, no. 860, pp. 1–14, 2017.

[24] R. Wang, J. Li, Y. Duan, H. Cao, and Y. Zhao, "Study on the combined application of CFAR and deep learning in ship detection," Journal of the Indian Society of Remote Sensing, vol. 4, pp. 1–9, 2018.

[25] Q. Li, L. Mou, Q. Liu, Y. Wang, and X. X. Zhu, "HSF-net: multiscale deep feature embedding for ship detection in optical remote sensing imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 12, pp. 7147–7161, 2018.

[26] X. Yang, H. Sun, K. Fu et al., "Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks," Remote Sensing, vol. 10, no. 1, pp. 132–145, 2018.

[27] L. Gao, Y. He, X. Sun, X. Jia, and B. Zhang, "Incorporating negative sample training for ship detection based on deep learning," Sensors, vol. 19, no. 684, pp. 1–20, 2019.

[28] Z. Lin, K. Ji, X. Leng, and G. Kuang, "Squeeze and excitation rank faster R-CNN for ship detection in SAR images," IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 5, pp. 751–755, 2019.

[29] L. Zhao, X. F. Wang, and Y. T. Yuan, "Research on ship recognition method based on deep convolutional neural network," Ship Science and Technology, vol. 38, no. 8, pp. 119–123, 2016.

[30] M. Yang, Y. D. Ruan, L. K. Chen, P. Zhang, and Q. M. Chen, "New video recognition algorithms for inland river ships based on faster R-CNN," Journal of Beijing University of Posts and Telecommunications, vol. 40, no. S1, pp. 130–134, 2017.

[31] Z. Shao, W. Wu, Z. Wang, W. Du, and C. Li, "SeaShips: a large-scale precisely annotated dataset for ship detection," IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2593–2604, 2018.

[32] Z. Shao, L. Wang, Z. Wang, W. Du, and W. Wu, "Saliency-aware convolution neural network for ship detection in surveillance video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 3, pp. 781–794, 2019.

[33] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.


Page 2: An Intelligent Ship Image/Video Detection and Classification …downloads.hindawi.com/journals/complexity/2020/1520872.pdf · 2020. 4. 9. · rough the experimental comparison, the

background modeling and background subtraction are allbased on the discrete cosine transform [5] (2) e methodbased on threshold It is usually very practical to detect shipsdirectly with the threshold method In 1996 Eldhusetproposed a method based on the local threshold which takesthe ship out of the background and uses filtering windowmethod in detection [6] In 1999 Zhou et al designed aglobal threshold algorithm which can complete the adaptivecalculation and ship detection using the statistical charac-teristics of dataset images that is the adaptive thresholdmethod [7] In 2013 Rey used statistical data to solve featurewhen calculating the overall threshold value of ship imageswhich is a method based on the probability density functionto detect ships on water [8] In 2018 Li and Li proposed amethod based on the high and low thresholds to detect shipedge feature and achieved a high accuracy of ship edgedetection [9]

Although the above studies have achieved good resultsthe traditional methods are mostly based on the shipstructure and shape for manual feature design Even if thebest nonlinear classifier is used to classify these manuallydesigned features the accuracy of ship detection cannotmeet the practical needs erefore these methods cannotachieve good results in the case of complex backgroundand small hull differences in a real environment and therecognition rate of multiple-ship classification is also notideal

Fortunately after a development of more than tenyears the target detection based on the deep ConvolutionalNeural Network (CNN) has made a great progress in theapplication of human face pedestrian and other scenese CNN was first proposed by professor LeCun fromToronto university in Canada e depth and width of theCNN have been continuously increased and its accuracyfor image recognition has also been continuously in-creased e commonly used CNN includes the Lenet-5[10] AlexNet [11] VGG [12] GoogLenet [13] ResNet [14]and DenseNet [15] At the same time there are some re-searches in the application of the deep CNN for shiprecognition and detectione deep convolutional networkfor target detection can be divided into two categories (1)the region-based methods such as the RndashCNN [16] Fast-RCNN [17] and Faster-RCNN [18] (2) the regression-based methods such as the SSD [19] YOLO [20] YOLOv2[21] and YOLOv3 [22] e regression-based deep con-volutional network uses the CNN as a regression andreturns the position information of the target in the imagethrough an end-to-end training and gets the final boundingbox and classification results

In 2017 Kang et al presented a contextual region-basedCNN with multilayer fusion for SAR ship detection [23] In2018 Wang et al proposed a ship detection algorithmcombining the CFAR and CNN is algorithm is moreaccurate and faster in the remote-sensing ocean satellite-image with complex distribution [24] In 2018 Li et aldeveloped a HSF-Net is net finds the multiscale deepfeature embedding for ship detection in optical remote-sensing imagery [25] Also in 2018 Yang et al proposed an

automatic ship detection of remote-sensing images fromGoogle Earth based on multiscale rotation dense featurepyramid networks [26] In 2019 Gao et al applied theFaster R-CNN to detect ships without the need for landmasking by incorporating a large number of images con-taining only terrestrial regions as negative samples withoutany manual marking [27] Also in 2019 Lin et al proposeda squeeze and excitation rank Faster R-CNN for ship de-tection in SAR images which shows a much better de-tection effect and speed than the traditional state-of-the-artmethods [28]

e above detection methods which are mainly based onremote sensing or radar images hardly meet real-time re-quirement due to timeliness of image acquisition us in2016 Zhao et al proposed a real-time algorithm based on thedeep CNN and combined with the HOG and HSV algo-rithms to achieve a good ship identification effect [29] In2017 Yang et al used the Faster R-CNN to achieve the videodetection of river vessels [30] In 2018 Shao et al built a newlarge-scale dataset of ships which is designed for trainingand evaluating ship object detection algorithms e datasetcurrently consists of 31455 images and covers six commonship types [31] In 2019 Shao et al proposed to use visualimages captured by an on-land surveillance camera networkto achieve real-time detection based on a saliency-awareCNN framework [32]

However with the improvement of the accuracy andreal-time requirements of ship detection and classificationin the practical application it is necessary to propose a shipimagevideo detection and classification method based onan improved regressive deep convolution network usthis paper makes a self-built dataset for 7 kinds of shipimagevideo detection and classification and its methodbased on an improved regressive deep CNN is presentedis method promotes the regressive CNN from four as-pects First the feature extraction layer is lightweighted byreferring to YOLOv2 Second a new Feature PyramidNetwork (FPN) layer is designed by improving its networkstructure in YOLOv3 ird a proper frame and scalesuitable for the ships are designed with the clustering al-gorithm to reduce 60 anchors Last the optimal activa-tion function is verified and optimized en this methodcan solve the problem of low recognition rate and real-timeperformance for ship imagevideo detection and classifi-cation through an end-to-end training e experiment on7 types of ships shows that the proposed method is better inship imagevideo detection and classification comparedwith the YOLO series network and other intelligentmethods On the testing-set the final mAP is 09209 theRecall is 09818 the AIOU is 07991 and the FPS is 78ndash80 invideo detection which takes into account both the accuracyand real-time performance for the ship detectionus thismethod provides a highly accurate and real-time shipdetection method for the intelligent port management andvisual processing of the USV In addition this paper alsoproposes a regressive deep convolutional network with abetter comprehensive performance than YOLOv2 andYOLOv3

2 Complexity

2 The Regressive Deep Convolutional NeuralNetwork (RDCNN)

e basic structure of the regressive deep CNN is mainlyconsisted of the input layer convolution layer pooling layerfull-connection layer and output layer

21 e Input Layer e function of the input layer is toreceive input image and store it in matrix form Assumingthat the regressive deep CNN has a structure of L layer thenxl represents the feature of No l layer l 1 2 L In it xl

is composed of multiple feature graphs which can berepresented as xl xl

1 xlj1113966 1113967 j is the number of the

feature graphs in l layerus the corresponding feature of acolor input image can be represented as x1 x1

1 x12 x1

31113864 1113865where x1

1 x12 and x1

3 represents the data of red green andblue channels respectively

22 e Convolutional Layer e function of the convo-lution layer is to extract features through convolution op-eration With a proper design the feature expression abilityof the regressive deep CNN will be strengthened with theincreasing of convolution layers e feature graph of No l

convolution layer can be calculated as

xlj f 1113944

jminus1

i1G

lij k

lij otimesx

lminus1i1113872 1113873 + b

lj

⎛⎝ ⎞⎠ (1)

where klij and bl

jare the weights of the convolution kerneland biases of the convolution layer respectively Gl

ij is theconnection matrix between No l convolution layer and thefeature graph of the previous l minus 1 convolution layer thesymbol otimes represents the convolution operation and f(x) isthe activation functionWhen Gl

ij is 1 xlminus1i is associated with

xlj when Gl

ij is 0 they are no correlations

23ePooling Layer e function of the pooling layer is toreduce the feature dimension e pooling layer is generallylocated behind the convolutional layer and the poolingoperation can maintain a certain spatial invariance efeature graph xl

j of the pooling operation in the l layer can becalculated as

xlj p x

lminus1j1113872 1113873 (2)

where p(x) represents the pooling operation

2.4. The Fully Connected Layer. The function of the fully connected layer is to transform the deep features obtained in the front layers into a feature vector. Thus, this layer is usually set behind the feature extraction layer. The feature vector $x^l$ in the fully connected layer can be calculated as

$$x^l = f\left(w^l x^{l-1} + b^l\right), \qquad (3)$$

where $w^l$ is the connecting weight between two adjacent network layers, $b^l$ is the offset, and $f(x)$ is the activation function.

2.5. The Loss Function. The regressive deep CNN obtains the predicted value through a forward propagation. Then the error between the predicted value and the real value is usually calculated with the following cross-entropy loss function:

$$\mathrm{Loss} = -\frac{1}{n}\sum_{x}\left[y\ln\tilde{y} + (1-y)\ln(1-\tilde{y})\right], \qquad (4)$$

where $x$ are the input samples, $y$ is the predicted output, $\tilde{y}$ is the actual output, and $n$ represents the total number of input samples in one batch.
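For illustration, a minimal NumPy sketch of equation (4) follows (the eps clipping is an added numerical guard against log(0), not part of the formula):

```python
import numpy as np

def cross_entropy_loss(y_pred, y_true):
    """Equation (4): Loss = -(1/n) * sum[y*ln(y~) + (1-y)*ln(1-y~)]
    over one batch; y_pred and y_true are arrays of values in (0, 1)."""
    eps = 1e-12  # numerical guard against log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```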

2.6. The Network Performance Index. For the regressive deep CNN, the IOU represents the overlap rate between the detection window $B_{dt}$ generated by the network model and the actually marked (ground-truth) window $B_{gt}$, that is, the ratio of their intersection and union areas, where $\mathrm{area}(\cdot)$ means the area:

$$\mathrm{IOU} = \frac{\mathrm{area}\left(B_{dt} \cap B_{gt}\right)}{\mathrm{area}\left(B_{dt} \cup B_{gt}\right)}. \qquad (5)$$

For the experiments of this paper, a detection result with $\mathrm{IOU} \ge 0.5$ is counted as a true positive sample, and a detection result with $\mathrm{IOU} < 0.5$ is counted as a false-negative sample.
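Equation (5) translates directly into a few lines of code; a sketch for boxes in corner coordinates (the (x1, y1, x2, y2) convention is an assumption):

```python
def iou(box_a, box_b):
    """Equation (5): IOU of two boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as a true positive when iou(...) >= 0.5.
```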

As there are many kinds of targets detected in this paper, the AIOU (the average value of IOU) is used, that is, the average ratio between the intersection and union areas of the predicted and actual boundary boxes on the testing-set:

$$\mathrm{AIOU} = \frac{1}{n}\sum_{i=0}^{n-1}\mathrm{IOU}_i, \qquad (6)$$

where $n$ represents the number of detected targets.

The Recall (R) rate represents the percentage of actual positive samples that are correctly predicted:

$$\mathrm{Recall} = \frac{t_p}{t_p + f_n}, \qquad (7)$$

where $t_p$ represents a true positive sample and $f_n$ represents a false-negative sample.

The Precision (P) indicates how many of the samples predicted as positive are truly positive:

$$\mathrm{Precision} = \frac{t_p}{t_p + f_p}, \qquad (8)$$

where $f_p$ represents a false positive sample.

The AP is an index used to measure the network identification accuracy, which is generally represented by the area enclosed by the recall-precision curve. Assuming that the curve of the recall rate and precision rate is $P(r)$, then

$$\mathrm{AP} = \int_0^1 P(r)\,\mathrm{d}r. \qquad (9)$$


As there are 7 kinds of targets detected in this paper, the mAP, that is, the average value of AP over all categories, is used to represent the network identification accuracy:

$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N}\int_0^1 P_i(r)\,\mathrm{d}r, \qquad (10)$$

where $N$ represents the number of predicted categories, that is, 7.
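Equations (9) and (10) can be evaluated numerically; a sketch assuming the PR curve is sampled as paired recall/precision arrays (trapezoidal integration is one common approximation, not necessarily the paper's):

```python
import numpy as np

def average_precision(recalls, precisions):
    """Equation (9): AP as the area under the precision-recall curve,
    approximated by trapezoidal integration over sorted recall values."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order],
                          np.asarray(recalls)[order]))

def mean_average_precision(ap_per_class):
    """Equation (10): mAP as the mean AP over the N categories (N = 7 here)."""
    return float(np.mean(ap_per_class))
```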

In addition, in order to measure the network speed for video detection, the frames per second (FPS) is also used as a performance index.

3. The Improved RDCNN Based on YOLOv2/v3

This research presents an improved RDCNN mainly based on the YOLO series, which also refers to the advantages of the current popular regressive deep convolution networks. By promoting the feature extraction layer of YOLOv2 and the FPN of YOLOv3, the improved network overcomes the detection shortcomings of YOLOv2 and the training and recognition speed shortcomings of YOLOv3. The improved network also redesigns the anchors with the clustering algorithm and optimizes the choice of the activation function, both according to the ship image/video detection and classification task. Finally, this algorithm achieves a good accuracy and real-time performance in the ship image/video detection and classification.

The improved network structure built in this research is shown in Figure 1. This network structure mainly consists of three parts, the feature extraction layer, FPN layer, and prediction layer, which are specifically described below.

3.1. The Lightweighted Feature Extraction Layer. The feature extraction layer is very important in building the network structure. If the feature extraction layer is too large, it may extract better deep features, but it will also slow down the speed of the whole network. For example, YOLOv3 uses Darknet-53 as the feature extraction layer; this extraction layer is relatively slow in training and detection speed due to its depth. In order to equip the presented network with a lightweight feature extraction layer, this network adopts the Darknet-19 feature extraction layer of YOLOv2, whose structure is shown on the left of Figure 1. This feature extraction layer has the advantage of relatively few network layers and faster calculation speed and can still extract deep features well when inputting a color ship image or video frame of 416×416×3 size.

In addition, with the increase of the feature extraction layer numbers, the network generally can obtain deeper features with more expressive power. However, simply increasing the number of network layers will result in gradient dispersion or explosion phenomena. In order to solve this problem, in the later experiment, a batch normalization strategy is added between the convolution (Conv2d) and activation (Leaky-Relu) of each convolution operation in the Darknet-19 feature extraction layer, which is shown in Figure 2. This strategy can effectively control the gradient problem caused by the network deepening.
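A minimal PyTorch sketch of one such convolution block, with batch normalization inserted between Conv2d and Leaky-Relu as in Figure 2 (the helper name and channel arguments are illustrative, not the paper's code):

```python
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, kernel_size=3):
    """One Darknet-19-style convolution block: Conv2d -> BatchNorm -> Leaky-ReLU.
    Batch normalization sits between the convolution and the activation."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )
```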

3.2. The New FPN Layer with a Clustering Algorithm. In the feature extraction, the feature information in a shallow layer is relatively limited, but its localization is accurate, which is advantageous for predicting small objects. On the contrary, the feature information in a deep layer is rich, but its localization is relatively rough, which is suitable for predicting large objects.

Thus, in order to make the network obtain a better detection result, the improved network promotes the multiscale prediction idea of YOLOv3 to design a new FPN layer, which is shown on the right of Figure 1. This method upsamples the deep feature map to 26×26 size after predicting on a deep feature map of 13×13 size from the feature extraction layer and then merges the upsampled 26×26 feature map with the shallow 26×26 feature map. Finally, the network can detect and forecast the input image at two scales.
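A sketch of this two-scale merge in PyTorch (the paper does not spell out the merge operator; concatenation along the channel dimension is assumed here):

```python
import torch
import torch.nn.functional as F

def two_scale_merge(deep_13, shallow_26):
    """Upsample the 13x13 deep feature map to 26x26 and merge it with the
    shallow 26x26 map to form the second prediction scale of the new FPN layer."""
    up_26 = F.interpolate(deep_13, scale_factor=2, mode="nearest")  # 13x13 -> 26x26
    return torch.cat([up_26, shallow_26], dim=1)  # merged 26x26 map
```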

In addition, to get a better network structure, the clustering algorithm is also used, and its effect is fine-tuned and optimized on the collected data for the ship image/video detection and classification. The obtained anchor values are shown in Table 1; the network predicts on the feature maps of the 13×13 and 26×26 scales, setting 5 different anchor frames on each scale.
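The paper names only "the clustering algorithm"; a common choice for YOLO-style anchor design is k-means with a 1 − IOU distance on the labeled box sizes, sketched below under that assumption:

```python
import numpy as np

def kmeans_anchors(wh, k=5, iters=100):
    """Cluster labeled box sizes (width, height) into k anchors with k-means
    under the 1 - IOU distance common in YOLO anchor design (an assumption;
    the paper only says 'the clustering algorithm'). wh: array of shape (n, 2)."""
    def iou_wh(boxes, centers):
        # IOU of boxes and centers aligned at a common corner.
        inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
                np.minimum(boxes[:, None, 1], centers[None, :, 1])
        union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
                centers[:, 0] * centers[:, 1] - inter
        return inter / union

    centers = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, centers), axis=1)  # min(1-IOU) = max IOU
        centers = np.array([wh[assign == i].mean(axis=0)
                            if np.any(assign == i) else centers[i]
                            for i in range(k)])
    return centers
```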

Therefore, for a 416×416 size image, the improved network predicts a total of 4225 fixed prediction frames, compared with YOLOv3, which has 9 anchor frames on 3 scales and 10647 fixed prediction frames in total. Obviously, the number of prediction frames in the improved network is reduced by 6422, that is, about 60%.
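The arithmetic behind these counts can be verified directly:

```python
# Prediction frames of the improved network: 5 anchors on each of 2 scales.
ours = (13 * 13 + 26 * 26) * 5               # 845 + 3380 = 4225
# YOLOv3: 3 anchors on each of 3 scales.
yolov3 = (13 * 13 + 26 * 26 + 52 * 52) * 3   # 3549 * 3 = 10647
saved = yolov3 - ours                        # 6422
print(saved, saved / yolov3)                 # 6422, ~0.603 -> about 60%
```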

3.3. The Prediction Layer. Through prediction on the convolution layer, the spatial information can be well preserved. For the improved network, the prediction method of YOLOv2 is adopted in the prediction layer. Each prediction frame predicts 7 ship categories and 5 frame parameters $(t_x, t_y, t_w, t_h, t_o)$, of which the first four parameters are the detected object coordinates and $t_o$ is the prediction confidence. In this paper, the loss function of YOLOv2 is also used in the prediction layer.
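Since the YOLOv2 prediction method is adopted, one predicted frame can be decoded in the YOLOv2 style; a sketch with grid offsets and anchor sizes in grid units (function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(tx, ty, tw, th, to, cx, cy, pw, ph):
    """YOLOv2-style decoding of one frame (tx, ty, tw, th, to): (cx, cy) is
    the grid-cell offset and (pw, ph) the anchor size; to is squashed into
    a confidence in (0, 1)."""
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)
    bh = ph * np.exp(th)
    confidence = sigmoid(to)
    return bx, by, bw, bh, confidence
```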

3.4. The Optimization of the Activation Function for the Improved RDCNN. In order to optimize the influence of the activation function combined with the network structure proposed in this paper, the ELU and Leaky-Relu activation functions of equations (11) and (12) are also tested besides the commonly used Relu:

$$\mathrm{ELU}(x) = \begin{cases} e^x - 1, & \text{if } x \le 0, \\ x, & \text{if } x > 0, \end{cases} \qquad (11)$$

$$\mathrm{Leaky\text{-}Relu}(x) = \begin{cases} 0.1x, & \text{if } x \le 0, \\ x, & \text{if } x > 0. \end{cases} \qquad (12)$$
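Both functions are one-liners in NumPy; a sketch of equations (11) and (12):

```python
import numpy as np

def elu(x):
    """Equation (11): ELU with unit scale."""
    return np.where(x > 0, x, np.exp(x) - 1)

def leaky_relu(x, slope=0.1):
    """Equation (12): Leaky-ReLU with the 0.1 negative slope used here."""
    return np.where(x > 0, x, slope * x)
```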


Through the experimental comparison, the activation function with the best ship image/video detection and classification effect can be selected. The results on the testing-set are shown in Table 2.

In the experiment, the Leaky-Relu activation function has the best comprehensive detection effect, with an operation cost no higher than that of the ELU activation function. Thus the Leaky-Relu is selected as the optimized activation function.

4. The Making of the Ship Dataset and Experimental Environment

4.1. The Making of the Ship Dataset. At present, the popular target-detection datasets are VOC and COCO, but these datasets classify ships as only one class. A specific application often needs to classify ships more precisely. Therefore, in this research, the dataset of ship images is built after collecting and labeling by ourselves.

The ship images are mainly collected from the Internet. As the images are found on the Internet, their pixel resolutions differ, and the sizes of the images also differ, such as 500×400×3 and 500×318×3. The images containing the ships are cut roughly according to a length-to-width ratio of 1:1. The proportion of the ship to the whole image also differs from image to image, even greatly, as can be seen from Figures 3–5 of the database images or the detected images. These naturally produced images of different specifications and quality are more conducive to the training effect and generalization ability. Before training, they were all resized to 416×416×3 images.

After the dataset is collected, it needs to be labeled before being used as the network input. The labeling tool used in this paper is LabelIMG. In LabelIMG, the target object can be selected in the image with a rectangle box and saved with a label. Then a file with the .xml suffix is obtained. This file contains the path, name, and resolution of the original image, as well as the coordinates and name information of the target object in the image.
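LabelIMG writes Pascal-VOC-style XML, so one annotation file can be read as sketched below (field handling is illustrative; LabelIMG files may also carry folder and size entries not shown here):

```python
import xml.etree.ElementTree as ET

def parse_labelimg_xml(path):
    """Read one LabelIMG (Pascal VOC style) annotation file: returns the
    image path plus a list of (name, xmin, ymin, xmax, ymax) boxes."""
    root = ET.parse(path).getroot()
    image_path = root.findtext("path")
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return image_path, boxes
```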

There are many types of ships in real applications. In order to facilitate the research and save costs, this paper only collects 7 representative types of ships: the sailing ship, container ship, yacht, cruise ship, ferry, coast guard ship, and fishing boat. After filtering and classification, the final dataset size is 4200 manually selected images, which includes 600 images in each category. In each category, 480 images are randomly selected as the training-set, and the remaining 120 images are set as the testing-set. In this way, the total size of the training-set is 3360 images, and the total size of the testing-set is 840 images. The typical images of each category in the dataset are shown in Figure 3.

4.2. The Experimental Environment Configuration. The experimental environment of this research is configured as follows: the CPU, an Intel i7-7700 with 4.2 GHz main frequency; the memory, 16 GB; the GPU, two Nvidia GTX 1080 Ti; the operating system, Ubuntu 16.04. In order to make full use of the GPU to accelerate the network training,

Figure 1: The network structure of the improved RDCNN by promoting YOLOv2 and v3 (feature extraction layers, FPN layer, and prediction layer, with convolution, max pooling, reorg, and upsampling operations over scales from 416×416 down to 13×13).

Figure 2: The addition of batch normalization between Conv2d and Leaky-Relu in the Darknet-19 and other convolution layers.

Table 1: The anchor values (width, height) of the different scales proper for the ship image/video detection and classification.

ID | 13×13 scale | 26×26 scale
1 | (1.34221, 1.52115) | (1.34221, 1.52115)
2 | (3.42217, 4.30406) | (3.42217, 4.30406)
3 | (5.05587, 8.09892) | (5.05587, 8.09892)
4 | (9.76561, 4.30316) | (9.76561, 4.30316)
5 | (11.52521, 1.11835) | (11.52521, 1.11835)

Table 2: The experiment effects of different activation functions.

Activation function | AIOU | mAP | Recall
Relu | 0.7247 | 0.7932 | 0.9492
ELU | 0.7306 | 0.8005 | 0.9559
Leaky-Relu | 0.7329 | 0.7987 | 0.9616


CUDA 9.0 and its matching CUDNN are installed in the system. In addition, OpenCV 3.4 is also installed in the environment to display the results of the network detection and classification.

During the experiment, the setting of the experimental parameters is very important. There are many parameters to be set in our improved RDCNN and YOLOv2/v3, such as the batch number, downsampling size, momentum parameter, and learning rate. The setting of these parameters affects not only the normal operation of the network but also the training effect. For example, when the batch number is set too large, the network will not run if the memory of the workstation is not big enough.

Considering the conditions of our experimental environment, and also for comparison convenience, the same parameters are set for the improved RDCNN and YOLOv2/v3. The network parameters are set as follows: the mini-batch size is 64, divided into 8 sub-batches; the iteration number is 8000; the momentum parameter is 0.9; the weight attenuation is 0.0005; and the learning rate is 0.001, as shown in Table 3.
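Collected in one place, the shared settings of Table 3 might look as follows (the dictionary keys are illustrative, not tied to a particular framework):

```python
# The shared training hyperparameters of Table 3 for the improved RDCNN
# and YOLOv2/v3 (key names are illustrative):
train_cfg = {
    "batch": 64,                  # mini-batch size per iteration
    "subdivisions": 8,            # each batch split into 8 sub-batches
    "input_size": (416, 416, 3),  # network input resolution
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "learning_rate": 0.001,
    "max_iterations": 8000,
}
```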

5. The Training and Detection Based on YOLOv2/v3

5.1. The Iterative Convergence Training. Generally, whether a network meets the training requirements is judged by the convergence of the loss function. In this experiment, due to the small size of the dataset and sufficient computing ability, the convergence of YOLOv2 is achieved with only 8000 iterations, which takes only about 1 hour and 40 minutes. The Loss and AIOU curves of the feedforward training process are shown in Figures 6 and 7, respectively. It can be seen from Figures 6 and 7 that the training has converged steadily when the number of network training iterations reaches 8000.

The training time of YOLOv3 is relatively long, and it takes about 3 hours and 40 minutes for the 8000-iteration convergence process. The Loss and AIOU curves of the feedforward training process are shown in Figures 8 and 9, respectively. It can also be seen from Figures 8 and 9 that, after 8000 iterations of training, the Loss and AIOU of the network have converged steadily.

Finally, the weight parameters obtained through 8000 network iterations in the feedforward training are saved in the experiment.

5.2. The Detection Performance Testing. After the network training is stable, it is necessary to verify its detection effect on the testing-set, especially to rule out a decline of the detection effect caused by overfitting. First, the network indexes obtained with the weights of the 8000th iteration on the testing-set are taken as the evaluation criteria. The specific values are shown in Table 4.

Figure 3: The dataset image demonstration: the coast guard ship, container ship, cruise ship, ferry, fishing boat, sailboat, and yacht.

Figure 4: The representative detection results of the improved RDCNN network method.

Figure 5: The comparison of detection effect among YOLOv2 (a), YOLOv3 (b), and the improved RDCNN (c).


As the network cannot measure its weight parameters in real time on the training-set during its feedforward running, the network parameters generated at the 400th, 600th, 800th, 1000th, 2000th, 3000th, 4000th, 5000th, 6000th, 7000th, and 8000th training iterations are also loaded into the network for a later test and verification. In order to better analyze the detection effect of YOLOv2/v3 in this task, the AIOU and mAP parameters of the network are compared at different testing iterations on the testing-set, as shown in Figures 10 and 11.

From the AIOU and mAP curves on the testing-set, it can be seen that the performance indexes of the network on the testing-set have become stable. There is also no overfitting phenomenon caused by too many training iterations. Through comparison, we can see that YOLOv3, as an improved version of YOLOv2, has advantages in the AIOU and mAP performance indexes. That is, it is 0.0057 higher in the AIOU and 0.0115 higher in the mAP than YOLOv2. However, as the advantages of YOLOv3 are obtained by deepening and improving its network structure, its detection speed is 49 FPS lower than that of YOLOv2.

6. The Experiment and Analysis of the Improved RDCNN

6.1. The Network Performance Experiment. The improved RDCNN takes 20 more minutes than YOLOv2 to complete the 8000-training-iteration convergence process. However, the training time is much lower than that of YOLOv3. The Loss and AIOU curves of the feedforward training process are shown in Figures 12 and 13, respectively. It can also be seen from Figures 12 and 13 that, after 8000 iterations of training, the Loss and AIOU of the improved network have converged steadily.

In order to verify the detection effect of the RDCNN on the testing-set, the network weight parameters generated at the 400th, 600th, 800th, 1000th, 2000th, 3000th, 4000th, 5000th, 6000th, 7000th, and 8000th training iterations are loaded into the improved network for a later test and verification. Then the AIOU and mAP parameter curves on the testing-set are tested at different testing iterations of the network, as shown in Figures 14 and 15. This paper applies the two editions of YOLO networks as well as the presented improved RDCNN based on YOLO to the ship image/video detection and classification.

Table 3: The same training parameters for our improved RDCNN and YOLOv2/v3.

Batch number | Down sampling size | Image size | Momentum parameter | Weight attenuation | Learning rate | Iteration number
64 | 8 | 416×416×3 | 0.9 | 0.0005 | 0.001 | 8000

Figure 6: The iterative convergence process of the Loss curve for YOLOv2 network training.

Figure 7: The iterative convergence process of the AIOU curve for YOLOv2 network training.

Figure 8: The iterative convergence process of the Loss curve for YOLOv3 network training.

Figure 9: The iterative convergence process of the AIOU curve for YOLOv3 network training.

Table 4: The comparison of performance indexes between YOLOv2/v3 after the 8000th iteration.

Network | AIOU | mAP | Recall | FPS | Training time
YOLOv2 | 0.7838 | 0.9165 | 0.9425 | 94–95 | 1 h 40 min
YOLOv3 | 0.7895 | 0.9280 | 0.9674 | 45–46 | 3 h 40 min


Thus, the comparative performance of the AIOU and mAP is also shown in Figures 14 and 15 for YOLOv2/v3 and the improved RDCNN network structure. The comparison of the evaluation indexes of each network is also shown in Figure 16.

According to the comparisons, it can be seen that the improved RDCNN network has surpassed YOLOv2 and YOLOv3 in the AIOU index of positioning accuracy. That is, its AIOU is 0.0153 higher than that of YOLOv2 and 0.0096 higher than that of YOLOv3, respectively. In addition, the improved network is 0.0044 higher than YOLOv2 in the mAP index. Due to the simplified network structure, the mAP index of the improved network is 0.0071 lower than that of YOLOv3, but its detecting FPS index is 33 higher than that of YOLOv3. Therefore, it can be concluded that the overall effect of the improved network is better than that of YOLOv2/v3 on the dataset collected for this experiment.

The experimental results thereby show that the improved RDCNN network structure designed in this paper surpasses the two YOLO networks in three evaluation indexes.

6.2. The Effect Demonstration of the Improved Network. For the testing-set, the representative detection results of the improved RDCNN network are shown in Figure 4. In order to achieve a better network effect, the weight parameters of the feature extraction layer obtained in the ImageNet [33] pretraining are loaded to train the improved RDCNN of this paper.

Figure 10: The comparison of the AIOU index between YOLOv2/v3 on the testing-set.

Figure 11: The comparison of the mAP index between YOLOv2/v3 on the testing-set.

Figure 12: The iterative convergence process of the Loss curve for the improved RDCNN training.

Figure 13: The iterative convergence process of the AIOU curve for the improved RDCNN training.

Figure 14: The comparison of AIOU among YOLOv2/v3 and the improved RDCNN on the testing-set.


Through the test on the testing-set and the video, the final results are shown in Table 5.

It can be seen that the mAP index of the improved RDCNN is slightly lower than that of YOLOv3 when using the pretraining weights. However, the other indicators are all better than those of YOLOv3, especially the video detection speed in FPS.

In order to better display the comparison of the network effects, YOLOv2/v3 and the improved RDCNN are used to detect an image with multiple fishing boats. The representative detection results of the three networks are shown in Figure 5. The improved network of this paper accurately detects more ships. Obviously, the presented network has achieved a better result, which fully proves the effectiveness of the improved RDCNN network.

6.3. Comparison with Other Intelligent Detection and Classification Methods. The proposed method is also compared with other intelligent methods, such as Fast R-CNN, Faster R-CNN, and SSD, or with YOLOv2 under a different dataset and hardware configuration. The work in the earlier published IEEE Transactions paper [32] is very similar to this paper, so its experimental results can be used for the comparison. The comparative results are shown in Table 6. The proposed method has an advantage over other intelligent methods in precision and

Figure 15: The comparison of mAP among YOLOv2/v3 and the improved RDCNN on the testing-set.

Figure 16: The comparison of the network performance indexes among YOLOv2/v3 and the improved RDCNN (method in this paper: AIOU 0.7329, mAP 0.7987, Recall 0.9616; YOLOv3: AIOU 0.7092, mAP 0.7922, Recall 0.9156; YOLOv2: AIOU 0.7087, mAP 0.7525, Recall 0.9425).

Table 5: The loading test results of the pretraining weights for the improved RDCNN.

Network | AIOU | mAP | Recall | FPS | Training time
Proposed method | 0.7991 | 0.9209 | 0.9818 | 78–80 | 2 h 0 min


speed, that is, in mAP and FPS, and it can also satisfy the detection and classification requirements in the video scene. However, our dataset size is smaller than that of Shao's work, and our hardware configuration is also weaker than that of Shao's work.

7. Discussion and Conclusions

In this paper, the improved RDCNN network is presented to achieve the ship image/video detection and classification task. This network does not need to extract features manually; it improves the regressive CNN from four aspects based on the advantages of the current popular regressive deep convolution networks, especially YOLOv2/v3. Thus, this network only needs the dataset of the ship images and a successful training.

This paper makes a self-built dataset for the ship image/video detection and classification, and the method based on an improved regressive deep CNN is researched. The feature extraction layer is lightweighted. A new FPN layer is redesigned. A proper anchor frame and size suitable for the ships are redesigned, which reduces the number of anchors by 60% compared with YOLOv3. The activation function is also optimized with the Leaky-Relu. After a successful training, the method can complete the ship image detection task and can also be applied to video detection. After 8000 iterations of training, the Loss and AIOU of the improved RDCNN network have converged steadily.

The experiment on 7 types of ships shows that the proposed method is better in the ship image/video detection and classification than the YOLO series networks. The improved RDCNN network has surpassed YOLOv2/v3 in the AIOU index of positioning accuracy. That is, its AIOU is 0.0153 higher than that of YOLOv2 and 0.0096 higher than that of YOLOv3, respectively. In addition, the improved network is 0.0044 higher than YOLOv2 in the mAP index. Due to the simplified network structure, the mAP index of the improved network is 0.0071 lower than that of YOLOv3, but the detecting FPS index is 33 higher than that of YOLOv3. Therefore, it can be concluded that the overall effect of the improved network is better than that of YOLOv2/v3 on the dataset collected for this experiment.

Then, this method can solve the problem of low recognition rate and real-time performance for ship image/video detection and classification. Thus, this method provides a highly accurate and real-time ship detection method for the intelligent port management and visual processing of the USV. In addition, the proposed regressive deep convolutional network also has a better comprehensive performance than YOLOv2/v3.

The proposed method is also compared with Fast R-CNN, Faster R-CNN, SSD, YOLOv2, etc. under different datasets and hardware configurations. The results show that the method has an advantage in precision and speed, and it can also satisfy the video scene. However, our dataset size is smaller. Thus, detection on a much larger dataset can be the future work.

Data Availability

The [SELF-BUILT SHIP DATASET and SIMULATION] data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the NSFC Projects of China under Grants No. 61403250, No. 51779136, and No. 51509151.

References

[1] S. Fefilatyev, D. Goldgof, M. Shreve, and C. Lembke, "Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system," Ocean Engineering, vol. 54, no. 1, pp. 1–12, 2012.

[2] W. T. Chen, K. F. Ji, X. W. Xing, H. X. Zou, and H. Sun, "Ship recognition in high resolution SAR imagery based on feature selection," in Proceedings of the IEEE International Conference on Computer Vision in Remote Sensing, pp. 301–305, Xiamen, China, December 2013.

[3] G. K. Yuksel, B. Yalıtuna, F. Tartar, O. F. C. Adlı, K. Eker, and O. Yoruk, "Ship recognition and classification using silhouettes extracted from optical images," in Proceedings of the IEEE Signal Processing and Communication Application Conference, pp. 1617–1620, Zonguldak, Turkey, May 2016.

[4] S. Li, Z. Zhou, B. Wang, and F. Wu, "A novel inshore ship detection via ship head classification and body boundary determination," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 12, pp. 1920–1924, 2016.

[5] Y. Zhang, Q.-Z. Li, and F.-N. Zang, "Ship detection for visual maritime surveillance from non-stationary platforms," Ocean Engineering, vol. 141, pp. 53–63, 2017.

Table 6: The comparison with other intelligent detection and classification methods.

Methods | Ship categories | Dataset images | Hardware configuration | IOU threshold | mAP | FPS
Fast R-CNN (VGG) | 6 | 11126 | Four Titan Xp | >0.5 | 0.710 | 0.5
Faster R-CNN (ZF) | 6 | 11126 | Four Titan Xp | >0.5 | 0.892 | 15
Faster R-CNN (VGG) | 6 | 11126 | Four Titan Xp | >0.5 | 0.901 | 6
SSD | 6 | 11126 | Four Titan Xp | >0.5 | 0.794 | 7
YOLOv2 | 6 | 11126 | Four Titan Xp | >0.5 | 0.830 | 83
Shao's method | 6 | 11126 | Four Titan Xp | >0.5 | 0.874 | 49
Proposed method | 7 | 4200 | Two GTX 1080 Ti | >0.5 | 0.9209 | 78–80

[6] K. Eldhuset, "An automatic ship and ship wake detection system for spaceborne SAR images in coastal regions," IEEE Transactions on Geoscience and Remote Sensing, vol. 34, no. 4, pp. 1010–1019, 1996.

[7] H. J. Zhou, X. Y. Li, X. H. Peng, Z. Z. Wang, and T. Bo, "Detect ship targets from satellite SAR imagery," Journal of National University of Defense Technology, vol. 21, no. 1, pp. 67–70, 1999.

[8] M. T. Rey, A. Drosopoulos, and D. Petrovic, "A search procedure for ships in radarsat imagery," Defence Research Establishment Ottawa, Ottawa, ON, Canada, Report No. 1305, 2013.

[9] X. Li and S. Li, "The ship edge feature detection based on high and low threshold for remote sensing image," in Proceedings of the 6th International Conference on Computer-Aided Design, Manufacturing, Modeling and Simulation, Busan, South Korea, May 2018.

[10] L. C. Yann, B. Leon, B. Yoshua, and H. Patrick, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the International Conference on Neural Information Processing Systems, Curran Associates Inc., Lake Tahoe, NV, USA, pp. 1097–1105, 2012.

[12] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2015, https://arxiv.org/abs/1409.1556.

[13] C. Szegedy, W. Liu, Y. Jia et al., "Going deeper with convolutions," pp. 1–9, 2014, https://arxiv.org/abs/1409.4842.

[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, IEEE Computer Society, Las Vegas, NV, USA, June 2016.

[15] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," pp. 2261–2269, 2017, https://arxiv.org/abs/1608.06993.

[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," pp. 580–587, 2013, https://arxiv.org/abs/1311.2524.

[17] R. Girshick, "Fast R-CNN," 2015, https://arxiv.org/abs/1504.08083.

[18] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.

[19] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, "SSD: single shot MultiBox detector," pp. 21–37, 2016, https://arxiv.org/abs/1512.02325.

[20] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, IEEE, Las Vegas, NV, USA, June 2016.

[21] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525, IEEE, Honolulu, HI, USA, July 2017.

[22] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.

[23] M. Kang, K. Ji, X. Leng, and Z. Lin, "Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection," Remote Sensing, vol. 9, no. 860, pp. 1–14, 2017.

[24] R. Wang, J. Li, Y. Duan, H. Cao, and Y. Zhao, "Study on the combined application of CFAR and deep learning in ship detection," Journal of the Indian Society of Remote Sensing, vol. 4, pp. 1–9, 2018.

[25] Q. Li, L. Mou, Q. Liu, Y. Wang, and X. X. Zhu, "HSF-net: multiscale deep feature embedding for ship detection in optical remote sensing imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 12, pp. 7147–7161, 2018.

[26] X. Yang, H. Sun, K. Fu et al., "Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks," Remote Sensing, vol. 10, no. 1, pp. 132–145, 2018.

[27] L. Gao, Y. He, X. Sun, X. Jia, and B. Zhang, "Incorporating negative sample training for ship detection based on deep learning," Sensors, vol. 19, no. 684, pp. 1–20, 2019.

[28] Z. Lin, K. Ji, X. Leng, and G. Kuang, "Squeeze and excitation rank faster R-CNN for ship detection in SAR images," IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 5, pp. 751–755, 2019.

[29] L. Zhao, X. F. Wang, and Y. T. Yuan, "Research on ship recognition method based on deep convolutional neural network," Ship Science and Technology, vol. 38, no. 8, pp. 119–123, 2016.

[30] M. Yang, Y. D. Ruan, L. K. Chen, P. Zhang, and Q. M. Chen, "New video recognition algorithms for inland river ships based on faster R-CNN," Journal of Beijing University of Posts and Telecommunications, vol. 40, no. S1, pp. 130–134, 2017.

[31] Z. Shao, W. Wu, Z. Wang, W. Du, and C. Li, "SeaShips: a large-scale precisely annotated dataset for ship detection," IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2593–2604, 2018.

[32] Z. Shao, L. Wang, Z. Wang, W. Du, and W. Wu, "Saliency-aware convolution neural network for ship detection in surveillance video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 3, pp. 781–794, 2019.

[33] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.

Complexity 11

Page 3: An Intelligent Ship Image/Video Detection and Classification …downloads.hindawi.com/journals/complexity/2020/1520872.pdf · 2020. 4. 9. · rough the experimental comparison, the

2 The Regressive Deep Convolutional NeuralNetwork (RDCNN)

e basic structure of the regressive deep CNN is mainlyconsisted of the input layer convolution layer pooling layerfull-connection layer and output layer

21 e Input Layer e function of the input layer is toreceive input image and store it in matrix form Assumingthat the regressive deep CNN has a structure of L layer thenxl represents the feature of No l layer l 1 2 L In it xl

is composed of multiple feature graphs which can berepresented as xl xl

1 xlj1113966 1113967 j is the number of the

feature graphs in l layerus the corresponding feature of acolor input image can be represented as x1 x1

1 x12 x1

31113864 1113865where x1

1 x12 and x1

3 represents the data of red green andblue channels respectively

22 e Convolutional Layer e function of the convo-lution layer is to extract features through convolution op-eration With a proper design the feature expression abilityof the regressive deep CNN will be strengthened with theincreasing of convolution layers e feature graph of No l

convolution layer can be calculated as

xlj f 1113944

jminus1

i1G

lij k

lij otimesx

lminus1i1113872 1113873 + b

lj

⎛⎝ ⎞⎠ (1)

where klij and bl

jare the weights of the convolution kerneland biases of the convolution layer respectively Gl

ij is theconnection matrix between No l convolution layer and thefeature graph of the previous l minus 1 convolution layer thesymbol otimes represents the convolution operation and f(x) isthe activation functionWhen Gl

ij is 1 xlminus1i is associated with

xlj when Gl

ij is 0 they are no correlations

23ePooling Layer e function of the pooling layer is toreduce the feature dimension e pooling layer is generallylocated behind the convolutional layer and the poolingoperation can maintain a certain spatial invariance efeature graph xl

j of the pooling operation in the l layer can becalculated as

xlj p x

lminus1j1113872 1113873 (2)

where p(x) represents the pooling operation

24 e Fully Connected Layer e function of the fullyconnected layer is to transform the deep feature obtained inthe front layers into a feature vector us this layer isusually set behind the feature extraction layer e featurevector xl in the fully connected layer can be calculated as

xl

f wlx

lminus 1+ b

l1113872 1113873 (3)

where wl is the connecting weight between two adjacentnetwork layers and bl is the offset and f(x) is the activationfunction

25eLoss Function e regressive deep CNN obtains thepredicted value through a forward propagation en theerror between the predicted value and real value is usuallycalculated with the following cross-entropy loss function

Loss minus1n

1113944x

[y ln 1113957y +(1 minus y)ln(1 minus 1113957y)] (4)

where x are the input samples y is the predicted output 1113957y isthe actual output and n represents the total number of theinput samples in one batch

26eNetwork Performance Index For the regressive deepCNN the IOU represents the overlap rate between thedetection window (Bgt) generated by the network model andthe actually marked window (Bdt) that is the ratio of theirintersection and union areas area(middot) means the area and theIOU can be calculated as

IOU area Bdt capBgt1113872 1113873

area Bdt cupBgt1113872 1113873 (5)

For the experiment of this paper the detection result ofIOUge 05 is set as a real positive sample and the detectionresult of IOUle 05 is set as a false-negative sample

As there are many kinds of targets detected in this paperthe AIOU (the average value of IOU) is used that is theaverage ratio between the intersection and union areas of thepredicted and actual boundary boxes on the testing-setwhich is denoted as

AIOU 1n

1113944

nminus1

i0IOU (6)

where n represents the number of detected targetse Recall (R) rate is used to represent the percentage of

the positive samples in the samples that are correctlypredicted

Recall tp

tp + fn

(7)

where tp represents a true positive sample and fn representsa false-negative sample

e Precision (P) indicates how many samples of thepositive prediction are truly positive samples

Precision tp

tp + fp

(8)

where fp represents a false positive samplee AP is an index used to measure the network

identification accuracy which is generally represented by thearea enclosed by the Recall rate and Precision curves As-suming that the curve of the recall rate and precision rate isPR then

AP 11139461

0PR dr (9)

Complexity 3

As there are 7 targets detected in this paper the mAP isused to represent the network identification accuracy that isthe average value of AP

mAP

11139461

0PR dr1113888 1113889

N

(10)

where n represents the number of the predicted categoriesthat is 7

In addition in order to measure the network speed forvideo detection the frames per second (FPS) is also used as aperformance index

3 The Improved RDCNN Based on YOLOv2v3

is research presents an improved RDCNN mainly basedon the YOLO series which also refers to the advantages ofthe current popular regression deep convolution networksBy promoting the feature extraction layer of YOLOv2 andthe FPN of YOLOv3 the improved network overcomes thedetection shortcomings of YOLOv2 and the training andrecognition speed shortcomings of YOLOv3 e improvednetwork also redesigns the anchors with the clustering al-gorithm and optimizes the effects of the activation functionboth according to the ship imagevideo detection andclassification Finally this algorithm achieves a good accu-racy and real-time performance in the ship imagevideodetection and classification

e improved network structure built in this research isshown in Figure 1 is network structure mainly consists ofthree parts the feature extraction layer FPN layer andprediction layer which are specifically described below

31 e Lightweighted Feature Extraction Layer e featureextraction layer is very important in building the networkstructure If the feature extraction layer is too large it mayget better deep features but it will also slow down the speedof the whole network For example in YOLOv3 the darknet-53 is used as the feature extraction layeris extraction layeris relatively slow in training and detection speed due to thedeep layer numbers In order to improve the presentednetwork with a lightweight feature extraction layer first thisnetwork adopts the Darknet-19 feature extraction layer ofYOLOv2 and the structure is shown in the left of Figure 1is feature extraction layer has the advantage of relativelyfew network layers and faster calculation speed and can alsoextract deep features well when inputting a color ship imageor video of 416times 416times 3 size

In addition with the increase of the feature extractionlayer numbers the network generally can obtain deeperfeatures with a more expressive power However simplyincreasing the number of network layers will result in agradient dispersion or explosion phenomena In order tosolve this problem in the later experiment a batch nor-malization strategy is added between the convolution(Conv2d) and activation (Leaky-Relu) of each convolutionoperation in the Darknet-19 feature extraction layer which

is shown in Figure 2 is strategy can effectively control thegradient problem caused by the network deepening

32eNew FPN Layer with a Clustering Algorithm For thefeature extraction the feature information in shallow layer isrelatively small but its location is accurate is has theadvantage for predicting small objects On the contrary thefeature information in deep layer is rich but its location isrelatively rough is is suitable for predicting large objects

us in order to make the network obtain a betterdetection result the improved network promotes themultiscale prediction idea of YOLOv3 to design a new FPNlayer which is shown in the right of Figure 1 is methodup samples the deep feature map into 26times 26 size afterpredicting a deep feature map of 13times13 size from the featureextraction layer and then merges the upsampled 26times 26feature map with the shallow 26times 26 feature map Finallythe network can detect and forecast the input image at twoscales

In addition to get a better network structure theclustering algorithm is also used and the effect of the col-lected data is fine-tuned and optimized for the ship imagevideo detection and classification Finally the obtainedanchor values are shown in Table 1 which predicts thefeature maps of 13times13 and 26times 26 scales setting 5 differentanchor frames on each scale

erefore for a 416times 416 size image the improvednetwork predicts a total of 4225 fixed prediction framescompared with YOLOv3 which has 9 anchor frames on 3scales and 10647 fixed prediction frames in total Obviouslythe number of anchor frames in the improved network isreduced by 6422 that is about 60

33 e Prediction Layer rough the prediction on theconvolution layer the spatial information can be well pre-served For the improved network the prediction method ofYOLOv2 is adopted in the prediction layer Each predictingframe predicts 7 ship categories and 5 frame information(tx ty tw th to) of which the first four parameters are thedetecting object coordinates and to is the predicting con-fidence In this paper the loss function of YOLOv2 is alsoused in the prediction layer

34 e Optimization of the Activation Function for the Im-proved RDCNN In order to optimize the influence of theactivation function combined with the network structureproposed in this paper the ELU and Leak-Relu activationfunctions of equations (11) and (12) are also used and testedexcept for the commonly used Relu

ELU(x) ex minus 1 if xle 0

x if xgt 01113896 (11)

Leak Relu(x) 01x if xle 0

x if xgt 01113896 (12)

4 Complexity

rough the experimental comparison the activationfunction with the best ship imagevideo detection andclassification effect can be optimized e results on thetesting-set are obtained which is shown in Table 2

In the experiment the Leaky-Relu activation functionhas the best comprehensive detection effect and is lessoperable than the Relu and ELU activation functions usthe Leaky-Relu is selected as the optimized activationfunction

4 The Making of Ship Dataset andExperimental Environment

41 e Making of Ship Dataset At present the populartarget-detection datasets are VOC and COCO but thesedatasets classify ships as only one kind In a specific ap-plication it often needs to classify ships more preciselyerefore in this research the dataset of ship images is builtafter collecting and labeling by ourselves

e main way to collect the ship images is the InternetAs the images are found from the Internet the pixels res-olution are different and the size of the images are alsodifferent such as 500times 400times 3 and 500times 318times 3e imagescontaining the ships are cut roughly according to the lengthto width ratio of 1 1 e scale of ship proportion to the

whole image in each image is also different even verydifferent which can be seen from Figures 3ndash5 of the databaseimages or the detected images ese naturally producedimages of different specification and quality are moreconducive to the training effect and generalization abilityBefore training they were all resized to 416times 416times 3 sizeimages

After the dataset is collected it needs to be labeled beforeusing as the network input e labeling tool used in thispaper is LabelIMG In the LabelIMG the target object can beselected in the image with a rectangle box and be saved witha label en a file with the suffix of xml can be got is filecontains the path name resolution of the original image aswell as the coordinates and name information of the targetobject in the image

ere are many types of ships in real application Inorder to facilitate research and save costs this paper onlycollects 7 representative types of ships the sailing shipcontainer ship yacht cruise ship ferry coast guard ship andfishing boat After filtering and classification the finaldataset size is 4200 manual-selected images which includes600 images in each categorye 480 images in each categoryare randomly selected as the training-set and eachremaining 120 images are set as the testing-set In this waythe total size of the training-set is 3360 images and the totalsize of the testing-set is 840 images e typical images ofeach category in the dataset are shown in Figure 3

42 e Experimental Environment Configuration e ex-perimental environment of this research is configured asfollows e CPU Intel i7-7700 with 42GHz main fre-quency the memory 16G the GPU two of Nvidia GTX1080Ti the operating system Ubantu 1604 In order to make fulluse of the GPU to accelerate the network training the

PredictionlayerFeature extraction layers

FPN layer

ConvolutionMax pooling Upsamping

416 times 416

208 times 208

10 times 104 52 times 5226 times 26 13 times 13

26 times 26

13 times 13

Reorg

Figure 1 e network structure of the improved RDCNN by promoting YOLOv2 and v3

Batchnormalization

Convolutional

Leaky-ReluConv2d

Figure 2 e addition of batch normalization into the Darknet-19and other convolution layers

Table 1 e anchor values of different scales proper for the shipimagevideo detection and classification

ID 13times13 26times 261 (134221 152115) (134221 152115)2 (342217 430406) (342217 430406)3 (505587 809892) (505587 809892)4 (976561 430316) (976561 430316)5 (1152521 111835) (1152521 111835)

Table 2 e experiment effects of different activation functions

Activation function AIOU mAP RecallRelu 07247 07932 09492ELU 07306 08005 09559Leaky-Relu 07329 07987 09616

Complexity 5

CUDA 90 and its matching CUDNN are installed in thesystem In addition the OpenCV34 is also installed in theenvironment to display the results of the network detectionand classification

During the experiment the setting of the experimentalparameters is very important ere are many parameters tobe set in our improved RDCNN and YOLOv2v3 such as thebatch number down sampling size momentum parameterand learning rate e setting of these parameters will affectnot only the normal operation of the network but also thetraining effect For example when the setting number of thebatch is too large the network will not run if the memory ofthe workstation is not big enough

Considering the conditions of our experimental envi-ronment and also for comparing convenience the sameparameters are set for the improved RDCNN and YOLOv2v3e network parameters are set as follows the number ofsmall batch is 64 and divided into 8 sub-batches the iterationnumber is 8000 the momentum parameter is 09 the weightattenuation is 00005 and the learning rate is 0001 whichare shown in the following Table 3

5 The Training and DetectionBased on YOLOv2v3

51e IterativeConvergenceTraining Generally whether anetwork meets the training requirements is judged by the

convergence of the loss function In this experiment due tothe small size of the dataset and sufficient computing abilitythe convergence with only 8000 times of iterations isachieved which takes about only 1 hour and 40minuteseLoss and AIOU curves of the feedforward training processare shown in Figures 6 and 7 respectively It can be seenfrom Figures 6 and 7 that the training has converged steadilywhen the number of the network training reaches 8000times

e training time of YOLOv3 is relatively long and ittakes about 3 hours and 40 minutes for 8000 times of it-eration convergence process e Loss and AIOU curves ofthe feedforward training process are shown in Figures 8 and9 respectively It can also be seen from Figures 8 and 9 thatafter 8000 times of iterative training the Loss and AIOU ofthe network also have converged steadily

Finally the weight parameters obtained through 8000network iterations in the feedforward training are saved inthe experiment

52 e Detection Performance Testing After the networktraining is stable it is necessary to verify its detection effecton the testing-set especially to avoid a decline of the de-tection effect caused by overfitting First the network in-dexes obtained with the weights of No 8000 iteration underthe testing-set are taken as the evaluation criteria especific values are shown in Table 4

Figure 3 e dataset image demonstration the coast guard ship container ship cruise ship ferry fishing boat sailboat and yacht

Figure 4 e representative detection results of the improved RDCNN network method

(a) (b) (c)

Figure 5 e comparison of detection effect among the method of YOLOv2 (a) YOLOv3 (b) and the improved RDCNN (c)

6 Complexity

As the network cannot measure its weight parameters inreal time under the training-set during its feedforwardrunning the network parameters generated in the Nos 400600 800 1000 2000 3000 4000 5000 6000 7000 and 8000training iterations are also taken here to load into thenetwork for a later test and verification In order to betteranalyze the detection effect of YOLOv2v3 in this task theAIOU and mAP parameters of the network are compared indifferent testing iterations under the testing-set which areshown in Figures 10 and 11

From theAIOU andmAP curves on the testing-set it canbeen seen that the performance indexes of the network onthe testing-set have been stable ere is also no overfitting

phenomenon caused by too many training times roughcomparison we can see that YOLOv3 as an improvedversion of YOLOv2 has advantages in the AIOU and mAPperformance indexes at is it has 00057 higher in theAIOU and 00115 higher in the mAP than that of YOLOv2However as the advantages of YOLOv3 are obtained bydeepening and improving its network structure its detectionspeed is 49 FPS lower than that of YOLOv2

6 The Experiment and Analysis of theImproved RDCNN

61 e Network Performance Experiment e improvedRDCNN takes 20 more minutes to complete the 8000training iterations convergence process compared withYOLOv2 However the training time is much lower thanthat of YOLOv3 e Loss and AIOU curves of the feed-forward training process are shown in Figures 12 and 13respectively It can also be seen from Figures 12 and 13 thatafter 8000 times of iterative training the Loss and AIOU ofthe improved network have been converged steadily

In order to verify the detection effect of the RDCNN onthe testing-set the network weight parameters generated inthe Nos 400 600 800 1000 2000 3000 4000 5000 60007000 and 8000 training iterations are taken here to load intothe improved network for a later test and verification enthe AIOU and mAP parameter curves under the testing-setare tested in different testing iterations of the network whichare shown in Figures 14 and 15 is paper applies the twoeditions of YOLO networks as well as the presented im-proved RDCNN based on YOLO into the ship imagevideodetection and classification us the comparing

Table 3 e same training parameters for our improved RDCNN and YOLOv2v3

Batch number Down sampling size Image size Momentum parameter Weight attenuation Learning rate Iteration number64 8 416times 416times 3 09 00005 0001 8000

0

10

20

0 2000 4000 6000 8000

Loss

Iteration

Figure 6 e iterative convergence process of the Loss curve forYOLOv2 network training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

U

Iteration

Figure 7 e iterative convergence process of the AIOU curve forYOLOv2 network training

0

5

10

0 2000 4000 6000 8000

Loss

Iteration

Figure 8 e iterative convergence process of the Loss curve forYOLOv3 network training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

U

Iteration

Figure 9 e iterative convergence process of the AIOU curve forYOLOv3 network training

Table 4 e comparison of performance indexes betweenYOLOv2v3 after 8000th iteration

Network AIOU mAP Recall FPS Training timeYOLOv2 07838 09165 09425 94ndash95 1 h 40minYOLOv3 07895 09280 09674 45ndash46 3 h 40min

Complexity 7

performance of the AIOU and mAP are also show in Fig-ures 14 and 15 for YOLOv2v3 and the improved RDCNNnetwork structuree comparison of the evaluation indexesof each network is also shown in Figure 16

According to the comparisons it can be seen that theimproved RDCNN network has surpassed YOLOv2 andYOLOv3 in the AIOU detection of positioning accuracyat is it is 00153 higher than that of YOLOv2 and 00096higher than that of YOLOv3 respectively in AIOU Inaddition the improved network is 00044 higher thanYOLOv2 in the mAP index Due to the simplified networkstructure the mAP index of the improved network is 00071lower than that of YOLOv3 but the detecting FPS index is 33higher than that of YOLOv3 erefore it can be concludedthat the overall effect of the improved network is better thanthat of YOLOv2v3 in the collected dataset of thisexperiment

erefore the experimental results show that the im-proved RDCNN network structure designed in this papersurpasses the two YOLO networks in three evaluationindexes

62 e Effect Demonstration of the Improved NetworkFor the testing-set the representative detection results of theimproved RDCNN network are shown in Figure 15 In orderto achieve a better network effect the weight parameters ofthe feature extraction layer extracted in the ImageNet [33]pretraining are loaded to train the improved RDCNN of this

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

IterationYOLOv2YOLOv3

Figure 10 e comparison of AIOU index between YOLOv2v3under the testing-set

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3

Figure 11 e comparison of mAP index between YOLOv2v3under the testing-set

0

10

20

0 2000 4000 6000 8000

Loss

Iteration

Figure 12 e iterative convergence process of the Loss curve forthe improved RDCNN training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

UIteration

Figure 13e iterative convergence process of the AIOU curve forthe improved RDCNN training

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

Iteration

YOLOv2YOLOv3The improved

Figure 14 e comparison of AIOU among YOLOv2v3 and theimproved RDCNN under the testing-set

8 Complexity

paper rough the test on the testing-set and the video thefinal results are shown in Table 5

It can be seen that the mAP index of the improvedRDCNN is slightly lower than that of YOLOv3 when usingthe pretraining weights However the other indicators are allbetter than that of YOLOv3 especially in the video detectionspeed of FPS

In order to better display the comparison of the networkeffects the YOLOv2v3 and improved RDCNN are used todetect a image with multiple fishing boats e represen-tative results of the detection effect of the three networks areshown in Figure 16 In this paper the improved networkaccurately detects more ships Obviously the presented

network in this paper has achieved a better result which fullyproves the effectiveness of the improved RDCNN network

63 Comparison with Other Intelligent Detection and Clas-sificationMethods e proposed method is also comparedwith other intelligent methods such as Fast RndashCNNFaster RndashCNN and SSD or compared with YOLOv2under different dataset image and hardware configura-tion e work in the early published IEEE Trans paper[32] is very similar to this paper then its experimentresults can be used for the comparison e comparingresults are shown in Table 6 e proposed method hasadvantage over other intelligent methods in precision and

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3The improved

Figure 15 e comparison of mAP among YOLOv2v3 and the improved RDCNN under the testing-set

0732907987

09616

0709207922

09156

07087 07525

09425

0010203040506070809

1

AIOU mAP Recall

Method in this paperYOLOv3YOLOv2

Figure 16 e comparison of the network performance indexes among YOLOv2v3 and the improved RDCNN

Table 5 e loading test results of the pretraining weights for the improved RDCNN

Network AIOU mAP Recall FPS Training timeProposed method 07991 09209 09818 78ndash80 2 h 0min

Complexity 9

speed that is mAP and FPS and it can also satisfy thedetection and classification requirement in video sceneHowever our dataset size is smaller than that of Shaorsquoswork and our hardware configuration is also weaker thanthat of Shaorsquos work

7 Discussion and Conclusions

In this paper the improved RDCNN network is presented toachieve the ship imagevideo detection and classificationtask is network does not need to extract features man-ually which improves the regressive CNN from four aspectsbased on the advantages of the current popular regressiondeep convolution networks especially YOLOv2v3 usthis network only needs the dataset of the ship images and asuccessful training

is paper makes a self-built dataset for the ship imagevideo detection and classification and the method based onan improved regressive deep CNN is researched e featureextraction layer is lightweighted A new FPN layer isredesigned A proper anchor frame and size suitable for theships are redesigned which reduces the number of anchorsby 60 compared with YOLOv3 e activation function isalso optimized with the Leaky-Relu After a successfultraining the method can complete the ship image detectiontask and can also be applied to the video detection After8000 times of iterative training the Loss and AIOU of theimproved RDCNN network have been converged steadily

The experiment on 7 types of ships shows that the proposed method is better in the ship image/video detection and classification compared with the YOLO series networks. The improved RDCNN network has surpassed YOLOv2/v3 in the AIOU index of positioning accuracy; that is, its AIOU is 0.0153 higher than that of YOLOv2 and 0.0096 higher than that of YOLOv3. In addition, the improved network is 0.0044 higher than YOLOv2 in the mAP index. Due to the simplified network structure, the mAP index of the improved network is 0.0071 lower than that of YOLOv3, but the detecting FPS index is 33 higher than that of YOLOv3. Therefore, it can be concluded that the overall effect of the improved network is better than that of YOLOv2/v3 on the collected dataset of this experiment.
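These margins follow directly from the values in Tables 4 and 5:

```latex
\begin{align*}
\mathrm{AIOU}&: & 0.7991 - 0.7838 &= 0.0153 \ (\text{vs. YOLOv2}), &
  0.7991 - 0.7895 &= 0.0096 \ (\text{vs. YOLOv3}),\\
\mathrm{mAP}&:  & 0.9209 - 0.9165 &= 0.0044 \ (\text{vs. YOLOv2}), &
  0.9280 - 0.9209 &= 0.0071 \ (\text{vs. YOLOv3}),\\
\mathrm{FPS}&:  & 78 - 45 &= 33 \ (\text{vs. YOLOv3}).
\end{align*}
```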

Then, this method can solve the problem of low recognition rate and poor real-time performance for ship image/video detection and classification. Thus, this method provides a highly accurate and real-time ship detection method for the intelligent port management and visual processing of the USV. In addition, the proposed regressive deep convolutional network also has a better comprehensive performance than YOLOv2/v3.

The proposed method is also compared with Fast R-CNN, Faster R-CNN, SSD, YOLOv2, etc., under different datasets and hardware configurations. The results show that the method has an advantage in precision and speed, and it can also satisfy the video scene. However, our dataset is smaller; thus, detection on a much larger dataset can be the future work.

Data Availability

The self-built ship dataset and simulation data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the NSFC Projects of China under Grant Nos. 61403250, 51779136, and 51509151.

References

[1] S. Fefilatyev, D. Goldgof, M. Shreve, and C. Lembke, "Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system," Ocean Engineering, vol. 54, no. 1, pp. 1–12, 2012.

[2] W. T. Chen, K. F. Ji, X. W. Xing, H. X. Zou, and H. Sun, "Ship recognition in high resolution SAR imagery based on feature selection," in Proceedings of the IEEE International Conference on Computer Vision in Remote Sensing, pp. 301–305, Xiamen, China, December 2013.

[3] G. K. Yuksel, B. Yalıtuna, F. Tartar, O. F. C. Adlı, K. Eker, and O. Yoruk, "Ship recognition and classification using silhouettes extracted from optical images," in Proceedings of the IEEE Signal Processing and Communication Application Conference, pp. 1617–1620, Zonguldak, Turkey, May 2016.

[4] S. Li, Z. Zhou, B. Wang, and F. Wu, "A novel inshore ship detection via ship head classification and body boundary determination," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 12, pp. 1920–1924, 2016.

[5] Y. Zhang, Q.-Z. Li, and F.-N. Zang, "Ship detection for visual maritime surveillance from non-stationary platforms," Ocean Engineering, vol. 141, pp. 53–63, 2017.

[6] K. Eldhuset, "An automatic ship and ship wake detection system for spaceborne SAR images in coastal regions," IEEE Transactions on Geoscience and Remote Sensing, vol. 34, no. 4, pp. 1010–1019, 1996.

[7] H. J. Zhou, X. Y. Li, X. H. Peng, Z. Z. Wang, and T. Bo, "Detect ship targets from satellite SAR imagery," Journal of National University of Defense Technology, vol. 21, no. 1, pp. 67–70, 1999.

[8] M. T. Rey, A. Drosopoulos, and D. Petrovic, "A search procedure for ships in radarsat imagery," Defence Research Establishment Ottawa, Ottawa, ON, Canada, Report No. 1305, 2013.

[9] X. Li and S. Li, "The ship edge feature detection based on high and low threshold for remote sensing image," in Proceedings of the 6th International Conference on Computer-Aided Design, Manufacturing, Modeling and Simulation, Busan, South Korea, May 2018.

[10] L. C. Yann, B. Leon, B. Yoshua, and H. Patrick, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the International Conference on Neural Information Processing Systems, Curran Associates Inc., Lake Tahoe, NV, USA, pp. 1097–1105, 2012.

[12] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2015, https://arxiv.org/abs/1409.1556.

[13] C. Szegedy, W. Liu, Y. Jia et al., "Going deeper with convolutions," pp. 1–9, 2014, https://arxiv.org/abs/1409.4842.

[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, IEEE Computer Society, Las Vegas, NV, USA, June 2016.

[15] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," pp. 2261–2269, 2017, https://arxiv.org/abs/1608.06993.

[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," pp. 580–587, 2013, https://arxiv.org/abs/1311.2524.

[17] R. Girshick, "Fast R-CNN," 2015, https://arxiv.org/abs/1504.08083.

[18] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.

[19] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, "SSD: single shot MultiBox detector," pp. 21–37, 2016, https://arxiv.org/abs/1512.02325.

[20] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, IEEE, Las Vegas, NV, USA, June 2016.

[21] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525, IEEE, Honolulu, HI, USA, July 2017.

[22] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.

[23] M. Kang, K. Ji, X. Leng, and Z. Lin, "Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection," Remote Sensing, vol. 9, no. 860, pp. 1–14, 2017.

[24] R. Wang, J. Li, Y. Duan, H. Cao, and Y. Zhao, "Study on the combined application of CFAR and deep learning in ship detection," Journal of the Indian Society of Remote Sensing, vol. 4, pp. 1–9, 2018.

[25] Q. Li, L. Mou, Q. Liu, Y. Wang, and X. X. Zhu, "HSF-net: multiscale deep feature embedding for ship detection in optical remote sensing imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 12, pp. 7147–7161, 2018.

[26] X. Yang, H. Sun, K. Fu et al., "Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks," Remote Sensing, vol. 10, no. 1, pp. 132–145, 2018.

[27] L. Gao, Y. He, X. Sun, X. Jia, and B. Zhang, "Incorporating negative sample training for ship detection based on deep learning," Sensors, vol. 19, no. 684, pp. 1–20, 2019.

[28] Z. Lin, K. Ji, X. Leng, and G. Kuang, "Squeeze and excitation rank faster R-CNN for ship detection in SAR images," IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 5, pp. 751–755, 2019.

[29] L. Zhao, X. F. Wang, and Y. T. Yuan, "Research on ship recognition method based on deep convolutional neural network," Ship Science and Technology, vol. 38, no. 8, pp. 119–123, 2016.

[30] M. Yang, Y. D. Ruan, L. K. Chen, P. Zhang, and Q. M. Chen, "New video recognition algorithms for inland river ships based on faster R-CNN," Journal of Beijing University of Posts and Telecommunications, vol. 40, no. S1, pp. 130–134, 2017.

[31] Z. Shao, W. Wu, Z. Wang, W. Du, and C. Li, "SeaShips: a large-scale precisely annotated dataset for ship detection," IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2593–2604, 2018.

[32] Z. Shao, L. Wang, Z. Wang, W. Du, and W. Wu, "Saliency-aware convolution neural network for ship detection in surveillance video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 3, pp. 781–794, 2019.

[33] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.


Page 4: An Intelligent Ship Image/Video Detection and Classification …downloads.hindawi.com/journals/complexity/2020/1520872.pdf · 2020. 4. 9. · rough the experimental comparison, the

As there are 7 targets detected in this paper the mAP isused to represent the network identification accuracy that isthe average value of AP

mAP

11139461

0PR dr1113888 1113889

N

(10)

where n represents the number of the predicted categoriesthat is 7

In addition in order to measure the network speed forvideo detection the frames per second (FPS) is also used as aperformance index

3 The Improved RDCNN Based on YOLOv2v3

is research presents an improved RDCNN mainly basedon the YOLO series which also refers to the advantages ofthe current popular regression deep convolution networksBy promoting the feature extraction layer of YOLOv2 andthe FPN of YOLOv3 the improved network overcomes thedetection shortcomings of YOLOv2 and the training andrecognition speed shortcomings of YOLOv3 e improvednetwork also redesigns the anchors with the clustering al-gorithm and optimizes the effects of the activation functionboth according to the ship imagevideo detection andclassification Finally this algorithm achieves a good accu-racy and real-time performance in the ship imagevideodetection and classification

e improved network structure built in this research isshown in Figure 1 is network structure mainly consists ofthree parts the feature extraction layer FPN layer andprediction layer which are specifically described below

31 e Lightweighted Feature Extraction Layer e featureextraction layer is very important in building the networkstructure If the feature extraction layer is too large it mayget better deep features but it will also slow down the speedof the whole network For example in YOLOv3 the darknet-53 is used as the feature extraction layeris extraction layeris relatively slow in training and detection speed due to thedeep layer numbers In order to improve the presentednetwork with a lightweight feature extraction layer first thisnetwork adopts the Darknet-19 feature extraction layer ofYOLOv2 and the structure is shown in the left of Figure 1is feature extraction layer has the advantage of relativelyfew network layers and faster calculation speed and can alsoextract deep features well when inputting a color ship imageor video of 416times 416times 3 size

In addition with the increase of the feature extractionlayer numbers the network generally can obtain deeperfeatures with a more expressive power However simplyincreasing the number of network layers will result in agradient dispersion or explosion phenomena In order tosolve this problem in the later experiment a batch nor-malization strategy is added between the convolution(Conv2d) and activation (Leaky-Relu) of each convolutionoperation in the Darknet-19 feature extraction layer which

is shown in Figure 2 is strategy can effectively control thegradient problem caused by the network deepening

32eNew FPN Layer with a Clustering Algorithm For thefeature extraction the feature information in shallow layer isrelatively small but its location is accurate is has theadvantage for predicting small objects On the contrary thefeature information in deep layer is rich but its location isrelatively rough is is suitable for predicting large objects

us in order to make the network obtain a betterdetection result the improved network promotes themultiscale prediction idea of YOLOv3 to design a new FPNlayer which is shown in the right of Figure 1 is methodup samples the deep feature map into 26times 26 size afterpredicting a deep feature map of 13times13 size from the featureextraction layer and then merges the upsampled 26times 26feature map with the shallow 26times 26 feature map Finallythe network can detect and forecast the input image at twoscales

In addition to get a better network structure theclustering algorithm is also used and the effect of the col-lected data is fine-tuned and optimized for the ship imagevideo detection and classification Finally the obtainedanchor values are shown in Table 1 which predicts thefeature maps of 13times13 and 26times 26 scales setting 5 differentanchor frames on each scale

erefore for a 416times 416 size image the improvednetwork predicts a total of 4225 fixed prediction framescompared with YOLOv3 which has 9 anchor frames on 3scales and 10647 fixed prediction frames in total Obviouslythe number of anchor frames in the improved network isreduced by 6422 that is about 60

33 e Prediction Layer rough the prediction on theconvolution layer the spatial information can be well pre-served For the improved network the prediction method ofYOLOv2 is adopted in the prediction layer Each predictingframe predicts 7 ship categories and 5 frame information(tx ty tw th to) of which the first four parameters are thedetecting object coordinates and to is the predicting con-fidence In this paper the loss function of YOLOv2 is alsoused in the prediction layer

34 e Optimization of the Activation Function for the Im-proved RDCNN In order to optimize the influence of theactivation function combined with the network structureproposed in this paper the ELU and Leak-Relu activationfunctions of equations (11) and (12) are also used and testedexcept for the commonly used Relu

ELU(x) ex minus 1 if xle 0

x if xgt 01113896 (11)

Leak Relu(x) 01x if xle 0

x if xgt 01113896 (12)

4 Complexity

rough the experimental comparison the activationfunction with the best ship imagevideo detection andclassification effect can be optimized e results on thetesting-set are obtained which is shown in Table 2

In the experiment the Leaky-Relu activation functionhas the best comprehensive detection effect and is lessoperable than the Relu and ELU activation functions usthe Leaky-Relu is selected as the optimized activationfunction

4 The Making of Ship Dataset andExperimental Environment

41 e Making of Ship Dataset At present the populartarget-detection datasets are VOC and COCO but thesedatasets classify ships as only one kind In a specific ap-plication it often needs to classify ships more preciselyerefore in this research the dataset of ship images is builtafter collecting and labeling by ourselves

e main way to collect the ship images is the InternetAs the images are found from the Internet the pixels res-olution are different and the size of the images are alsodifferent such as 500times 400times 3 and 500times 318times 3e imagescontaining the ships are cut roughly according to the lengthto width ratio of 1 1 e scale of ship proportion to the

whole image in each image is also different even verydifferent which can be seen from Figures 3ndash5 of the databaseimages or the detected images ese naturally producedimages of different specification and quality are moreconducive to the training effect and generalization abilityBefore training they were all resized to 416times 416times 3 sizeimages

After the dataset is collected it needs to be labeled beforeusing as the network input e labeling tool used in thispaper is LabelIMG In the LabelIMG the target object can beselected in the image with a rectangle box and be saved witha label en a file with the suffix of xml can be got is filecontains the path name resolution of the original image aswell as the coordinates and name information of the targetobject in the image

ere are many types of ships in real application Inorder to facilitate research and save costs this paper onlycollects 7 representative types of ships the sailing shipcontainer ship yacht cruise ship ferry coast guard ship andfishing boat After filtering and classification the finaldataset size is 4200 manual-selected images which includes600 images in each categorye 480 images in each categoryare randomly selected as the training-set and eachremaining 120 images are set as the testing-set In this waythe total size of the training-set is 3360 images and the totalsize of the testing-set is 840 images e typical images ofeach category in the dataset are shown in Figure 3

42 e Experimental Environment Configuration e ex-perimental environment of this research is configured asfollows e CPU Intel i7-7700 with 42GHz main fre-quency the memory 16G the GPU two of Nvidia GTX1080Ti the operating system Ubantu 1604 In order to make fulluse of the GPU to accelerate the network training the

PredictionlayerFeature extraction layers

FPN layer

ConvolutionMax pooling Upsamping

416 times 416

208 times 208

10 times 104 52 times 5226 times 26 13 times 13

26 times 26

13 times 13

Reorg

Figure 1 e network structure of the improved RDCNN by promoting YOLOv2 and v3

Batchnormalization

Convolutional

Leaky-ReluConv2d

Figure 2 e addition of batch normalization into the Darknet-19and other convolution layers

Table 1 e anchor values of different scales proper for the shipimagevideo detection and classification

ID 13times13 26times 261 (134221 152115) (134221 152115)2 (342217 430406) (342217 430406)3 (505587 809892) (505587 809892)4 (976561 430316) (976561 430316)5 (1152521 111835) (1152521 111835)

Table 2 e experiment effects of different activation functions

Activation function AIOU mAP RecallRelu 07247 07932 09492ELU 07306 08005 09559Leaky-Relu 07329 07987 09616

Complexity 5

CUDA 90 and its matching CUDNN are installed in thesystem In addition the OpenCV34 is also installed in theenvironment to display the results of the network detectionand classification

During the experiment the setting of the experimentalparameters is very important ere are many parameters tobe set in our improved RDCNN and YOLOv2v3 such as thebatch number down sampling size momentum parameterand learning rate e setting of these parameters will affectnot only the normal operation of the network but also thetraining effect For example when the setting number of thebatch is too large the network will not run if the memory ofthe workstation is not big enough

Considering the conditions of our experimental envi-ronment and also for comparing convenience the sameparameters are set for the improved RDCNN and YOLOv2v3e network parameters are set as follows the number ofsmall batch is 64 and divided into 8 sub-batches the iterationnumber is 8000 the momentum parameter is 09 the weightattenuation is 00005 and the learning rate is 0001 whichare shown in the following Table 3

5 The Training and DetectionBased on YOLOv2v3

51e IterativeConvergenceTraining Generally whether anetwork meets the training requirements is judged by the

convergence of the loss function In this experiment due tothe small size of the dataset and sufficient computing abilitythe convergence with only 8000 times of iterations isachieved which takes about only 1 hour and 40minuteseLoss and AIOU curves of the feedforward training processare shown in Figures 6 and 7 respectively It can be seenfrom Figures 6 and 7 that the training has converged steadilywhen the number of the network training reaches 8000times

e training time of YOLOv3 is relatively long and ittakes about 3 hours and 40 minutes for 8000 times of it-eration convergence process e Loss and AIOU curves ofthe feedforward training process are shown in Figures 8 and9 respectively It can also be seen from Figures 8 and 9 thatafter 8000 times of iterative training the Loss and AIOU ofthe network also have converged steadily

Finally the weight parameters obtained through 8000network iterations in the feedforward training are saved inthe experiment

52 e Detection Performance Testing After the networktraining is stable it is necessary to verify its detection effecton the testing-set especially to avoid a decline of the de-tection effect caused by overfitting First the network in-dexes obtained with the weights of No 8000 iteration underthe testing-set are taken as the evaluation criteria especific values are shown in Table 4

Figure 3 e dataset image demonstration the coast guard ship container ship cruise ship ferry fishing boat sailboat and yacht

Figure 4 e representative detection results of the improved RDCNN network method

(a) (b) (c)

Figure 5 e comparison of detection effect among the method of YOLOv2 (a) YOLOv3 (b) and the improved RDCNN (c)

6 Complexity

As the network cannot measure its weight parameters inreal time under the training-set during its feedforwardrunning the network parameters generated in the Nos 400600 800 1000 2000 3000 4000 5000 6000 7000 and 8000training iterations are also taken here to load into thenetwork for a later test and verification In order to betteranalyze the detection effect of YOLOv2v3 in this task theAIOU and mAP parameters of the network are compared indifferent testing iterations under the testing-set which areshown in Figures 10 and 11

From theAIOU andmAP curves on the testing-set it canbeen seen that the performance indexes of the network onthe testing-set have been stable ere is also no overfitting

phenomenon caused by too many training times roughcomparison we can see that YOLOv3 as an improvedversion of YOLOv2 has advantages in the AIOU and mAPperformance indexes at is it has 00057 higher in theAIOU and 00115 higher in the mAP than that of YOLOv2However as the advantages of YOLOv3 are obtained bydeepening and improving its network structure its detectionspeed is 49 FPS lower than that of YOLOv2

6 The Experiment and Analysis of theImproved RDCNN

61 e Network Performance Experiment e improvedRDCNN takes 20 more minutes to complete the 8000training iterations convergence process compared withYOLOv2 However the training time is much lower thanthat of YOLOv3 e Loss and AIOU curves of the feed-forward training process are shown in Figures 12 and 13respectively It can also be seen from Figures 12 and 13 thatafter 8000 times of iterative training the Loss and AIOU ofthe improved network have been converged steadily

In order to verify the detection effect of the RDCNN onthe testing-set the network weight parameters generated inthe Nos 400 600 800 1000 2000 3000 4000 5000 60007000 and 8000 training iterations are taken here to load intothe improved network for a later test and verification enthe AIOU and mAP parameter curves under the testing-setare tested in different testing iterations of the network whichare shown in Figures 14 and 15 is paper applies the twoeditions of YOLO networks as well as the presented im-proved RDCNN based on YOLO into the ship imagevideodetection and classification us the comparing

Table 3 e same training parameters for our improved RDCNN and YOLOv2v3

Batch number Down sampling size Image size Momentum parameter Weight attenuation Learning rate Iteration number64 8 416times 416times 3 09 00005 0001 8000

0

10

20

0 2000 4000 6000 8000

Loss

Iteration

Figure 6 e iterative convergence process of the Loss curve forYOLOv2 network training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

U

Iteration

Figure 7 e iterative convergence process of the AIOU curve forYOLOv2 network training

0

5

10

0 2000 4000 6000 8000

Loss

Iteration

Figure 8 e iterative convergence process of the Loss curve forYOLOv3 network training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

U

Iteration

Figure 9 e iterative convergence process of the AIOU curve forYOLOv3 network training

Table 4 e comparison of performance indexes betweenYOLOv2v3 after 8000th iteration

Network AIOU mAP Recall FPS Training timeYOLOv2 07838 09165 09425 94ndash95 1 h 40minYOLOv3 07895 09280 09674 45ndash46 3 h 40min

Complexity 7

performance of the AIOU and mAP are also show in Fig-ures 14 and 15 for YOLOv2v3 and the improved RDCNNnetwork structuree comparison of the evaluation indexesof each network is also shown in Figure 16

According to the comparisons it can be seen that theimproved RDCNN network has surpassed YOLOv2 andYOLOv3 in the AIOU detection of positioning accuracyat is it is 00153 higher than that of YOLOv2 and 00096higher than that of YOLOv3 respectively in AIOU Inaddition the improved network is 00044 higher thanYOLOv2 in the mAP index Due to the simplified networkstructure the mAP index of the improved network is 00071lower than that of YOLOv3 but the detecting FPS index is 33higher than that of YOLOv3 erefore it can be concludedthat the overall effect of the improved network is better thanthat of YOLOv2v3 in the collected dataset of thisexperiment

erefore the experimental results show that the im-proved RDCNN network structure designed in this papersurpasses the two YOLO networks in three evaluationindexes

62 e Effect Demonstration of the Improved NetworkFor the testing-set the representative detection results of theimproved RDCNN network are shown in Figure 15 In orderto achieve a better network effect the weight parameters ofthe feature extraction layer extracted in the ImageNet [33]pretraining are loaded to train the improved RDCNN of this

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

IterationYOLOv2YOLOv3

Figure 10 e comparison of AIOU index between YOLOv2v3under the testing-set

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3

Figure 11 e comparison of mAP index between YOLOv2v3under the testing-set

0

10

20

0 2000 4000 6000 8000

Loss

Iteration

Figure 12 e iterative convergence process of the Loss curve forthe improved RDCNN training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

UIteration

Figure 13e iterative convergence process of the AIOU curve forthe improved RDCNN training

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

Iteration

YOLOv2YOLOv3The improved

Figure 14 e comparison of AIOU among YOLOv2v3 and theimproved RDCNN under the testing-set

8 Complexity

paper rough the test on the testing-set and the video thefinal results are shown in Table 5

It can be seen that the mAP index of the improvedRDCNN is slightly lower than that of YOLOv3 when usingthe pretraining weights However the other indicators are allbetter than that of YOLOv3 especially in the video detectionspeed of FPS

In order to better display the comparison of the networkeffects the YOLOv2v3 and improved RDCNN are used todetect a image with multiple fishing boats e represen-tative results of the detection effect of the three networks areshown in Figure 16 In this paper the improved networkaccurately detects more ships Obviously the presented

network in this paper has achieved a better result which fullyproves the effectiveness of the improved RDCNN network

63 Comparison with Other Intelligent Detection and Clas-sificationMethods e proposed method is also comparedwith other intelligent methods such as Fast RndashCNNFaster RndashCNN and SSD or compared with YOLOv2under different dataset image and hardware configura-tion e work in the early published IEEE Trans paper[32] is very similar to this paper then its experimentresults can be used for the comparison e comparingresults are shown in Table 6 e proposed method hasadvantage over other intelligent methods in precision and

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3The improved

Figure 15 e comparison of mAP among YOLOv2v3 and the improved RDCNN under the testing-set

0732907987

09616

0709207922

09156

07087 07525

09425

0010203040506070809

1

AIOU mAP Recall

Method in this paperYOLOv3YOLOv2

Figure 16 e comparison of the network performance indexes among YOLOv2v3 and the improved RDCNN

Table 5 e loading test results of the pretraining weights for the improved RDCNN

Network AIOU mAP Recall FPS Training timeProposed method 07991 09209 09818 78ndash80 2 h 0min

Complexity 9

speed that is mAP and FPS and it can also satisfy thedetection and classification requirement in video sceneHowever our dataset size is smaller than that of Shaorsquoswork and our hardware configuration is also weaker thanthat of Shaorsquos work

7 Discussion and Conclusions

In this paper the improved RDCNN network is presented toachieve the ship imagevideo detection and classificationtask is network does not need to extract features man-ually which improves the regressive CNN from four aspectsbased on the advantages of the current popular regressiondeep convolution networks especially YOLOv2v3 usthis network only needs the dataset of the ship images and asuccessful training

is paper makes a self-built dataset for the ship imagevideo detection and classification and the method based onan improved regressive deep CNN is researched e featureextraction layer is lightweighted A new FPN layer isredesigned A proper anchor frame and size suitable for theships are redesigned which reduces the number of anchorsby 60 compared with YOLOv3 e activation function isalso optimized with the Leaky-Relu After a successfultraining the method can complete the ship image detectiontask and can also be applied to the video detection After8000 times of iterative training the Loss and AIOU of theimproved RDCNN network have been converged steadily

e experiment on 7 types of ships shows that theproposed method is better in the ship imagevideo detectionand classification compared with the YOLO series networkse improved RDCNN network has surpassed YOLOv2v3in the AIOU detection of positioning accuracy at is itrsquos00153 higher than that of YOLOv2 and 00096 higher thanthat of YOLOv3 respectively in AIOU In addition theimproved network is 00044 higher than YOLOv2 in themAP index Due to the simplified network structure themAP index of the improved network is 00071 lower thanthat of YOLOv3 but the detecting FPS index is 33 higherthan that of YOLOv3erefore it can be concluded that theoverall effect of the improved network is better than that ofYOLOv2v3 in the collected dataset of this experiment

en this method can solve the problem of low rec-ognition rate and real-time performance for ship imagevideo detection and classification us this method pro-vides a highly accurate and real-time ship detection methodfor the intelligent port management and visual processing ofthe USV In addition the proposed regressive deep

convolutional network also has a better comprehensiveperformance than YOLOv2v3

e proposed method is also compared with FastRndashCNN Faster RndashCNN SSD or YOLOv2 etc under dif-ferent datasets and hardware configurations e resultsshow that the method has advantage in precision and speedand it can also satisfy the video scene However our datasetsize is smaller us the detection in a much larger datasetcan be the future work

Data Availability

e [SELF-BUILT SHIP DATASET and SIMULATION]data used to support the findings of this study are availablefrom the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

is work was supported by the NSFC Projects of Chinaunder Grant No 61403250 No 51779136 No 51509151

References

[1] S Fefilatyev D Goldgof M Shreve and C Lembke ldquoDe-tection and tracking of ships in open sea with rapidly movingbuoy-mounted camera systemrdquo Ocean Engineering vol 54no 1 pp 1ndash12 2012

[2] W T Chen K F Ji X W Xing H X Zou and H Sun ldquoShiprecognition in high resolution SAR imagery based on featureselectionrdquo in Proceedings of the IEEE International Conferenceon Computer Vision in Remote Sensing pp 301ndash305 XiamenChina December 2013

[3] G K Yuksel B Yalıtuna F Tartar O F C Adlı K Eker andO Yoruk ldquoShip recognition and classification using silhou-ettes extracted from optical imagesrdquo in Proceedings of theIEEE Signal Processing and Communication ApplicationConference pp 1617ndash1620 Zonguldak Turkey May 2016

[4] S Li Z Zhou B Wang and F Wu ldquoA novel inshore shipdetection via ship head classification and body boundarydeterminationrdquo IEEE Geoscience and Remote Sensing Lettersvol 13 no 12 pp 1920ndash1924 2016

[5] Y Zhang Q-Z Li and F-N Zang ldquoShip detection for visualmaritime surveillance from non-stationary platformsrdquo OceanEngineering vol 141 pp 53ndash63 2017

[6] K Eldhuset ldquoAn automatic ship and ship wake detectionsystem for spaceborne SAR images in coastal regionsrdquo IEEE

Table 6 e comparison with other intelligent detection and classification methods

Methods Ship category Dataset image Hardware Configuration IOU threshold mAP FPSFast R-CNN (VGG) 6 11126 Four titan xp gt05 0710 05Faster R-CNN (ZF) 6 11126 Four titan xp gt05 0892 15Faster R-CNN (VGG) 6 11126 Four titan xp gt05 0901 6SSD 6 11126 Four titan xp gt05 0794 7YOLOv2 6 11126 Four titan xp gt05 0830 83Shaorsquos method 6 11126 Four titan xp gt05 0874 49Proposed method 7 4200 Two GTX1080 Ti gt05 09209 78ndash80

10 Complexity

Transactions on Geoscience and Remote Sensing vol 34 no 4pp 1010ndash1019 1996

[7] H J Zhou X Y Li X H Peng Z ZWang and T Bo ldquoDetectship targets from satellite SAR imageryrdquo Journal of NationalUniversity of Defense Technology vol 21 no 1 pp 67ndash701999

[8] M T Rey A Drosopoulos and D Petrovic ldquoA searchprocedure for ships in radarsat imageryrdquo Defence ResearchEstablishment Ottawa Ottawa ON Canada Report No 13052013

[9] X Li and S Li ldquoe ship edge feature detection based on highand low threshold for remote sensing imagerdquo in Proceedingsof the 6th International Conference on Computer-Aided De-sign Manufacturing Modeling and Simulation Busan SouthKorea May 2018

[10] L C Yann B Leon B Yoshua and H Patrick ldquoGradient-based learning applied to document recognitionrdquo Proceedingsof the IEEE vol 86 no 11 pp 2278ndash2324 1998

[11] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of the International Conference on Neural Infor-mation Processing Systems Curran Associates Inc LakeTahoe NV USA pp 1097ndash1105 2012

[12] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2015 httpsarxivorgabs14091556

[13] C Szegedy W Liu Y Jia et al ldquoGoing deeper with convo-lutionsrdquo pp 1ndash9 2014 httpsarxivorgabs14094842

[14] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778IEEE Computer Society Las Vegas NV USA June 2016

[15] G Huang Z Liu K Q Weinberger and L van der MaatenldquoDensely connected convolutional networksrdquo pp 2261ndash22692017 httpsarxivorgabs160806993

[16] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic seg-mentationrdquo pp 580ndash587 2013 httpsarxivorgabs13112524

[17] R Girshick ldquoFast R-CNNrdquo 2015 httpsarxivorgabs150408083

[18] S Ren K He R Girshick and J Sun ldquoFaster R-CNN To-wards real-time object detection with region proposal net-worksrdquo IEEE Transactions on Pattern Analysis amp MachineIntelligence vol 39 no 6 pp 1137ndash1149 2015

[19] W Liu D Anguelov D Erhan C Szegedy and S Reed ldquoSSDsingle shot MultiBox detectorrdquo pp 21ndash37 2016 httpsarxivorgabs151202325

[20] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR) pp 779ndash788 IEEE Las Vegas NV USAJune 2016

[21] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 6517ndash6525 IEEE Hon-olulu HI USA July 2017

[22] J Redmon and A Farhadi ldquoYOLOv3 an incremental im-provementrdquo 2018 httpsarxivorgabs180402767

[23] M Kang K Ji X Leng and Z Lin ldquoContextual region-basedconvolutional neural network with multilayer fusion for sarship detectionrdquo Remote Sensing vol 9 no 860 pp 1ndash14 2017

[24] R Wang J Li Y Duan H Cao and Y Zhao ldquoStudy on thecombined application of CFAR and deep learning in ship

detectionrdquo Journal of the Indian Society of Remote Sensingvol 4 pp 1ndash9 2018

[25] Q Li L Mou Q Liu Y Wang and X X Zhu ldquoHSF-netmultiscale deep feature embedding for ship detection inoptical remote sensing imageryrdquo IEEE Transactions onGeoscience and Remote Sensing vol 56 no 12 pp 7147ndash71612018

[26] X Yang H Sun K Fu et al ldquoAutomatic ship detection inremote sensing images from Google Earth of complex scenesbased on multiscale rotation dense feature pyramid net-worksrdquo Remote Sensing vol 10 no 1 pp 132ndash145 2018

[27] L Gao Y He X Sun Xi Jia and B Zhang ldquoIncorporatingnegative sample training for ship detection based on deeplearningrdquo Sensors vol 19 no 684 pp 1ndash20 2019

[28] Z Lin K Ji X Leng and G Kuang ldquoSqueeze and excitationrank faster R-CNN for ship detection in SAR imagesrdquo IEEEGeoscience and Remote Sensing Letters vol 16 no 5pp 751ndash755 2019

[29] L Zhao X F Wang and Y T Yuan ldquoResearch on shiprecognition method based on deep convolutional neuralnetworkrdquo Ship Science and Technology vol 38 no 8pp 119ndash123 2016

[30] M Yang Y D Ruan L K Chen P Zhang and Q M ChenldquoNew video recognition algorithms for inland river shipsbased on faster R-CNNrdquo Journal of Beijing University of Postsand Telecommunications vol 40 no S1 pp 130ndash134 2017

[31] Z Shao W Wu Z Wang W Du and C Li ldquoSeaShips alarge-scale precisely annotated dataset for ship detectionrdquoIEEE Transactions on Multimedia vol 20 no 10 pp 2593ndash2604 2018

[32] Z Shao L Wang Z Wang W Du and W Wu ldquoSaliency-aware convolution neural network for ship detection insurveillance videordquo IEEE Transactions on Circuits and Systemsfor Video Technology vol 30 no 3 pp 781ndash794 2019

[33] O Russakovsky J Deng H Su et al ldquoImageNet large scalevisual recognition challengerdquo International Journal of Com-puter Vision vol 115 no 3 pp 211ndash252 2015

Complexity 11

Page 5: An Intelligent Ship Image/Video Detection and Classification …downloads.hindawi.com/journals/complexity/2020/1520872.pdf · 2020. 4. 9. · rough the experimental comparison, the

rough the experimental comparison the activationfunction with the best ship imagevideo detection andclassification effect can be optimized e results on thetesting-set are obtained which is shown in Table 2

In the experiment the Leaky-Relu activation functionhas the best comprehensive detection effect and is lessoperable than the Relu and ELU activation functions usthe Leaky-Relu is selected as the optimized activationfunction

4 The Making of Ship Dataset andExperimental Environment

41 e Making of Ship Dataset At present the populartarget-detection datasets are VOC and COCO but thesedatasets classify ships as only one kind In a specific ap-plication it often needs to classify ships more preciselyerefore in this research the dataset of ship images is builtafter collecting and labeling by ourselves

e main way to collect the ship images is the InternetAs the images are found from the Internet the pixels res-olution are different and the size of the images are alsodifferent such as 500times 400times 3 and 500times 318times 3e imagescontaining the ships are cut roughly according to the lengthto width ratio of 1 1 e scale of ship proportion to the

whole image in each image is also different even verydifferent which can be seen from Figures 3ndash5 of the databaseimages or the detected images ese naturally producedimages of different specification and quality are moreconducive to the training effect and generalization abilityBefore training they were all resized to 416times 416times 3 sizeimages

After the dataset is collected it needs to be labeled beforeusing as the network input e labeling tool used in thispaper is LabelIMG In the LabelIMG the target object can beselected in the image with a rectangle box and be saved witha label en a file with the suffix of xml can be got is filecontains the path name resolution of the original image aswell as the coordinates and name information of the targetobject in the image

ere are many types of ships in real application Inorder to facilitate research and save costs this paper onlycollects 7 representative types of ships the sailing shipcontainer ship yacht cruise ship ferry coast guard ship andfishing boat After filtering and classification the finaldataset size is 4200 manual-selected images which includes600 images in each categorye 480 images in each categoryare randomly selected as the training-set and eachremaining 120 images are set as the testing-set In this waythe total size of the training-set is 3360 images and the totalsize of the testing-set is 840 images e typical images ofeach category in the dataset are shown in Figure 3

42 e Experimental Environment Configuration e ex-perimental environment of this research is configured asfollows e CPU Intel i7-7700 with 42GHz main fre-quency the memory 16G the GPU two of Nvidia GTX1080Ti the operating system Ubantu 1604 In order to make fulluse of the GPU to accelerate the network training the

PredictionlayerFeature extraction layers

FPN layer

ConvolutionMax pooling Upsamping

416 times 416

208 times 208

10 times 104 52 times 5226 times 26 13 times 13

26 times 26

13 times 13

Reorg

Figure 1 e network structure of the improved RDCNN by promoting YOLOv2 and v3

Batchnormalization

Convolutional

Leaky-ReluConv2d

Figure 2 e addition of batch normalization into the Darknet-19and other convolution layers

Table 1 e anchor values of different scales proper for the shipimagevideo detection and classification

ID 13times13 26times 261 (134221 152115) (134221 152115)2 (342217 430406) (342217 430406)3 (505587 809892) (505587 809892)4 (976561 430316) (976561 430316)5 (1152521 111835) (1152521 111835)

Table 2 e experiment effects of different activation functions

Activation function AIOU mAP RecallRelu 07247 07932 09492ELU 07306 08005 09559Leaky-Relu 07329 07987 09616

Complexity 5

CUDA 90 and its matching CUDNN are installed in thesystem In addition the OpenCV34 is also installed in theenvironment to display the results of the network detectionand classification

During the experiment the setting of the experimentalparameters is very important ere are many parameters tobe set in our improved RDCNN and YOLOv2v3 such as thebatch number down sampling size momentum parameterand learning rate e setting of these parameters will affectnot only the normal operation of the network but also thetraining effect For example when the setting number of thebatch is too large the network will not run if the memory ofthe workstation is not big enough

Considering the conditions of our experimental envi-ronment and also for comparing convenience the sameparameters are set for the improved RDCNN and YOLOv2v3e network parameters are set as follows the number ofsmall batch is 64 and divided into 8 sub-batches the iterationnumber is 8000 the momentum parameter is 09 the weightattenuation is 00005 and the learning rate is 0001 whichare shown in the following Table 3

5 The Training and DetectionBased on YOLOv2v3

51e IterativeConvergenceTraining Generally whether anetwork meets the training requirements is judged by the

convergence of the loss function In this experiment due tothe small size of the dataset and sufficient computing abilitythe convergence with only 8000 times of iterations isachieved which takes about only 1 hour and 40minuteseLoss and AIOU curves of the feedforward training processare shown in Figures 6 and 7 respectively It can be seenfrom Figures 6 and 7 that the training has converged steadilywhen the number of the network training reaches 8000times

e training time of YOLOv3 is relatively long and ittakes about 3 hours and 40 minutes for 8000 times of it-eration convergence process e Loss and AIOU curves ofthe feedforward training process are shown in Figures 8 and9 respectively It can also be seen from Figures 8 and 9 thatafter 8000 times of iterative training the Loss and AIOU ofthe network also have converged steadily

Finally the weight parameters obtained through 8000network iterations in the feedforward training are saved inthe experiment

52 e Detection Performance Testing After the networktraining is stable it is necessary to verify its detection effecton the testing-set especially to avoid a decline of the de-tection effect caused by overfitting First the network in-dexes obtained with the weights of No 8000 iteration underthe testing-set are taken as the evaluation criteria especific values are shown in Table 4

Figure 3 e dataset image demonstration the coast guard ship container ship cruise ship ferry fishing boat sailboat and yacht

Figure 4 e representative detection results of the improved RDCNN network method

(a) (b) (c)

Figure 5 e comparison of detection effect among the method of YOLOv2 (a) YOLOv3 (b) and the improved RDCNN (c)

6 Complexity

As the network cannot measure its weight parameters inreal time under the training-set during its feedforwardrunning the network parameters generated in the Nos 400600 800 1000 2000 3000 4000 5000 6000 7000 and 8000training iterations are also taken here to load into thenetwork for a later test and verification In order to betteranalyze the detection effect of YOLOv2v3 in this task theAIOU and mAP parameters of the network are compared indifferent testing iterations under the testing-set which areshown in Figures 10 and 11

From theAIOU andmAP curves on the testing-set it canbeen seen that the performance indexes of the network onthe testing-set have been stable ere is also no overfitting

phenomenon caused by too many training times roughcomparison we can see that YOLOv3 as an improvedversion of YOLOv2 has advantages in the AIOU and mAPperformance indexes at is it has 00057 higher in theAIOU and 00115 higher in the mAP than that of YOLOv2However as the advantages of YOLOv3 are obtained bydeepening and improving its network structure its detectionspeed is 49 FPS lower than that of YOLOv2

6 The Experiment and Analysis of theImproved RDCNN

61 e Network Performance Experiment e improvedRDCNN takes 20 more minutes to complete the 8000training iterations convergence process compared withYOLOv2 However the training time is much lower thanthat of YOLOv3 e Loss and AIOU curves of the feed-forward training process are shown in Figures 12 and 13respectively It can also be seen from Figures 12 and 13 thatafter 8000 times of iterative training the Loss and AIOU ofthe improved network have been converged steadily

In order to verify the detection effect of the RDCNN onthe testing-set the network weight parameters generated inthe Nos 400 600 800 1000 2000 3000 4000 5000 60007000 and 8000 training iterations are taken here to load intothe improved network for a later test and verification enthe AIOU and mAP parameter curves under the testing-setare tested in different testing iterations of the network whichare shown in Figures 14 and 15 is paper applies the twoeditions of YOLO networks as well as the presented im-proved RDCNN based on YOLO into the ship imagevideodetection and classification us the comparing

Table 3 e same training parameters for our improved RDCNN and YOLOv2v3

Batch number Down sampling size Image size Momentum parameter Weight attenuation Learning rate Iteration number64 8 416times 416times 3 09 00005 0001 8000

0

10

20

0 2000 4000 6000 8000

Loss

Iteration

Figure 6 e iterative convergence process of the Loss curve forYOLOv2 network training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

U

Iteration

Figure 7 e iterative convergence process of the AIOU curve forYOLOv2 network training

0

5

10

0 2000 4000 6000 8000

Loss

Iteration

Figure 8 e iterative convergence process of the Loss curve forYOLOv3 network training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

U

Iteration

Figure 9 e iterative convergence process of the AIOU curve forYOLOv3 network training

Table 4 e comparison of performance indexes betweenYOLOv2v3 after 8000th iteration

Network AIOU mAP Recall FPS Training timeYOLOv2 07838 09165 09425 94ndash95 1 h 40minYOLOv3 07895 09280 09674 45ndash46 3 h 40min

Complexity 7

performance of the AIOU and mAP are also show in Fig-ures 14 and 15 for YOLOv2v3 and the improved RDCNNnetwork structuree comparison of the evaluation indexesof each network is also shown in Figure 16

According to the comparisons it can be seen that theimproved RDCNN network has surpassed YOLOv2 andYOLOv3 in the AIOU detection of positioning accuracyat is it is 00153 higher than that of YOLOv2 and 00096higher than that of YOLOv3 respectively in AIOU Inaddition the improved network is 00044 higher thanYOLOv2 in the mAP index Due to the simplified networkstructure the mAP index of the improved network is 00071lower than that of YOLOv3 but the detecting FPS index is 33higher than that of YOLOv3 erefore it can be concludedthat the overall effect of the improved network is better thanthat of YOLOv2v3 in the collected dataset of thisexperiment

erefore the experimental results show that the im-proved RDCNN network structure designed in this papersurpasses the two YOLO networks in three evaluationindexes

62 e Effect Demonstration of the Improved NetworkFor the testing-set the representative detection results of theimproved RDCNN network are shown in Figure 15 In orderto achieve a better network effect the weight parameters ofthe feature extraction layer extracted in the ImageNet [33]pretraining are loaded to train the improved RDCNN of this

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

IterationYOLOv2YOLOv3

Figure 10 e comparison of AIOU index between YOLOv2v3under the testing-set

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3

Figure 11 e comparison of mAP index between YOLOv2v3under the testing-set

0

10

20

0 2000 4000 6000 8000

Loss

Iteration

Figure 12 e iterative convergence process of the Loss curve forthe improved RDCNN training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

UIteration

Figure 13e iterative convergence process of the AIOU curve forthe improved RDCNN training

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

Iteration

YOLOv2YOLOv3The improved

Figure 14 e comparison of AIOU among YOLOv2v3 and theimproved RDCNN under the testing-set

8 Complexity

paper rough the test on the testing-set and the video thefinal results are shown in Table 5

It can be seen that the mAP index of the improvedRDCNN is slightly lower than that of YOLOv3 when usingthe pretraining weights However the other indicators are allbetter than that of YOLOv3 especially in the video detectionspeed of FPS

In order to better display the comparison of the networkeffects the YOLOv2v3 and improved RDCNN are used todetect a image with multiple fishing boats e represen-tative results of the detection effect of the three networks areshown in Figure 16 In this paper the improved networkaccurately detects more ships Obviously the presented

network in this paper has achieved a better result which fullyproves the effectiveness of the improved RDCNN network

63 Comparison with Other Intelligent Detection and Clas-sificationMethods e proposed method is also comparedwith other intelligent methods such as Fast RndashCNNFaster RndashCNN and SSD or compared with YOLOv2under different dataset image and hardware configura-tion e work in the early published IEEE Trans paper[32] is very similar to this paper then its experimentresults can be used for the comparison e comparingresults are shown in Table 6 e proposed method hasadvantage over other intelligent methods in precision and

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3The improved

Figure 15 e comparison of mAP among YOLOv2v3 and the improved RDCNN under the testing-set

0732907987

09616

0709207922

09156

07087 07525

09425

0010203040506070809

1

AIOU mAP Recall

Method in this paperYOLOv3YOLOv2

Figure 16 e comparison of the network performance indexes among YOLOv2v3 and the improved RDCNN

Table 5 e loading test results of the pretraining weights for the improved RDCNN

Network AIOU mAP Recall FPS Training timeProposed method 07991 09209 09818 78ndash80 2 h 0min

Complexity 9

speed that is mAP and FPS and it can also satisfy thedetection and classification requirement in video sceneHowever our dataset size is smaller than that of Shaorsquoswork and our hardware configuration is also weaker thanthat of Shaorsquos work

7 Discussion and Conclusions

In this paper the improved RDCNN network is presented toachieve the ship imagevideo detection and classificationtask is network does not need to extract features man-ually which improves the regressive CNN from four aspectsbased on the advantages of the current popular regressiondeep convolution networks especially YOLOv2v3 usthis network only needs the dataset of the ship images and asuccessful training

is paper makes a self-built dataset for the ship imagevideo detection and classification and the method based onan improved regressive deep CNN is researched e featureextraction layer is lightweighted A new FPN layer isredesigned A proper anchor frame and size suitable for theships are redesigned which reduces the number of anchorsby 60 compared with YOLOv3 e activation function isalso optimized with the Leaky-Relu After a successfultraining the method can complete the ship image detectiontask and can also be applied to the video detection After8000 times of iterative training the Loss and AIOU of theimproved RDCNN network have been converged steadily

e experiment on 7 types of ships shows that theproposed method is better in the ship imagevideo detectionand classification compared with the YOLO series networkse improved RDCNN network has surpassed YOLOv2v3in the AIOU detection of positioning accuracy at is itrsquos00153 higher than that of YOLOv2 and 00096 higher thanthat of YOLOv3 respectively in AIOU In addition theimproved network is 00044 higher than YOLOv2 in themAP index Due to the simplified network structure themAP index of the improved network is 00071 lower thanthat of YOLOv3 but the detecting FPS index is 33 higherthan that of YOLOv3erefore it can be concluded that theoverall effect of the improved network is better than that ofYOLOv2v3 in the collected dataset of this experiment

en this method can solve the problem of low rec-ognition rate and real-time performance for ship imagevideo detection and classification us this method pro-vides a highly accurate and real-time ship detection methodfor the intelligent port management and visual processing ofthe USV In addition the proposed regressive deep

convolutional network also has a better comprehensiveperformance than YOLOv2v3

e proposed method is also compared with FastRndashCNN Faster RndashCNN SSD or YOLOv2 etc under dif-ferent datasets and hardware configurations e resultsshow that the method has advantage in precision and speedand it can also satisfy the video scene However our datasetsize is smaller us the detection in a much larger datasetcan be the future work

Data Availability

e [SELF-BUILT SHIP DATASET and SIMULATION]data used to support the findings of this study are availablefrom the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

is work was supported by the NSFC Projects of Chinaunder Grant No 61403250 No 51779136 No 51509151

References

[1] S. Fefilatyev, D. Goldgof, M. Shreve, and C. Lembke, "Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system," Ocean Engineering, vol. 54, no. 1, pp. 1–12, 2012.

[2] W. T. Chen, K. F. Ji, X. W. Xing, H. X. Zou, and H. Sun, "Ship recognition in high resolution SAR imagery based on feature selection," in Proceedings of the IEEE International Conference on Computer Vision in Remote Sensing, pp. 301–305, Xiamen, China, December 2013.

[3] G. K. Yuksel, B. Yalıtuna, F. Tartar, O. F. C. Adlı, K. Eker, and O. Yoruk, "Ship recognition and classification using silhouettes extracted from optical images," in Proceedings of the IEEE Signal Processing and Communication Application Conference, pp. 1617–1620, Zonguldak, Turkey, May 2016.

[4] S. Li, Z. Zhou, B. Wang, and F. Wu, "A novel inshore ship detection via ship head classification and body boundary determination," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 12, pp. 1920–1924, 2016.

[5] Y. Zhang, Q.-Z. Li, and F.-N. Zang, "Ship detection for visual maritime surveillance from non-stationary platforms," Ocean Engineering, vol. 141, pp. 53–63, 2017.

[6] K. Eldhuset, "An automatic ship and ship wake detection system for spaceborne SAR images in coastal regions," IEEE Transactions on Geoscience and Remote Sensing, vol. 34, no. 4, pp. 1010–1019, 1996.

[7] H. J. Zhou, X. Y. Li, X. H. Peng, Z. Z. Wang, and T. Bo, "Detect ship targets from satellite SAR imagery," Journal of National University of Defense Technology, vol. 21, no. 1, pp. 67–70, 1999.

[8] M. T. Rey, A. Drosopoulos, and D. Petrovic, "A search procedure for ships in radarsat imagery," Defence Research Establishment Ottawa, Ottawa, ON, Canada, Report No. 1305, 2013.

[9] X. Li and S. Li, "The ship edge feature detection based on high and low threshold for remote sensing image," in Proceedings of the 6th International Conference on Computer-Aided Design, Manufacturing, Modeling and Simulation, Busan, South Korea, May 2018.

[10] L. C. Yann, B. Leon, B. Yoshua, and H. Patrick, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the International Conference on Neural Information Processing Systems, Curran Associates Inc., Lake Tahoe, NV, USA, pp. 1097–1105, 2012.

[12] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2015, https://arxiv.org/abs/1409.1556.

[13] C. Szegedy, W. Liu, Y. Jia et al., "Going deeper with convolutions," pp. 1–9, 2014, https://arxiv.org/abs/1409.4842.

[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, IEEE Computer Society, Las Vegas, NV, USA, June 2016.

[15] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," pp. 2261–2269, 2017, https://arxiv.org/abs/1608.06993.

[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," pp. 580–587, 2013, https://arxiv.org/abs/1311.2524.

[17] R. Girshick, "Fast R-CNN," 2015, https://arxiv.org/abs/1504.08083.

[18] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.

[19] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, "SSD: single shot MultiBox detector," pp. 21–37, 2016, https://arxiv.org/abs/1512.02325.

[20] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, IEEE, Las Vegas, NV, USA, June 2016.

[21] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525, IEEE, Honolulu, HI, USA, July 2017.

[22] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.

[23] M. Kang, K. Ji, X. Leng, and Z. Lin, "Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection," Remote Sensing, vol. 9, no. 860, pp. 1–14, 2017.

[24] R. Wang, J. Li, Y. Duan, H. Cao, and Y. Zhao, "Study on the combined application of CFAR and deep learning in ship detection," Journal of the Indian Society of Remote Sensing, vol. 4, pp. 1–9, 2018.

[25] Q. Li, L. Mou, Q. Liu, Y. Wang, and X. X. Zhu, "HSF-Net: multiscale deep feature embedding for ship detection in optical remote sensing imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 12, pp. 7147–7161, 2018.

[26] X. Yang, H. Sun, K. Fu et al., "Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks," Remote Sensing, vol. 10, no. 1, pp. 132–145, 2018.

[27] L. Gao, Y. He, X. Sun, Xi Jia, and B. Zhang, "Incorporating negative sample training for ship detection based on deep learning," Sensors, vol. 19, no. 684, pp. 1–20, 2019.

[28] Z. Lin, K. Ji, X. Leng, and G. Kuang, "Squeeze and excitation rank faster R-CNN for ship detection in SAR images," IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 5, pp. 751–755, 2019.

[29] L. Zhao, X. F. Wang, and Y. T. Yuan, "Research on ship recognition method based on deep convolutional neural network," Ship Science and Technology, vol. 38, no. 8, pp. 119–123, 2016.

[30] M. Yang, Y. D. Ruan, L. K. Chen, P. Zhang, and Q. M. Chen, "New video recognition algorithms for inland river ships based on faster R-CNN," Journal of Beijing University of Posts and Telecommunications, vol. 40, no. S1, pp. 130–134, 2017.

[31] Z. Shao, W. Wu, Z. Wang, W. Du, and C. Li, "SeaShips: a large-scale precisely annotated dataset for ship detection," IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2593–2604, 2018.

[32] Z. Shao, L. Wang, Z. Wang, W. Du, and W. Wu, "Saliency-aware convolution neural network for ship detection in surveillance video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 3, pp. 781–794, 2019.

[33] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.



Page 7: An Intelligent Ship Image/Video Detection and Classification …downloads.hindawi.com/journals/complexity/2020/1520872.pdf · 2020. 4. 9. · rough the experimental comparison, the

As the network cannot measure its weight parameters inreal time under the training-set during its feedforwardrunning the network parameters generated in the Nos 400600 800 1000 2000 3000 4000 5000 6000 7000 and 8000training iterations are also taken here to load into thenetwork for a later test and verification In order to betteranalyze the detection effect of YOLOv2v3 in this task theAIOU and mAP parameters of the network are compared indifferent testing iterations under the testing-set which areshown in Figures 10 and 11

From theAIOU andmAP curves on the testing-set it canbeen seen that the performance indexes of the network onthe testing-set have been stable ere is also no overfitting

phenomenon caused by too many training times roughcomparison we can see that YOLOv3 as an improvedversion of YOLOv2 has advantages in the AIOU and mAPperformance indexes at is it has 00057 higher in theAIOU and 00115 higher in the mAP than that of YOLOv2However as the advantages of YOLOv3 are obtained bydeepening and improving its network structure its detectionspeed is 49 FPS lower than that of YOLOv2

6 The Experiment and Analysis of theImproved RDCNN

61 e Network Performance Experiment e improvedRDCNN takes 20 more minutes to complete the 8000training iterations convergence process compared withYOLOv2 However the training time is much lower thanthat of YOLOv3 e Loss and AIOU curves of the feed-forward training process are shown in Figures 12 and 13respectively It can also be seen from Figures 12 and 13 thatafter 8000 times of iterative training the Loss and AIOU ofthe improved network have been converged steadily

In order to verify the detection effect of the RDCNN onthe testing-set the network weight parameters generated inthe Nos 400 600 800 1000 2000 3000 4000 5000 60007000 and 8000 training iterations are taken here to load intothe improved network for a later test and verification enthe AIOU and mAP parameter curves under the testing-setare tested in different testing iterations of the network whichare shown in Figures 14 and 15 is paper applies the twoeditions of YOLO networks as well as the presented im-proved RDCNN based on YOLO into the ship imagevideodetection and classification us the comparing

Table 3 e same training parameters for our improved RDCNN and YOLOv2v3

Batch number Down sampling size Image size Momentum parameter Weight attenuation Learning rate Iteration number64 8 416times 416times 3 09 00005 0001 8000

0

10

20

0 2000 4000 6000 8000

Loss

Iteration

Figure 6 e iterative convergence process of the Loss curve forYOLOv2 network training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

U

Iteration

Figure 7 e iterative convergence process of the AIOU curve forYOLOv2 network training

0

5

10

0 2000 4000 6000 8000

Loss

Iteration

Figure 8 e iterative convergence process of the Loss curve forYOLOv3 network training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

U

Iteration

Figure 9 e iterative convergence process of the AIOU curve forYOLOv3 network training

Table 4 e comparison of performance indexes betweenYOLOv2v3 after 8000th iteration

Network AIOU mAP Recall FPS Training timeYOLOv2 07838 09165 09425 94ndash95 1 h 40minYOLOv3 07895 09280 09674 45ndash46 3 h 40min

Complexity 7

performance of the AIOU and mAP are also show in Fig-ures 14 and 15 for YOLOv2v3 and the improved RDCNNnetwork structuree comparison of the evaluation indexesof each network is also shown in Figure 16

According to the comparisons it can be seen that theimproved RDCNN network has surpassed YOLOv2 andYOLOv3 in the AIOU detection of positioning accuracyat is it is 00153 higher than that of YOLOv2 and 00096higher than that of YOLOv3 respectively in AIOU Inaddition the improved network is 00044 higher thanYOLOv2 in the mAP index Due to the simplified networkstructure the mAP index of the improved network is 00071lower than that of YOLOv3 but the detecting FPS index is 33higher than that of YOLOv3 erefore it can be concludedthat the overall effect of the improved network is better thanthat of YOLOv2v3 in the collected dataset of thisexperiment

erefore the experimental results show that the im-proved RDCNN network structure designed in this papersurpasses the two YOLO networks in three evaluationindexes

62 e Effect Demonstration of the Improved NetworkFor the testing-set the representative detection results of theimproved RDCNN network are shown in Figure 15 In orderto achieve a better network effect the weight parameters ofthe feature extraction layer extracted in the ImageNet [33]pretraining are loaded to train the improved RDCNN of this

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

IterationYOLOv2YOLOv3

Figure 10 e comparison of AIOU index between YOLOv2v3under the testing-set

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3

Figure 11 e comparison of mAP index between YOLOv2v3under the testing-set

0

10

20

0 2000 4000 6000 8000

Loss

Iteration

Figure 12 e iterative convergence process of the Loss curve forthe improved RDCNN training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

UIteration

Figure 13e iterative convergence process of the AIOU curve forthe improved RDCNN training

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

Iteration

YOLOv2YOLOv3The improved

Figure 14 e comparison of AIOU among YOLOv2v3 and theimproved RDCNN under the testing-set

8 Complexity

paper rough the test on the testing-set and the video thefinal results are shown in Table 5

It can be seen that the mAP index of the improvedRDCNN is slightly lower than that of YOLOv3 when usingthe pretraining weights However the other indicators are allbetter than that of YOLOv3 especially in the video detectionspeed of FPS

In order to better display the comparison of the networkeffects the YOLOv2v3 and improved RDCNN are used todetect a image with multiple fishing boats e represen-tative results of the detection effect of the three networks areshown in Figure 16 In this paper the improved networkaccurately detects more ships Obviously the presented

network in this paper has achieved a better result which fullyproves the effectiveness of the improved RDCNN network

63 Comparison with Other Intelligent Detection and Clas-sificationMethods e proposed method is also comparedwith other intelligent methods such as Fast RndashCNNFaster RndashCNN and SSD or compared with YOLOv2under different dataset image and hardware configura-tion e work in the early published IEEE Trans paper[32] is very similar to this paper then its experimentresults can be used for the comparison e comparingresults are shown in Table 6 e proposed method hasadvantage over other intelligent methods in precision and

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3The improved

Figure 15 e comparison of mAP among YOLOv2v3 and the improved RDCNN under the testing-set

0732907987

09616

0709207922

09156

07087 07525

09425

0010203040506070809

1

AIOU mAP Recall

Method in this paperYOLOv3YOLOv2

Figure 16 e comparison of the network performance indexes among YOLOv2v3 and the improved RDCNN

Table 5 e loading test results of the pretraining weights for the improved RDCNN

Network AIOU mAP Recall FPS Training timeProposed method 07991 09209 09818 78ndash80 2 h 0min

Complexity 9

speed that is mAP and FPS and it can also satisfy thedetection and classification requirement in video sceneHowever our dataset size is smaller than that of Shaorsquoswork and our hardware configuration is also weaker thanthat of Shaorsquos work

7 Discussion and Conclusions

In this paper the improved RDCNN network is presented toachieve the ship imagevideo detection and classificationtask is network does not need to extract features man-ually which improves the regressive CNN from four aspectsbased on the advantages of the current popular regressiondeep convolution networks especially YOLOv2v3 usthis network only needs the dataset of the ship images and asuccessful training

is paper makes a self-built dataset for the ship imagevideo detection and classification and the method based onan improved regressive deep CNN is researched e featureextraction layer is lightweighted A new FPN layer isredesigned A proper anchor frame and size suitable for theships are redesigned which reduces the number of anchorsby 60 compared with YOLOv3 e activation function isalso optimized with the Leaky-Relu After a successfultraining the method can complete the ship image detectiontask and can also be applied to the video detection After8000 times of iterative training the Loss and AIOU of theimproved RDCNN network have been converged steadily

e experiment on 7 types of ships shows that theproposed method is better in the ship imagevideo detectionand classification compared with the YOLO series networkse improved RDCNN network has surpassed YOLOv2v3in the AIOU detection of positioning accuracy at is itrsquos00153 higher than that of YOLOv2 and 00096 higher thanthat of YOLOv3 respectively in AIOU In addition theimproved network is 00044 higher than YOLOv2 in themAP index Due to the simplified network structure themAP index of the improved network is 00071 lower thanthat of YOLOv3 but the detecting FPS index is 33 higherthan that of YOLOv3erefore it can be concluded that theoverall effect of the improved network is better than that ofYOLOv2v3 in the collected dataset of this experiment

en this method can solve the problem of low rec-ognition rate and real-time performance for ship imagevideo detection and classification us this method pro-vides a highly accurate and real-time ship detection methodfor the intelligent port management and visual processing ofthe USV In addition the proposed regressive deep

convolutional network also has a better comprehensiveperformance than YOLOv2v3

e proposed method is also compared with FastRndashCNN Faster RndashCNN SSD or YOLOv2 etc under dif-ferent datasets and hardware configurations e resultsshow that the method has advantage in precision and speedand it can also satisfy the video scene However our datasetsize is smaller us the detection in a much larger datasetcan be the future work

Data Availability

e [SELF-BUILT SHIP DATASET and SIMULATION]data used to support the findings of this study are availablefrom the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

is work was supported by the NSFC Projects of Chinaunder Grant No 61403250 No 51779136 No 51509151

References

[1] S Fefilatyev D Goldgof M Shreve and C Lembke ldquoDe-tection and tracking of ships in open sea with rapidly movingbuoy-mounted camera systemrdquo Ocean Engineering vol 54no 1 pp 1ndash12 2012

[2] W T Chen K F Ji X W Xing H X Zou and H Sun ldquoShiprecognition in high resolution SAR imagery based on featureselectionrdquo in Proceedings of the IEEE International Conferenceon Computer Vision in Remote Sensing pp 301ndash305 XiamenChina December 2013

[3] G K Yuksel B Yalıtuna F Tartar O F C Adlı K Eker andO Yoruk ldquoShip recognition and classification using silhou-ettes extracted from optical imagesrdquo in Proceedings of theIEEE Signal Processing and Communication ApplicationConference pp 1617ndash1620 Zonguldak Turkey May 2016

[4] S Li Z Zhou B Wang and F Wu ldquoA novel inshore shipdetection via ship head classification and body boundarydeterminationrdquo IEEE Geoscience and Remote Sensing Lettersvol 13 no 12 pp 1920ndash1924 2016

[5] Y Zhang Q-Z Li and F-N Zang ldquoShip detection for visualmaritime surveillance from non-stationary platformsrdquo OceanEngineering vol 141 pp 53ndash63 2017

[6] K Eldhuset ldquoAn automatic ship and ship wake detectionsystem for spaceborne SAR images in coastal regionsrdquo IEEE

Table 6 e comparison with other intelligent detection and classification methods

Methods Ship category Dataset image Hardware Configuration IOU threshold mAP FPSFast R-CNN (VGG) 6 11126 Four titan xp gt05 0710 05Faster R-CNN (ZF) 6 11126 Four titan xp gt05 0892 15Faster R-CNN (VGG) 6 11126 Four titan xp gt05 0901 6SSD 6 11126 Four titan xp gt05 0794 7YOLOv2 6 11126 Four titan xp gt05 0830 83Shaorsquos method 6 11126 Four titan xp gt05 0874 49Proposed method 7 4200 Two GTX1080 Ti gt05 09209 78ndash80

10 Complexity

Transactions on Geoscience and Remote Sensing vol 34 no 4pp 1010ndash1019 1996

[7] H J Zhou X Y Li X H Peng Z ZWang and T Bo ldquoDetectship targets from satellite SAR imageryrdquo Journal of NationalUniversity of Defense Technology vol 21 no 1 pp 67ndash701999

[8] M T Rey A Drosopoulos and D Petrovic ldquoA searchprocedure for ships in radarsat imageryrdquo Defence ResearchEstablishment Ottawa Ottawa ON Canada Report No 13052013

[9] X Li and S Li ldquoe ship edge feature detection based on highand low threshold for remote sensing imagerdquo in Proceedingsof the 6th International Conference on Computer-Aided De-sign Manufacturing Modeling and Simulation Busan SouthKorea May 2018

[10] L C Yann B Leon B Yoshua and H Patrick ldquoGradient-based learning applied to document recognitionrdquo Proceedingsof the IEEE vol 86 no 11 pp 2278ndash2324 1998

[11] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of the International Conference on Neural Infor-mation Processing Systems Curran Associates Inc LakeTahoe NV USA pp 1097ndash1105 2012

[12] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2015 httpsarxivorgabs14091556

[13] C Szegedy W Liu Y Jia et al ldquoGoing deeper with convo-lutionsrdquo pp 1ndash9 2014 httpsarxivorgabs14094842

[14] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778IEEE Computer Society Las Vegas NV USA June 2016

[15] G Huang Z Liu K Q Weinberger and L van der MaatenldquoDensely connected convolutional networksrdquo pp 2261ndash22692017 httpsarxivorgabs160806993

[16] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic seg-mentationrdquo pp 580ndash587 2013 httpsarxivorgabs13112524

[17] R Girshick ldquoFast R-CNNrdquo 2015 httpsarxivorgabs150408083

[18] S Ren K He R Girshick and J Sun ldquoFaster R-CNN To-wards real-time object detection with region proposal net-worksrdquo IEEE Transactions on Pattern Analysis amp MachineIntelligence vol 39 no 6 pp 1137ndash1149 2015

[19] W Liu D Anguelov D Erhan C Szegedy and S Reed ldquoSSDsingle shot MultiBox detectorrdquo pp 21ndash37 2016 httpsarxivorgabs151202325

[20] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR) pp 779ndash788 IEEE Las Vegas NV USAJune 2016

[21] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 6517ndash6525 IEEE Hon-olulu HI USA July 2017

[22] J Redmon and A Farhadi ldquoYOLOv3 an incremental im-provementrdquo 2018 httpsarxivorgabs180402767

[23] M Kang K Ji X Leng and Z Lin ldquoContextual region-basedconvolutional neural network with multilayer fusion for sarship detectionrdquo Remote Sensing vol 9 no 860 pp 1ndash14 2017

[24] R Wang J Li Y Duan H Cao and Y Zhao ldquoStudy on thecombined application of CFAR and deep learning in ship

detectionrdquo Journal of the Indian Society of Remote Sensingvol 4 pp 1ndash9 2018

[25] Q Li L Mou Q Liu Y Wang and X X Zhu ldquoHSF-netmultiscale deep feature embedding for ship detection inoptical remote sensing imageryrdquo IEEE Transactions onGeoscience and Remote Sensing vol 56 no 12 pp 7147ndash71612018

[26] X Yang H Sun K Fu et al ldquoAutomatic ship detection inremote sensing images from Google Earth of complex scenesbased on multiscale rotation dense feature pyramid net-worksrdquo Remote Sensing vol 10 no 1 pp 132ndash145 2018

[27] L Gao Y He X Sun Xi Jia and B Zhang ldquoIncorporatingnegative sample training for ship detection based on deeplearningrdquo Sensors vol 19 no 684 pp 1ndash20 2019

[28] Z Lin K Ji X Leng and G Kuang ldquoSqueeze and excitationrank faster R-CNN for ship detection in SAR imagesrdquo IEEEGeoscience and Remote Sensing Letters vol 16 no 5pp 751ndash755 2019

[29] L Zhao X F Wang and Y T Yuan ldquoResearch on shiprecognition method based on deep convolutional neuralnetworkrdquo Ship Science and Technology vol 38 no 8pp 119ndash123 2016

[30] M Yang Y D Ruan L K Chen P Zhang and Q M ChenldquoNew video recognition algorithms for inland river shipsbased on faster R-CNNrdquo Journal of Beijing University of Postsand Telecommunications vol 40 no S1 pp 130ndash134 2017

[31] Z Shao W Wu Z Wang W Du and C Li ldquoSeaShips alarge-scale precisely annotated dataset for ship detectionrdquoIEEE Transactions on Multimedia vol 20 no 10 pp 2593ndash2604 2018

[32] Z Shao L Wang Z Wang W Du and W Wu ldquoSaliency-aware convolution neural network for ship detection insurveillance videordquo IEEE Transactions on Circuits and Systemsfor Video Technology vol 30 no 3 pp 781ndash794 2019

[33] O Russakovsky J Deng H Su et al ldquoImageNet large scalevisual recognition challengerdquo International Journal of Com-puter Vision vol 115 no 3 pp 211ndash252 2015

Complexity 11

Page 8: An Intelligent Ship Image/Video Detection and Classification …downloads.hindawi.com/journals/complexity/2020/1520872.pdf · 2020. 4. 9. · rough the experimental comparison, the

performance of the AIOU and mAP are also show in Fig-ures 14 and 15 for YOLOv2v3 and the improved RDCNNnetwork structuree comparison of the evaluation indexesof each network is also shown in Figure 16

According to the comparisons it can be seen that theimproved RDCNN network has surpassed YOLOv2 andYOLOv3 in the AIOU detection of positioning accuracyat is it is 00153 higher than that of YOLOv2 and 00096higher than that of YOLOv3 respectively in AIOU Inaddition the improved network is 00044 higher thanYOLOv2 in the mAP index Due to the simplified networkstructure the mAP index of the improved network is 00071lower than that of YOLOv3 but the detecting FPS index is 33higher than that of YOLOv3 erefore it can be concludedthat the overall effect of the improved network is better thanthat of YOLOv2v3 in the collected dataset of thisexperiment

erefore the experimental results show that the im-proved RDCNN network structure designed in this papersurpasses the two YOLO networks in three evaluationindexes

62 e Effect Demonstration of the Improved NetworkFor the testing-set the representative detection results of theimproved RDCNN network are shown in Figure 15 In orderto achieve a better network effect the weight parameters ofthe feature extraction layer extracted in the ImageNet [33]pretraining are loaded to train the improved RDCNN of this

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

IterationYOLOv2YOLOv3

Figure 10 e comparison of AIOU index between YOLOv2v3under the testing-set

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3

Figure 11 e comparison of mAP index between YOLOv2v3under the testing-set

0

10

20

0 2000 4000 6000 8000

Loss

Iteration

Figure 12 e iterative convergence process of the Loss curve forthe improved RDCNN training

0

02

04

06

08

1

0 2000 4000 6000 8000

AIO

UIteration

Figure 13e iterative convergence process of the AIOU curve forthe improved RDCNN training

05

06

07

08

09

1

0 2000 4000 6000 8000

AIO

U

Iteration

YOLOv2YOLOv3The improved

Figure 14 e comparison of AIOU among YOLOv2v3 and theimproved RDCNN under the testing-set

8 Complexity

paper rough the test on the testing-set and the video thefinal results are shown in Table 5

It can be seen that the mAP index of the improvedRDCNN is slightly lower than that of YOLOv3 when usingthe pretraining weights However the other indicators are allbetter than that of YOLOv3 especially in the video detectionspeed of FPS

In order to better display the comparison of the networkeffects the YOLOv2v3 and improved RDCNN are used todetect a image with multiple fishing boats e represen-tative results of the detection effect of the three networks areshown in Figure 16 In this paper the improved networkaccurately detects more ships Obviously the presented

network in this paper has achieved a better result which fullyproves the effectiveness of the improved RDCNN network

63 Comparison with Other Intelligent Detection and Clas-sificationMethods e proposed method is also comparedwith other intelligent methods such as Fast RndashCNNFaster RndashCNN and SSD or compared with YOLOv2under different dataset image and hardware configura-tion e work in the early published IEEE Trans paper[32] is very similar to this paper then its experimentresults can be used for the comparison e comparingresults are shown in Table 6 e proposed method hasadvantage over other intelligent methods in precision and

07

08

09

1

1000 2000 3000 4000 5000 6000 7000 8000

MA

P

Iteration

YOLOv2YOLOv3The improved

Figure 15 e comparison of mAP among YOLOv2v3 and the improved RDCNN under the testing-set

0732907987

09616

0709207922

09156

07087 07525

09425

0010203040506070809

1

AIOU mAP Recall

Method in this paperYOLOv3YOLOv2

Figure 16 e comparison of the network performance indexes among YOLOv2v3 and the improved RDCNN

Table 5 e loading test results of the pretraining weights for the improved RDCNN

Network AIOU mAP Recall FPS Training timeProposed method 07991 09209 09818 78ndash80 2 h 0min

Complexity 9

speed that is mAP and FPS and it can also satisfy thedetection and classification requirement in video sceneHowever our dataset size is smaller than that of Shaorsquoswork and our hardware configuration is also weaker thanthat of Shaorsquos work

7 Discussion and Conclusions

In this paper the improved RDCNN network is presented toachieve the ship imagevideo detection and classificationtask is network does not need to extract features man-ually which improves the regressive CNN from four aspectsbased on the advantages of the current popular regressiondeep convolution networks especially YOLOv2v3 usthis network only needs the dataset of the ship images and asuccessful training

is paper makes a self-built dataset for the ship imagevideo detection and classification and the method based onan improved regressive deep CNN is researched e featureextraction layer is lightweighted A new FPN layer isredesigned A proper anchor frame and size suitable for theships are redesigned which reduces the number of anchorsby 60 compared with YOLOv3 e activation function isalso optimized with the Leaky-Relu After a successfultraining the method can complete the ship image detectiontask and can also be applied to the video detection After8000 times of iterative training the Loss and AIOU of theimproved RDCNN network have been converged steadily

e experiment on 7 types of ships shows that theproposed method is better in the ship imagevideo detectionand classification compared with the YOLO series networkse improved RDCNN network has surpassed YOLOv2v3in the AIOU detection of positioning accuracy at is itrsquos00153 higher than that of YOLOv2 and 00096 higher thanthat of YOLOv3 respectively in AIOU In addition theimproved network is 00044 higher than YOLOv2 in themAP index Due to the simplified network structure themAP index of the improved network is 00071 lower thanthat of YOLOv3 but the detecting FPS index is 33 higherthan that of YOLOv3erefore it can be concluded that theoverall effect of the improved network is better than that ofYOLOv2v3 in the collected dataset of this experiment

en this method can solve the problem of low rec-ognition rate and real-time performance for ship imagevideo detection and classification us this method pro-vides a highly accurate and real-time ship detection methodfor the intelligent port management and visual processing ofthe USV In addition the proposed regressive deep

convolutional network also has a better comprehensiveperformance than YOLOv2v3

e proposed method is also compared with FastRndashCNN Faster RndashCNN SSD or YOLOv2 etc under dif-ferent datasets and hardware configurations e resultsshow that the method has advantage in precision and speedand it can also satisfy the video scene However our datasetsize is smaller us the detection in a much larger datasetcan be the future work

Data Availability

e [SELF-BUILT SHIP DATASET and SIMULATION]data used to support the findings of this study are availablefrom the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

is work was supported by the NSFC Projects of Chinaunder Grant No 61403250 No 51779136 No 51509151

References

[1] S Fefilatyev D Goldgof M Shreve and C Lembke ldquoDe-tection and tracking of ships in open sea with rapidly movingbuoy-mounted camera systemrdquo Ocean Engineering vol 54no 1 pp 1ndash12 2012

[2] W T Chen K F Ji X W Xing H X Zou and H Sun ldquoShiprecognition in high resolution SAR imagery based on featureselectionrdquo in Proceedings of the IEEE International Conferenceon Computer Vision in Remote Sensing pp 301ndash305 XiamenChina December 2013

[3] G K Yuksel B Yalıtuna F Tartar O F C Adlı K Eker andO Yoruk ldquoShip recognition and classification using silhou-ettes extracted from optical imagesrdquo in Proceedings of theIEEE Signal Processing and Communication ApplicationConference pp 1617ndash1620 Zonguldak Turkey May 2016

[4] S Li Z Zhou B Wang and F Wu ldquoA novel inshore shipdetection via ship head classification and body boundarydeterminationrdquo IEEE Geoscience and Remote Sensing Lettersvol 13 no 12 pp 1920ndash1924 2016

[5] Y Zhang Q-Z Li and F-N Zang ldquoShip detection for visualmaritime surveillance from non-stationary platformsrdquo OceanEngineering vol 141 pp 53ndash63 2017

[6] K Eldhuset ldquoAn automatic ship and ship wake detectionsystem for spaceborne SAR images in coastal regionsrdquo IEEE

Table 6 e comparison with other intelligent detection and classification methods


Through the test on the testing-set and the video, the final results are shown in Table 5.

It can be seen that the mAP index of the improved RDCNN is slightly lower than that of YOLOv3 when using the pretraining weights. However, the other indicators are all better than those of YOLOv3, especially the video detection speed (FPS).

In order to better display the comparison of the network effects, YOLOv2/v3 and the improved RDCNN are used to detect an image with multiple fishing boats. The representative detection results of the three networks are shown in Figure 16. The improved network in this paper accurately detects more ships. Obviously, the presented network has achieved a better result, which fully proves the effectiveness of the improved RDCNN network.

6.3. Comparison with Other Intelligent Detection and Classification Methods. The proposed method is also compared with other intelligent methods, such as Fast R-CNN, Faster R-CNN, and SSD, or compared with YOLOv2 under different dataset images and hardware configurations. The work in the early published IEEE Trans. paper [32] is very similar to this paper, so its experiment results can be used for the comparison. The comparing results are shown in Table 6. The proposed method has advantage over other intelligent methods in precision and speed, that is, mAP and FPS, and it can also satisfy the detection and classification requirement in the video scene. However, our dataset size is smaller than that of Shao's work, and our hardware configuration is also weaker than that of Shao's work.

Figure 15: The comparison of mAP among YOLOv2/v3 and the improved RDCNN under the testing-set (mAP from 0.7 to 1.0 plotted against training iteration, 1000–8000; plot not reproduced here).

Figure 16: The comparison of the network performance indexes (AIOU, mAP, and Recall) among the method in this paper, YOLOv3, and YOLOv2 (bar chart not reproduced here; the reported bar values include 0.7329, 0.7987, 0.9616, 0.7092, 0.7922, 0.9156, 0.7087, 0.7525, and 0.9425, though their grouping is not recoverable from the extraction).

Table 5: The loading test results of the pretraining weights for the improved RDCNN.

Network | AIOU | mAP | Recall | FPS | Training time
Proposed method | 0.7991 | 0.9209 | 0.9818 | 78–80 | 2 h 0 min


Table 6: The comparison with other intelligent detection and classification methods.

Methods | Ship category | Dataset images | Hardware configuration | IOU threshold | mAP | FPS
Fast R-CNN (VGG) | 6 | 11126 | Four Titan Xp | >0.5 | 0.710 | 0.5
Faster R-CNN (ZF) | 6 | 11126 | Four Titan Xp | >0.5 | 0.892 | 15
Faster R-CNN (VGG) | 6 | 11126 | Four Titan Xp | >0.5 | 0.901 | 6
SSD | 6 | 11126 | Four Titan Xp | >0.5 | 0.794 | 7
YOLOv2 | 6 | 11126 | Four Titan Xp | >0.5 | 0.830 | 83
Shao's method | 6 | 11126 | Four Titan Xp | >0.5 | 0.874 | 49
Proposed method | 7 | 4200 | Two GTX 1080 Ti | >0.5 | 0.9209 | 78–80
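Because Table 6 compares throughput across very different hardware, the FPS figure is only meaningful when measured the same way on one's own machine. A rough, hedged sketch of such a measurement is given below; `detector` is a stand-in for whatever inference callable is used and is not part of this paper's code.

```python
import time

def measure_fps(detector, frames, warmup=10):
    """Rough video throughput: frames processed per second, excluding warm-up."""
    for frame in frames[:warmup]:      # warm-up so one-off setup costs are excluded
        detector(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        detector(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed
```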

7. Discussion and Conclusions

In this paper, the improved RDCNN network is presented to achieve the ship image/video detection and classification task. This network does not need to extract features manually; it improves the regressive CNN from four aspects, building on the advantages of the current popular regression deep convolution networks, especially YOLOv2/v3. Thus, this network only needs the dataset of ship images and a successful training.

This paper makes a self-built dataset for the ship image/video detection and classification, and the method based on an improved regressive deep CNN is researched. The feature extraction layer is lightweighted. A new FPN layer is redesigned. A proper anchor frame and size suitable for ships are redesigned by clustering the labelled box sizes (see the sketch below), which reduces the number of anchors by 60% compared with YOLOv3. The activation function is also optimized with the Leaky-ReLU. After a successful training, the method can complete the ship image detection task and can also be applied to video detection. After 8000 iterations of training, the Loss and AIOU of the improved RDCNN network converge steadily.
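The anchor redesign follows the usual YOLO practice of running k-means over the labelled box sizes with a 1 − IoU distance. The following is a minimal sketch of that technique under stated assumptions: the value k = 4 and all function names are illustrative, not the exact procedure or anchor count used in this paper.

```python
import random

def wh_iou(a, b):
    """IoU of two boxes aligned at a common centre, given only (width, height)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(wh_pairs, k=4, iters=100, seed=0):
    """Cluster labelled box sizes into k anchors using 1 - IoU as the distance."""
    random.seed(seed)
    centres = random.sample(wh_pairs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for wh in wh_pairs:
            dists = [1.0 - wh_iou(wh, c) for c in centres]
            clusters[dists.index(min(dists))].append(wh)
        # move each centre to the mean size of its cluster (keep it if empty)
        centres = [(sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
                   if c else centres[i] for i, c in enumerate(clusters)]
    return sorted(centres)
```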

The experiment on 7 types of ships shows that the proposed method is better in the ship image/video detection and classification compared with the YOLO series networks. The improved RDCNN network has surpassed YOLOv2/v3 in the positioning accuracy of AIOU; that is, its AIOU is 0.0153 higher than that of YOLOv2 and 0.0096 higher than that of YOLOv3, respectively. In addition, the improved network is 0.0044 higher than YOLOv2 in the mAP index. Due to the simplified network structure, the mAP index of the improved network is 0.0071 lower than that of YOLOv3, but the detecting FPS index is 33 higher than that of YOLOv3. Therefore, it can be concluded that the overall effect of the improved network is better than that of YOLOv2/v3 on the collected dataset of this experiment.

Then, this method can solve the problem of low recognition rate and poor real-time performance for ship image/video detection and classification. Thus, this method provides a highly accurate and real-time ship detection method for intelligent port management and the visual processing of the USV. In addition, the proposed regressive deep convolutional network also has a better comprehensive performance than YOLOv2/v3.

The proposed method is also compared with Fast R-CNN, Faster R-CNN, SSD, YOLOv2, etc., under different datasets and hardware configurations. The results show that the method has an advantage in precision and speed, and it can also satisfy the video scene. However, our dataset size is smaller; thus, detection on a much larger dataset can be the future work.

Data Availability

The [SELF-BUILT SHIP DATASET and SIMULATION] data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the NSFC Projects of China under Grants No. 61403250, No. 51779136, and No. 51509151.

References

[1] S. Fefilatyev, D. Goldgof, M. Shreve, and C. Lembke, "Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system," Ocean Engineering, vol. 54, no. 1, pp. 1–12, 2012.

[2] W. T. Chen, K. F. Ji, X. W. Xing, H. X. Zou, and H. Sun, "Ship recognition in high resolution SAR imagery based on feature selection," in Proceedings of the IEEE International Conference on Computer Vision in Remote Sensing, pp. 301–305, Xiamen, China, December 2013.

[3] G. K. Yuksel, B. Yalıtuna, F. Tartar, O. F. C. Adlı, K. Eker, and O. Yoruk, "Ship recognition and classification using silhouettes extracted from optical images," in Proceedings of the IEEE Signal Processing and Communication Application Conference, pp. 1617–1620, Zonguldak, Turkey, May 2016.

[4] S. Li, Z. Zhou, B. Wang, and F. Wu, "A novel inshore ship detection via ship head classification and body boundary determination," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 12, pp. 1920–1924, 2016.

[5] Y. Zhang, Q.-Z. Li, and F.-N. Zang, "Ship detection for visual maritime surveillance from non-stationary platforms," Ocean Engineering, vol. 141, pp. 53–63, 2017.

[6] K. Eldhuset, "An automatic ship and ship wake detection system for spaceborne SAR images in coastal regions," IEEE Transactions on Geoscience and Remote Sensing, vol. 34, no. 4, pp. 1010–1019, 1996.


[7] H. J. Zhou, X. Y. Li, X. H. Peng, Z. Z. Wang, and T. Bo, "Detect ship targets from satellite SAR imagery," Journal of National University of Defense Technology, vol. 21, no. 1, pp. 67–70, 1999.

[8] M. T. Rey, A. Drosopoulos, and D. Petrovic, "A search procedure for ships in radarsat imagery," Defence Research Establishment Ottawa, Ottawa, ON, Canada, Report No. 1305, 2013.

[9] X. Li and S. Li, "The ship edge feature detection based on high and low threshold for remote sensing image," in Proceedings of the 6th International Conference on Computer-Aided Design, Manufacturing, Modeling and Simulation, Busan, South Korea, May 2018.

[10] L. C. Yann, B. Leon, B. Yoshua, and H. Patrick, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the International Conference on Neural Information Processing Systems, Curran Associates Inc., Lake Tahoe, NV, USA, pp. 1097–1105, 2012.

[12] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2015, https://arxiv.org/abs/1409.1556.

[13] C. Szegedy, W. Liu, Y. Jia et al., "Going deeper with convolutions," pp. 1–9, 2014, https://arxiv.org/abs/1409.4842.

[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, IEEE Computer Society, Las Vegas, NV, USA, June 2016.

[15] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," pp. 2261–2269, 2017, https://arxiv.org/abs/1608.06993.

[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," pp. 580–587, 2013, https://arxiv.org/abs/1311.2524.

[17] R. Girshick, "Fast R-CNN," 2015, https://arxiv.org/abs/1504.08083.

[18] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.

[19] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, "SSD: single shot MultiBox detector," pp. 21–37, 2016, https://arxiv.org/abs/1512.02325.

[20] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, IEEE, Las Vegas, NV, USA, June 2016.

[21] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525, IEEE, Honolulu, HI, USA, July 2017.

[22] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.

[23] M. Kang, K. Ji, X. Leng, and Z. Lin, "Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection," Remote Sensing, vol. 9, no. 860, pp. 1–14, 2017.

[24] R. Wang, J. Li, Y. Duan, H. Cao, and Y. Zhao, "Study on the combined application of CFAR and deep learning in ship detection," Journal of the Indian Society of Remote Sensing, vol. 4, pp. 1–9, 2018.

[25] Q. Li, L. Mou, Q. Liu, Y. Wang, and X. X. Zhu, "HSF-net: multiscale deep feature embedding for ship detection in optical remote sensing imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 12, pp. 7147–7161, 2018.

[26] X. Yang, H. Sun, K. Fu et al., "Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks," Remote Sensing, vol. 10, no. 1, pp. 132–145, 2018.

[27] L. Gao, Y. He, X. Sun, Xi Jia, and B. Zhang, "Incorporating negative sample training for ship detection based on deep learning," Sensors, vol. 19, no. 684, pp. 1–20, 2019.

[28] Z. Lin, K. Ji, X. Leng, and G. Kuang, "Squeeze and excitation rank faster R-CNN for ship detection in SAR images," IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 5, pp. 751–755, 2019.

[29] L. Zhao, X. F. Wang, and Y. T. Yuan, "Research on ship recognition method based on deep convolutional neural network," Ship Science and Technology, vol. 38, no. 8, pp. 119–123, 2016.

[30] M. Yang, Y. D. Ruan, L. K. Chen, P. Zhang, and Q. M. Chen, "New video recognition algorithms for inland river ships based on faster R-CNN," Journal of Beijing University of Posts and Telecommunications, vol. 40, no. S1, pp. 130–134, 2017.

[31] Z. Shao, W. Wu, Z. Wang, W. Du, and C. Li, "SeaShips: a large-scale precisely annotated dataset for ship detection," IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2593–2604, 2018.

[32] Z. Shao, L. Wang, Z. Wang, W. Du, and W. Wu, "Saliency-aware convolution neural network for ship detection in surveillance video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 3, pp. 781–794, 2019.

[33] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
