
Kiwifruit detection in challenging conditions

Mahla Nejati1∗, Nicky Penhall1, Henry Williams1, Jamie Bell1, JongYoon Lim1, Ho Seok Ahn1, Bruce MacDonald1

1The Centre for Automation and Robotic Engineering Science, The University of Auckland, Auckland, New Zealand

[email protected]

Abstract

Accurate and reliable kiwifruit detection is one of the biggest challenges in developing a selective fruit harvesting robot. The vision system of an orchard robot faces difficulties such as dynamic lighting conditions and fruit occlusions. This paper presents a semantic segmentation approach with two novel image preprocessing techniques designed to detect kiwifruit under the harsh lighting conditions found in the canopy. The performance of the presented system is evaluated on a 3D real-world image set of kiwifruit under different lighting conditions (typical, glare, and overexposed). Alone, the semantic segmentation approach achieves an F1 score of 0.82 on the typical lighting image set, but struggles with harsh lighting, with an F1 score of 0.13. With the preprocessing techniques, the vision system improves to an F1 score of 0.42 under harsh lighting. With regard to the fruit occlusion challenge, the overall approach was found to be capable of detecting 87.0% of non-occluded and 30.0% of occluded kiwifruit across all lighting conditions.

1 Introduction

Kiwifruit growers face shortages of seasonal workers, and the high cost of human labour is a major burden on the industry. In addition, picking kiwifruit is a tedious and repetitive job, and workers need to carry a heavy picking bag, which increases the likelihood of back strain and musculoskeletal problems. From the perspective of the orchard owner, workers must be trained for picking and for health and safety in the orchard. To prevent hazards, owners must inspect orchards frequently [Safety, 2016], which adds additional costs.

Given the above problems, the kiwifruit industry would benefit from automation, especially around harvesting. An autonomous kiwifruit harvesting robot would compensate for the lack of workers, reduce labour costs, and increase fruit quality. The work presented here is part of a wider project to develop robots that drive through kiwifruit orchards while pollinating and harvesting kiwifruit [Williams et al., 2019a; Williams et al., 2019b; Williams et al., 2019c; Barnett et al., 2017].

Kiwifruit is a delicate fruit and needs to be picked gently. Kiwifruit orchards have a pergola structure, under which the kiwifruit grow downward. An example of a kiwifruit orchard is shown in figure 1. Bulk harvesting methods, like shakers, are not suitable if kiwifruit is to be sold in fresh markets, as they may cause bruising [Li et al., 2011]. Therefore, a selective fruit harvesting robot is required.

Figure 1: A kiwifruit orchard

Automatic fruit detection is a key component of a selective fruit harvesting robot. Work on automatic detection of fruit started in 1968 [Brown and K, 1968]; despite this, no commercial selective harvesting robot is available yet, and current performance is insufficient to compete with manual labour.

The main reasons for the lack of commercial harvesting robots and their poor performance are the low accuracy of fruit recognition methods and the inefficiency of localization methods [Zhao et al., 2016].


In 1968, Brown et al. [Brown and K, 1968] reported fruit/flower occlusion and uneven illumination conditions as critical problems. Still, after 50 years, many researchers cite these as major challenges to producing a commercially viable autonomous kiwifruit harvester.

The aim of this paper is to present the performance of kiwifruit detection using a convolutional neural network under unpredictable lighting conditions, and to propose preprocessing techniques that improve performance. To evaluate the proposed detection method against these challenges, two datasets comprising 295 images were collected: one for kiwifruit occlusion and one for variable lighting conditions. The remainder of the paper is organised as follows: section 2 describes related work on kiwifruit detection and solutions for various environmental conditions. In section 3, the detection approach is presented, which uses a fully convolutional neural network with the proposed preprocessing methods. The experimental setup and our findings are presented in section 4, the results are discussed in section 5, and conclusions and future directions are drawn in section 6.

2 Related Work

Existing methods employed for fruit detection fall into two groups: hand-engineered feature extraction and machine learning methods. Popular hand-engineered features are colour, shape, texture and fusions of features [Gongal et al., 2015]. Applying a threshold to a hand-engineered feature is one of the most basic approaches for identifying a piece of fruit. Fu et al. [Fu et al., 2015] applied Otsu's method to kiwifruit images in the R-G colour space. The result was a binary image on which morphological operations were used to reduce noise below a given detection threshold. From this binary image, edge detection and the Circle Hough Transform (CHT) were shown to be useful for extracting ellipses that indicated kiwifruit in the image. The authors showed, during evaluation, that the pixel-level detection rate was 88% on images with 12 different illumination levels. In subsequent work, the authors [Fu et al., 2017] tried to detect the calyx of the kiwifruit. After finding the boundary of the kiwifruit (using the above methods), the calyx was extracted using Otsu's method on the V channel of the HSV colour space. The correct detection rate, based on pixel-wise evaluation at different illumination levels, was reported as 94%. Despite the high detection rate, the proposed method depends on the cluster size and the calyx location on the fruit. Both studies captured kiwifruit data at night time. However, moisture on the skin of the kiwifruit causes problems in post-harvest processing, so kiwifruit are not harvested at night. Therefore, the proposed kiwifruit detection method should be able to work in the daytime and under uneven lighting conditions.
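For illustration, the following is a minimal Python/OpenCV sketch of this kind of classical pipeline. It is an interpretation of the steps Fu et al. describe, not their original code; the morphology kernel and Hough parameters are illustrative assumptions.

import cv2
import numpy as np

def detect_kiwifruit_otsu_cht(image_bgr):
    # Otsu thresholding on the R-G difference, morphological noise removal,
    # then a circular Hough transform, roughly following [Fu et al., 2015].
    b, g, r = cv2.split(image_bgr)
    diff = cv2.subtract(r, g)                       # R-G "colour space"
    _, mask = cv2.threshold(diff, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # reduce noise
    circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, dp=1.5,
                               minDist=40, param1=100, param2=20,
                               minRadius=15, maxRadius=60)
    return circles  # None, or an array of (x, y, radius) candidates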

The problem with hand-engineered feature based techniques is that they work reliably only under controlled conditions. In contrast, learning methods are robust to dynamic conditions such as lighting, camera viewpoint, fruit occlusion and fruit size. Wijethunga et al. [Wijethunga et al., 2009] used a Self-Organizing Map (SOM) model to detect kiwifruit. Normalised pixel data in the L*a*b* colour space was used for training the model, and the authors concluded that its performance was better than with the original data, although the result was not quantified.

Zhan et al. [Zhan, W., He, D., & Shi, 2013] applied six weak classifiers on the RGB, HSI and L*a*b* colour spaces using an AdaBoost algorithm to detect kiwifruit. The method was evaluated on 208 kiwifruit pixels and 477 background pixels, and 97% accuracy at the pixel level was reported, with a 7% false detection rate. The performance of the proposed algorithm is quite high; however, it was tested on a small dataset and needed post-processing to recover individual fruit. Moreover, one of the important indicators for evaluating a detection method for a real-time robot is the processing time, which was not reported.

Regarding variations in illumination conditions, some researchers have attempted to solve this problem by restricting the environment or preprocessing the data. Bargoti et al. [Bargoti and Underwood, 2016] suggested adding the azimuth of the sun as input data to a multilayer perceptron. The authors showed that performance improved only negligibly when the angle of the sun was represented by two values on the input. In subsequent research [Bargoti and Underwood, 2017], they added the azimuth of the sun to a convolutional neural network and reached the same conclusion. It seems that more characteristics than two values are needed to represent the sun's status.

Some researchers implemented preprocessing operations to overcome this issue. For example, Zemmour et al. [Zemmour et al., 2017] divided the image into sub-images with homogeneous lighting conditions and then categorized them into groups based on their level of illumination: low, mid or high. This method can be applied to any image and does not require external hardware, although it is time-consuming. Another approach is using exposure fusion when capturing images [Silwal et al., 2017]: images with different exposure times are captured from the same scene, the best region is computed for each image, and those regions are fused into the final image. Although this method reduces glare and dark regions, capturing the images is slow, making it unsuitable for real-time application.

Another solution is changing the capturing environment. For example, capsicum detection in [Sa et al., 2016] was conducted in a commercial field under controlled lighting conditions. Jimenez et al. [A. R. Jimenez et al., 2000] suggested that using artificial lighting can reduce shadows caused by sunlight. Another suggestion for achieving consistent lighting conditions was capturing images at night time [Fu et al., 2015; Fu et al., 2017; Wijethunga et al., 2009; Wang et al., 2013] or on an overcast day [Dias et al., 2018].

Another approach is using a plain background (e.g. a sheet, board or cover) behind the tree to achieve robust detection results [Silwal et al., 2017; Nguyen et al., 2016; Baeten et al., 2008]. Using this technique, researchers achieved good results, especially on apples. However, the feasibility of a plain background depends on the structure of the orchard, so this method cannot be used in a kiwifruit orchard. Reducing the lighting to a single source, such as a camera flash, would be useful at night time, but not in all daytime situations. In a kiwifruit orchard the camera faces the sun, so adding artificial lighting may, in theory, help to capture a better image; nevertheless, taking images with uneven illumination conditions is unavoidable.

The occlusion challenge is related to the production environment and the type of orchard and fruit. In New Zealand, most kiwifruit orchards are built on a pergola structure, which makes kiwifruit hang downward in the canopy. The pergola structure is built using posts, supporting beams, and supporting wires.

Occlusion, from the perspective of the camera, usually occurs when the fruit is hidden by at least one obstacle such as a leaf, fruit, branch, wire, supporting beam or post. The density of kiwifruit, the foliage coverage and the thinning method affect the degree of occlusion. Classifying the fruit data into occluded and non-occluded classes is time-consuming and subjective. Despite these difficulties, it provides a good reference for comparing the performance of fruit detection methods. Researchers have reported fruit occlusion as a challenge for hand-engineered feature based techniques; for example, one of the common approaches for detecting circular fruit, CHT, is not able to detect occluded fruit [Hiroma, 2002].

3 Detection Method

This section presents the Fully Convolutional Network (FCN) for kiwifruit calyx detection and introduces the transfer learning and preprocessing techniques used to improve performance under challenging lighting conditions.

The key requirements for the detection method were that the boundaries of kiwifruit and of obstructions in the canopy, such as wires and branches, should be detected. Bounding boxes do not define the boundaries of thin obstructions well when the obstructions are at an angle to the rows and columns of the image. For example, in an image of the kiwifruit canopy, a thin wire that starts at one corner of the image and ends at the opposite corner would have a ground-truth bounding box covering the whole image, even though the wire only takes up a small area of the image and of the canopy shown in it. Instead, it was decided to use a semantic segmentation neural network, which can identify the pixels associated with kiwifruit, branches and wires. Information about the existence of wires and branches helps the arm avoid trying to pick kiwifruit close to obstacles, or approach those kiwifruit at a suitable angle, and prevents the gripper from getting stuck in obstacles. Kiwifruit grows in clusters, so recognizing individual fruit in a cluster was a challenge. Hence, instead of detecting the skin of the kiwifruit, only the calyx area was identified.

After experimenting with different instances of the FCN for semantic segmentation, the architecture selected was FCN-8s [Long et al., 2015]. In order to train FCN-8s, 63 images containing 113 kiwifruit were labelled manually with kiwifruit calyx, branch and wire classes. In labelling the training dataset, calyxes were not classified as occluded or non-occluded; all were labelled simply as calyx. The dimensions of the RGB input and indexed label images were 1920×1200 pixels. The dataset was divided into 48 training and 15 validation images. The inputs for training were cropped to 200×200 pixels from the full-sized images and label images in order to limit the memory used during training. An example input image and annotation image pair are shown in figure 2.

Figure 2: An example of training data: (a) input image (b) annotated image, where green, cyan and pink indicate the calyx of the kiwifruit, branch and wire, respectively

Images were taken in 2015 and 2016 from a range of kiwifruit orchards at different times of the day and in different weather. A tool was implemented for labelling segments: the user selects the boundary and the class of a segment, and the tool fills the segment with the colour assigned to that class. Pixel-wise annotation is a time-consuming process, and it took around 30 minutes to label the three classes per image.

For transfer learning, an adapted version of the VGG16 network [Simonyan and Zisserman, 2014], equivalent to 19 convolutional layers, was used, with feature windows of eight pixels. The model with this configuration is called FCN-8s and is provided for Caffe via GitHub1. The PASCAL VOC dataset was used for pre-training, so the number of classes represented by the model was 21. The PASCAL dataset has 20 classes (21 including borders); the final label was ignored because it represented borders between classes.

NVIDIA DIGITS (version 5.1-dev) was used for training the model using the NVIDIA Caffe back-end (version 0.15.14) on an NVIDIA GTX-1070 8GB graphics card. The maximum image size was limited to 200×200 pixels due to hardware constraints. The result was achieved after convergence at a 0.0002 learning rate with 1000 epochs. Instead of standard stochastic gradient descent, the Adam algorithm was used for updating the network weights. The output of the model is an h×w×d matrix, where h×w is the image size and d refers to the number of classes (21) in the PASCAL VOC dataset. The output matrix gives the probability of each pixel belonging to each class. Applying a maximum function over the d values for each pixel gives the index of the class to which the pixel belongs. An example of the output is presented in figure 3.
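As a minimal sketch, assuming the network output is held in a NumPy array, the per-pixel class assignment and confidence map can be computed as follows (`scores` is a stand-in for a real forward pass):

import numpy as np

# Hypothetical FCN-8s output: an h x w x d array of per-pixel class scores,
# with d = 21 classes from PASCAL VOC.
scores = np.random.rand(500, 500, 21)  # placeholder for a real forward pass

confidence = scores.max(axis=2)    # h x w map of winning-class confidence
class_map = scores.argmax(axis=2)  # h x w map of class indices per pixel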

Figure 3: The output of applying FCN-8s to an image; pixels coloured blue, red and green show calyx, wire and branches, respectively

FCN-8s is a pixel-based method, and the indicated segments are the result of the detection. In order to identify and extract calyx regions, blob detection was used to determine the centre and radius of each calyx. The standard OpenCV blob detector was used for extracting blobs, with minimum thresholds set on the area (150 pixels) and circularity (0.5) parameters.
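A minimal sketch of this blob-extraction step using OpenCV's SimpleBlobDetector is shown below. The function name and the binary-mask input are assumptions; the area and circularity thresholds are the values stated above.

import cv2

def extract_calyx_blobs(calyx_mask):
    # calyx_mask: uint8 image with calyx pixels set to 255.
    params = cv2.SimpleBlobDetector_Params()
    params.filterByArea = True
    params.minArea = 150            # minimum blob area in pixels
    params.filterByCircularity = True
    params.minCircularity = 0.5     # reject elongated segments
    params.filterByColor = True
    params.blobColor = 255          # detect bright (foreground) blobs
    detector = cv2.SimpleBlobDetector_create(params)
    keypoints = detector.detect(calyx_mask)
    # Each keypoint carries the blob centre (kp.pt) and diameter (kp.size).
    return [(kp.pt, kp.size / 2.0) for kp in keypoints]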

1https://github.com/shelhamer/fcn.berkeleyvision.org

Due to the pergola structure of kiwifruit orchards, the camera faces the sun. Therefore, images with a wide range of lighting conditions can be captured. Images were classified into typical, overexposed and glare images based on lighting conditions. Examples of these images are shown in figure 4.

The maximum image size that the trained model can be run on is 500×500 pixels. This is larger than the training input size because inference requires only a forward pass, not a backward pass. Therefore, the input image, with a resolution of 1936×1216 pixels, was divided into 500×500 sub-images. Each sub-image overlapped its neighbouring windows to reduce the chance of objects sitting close to the edge of a crop being wrongly detected. This overlap was set to 20% of the width and height. To handle overlap areas, where there may be conflicting results for a pixel, the class with the maximum confidence was chosen between the crops.
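A minimal sketch of the tiling and maximum-confidence merging, assuming NumPy arrays and hypothetical helper names:

import numpy as np

def _positions(length, tile, step):
    # Start offsets covering [0, length), with a final window clamped to the edge.
    pos = list(range(0, max(length - tile, 0) + 1, step))
    if pos[-1] + tile < length:
        pos.append(length - tile)
    return pos

def tile_image(image, tile=500, overlap=0.2):
    # Yield (y, x, crop) windows with 20% overlap between neighbours.
    step = int(tile * (1.0 - overlap))
    h, w = image.shape[:2]
    for y in _positions(h, tile, step):
        for x in _positions(w, tile, step):
            yield y, x, image[y:y + tile, x:x + tile]

def merge_predictions(shape, tile_scores, num_classes=21):
    # tile_scores: iterable of (y, x, scores), scores being tile x tile x num_classes.
    # Keep, per pixel, the class with the highest confidence across overlapping tiles.
    h, w = shape
    best_conf = np.zeros((h, w), dtype=np.float32)
    best_class = np.zeros((h, w), dtype=np.int32)
    for y, x, scores in tile_scores:
        conf = scores.max(axis=2)
        cls = scores.argmax(axis=2)
        th, tw = conf.shape
        view_conf = best_conf[y:y + th, x:x + tw]
        view_class = best_class[y:y + th, x:x + tw]
        update = conf > view_conf
        view_conf[update] = conf[update]
        view_class[update] = cls[update]
    return best_class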

The model was trained using typical images. Therefore, to overcome unexpected lighting conditions, preprocessing was applied to the images. In overexposed images, a high number of pixels is saturated. In addition, the distribution of light is not uniform; sometimes a region of the image is dark and the fruit is hard to see.

Figure 4: Examples of kiwifruit images: (a) a typical image (b) an overexposed image

A glare image is distinguished visually by a purple cast over the majority of the image. This means the image has a high number of saturated pixels in the blue channel (figure 5(a)).

In order to automatically classify overexposed and glare images, definitions based on the number of saturated pixels are used. An image is overexposed if the numbers of saturated pixels in the R ($R_s$), G ($G_s$), and B ($B_s$) channels are each more than a quarter of the image size ($S$), as follows:

$$\frac{R_s}{S} \geq 0.25 \;\text{ and }\; \frac{G_s}{S} \geq 0.25 \;\text{ and }\; \frac{B_s}{S} \geq 0.25 \qquad (1)$$

An image is defined as glare if the image is overexposed and the number of saturated pixels in the blue channel is more than half of the image size:

$$\text{Glare} = \left( \frac{R_s}{S} \geq 0.25 \;\text{ and }\; \frac{G_s}{S} \geq 0.25 \;\text{ and }\; \frac{B_s}{S} \geq 0.25 \right) \;\text{ and }\; \frac{B_s}{S} \geq 0.5 \qquad (2)$$
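A minimal sketch of this classification rule for an 8-bit RGB image, where saturation is taken to mean a channel value of 255 (an assumption not stated explicitly in the text):

import numpy as np

def classify_lighting(image_rgb, sat=255):
    # Implements equations (1) and (2): per-channel saturation ratios
    # against the total pixel count S.
    s = image_rgb.shape[0] * image_rgb.shape[1]   # image size S
    rs, gs, bs = ((image_rgb[:, :, c] >= sat).sum() / s for c in range(3))
    overexposed = rs >= 0.25 and gs >= 0.25 and bs >= 0.25
    if overexposed and bs >= 0.5:
        return "glare"
    if overexposed:
        return "overexposed"
    return "typical"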

Regarding the preprocessing technique for an overexposed image, the image was converted from RGB to YCbCr, histogram equalization (HE) was applied to the Y channel, and the result was converted back to RGB. Then, the image was split into 500×500 blocks and FCN-8s was applied to each part.
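A minimal sketch of this step with OpenCV (which stores the luma-first colour space as YCrCb), assuming an 8-bit RGB input:

import cv2

def equalize_overexposed(image_rgb):
    # Global histogram equalisation on the Y (luma) channel only.
    ycrcb = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2RGB)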

Figure 5: An example of a (a) glare image (b) its blue channel (c) its green channel (d) its red channel

A glare image is shown in figure 5 in its different channels. As can be seen, the green channel retains more information, while the blue channel is affected by the sun. Hence, in this kind of image, the blue channel was exchanged for the green channel. The histogram equalisation was applied to each 500×500 block, because the lighting conditions are more dynamic than in overexposed images. Finally, FCN-8s was applied to each block.
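A minimal sketch of the glare preprocessing follows. The channel swap and per-block equalisation mirror the description above; applying the equalisation to the luma channel of each block is an assumption carried over from the overexposed case.

import cv2

def preprocess_glare(image_rgb, block=500):
    img = image_rgb.copy()
    img[:, :, 2] = img[:, :, 1]      # replace the saturated blue channel with green
    h, w = img.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = img[y:y + block, x:x + block]
            ycrcb = cv2.cvtColor(tile, cv2.COLOR_RGB2YCrCb)
            ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
            img[y:y + block, x:x + block] = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2RGB)
    return img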

4 Results

Two main challenges of fruit detection are varying lighting conditions and fruit occlusion [A. R. Jimenez et al., 2000]. Therefore, a suitable kiwifruit detection method needs to work reliably and quickly under both conditions. In this section, the dataset is described first, and then the performance of the proposed method is evaluated under different lighting conditions and with occluded fruit. Performance is reported using existing evaluation metrics.

4.1 Dataset

Images were captured by two Basler acA1920-40uc USB 3.0 colour cameras with Kowa LM6HC lenses (a similar setup to [Nejati et al., 2019]). In order to have consistent lighting in images, a pair of light bars was mounted alongside the cameras. Images in the dataset were collected during daylight hours at a kiwifruit orchard in Tauranga on the 23rd of March 2017 using a test rig. The dataset consists of two sub-datasets, for testing the performance of the detection method under various lighting conditions and under kiwifruit occlusion.

For the kiwifruit occlusion dataset, 50 images with natural lighting conditions were selected randomly. A calyx is defined as occluded when at least half of it is covered by one or more obstacles. The numbers of visible and occluded calyxes were counted manually in each image. Figure 6 depicts examples of objects that can occlude a calyx.

Figure 6: Examples of a calyx occluded by a (a) branch (b) leaf (c) wire (d) fruit (e) post (f) supporting beam

Then, the proposed detection method without preprocessing was applied to the images, and the performance on occluded and non-occluded calyxes was compared.

To cover various lighting conditions, another dataset was selected randomly from the captured images. This dataset was divided into three classes: typical, overexposed and glare images. The definitions of these terms were discussed in the prior section. Then, from each class, images were selected randomly; an amateur image annotator hand-labelled the calyxes using an object annotation toolbox, and an expert image annotator reviewed all images.

In order to compare the datasets, two parameters were considered: the number of images and the density of calyxes, shown in table 1. The density of calyxes is calculated as follows:

$$\text{density} = \frac{\text{Total number of calyxes}}{\text{Total number of images}} \qquad (3)$$

Density is the average number of calyxes per image in a dataset. Generally, a high density causes poor visibility and more occlusion in an image.

4.2 Indicators For Evaluation

Four indicators were used to compare the proposed detection method on the different datasets: recall, precision, F1 score and processing time.


Table 1: The number of images and density of calyxes in the datasets

Dataset                 Number of images    Density
Kiwifruit occlusion     50                  62.7
Typical lighting        121                 59.29
Overexposed lighting    63                  53.6
Glare lighting          61                  82.8

Recall is the number of calyxes detected correctly (TP) over the total number of calyxes in an image (TP + FN), computed as follows:

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (4)$$

Precision shows the reliability of the method. It is the number of calyxes detected correctly over the total number of calyxes detected in an image (TP + FP), calculated as:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (5)$$

The F1 score gives a better overall picture of the method's performance. It combines precision and recall via the following equation:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6)$$
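As a small worked sketch, the three indicators follow directly from the TP, FP and FN counts:

def detection_metrics(tp, fp, fn):
    # Equations (4)-(6); the guards against empty sets are an added safety measure.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1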

4.3 Kiwifruit Occlusion Dataset

First, the proposed detection method was applied to the dataset. Its performance was measured by three indicators: recall, precision, and F1 score over all visible calyxes (table 2). The F1 score was high, suggesting the method is promising.

Table 2: The overall performance of the detection method on the kiwifruit occlusion dataset

Recall    Precision    F1 score
0.75      0.92         0.82

The dataset was then split into occluded and non-occluded calyxes, because the performance of the detection method differs between the two. The number of non-occluded calyxes over the total number of calyxes in the dataset was counted manually and is presented as the percentage of non-occluded calyxes in figure 8: 78.08% of calyxes are non-occluded (shown in blue in the pie chart) and 21.92% are occluded. Because the

goal was to show the effect of calyx occlusion on detection, the detection performance was measured on occluded and non-occluded calyxes separately.

Table 3 shows that the recall of the detection method on non-occluded calyxes was 0.87, while on occluded calyxes it was 0.30. This means that if the calyx was fully visible, the method could detect it with high probability.

Table 3: The performance of the detection method on occluded and non-occluded kiwifruit

Detection performance    Non-occluded    Occluded
Recall                   0.87            0.30

According to table 3, the 0.13 of non-occluded calyxes that were not detected were missed due to the various lighting conditions and the complexity of the orchard, with some kiwifruit being tilted. Examples of these conditions are shown in figure 7.

Figure 7: Examples of (a) tilted (b) glare (c) overexposed kiwifruit

In order to discover which objects have the most influence on occlusion, the contribution of each object to calyx occlusion was counted; this is illustrated in figure 8. Among all objects causing occlusion, leaves occluded 9.6% of calyxes, the highest rate. The structural elements had the lowest percentage, because there are fewer of them in the orchard compared to other objects.

Figure 8: The distribution of kiwifruit calyx occlusion status


4.4 Various Lighting Conditions

The results shown in the fruit occlusion dataset subsection were obtained without applying any preprocessing. In this section, the preprocessing relevant to each category of lighting conditions was applied to the images. Automatic counting was used to evaluate the detection method instead of manual counting. The result of the detection method is given as the centre and size of a blob indicating the calyx region, whereas the ground truth was annotated with a bounding box around the calyx area. For evaluation, the correspondence between each calyx in the predicted and ground-truth sets must be found. Therefore, the Euclidean distance (in pixel units) between each calyx in the two sets was computed. All calyx pairs with a distance greater than a defined threshold were assigned the highest cost. In our experiments, the threshold was set to 20 pixels.

Then, the Hungarian method [Kuhn, 1955] was used as an optimizer to assign each calyx in the detection set to the lowest-cost matching calyx in the ground-truth set. True positives are predicted calyxes that have a correspondent in the ground-truth set. The remaining calyxes in the predicted and ground-truth sets are false positives and false negatives, respectively.
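A minimal sketch of this matching procedure, using SciPy's linear_sum_assignment as the Hungarian solver; the array shapes and helper name are assumptions:

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_calyxes(pred, gt, max_dist=20.0):
    # pred: (N, 2) predicted centres; gt: (M, 2) ground-truth centres, in pixels.
    # Pairs further apart than max_dist are given a prohibitive cost,
    # mirroring the 20-pixel threshold described above.
    if len(pred) == 0 or len(gt) == 0:
        return 0, len(pred), len(gt)
    cost = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)
    cost[cost > max_dist] = 1e6
    rows, cols = linear_sum_assignment(cost)
    tp = int(np.sum(cost[rows, cols] <= max_dist))
    return tp, len(pred) - tp, len(gt) - tp   # TP, FP, FN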

To measure the influence of the preprocessing techniques, the detection method was first applied to the three datasets without any preprocessing. As the results in table 4 show, the method has the lowest recall on glare images.

Table 4: Comparison between the performance of the detection method alone (FCN-8s) and the proposed method (preprocessing and FCN-8s) under various lighting conditions

Method/Dataset           Recall    Precision    F1 score
FCN/Typical              0.74      0.92         0.82
FCN/Overexposed          0.41      0.70         0.52
Proposed/Overexposed     0.45      0.70         0.55
FCN/Glare                0.07      0.64         0.13
Proposed/Glare           0.30      0.71         0.42

The precision in all conditions was quite acceptable, meaning there will be a small number of false positives. However, the recall was low on glare images. After applying the preprocessing to overexposed and glare images, the results improved, as also shown in table 4. The improvement in recall for overexposed images is negligible, while the improvement for glare images is significant.

4.5 Processing Time

The trained model was deployed on a PC with an Intel Xeon(R) 64-bit 2.40GHz ×16 CPU, a Quadro P6000 GPU and 32 GiB of RAM, running Ubuntu 16.04 Linux. The processing time for an image with 1936×1216 px resolution includes cropping, preprocessing, detection, and merging. The average processing time without the preprocessing part was 1.5 s. A comparison of processing times is shown in table 5.

Table 5: Average processing time for different lighting conditions

Images         Average processing time (s)
Typical        1.51
Overexposed    1.53
Glare          1.59

The average preprocessing time is 90 ms for glare images and 20 ms for overexposed images, which is negligible. Moreover, the processing time of blob detection depends on the number of detected calyxes.

5 Discussion

The result of the detection method depends on the degree of occlusion and the lighting conditions. The performance of the method was discussed in terms of F1 score and processing time. Regarding calyx occlusion, the level of occlusion is related to the thinning method, orchard structure, foliage coverage and density of kiwifruit. Therefore, the performance of the detection method will differ from orchard to orchard and from year to year. Moreover, different types of objects cause calyx occlusion. According to figure 8, 9.6% of calyxes are occluded by leaves, which the user can remove by hand or with an air-blowing system [Dobrusin et al., 1992]. Wires and branches occluded around 7% of calyxes. In the real world, these kiwifruit are obstructed and hard for a robotic arm to pick, so they might be detected and picked in a second pass with another method. Moreover, 5.2% of calyxes are overlapped by other calyxes, and they could become visible once the occluding kiwifruit are picked.

Among current research on kiwifruit detection, Fu et al. [Fu et al., 2015; Fu et al., 2017] captured kiwifruit data at night time to control the lighting conditions, and no information about false positives was released. Only Zhan et al. [Zhan, W., He, D., & Shi, 2013] used daytime data and presented recall and precision, although the evaluation was done at the pixel level and could not guarantee that the method detects individual kiwifruit. In all lighting conditions, the precision of our method is quite high; this means there are few false positives, and in the majority of cases the detected object is a kiwifruit.

Processing time is one of the most important indicators for a kiwifruit detector. Fu et al. [Fu et al., 2015; Fu et al., 2017] were the only researchers to report processing times for their proposed detection methods: 1.5 and 0.5 seconds per fruit. In our method, adding the preprocessing techniques has a negligible effect on the processing time, whereas it has a considerable influence on the F1 score. Kiwifruit densities from 53 to 82 kiwifruit per image were tested. The average processing time was 1.5 s per image, which is suitable for use on a real-time robot.

One of the benefits of using FCN-8s was that it required only a small training dataset to achieve an acceptable result. Our training dataset comprised 48 images with 200×200-pixel crops and 113 kiwifruit in total. Fu et al. [Fu et al., 2018] trained a Faster R-CNN with 1176 sub-images of 784×784 pixels and 10 kiwifruit on average per image. This means around 11760 kiwifruit were labelled to train the model, which is 100 times more than the number of kiwifruit used to train our model. Fu et al. achieved recall and precision of 96.7% and 89.3%, respectively, compared to 0.75 and 0.92 for our method on the occlusion dataset. Due to the smaller training set, our method detects fewer kiwifruit than the trained Faster R-CNN, but it is more reliable (higher precision).

6 Conclusion and Future Work

We have presented a semantic segmentation method for detecting kiwifruit in the orchard. The method has been evaluated under different lighting conditions and fruit occlusion, based on F1 score and processing time. A preprocessing approach is proposed for the different lighting conditions, which improves performance. It is shown that there are more overexposed pixels in the blue channel of glare images, so this channel is replaced with the green channel. On overexposed and glare images, histogram equalisation is applied to mitigate the dynamic lighting conditions: in glare images the intensity changes dynamically, so histogram equalisation is applied to each sub-image, while in overexposed images HE is applied to the entire image. We defined occluded fruit, overexposed images and glare images, and built a dataset for each definition. The average F1 score on typical images is 0.82 with a processing time of 1.5 s per image. The performance on glare images was poor, however, and the proposed preprocessing method increases the F1 score from 0.13 to 0.42.

Future work will add more sensors for data collection. Integrating the colour camera with a depth camera and training the model on RGBD data could yield better results in typical conditions. False positives appear to occur most often around wires; this could potentially be reduced by adding wire detection. To improve detection performance, it is also worthwhile to add more training data with kiwifruit occlusions, tilted kiwifruit and various lighting conditions.

Acknowledgments

This research was funded by the New Zealand Ministry of Business, Innovation and Employment (MBIE) on contract UOAX1414.

References

[A. R. Jimenez et al., 2000] Jimenez, A. R., Ceres, R., and Pons, J. L. (2000). A survey of computer vision methods for locating fruit on trees. Transactions of the ASAE.

[Baeten et al., 2008] Baeten, J., Donne, K., Boedrij, S., Beckers, W., and Claesen, E. (2008). Autonomous fruit picking machine: A robotic apple harvester. In Springer Tracts in Advanced Robotics.

[Bargoti and Underwood, 2016] Bargoti, S. and Underwood, J. (2016). Image classification with orchard metadata. In Proceedings - IEEE International Conference on Robotics and Automation.

[Bargoti and Underwood, 2017] Bargoti, S. and Underwood, J. P. (2017). Image segmentation for fruit detection and yield estimation in apple orchards. Journal of Field Robotics, 34(6):1039-1060.

[Barnett et al., 2017] Barnett, J., Seabright, M., Williams, H., Nejati, M., Scarfe, A., Bell, J., Jones, M., Martinson, P., and Schare, P. (2017). Robotic pollination - targeting kiwifruit flowers for commercial application. International Tri-Conference for Precision Agriculture.

[Brown and K, 1968] Brown, C. E. S. and K, G. (1968). Basic considerations in mechanizing citrus harvest. Transactions of the ASAE, 11(3):343.

[Dias et al., 2018] Dias, P. A., Tabb, A., and Medeiros, H. (2018). Apple flower detection using deep convolutional networks. Computers in Industry, 99:17-28.

[Dobrusin et al., 1992] Dobrusin, Y., Edan, Y., Grinshpun, J., Peiper, U. M., and Hetzroni, A. (1992). Real-time image processing for robotic melon harvesting.

[Fu et al., 2018] Fu, L., Feng, Y., Majeed, Y., Zhang, X., Zhang, J., Karkee, M., and Zhang, Q. (2018). Kiwifruit detection in field images using Faster R-CNN with ZFNet. IFAC-PapersOnLine, 51(17):45-50.

[Fu et al., 2017] Fu, L., Sun, S., Manuel, V. A., Li, S., Li, R., and Cui, Y. (2017). Kiwifruit recognition method at night based on fruit calyx image. Transactions of the Chinese Society of Agricultural Engineering.

[Fu et al., 2015] Fu, L. S., Wang, B., Cui, Y. J., Su, S., Gejima, Y., and Kobayashi, T. (2015). Kiwifruit recognition at nighttime using artificial lighting based on machine vision. International Journal of Agricultural and Biological Engineering.

[Gongal et al., 2015] Gongal, A., Amatya, S., Karkee, M., Zhang, Q., and Lewis, K. (2015). Sensors and systems for fruit detection and localization: A review. Computers and Electronics in Agriculture, 116:8-19.

[Hiroma, 2002] Hiroma, T. (2002). A color model for recognition of apples by a robotic harvesting system. Journal of the Japanese Society of Agricultural Machinery, 64(5):123-133.

[Kuhn, 1955] Kuhn, H. W. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97.

[Li et al., 2011] Li, P., Lee, S.-H., and Hsu, H.-Y. (2011). Review on fruit harvesting method for potential use of automatic fruit harvesting systems. Procedia Engineering, 23:351-366.

[Long et al., 2015] Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431-3440.

[Nejati et al., 2019] Nejati, M., Ahn, H. S., and MacDonald, B. (2019). Design of a sensing module for a kiwifruit flower pollinator robot. Australasian Conference on Robotics and Automation, ACRA.

[Nguyen et al., 2016] Nguyen, T. T., Vandevoorde, K., Wouters, N., Kayacan, E., De Baerdemaeker, J. G., and Saeys, W. (2016). Detection of red and bicoloured apples on tree with an RGB-D camera. Biosystems Engineering.

[Sa et al., 2016] Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T., and McCool, C. (2016). DeepFruits: A fruit detection system using deep neural networks. Sensors.

[Safety, 2016] Safety, W. (2016). Workplace safety: protection from eye injury. Technical report.

[Silwal et al., 2017] Silwal, A., Davidson, J. R., Karkee, M., Mo, C., Zhang, Q., and Lewis, K. (2017). Design, integration, and field evaluation of a robotic apple harvester. Journal of Field Robotics.

[Simonyan and Zisserman, 2014] Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

[Wang et al., 2013] Wang, Q., Nuske, S., Bergerman, M., and Singh, S. (2013). Automated crop yield estimation for apple orchards. Experimental Robotics, pages 745-758.

[Wijethunga et al., 2009] Wijethunga, P., Samarasinghe, S., Kulasiri, D., and Woodhead, I. (2009). Towards a generalized colour image segmentation for kiwifruit detection. In 2009 24th International Conference on Image and Vision Computing New Zealand (IVCNZ 2009).

[Williams et al., 2019a] Williams, H., Nejati, M., Hussein, S., Penhall, N., Lim, J. Y., Jones, M. H., Bell, J., Ahn, H. S., Bradley, S., Schaare, P., Martinsen, P., Alomar, M., Patel, P., Seabright, M., Duke, M., Scarfe, A., and MacDonald, B. (2019a). Autonomous pollination of individual kiwifruit flowers: Toward a robotic kiwifruit pollinator. Journal of Field Robotics.

[Williams et al., 2019b] Williams, H., Ting, C., Nejati, M., Jones, M. H., Penhall, N., Lim, J., Seabright, M., Bell, J., Ahn, H. S., Scarfe, A., Duke, M., and MacDonald, B. (2019b). Improvements to and large-scale evaluation of a robotic kiwifruit harvester. Journal of Field Robotics.

[Williams et al., 2019c] Williams, H. A., Jones, M. H., Nejati, M., Seabright, M. J., Bell, J., Penhall, N. D., Barnett, J. J., Duke, M. D., Scarfe, A. J., Ahn, H. S., Lim, J. Y., and MacDonald, B. A. (2019c). Robotic kiwifruit harvesting using machine vision, convolutional neural networks, and robotic arms. Biosystems Engineering, 181:140-156.

[Zemmour et al., 2017] Zemmour, E., Kurtser, P., and Edan, Y. (2017). Dynamic thresholding algorithm for robotic apple detection. In 2017 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC 2017).

[Zhan, W., He, D., & Shi, 2013] Zhan, W., He, D., and Shi, S. (2013). Recognition of kiwifruit in field based on Adaboost algorithm. Transactions of the Chinese Society of Agricultural Engineering, 29(23):140-146.

[Zhao et al., 2016] Zhao, Y., Gong, L., Huang, Y., and Liu, C. (2016). A review of key techniques of vision-based control for harvesting robot. Computers and Electronics in Agriculture, 127:311-323.