multiscalereceptivefieldsgraphattentionnetworkforpoint...

Research ArticleMultiscale Receptive Fields Graph Attention Network for PointCloud Classification

Xi-An Li 12 Li-Yan Wang 1 and Jian Lu 3

1Department of Stomatology Foshan Woman and Childrenrsquos Hospital Foshan Guangdong 528000 China2School of Mathematical Sciences Shanghai Jiao Tong University Shanghai 200240 China3Guangdong Institute of Aeronautics and Astronautics Equipment and Technology Zhuhai 519000 China

Correspondence should be addressed to Xi-An Li lixa0415sjtueducn Li-Yan Wang wangliyankmmc163com and Jian Lulujgiaaetcom

Received 12 September 2020 Revised 19 November 2020 Accepted 22 January 2021 Published 24 February 2021

Academic Editor Rongxin Cui

Copyright copy 2021 Xi-An Li et al2is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

Understanding the implication of point cloud is still challenging in the aim of classification or segmentation for point cloud due toits irregular and sparse structure As we have known PointNet architecture as a ground-breaking work for point cloud process canlearn shape features directly on unordered 3D point cloud and has achieved favorable performance such as 86 mean accuracyand 892 overall accuracy for classification task respectively However this model fails to consider the fine-grained semanticinformation of local structure for point cloud2en a multiscale receptive fields graph attention network (named after MRFGAT)by means of semantic features of local patch for point cloud is proposed in this paper and the learned feature map for our networkcan well capture the abundant features information of point cloud 2e proposed MRFGAT architecture is tested on ModelNetdatasets and results show it achieves state-of-the-art performance in shape classification tasks such as it outperforms GAPNet(Chen et al) model by 01 in terms of OA and compete with DGCNN (Wang et al) model in terms of MA

1 Introduction

Point cloud as a simple and efficient representation for 3Dshapes and scenes has becomemore andmore popular in thefields of both academia and industry For example auton-omous vehicle [1ndash4] robotic mapping and navigation [5ndash7]3D shape representation and modelling [8 9] and otherrelevant applications [10ndash15] Lots of ways can be used toobtain 3D point cloud data such as utilizing 3D scannersincluding physical touch or noncontact measurements withlight sound LiDAR etc

Up to now a variety of approaches have been developedto handle this kind of data such as the commonly usedtraditional handcraft algorithms [16ndash18] In terms of thesemethods it is significant to classify or segment point cloudby choosing salient features of point cloud such as normalscurvatures and colors Handcrafted features are usuallyemployed to address specific problems but tough to transfer

to new tasks 2en it is a hot topic in last decades that howto overcome the shortcomings for traditional methods

With the development of deep learning some existedend-to-end neural networks have overcame many chal-lengesrsquo stem from 3D data and made great breakthrough forpoint cloud see Figure 1 In particular the modificatoryworks of convolutional neural networks (CNNs) haveachieved significant success for point cloud data in computervision tasks such as PointNet [19] and its improved version[20] PointCNN [21 22] and PointSift [23] Unfortunatelylots of neural networks for point cloud only capture globalfeature without local information which are also an importsemantic feature for point cloud Hence exploiting rea-sonably the local information of point cloud has become anew research hotspot and some valuable works also havesprung up recently PointNet++ [20] extends the PointNetmodel by constructing a hierarchical neural network thatrecursively applies PointNet with designed sampling and

HindawiComplexityVolume 2021 Article ID 8832081 9 pageshttpsdoiorg10115520218832081

grouping layers to extract local features Graph neuralnetworks [24 25] can not only directly address a moregeneral class of graphs eg cyclic directed and undirectedgraphs but also be applied to deal with point cloud dataRecently DGCNN [26] and its variant [27] well utilized thegraph network with respect to the edgesrsquo convolution onpoints and then obtained the local edgesrsquo information ofpoint cloud Other relevant works applying the graphstructure of point cloud can be found in [28ndash30]

Attention mechanism plays a significant role in machinetranslation task [31] vision-based task [32] and graph-basedtask [33] Combining graph structure and attentionmechanism some favorable network architectures areconstructed which leverage well the local semantic featuresof point cloud Readers can refer to [34ndash36]

However the scale of different graphs for the existed graphnetworks are fixed then the semantic expression of the pointwill not be goodHence in thiswork inspired by graph attentionnetwork [33] graph convolution network [37] and local con-textual information networks we design a multiscale receptivefieldsrsquo graph attention network for point cloud classificationUnlike previous models that only consider the attribute infor-mation such as coordinate of each single point or only exploitlocal semantic information of point we pay attention to thespatial context information of both local and global structure forpoint cloud Finally like the standard convolution in grid do-main our model can also be efficiently implemented for thegraph representation of a point cloud

2e key contributions of our work are summarized asfollows

(i) We construct graph of local patch for point cloudand then enhance the feature representation ofpoint in point cloud by combining edgesrsquo infor-mation and neighborsrsquo information

(ii) We introduce a multiscale receptive fieldsrsquo mech-anism to capture the local semantic features invarious ranges for point cloud

(iii) We balance the influence between neighbors andcentroid in the local graph by means of attentionmechanism

(iv) We release our code to facilitate reproducibility andfuture research (httpsgithubcomBlue-GiantMRFGATndashNET)

2e rest parts of this paper are structured as follows InSection 2 we review the most closely related literatures on point

cloud In Section 3 we introduce our proposed MRFGAT ar-chitecture and provide the details of our framework in terms ofshape classification for point cloud We describe the dataset anddesign comparison algorithms in Section 4 followed by theexperimentsrsquo results and discussion Finally some concludingremarks are made in Section 5

2 Related Works

21 Pointwise MLP and Point Convolution NetworksUtilizing the deep learning technique the classical PointNet [19]was proposed to deal with directly unordered point cloudswithout using any volumetric or grid-mesh representation 2emain idea of this network is as follows At first a SpatialTransformer Network (STN) module similar to feature-extracting process is constructed which guarantees the invari-ance of transformations 2en a shared pointwise Multilayer-Perceptron (MLP)module is introduced which is used to extractsemantic features form point sets At last the final semanticinformation of point cloud is aggregated by means of a maxpooling layer Due to the favorable ability to approximate anycontinuous function for MLP which is easy to implement bypoint convolution some relatedworkswere presented accordingto the PointNet architecture [38 39]

Similar to convolution operator in 2D space someconvolution kernels for points in 3D space are designedwhich can capture the abundant information of point cloudPointCNN [21] used a local X-transformation kernel tofulfill the invariance of permutation for points and thengeneralized this technique to the hierarchical form inanalogy to that of image CNNs 2e authors in [40ndash42]extended the convolution operator of 2D space applied atindividual point in local region of point cloud and thencollected the neighborsrsquo information in the hierarchicalconvolution layer to the center point Kernel Point Con-volution (KPConv) [43] consists of a set of local 3D filtersand overcomes stand point convolution limitation 2isnovel kernel structure is very flexible to learn local geometricpatterns without any weights

22 Learning Local Features In order to overcome theshortcoming for PointNet-like networks which fail to exploitlocal features some hierarchical architectures have beendeveloped for example PointNet [20] and So-Net [38] toaggregate local information with MLP operation by con-sidering local spatial relationships of 3D data In contrast tothe previous type these methods can avoid sparsity andupdate dynamically in different feature dimensionsAccording to a Capsule Networks 3D Capsule Convolu-tional Networks were developed which can learn well thelocal features of point cloud see [44ndash46]

23 Graph Convolutional Networks Graph ConvolutionalNeural Networks (GCNNs) have gained more and moreattraction to address irregularly structured data such ascitation networks and social networks In terms of 3D pointcloud data GCNNs have shown its powerful ability onclassification and segmentation tasks Using the convolution

N times

d Extraction

N times

dprime

Aggregation

Feat

ure v

ecto

r

Classifier K

Figure 1 A typical framework of a 3D point cloud classificationprocess 2e model takes N points in d-dimensional space as itsinput and then extracts point features of dimension dprime followed byan aggregation module used to form a feature vector which isinvariant to point permutation Finally a classifier is used to classifythe resulting feature vector into K categories

2 Complexity

operator with respect to the graph in the spectral domain isan important approach [47ndash49] but it needs to calculate a lotof parameters on polynomial or rational spectral filters [50]Recently many researchers constructed local graph of pointcloud by utilizing each pointrsquos neighbors in low-dimensionalmanifold based on N-dimensional Euclidean distance andthen grouped each pointrsquos neighbors in the form of high-dimensional vectors such as EdgeConv-like works[26 27 51] and graph convolutions [37 52] Compared withthe spectral methods its main merit is that it is moreconsistent with the characteristics of data distributionSpecially EdgeConv extracts edge features through the re-lationship between central point and neighbor points bysuccessively constructing graph in the hierarchical model Tosum up the graph convolution network combines featureson local surface patches which are invariant to the defor-mations of patches for point cloud in Euclidean space

24 Attention Mechanism 2e idea of attention has beensuccessfully used in natural language processing (NLP) [31]and graph-based work [33 53] Attention module canbalance the weight relationship of different nodes in graphstructure data or different parts in sequence data

Recently the attention idea has obtained more and moreattraction and made a great contribution to point cloudprocesses [34 35] In these works it is significant to ag-gregate point or edge features by means of attention moduleUnlike the existing methods we try to enhance the high-level representation of point cloud by capturing the relationof points and local information along its channel

3 Our Approach

2e framework of point cloud classification includes twocontents taking the 3D point cloud as input and assigningone semantic class label for each point Based on thetechnique of extracting features from the local directedgraph and attention mechanism a new architecture forshape classification task is proposed to better learn pointrsquosrepresentation for unstructured point cloud 2is new ar-chitecture is composed of three components which are thepoint enhancement the feature representation and theprediction 2ese three components are fully coupled to-gether which leads to an end-to-end training pipeline

31 Problem Statement At first we letP pi isin Rd i 1 2 N1113966 1113967 represent a raw set of unor-dered points as the input for our mode where N is thenumber of the points and pi is a feature vector with a di-mension d In actual applications the feature vector pi mightcontain 3D space coordinates (x y z) color intensitysurface normal etc For the sake of simplicity we set d 3 inour work and only take 3D coordinates of point as thefeature representation for point A classification or semanticsegmentation of a point cloud are map Φc or Φs respec-tively which assign individual point semantic labels or pointcloud semantic labels respectively ie

Φ P⟶ Lk (1)

Here Φ represents the map Φc or Φs 2e objective ofour model is finding the optimal map that can obtain ac-curate semantic labels

2e above map should satisfy some constraints includingthe following (1) Permutation invariance the order ofpoints may vary but does not influence the category of thepoint or point cloud (2) Transformation invariance for theuncertain translation and rotation of point cloud the resultsof classification or segmentation should not be changed forpoint or point cloud

32 Graph Generation for Point Cloud Some works indicatethat local features of point cloud can be used to improve thediscriminability of point then exploring the relationshipamong points in a whole sets or local patch is a keypoint forour work Graph neural network is a feasible approach toprocess point cloud because it propagates on each node forthe whole sets or a local patch of point cloud individuallyignores the permutation order of nodes and then extractsthe local information between nodes To apply the graphneural network on the point cloud we firstly convert it to adirected graph Like DGCNN [26 27] and GAPNet [34] wecan obtain the neighbors (including self ) of each point inpoint cloud by means of KndashNN algorithm and then con-struct a local directed graph in Euclidean space Figure 2depicts the directed graph G (V E) of local patch for pointcloud V 1 2 K is the vertice set of G namely thenodes of local patch E stands for the edge set of G and eachedge is eij pi minus pij with pi isin P and pij isin V being centroidand neighbors respectively

33 Single Receptive Field Graph Attention Layer (SRFGAT)In order to aggregate the information of neighbors we use aneighboring-attention mechanism which is introduced toobtain attention coefficients of neighbors for each point seeFigure 3 Additionally edge features are important localfeatures which can enhance the semantic expression ofpoint then an edge-attention mechanism is also introducedto aggregate information of different edges see Figure 3 Inlight of the attention mechanism [33 34] we firstly trans-form the neighbors and edges into a high-level feature spaceto obtain sufficient expressive power To this end as aninitial step a parametric nonlinear function h(middot) is applied toevery neighbor and edge and the results are defined by

eijprime h eij θ1113872 1113873 isin RFprime

pijprime h pij θ1113872 1113873 isin RFprime

(2)

respectively where θ is a set of learnable parameters of thefilter and Fprime is output dimension In our method functionh(middot) is set to a single-layer neural network

It is worthwhile to noting that edges in Euclidean spacenot only stand for the local features but also indicate thedependency between centroid and neighbor We then obtainattentional coefficients of edges and neighbors which are

Complexity 3

aij Leakly ReLU g eijprime θ1113872 11138731113872 1113873 and bij Leakly ReLU g eij θ1113872 11138731113872 1113873

(3)

respectively where g(eijprime θ) and g(eij θ) are single-layer

neural network with 1-dimensional output Leaky ReLU(middot)

denotes nonlinear activation function leaky ReLU withReLU max 0 x To make coefficients easily comparableacross different neighbors and edges we use a softmax op-eration to normalize the above coefficients which are defined as

αij exp aij1113872 1113873

1113936kexp aik( 1113857

βij exp bij1113872 1113873

1113936kexp bik( 1113857

(4)

respectively then the normalized coefficients are used tocompute contextual feature for every point and it is

pi

pi2pi6

pi3pi5

pi1

pi4

ei1

ei2

ei3

ei4

ei5

ei6

pi2pi6

pi3pi5

pi1

pi4

pi

Figure 2 Local graph of point cloud pi and pi1 pi2 pi3 pi4 pi5 pi61113864 1113865 are a central point and its neighbors respectively 2e directed edgesfrom the neighbors to the central point are denoted by ei1 ei2 ei3 ei4 ei5 ei61113864 1113865

eprimei1

eprimei2

eprimei3

eprimei4

eprimei5

eprimei6

ei1

ei2

ei3

ei4

ei5

ei6

Edgesprimeattention

Neighborsprimeattention

Edges-coef

Neighbors-coef

ei1 ei2

ei3

ei4ei5

ei6

pi2pi6

pi3pi5

pi1

pi4

pi

Figure 3 An illustration of attention coefficientsrsquo generation 2e edge feature not only serves as local information to center point but alsoresponses the effect of between neighbors and centroid Edge attention and neighbor attention reflect the important of edge features andneighbor features to centroid respectively

4 Complexity

1113957xi f 1113944j

αijeijprime⎛⎝ ⎞⎠ 1113944

j

βijqijprime⎛⎝ ⎞⎠

⎛⎝ ⎞⎠ (5)

where f(middot) is a nonlinear activation function and isconcatenation operation In our model we choose ReLUfunction as f(middot)

34 Multiscale Receptive Fields Graph Attention Layer(MRFGAT) In order to obtain sufficient feature informa-tion and stabilize the network the multiscale receptive fieldstrategy analogous to multiheads mechanism is proposedsee Figure 4 Unlike previous works the sizes of receptivefields in our model are different for various branches2erefore we concatenate M independent SRFGATmoduleand generate a semantic feature with M times Fprime channels

1113957xi Mm11113957x

(m)i (6)

where 1113957x(m)i is the receptive field feature of the mth branch M

is the total number of branches and is the concatenationoperation over feature channels

35 MRFGAT Architecture Our MRFGATmodel shown inFigure 5 considers shape classification task for point cloud2e architecture is similar to PointNet [19] However thereare three main differences between the architectures ofMRFGATand PointNet Firstly according to the analyses ofLinkDGCNN model we remove the transformation net-work which is used in many architectures such as PointNetDGCNN and GAPNet Secondly instead of only processingindividual points of point cloud we also exploit local fea-tures by a SRFGAT-layer before the stacked MLP layers2irdly an attention pooling layer is used to obtain localfeature information that is connected to the intermediatelayer for forming a global descriptor In addition we ag-gregate individually the original edge feature of everySRFGAT channel and then obtain local features which canenhance the semantic feature of MRFGAT

4 Experiments

In this section we evaluate ourMRFGATmodel on 3D pointcloud analysis for the classification tasks To demonstrateeffectiveness of our model we then compare the perfor-mance for our model to recent state-of-the-art methods andperform ablation study to investigate different designvariations

41 Classification

411 Dataset We demonstrate the feasibility and effec-tiveness of our model on the ModelNet dataset such asModelNet40 benchmarks [54] for shape classification 2eModelNet40 dataset contains 12311 meshed CAD models

that are classified to 40 man-made categories In this workwe divide the ModelNet40 dataset into two parts the partone is named as training set which includes 9843 models andthe part two is called as testing set includes 2468 models2en we normalize the models in the unit sphere anduniformly sample 1024 points over model surface Besideswe further augment the training dataset by randomly ro-tating scaling the point cloud and jittering the location ofevery point by means of Gaussian noise with zero mean and001 standard deviation for all the models

412 Implementation Details According to the analysis ofthe LinkDGCNN model [27] we omit the spatial trans-formation network to align the point cloud to a canonicalspace 2e network employs four SRFGAP layer moduleswith (8 16 16 24) channels to capture attention featuresrespectively 2en four shared MLP layers with sizes (12864 64 64) respectively followed by it are used to aggregatethe feature information Next the output features are fedinto an aggregation operation followed by the MLP layerwith 1024 neurons In the end of network a max poolingoperation and two full-connected layers (512 256) are usedto finally obtain the classification score 2e training iscarried out using Adam optimizer with minibatch training(batch size of 16) and an initial learning rate of 0001 2eReLU activate function and Batch Normalization (BN) arealso used in both the SRFGAP module and MLP layer Atlast the network was implemented using TensorFlow andexecuted on the server equipped with four NVIDIAGTX2080Ti

413 Results Figures 6ndash8 depict the process for trainingand testing From the figures we see that our model willquickly attain the stage of high accuracy which means ourmodel is highly efficient Table 1 lists the results of ourmethod and several recent state-of-the-art works 2emethods listed in Table 1 have one thing in common 2einput is only raw point cloud with 3D coordinates (xi yi zi)Based on these results we can conclude that our modelperforms better than other methods and obtains wonderfulperformance on both the ModelNet40 benchmark Com-pared to other point-based methods the performance forour model is only a little weaker than that of DGCNN interms of MA on ModelNet 40 However it outperforms theprevious state-of-the-art model GAPNet by 01 accuracy interms of OA 2ese phenomena show that the strategyemploying local and global features in different receptivefields is efficient and it will help us to capture the prominentsemantic feature for point cloud And in our model sincewe introduce the structure of the data by providing the localinterconnection between points and explore graph featuresfrom different scale field levels by the localized graphconvolutional layers it guarantees the exploration of moredistinctive latent representations for each object class

Complexity 5

Figure 4 Multiscale receptive fields

Inpu

tPoi

ntCl

oud

N times

3

SRFGAT1

SRFGAT2

middotmiddotmiddot

SRFGATM

ConcatEdge features||

Con

cat

MLP MLP MLP MLPConcat

MLP

N times

102

4

1024

Maxpooling

MLP

shared

Clas

sific

atio

n

||

||

Figure 5 2e architecture of classification In this framework it takes N points as input and applies M individual SRFGATmodules toobtain multiattention features on multilocal graphs then the output features are recast by means of five shared MLP layers and attentionpooling layer respectively Finally a shared full-connected layer is employed to form a global feature and then classification scores for ccategories are obtained

1

15

2

25

3

35

138

14

142

144

650 700 800600 750

200 400 600 800 1000 12000

Figure 6 2e curve of loss varying with epochs for ModelNet40 and it includes five training in one epoch

1

095

09

085

08

075

07

650 750 800600 700

095

096

097

098

1200400 600 800 10002000

Figure 7 2e curve of training accuracy varying with epochs for ModelNet40 and it includes five training in one epoch

6 Complexity

5 Conclusion

Enlightening by graph convolutional networks for the taskof classification in 3D computer vision we design a novelMRFGAT-based modules for point feature and contextaggregation Utilizing different receptive fields and at-tention strategies the pipeline MRFGAT can capture

more fine features of point clouds for classification task Inaddition we list some comparable results with recentworks which show that our model can achieve the state-of-the-art performance on the dataset ModelNet for classi-fication task of point clouds it outperforms the GAPNetmodel by 01 in terms of OA and competes with theDGCNN model in terms of MA It is necessary to pointout that our model will have some burden for constructingvarying scale graphs Based on the state-of-the-art GraphConvolution Networks (GCN) for semantic segmentationin point cloud it would be interesting to introduce ourmodel to address this problem for unstructured data in thefuture

Data Availability

2is dataset used in this manuscript is available at httpsshapenetcsstanfordedumediamodelnet40_ply_hdf5_2048zip

Conflicts of Interest

2e authors declare that they have no conflicts of interest

100 150 200

14

16

18

2

10050 150 200 2500

145

146

147

(a)

100 150 200

10050 150 200 2500065

07

075

08

085

09

095

09

091

(b)

10050 150 200 2500

06

07

08

09

086

088

100 150 200

(c)

Figure 8 2e curves of testing loss and accuracy varying with epochs for ModelNet40 (a) Loss (b) Accuracy (c) Mean accuracy

Table 1 Classification results on ModelNet40 MA representsmean per-class accuracy and the per-class accuracy is the ratio ofthe number of correct classifications to that of the objects in a classOA denotes the overall accuracy which is the ratio of the number ofoverall correct classifications to that of overall objects

Method Points MA () OA ()SO-Net [38] 2048 times 3 873 909KD-Net [40] 1024 times 3 885 918PointNet [19] 1024 times 3 860 892PointNet++ [20] 1024 times 3 mdash 907PointCNN [21] 1024 times 3 881 922DGCNN [26] 1024 times 3 902 922PCNN [22] 1024 times 3 mdash 923GAPNet [34] 1024 times 3 897 924Ours 1024 times 3 901 925

Complexity 7

Acknowledgments

2is work was supported by GDASprime Project of Science andTechnology Development (no 2018GDASCX-0804) andProject of Guangdong Engineering Technology ResearchCenter (no 810115228131)

References

[1] C R Qi W Liu C Wu H Su and L J Guibas ldquoFrustumPointNets for 3D object detection from RGB-D datardquo inProceedings of the 2018 IEEECVF Conference on ComputerVision and Pattern Recognition 2018 pp 918ndash927 Salt LakeCity UT USA June 2018

[2] J Ku M Mozifian J Lee A Harakeh and S L WaslanderldquoJoint 3D proposal generation and object detection from viewaggregationrdquo in Proceedings of the 2018 IEEERSJ Interna-tional Conference on Intelligent Robots and Systems (IROS)pp 1ndash8 Madrid Spain October 2018

[3] M Liang B Yang S Wang and R Urtasun ldquoDeep con-tinuous fusion for multi-sensor 3D object detectionrdquo inProceedings of the European Conference on Computer Vision(ECCV) pp 663ndash678 Munich Germany September 2018

[4] C Yang C Chen W He R Cui and Z Li ldquoRobot learningsystem based on adaptive neural control and dynamicmovement primitivesrdquo IEEE Transactions on Neural Networksand Learning Systems vol 30 no 3 pp 777ndash787 2019

[5] J Biswas and M Veloso ldquoDepth camera based indoor mobilerobot localization and navigationrdquo in Proceedings of the 2012IEEE International Conference on Robotics and Automationpp 1697ndash1702 St Paul MIN USA May 2012

[6] Y Zhu R Mottaghi E Kolve et al ldquoTarget-driven visualnavigation in indoor scenes using deep reinforcementlearningrdquo 2017 httparxivorgabs160905143

[7] H Wang S Wang J Yao R Pan and J Yang Effective Anti-collision Algorithms for RFID Robots System AssemblyAutomation Ahead-Of-Print (2019)

[8] A Golovinskiy V G Kim and T Funkhouser ldquoShape-basedrecognition of 3D point clouds in urban environmentsrdquo inProceedings of the 2009 IEEE 12th International Conference onComputer Vision Kyoto Japan October 2009

[9] P Gu F Zhou D Yu F Wan W Wang and B Yu ldquoA 3Dreconstructionmethod usingmultisensor fusion in large-scaleindoor scenesrdquo Complexity vol 2020 Article ID 697379014 pages 2020

[10] H Qiao Y Li T Tang and P Wang ldquoIntroducing memoryand association mechanism into a biologically inspired visualmodelyrdquo IEEE Transactions on Cybernetics vol 44pp 1485ndash1496 2014

[11] H Qiao M Wang J Su S Jia and R Li ldquo2e concept ofldquoattractive region in environmentrdquo and its application in high-precision tasks with low-precision systemsrdquo IEEEASMETransactions on Mechatronics vol 20 no 5 pp 2311ndash23272015

[12] C Yang H Wu Z Li W He N Wang and C-Y Su ldquoMindcontrol of A robotic arm with visual fusion Technologyrdquo IEEETransactions on Industrial Informatics vol 14 no 9pp 3822ndash3830 2018

[13] C Yang C Zeng C Fang W He and Z Li ldquoA DMPs-basedframework for robot learning and generalization of hu-manlike variable impedance skillsrdquo IEEEASME Transactionson Mechatronics vol 23 no 3 pp 1193ndash1203 2018

[14] Z Zhao C K Ahn and H-X Li ldquoBoundary antidisturbancecontrol of a spatially nonlinear flexible string systemrdquo IEEE

Transactions on Industrial Electronics vol 67 no 6pp 4846ndash4856 2020

[15] Z Zhao C K Ahn and H-X Li ldquoDead zone compensationand adaptive vibration control of uncertain spatial flexibleriser systemsrdquo IEEEASME Transactions on Mechatronicsvol 25 no 3 pp 1398ndash1408 2020

[16] G Vosselman and S Dijkman ldquo3D building model recon-struction from point clouds and ground plansrdquo in Proceedingsof the ISPRS Workshop Land Surface Mapping and Charac-terization Using Laser Altimetry pp 37ndash43 Annapolis MDUSA October 2001

[17] R B Rusu N Blodow and M Beetz ldquoFast point featurehistograms (FPFH) for 3D registrationrdquo in Proceedings of the2009 IEEE International Conference on Robotics andAutomation Kobe Japan May 2009

[18] F Tombari S Salti and L D Stefano ldquoUnique signatures ofhistograms for local surface descriptionrdquo in Proceedings of the11th European Conference on Computer Vision ECCV 2010Heraklion Crete Greece September 2010

[19] R Q Charles H Su M Kaichun and L J Guibas ldquoPointNetdeep learning on point sets for 3D classification and seg-mentationrdquo in Proceedings of the 2017 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) pp 77ndash85Honolulu HI USA July 2017

[20] C R Qi L Yi H Su and L J Guibas ldquoPointNet++ deephierarchical feature learning on point sets in a metric spacerdquo2017 httparxivorgabs170602413

[21] M S W W X D Yangyan Li R Bu and B ChenldquoPointCNN convolution on X-transformed pointsrdquo inProceedings of the Advances in Neural Information ProcessingSystems(NIPS) pp 828ndash838 Denver CO USA December2018

[22] M Atzmon H Maron and Y Lipman ldquoPoint convolutionalneural networks by extension operatorsrdquo InternationalConference on Computer Graphics and Interactive Techniquesvol 37 p 71 2018

[23] M Jiang Y Wu T Zhao Z Zhao and C Lu ldquoPointSIFT aSIFT-like network module for 3D point cloud semanticsegmentationrdquo 2018 httparxivorgabs180700652

[24] M Gori G Monfardini and F Scarselli ldquoA new model forlearning in graph domainsrdquo in Proceedings of the 2005 IEEEInternational Joint Conference on Neural Networks pp 729ndash734 Montreal Que Canada December 2005

[25] F Scarselli M Gori A C Ah Chung Tsoi M Hagenbuchnerand G Monfardini ldquo2e graph neural network modelrdquo IEEETransactions on Neural Networks vol 20 no 1 pp 61ndash802009

[26] Y Wang Y Sun Z Liu S E Sarma M M Bronstein andJ M Solomon ldquoDynamic graph CNN for learning on pointcloudsrdquo ACM Transactions on Graphics vol 38 p 146 2019

[27] K Zhang M Hao J Wang C W de Silva and C Fu ldquoLinkeddynamic graph CNN learning on point cloud via linkinghierarchical featuresrdquo 2019 httparxivorgabs190410014

[28] G Te W Hu Z Guo and A Zheng ldquoRGCNN regularizedgraph CNN for point cloud segmentationrdquo 2018 httparxivorgabs180602952

[29] X Gao W Hu and Z Guo ldquoExploring structure-adaptivegraph learning for robust semi-supervised classificationrdquo2020 httparxivorgabs190410146

[30] Q Lu C Chen W Xie and Y Luo ldquoPointNGCNN deepconvolutional networks on 3D point clouds with neighbor-hood graph filtersrdquo Computers amp Graphics vol 86 pp 42ndash512020

8 Complexity

[31] A Vaswani N Shazeer N Parmar et al ldquoAttention is all youneedrdquo 2017 httparxivorgabs170603762

[32] V Mnih N Heess A Graves and K Kavukcuoglu ldquoRe-current models of visual attentionrdquo 2014 httparxivorgabs14066247

[33] P Velickovic G Cucurull A Casanova A Romero P Lioand Y Bengio ldquoGraph attention networksrdquo 2018 httparxivorgabs171010903

[34] C Chen L Z Fragonara and A Tsourdos ldquoGAPNet graphattention based point neural network for exploiting localfeature of point cloudrdquo 2019 httparxivorgabs190508705

[35] L Wang Y Huang Y Hou S Zhang and J Shan ldquoGraphattention convolution for point cloud semantic segmenta-tionrdquo in Proceedings of the Computer Vision and PatternRecognition pp 10296ndash10305 Long Beach CA USA January2019

[36] M Feng L Zhang X Lin S Z Gilani and A Mian ldquoPointAttention network for semantic segmentation of 3D pointcloudsrdquo 2019 httparxivorgabs190912663

[37] M Niepert M Ahmed and K Kutzkov ldquoLearning con-volutional neural networks for graphsrdquo 2016 httparxivorgabs160505273

[38] J Li B M Chen and G H Lee ldquoSO-Net self-organizingnetwork for point cloud analysisrdquo in Proceedings of theComputer Vision and Pattern Recognition pp 9397ndash9406 SaltLake City UT USA June 2018

[39] L Yu X Li C-W Fu D Cohen-Or P-A Heng and PU-NetldquoPoint cloud upsampling networkrdquo in Proceedings of theComputer Vision and Pattern Recognition pp 2790ndash2799 SaltLake City UT USA June 2018

[40] R Klokov and V Lempitsky ldquoEscape from cells deep kd-networks for the recognition of 3D point cloud modelsrdquo inProceedings of the Computer Vision and Pattern Recognitionpp 863ndash872 Honolulu HI USA July 2017

[41] Y You Y Lou Q Liu et al ldquoPointwise rotation-invariantnetwork with adaptive sampling and 3D spherical voxelconvolutionrdquo 2018 httparxivorgabs181109361

[42] W Wu Z Qi and L Fuxin ldquoPointConv deep convolutionalnetworks on 3D point cloudsrdquo in Proceedings of the ComputerVision and Pattern Recognition pp 9621ndash9630 Long BeachCA USA June 2019

[43] H 2omas C R Qi J-E Deschaud B MarcoteguiF Goulette and L J Guibas ldquoKPConv flexible and de-formable convolution for point cloudsrdquo in Proceedings of theIEEE International Conference on Computer Visionpp 6411ndash6420 Long Beach CA USA June 2019

[44] Y Zhao T Birdal H Deng and F Tombari ldquo3D pointCapsule networksrdquo in Proceedings of the IEEE InternationalConference on Computer Vision pp 1009ndash1018 Long BeachCA USA June 2019

[45] N Srivastava H Goh and R Salakhutdinov ldquoGeometriccapsule autoencoders for 3D point cloudsrdquo 2019 httparxivorgabs191203310

[46] A Cheraghian and L Petersson ldquo3DCapsule extending thecapsule architecture to classify 3D point cloudsrdquo in Pro-ceedings of the IEEE International Conference on ComputerVision pp 1194ndash1202 Long Beach CA USA June 2019

[47] Y Xu T Fan M Xu L Zeng and Y Qiao ldquoSpiderCNN deeplearning on point sets with parameterized convolutional fil-tersrdquo in Proceedings of the European Conference on ComputerVision (ECCV) pp 87ndash102 Munich Germany September2018

[48] D Boscaini J Masci S Melzi M M Bronstein U Castellaniand P Vandergheynst ldquoLearning class-specific descriptors for

deformable shapes using localized spectral convolutionalnetworksrdquo Proceedings of the Eurographics Symposium onGeometry Processing vol 34 pp 13ndash23 2015

[49] L Yi H Su X Guo and L Guibas ldquoSyncSpecCNN syn-chronized spectral CNN for 3D shape segmentationrdquo inProceedings of the 2017 IEEE Conference on Computer Visionand Pattern Recognition (CVPR) pp 6584ndash6592 HonoluluHI USA July 2017

[50] M Defferrard X Bresson and P Vandergheynst ldquoCon-volutional neural networks on graphs with fast localizedspectral filteringrdquo in Proceedings of the 30th InternationalConference on Neural Information Processing SystemsNIPSrsquo16 pp 3844ndash3852 Barcelona Spain December 2016

[51] H Xiu T Shinohara and M Matsuoka ldquoDynamic-scalegraph convolutional network for semantic segmentation of 3dpoint cloudrdquo in Proceedings of the 2019 IEEE InternationalSymposium on Multimedia (ISM) pp 271ndash2717 Diego CAUSA December 2019

[52] N Verma E Boyer and J Verbeek ldquoFeaStNet feature-steered graph convolutions for 3D shape analysisrdquo 2018httparxivorgabs170605206

[53] J B Lee R A Rossi S Kim N K Ahmed and E KohldquoAttention models in graphsrdquo ACM Transactions onKnowledge Discovery From Data vol 13 no 6 pp 1ndash25 2019

[54] Z Wu S Song A Khosla et al ldquo3D ShapeNets a deeprepresentation for volumetric shapesrdquo in Proceedings of the2015 IEEE Conference on Computer Vision and Pattern Rec-ognition (CVPR) pp 1912ndash1920 Boston MA USA June2015

Complexity 9

grouping layers to extract local features Graph neuralnetworks [24 25] can not only directly address a moregeneral class of graphs eg cyclic directed and undirectedgraphs but also be applied to deal with point cloud dataRecently DGCNN [26] and its variant [27] well utilized thegraph network with respect to the edgesrsquo convolution onpoints and then obtained the local edgesrsquo information ofpoint cloud Other relevant works applying the graphstructure of point cloud can be found in [28ndash30]

Attention mechanism plays a significant role in machinetranslation task [31] vision-based task [32] and graph-basedtask [33] Combining graph structure and attentionmechanism some favorable network architectures areconstructed which leverage well the local semantic featuresof point cloud Readers can refer to [34ndash36]

However the scale of different graphs for the existed graphnetworks are fixed then the semantic expression of the pointwill not be goodHence in thiswork inspired by graph attentionnetwork [33] graph convolution network [37] and local con-textual information networks we design a multiscale receptivefieldsrsquo graph attention network for point cloud classificationUnlike previous models that only consider the attribute infor-mation such as coordinate of each single point or only exploitlocal semantic information of point we pay attention to thespatial context information of both local and global structure forpoint cloud Finally like the standard convolution in grid do-main our model can also be efficiently implemented for thegraph representation of a point cloud

2e key contributions of our work are summarized asfollows

(i) We construct graph of local patch for point cloudand then enhance the feature representation ofpoint in point cloud by combining edgesrsquo infor-mation and neighborsrsquo information

(ii) We introduce a multiscale receptive fieldsrsquo mech-anism to capture the local semantic features invarious ranges for point cloud

(iii) We balance the influence between neighbors andcentroid in the local graph by means of attentionmechanism

(iv) We release our code to facilitate reproducibility andfuture research (httpsgithubcomBlue-GiantMRFGATndashNET)

2e rest parts of this paper are structured as follows InSection 2 we review the most closely related literatures on point

cloud In Section 3 we introduce our proposed MRFGAT ar-chitecture and provide the details of our framework in terms ofshape classification for point cloud We describe the dataset anddesign comparison algorithms in Section 4 followed by theexperimentsrsquo results and discussion Finally some concludingremarks are made in Section 5

2 Related Works

21 Pointwise MLP and Point Convolution NetworksUtilizing the deep learning technique the classical PointNet [19]was proposed to deal with directly unordered point cloudswithout using any volumetric or grid-mesh representation 2emain idea of this network is as follows At first a SpatialTransformer Network (STN) module similar to feature-extracting process is constructed which guarantees the invari-ance of transformations 2en a shared pointwise Multilayer-Perceptron (MLP)module is introduced which is used to extractsemantic features form point sets At last the final semanticinformation of point cloud is aggregated by means of a maxpooling layer Due to the favorable ability to approximate anycontinuous function for MLP which is easy to implement bypoint convolution some relatedworkswere presented accordingto the PointNet architecture [38 39]

Similar to convolution operator in 2D space someconvolution kernels for points in 3D space are designedwhich can capture the abundant information of point cloudPointCNN [21] used a local X-transformation kernel tofulfill the invariance of permutation for points and thengeneralized this technique to the hierarchical form inanalogy to that of image CNNs 2e authors in [40ndash42]extended the convolution operator of 2D space applied atindividual point in local region of point cloud and thencollected the neighborsrsquo information in the hierarchicalconvolution layer to the center point Kernel Point Con-volution (KPConv) [43] consists of a set of local 3D filtersand overcomes stand point convolution limitation 2isnovel kernel structure is very flexible to learn local geometricpatterns without any weights

22 Learning Local Features In order to overcome theshortcoming for PointNet-like networks which fail to exploitlocal features some hierarchical architectures have beendeveloped for example PointNet [20] and So-Net [38] toaggregate local information with MLP operation by con-sidering local spatial relationships of 3D data In contrast tothe previous type these methods can avoid sparsity andupdate dynamically in different feature dimensionsAccording to a Capsule Networks 3D Capsule Convolu-tional Networks were developed which can learn well thelocal features of point cloud see [44ndash46]

23 Graph Convolutional Networks Graph ConvolutionalNeural Networks (GCNNs) have gained more and moreattraction to address irregularly structured data such ascitation networks and social networks In terms of 3D pointcloud data GCNNs have shown its powerful ability onclassification and segmentation tasks Using the convolution

N times

d Extraction

N times

dprime

Aggregation

Feat

ure v

ecto

r

Classifier K

Figure 1 A typical framework of a 3D point cloud classificationprocess 2e model takes N points in d-dimensional space as itsinput and then extracts point features of dimension dprime followed byan aggregation module used to form a feature vector which isinvariant to point permutation Finally a classifier is used to classifythe resulting feature vector into K categories

2 Complexity




3 Our Approach



Φ P⟶ Lk (1)







(2)



Complexity 3


(3)




αij exp aij1113872 1113873

1113936kexp aik( 1113857

βij exp bij1113872 1113873

1113936kexp bik( 1113857

(4)


pi

pi2pi6

pi3pi5

pi1

pi4

ei1

ei2

ei3

ei4

ei5

ei6

pi2pi6

pi3pi5

pi1

pi4

pi


eprimei1

eprimei2

eprimei3

eprimei4

eprimei5

eprimei6

ei1

ei2

ei3

ei4

ei5

ei6

Edgesprimeattention


Edges-coef

Neighbors-coef

ei1 ei2

ei3

ei4ei5

ei6

pi2pi6

pi3pi5

pi1

pi4

pi


4 Complexity

1113957xi f 1113944j


j


⎛⎝ ⎞⎠ (5)



1113957xi Mm11113957x

(m)i (6)




4 Experiments


41 Classification





Complexity 5


Inpu

tPoi

ntCl

oud

N times

3

SRFGAT1

SRFGAT2

middotmiddotmiddot

SRFGATM


Con

cat


MLP

N times

102

4

1024

Maxpooling

MLP

shared

Clas

sific

atio

n

||

||


1

15

2

25

3

35

138

14

142

144

650 700 800600 750

200 400 600 800 1000 12000


1

095

09

085

08

075

07

650 750 800600 700

095

096

097

098

1200400 600 800 10002000


6 Complexity

5 Conclusion



Data Availability




100 150 200

14

16

18

2

10050 150 200 2500

145

146

147

(a)

100 150 200

10050 150 200 2500065

07

075

08

085

09

095

09

091

(b)

10050 150 200 2500

06

07

08

09

086

088

100 150 200

(c)




Complexity 7

Acknowledgments


References
































8 Complexity


























Complexity 9




3 Our Approach



Φ P⟶ Lk (1)







(2)



Complexity 3


(3)




αij exp aij1113872 1113873

1113936kexp aik( 1113857

βij exp bij1113872 1113873

1113936kexp bik( 1113857

(4)


pi

pi2pi6

pi3pi5

pi1

pi4

ei1

ei2

ei3

ei4

ei5

ei6

pi2pi6

pi3pi5

pi1

pi4

pi


eprimei1

eprimei2

eprimei3

eprimei4

eprimei5

eprimei6

ei1

ei2

ei3

ei4

ei5

ei6

Edgesprimeattention


Edges-coef

Neighbors-coef

ei1 ei2

ei3

ei4ei5

ei6

pi2pi6

pi3pi5

pi1

pi4

pi


4 Complexity

1113957xi f 1113944j


j


⎛⎝ ⎞⎠ (5)



1113957xi Mm11113957x

(m)i (6)




4 Experiments


41 Classification





Complexity 5


Inpu

tPoi

ntCl

oud

N times

3

SRFGAT1

SRFGAT2

middotmiddotmiddot

SRFGATM


Con

cat


MLP

N times

102

4

1024

Maxpooling

MLP

shared

Clas

sific

atio

n

||

||


1

15

2

25

3

35

138

14

142

144

650 700 800600 750

200 400 600 800 1000 12000


1

095

09

085

08

075

07

650 750 800600 700

095

096

097

098

1200400 600 800 10002000


6 Complexity

5 Conclusion



Data Availability




100 150 200

14

16

18

2

10050 150 200 2500

145

146

147

(a)

100 150 200

10050 150 200 2500065

07

075

08

085

09

095

09

091

(b)

10050 150 200 2500

06

07

08

09

086

088

100 150 200

(c)




Complexity 7

Acknowledgments


References
































8 Complexity


























Complexity 9


(3)




αij exp aij1113872 1113873

1113936kexp aik( 1113857

βij exp bij1113872 1113873

1113936kexp bik( 1113857

(4)


pi

pi2pi6

pi3pi5

pi1

pi4

ei1

ei2

ei3

ei4

ei5

ei6

pi2pi6

pi3pi5

pi1

pi4

pi


eprimei1

eprimei2

eprimei3

eprimei4

eprimei5

eprimei6

ei1

ei2

ei3

ei4

ei5

ei6

Edgesprimeattention


Edges-coef

Neighbors-coef

ei1 ei2

ei3

ei4ei5

ei6

pi2pi6

pi3pi5

pi1

pi4

pi


4 Complexity

1113957xi f 1113944j


j


⎛⎝ ⎞⎠ (5)



1113957xi Mm11113957x

(m)i (6)




4 Experiments


41 Classification





Complexity 5


Inpu

tPoi

ntCl

oud

N times

3

SRFGAT1

SRFGAT2

middotmiddotmiddot

SRFGATM


Con

cat


MLP

N times

102

4

1024

Maxpooling

MLP

shared

Clas

sific

atio

n

||

||


1

15

2

25

3

35

138

14

142

144

650 700 800600 750

200 400 600 800 1000 12000


1

095

09

085

08

075

07

650 750 800600 700

095

096

097

098

1200400 600 800 10002000


6 Complexity

5 Conclusion



Data Availability




100 150 200

14

16

18

2

10050 150 200 2500

145

146

147

(a)

100 150 200

10050 150 200 2500065

07

075

08

085

09

095

09

091

(b)

10050 150 200 2500

06

07

08

09

086

088

100 150 200

(c)




Complexity 7

Acknowledgments


References
































8 Complexity


























Complexity 9

1113957xi f 1113944j


j


⎛⎝ ⎞⎠ (5)



1113957xi Mm11113957x

(m)i (6)




4 Experiments


41 Classification





Complexity 5


Inpu

tPoi

ntCl

oud

N times

3

SRFGAT1

SRFGAT2

middotmiddotmiddot

SRFGATM


Con

cat


MLP

N times

102

4

1024

Maxpooling

MLP

shared

Clas

sific

atio

n

||

||


1

15

2

25

3

35

138

14

142

144

650 700 800600 750

200 400 600 800 1000 12000


1

095

09

085

08

075

07

650 750 800600 700

095

096

097

098

1200400 600 800 10002000


6 Complexity

5 Conclusion



Data Availability




100 150 200

14

16

18

2

10050 150 200 2500

145

146

147

(a)

100 150 200

10050 150 200 2500065

07

075

08

085

09

095

09

091

(b)

10050 150 200 2500

06

07

08

09

086

088

100 150 200

(c)




Complexity 7

Acknowledgments


References
































8 Complexity


























Complexity 9


Inpu

tPoi

ntCl

oud

N times

3

SRFGAT1

SRFGAT2

middotmiddotmiddot

SRFGATM


Con

cat


MLP

N times

102

4

1024

Maxpooling

MLP

shared

Clas

sific

atio

n

||

||


1

15

2

25

3

35

138

14

142

144

650 700 800600 750

200 400 600 800 1000 12000


1

095

09

085

08

075

07

650 750 800600 700

095

096

097

098

1200400 600 800 10002000


6 Complexity

5 Conclusion



Data Availability




100 150 200

14

16

18

2

10050 150 200 2500

145

146

147

(a)

100 150 200

10050 150 200 2500065

07

075

08

085

09

095

09

091

(b)

10050 150 200 2500

06

07

08

09

086

088

100 150 200

(c)




Complexity 7

Acknowledgments


References
































8 Complexity


























Complexity 9

5 Conclusion



Data Availability




100 150 200

14

16

18

2

10050 150 200 2500

145

146

147

(a)

100 150 200

10050 150 200 2500065

07

075

08

085

09

095

09

091

(b)

10050 150 200 2500

06

07

08

09

086

088

100 150 200

(c)




Complexity 7

Acknowledgments


References
































8 Complexity


























Complexity 9

Acknowledgments


References
































8 Complexity


























Complexity 9


























Complexity 9

multiscalereceptivefieldsgraphattentionnetworkforpoint...

Documents