building a grid-point cloud-semantic map based on graph...

Building a grid-point cloud-semantic map based ongraph for the navigation of intelligent wheelchair

Cheng ZHAO*, Huosheng HU, Dongbing GUSchool of Computer Science and Electrical Engineering, University of Essex CO4 3SQ, Colchester, UK

Email: [email protected], [email protected], [email protected]

Abstract—The automatic navigation of the intelligentwheelchair is a major challenge for its applications. Mostprevious researches mainly focus on 2D navigation of intelli-gent wheelchair, which loses many useful environment informa-tion. This paper proposed a novel Grid-Point Cloud-SemanticMulti-layered Map based on graph optimization for intelligentwheelchair navigation. For mapping, the 2D grid map is at thebottom, the 3D point cloud map is on the grid map and the seman-tic markers are labelled in them. The semantic markers combinethe name and coordinate value of object marker together. Fornavigation, the wheelchair uses the grid map for localization andpath planning, uses the point cloud map for feature extraction andobstacle avoidance, and uses the semantic markers for human-robot vocal interaction. A number of experiments are carried outin real environments to verify its feasibility.

Keywords—Graph-based SLAM, Grid Map, Point Cloud Map,Semantic Labelling, Navigation

I. INTRODUCTION

Intelligent wheelchair is a very useful assisted livingapplication for the elderly and disabled people, which candecrease the pressure of ageing of population. The availabilityof automatical navigation of intelligent wheelchair is neces-sary in hospital, office and home. In the past decade, thereare many notable successes for the simultaneous localizationand mapping(SLAM) in robotics community. There are fourmainly kinds of typical map now: grid map, point cloud map,topological map and semantic map. There are many navigationresearches that focus on different type of maps.

The outline of grid and point cloud mapping is similarand it can be classified as filtering, smoothing and graph-based approaches. The state of filtering approach consists incurrent robot position and map, which can perform a onlinestate estimation. When the new measurement is available, theestimate is augmented. The Kalman [1][2], particle [3][4], in-formation [5][6] filter are very popular online SLAM methods.In term of smoothing approach [7][8] that relies on least-square error minimization estimate the whole trajectory ofrobot through the whole set of measurements. Recently, thegraph-based SLAM [9][10][11] achieved many breakthroughs,which constructs a graph out of the raw sensor measurements.The graph consists of the nodes each of which represents therobot position with the measurements from this position andedges each of which represents the spatial constraint betweentwo robot poses. The constraint is actually the transformationrelating two robot poses and it can be calculated throughodometry measurements or aligning the observations. Afterconstructing the graph, the next step is graph optimization:calculating a most likely configuration that best satisfies the

constrains. However online processing for global loop closuredetection is difficult in large-scale and long-term environment,the work in [12] proposed a graph-based memory managementapproach that only considers portions of map satisfy onlineprocessing requirements. The only difference between grip andpoint cloud mapping is that replacing the 2D scan matchingand loop closure detecting with counterparts that operate 3Dpoint cloud.

The work in [13] proposed a hybrid map that integrates gridand topological map for indoor navigation. The work in [14]proposed an approach to segment and detect the room spacesfor navigation using range data in office environment. Usingthe anchoring technique, [15] labels the topological map withsemantic information for navigation. The works in [16][17] in-troduced an approach to learn topological maps from geometricmap by applying semantic classification procedure based onassociative Markov networks and AdaBoost. Many work usedvisual features from camera sensor like [18] to extract semanticinformation via place categorization. The paper [19] integrateda robust image labelling method with a dense 3D semanticmap. In addition, many work added rich semantic informationto the map through human-robot interaction for navigation. In[20], a contextual topological map was built through makingthe robot follow the user and verbal commentary. The workin [21] built the semantic map through clarification naturallanguage dialogues between human and robot. The system in[22] integrated laser and vision sensors for place and objectrecognition as well as a linguistic framework, which createconceptual representation of human-made indoor environment.The work in [23] summarized many multi-modal interactionssuch as speech, gesture and vision for semantic labelling,which can make the robot get rich environment knowledgewithout many pre-requisites features.

Different types of map have their own advantages anddisadvantages for navigation. Grid map is easy to make pathplanning but it does not have enough 3D features and semanticinformation. Point cloud map is easy to get enough featuresinformation but it is hard to use for path planning because ofhigh complexity. Semantic map can provide the knowledge ofenvironment for human-robot interaction but the robot can notperform any action based on it. It is necessary to combine dif-ferent type of map together in order to make the navigation ofrobot more efficiently. This paper proposes a grid-point cloud-semantic multi-layered map for the navigation of intelligentwheelchair and carries out some related experiments in realenvironment. The rest of this paper is organized as follows.Section 2 introduces the intelligent wheelchair platform andoutlines the basic system workflow. Section 3 outlines the

Proceedings of the 21th International Conference onAutomation & Computing, University of Strathclyde,Glasgow, UK, 11-12 September 2015

222

approach deployed in this research. In section 4, some relatedexperiment results are presented to verify the performanceof the proposed approach. Finally, section 5 makes a briefconclusion and future work.

II. SYSTEM OVERVIEW

A. Robot Platform

The platform used in this work is a commercial intelli-gent wheelchair Spectra Plus equipped with a Kinect whichprovides RGB and depth information, a Hokuyo URG-04LX-UG01 Laser which provides 2D laser scan information andoptical encoders which provide odometry transformation infor-mation. Computation is preformed on an on board PC (IntelPentium(R) 2.30 GHz, 8G RAM and Integrated Graphics)installed with Ubuntu 14.04 and ROS Indigo. The platformsnapshot is shown in Figure 1.

Fig. 1: The platform snapshot.

B. Architecture Overview

The mapping module can be broken up into three modules:Frontend, Backend and Map creation. The frontend mainlyobtains the sensor data, extracts the features and calculatesthe geometric relationships between robot and landmarks. Thebackend mainly constructs a graph, preforms loop closuredetection and optimizes the graph structure. Finally, thismodule outputs a maximum likelihood solution of the robottrajectory. In addition, it also performs a model matchingthrough comparing the object database with the features fromRGB frames. If the model matching is successful, the masscentre position of the object will be calculated through thedepth images. This position is combined with the object nametogether in a semantic marker. In terms of map creation, the2D grid map and 3D point map are generated based on thetrajectory. Then the semantic markers are labelled to them.Combining them together, the multi-layered map is finallygenerated.

The navigation module consists of localization, getting adestination, path planning. The wheelchair can perform thelocalization based on the trajectory from graph SLAM. Thenthe user tells the wheelchair the destination name using vocalinterface. The wheelchair can get the position of destinationfrom semantic markers. Finally, the wheelchair performs pathplanning using grid map and obstacle avoidance using pointcloud map. The architecture overview is shown in Figure 2.

III. APPROACHES

A. Pose graph outline

The map of graph-based is a graph consisting of nodes andedges like G = {V,E}. The nodes [12] contain the odometryposes, laser scans, RGB and depth images of each location. Inaddition, they also save visual words which are used for loopclosure detection. The edges are mode of neighbors and loopclosures, which save the geometrical transformation betweennodes. When the odometry transformation between the currentand previous nodes is generated, the neighbor edges are addedto the graph. When a loop closure detection is found betweenthe current node and one of the previous maps, the loop closureedges are added to the graph. After the construction of graph,the graph optimization is performed which calculates the mostlikely configuration that best satisfies the constraints of edges.

Suppose x = (x1 . . . x2)T is a vector of parameters which

represent the configuration of nodes. ωij and Ωij represent themean and the information matrix of the observation of node jand i respectively. For the state x, fij(x) is the function whichcan calculate the observation of the current state. The residualrij can be calculated via equation (1).

rij(x) = ωij − fij(x) (1)

The amount of error introduced by constraints which are cal-culated by real or visual odometry, weighed by its information,can be obtained via equation (2).

dij(x)2 = rij(x)

TΩijrij(x) (2)

Assuming all the constraints to be independent, the overallerror can be calculated through equation (3).

D2(x) =∑

(ij)∈φdij(x)

2 =∑

(ij)∈φrij(x)

TΩijrij(x) (3)

where dij(x)2 is residual of the edge which connects node i

and j in graph φ. So the key of graph-based SLAM is to finda state x∗ that minimizes the overall error.

x∗ = argminx

∑

(ij)∈φrij(x)

TΩijrij(x) (4)

Compactly, it is rewritten in equation (5)

x∗ = argminx

∑

(ij)∈φ‖rij(x)‖2∑

ij(5)

where∑

ij = Ω−1ij is the corresponding covariance matrix of

the information matrix.

B. Loop closure detection

The approach of loop closure detection [12][24] is used inthis work. They use a Bayesian filter to evaluate loop closurehypotheses over all previous images. If the hypothesis reachesthe pre-defined threshold H , a loop closure is detected. Inorder to compute the likelihood required by the filter, the ORBfeatures is used for an incremental visual words consideringthe low computing power of the on board PC. If you have apowerful on board PC, the best choice is SIFT feature withGPU. For every 2D point from RGB image where the visualwords are extracted, the relating 3D position can be calculatedvia calibration matrix and the depth image. So, every visual

223

Kinect

Wheelodometry

Laser

RGB

Depth

2D Scan

Transformation

Point cloud subsampling

Featuresextraction RANSAC

ICP

Scan Matching

Loop detection&Graph optimization Trajectory

2D map creation

3D map creation

Model matching

Object features database

Semantic markers

Multi-layeredmap

Kinect

Wheelodometry

Laser

RGB

Depth

2D Scan

Transformation

Point cloud subsampling

Featuresextraction RANSAC

ICP

Scan Matching

Loop detection&Graph optimization Trajectory

Model matching

Object features database

2D map creation

3D map creation

Semantic markers

Multi-layeredmap

Frontend Backend Map creation

Position calcultation

Obstacle avoidance

Path planning

Destination position

Vocal interface

Reach thedestination

Obstacle avoidance

Path planning

Destinationposition

Vocal interface

Reach thedestination

Navigation

Fig. 2: Architecture overview.

words’ 3D position can be obtained. Using the 3D visual wordcorrespondences, every transformation between the matchingimages is computed via RANSAC if a loop closure is found.When the minimum of I inliers is found, the loop closure issuccessful and the related edge is added to the graph.

C. Graph optimization

For graph optimization, the G2O [25][26] framework isused in our work. This approach performs a minimization ofnon-linear error function that can be represented as a graph.G2O can minimize the error in a graph in which verticesare parameterized by transformation and edges represent con-straints between vertices with associated covariance matrices.It maximizes the likelihood of the vertex parameters subjectto the constraints using stochastic gradient descent. When therobot revisits a known part of the map, the loop closure edgesin the graph can diminish the accumulated error introduced byodometry.

D. Map representation

Based on the trajectory outputted from graph-based SLAM,the laser scans are assembled together and the 2D grid mapis generated at the bottom, meanwhile the RGB and depthimages are assembled together and the 3D point cloud mapis generated on the grid map. In terms of semantic labelling,firstly a database of objects’ features is built. Then a modelmatching is performed comparing the extracted features fromRGB images and the features in object database. If a modelmatching is successful, the centre of mass position of the foundobject can be calculated through corresponding depth image.This position is combined with object name in a semanticmarker. Due to the low computing power of the on boardPC, the ORB feature is selected as the feature extraction anddescription. Lastly, the semantic markers are labelled in the2D and 3D map.

E. Navigation

”Where am I”,”Where shall I go” and ”How to get there”are the three central issues of navigation. For the first issue, theintelligent wheelchair can get its position through the trajectory

outputted from graph-base SLAM. For the second issue, theuser can tell wheelchair the destination name using vocalinterface through bluetooth earphone. The open source speechrecognition toolkit Pocket Sphinx [27] is used for human-robot vocal interaction, which is easy to use for training(justupload the semantic text to the tool website). The voice will betransformed to text and then a string matching is performed.Finally the wheelchair can find the position of correspondingdestination name from semantic markers. For the third issue,the wheelchair transforms the grid map to a cost map [28][29],and then uses Dijkstra algorithm to search the cost map to finda route that has minimal sum of the cost value. For obstacleavoidance, the obstacles and ground are segmented by normalfiltering using point cloud map. All points with normal in the zdirection are labelled as ground and all the others are labelledas obstacles. Lastly, movebase package [30] in ROS is usedfor the motion planning of intelligent wheelchair.

IV. EXPERIMENTS

A. 2D grid map

As shown in the Figure 3, the 2D grid map is generatedusing laser scans. The blue line represents the trajectory ofthe wheelchair and the small blue point represents the graphnode. The yellow line connected with two nodes representsthat there is a loop closure between them. There are manyloop closures in A, B, C because the wheelchair rotated by360 degrees there, while there are also many loop closures inD, E, F because the wheelchair moved with a loop trajectory.

B. 3D point cloud map

As shown in the Figure 5, the 3D point cloud map isgenerated using RGB-D information. The whole point cloudmap are made of a large number of RGB-D frames obtainedfrom every node in the graph. Those frames are alignedtogether through the transformation constrains.

C. Semantic labelling

The object database stores the ORB features of Baxter, Foxrobot, Robot fish, Fly robot and Robot dog. By extracting the

224

A

B

C

D

E

F

B

F

E

D

A

C

Fig. 3: 2D grid map.

Fig. 4: 3D semantic labelling.

225

Fig. 5: 3D point cloud map.

A

B

C

D

E

F

G

H

Fig. 6: Multi-layered map.

226

Fig. 7: Navigation in hybrid map.

ORB features from RGB images, these robots were found inthe process of mapping, as show in the colorized box fromFigure 4. The positions of their centre mass were calculatedusing the depth image. As shown in Figure 4, there is a yellowline connecting the wheelchair and the object marker. Finally,those semantic markers (X,Y, Z) will be labelled in the 3Dpoint cloud map.

D. Multi-layered map

As shown in the Figure 6, the multi-layered map is gen-erated: the grid map A is at the bottom, the point cloud mapB is on the grid map and the semantic markers C, D, E, F, Gare labelled as green points in the point cloud map. The greenline represents the trajectory of the wheelchair.

E. Navigation

As shown in the Figure 7, in the point cloud map, theobstacles are inflated and labelled as red colour, the groundis labelled as green colour. The blue and red line on thegrid map are the outcomes of global and local path planningrespectively. All the mapping and navigation videos of ex-periments can be found https://www.youtube.com/channel/UC-Tl5R4MBKIXsGI0sFlJuXQ

V. CONCLUSION

As there are some obviously drawbacks when the intel-ligent wheelchair performs navigation only using one single

type of map, in this paper a Grid-Point Cloud-Semantic Multi-layered map based on graph optimization for the navigation isproposed. The grid map at the bottom is used for localizationand path planning, the point cloud map on the grid map is usedfor features extraction and obstacles avoidance, the semanticmarkers is used for human-robot vocal interaction. This hybridmap can make intelligent wheelchair perform navigation taskmore efficiently. In the future, our work will mainly focuson adding more rich semantic information to this hybrid mapthrough 3D labelling.

ACKNOWLEDGEMENT

The 1st author is financially supported by scholarships fromChina Scholarship Council and University of Essex, U.K.

REFERENCES

[1] C. P. Smith R, Self M, Estimating Uncertain Spatial Relationships inRobotics. Springer New York, 1990.

[2] M. W. M. Gamini Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, and M. Csorba, “A solution to the simultaneous localizationand map building (SLAM) problem,” IEEE Transactions on Roboticsand Automation, vol. 17, no. 3, pp. 229–241, 2001.

[3] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, “FastSLAM:A factored solution to the simultaneous localization and mapping prob-lem,” in Proc. of 8th National Conference on Artificial Intelligence/14thConference on Innovative Applications of Artificial Intelligence, vol. 68,no. 2, 2002, pp. 593–598.

[4] G. Grisetti, C. Stachniss, and W. Burgard, “Improved techniques for gridmapping with Rao-Blackwellized particle filters,” IEEE Transactions onRobotics, vol. 23, pp. 34–46, 2007.

227

[5] R. M. Eustice, H. Singh, and J. J. Leonard, “Exactly sparse delayed-statefilters,” in Robotics and Automation, 2005. ICRA 2005. Proceedings ofthe 2005 IEEE International Conference on. IEEE, 2005, pp. 2417–2424.

[6] S. Thrun, Y. Liu, D. Koller, A. Ng, Z. Ghahramani, and H. Durrant-Whyte, “Simultaneous Localization and Mapping with Sparse ExtendedInformation Filters,” vol. 23, no. 7, pp. 693–716, 2003.

[7] F. Dellaert and M. Kaess, “Square Root SAM: Simultaneous Local-ization and Mapping via Square Root Information Smoothing,” TheInternational Journal of Robotics Research, vol. 25, no. 12, pp. 1181–1203, 2006.

[8] E. Olson, J. Leonard, and S. Teller, “Fast iterative alignment of posegraphs with poor initial estimates,” in Proceedings - IEEE InternationalConference on Robotics and Automation, vol. 2006, no. May, 2006, pp.2262–2269.

[9] S. Thrun, “The Graph SLAM Algorithm with Applications to Large-Scale Mapping of Urban Structures,” The International Journal ofRobotics Research, vol. 25, no. 5-6, pp. 403–429, 2006.

[10] H. Strasdat, J. M. M. Montiel, and A. J. Davison, “Visual SLAM: Whyfilter?” Image and Vision Computing, vol. 30, no. 2, pp. 65–77, 2012.

[11] F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard, “3-D Mappingwith an RGB-D camera,” IEEE Transactions on Robotics, vol. 30, no. 1,pp. 177–187, 2014.

[12] M. Labbe and F. Michaud, “Online global loop closure detectionfor large-scale multi-session graph-based SLAM,” in 2014 IEEE/RSJInternational Conference on Intelligent Robots and Systems, 2014, pp.2661–2666.

[13] S. Thrun and A. Bucken, “Integrating Grid-Based and Topological Mapsfor Mobile Robot Navigation,” Proceedings Of The National ConferenceOn Artificial Intelligence, vol. 13, no. August, pp. 944–950, 1996.

[14] P. Buschka and A. Saffiotti, “A virtual sensor for room detection,”IEEE/RSJ International Conference on Intelligent Robots and Systems,vol. 1, no. October, pp. 637–642, 2002.

[15] C. Galindo, A. Saffiotti, S. Coradeschi, P. Buschka, J. a. Fernandez-Madrigal, and J. Gonzalez, “Multi-hierarchical semantic maps for mo-bile robotics,” in 2005 IEEE/RSJ International Conference on IntelligentRobots and Systems, IROS, 2005, pp. 3492–3497.

[16] O. Mozos and W. Burgard, “Supervised Learning of TopologicalMaps using Semantic Information Extracted from Range Data,” 2006IEEE/RSJ International Conference on Intelligent Robots and Systems,pp. 2772–2777, 2006.

[17] E. Brunskill, T. Kollar, and N. Roy, “Topological mapping using spectralclustering and classification,” 2007 IEEE/RSJ International Conferenceon Intelligent Robots and Systems, pp. 3491–3496, 2007.

[18] J. Wu, H. I. Christensen, and J. M. Rehg, “Visual place categorization:Problem, dataset, and algorithm,” in 2009 IEEE/RSJ InternationalConference on Intelligent Robots and Systems, IROS 2009, 2009, pp.4763–4770.

[19] Z. Zhao and X. Chen, “Semantic mapping for object category and struc-tural class,” in 2014 IEEE/RSJ International Conference on IntelligentRobots and Systems, no. Iros, 2014, pp. 724–729.

[20] A. Diosi, G. Taylor, and L. Kleeman, “Interactive SLAM using laserand advanced sonar,” in Proceedings - IEEE International Conferenceon Robotics and Automation, vol. 2005, no. April, 2005, pp. 1103–1108.

[21] G.-J. M. Kruijff, H. Zender, P. Jensfelt, and H. I. Christensen, “Clarifi-cation dialogues in human-augmented mapping,” Proceeding of the 1stACM SIGCHI/SIGART conference on Human-robot interaction - HRI’06, p. 282, 2006.

[22] H. Zender, O. Martınez Mozos, P. Jensfelt, G. J. M. Kruijff, andW. Burgard, “Conceptual spatial representations for indoor mobilerobots,” Robotics and Autonomous Systems, vol. 56, no. 6, pp. 493–502, 2008.

[23] G. Randelli, T. M. Bonanni, L. Iocchi, and D. Nardi, “Knowledgeacquisition through human-robot multimodal interaction,” IntelligentService Robotics, vol. 6, no. 1, pp. 19–31, 2013.

[24] M. Labbe and F. Michaud, “Appearance-based loop closure detectionfor online large-scale and long-term operation,” IEEE Transactions onRobotics, vol. 29, pp. 734–745, 2013.

[25] R. Kummerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard,“G2o: A general framework for graph optimization,” in Proceedings -

IEEE International Conference on Robotics and Automation, 2011, pp.3607–3613.

[26] G. Grisetti, S. Grzonka, C. Stachniss, P. Pfaff, and W. Burgard,“Efficient estimation of accurate maximum likelihood maps in 3D,”in IEEE International Conference on Intelligent Robots and Systems,no. Ml, 2007, pp. 3472–3478.

[27] D. Huggins-Daines, M. Kumar, A. Chan, A. Black, M. Ravishankar,and A. Rudnicky, “Pocketsphinx: A Free, Real-Time Continuous SpeechRecognition System for Hand-Held Devices,” 2006 IEEE InternationalConference on Acoustics Speech and Signal Processing Proceedings,vol. 1, pp. 185–188, 2006.

[28] J. King and M. Likhachev, “Efficient cost computation in cost mapplanning for non-circular robots,” in Intelligent Robots and Systems.St.Louis,MO: Ieee, Oct. 2009, pp. 3924–3930.

[29] E. Marder-Eppstein, E. Berger, T. Foote, B. Gerkey, and K. Konolige,“The Office Marathon: Robust navigation in an indoor office environ-ment,” in IEEE International Conference on Robotics and Automation.Anchorage,AK: Ieee, May 2010, pp. 300–307.

[30] B. P. Gerkey and K. Konolige, “Planning and Control in UnstructuredTerrain,” in ICRA Workshop on Path Planning on Costmaps, 2008.

228

building a grid-point cloud-semantic map based on graph...

Documents