Implementing visual perception tasks for the REEM robot

Upload: bence-magyar

Post on 24-Feb-2018


TRANSCRIPT


    Implementing visual perception tasks for

    the REEM robot

    Visual pre-grasping pose

    Author: Bence Magyar

    Supervisors: Jordi Pages, PhD ; Dr. Zoltan Istenes, PhD

    Barcelona & Budapest, 2012

    Master Thesis Computer Science

Eotvos Lorand University, Faculty of Informatics - Department of Software Technology and Methodology


    This copy was printed for Jordi Pages as a sign of acknowledgement for his help

    as advisor.


    Contents

    List of Tables IV

    List of Figures V

1 Introduction and background 1

1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    1.2 REEM introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

    1.3 Outline and goal of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.4 ROS Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.5 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.6 Computer Vision basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.7 Grasping problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    1.8 Visual servoing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    2 Object detection survey and State of the Art 10

    2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

    2.2 Available sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

    2.3 Survey work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    2.4 Brief summary of survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    3 Pose estimation of an object 17

3.1 Introduction and overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.2 CAD Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    3.3 Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    3.4 Edge detection on color image . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    3.5 Particle filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    3.6 Feature detection (SIFT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    3.7 kNN and RANSAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    3.7.1 kNN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    3.7.2 RANSAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    3.8 Implemented application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


    3.8.1 Learning module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    3.8.2 Detector module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    3.8.3 Tracker module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    3.8.4 State design pattern for node modes . . . . . . . . . . . . . . . . . . . 33

    3.9 Pose estimation results and ways for improvement. . . . . . . . . . . . . . . . 35

    3.10 Published software, documentation and tutorials . . . . . . . . . . . . . . . . . 35

    3.11 Hardware requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    3.12 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    4 Increasing the speed of pose estimation using image segmentation 38

    4.1 Image segmentation problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    4.2 Segmentation using image processing . . . . . . . . . . . . . . . . . . . . . . 39

    4.3 ROS node design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    4.4 Stereo disparity-based segmentation . . . . . . . . . . . . . . . . . . . . . . . 41

    4.4.1 Computing depth information from stereo imaging . . . . . . . . . . . 41

    4.4.2 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

    4.5 Template-based segmentation. . . . . . . . . . . . . . . . . . . . . . . . . . . 42

    4.5.1 Template matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

    4.5.2 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    4.6 Histogram backprojection-based segmentation. . . . . . . . . . . . . . . . . . 44

    4.6.1 Histogram backprojection . . . . . . . . . . . . . . . . . . . . . . . . 44

    4.6.2 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

    4.7 Combined results with BLORT . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    4.8 Published software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

    4.9 Hardware requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

    4.10 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

    5 Tracking the hand of the robot 49

    5.1 Hand tracking problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

    5.2 AR Toolkit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

    5.3 ESM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

    5.4 Aruco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

    5.5 Application examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

    5.6 Software. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    5.7 Hardware requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    6 Experimental results for visual pre-grasping 56

    6.1 Putting it all together: visual servoing architecture. . . . . . . . . . . . . . . . 56

6.2 Tests on the REEM RH2 robot . . . . . . . . . . . . . . . . . . . . . . . . . . 57

6.3 Hardware requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59


    7 Conclusion 60

    7.1 Key results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

    7.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

    7.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

    8 Bibliography 63

    A Appendix 1: Deep survey tables 67

    B Appendix 2: Shallow survey tables 71


    List of Tables

    2.1 Survey summary table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    4.1 Effect of segmentation on detection . . . . . . . . . . . . . . . . . . . . . . . 47

    A.1 Filtered deep survey part 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

    A.2 Filtered deep survey part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

    A.3 Filtered deep survey part 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

    B.1 Wide shallow survey part 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

    B.2 Wide shallow survey part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

    B.3 Wide shallow survey part 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

    B.4 Wide shallow survey part 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

    B.5 Wide shallow survey part 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

B.6 Wide shallow survey part 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

B.7 Wide shallow survey part 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

    B.8 Wide shallow survey part 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

    B.9 Wide shallow survey part 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80


    List of Figures

1.1 Willow Garage's PR2 finishing a search task . . . . . . . . . . . . . . . . . . . 2

    1.2 PAL Robotics REEM robot . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

    1.3 Two nodes in the ROS graph connected through topics . . . . . . . . . . . . . 4

    1.4 Real scene with the REEM robot . . . . . . . . . . . . . . . . . . . . . . . . . 7

    1.5 Examples for grasping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.6 REEM grasping a juicebox . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.7 Closed loop architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    2.1 Most common sensor types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    2.2 Stereo camera theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    2.3 LINE-Mod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    2.4 ODUFinder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.5 RoboEarth Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.6 ViSP Tracker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

    2.7 ESM Tracker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

    3.1 CAD model of a Pringles box in MeshLab . . . . . . . . . . . . . . . . . . . . 20

    3.2 Examples of rendering in case of BLORT . . . . . . . . . . . . . . . . . . . . 21

    3.3 Image convolution example. . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

    3.4 Steps of image processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

    3.5 Particle filter used for localization . . . . . . . . . . . . . . . . . . . . . . . . 25

3.6 Particles visualized on the detected object of BLORT. Greens are valid, reds are invalid particles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    3.7 Extracted SIFT and ORB feature points of the same scene. . . . . . . . . . . . 27

    3.8 SIFT orientation histogram example . . . . . . . . . . . . . . . . . . . . . . . 28

    3.9 Extracted SIFTs. Red SIFTs are not in the codebook, yellow and green ones are

    considered as object points, green ones are inliers of the model and yellow ones

    are outliers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    3.10 Detection result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.11 Tracking result, the object visible in the image is rendered . . . . . . . . . . . 32

3.12 Diagram of the tracking mode . . . . . . . . . . . . . . . . . . . . . . . . . . 33


    3.13 Diagram of the singleshot mode . . . . . . . . . . . . . . . . . . . . . . . . . 34

    3.14 Screenshots of the ROS wiki documentation . . . . . . . . . . . . . . . . . . . 36

4.1 Example of erosion where the black pixel class was eroded . . . . . . . . . . . 39

4.2 Example of dilation where the black pixel class was dilated . . . . . . . . . . . 40

4.3 ROS node design of segmentation nodes . . . . . . . . . . . . . . . . . . . . . 40

    4.4 Parameters exposed through dynamic reconfigure . . . . . . . . . . . . . . . . 41

    4.5 Example of stereo vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    4.6 Masked, input and disparity images . . . . . . . . . . . . . . . . . . . . . . . 42

    4.7 Template-based segmentation. . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    4.8 The process of histogram backprojection-based segmentation . . . . . . . . . . 44

    4.9 Histogram segmentation using a template of the target orange ball . . . . . . . 45

    4.10 The segmentation process and BLORT . . . . . . . . . . . . . . . . . . . . . . 46

    4.11 Test scenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

    4.12 Screenshot of the ROS wiki documentation . . . . . . . . . . . . . . . . . . . 48

    5.1 ARToolkit markers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

    5.2 ARToolkit in action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

    5.3 Tests using the ESM ROS wrapper . . . . . . . . . . . . . . . . . . . . . . . . 52

    5.4 Example markers of Aruco . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

    5.5 Otsu thresholding example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.6 Video of Aruco used for visual servoing. Markers are attached to the hand and to the target object in the Gazebo simulator. . . . . . . . . . . . . . . . . . . . 54

    5.7 Tests done with Aruco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    6.1 Putting it all together: visual servoing architecture. . . . . . . . . . . . . . . . 56

    6.2 A perfect result with the juicebox. . . . . . . . . . . . . . . . . . . . . . . . . 57

    6.3 An experiment gone wrong . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    6.4 An experiment where tracking was tested . . . . . . . . . . . . . . . . . . . . 58


    Acknowledgements

    First I would like to thank my family and my girlfriend for their support and patience during

    my 5-month journey in science and technology in Barcelona. The same appreciation goes to

my friends. I would also like to say thanks to Eotvos Lorand University and PAL Robotics for providing the opportunity, through an Erasmus internship, to conduct such research in a foreign country. Many thanks to my advisors, Jordi Pages, who mentored me at PAL, and Zoltan Istenes, who both helped me shape this manuscript and organize my work so that it could be presented. Thumbs up for Thomas Morwald, who was always willing to answer my questions about BLORT. The conversations and emails exchanged with Ferran Rigual, Julius Adorf and Dejan Pangercic helped a great deal with my research.

    I really enjoyed the friendly environment created by the co-workers and interns of PAL

    Robotics especially: Laszlo Szabados, Jordi Pages, Don Joven Agravante, Adolfo Rodriguez,

    Enrico Mingo, Hilario Tome, Carmen Lopera and all.

    I would also like to give credit to everyone whose work served as a basis for my thesis.

These people are the members of the open source community and the developers of: Ubuntu, C++, OpenCV, ROS, Texmaker, LaTeX, Qt Creator, GIMP, Inkscape and many more.

    Thank you.


    Chapter 1

    Introduction and background

1.1 Introduction

Even though we are not aware of it, we are already surrounded by robots. The most accepted definition of a robot is that it is some kind of machine that is automated in order to help its owner by completing certain tasks. They might not have a human form as one would assume, but the only difference is that humanoid robots are bigger and more complex. A humanoid robot could replace humans in various hazardous situations where a human form is still required, for example when the tools provided for a rescue mission are hand tools designed for humans. Although popular science fiction, and sometimes even scientists, like to paint a highly developed and idealized picture of robotics, the field is only now maturing.

Despite its initial football-oriented goal, even RoboCup - one of the most respected robotics competitions - has a special league called RoboCup@Home where humanoid robots compete in well-defined common tasks in home environments. 1 Also the DARPA Grand Challenge - the most well-founded competition - has announced its latest challenge centered around a humanoid robot. 2

1 http://www.ai.rug.nl/robocupathome/

2 http://spectrum.ieee.org/automaton/robotics/humanoids/darpa-robotics-challenge-here-are-the-official-details

Figure 1.1: Willow Garage's PR2 finishing a search task

However, at the current time there is no generally accepted solution even for manipulating simple objects like boxes and glasses, although this field has seen a lot of development lately. It is still an open problem, but promising works such as [15] have been published. Finding and identifying an object to be grasped highly depends on the type and number of sensors a robot has.

    1.2 REEM introduction

    The latest creation of PAL Robotics is the robot named REEM.

    Figure 1.2: PAL Robotics REEM robot

With its 22 degrees of freedom, 8 hours of battery time, 30 kg payload and 4 km/h speed it is one of the top humanoid service robots. Each arm has 7 degrees of freedom, with 2 more for the torso and 2 for the head. The head unit of REEM holds a pair of cameras as well as a


    microphone and speaker system while a touch screen is available on the chest for multimedia

    applications such as map navigation.

    The main goal of this thesis work was to develop applications for this specific robot while

making sure that the end result would still be general enough to allow usage on other robot platforms.

    1.3 Outline and goal of the thesis

Before going into more detail, the basic problems need to be defined.

    The goal of this thesis was to implement computer vision modules for grasping tabletop

    objects with the REEM robot. To be more precise: it consisted of implementing solutions for

the sub-problems of visual servoing in order to solve the grasping problem. This covers the

    following two tasks from the topic statement: detection of textured and non-textured objects,

    detection of tabletop objects for robot grasping.

    The first and primary problem encountered is the pose estimation problem which was the

    main task of this thesis work. There are several examples in scientific literature solving

    slightly different problems partial to pose estimation. One of them is the object detectionproblem and the other one is the object tracking problem. It is crucial to always have these

    problems in mind when dealing with objects through vision.

    The pose estimation problem is that we have to compute an estimated pose of an object

    given some input image(s) and - possibly - given additional background knowledge. Ways of

defining a pose can be found in Section 1.6.

    An object detection problem can be identified by its desired answer type. One is dealing

with object detection if the desired answer for an image or image sequence is whether an object

    is present or its number of appearances. This problem is typically solved using features.

    Numerous examples and articles can also be found for the object tracking problem. Usu-

ally these types of methods are specialized to provide real-time speed. To do so they require an

    initialization stage before starting the tracker. Concretely: the target object has to be set to an

    initial pose or the tracker has to be initialized with the pose of the object.

The secondary task of this thesis was to provide a solution for tracking the hand of the

    REEM robot during the grasping process so the manipulator position and the target position


    can be inserted into a visual servoing architecture.

A necessary overall objective was to provide all results at a speed that makes these applications eligible for deployment in a real scene on the REEM robot.

    1.4 ROS Introduction

    ROS [29] (Robot Operating System) is a meta operating system designed to help and enhance

    the work of robotics scientists. ROS is not an operating system in the traditional sense of process

    management and scheduling; rather, it provides a structured communications layer above the

host operating systems of a heterogeneous computer cluster.

At its very core, ROS provides an implementation of the Observer Design Pattern [14, p. 293] and additional software tools to organize the system well. A ROS system is made up of nodes, which serve as computational processes in the system. ROS nodes communicate via typed messages through topics, which are registered using simple strings as names. A node can publish and/or subscribe to a topic.
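To make the publish/subscribe mechanism concrete, a minimal roscpp publisher node could look like the sketch below. The node name, topic name and message type are chosen for illustration only and are not part of the REEM software.

#include <ros/ros.h>
#include <std_msgs/String.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "talker");      // register this process as a ROS node
  ros::NodeHandle nh;

  // Advertise a topic by name; any node can subscribe to "chatter" via the same string.
  ros::Publisher pub = nh.advertise<std_msgs::String>("chatter", 10);

  ros::Rate rate(10);                   // publish at roughly 10 Hz
  while (ros::ok())
  {
    std_msgs::String msg;
    msg.data = "hello";
    pub.publish(msg);                   // send a typed message over the topic
    ros::spinOnce();
    rate.sleep();
  }
  return 0;
}

A subscriber node is symmetric: it calls nh.subscribe with the same topic name and registers a callback that receives the typed messages.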

A ROS system is completely modular: each node can be dynamically started or stopped, and all nodes are independent components in the system, depending on each other only for data input. Topics provide continuous dataflow-style processing of messages, but they have limitations if one would like to use a node's functionality as a blocking call. There is a way to create such interfaces for nodes; these are called services. To support a dynamic way of storing and modifying commonly used global or local parameters, ROS also has a Parameter Server through which nodes can read, create and modify parameters.

    The link below provides more information about ROS:

    http://ros.org/wiki

    Figure 1.3: Two nodes in the ROS graph connected through topics

    Among others, ROS also provides


    a build system - rosbuild, rosmake

    a launch system - roslaunch

    monitoring tools - rxgraph, rosinfo, rosservice, rostopic

    1.5 OpenCV

OpenCV [9] stands for Open Source Computer Vision and it is a programming library developed for real-time computer vision tasks. OpenCV is released under a BSD license; it is free for both academic and commercial use. It has C++, C and Python interfaces running on Windows, Linux, Android and Mac. It provides implementations of several image processing and computer vision algorithms, classic and state-of-the-art alike. It has great amounts of supplementary material available on the internet, such as [22]. It is being developed by Willow Garage along with ROS and is widely used for vision-oriented applications on all platforms. All tasks related to image processing in this thesis work were solved using OpenCV.

    1.6 Computer Vision basics

    This section will go through the very basic definitions of Computer Vision.

A rigid body in 3D space is defined by its position and orientation, which is commonly referred to as its pose. Such a pose is always defined with respect to an orthonormal reference frame where x, y, z are the unit vectors of the frame axes.

The position of a point O' on the rigid body with respect to the coordinate frame O-xyz is expressed by the relation

o' = o'_x x + o'_y y + o'_z z \qquad (1.1)

where o'_x, o'_y, o'_z denote the components of the vector o' \in R^3 along the frame axes. The position of O' can therefore be defined by the vector o' as follows:

o' = \begin{bmatrix} o'_x \\ o'_y \\ o'_z \end{bmatrix} \qquad (1.2)

So far we have covered the position element of the object's pose.


The orientation of O' can be defined w.r.t. its reference frame as follows:

x' = x'_x x + x'_y y + x'_z z
y' = y'_x x + y'_y y + y'_z z
z' = z'_x x + z'_y y + z'_z z \qquad (1.3)

A more practical form is the following, usually called a rotation matrix R:

R = \begin{bmatrix} x' & y' & z' \end{bmatrix} = \begin{bmatrix} x'_x & y'_x & z'_x \\ x'_y & y'_y & z'_y \\ x'_z & y'_z & z'_z \end{bmatrix} = \begin{bmatrix} x'^T x & y'^T x & z'^T x \\ x'^T y & y'^T y & z'^T y \\ x'^T z & y'^T z & z'^T z \end{bmatrix} \qquad (1.4)

The columns of matrix R are mutually orthogonal, so as a consequence

R^T R = I_3 \qquad (1.5)

where I_3 denotes the (3 x 3) identity matrix.

It is clear that the rotation matrix above is a redundant representation. In some cases a unit quaternion representation is used instead. Given a unit quaternion q = (w, x, y, z), the equivalent rotation matrix can be computed as:

Q = \begin{bmatrix} 1 - 2y^2 - 2z^2 & 2xy - 2zw & 2xz + 2yw \\ 2xy + 2zw & 1 - 2x^2 - 2z^2 & 2yz - 2xw \\ 2xz - 2yw & 2yz + 2xw & 1 - 2x^2 - 2y^2 \end{bmatrix} \qquad (1.6)
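As a small illustration of Equation (1.6), the conversion from a unit quaternion to a rotation matrix can be written directly in C++. This is a minimal sketch; the function name and the plain-array output are chosen only for this example.

// Convert a unit quaternion q = (w, x, y, z) into the equivalent 3x3 rotation
// matrix, following Equation (1.6). The quaternion is assumed to be normalized.
void quaternionToRotation(double w, double x, double y, double z, double Q[3][3])
{
  Q[0][0] = 1 - 2*y*y - 2*z*z;  Q[0][1] = 2*x*y - 2*z*w;      Q[0][2] = 2*x*z + 2*y*w;
  Q[1][0] = 2*x*y + 2*z*w;      Q[1][1] = 1 - 2*x*x - 2*z*z;  Q[1][2] = 2*y*z - 2*x*w;
  Q[2][0] = 2*x*z - 2*y*w;      Q[2][1] = 2*y*z + 2*x*w;      Q[2][2] = 1 - 2*x*x - 2*y*y;
}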

Note: when talking about transformations, the components of a pose are usually called translation and rotation instead of position and orientation.

    [35] provided great help for writing this section.

    1.7 Grasping problem

A grasping problem has several definitions depending on specific parameters. Since the goal of this thesis was not visual servoing, the presented grasping problem will be a simplified version.


    Figure 1.4: Real scene with the REEM robot

Given an object frame in 3D space, the task is to find an appropriate sequence of operations that brings the robot manipulator into a pose that meets the desired definition of the goal frame. Let F^c_o denote the object frame w.r.t. the camera frame and define the goal frame as

F^c_g = T_{off} F^c_o \qquad (1.7)

where T_{off} is a transformation that defines a desired offset on the object frame. Also let F^c_m denote the manipulator frame w.r.t. the camera frame.

The next task is to find the sequence T_1, T_2, ..., T_n that drives ||T_1 T_2 \cdots T_n F^c_m - F^c_g|| below a threshold:

while ||T_1 T_2 \cdots T_n F^c_m - F^c_g|| >= threshold do
    Manipulate/modify T_1, T_2, ..., T_n to minimize the error;
end
Grasp object - close hand/gripper;

Algorithm 1: General grasping algorithm
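In ROS terms, Equation (1.7) is simply a composition of transforms. A minimal sketch using the tf library could look like the following; the function name and the fixed offset are illustrative only, since the actual offset depends on the gripper and the grasp.

#include <tf/transform_datatypes.h>

// Compute the goal (pre-grasp) frame from the object frame expressed in the
// camera frame and a desired offset transform: F^c_g = T_off * F^c_o.
tf::Transform computeGoalFrame(const tf::Transform& objectInCamera,
                               const tf::Transform& offset)
{
  return offset * objectInCamera;
}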



    Figure 1.5: Examples for grasping

    Figure 1.6: REEM grasping a juicebox


    1.8 Visual servoing

Following the same principles as motor control, visual servoing stands for controlling a robot manipulator based on feedback. In this particular case the feedback is obtained using computer vision. It is also referred to as Vision-Based Control and has three main types:

Image Based (IBVS): the feedback is the error between the current and the desired image points on the image plane. It does not involve the 3D pose at all and is therefore often referred to as 2D visual servoing.

Position Based (PBVS): the main feedback is the 3D pose error between the current pose and the goal pose. Usually referred to as 3D visual servoing.

Hybrid: 2D-3D visual servoing approaches take image features as well as 3D pose information into account, combining the two servoing methods mentioned above.

Visual servoing is categorized as closed-loop control. Figure 1.7 shows the general archi-

    tecture of visual servoing.

    Figure 1.7: Closed loop architecture


    Chapter 2

    Object detection survey and State of the

    Art

    2.1 Introduction

As a precursor to this project, a wide survey of existing software packages and techniques needed to be done. The survey consisted of two stages.

1. A wider survey with shallow testing and research to classify possible subjects. The table of results can be found in Appendix 2 (B).

2. A filtered survey, based on the attributes, previous results and experiences, with more detailed tests and research, also taking the available sensors into account. The table of results can be found in Appendix 1 (A).

This chapter introduces the software packages and techniques that emerged from the above surveys, describing the benefits and drawbacks experienced.

    2.2 Available sensors

There are several ways to address the task of digitally recording the world. While there is a wide variety of sensors suitable for image-based applications, when building an actual humanoid robot one has to choose the type best fitting the application, and one that can fit into a robot body or - preferably - into a robot head.


    (a) Monocular camera (b) Stereo cameras (c) RGB-D devices (d) Laser scanner

    Figure 2.1: Most common sensor types

    Monocular cameras usually provide accurate colors and can be reasonably faster than the

    other sensor types.

Stereo cameras usually require extra processing time, since a stereo system consists of two calibrated monocular cameras with a predefined distance between them. They are also used for digital and analog 3D filming and photography.

    Figure 2.2: Stereo camera theory

RGB-D sensors operate with structured light or time-of-flight techniques and have become quite popular thanks to the Microsoft Kinect and the Asus Xtion. These sensors are cheaper than a medium-quality stereo camera system and also require little or no calibration, but their image quality is fixed at the level of standard webcams. They have special hardware for processing the data into RGB images with depth information, namely RGB-D. A really common use case for these sensors is human-PC virtual reality interaction interfaces.


    Laser scanners are more industrial and usually substantially more expensive than the oth-

ers. Due to their primarily industrial design, laser scanners have an extremely low

    error rate and high resolution. They are mostly used on mobile robots for mapping tasks

    or 3D object scanning for graphical or medical applications.

    2.3 Survey work

This section summarizes the research conducted for this thesis, mentioning test experiences where there were any.

Holzer et al. defined so-called distance templates and applied them using regular template matching methods.

Hinterstoisser et al. introduced a method using Dominant Orientation Templates to identify textureless objects and estimate their pose in real time. In their very recent work, Hinterstoisser et al. engineered the LINE-Mod method for detecting textureless objects using color gradient and surface normals. The advantage of their approach is that even though an RGB-D sensor is required in the learning stage, a simple webcam is enough for detection - of course the error will increase, since there are no surface points available from a webcam. A compact implementation is available since OpenCV 2.4.

Experiments done with LINE-Mod showed that this method cannot be applied to textured objects, although it is a reasonably nice alternative for textureless objects. An experience gained by using this method is that the false detection rate was extremely high and no applicable 3D pose result could be obtained; it only reported whether an object was detected or not. The first implementation was released at the time of this thesis work, therefore it is possible that future versions will improve the results. The product of this thesis work could be expanded to textureless objects using this technique.

    Test videos prepared for this thesis:

    http://www.youtube.com/watch?v=2cCsYfwQGxI

    http://www.youtube.com/watch?v=3e3Wola4EWA


    Figure 2.3: LINE-Mod

Nister and Stewenius sped up the classic feature-based matching approach by utilizing a tree-based, search-optimized data structure. They also ran experiments on a PR2 robot and released the ROS package named Objects of Daily Use Finder (ODUFinder).

    Figure 2.4: ODUFinder

Although the theoretical base of this method is solid, the conducted experiments showed that the practical results were not applicable for a mobile robot working in a human environment at the time of this work.

Muja et al. implemented the general Recognition Infrastructure to host and coordinate the modules of recognition pipelines, while Rusu et al. provided an example use case applying Binarized Gradient Grids and Viewpoint Feature Histograms.


The OpenCV group, Rublee et al., defined a new type of feature detector/extractor, ORB (Oriented FAST and Rotated BRIEF), to provide a BSD-licensed solution in contrast to SURF [5]. The work of [4] was to experiment and create benchmarks for TOD [34] using ORB as its main feature detection/extraction method.

Experimental work was done for this thesis to see if SIFT could be replaced with ORB in Chapter 3, but due to deadlines it was not possible to implement it. As future work, however, it would be a nice addition to the final software.

The work of [38], RoboEarth, is a general communication platform for robots and has a ROS package which contains a database client and a detector module. Even though the detector module of RoboEarth was not precise enough for the task of this thesis, it is still exemplary as robotics software.

The tests of the RoboEarth package were really smooth and easy to do, since the authors provided tutorials and convenient interfaces for their software. The requirements of the system, however, did not exactly match the available hardware, because the detector of RoboEarth needs an RGB-D sensor and on REEM we only had a stereo camera. Experiments showed that obtaining a precise pose is hard due to its high variance, and the false detection rate was also high.

    Figure 2.5: RoboEarth Detector

The published library of Eric Marchand et al., called ViSP, contains tools for visual servoing tasks along with image processing and other fields as well. ViSP is also available as a ROS package and it contains a ready-to-use model-based tracker that tracks the edges of the object model. The ViSP tracker operates using the edges of the object and tracks it starting from a known initial position.


    Figure 2.6: ViSP Tracker

Thanks to the ROS package provided for it, the ViSP Tracker was easy to test. The good results almost made it the primary choice of tracker; however, it still requires a known initial position to start, and it solely does tracking, which alone does not solve the pose estimation problem. Though it provided good results, problems occurred due to the limitation of only using greyscale images. The tracker finally chosen (Section 3.4) also takes colors into account.

A remarkably unique approach to tracking 2D planar patterns is ESM [7]. It has a free version for demoing but also provides a licensed version which is highly optimized. It did not prove reliable enough, and the output format also raised problems for this task. During the tracking process the searched template is continuously modified to adapt to small changes over time. Because of this it can only work when there is a tiny difference between two consecutive images, and more importantly the target pattern should not travel too large a distance between such images. It is also worth mentioning that since this technique is also a tracker, the initial pose is required. The implementation provided for this technique did not make the job of testing it easier with its dynamically linked C library and C header file. No open-source solution is provided.

    Figure 2.7: ESM Tracker


Mixing techniques proved to be successful in markerless object detection. Feature detection and edge-tracking based methods were presented and discussed in [26], [37], [30], [25], [31], [8], [19], [20] and [21], leading to the birth of a software package called The Blocks World Robotic Vision Toolbox. The basic idea is to use a feature-based pose estimation module to acquire a rough estimate and then use this result to initialize a real-time tracker using edge detection, making things a lot faster and more dynamic. As a result of this survey work, BLORT was chosen first to be further tested and integrated into the software of the REEM robot.

    2.4 Brief summary of survey

Table 2.1 summarizes the previous section in table form, highlighting the most relevant attributes.

Name | Tracker | Detector | Hybrid | Sensor | Texture | Speed | Output | Keywords
ViSP tracker | Yes | No | No | Monocular | Only edges | 30 Hz | Pose | edge tracking, grayscale, particle filter
RoboEarth | No | Yes | No | RGB-D (train, detect), monocular (detect) | Needed | 11 Hz | Pattern matched | Kinect, point cloud matching, texture matching
LINE-Mod | No | Yes | No | RGB-D (train, detect), monocular (detect) | Low texture | 30 Hz | Pattern matched | surface and color gradient normals, Kinect
ESM | Yes | No | No | Monocular | Needed | 30 Hz | Homography | custom minimization, pattern matching
ODUFinder | No | Yes | No | Monocular | Needed | 4-6 Hz | Matched SIFTs | SIFT, vocabulary tree
BLORT | No | No | Yes | Monocular | Needed | 30 Hz+ | 3D pose | SIFT, edge, CAD, RANSAC, OpenGL

Table 2.1: Survey summary table


    Chapter 3

    Pose estimation of an object

3.1 Introduction and overview

As a result of the wide and then the deep survey, one software package was chosen to be integrated with the REEM robot. A crucial factor for all techniques surveyed was what type of sensor is required, because the REEM robot does not have a visual depth sensor in the head. Early experiments with the BLORT system showed that it could be capable of serving as a pose estimator on REEM for the grasping task. It provided correct results with a low ratio of false detections, especially when compared to the others, along with a reasonably good speed.

    BLORT - The Blocks World Robotic Vision Toolbox

The vision and robotics communities have developed a large number of increasingly successful methods for tracking, recognizing and online learning of objects, all of which have their particular strengths and weaknesses. A researcher aiming to provide a robot with the ability to handle objects will typically have to pick amongst these and engineer a system that works for her particular setting. The toolbox is aimed at robotics research and as such we have in mind objects typically of interest for robotic manipulation scenarios, e.g. mugs, boxes and packaging of various sorts. We are not aiming to cover articulated objects (such as walking humans), highly irregular objects (such as potted plants) or deformable objects (such as cables). The system does not require specialized hardware and simply uses a single camera allowing usage on about any robot. The toolbox integrates state-of-the-art methods for detection and learning of novel objects, and recognition and tracking of learned models. Integration is currently done via our own modular robotics framework, but of course the libraries making up the modules can also be separately integrated into own projects.


Source: http://www.acin.tuwien.ac.at/?id=290 (Last accessed: 2012.10.16.)

For the core of BLORT, credit goes to Thomas Morwald, Johann Prankl, Michael Zillich, Andreas Richtsfeld and Markus Vincze at the Vision for Robotics (V4R) lab at the Automation and Control Institute (ACIN) of the Vienna University of Technology (TUWIEN). Originally BLORT was designed to provide a toolkit for robotics, therefore its full name is Blocks World Robotic Vision Toolbox.

For a better understanding of this chapter it is required to read Section 1.3.

    The list of papers connected to BLORT:

    1. BLORT - The Blocks World Robotic Vision Toolbox Best Practice in 3D Perception and

    Modeling for Mobile Manipulation [26]

    2. Anytimeness Avoids Parameters in Detecting Closed Convex Polygons [37]

    3. Basic Object Shape Detection and Tracking using Perceptual Organization [30]

    4. Edge Tracking of Textured Objects with a Recursive Particle Filter [25]

    5. Taking in Shape: Detection and Tracking of Basic 3D Shapes in a Robotics Context

    [31]

Since there was no ROS package provided, the integration had to start at that level. The system itself is composed of separate works of the above authors but performs reasonably well integrated together. A positive aspect of BLORT is that it was designed to be used with a single webcam. This way no extra sensor is required on most robots and it still performs well. Of course, as most scientific software, BLORT was developed indoors without ever leaving the lab. The step to take with BLORT was to integrate it into a system that runs ROS and tune it so that it would be able to operate on a real robot outside the laboratory environment.

For the above objectives to work out, the software had to be thoroughly tested while also discovering those regions where most of the computation is being done. The code had to be refactored in order to provide more convenient interfaces and also to eliminate bugs such as tiny memory leaks and other problems coming from incorrect memory usage. Also, all the components and algorithms used by BLORT had to be inspected and their parameters either exposed to end users for deploy-time configuration or modified internally for better results.


The algorithmic design of BLORT is a sequence of the detector and tracker modules.

initialization;
while object not detected or (object detected and confidence < threshold_detector) do
    // Run object detector
    Extract SIFT features;
    Match extracted SIFTs to codebook using kNN;
    Estimate object pose from matched SIFTs using RANSAC;
    Validate confidence;
    publish object pose for tracker;
end
while object tracking confidence is high do
    // Run object tracker
    Copy the input image and render the textured object into the scene at its known location;
    Run colored edge detection on both (input and rendered) images;
    Use a particle filter to match the images around the estimated pose;
    Average particle guesses and compute confidence rate;
    Smooth confidence values (edge, color) to avoid unrealistic fast flashes;
    if confidence > threshold_tracker then
        publish pose of the object;
    end
end

Algorithm 2: BLORT algorithmic overview


    3.2 CAD Model

CAD models are commonly used in Computer Aided Design software, mainly by different types of engineers. These models define simple 3D objects as well as more complex ones.

Object trackers often rely on CAD models of the target object(s) to perform edge detection-based tracking.

    Related articles of BLORT: [26], [25]

    MeshLab [11] proved to be a great tool to handle simple objects and generate convex hulls

    of complex meshes.

    A demonstration video about the process of creating a simple juicebox brick can be found

    on the following link: http://www.youtube.com/watch?v=OtduI5MWVag

    Figure 3.1: CAD model of a Pringles box in MeshLab

    3.3 Rendering

    Rendering is commonly known from computer games or scientific visualization. It loads or

    generates 3D shapes and (usually) projects them onto a 2D surface - the screen. The OpenGL

    and DirectX libraries are often used to utilize the computational power of the GPU (video card)

for rendering tasks through their APIs. Unlike CUDA, which is quite young compared to the other two, these libraries were not designed for scientific computation, but they are still being used for it.


(a) Visualizer of TomGine (part of BLORT) (b) Rendering a 3D model onto a camera

    image

    Figure 3.2: Examples of rendering in case of BLORT

In the case of BLORT, rendering is used in the tracker module to validate the actual pose guess. An intuitive description of this step is that the tracker module imagines (renders) how the object should look given the current pose guess, and validates the guess using a comparison method.

    3.4 Edge detection on color image

To validate a pose guess, the tracker module compares the original input image with the one that has the 3D object rendered onto it. Such a comparison can be done in several ways. In the case of object tracking it is reasonable to use the edges of the object, which can be extracted by detecting the edges of the image.

The following steps were implemented using OpenGL shaders - a technique highly optimized for computing image convolution. The procedure takes an input image I and a convolution kernel K and outputs O. A simplified definition could be

O[x, y] = \sum I[f(x, y)] \cdot K[g(x, y)] \qquad (3.1)

where f(x, y) and g(x, y) are the corresponding indexer functions. The result, however, is often required to be normalized. This can be arranged by adding a normalizing factor to Equation 3.1, which is the sum of the factors of multiplication, more concretely the elements of kernel K. The final formula for the convolution should look like the following:

O[x, y] = \frac{1}{\sum_{a,b} K[a, b]} \sum I[f(x, y)] \cdot K[g(x, y)] \qquad (3.2)
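For reference, the normalized convolution of Equation (3.2) corresponds to what OpenCV's filter2D does on the CPU. A minimal sketch, where the 3x3 averaging kernel is only an example:

#include <opencv2/imgproc/imgproc.hpp>

// Apply a normalized convolution kernel to an image, as in Equation (3.2).
void convolveExample(const cv::Mat& input, cv::Mat& output)
{
  // 3x3 averaging kernel; dividing by the sum of its elements normalizes it.
  cv::Mat kernel = cv::Mat::ones(3, 3, CV_32F) / 9.0;
  cv::filter2D(input, output, -1, kernel);   // -1: keep the input image depth
}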


    Figure 3.3: Image convolution example

Steps of image processing in the tracker module of BLORT:

1. A blurring operator is applied to the input image to minimize point-like errors such as isolated black and white pixels. This is usually an important pre-filtering step for edge-detection methods. For this purpose a 5x5 Gauss operator was chosen.

K = \frac{1}{159} \begin{bmatrix} 2 & 4 & 5 & 4 & 2 \\ 4 & 9 & 12 & 9 & 4 \\ 5 & 12 & 15 & 12 & 5 \\ 4 & 9 & 12 & 9 & 4 \\ 2 & 4 & 5 & 4 & 2 \end{bmatrix} \qquad (3.3)

2. Edge detection using a Scharr operator. By applying K_x and K_y as convolutions to the input image, the corresponding estimated derivatives can be computed.

K_x = \frac{1}{22} \begin{bmatrix} -3 & 0 & 3 \\ -10 & 0 & 10 \\ -3 & 0 & 3 \end{bmatrix} \qquad (3.4)

K_y = \frac{1}{22} \begin{bmatrix} -3 & -10 & -3 \\ 0 & 0 & 0 \\ 3 & 10 & 3 \end{bmatrix} \qquad (3.5)

3. Non-maxima suppression to keep only the strongest edges of the edge detection.

K_x = \begin{bmatrix} 0 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \qquad (3.6)


K_y = \begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \qquad (3.7)

In this step the above convolutions serve as indicators of whether the current pixel is a maximal edge compared to its neighborhood. If it is not, the pixel is discarded and an extremal element is returned (RGB(0, 127, 128)).

4. Spreading operation to grow the remaining edges from the previous step.

K = \begin{bmatrix} \frac{1}{2} & 1 & \frac{1}{2} \\ 1 & 0 & 1 \\ \frac{1}{2} & 1 & \frac{1}{2} \end{bmatrix} \qquad (3.8)

This step enlarges the previously determined strongest edges, which is important for removing the small errors coming from falsely detected edges.


    (a) Input image (b) Gaussian blur

(c) Scharr operator (d) Nonmaxima suppression

    (e) Spreading

    Figure 3.4: Steps of image processing

The above method is implemented as an OpenGL shader (which makes use of the parallelizable nature of image processing techniques), but a pure CPU version using OpenCV was also implemented during the work of this thesis, though it proved considerably slower than the shader version.
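A rough CPU-only approximation of the first two steps (blur and Scharr gradients) with OpenCV could look like the sketch below. This is only an illustration assuming a grayscale pipeline, not the colored OpenGL shader implementation used by BLORT.

#include <opencv2/imgproc/imgproc.hpp>

// Approximate steps 1-2 of the pipeline on the CPU: Gaussian blur followed by
// horizontal and vertical Scharr derivatives.
void blurAndScharr(const cv::Mat& bgrInput, cv::Mat& gradX, cv::Mat& gradY)
{
  cv::Mat gray, blurred;
  cv::cvtColor(bgrInput, gray, cv::COLOR_BGR2GRAY);
  cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 0);  // step 1: 5x5 Gaussian blur
  cv::Scharr(blurred, gradX, CV_16S, 1, 0);            // step 2: d/dx
  cv::Scharr(blurred, gradY, CV_16S, 0, 1);            //         d/dy
}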


    3.5 Particle filter

As a technique well grounded in statistical methods, in robotics particle filters are often used for localization tasks. For object detection tasks they are utilized for tracking objects in real time.

    At its very core, particle filtering is a model estimation technique based on simulation. In

    such a system a particle could be called an elementary guess about one possible estimation of

the model, while simulation stands for continuously validating and resampling these particles to

    adapt the model to new information given by measurements or additional data.


    Figure 3.5: Particle filter used for localization

Figure 3.5 shows a particle filter used for localization. It is clear that in the initial situation, where no information was given, the particles are spread widely around the map. As the robot moves and uses its sensors to measure the environment, the particles begin to concentrate around those areas more likely to contain the robot.

The design of particle filters makes it possible to utilize parallel computing techniques in

    the implementation such as using multiple processor threads or the graphics card. This is an

    important feature which makes this algorithm suitable for real-time tracking.


Generate initial particles;
while Error > Threshold do
    Wait for additional information;
    Calculate normalized particle importance weights;
    Resample particles based on importance weight to generate a new particle set;
    Calculate Error;
end

Algorithm 3: Particle filter algorithm
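The resampling step of Algorithm 3 can be sketched in a few lines of C++. The Particle type below is a placeholder (in the BLORT tracker each particle would carry a pose hypothesis and a confidence weight), so this is only an illustration of the idea:

#include <random>
#include <vector>

struct Particle { double weight; /* pose hypothesis would live here */ };

// Draw a new particle set, picking each particle with probability proportional
// to its (normalized) importance weight.
std::vector<Particle> resample(const std::vector<Particle>& particles, std::mt19937& rng)
{
  std::vector<double> weights;
  for (size_t i = 0; i < particles.size(); ++i)
    weights.push_back(particles[i].weight);

  std::discrete_distribution<size_t> pick(weights.begin(), weights.end());

  std::vector<Particle> next;
  for (size_t i = 0; i < particles.size(); ++i)
    next.push_back(particles[pick(rng)]);
  return next;
}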

The tracker module uses a particle filter to track and refine the pose of an object. One particle in this specific case holds a pose value, which is evaluated by running the edge-detection based comparison method described in Section 3.4.

    Figure 3.6: Particles visualized on the detected object of BLORT. Greens are valid, reds are

    invalid particles.

    3.6 Feature detection (SIFT)

Image processing is often only the first step toward further goals such as image analysis or pattern matching. The term image processing refers to operations done at pixel level, where the information gained is also often pixel-level information; the features used here are the individual pixels. However, it is necessary to define higher-level features in order to increase complexity, robustness or speed, or all of the above at the same time. Though a successfully extracted line in an image is also considered a feature, when speaking of feature detection it usually refers to feature types which are centered around a point. Such feature detectors are, for example:


    FAST

SIFT [23]

SURF [5]

    BRIEF

    ORB [32]

    FREAK

    Figure 3.7: Extracted SIFT and ORB feature points of the same scene

The SIFT detector proved to be one of the strongest in the literature and in existing applications, and was therefore chosen as the main feature detector of BLORT. The SIFTs extracted from the surface of the current object in the learning stage are saved in a data structure which will be referred to as the codebook or object SIFTs from now on. Later this codebook is used to match image SIFTs: features extracted from the current image.

SIFT details:

invariant to:
    scaling
    orientation

partially invariant to:
    affine distortion
    illumination changes

    SIFT procedure:

Image convolved using a Laplacian of Gaussian (LoG) filter at different scales (scale pyramid)

    Compute difference between the neighboring filtered images

    Keypoints: local max/min of difference of LoG


    Compare to its 8 neighbors on the same scale

    Compare to the 9 corresponding pixels on the neighboring levels

    Keypoint localization

    Problem: too many (unstable) keypoints

    Discard the low contrast points

    Eliminate the weak edge points

    Orientation assignment

    Invariant for rotation

    Each keypoint is assigned one or more orientations from local gradient features

m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2} \qquad (3.9)

\theta(x, y) = \arctan \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \qquad (3.10)

    Calculate for every pixel in a neighboring region to create an orientation histogram

    Determine dominant orientation based on the histogram

    Figure 3.8: SIFT orientation histogram example

    On the implementation level the feature detection step is done by utilizing the graphics card

    again by using the SiftGPU [36] library to extract image SIFTs. As a part of this thesis work a

ROS wrapper package of this library was also created.

    http://ros.org/wiki/siftgpu (Last accessed: 2012.11.05.)
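For comparison, a CPU-only SIFT extraction with OpenCV 2.4's nonfree module could be sketched as follows; this is not the SiftGPU path used in the thesis, only an equivalent illustration:

#include <opencv2/nonfree/features2d.hpp>
#include <vector>

// Detect SIFT keypoints and compute their descriptors for one image.
void extractSift(const cv::Mat& image,
                 std::vector<cv::KeyPoint>& keypoints, cv::Mat& descriptors)
{
  cv::SIFT sift;                                    // default detector/extractor parameters
  sift(image, cv::Mat(), keypoints, descriptors);   // empty Mat = no mask
}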


    3.7 kNN and RANSAC

    3.7.1 kNN

k-Nearest Neighbors [12] is an algorithm used for solving classification problems. Given a distance measure for the data type of the actual dataset, it classifies the current element based on the attributes and classes of its nearest neighbors. It is also often used for clustering tasks.

In BLORT, kNN is used to select a fixed-size set (the number k) of features from the codebook most similar to the feature currently being matched during the detection stage.
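The same idea can be illustrated with OpenCV's descriptor matchers. The sketch below performs a k = 2 nearest-neighbor match of image descriptors against codebook descriptors; the names are illustrative and this is not the BLORT code itself.

#include <opencv2/features2d/features2d.hpp>
#include <vector>

// For each image descriptor, find its 2 nearest neighbors in the codebook.
void knnMatchToCodebook(const cv::Mat& imageDescriptors,
                        const cv::Mat& codebookDescriptors,
                        std::vector<std::vector<cv::DMatch> >& matches)
{
  cv::BFMatcher matcher(cv::NORM_L2);   // brute-force matcher with Euclidean distance
  matcher.knnMatch(imageDescriptors, codebookDescriptors, matches, 2);
}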

    3.7.2 RANSAC

The RANSAC [13] algorithm is possibly the most widely used robust estimator in the field of computer vision. The abbreviation stands for RANdom SAmple Consensus. RANSAC is an iterative model estimation algorithm which operates by assuming that the input data set contains outliers - elements not inside the validation range of the estimated mathematical model - and it minimizes the outlier/inlier ratio. It is a non-deterministic algorithm, since random number generation is used in the sampling stage.

In BLORT, RANSAC is used to estimate the pose of the object from image features (SIFTs in this case) in order to initialize the tracker module; therefore the RANSAC method can be found in the detector module.
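The same detector-stage idea - a robust pose estimate from matched 2D-3D correspondences - can be illustrated with OpenCV's solvePnPRansac. This is only a sketch of the concept, not the BLORT implementation:

#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// Estimate the object pose (rotation and translation) from 3D model points and
// their matched 2D image projections, rejecting outlier matches with RANSAC.
void estimatePoseRansac(const std::vector<cv::Point3f>& objectPoints,
                        const std::vector<cv::Point2f>& imagePoints,
                        const cv::Mat& cameraMatrix,
                        cv::Mat& rvec, cv::Mat& tvec)
{
  cv::solvePnPRansac(objectPoints, imagePoints, cameraMatrix, cv::Mat(),
                     rvec, tvec);
}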

    Figure 3.9: Extracted SIFTs. Red SIFTs are not in the codebook, yellow and green ones are

    considered as object points, green ones are inliers of the model and yellow ones are outliers.


Data: dataset;
      model - whose parameters are to be estimated;
      n-points-to-sample - number of points used to give a new estimation;
      max-ransac-trials - maximum number of iterations;
      t - a threshold for the maximal error when fitting a model;
      n-points-to-match - the number of dataset elements required to set up a valid model;
      η0 - an optional, tolerable error limit
Result: best-model;
        best-inliers;
        best-error

iterations = 0;
idx = NIL;
ε = n-points-to-match / dataset.size;
while iterations < max-ransac-trials or (1.0 − ε^n-points-to-match)^iterations >= η0 do
    idx = random indices from dataset;
    model = Compute model(idx);
    inliers = Get inliers(model, dataset);
    if inliers.size >= n-points-to-match then
        error = Compute error(model, dataset, idx);
        if error < best-error then
            best-model = model;
            best-inliers = inliers;
            best-error = error;
        end
    end
    increment iterations;
end

Algorithm 4: RANSAC algorithm in BLORT
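For comparison, the sketch below expresses the same idea - estimating a pose by RANSAC over 2D-3D correspondences - with OpenCV's solvePnPRansac. It is not the BLORT code; the iteration count, reprojection threshold and inlier limit are illustrative stand-ins for max-ransac-trials, t and n-points-to-match.

// Sketch of RANSAC-based pose estimation from 2D-3D correspondences, assuming
// the 3D points come from the codebook (SIFTs registered on the CAD model) and
// the 2D points from the matched image SIFTs.
#include <opencv2/opencv.hpp>
#include <vector>

bool estimatePose(const std::vector<cv::Point3f>& objectPoints,  // codebook points on the model
                  const std::vector<cv::Point2f>& imagePoints,   // matched image features
                  const cv::Mat& cameraMatrix,                   // intrinsic parameters
                  const cv::Mat& distCoeffs,
                  cv::Mat& rvec, cv::Mat& tvec)
{
    std::vector<int> inliers;
    bool ok = cv::solvePnPRansac(objectPoints, imagePoints,
                                 cameraMatrix, distCoeffs,
                                 rvec, tvec,
                                 false,   // no initial guess
                                 200,     // max iterations (max-ransac-trials)
                                 4.0f,    // reprojection error threshold in pixels (t)
                                 0.99,    // desired confidence
                                 inliers);
    // Accept the pose only if enough correspondences support it, similar to the
    // n-points-to-match check in Algorithm 4.
    return ok && inliers.size() >= 10;
}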


    3.8 Implemented application

All implementations were done using ROS (1.4) and C++.

Training and detecting a Pringles box: http://www.youtube.com/watch?v=HoMSHIzhERI

Training and detecting a juicebox: http://www.youtube.com/watch?v=0QVc9x3ZRx8

    3.8.1 Learning module

Like similar applications, BLORT requires a learning stage before any detection can be done. In order to start the process, a CAD model of the object is needed. This model gets textured during the learning process, and SIFTs are registered onto surface points of the model. The software itself runs the tracker module, which is able to operate without texture, based only on the peripheral edges of the object (i.e. the outline of the object). The learning stage is operated manually.

After the operator starts the tracker in an initial pose displayed on the screen, the tracker will follow the object. By pressing a single button, both texture and SIFT descriptors are registered for the most dominant face of the object (i.e. the one that is the most orthogonal to the camera). All captured information is used on-the-fly from the moment of recording during the learning stage. As the tracker gets more information by registering textures to different faces of the object, the task of the operator becomes more convenient. 1

To make this step easier for new users of BLORT, demonstrative videos were recorded:

Training a Pringles container: http://www.youtube.com/watch?v=pp6RwxbUwrI

Training with a juicebox: http://www.youtube.com/watch?v=Hfg7spaPmY0

    3.8.2 Detector module

The detector module, unlike its name implies, performs both object detection and pose estimation; however, the resulting pose is often not completely precise. The detection stage starts with the extraction of SIFTs (3.6), then continues with a kNN (3.7.1) method to determine the best matches from the codebook, which are then used by a RANSAC (3.7.2) method to approximate the pose of the object.

1 Cylindrical objects tend to keep rotating when there is no texture, due to their completely symmetric form.


    3.8.4 State design pattern for node modes

Although BLORT was designed to provide real-time tracking (tracker module) after feature-based initialization (detector module), it allows a different use-case which is more desirable for this thesis than the original functionality.

When used in an almost still scene to determine the pose of an object to be grabbed, tracking provides the refinement and validation of the pose acquired by the detector. Defining a timeout for the tracker in these cases results in considerable resource savings, which is important on a real robot. After the timeout has passed and the confidence is sufficient, the last determined pose can be used. This way, for example, the robot does not have to run all the costly algorithms until it has reached the table where it needs to grab an object.

The above behaviour, however, is not always an option; therefore it is also required to have a full-featured tracker which can recover when the object is lost.

Even though the mode is a launch-time parameter of BLORT, the run-time design pattern called State [14, p. 305] brings convenience to the implementation and future use. A minimal sketch of this pattern is shown below.
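The class and method names in the sketch are illustrative only and do not correspond to the actual BLORT ROS node types.

// Minimal sketch of the State pattern applied to the node modes.
#include <memory>

class TrackerNode;  // context owning the current mode

class ModeState {
public:
    virtual ~ModeState() = default;
    // Called for every incoming camera image.
    virtual void handleImage(TrackerNode& node) = 0;
};

class TrackingMode : public ModeState {
public:
    void handleImage(TrackerNode& /*node*/) override {
        // recover or initialize with the detector if the object is lost,
        // otherwise keep running the tracker continuously
    }
};

class SingleShotMode : public ModeState {
public:
    void handleImage(TrackerNode& /*node*/) override {
        // only run detector + tracker refinement while a service request
        // is pending, then stop to save resources
    }
};

class TrackerNode {
public:
    explicit TrackerNode(std::unique_ptr<ModeState> mode) : mode_(std::move(mode)) {}
    void onImage() { mode_->handleImage(*this); }
private:
    std::unique_ptr<ModeState> mode_;  // selected once at launch time
};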

    tracking

The full-featured version of BLORT. When BLORT is launched in tracking mode, it will recover (or initialize) when needed and track continuously.

    Figure 3.12: Diagram of the tracking mode

    33

  • 7/25/2019 Implementing visual perception tasks for the REEM robot

    43/89

    singleshot

When launched in singleshot mode, BLORT will initialize using the detector module and then refine the obtained pose with the tracker module, but only when queried through a ROS service interface. The result of one service call is one pose, or an empty answer if the pose estimation failed due to an absent object or a bad detection.

    Figure 3.13: Diagram of the singleshot mode
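A client of the singleshot mode could look roughly like the sketch below; the service name, service type and response fields are placeholders, the real interface is documented in the blort_ros package.

// Sketch of querying a singleshot-style pose service from a ROS client.
#include <ros/ros.h>
#include <blort_ros/EstimatePose.h>  // hypothetical service definition

int main(int argc, char** argv)
{
    ros::init(argc, argv, "singleshot_client");
    ros::NodeHandle nh;
    ros::ServiceClient client =
        nh.serviceClient<blort_ros::EstimatePose>("blort_tracker/pose_service");  // placeholder name

    blort_ros::EstimatePose srv;   // empty request: "give me one pose now"
    if (client.call(srv) && srv.response.success)   // hypothetical response field
        ROS_INFO("Received an estimated pose.");
    else
        ROS_WARN("Pose estimation failed (object not present or bad detection).");
    return 0;
}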


    3.9 Pose estimation results and ways for improvement

    The goal of this thesis was to find the way to start with tabletop object grasping on the REEM

    robot and to provide an initial solution to it.

Table 4.1 shows detection statistics of BLORT in a few given scenes. The average pose estimation time was between 3 and 5 seconds.

Since the part which takes most of the CPU time is the RANSAC algorithm inside the detector module, it is desirable to decrease the number of extracted SIFT (or any) features.

Most of the failed attempts were caused by matching the bottom or the top of the boxes to the wall or to some other untextured surface. In these cases the detector made a mistake by initializing the tracker with a wrong pose, but the tracker was satisfied with it because the edge-based matching (which does not require texture) was perfect. It would be useful to provide a way to block specific faces of the object in case they are poorly textured.

    3.10 Published software, documentation and tutorials

All software developed for BLORT was published open-source on the ROS wiki and can be found at the following link:

http://www.ros.org/wiki/perception_blort

It is a ROS stack which consists of 3 packages:

blort: holds the modified version of the original BLORT sources, used as a library.

blort_ros: contains the nodes using the BLORT library, completely separate from it.

siftgpu: a necessary dependency of the blort package.

The code is hosted at PAL Robotics' public GitHub account at https://github.com/pal-robotics.


Figure 3.14: Screenshots of the ROS wiki documentation

Links to documentation and tutorials:

BLORT stack: http://ros.org/wiki/perception_blort

blort package: http://ros.org/wiki/blort

blort_ros package: http://ros.org/wiki/blort_ros

siftgpu package: http://ros.org/wiki/siftgpu

Training tutorial: http://www.ros.org/wiki/blort_ros/Tutorials/Training

Track and detect tutorial: http://www.ros.org/wiki/blort_ros/Tutorials/TrackAndDetect


How to tune? tutorial: http://www.ros.org/wiki/blort_ros/Tutorials/Tune

3.11 Hardware requirements

BLORT requires a graphics card supporting OpenGL with GLSL >= 2.0 (OpenGL Shading Language) for running the parallelized image processing in the tracker module, and also in the detector module, where SiftGPU is used to extract SIFTs quickly.

    3.12 Future work

- Use a database to store learned object models. This could also be used to interface with other object detection systems.

- SIFT dependency: remove the mandatory usage of SiftGPU and SIFT in general. Provide a way to use different keypoint extractor/detector techniques.

- OpenGL dependency: it would be elegant to have build options which also support CUDA or non-GPU modes.


    Chapter 4

Increasing the speed of pose estimation using image segmentation

    4.1 Image segmentation problem

Image processing operators such as feature extractors are usually quite expensive in terms of computation time, therefore it is often beneficial to limit their operating space. Much faster speed can be achieved by restricting the region on which frequently called, costly image processing operators run. The question is how this can be done.

Trying to copy nature is usually a good way to start in engineering, so let us think of how our own image processing works. Human perception tries to keep things simple and fast, while the brain dedicates only a tiny part of itself to the task. Most of the information we receive through our eyes is disposed of by the time it reaches the brain. The information that actually arrives is centered around a small area of our vision perceived in high detail, the focus point, while we only get highly sparse information about other areas. In this chapter the same approach is followed to increase the speed of image-based systems, in this case focused on boosting BLORT.

In order to limit the operating space, the input image needs to be segmented. Segmentation can be done via direct masking, by painting the masked regions of the image with some color, or by assigning to the image a matrix of 0s and 1s as a mask, marking the valid and invalid pixels, and carrying this mask along with the image.

In general, a priori knowledge is required to know which areas of the input are interesting for a specific costly operator. Most of the time this depends on the actual application environment, which is defined by the hardware, software, camera and physical surroundings. The result of the segmentation is a mask which in the end will be used to indicate which pixels are valid for


    further analysis and which are invalid. Formally it could be written as

O[i, j] = \begin{cases} \text{image\_operator}(I, i, j), & \text{where } M[i, j] = \text{valid} \\ \text{extremal\_element}, & \text{where } M[i, j] = \text{invalid} \end{cases}    (4.1)

where O is the output image, I is the input image, i, j are the current indices, M is the mask, while image_operator and extremal_element depend on the current use-case.
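As a minimal illustration of equation (4.1), the following sketch applies a binary mask to an image with OpenCV, using zero as the extremal element; it is not the segmentation nodes themselves.

// The costly operator is only evaluated where the mask marks a pixel as valid;
// everything else is set to the extremal element (here: zero).
#include <opencv2/opencv.hpp>

cv::Mat applyMask(const cv::Mat& input, const cv::Mat& mask)
{
    // mask: 8-bit single-channel image, non-zero = valid pixel
    cv::Mat output = cv::Mat::zeros(input.size(), input.type());
    input.copyTo(output, mask);   // copies only the pixels where mask != 0
    return output;
}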

As it plays an important role in Computer Vision, image segmentation is a strong tool in medical imaging, face and iris recognition and agricultural imaging, as well as in the optimization of image operators.

4.2 Segmentation using image processing

After creating a mask with a specific method, point-like errors should be eliminated. This step is done by running an erode operator, commonly used in image processing. Erode works on a binary image, therefore during this step all pixels of the target color (or class) are trialed for survival. Figure 4.1 shows an example and the way the erosion trial works on pixel level.

Figure 4.1: Example of erosion where the black pixel class was eroded ((a) original, (b) result, (c) structuring element)

In order to make sure that the mask was not shrunk too much, a dilate step may be performed. As with erode, the dilate operator works on a binary image, but it trials all non-target pixels for survival. Figure 4.2 shows an example and the way the dilation trial works on pixel level.1

1 Figures in this section were taken from http://docs.opencv.org/doc/tutorials/imgproc/erosion_dilatation/erosion_dilatation.html and [10]


Figure 4.2: Example of dilation where the black pixel class was dilated ((a) original, (b) result, (c) structuring element)

By combining erode and dilate, point-like noise can be eliminated and masking errors can be fixed in an adaptive way, reducing mask noise. The parameters of the two operators are exposed to the end-user and can be tuned at run-time. A sketch of this clean-up step is shown below.
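A possible form of the clean-up step with OpenCV; the kernel size and iteration counts stand in for the exposed parameters and are illustrative defaults.

// Erode/dilate clean-up of a binary mask.
#include <opencv2/opencv.hpp>

cv::Mat cleanMask(const cv::Mat& mask, int kernelSize = 3,
                  int erodeIterations = 2, int dilateIterations = 2)
{
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT,
                                               cv::Size(kernelSize, kernelSize));
    cv::Mat cleaned;
    // Erosion removes point-like noise from the valid region...
    cv::erode(mask, cleaned, kernel, cv::Point(-1, -1), erodeIterations);
    // ...and dilation grows the surviving region back so the mask is not
    // shrunk too much.
    cv::dilate(cleaned, cleaned, kernel, cv::Point(-1, -1), dilateIterations);
    return cleaned;
}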

    4.3 ROS node design

Since the segmentation tasks are well defined, the same ROS node skeleton can be used to implement all segmentation methods. The node has two primary input topics: image and camera info. The latter holds the camera parameters and is published by the ROS node responsible for capturing images. The output topics of the node are: a debug topic which holds information on the inner workings (e.g. a correlation map), a masked version of the input image, and a mask. For efficiency, the node is designed so that messages on these topics are only published when there is at least one node subscribing to them. For this reason the debug topic is usually empty.

    Figure 4.3: ROS node design of segmentation nodes
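The lazy publishing behaviour can be sketched as follows; topic and variable names are illustrative, not the actual node code.

// Only publish expensive debug output when someone is actually subscribed.
#include <ros/ros.h>
#include <sensor_msgs/Image.h>

void publishIfSubscribed(const ros::Publisher& debug_pub,
                         const sensor_msgs::Image& debug_image)
{
    // getNumSubscribers() lets the node skip composing and sending the
    // message when nobody listens to the debug topic.
    if (debug_pub.getNumSubscribers() > 0)
        debug_pub.publish(debug_image);
}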


It was mentioned before that the parameters of the erode and dilate operators need to be exposed. This is solved through the dynamic reconfigure 2 interface provided by ROS. For each of erode and dilate, the number of iterations and the kernel size can be set. An extra threshold parameter was included, because segmentation methods often use at least one thresholding operator internally.

    Figure 4.4: Parameters exposed through dynamic reconfigure

    4.4 Stereo disparity-based segmentation

    4.4.1 Computing depth information from stereo imaging

Figure 4.5: Example of stereo vision: (a) left camera image, (b) right camera image, (c) computed disparity image, (d) computed depth map that matches the dimensions of the left camera image

2 http://ros.org/wiki/dynamic_reconfigure


In Figure 4.5 the disparity image is computed by matching image patches between the images captured by the two cameras. Subfigure (d) of Figure 4.5 shows a depth map with each pixel colored according to its estimated depth value. Black regions have unknown depth. The major drawback of stereo cameras compared to RGB-D sensors is that while depth images acquired from RGB-D sensors are continuous, stereo systems tend to have holes in the depth map where no depth information is available. This is usually due to the fact that stereo cameras operate using feature detection and matching, while most RGB-D cameras use light-emitting techniques. Depth map holes occur in regions where no features could be extracted because of texturelessness. RGB-D techniques solve the texturelessness problem by emitting a light pattern onto the surface and measuring its distortion.

    4.4.2 Segmentation

After obtaining a depth image, it is not enough to create a mask based on the distance values of single pixels. Such masks would reflect only the raw result of the segmentation; further steps can be taken to refine them.

Distance-based segmentation is useful but not sufficient in itself. Even though some parts of the input image are usually omitted, it can still forward too much unwanted information to a costly image-processing system. Images of experiments are shown in Figure 4.6. Segmentation steps can be organized in a pipeline fashion, so the obtained result is an aggregate of masks computed using different techniques. A sketch of the raw distance-based masking step is shown after Figure 4.6.

    Figure 4.6: Masked, input and disparity images
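A sketch of the raw distance-based masking step, assuming a single-channel disparity image where larger values mean closer points; the threshold value is application dependent.

// Keep pixels whose disparity is above a limit (close enough to the camera);
// the rest, including unknown regions (holes, usually value 0), is masked out.
// This is only the raw mask; erode/dilate refinement follows as in Section 4.2.
#include <opencv2/opencv.hpp>

cv::Mat disparityMask(const cv::Mat& disparity, double minDisparity)
{
    cv::Mat mask;
    cv::threshold(disparity, mask, minDisparity, 255, cv::THRESH_BINARY);
    mask.convertTo(mask, CV_8U);   // binary 8-bit mask expected by later steps
    return mask;
}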

    4.5 Template-based segmentation

    4.5.1 Template matching

Template matching is a common way to start with object detection, but it rarely yields success as a standalone solution. It is well suited to searching for a subimage inside a larger image, but the matching often fails when the pattern comes from a different source.


The most straightforward approach to template matching is image correlation. The output of image correlation is a correlation map, usually represented as a floating point single-channel image of the same size as the scanned image, where the value of each pixel holds the result of the image-subimage correlation centered around that position.

OpenCV has a highly optimized implementation of template matching where several different correlation methods can be chosen. 3

Figure 4.7: Template-based segmentation: (a) debug image showing the window around the target, (b) template, (c) masked image, (d) mask

4.5.2 Segmentation

Irrelevant regions can be masked out by thresholding the correlation map with a certain limit and using the result as the final mask. For tuning convenience and to handle noise, the erode and dilate operations can also be used. A sketch of this approach is shown below.

3 OpenCV documentation: http://docs.opencv.org/modules/imgproc/doc/object_detection.html?highlight=matchtemplate
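A sketch of the correlation-map approach with OpenCV's matchTemplate; the correlation method and threshold value are illustrative choices, not necessarily the ones used in the node.

// Correlation-based template matching followed by thresholding.
#include <opencv2/opencv.hpp>

cv::Mat templateMask(const cv::Mat& image, const cv::Mat& templ, double threshold)
{
    // Correlation map: one value per possible template position.
    cv::Mat correlation;
    cv::matchTemplate(image, templ, correlation, cv::TM_CCOEFF_NORMED);

    // Keep only the positions where the normalized correlation is high enough.
    cv::Mat mask;
    cv::threshold(correlation, mask, threshold, 255, cv::THRESH_BINARY);
    mask.convertTo(mask, CV_8U);

    // Note: the correlation map is smaller than the input image
    // (image.size() - templ.size() + 1), so it has to be padded or the valid
    // window grown to the template size before it is used as a full-image mask.
    return mask;
}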


    4.6 Histogram backprojection-based segmentation

    4.6.1 Histogram backprojection

Calculating the histogram of an image is a fast operation and provides pixel-level statistical information. This type of information is also often used to solve pattern matching in a relatively simple way. It is based on the assumption that similar images or sub-images often have similar histograms, especially when these are normalized.

OpenCV provides an implementation of histogram backprojection, where the target histogram (the histogram of the pattern in this case) is backprojected onto the scanned image and a correlation image is computed. This result image indicates how well the target and sub-image histograms match, therefore a maximum search will find the best matching region.

Figure 4.8: The process of histogram backprojection-based segmentation: (a) input camera image, (b) histogram backprojection result, (c) masked image

Figure 4.8 shows the results of experiments using texture information captured during the training of BLORT. At startup the segmentation node reads the pattern image and computes its histogram. Later, when an input image is received, the node backprojects the computed histogram onto the input image. A sketch of this step is shown below.
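A sketch of this step using OpenCV's calcHist and calcBackProject on the hue channel; the bin count and the use of HSV are illustrative choices rather than the node's exact configuration.

// Backproject a pattern's hue histogram onto a camera image.
#include <opencv2/opencv.hpp>

cv::Mat backprojectionMap(const cv::Mat& patternBgr, const cv::Mat& imageBgr)
{
    cv::Mat patternHsv, imageHsv;
    cv::cvtColor(patternBgr, patternHsv, cv::COLOR_BGR2HSV);
    cv::cvtColor(imageBgr, imageHsv, cv::COLOR_BGR2HSV);

    // Histogram of the pattern's hue channel, normalized so that lighting
    // changes affect it less.
    int histSize = 30;
    float hueRange[] = {0, 180};
    const float* ranges[] = {hueRange};
    int channels[] = {0};
    cv::Mat hist;
    cv::calcHist(&patternHsv, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    // Backproject the pattern histogram onto the camera image: bright pixels
    // indicate colors that are frequent in the pattern.
    cv::Mat backprojection;
    cv::calcBackProject(&imageHsv, 1, channels, hist, backprojection, ranges);
    return backprojection;
}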

    4.6.2 Segmentation

The noise level of these results is not significant, therefore erode steps are not necessary here, but to enlarge the valid regions of the mask the dilate operator can still be used. The parameters are - as before - exposed through configuration files.

Experiments have shown that histogram backprojection works far more precisely and faster than the pixel correlation-based template matching approach. Figure 4.9 shows an experiment where the pattern was the orange ball visible in the upper-right corner, and on the left is an image masked according to the result of the histogram backprojection. Image correlation-based matching usually fails under light conditions different from the ones the pattern was


captured with. It can be seen that histogram backprojection is more robust to changes in light conditions.

    Figure 4.9: Histogram segmentation using a template of the target orange ball


    4.7 Combined results with BLORT

By masking the input image of the BLORT nodes, the overall success rate was increased. The key to this success was controlling the ratio of inliers and outliers fed to the RANSAC method inside the detector module. By manipulating the features handled by RANSAC so that the inlier-to-outlier ratio increases, the overall success rate and speed can be enhanced. This ratio cannot be increased directly, but it can be influenced by decreasing the overall number of extracted features while trying to keep the ones coming from the object. A good indicator is the ratio of Object SIFTs - the features matching the codebook - to All SIFTs extracted from the image.

Figure 4.10: The segmentation process and BLORT: (a) left camera image, (b) after stereo segmentation, (c) after histogram segmentation, (d) detector result, (e) tracker result

This approach proved useful when BLORT is deployed in a noisy environment. To demonstrate this, measurements were taken in 6 sample scenes, 100 times each. Table 4.1 shows the effectiveness of each segmentation method averaged over all scenes. The timeout parameter of BLORT singleshot was set to 120 seconds. It can be seen that the speed and success rate of BLORT were dramatically increased by segmenting the input, especially when combining different techniques.


Method used               | Extracted features | Object SIFTs / All SIFTs | Success rate | Worst detection time
Nothing                   | 4106               | 52/4106                  | 14%          | 101s
Stereo-based              | 2287               | 53/2287                  | 41%          | 64s
Matching-based            | 3406               | 32/3406                  | 31%          | 74s
Histogram-based           | 1220               | 50/1220                  | 50%          | 32s
Stereo+histogram hybrid   | 600                | 52/600                   | 82%          | 20s

Table 4.1: Effect of segmentation on detection

For the test scenes depicted in Figure 4.11, the pose of the object was estimated using an Augmented Reality marker and its detector.

Figure 4.11: Test scenes

    4.8 Published software

The segmentation software described in this chapter was published open-source on the ROS wiki and can be found at the following link:

http://www.ros.org/wiki/pal_vision_segmentation


    Figure 4.12: Screenshot of the ROS wiki documentation

    4.9 Hardware requirements

    There are no special hardware requirements for these nodes.

    4.10 Future work

Future work on this topic may include the introduction of other pattern-matching techniques or even new sensors. Also, most works marked as detectors in Chapter 2, like LINE-MOD, can be used for segmentation as long as they are reasonable in terms of computation time.


    Chapter 5

    Tracking the hand of the robot

5.1 Hand tracking problem

The previous chapters of this thesis dealt with estimating the pose of the target objects, which is necessary for grasping; but when considering the grasping problem (1.7) in full detail together with the visual servoing problem (1.8), it is also necessary to be able to track the robot manipulator - the hand in this case.

A reasonable approach could be to use a textured CAD model to track the hand, but the design of REEM does not have any textures on the body by default. To overcome this problem, the marker-based approach was selected. Augmented Reality applications already feature marker-based detectors and trackers, therefore it is worthwhile to test them for tracking a robot manipulator.


    5.2 AR Toolkit

As its name indicates, the Augmented Reality Toolkit [1] was designed to support applications implementing augmented reality. It is a widely known and supported open-source project. It provides marker designs and software to detect these markers and estimate their pose in 3D space, or to compute the viewpoint of the user.

Figure 5.1: ARToolkit markers

The functionality is implemented using edge- and corner-detection techniques. A marker is defined by its black frame, while the inside of the frame serves as the identifier of the marker and as the primary indicator of its orientation. Detection speed is increased compared to a usual CPU-based implementation by the use of OpenGL and the GPU. Despite being faster than the usual CPU-based implementations, using the GPU can also cause problems when the target platform does not have such a unit or it is being exclusively used by other components.

The AR Toolkit is already available as a ROS wrapper, so it is straightforward to integrate it with a robot running ROS.


    (a) Using an ARToolki