Autonomous embedded Stereo Vision on mobile Robots for Distance Learning

Bachelor thesis submitted by cand. ing. Enrico Boner, born 21.01.1990, residing at Landshuter Allee 102, 80637 München, Tel.: 0171 7839371

Lehrstuhl für Steuerungs- und Regelungstechnik, Technische Universität München
Univ.-Prof. Dr.-Ing./Univ. Tokio Martin Buss
Univ.-Prof. Dr.-Ing. Sandra Hirche

Supervisor: Dipl.-Inf. Nicolai Waniek
Start: 22.04.2013
Intermediate report: 20.06.2013
Submission: 02.09.2013




Abstract

Computer stereo vision is the extraction of depth information from two images of a scene taken from different vantage points. For this, the standard techniques calculate the image disparities between two simultaneously recorded pictures.

In this bachelor thesis, a pair of embedded dynamic vision sensors is used to perform stereo vision. These sensors are biologically inspired and operate in a similar way as the human eye. Instead of full image frames, they transmit an asynchronous stream of events that are caused by the temporal change of illumination at a pixel.

This thesis presents several stereo matching algorithms for the event-based framework of the embedded dynamic vision sensors. The disparity information is obtained with uncalibrated sensors and without a rectification of the transmitted data.

Afterwards, the performance of the presented algorithms is evaluated and discussed.

Zusammenfassung

Computer stereo vision denotes the extraction of depth information from images of the same scene taken from different viewpoints. For this, the standard techniques calculate the disparities from two simultaneously recorded images.

In this bachelor thesis, two embedded dynamic vision sensors are used to compute the disparity. These sensors are biologically inspired and operate in a similar way as the human eye. Instead of transmitting whole images, these sensors emit an asynchronous stream of events, which are triggered by the temporal change of illumination at a particular location.

This thesis presents several stereo matching algorithms for the event-based system of the embedded dynamic vision sensors. The disparity is obtained with uncalibrated sensors and without rectification of the transmitted data.

Finally, the individual algorithms are evaluated and discussed.


Contents

1 Introduction
  1.1 Related Work
2 Hardware and Software
  2.1 Dynamic Vision Sensor
    2.1.1 Key Specifications
  2.2 Embedded Dynamic Vision Sensor
  2.3 Stationary Stereo eDVS-Board
  2.4 Software
3 Disparity Computation
  3.1 Address-Event Representation
  3.2 Challenges
  3.3 Frame-Based Stereo Matching Algorithm
  3.4 Event-Time Based Stereo Matching Algorithm
  3.5 Area-Based Stereo Matching
    3.5.1 Frame-Area Based Stereo Matching Algorithm
    3.5.2 Event-Area Based Stereo Matching Algorithm
  3.6 Event-Vector Based Stereo Matching Algorithm
4 Results and Evaluation
  4.1 Test Data
  4.2 Frame-Based Stereo Matching
  4.3 Event-Time Based Stereo Matching
  4.4 Area-Based Stereo Matching
    4.4.1 Frame-Area Based Stereo Matching
    4.4.2 Event-Area Based Stereo Matching
  4.5 Event-Vector Based Stereo Matching
5 Discussion
  5.1 Frame-Based Stereo Matching
  5.2 Event-Time Based Stereo Matching
  5.3 Area-Based Stereo Matching
  5.4 Event-Vector Based Stereo Matching
6 Conclusion
List of Tables
List of Figures
Bibliography


Chapter 1

Introduction

In many robot applications, knowledge of the distance to nearby objects is desired. There are several ways to obtain it. In most industrial applications, optical distance measurement is used. It is the most widespread method because it is not computationally intensive and can therefore be implemented in an embedded system with a small form factor. To measure distances with this method, laser light is emitted towards and reflected by an object. The distance of the object can then be determined from the speed of light and the measured time of flight. This is a very accurate, but not a biologically inspired, solution to the problem.

An alternative way to compute the distance of objects is computer stereo vision, which comes close to the manner in which humans and most animals perceive the distance of an object. Two cameras are used as a replacement for the human eyes. In traditional computer stereo vision, two horizontally displaced cameras obtain two images of a scene from different vantage points. The depth information can then be obtained by comparing these two images: the image disparity of an object is inversely proportional to its distance. Although estimating the distance of seen objects is an easy, subconscious task for humans, performing the same task with two cameras and a computer is a very challenging problem. The reason is that conventional cameras deliver a huge amount of redundant information. Therefore, most algorithms that estimate the distance of objects recorded with two cameras demand too much processing power to run in real time.

In this bachelor thesis, a pair of embedded dynamic vision sensors is used to calculate the image disparity of recorded objects. These sensors operate in a similar way as the human eye: they transmit an asynchronous stream of events that are caused by changes of illumination in certain areas.


1.1 Related Work

Recently, there have been some publications dealing with stereo matching of the asynchronous events of the dynamic vision sensor. A publication that uses the output of two calibrated dynamic vision sensors and creates frames from the received data is [KSK09]. There, the depth information is gained with a classic frame-based stereo matching approach.

Paul Rogister, Ryad Benosman, Sio-Hoi Ieng, Patrick Lichtsteiner and Tobi Delbruck showed an efficient solution for event-based stereo matching with the aid of epipolar geometry [RBI+12].

”The epipolar geometry is the intrinsic projective geometry between two views. It is independent of scene structure, and only depends on the cameras’ internal parameters. The Fundamental Matrix F encapsulates this intrinsic geometry.” [HZ04]

F can be calculated from a set of at least 7 corresponding points in two images. For an event at a certain pixel, the epipolar line contains all possible matches in the other sensor. With the knowledge of the epipolar line and the time of occurrence of the events, they were able to draw conclusions about corresponding events.

A drawback of the implementation with epipolar geometry is that the Fundamental Matrix has to be determined for every setup and sensor configuration. The vision sensors have to stay in exactly the same orientation, otherwise the Fundamental Matrix loses its validity. If the cameras change their position relative to each other, the Fundamental Matrix has to be recalculated. The algorithms presented in [KSK09] also need a calibration and a rectification step before the stereo matching can be performed.

This thesis deals with a solution for disparity matching with uncalibrated sensors. For this purpose, epipolar geometry is not adequate. The advantage of an algorithm for uncalibrated sensors is that nothing has to be changed if other dynamic vision sensors or other lenses are used. The presented approaches work independently of the setup of the sensors. Thereby, such an algorithm can easily be applied if an application requires another setup or different sensors.


Chapter 2

Hardware and Software

2.1 Dynamic Vision Sensor

Most digital image sensors in use are based on CCD or CMOS technology. These vision sensors transmit a continuous stream of full image frames at fixed time intervals. Typical frame rates for this kind of sensor are 30 or 60 frames per second. In each frame, the color and intensity of all pixels are evaluated and transmitted. A drawback of these camera sensors is that, because the changes in the recorded area are normally very small between two pictures, the frames contain an enormous amount of redundant information.

The dynamic vision sensor (DVS128) has been developed at the Institute of Neuroinformatics, UZH and ETH Zurich [LPDM]. Unlike digital camera sensors, the DVS128 is a biologically inspired sensor. This biological inspiration is the reason that this sensor is often referred to as a silicon retina. The dynamic vision sensor is based on CMOS technology, similar to many conventional vision sensors. The schematic of a single pixel of the sensor can be seen in Figure 2.1.

Figure 2.1: Pixel circuit


Unlike digital camera sensors, the DVS128 transmits events that are caused by the temporal change of illumination at a pixel, instead of transmitting full image frames. Every pixel of the sensor acts completely independently. Once the illumination intensity of a pixel exceeds or falls below a specific threshold relative to a previous value, an event is generated. If the illumination exceeds the threshold, a so-called on event is generated; a fast decrease of brightness causes an off event. The thresholds for on and off events can be set individually. Each threshold is relative to a previously stored reference value of illumination. When an event is generated, the current value of illumination intensity is stored as the new reference value. This working principle drastically reduces the amount of transmitted data compared to other CMOS vision sensors.

Each event contains the x and y position of the event on the pixel matrix, and one bit notes whether the illumination fell below or exceeded the threshold. Furthermore, it is possible to add a timestamp with a temporal resolution of 1 µs to the event data.

Figure 2.2 shows a comparison of a smartphone camera and a DVS128. The recorded area is the same for both: a hand is moved through the field of view. The right picture shows all events from a DVS128 sensor appearing in a period of 50 ms. On events are visualized as green dots and off events as red dots.

Figure 2.2: Comparison of a smartphone camera and a DVS128. (a) Moving hand recorded with a smartphone camera. (b) All events from a DVS128 appearing in a period of 50 ms.


2.1.1 Key Specifications

The DVS128 sensor has a resolution of 128x128 pixels. The power consumption of the sensor itself is just 23 mW. Despite the low power consumption, a minimum response time of 15 µs and a timing precision of 1 µs can be achieved. Because all pixels work autonomously and are locally sensitive to relative changes of intensity, the sensor has a large dynamic range of 120 dB. The maximum transfer rate is approximately 1 million events per second.

2.2 Embedded Dynamic Vision Sensor

The embedded dynamic vision sensor boards ”eDVS128” (Figure 2.3) used in this project mainly consist of the DVS128 sensor, a lens and an ARM microcontroller (NXP LPC2106). The lens is used to adjust the focus and the field of view of the sensor. The 32-bit ARM microcontroller has 256 kbyte of program flash memory. The microcontroller receives the events transmitted by the DVS128 sensor and buffers them in its 64 kbyte internal RAM. The microcontroller offers multiple communication ports (TWI, SPI and UART). The maximum power consumption of the eDVS128 board is less than 200 mW.

Figure 2.3: Basic components of the eDVS-board [CBCD09]


2.3 Stationary Stereo eDVS-Board

To record objects with two eDVS-boards and to make sure that these eDVS128 stay at the same distance and orientation, a board was built with the laser cutter VersaLaser VLS2.30 at the NST (Figure 2.4 and Figure 2.5). The ground plate of this board has a size of 15 cm by 5 cm. On this board, the two eDVS128 are positioned 10 cm apart, centered, and aligned so that the camera axes are parallel. In the center between the two sensors, a cord is attached to the board to measure the distance of recorded objects. For the communication of the eDVS board with a computer, the UART communication port is used with a baud rate of 4 Mbaud.

Figure 2.4: Top view of the stereo eDVS-Board

Figure 2.5: Front view of the stereo eDVS-Board

Page 13: Autonomous embedded Stereo Vision on mobile Robots for ... · Unlike digital camera sensors the DVS128 is a biologically inspired sensor. This bio-logical inspiration is the reason

2.4. SOFTWARE 11

2.4 Software

All algorithms were implemented on a standard desktop computer running Windows 7. In order to acquire the data streams of the two eDVS boards, a Java program has been implemented. This program uses individual threads for the data acquisition from the two sensors, to make sure that the transmitted data from the two embedded dynamic vision sensors are received simultaneously. The received data from each sensor is saved in a .txt file. The integrated development environment used to implement the software was Eclipse 4.3.

All stereo matching algorithms introduced in this bachelor thesis are implemented in Matlab R2013a. Since it is not possible in Matlab to use multithreading for the data acquisition of the two eDVS and simultaneously perform the disparity computation, these Matlab implementations process the .txt files created with the Java program.
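The structure of the acquisition program (one thread per sensor, each appending events to its own text file) can be sketched as follows. This is a Python sketch, not the actual Java program; `read_event` stands in for the real UART read, and the function and file names are my own assumptions.

```python
import threading

def acquire(sensor_id, read_event, out_path, n_events):
    """One acquisition thread per sensor, mirroring the structure of the
    Java program: read events and append them to a per-sensor text file.
    `read_event` is a placeholder for the real UART read (not shown)."""
    with open(out_path, "w") as f:
        for _ in range(n_events):
            x, y, polarity, t = read_event(sensor_id)
            f.write(f"{x} {y} {polarity} {t}\n")

def run_acquisition(read_event, n_events=3):
    """Start one thread per eDVS so both streams are drained concurrently."""
    threads = []
    for sid in (1, 2):
        th = threading.Thread(
            target=acquire, args=(sid, read_event, f"edvs{sid}.txt", n_events))
        th.start()
        threads.append(th)
    for th in threads:
        th.join()
```

The point of the two threads is that neither sensor's buffer overflows while the other is being read; the later Matlab stage then replays the two files offline.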


Chapter 3

Disparity Computation

3.1 Address-Event Representation

The eDVS uses the Address-Event Representation (AER) protocol, an asynchronous handshaking protocol for the transmission of data between neuromorphic systems. It ensures that the sender and the receiver only read from or write to the bus when they are allowed to. The idea of the AER protocol is that the bus is only used when it is really necessary. An event in the AER protocol from one of the used eDVS can be described as

Event_s(x, y, t) = \begin{cases} +1 & \text{on event} \\ -1 & \text{off event} \end{cases} \quad (3.1)

s = \begin{cases} 1 & \text{eDVS1} \\ 2 & \text{eDVS2} \end{cases} \quad (3.2)

where x and y are the pixel coordinates of the occurred event on the eDVS pixel matrix and t is the timestamp with a temporal resolution of ∆t = 1 µs.
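In code, an AER event as described by (3.1) and (3.2) is just a small record. The following sketch uses field names of my own choosing; the thesis itself does not prescribe a data structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AddressEvent:
    """One AER event: pixel address, polarity, microsecond timestamp,
    and the sensor it came from (field names are illustrative)."""
    x: int          # 0..127 column on the 128x128 pixel matrix
    y: int          # 0..127 row
    polarity: int   # +1 for an on event, -1 for an off event
    t: int          # timestamp in microseconds (1 us resolution)
    sensor: int     # 1 = eDVS1, 2 = eDVS2
```

Keeping the sensor index in the record mirrors the subscript s in (3.2), so a merged stream from both boards stays unambiguous.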

3.2 Challenges

From the events of just one or two milliseconds from an eDVS, it is not possible to draw conclusions about the recorded objects. One reason is that, if an edge moves through the recorded area, the events for the changes of illumination caused by the edge are not necessarily detected by the sensor at the same time. This leads to more difficulties if two eDVS sensors are used simultaneously: if the events appearing in a short time interval from two simultaneously used eDVS are compared, they are usually completely different. Figure 3.1 shows the events from both sensors in a period of 2 ms. It can be seen that it is not possible to determine objects or corresponding events. In Figure 3.2 the same data is visualized, but the


period for the frame is increased from 2 ms to 40 ms. In this figure it is possible to determine objects and corresponding events.

Figure 3.1: Events in a period of 2 ms (red dots are off events, green dots are on events)

Figure 3.2: Events in a period of 40 ms

Another challenge for the stereo matching is that the delivered data from the eDVS contain a significant amount of noise. In Figure 3.2, two arms moving through the recorded area can be clearly identified. Aside from the arms, the images show background noise. To make sure that these events are not considered as possible matching candidates, the stereo matching algorithms have to deal with the noisy sensor data and suppress the noise.

The event rates of the two used sensors are not equal. In a short time interval of 5 ms, it was measured that one sensor delivers up to 3 times more events than the other. A detailed description of the behavior of the two used sensors is presented in Section 4.1.

The lenses of the used eDVS have a slight barrel distortion. This means that at the border of the recorded area, a straight edge in the scene would appear in the sensor stream as a curve. As the epipolar lines are not parallel, it is possible that two corresponding events appear in different horizontal lines of the sensor data streams.

3.3 Frame-Based Stereo Matching Algorithm

As the sensors do not deliver full image frames but single events, the data structure of the received data has to be changed into a frame format. To this end, all events appearing in a period of 40 ms are used to create frames. These frames are implemented as two-dimensional arrays with a size of 128x128; one frame is created for each sensor. A frame contains, for each pixel location, the sum of the polarities of all events occurring at that location in the 40 ms period. A frame can be defined as

Frame_s(x, y) = \int_{t_i}^{t_{i+1}} Event_s(x, y, t) \, dt, \quad \text{with } t_{i+1} − t_i = 40 \text{ ms} \quad (3.3)

s = \begin{cases} 1 & \text{eDVS1} \\ 2 & \text{eDVS2} \end{cases} \quad (3.4)

For instance, 10 off events and 2 on events at the pixel coordinates x1 and y1 from eDVS1 lead to

Frame_1(x_1, y_1) = −8 \quad (3.5)
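The frame construction of (3.3) amounts to summing event polarities per pixel over one 40 ms window. A minimal sketch, with events given as (x, y, polarity, t) tuples and t in microseconds (the function name and tuple layout are my own):

```python
def build_frame(events, window_us=40_000, t0=0):
    """Accumulate all event polarities per pixel over one 40 ms window,
    as in equation (3.3). Events are (x, y, polarity, t) tuples, t in us."""
    frame = [[0] * 128 for _ in range(128)]   # frame[y][x]
    for x, y, polarity, t in events:
        if t0 <= t < t0 + window_us:
            frame[y][x] += polarity
    return frame
```

With 10 off events and 2 on events at one pixel, the frame value there is −8, reproducing the example of equation (3.5).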

These frames are not compared directly because they contain noise. In order to reduce the noise, a two-dimensional rotationally symmetric Gaussian low-pass filter with standard deviation 1 and a size of 5x5 (3.6) is used.

GaussianFilter = \begin{pmatrix}
0.0030 & 0.0133 & 0.0219 & 0.0133 & 0.0030 \\
0.0133 & 0.0596 & 0.0983 & 0.0596 & 0.0133 \\
0.0219 & 0.0983 & 0.1621 & 0.0983 & 0.0219 \\
0.0133 & 0.0596 & 0.0983 & 0.0596 & 0.0133 \\
0.0030 & 0.0133 & 0.0219 & 0.0133 & 0.0030
\end{pmatrix} \quad (3.6)
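The kernel of (3.6) is simply a normalized sampled Gaussian; generating it from scratch confirms the tabulated values (a sketch with my own function name):

```python
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian low-pass kernel. With size=5, sigma=1 this
    reproduces the filter of equation (3.6), e.g. center value ~0.1621."""
    half = size // 2
    k = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
          for j in range(-half, half + 1)]
         for i in range(-half, half + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]
```

Because the kernel sums to one, filtering a frame with it smooths event accumulations without changing their overall weight.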


An example that depicts the result of the Gaussian filter for two example frames can be seen at the top of Figure 3.3. These two pictures show the same time frame from the two sensors.

The two pictures at the bottom of Figure 3.3 show the same sequence as the two pictures at the top, but only with pixels that exceed the threshold (3.7). Only these pixels are used for the image disparity computation:

Frame_s(x, y) >= 1.2 \quad (3.7)

Blue areas represent a decline of illumination; orange or red areas show an increase of illumination.

Figure 3.3: Results of the Gaussian filter (red areas represent on events, blue areas represent off events)


It is possible that two corresponding events do not appear in the same row in both sensors, due to barrel distortion or a slightly shifted alignment of the sensors. Despite that, only identical horizontal lines of the two frames are compared to calculate the disparity. This reduces the computational cost, and because of the Gaussian filter, each row already contains information from its neighboring rows.

In order to compute the disparity, a comparison array with the position and polarity of the event accumulations is created for each line. This comparison array has a size of 3xN, where N is the number of event accumulations that exceed the threshold in a specific horizontal line of a frame. The first row of this array notes the polarity of the event accumulation, the second row marks the x position of the left margin of a cluster, and the third row stores the x position of the right end of the accumulation. A visualization of how these comparison arrays are created is shown in Figure 3.4. The red and blue pixels are event accumulations which exceed the threshold.

Figure 3.4: Comparison Array

Once the comparison arrays for a horizontal line are created, it is checked whether the dimension N is the same for both comparison arrays and whether the entries in the first row have the same sign. Only if these two conditions are fulfilled does the disparity computation take place. If the widths of two corresponding event accumulations differ by at most 30 percent, the centers of the two corresponding clusters are calculated. The disparity results from the difference of these two centers.
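The per-line procedure can be sketched as follows. This is my own Python rendering, not the thesis' Matlab code; in particular, the exact form of the "widths differ by at most 30 percent" test is my interpretation.

```python
def row_clusters(row, threshold=1.2):
    """Build the comparison array for one horizontal frame line as a list of
    (polarity sign, left margin, right margin) per accumulation run."""
    clusters, start = [], None
    for x, v in enumerate(row):
        if abs(v) >= threshold:
            if start is None:
                start, sign = x, (1 if v > 0 else -1)
        elif start is not None:
            clusters.append((sign, start, x - 1))
            start = None
    if start is not None:                       # cluster touching the border
        clusters.append((sign, start, len(row) - 1))
    return clusters

def row_disparities(left_row, right_row, threshold=1.2):
    """Match clusters of equal rank and polarity; accept a pair when the
    widths differ by at most 30 percent, and report the center offset."""
    a = row_clusters(left_row, threshold)
    b = row_clusters(right_row, threshold)
    if len(a) != len(b):                        # dimension N must agree
        return []
    out = []
    for (sa, la, ra), (sb, lb, rb) in zip(a, b):
        wa, wb = ra - la + 1, rb - lb + 1
        if sa == sb and abs(wa - wb) <= 0.3 * max(wa, wb):
            out.append(abs((lb + rb) / 2 - (la + ra) / 2))
    return out
```

For one cluster at columns 10..13 in the left line and 16..19 in the right line, the center offset, and hence the disparity, is 6 pixels.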


3.4 Event-Time Based Stereo Matching Algorithm

In the frame-based stereo matching approach, all events over a defined time period are collected in the frame arrays. Afterwards, the disparity is computed with these then static frames.

In this event-time based approach, two-dimensional arrays are also used for the comparison. However, in this case the arrays are dynamic and change with every new occurring event. In the following, these arrays are referred to as dynamic frames (3.9). If a new event with the pixel coordinates x and y occurs, the dynamic frame is updated at the position x and y, and the 8 surrounding pixels are updated as well. For this, a two-dimensional rotationally symmetric Gaussian function with standard deviation 1 and a size of 3x3 (3.8) is used.

Gaussian3x3 = \begin{pmatrix}
0.0751 & 0.1238 & 0.0751 \\
0.1238 & 0.2042 & 0.1238 \\
0.0751 & 0.1238 & 0.0751
\end{pmatrix} \quad (3.8)

Every time a new event appears, this two-dimensional Gaussian function is added, centered at the pixel coordinates of the event. This update is defined as follows:

dFrame_s(x−1 \dots x+1,\; y−1 \dots y+1) = dFrame_s(x−1 \dots x+1,\; y−1 \dots y+1) + 2 \cdot Event_s(x, y, t) \cdot Gaussian3x3 \quad (3.9)

s = \begin{cases} 1 & \text{eDVS1} \\ 2 & \text{eDVS2} \end{cases} \quad (3.10)

To reduce the influence of events over time, the dynamic frames are inhibited with time: every 250 µs, all values of the dynamic frames are multiplied by 0.99. This means that, if no event appears at a certain pixel location, its value drops to 20 percent of its previous value within 40 ms.
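The 20 percent figure follows directly from the decay parameters: 40 ms contains 160 intervals of 250 µs, and 0.99 raised to that power is about 0.20.

```python
# Decay of an untouched dynamic-frame pixel: a factor of 0.99 is applied
# every 250 us, so one 40 ms window corresponds to 40_000 / 250 = 160
# multiplications, leaving 0.99**160 ~ 0.20 of the original value.
steps = 40_000 // 250       # decay steps per 40 ms window
remaining = 0.99 ** steps   # fraction of the value that survives
```

So the dynamic frames effectively implement an exponential forgetting with a time constant of roughly 25 ms.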


A visualization of the state of the two dynamic frames at the same time can be seen in Figure 3.5.

Figure 3.5: Visualization of the dynamic frames

This algorithm searches only in one direction: if an event from eDVS1 appears, the algorithm searches for a corresponding event from eDVS2. The disparity computation takes place before the dynamic frame is updated with the new event. For the disparity computation, it is first checked whether the value of the dynamic frame already exceeds a certain threshold; only in that case is it attempted to find a corresponding event from the other sensor. Algorithm 1 presents the correspondence search.

Algorithm 1 Event-Time Based Stereo Matching

Require: Event1(x, y, t)
  minimum ← 1000
  if |dFrame1(x, y)| >= 0.8 then
    for all events Event2(x2, y2, t2) with |t2 − t| <= 300 µs do
      if |dFrame2(x2, y2)| >= 0.8 and sign(dFrame2(x2, y2)) = Event1(x, y, t)
          and Event1(x, y, t) = Event2(x2, y2, t2)
          and |dFrame1(x, y) − dFrame2(x2, y2)| <= 0.3 then
        if |y2 − y| <= 1 and |x2 − x| <= minimum then
          minimum ← |x2 − x|
          disparity ← |x2 − x|
        end if
      end if
    end for
  end if
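Algorithm 1 can be sketched in Python as follows. This is an illustrative rendering, not the thesis' Matlab implementation: events are (x, y, t, polarity) tuples, dynamic frames are 128x128 nested lists, and all names are my own.

```python
def match_event(event1, dframe1, dframe2, recent_events2,
                act_threshold=0.8, sim_threshold=0.3, window_us=300):
    """Sketch of Algorithm 1: for an event from eDVS1, scan temporally close
    events from eDVS2 and keep the horizontally nearest candidate whose
    dynamic-frame activity, polarity and row agree. Returns disparity or None."""
    x, y, t, polarity = event1
    if abs(dframe1[y][x]) < act_threshold:     # event not yet "confirmed"
        return None
    best = None
    for x2, y2, t2, polarity2 in recent_events2:
        if abs(t2 - t) > window_us:            # outside the 300 us window
            continue
        if (abs(dframe2[y2][x2]) >= act_threshold
                and polarity2 == polarity       # same event polarity
                and (dframe2[y2][x2] > 0) == (polarity > 0)
                and abs(dframe1[y][x] - dframe2[y2][x2]) <= sim_threshold
                and abs(y2 - y) <= 1):          # at most one row apart
            if best is None or abs(x2 - x) < best:
                best = abs(x2 - x)              # keep the smallest disparity
    return best
```

Taking the smallest horizontal offset among the surviving candidates mirrors the `minimum` bookkeeping of the pseudocode.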


3.5 Area-Based Stereo Matching

Two versions of the area-based stereo matching approach were implemented. One version is frame-based, similar to the approach presented in Section 3.3; the other implementation is event-based, like the algorithm in Section 3.4.

The area-based approach employs the neighboring pixels of the selected pixel and tries to match this block with a corresponding block of pixels from the other sensor.

To measure the similarity between two pixel blocks, the Sum of Absolute Differences (SAD) algorithm is used. It works by taking the absolute difference between each pixel in the original block and the corresponding pixel in the block used for comparison. These differences are summed to create a simple metric of block similarity. The SAD of two corresponding blocks of the two frames can be defined as follows:

SAD = \sum_{i=x−2}^{x+2} \sum_{j=y−2}^{y+2} |Frame_1(i, j) − Frame_2(i, j)| \quad (3.11)
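Equation (3.11), extended with a horizontal offset as used by the search loops below, can be written as a short helper (a sketch; the function name and the `offset` parameter are my own):

```python
def sad(frame1, frame2, x, y, offset, half=2):
    """Sum of absolute differences (equation 3.11) between the 5x5 block
    around (x, y) in frame1 and the block shifted right by `offset` in
    frame2. Frames are indexed as frame[y][x]; bounds checks are omitted."""
    return sum(abs(frame1[y + j][x + i] - frame2[y + j][x + i + offset])
               for i in range(-half, half + 1)
               for j in range(-half, half + 1))
```

A perfectly matching block at the correct offset yields a SAD of zero, which is why the matcher simply keeps the offset with the smallest value.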

Due to the alignment with parallel camera axes, the x coordinate of an event in eDVS2 always has a higher value than the x coordinate of a corresponding event from eDVS1. Therefore, the matching algorithm searches in only one direction for possible matches.

3.5.1 Frame-Area Based Stereo Matching Algorithm

The frames for this implementation are created in the same way as described in the frame-based approach (Section 3.3). Again, only pixels that exceed a certain threshold are used for the disparity matching. If a pixel exceeds the threshold, a block with a size of 5x5 is cut out, with the selected pixel as its center. Then it is attempted to find a similar block in the frame from the other sensor. The center pixel of the block with the least sum of absolute differences is considered the correct match. This is only done for one horizontal line.


The algorithm below depicts the frame-area based stereo matching.

Algorithm 2 Frame-Area Based Stereo Matching

Require: Frame1 and Frame2
  disparitymap ← initialize with zeros
  for y ← 1 to 128 do
    for x ← 1 to 128 do
      minimum ← 1000
      if Frame1(x, y) >= 2 and disparitymap(x, y) = 0 then
        for x2 ← x to 128 do
          SAD ← \sum_{i=x−2}^{x+2} \sum_{j=y−2}^{y+2} |Frame1(i, j) − Frame2(i + (x2 − x), j)|
          if SAD < minimum then
            minimum ← SAD
            disparity ← x2 − x
          end if
        end for
        disparitymap(x−2 … x+2, y−2 … y+2) ← disparity
      end if
    end for
  end for

3.5.2 Event-Area Based Stereo Matching Algorithm

This algorithm uses dynamic frames, which are created as described in Section 3.4. The block size is the same as in the frame-area based approach. The disparity is computed before the dynamic frame is updated with the new event. Below, the stereo matching algorithm for an event from eDVS1 is presented. If an event from eDVS2 occurs, the algorithm is the same, but the corresponding event is sought in the negative x-direction.

Algorithm 3 Event-Area Based Stereo Matching

Require: Event1(x, y, t)
  if |dFrame1(x, y)| >= 1.2 then
    minimum ← 1000
    for x2 ← x to 128 do
      SAD ← ∑_{i=x−2}^{x+2} ∑_{j=y−2}^{y+2} |dFrame1(i, j) − dFrame2(i + (x2 − x), j)|
      if SAD < minimum then
        minimum ← SAD
        disparity ← x2 − x
      end if
    end for
  end if
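A per-event Python sketch of this matching follows; the function name and the skipping of border events are assumptions (the thesis implementation is in Matlab):

```python
import numpy as np

def event_area_match(dframe1, dframe2, x, y, threshold=1.2, half=2):
    """Per-event SAD matching (sketch of Algorithm 3) for an event
    from eDVS1 at (x, y); returns the disparity or None."""
    if abs(dframe1[y, x]) < threshold:
        return None
    h, w = dframe1.shape
    if not (half <= x < w - half and half <= y < h - half):
        return None  # border events are skipped (an assumption)
    block1 = dframe1[y-half:y+half+1, x-half:x+half+1]
    minimum, disparity = 1000.0, None
    # epipolar constraint: search only towards positive x in dframe2
    for x2 in range(x, w - half):
        block2 = dframe2[y-half:y+half+1, x2-half:x2+half+1]
        sad = np.abs(block1 - block2).sum()
        if sad < minimum:
            minimum, disparity = sad, x2 - x
    return disparity
```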


3.6 Event-Vector Based Stereo Matching Algorithm

One idea to resolve the disparity matching problem, which was also implemented, is to compute vectors from each new event to other event accumulations in the same row. For the computation of the vectors, dynamic frames are used as described in section 3.4. If a new event occurs, it is checked whether the dynamic frame already exceeds the threshold. If that is the case, a vector with two elements is created. The first entry defines the horizontal distance to the next pixel on the left in the dynamic frame that exceeds the threshold. The second entry indicates the distance to the next pixel on the right in the dynamic frame that exceeds the threshold. The sign of the entries is used to note the polarity of the nearby clusters: the sign is negative for off events and positive for on events. An example of this procedure is depicted in Figure 3.6.
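The construction of such a comparison vector can be sketched as follows. This is a hedged Python sketch: the threshold value and the sentinel for "no neighbour found" are assumptions, since the text does not fix them.

```python
import numpy as np

def comparison_vector(dframe, x, y, threshold=1.2, sentinel=128):
    """Distances (left, right) along row y to the nearest pixels whose
    dynamic-frame magnitude exceeds the threshold.  The sign of each
    entry carries the polarity of that neighbouring cluster (positive
    for on, negative for off).  `sentinel` marks 'no neighbour found'
    (an assumption)."""
    row = dframe[y]
    left = right = sentinel
    for dx in range(1, x + 1):                 # scan to the left
        if abs(row[x - dx]) >= threshold:
            left = dx if row[x - dx] > 0 else -dx
            break
    for dx in range(1, len(row) - x):          # scan to the right
        if abs(row[x + dx]) >= threshold:
            right = dx if row[x + dx] > 0 else -dx
            break
    return (left, right)
```

Two events whose vectors are similar are then considered candidates for a match, as Algorithm 4 below formalizes.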

Figure 3.6: Comparison Vector for Pixel 41


The algorithm below presents the stereo matching for an event from eDVS1. If an event from eDVS2 occurs, the algorithm is the same, but the corresponding event is sought in negative x-direction.

Algorithm 4 Event-Vector Based Stereo Matching

Require: Event1(x, y, t)
  if dFrame1(x, y) exceeds the threshold then
    minimum ← 1000
    vector1 ← vec(x, y)   {create comparison array for dFrame1(x, y)}
    for x2 ← x to 128 do
      vector2 ← vec2(x2, y)   {create comparison array for dFrame2(x2, y)}
      SAD ← |vector1(1) − vector2(1)| + |vector1(2) − vector2(2)|
      if SAD < minimum then
        minimum ← SAD
        disparity ← x2 − x
      end if
    end for
  end if

The reason why only one row and just the nearest clusters are used for comparison is that this dramatically reduces the computational cost. Moreover, because of the 2-dimensional Gaussian filter, each row already contains information about the events from neighboring rows.


Chapter 4

Results and Evaluation

This chapter presents the results of the different approaches introduced in chapter 3. It also presents the influence of some variations of the algorithms, for instance the usage of just on events, just off events, or different frames for on and off events.

4.1 Test Data

The different stereo matching algorithms for the event-based framework have been tested with a set of recordings. At distances of 50 cm, 100 cm and 150 cm to the stereo eDVS board, an arm was moved through the recorded area. Another three recordings were made at the same distances, but with two arms moving through the recorded area. At each distance the events from both sensors were recorded for 11 seconds. This chapter presents the results of the different stereo matching approaches with the aid of these six recordings. It is not assured that the arms stay at exactly the same distance to the stereo eDVS board during the whole recording. Hence the true disparity does not stay the same throughout a recording, but the differences are small. Consequently, the calculated mean disparity and the standard deviation for the different recordings give a good measure of the quality of an algorithm. The computational cost of the approaches is also compared with a data set. The run time of each Matlab implementation has been measured without any visualization of the data. The computation was executed on a desktop computer with an Intel Core 2 Quad Processor Q9550. To compare the run times, the data set with 2 arms at a distance of 50 cm is used, since it is the test set with the highest amount of transmitted data.


Figure 4.1 shows a visualization of the test data set with two moving arms at a distance of 100 cm. The figure shows all events in a period of 40 ms.

[Figure: 128 x 128 event images from eDVS1 (left) and eDVS2 (right)]

Figure 4.1: Visualization of the recording at a distance of 100 cm

The following table depicts the counted on and off events from the different test data sets. As can be seen, both sensors are more sensitive to on events than to off events. The proportion of on and off events is not the same for both sensors: on average, eDVS1 delivers 15% more on events than off events, while eDVS2 generates 23% more on than off events. In all of the test recordings eDVS2 transmitted more data than eDVS1; the sensor eDVS2 delivered between 7 and 28 percent more events than sensor eDVS1.

Test data       eDVS1 on Events   eDVS1 off Events   eDVS2 on Events   eDVS2 off Events
1 Arm 50 cm     199146            176505             160854            133498
1 Arm 100 cm    93221             79848              77250             61010
1 Arm 150 cm    52178             45655              43006             35298
2 Arms 50 cm    323125            288549             308605            261931
2 Arms 100 cm   174696            149009             158456            128643
2 Arms 150 cm   77733             64721              72342             56745

Table 4.1: Number of events from the different test data
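The quoted averages of 15% and 23% more on than off events can be reproduced from Table 4.1 by averaging the per-recording on/off ratios. This is a sketch; that the average is taken per recording rather than over the event totals is an assumption.

```python
# On/off event counts from Table 4.1, one (on, off) pair per recording.
edvs1 = [(199146, 176505), (93221, 79848), (52178, 45655),
         (323125, 288549), (174696, 149009), (77733, 64721)]
edvs2 = [(160854, 133498), (77250, 61010), (43006, 35298),
         (308605, 261931), (158456, 128643), (72342, 56745)]

def mean_on_off_ratio(counts):
    """Average of the per-recording on/off ratios."""
    return sum(on / off for on, off in counts) / len(counts)

r1 = mean_on_off_ratio(edvs1)   # about 1.155 -> roughly 15% more on events
r2 = mean_on_off_ratio(edvs2)   # about 1.229 -> roughly 23% more on events
```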

4.2 Frame-Based Stereo Matching

This section presents the results of the frame-based stereo matching algorithm introduced in section 3.3. Table 4.2 shows the results for the different test data sets. The worst case run time, measured as described in section 4.1, is 8.23 seconds. That means this algorithm is able to process the data in real-time. In contrast to the event-based matching approaches, the processing time of this approach depends only slightly on


the amount of events. For instance, the run time for the test data set with the least number of events is 8.04 s.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     28.2166          4.8614
1 Arm 100 cm    13.2723          1.9161
1 Arm 150 cm    7.1364           1.5734
2 Arms 50 cm    26.6031          5.6568
2 Arms 100 cm   13.0704          4.3183
2 Arms 150 cm   9.1880           3.0485

Table 4.2: Frame-based on and off events share one frame

Figure 4.2 shows the result of the disparity computation for two example pictures. The two pictures at the top show the two frames that are to be compared, with just the pixels exceeding the threshold. The picture at the bottom right is the same as the picture at the top right, except that every pixel for which the disparity computation algorithm found a match is removed. The picture at the bottom left emerges from the top left image and the deleted pixels from the right picture: the pixel coordinates of every deleted pixel from the bottom right image are taken and the calculated disparity is added to them. The pixel with these resulting coordinates in the bottom left picture is set to dark red. As can be seen, there are no large deviations from the true disparity.

[Figure: four 128 x 128 panels — top row: thresholded frames from eDVS1 and eDVS2; bottom row: the same frames after the matched pixels have been removed and re-projected]

Figure 4.2: Disparity Computation


Since on and off events inhibit each other in the creation of a frame, this frame-based algorithm was also implemented with different frames for on and off events. The results are shown in table 4.3. Indeed, the standard deviation decreases if different frames for on and off events are used. However, the computational cost increases drastically, because twice as many frames have to be compared. With two individual frames for on and off events the worst case run time rises to 14.21 s.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     30.0785          2.2363
1 Arm 100 cm    13.6243          0.9299
1 Arm 150 cm    7.5630           0.8064
2 Arms 50 cm    27.6117          4.1178
2 Arms 100 cm   13.1360          3.2814
2 Arms 150 cm   9.2020           2.9653

Table 4.3: Results of the frame-based approach if the on and off events are used in different frames

Tables 4.4 and 4.5 present the results if just on or just off events are used for the disparity computation. Both results are better than the result with one frame for on and off events. The usage of just off events for the disparity computation leads to the best results of the four presented variations.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     30.1586          2.4935
1 Arm 100 cm    13.6397          1.0066
1 Arm 150 cm    7.5704           0.8982
2 Arms 50 cm    27.6117          4.2978
2 Arms 100 cm   13.1360          3.3812
2 Arms 150 cm   9.2020           3.0653

Table 4.4: Results of the frame-based approach with just on events

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     29.9886          1.9026
1 Arm 100 cm    13.6059          0.8286
1 Arm 150 cm    7.5513           0.6372
2 Arms 50 cm    27.6735          3.7002
2 Arms 100 cm   13.0572          3.0871
2 Arms 150 cm   9.5477           2.8079

Table 4.5: Results of the frame-based algorithm with just off events


4.3 Event-Time Based Stereo Matching

Hereinafter, the results of the event-time based approach are presented. Table 4.6 shows the results for the different test data sets. It can be seen that the results for one arm are just marginally worse than the results of the frame-based approach. But if two arms are moved through the recorded area, the standard deviation is significantly higher. This is due to the fact that the algorithm does not ensure that the calculated match for an event belongs to the same object. A visualization of that behavior is shown in Figure 4.5. The worst case run time of this approach was 6.52 s. Because of the event-based framework, the computational cost decreases if fewer events are generated. The run time for the test set with the least amount of events was only 3.34 s.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     30.0011          4.2362
1 Arm 100 cm    13.3670          2.0662
1 Arm 150 cm    8.5028           3.8907
2 Arms 50 cm    26.0117          9.3303
2 Arms 100 cm   15.9779          7.8963
2 Arms 150 cm   11.4819          6.4244

Table 4.6: Event-based with on and off events in one dynamic frame


Figure 4.3 depicts three different moments from a recording. In that recording a hand was moved from a distance of 30 cm to 100 cm. The left and the middle picture show the state of the two dynamic frames at the end of a period of 40 ms. The right picture illustrates all calculated disparities in that time frame. The disparity values are visualized in a color-coded way according to the color bar on the right of the image. The default value for unprocessed background pixels is 0.

(a) near

(b) medium

(c) far

Figure 4.3: Visualization of the results from the event-time based approach


Figure 4.4 investigates the calculated disparities for the test data set with one arm at a distance of 100 cm. The horizontal axis represents the time of occurrence of an event in µs. The vertical axis indicates the calculated disparity in pixels. It can be seen that a single event is not very meaningful, whereas the mean disparity over a short time interval provides a good approximation.

[Plot: calculated disparity (0–35 pixels) over time (0–12 x 10^6 µs)]

Figure 4.4: Event-time based approach with 1 arm at 100 cm

Figure 4.5 shows the calculated disparities for the test data set with two arms at a distance of 100 cm. The deviations are much higher in this case. The reason is that an event from the left arm in dFrame1 can be matched to the right arm in dFrame2. That is the case if no event from the left arm is created in eDVS2 within the defined temporal interval, but an event from the right arm with a similar dynamic frame value is. However, as table 4.6 shows, the mean disparity over time still gives a good approximation.

[Plot: calculated disparity (0–60 pixels) over time (0–12 x 10^6 µs)]

Figure 4.5: Event-time based approach with 2 arms at 100 cm


In the frame-based approach the usage of separate frames for on and off events immensely increases the computational cost. In this case, however, the run time does not change significantly compared to the version with just a single dynamic frame for on and off events. This is because, for the matching, it is only checked whether the dynamic frame exceeds a threshold at the position of an upcoming event. So the computational cost of the matching algorithm stays the same; just the creation and the inhibition of two additional dynamic frames increase the run time slightly. With two separate dynamic frames for on and off events the worst case run time is 7.3 seconds. Just like for the frame-based approach, the standard deviation decreases if individual frames for on and off events are used.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     29.5559          3.5556
1 Arm 100 cm    13.2413          2.1046
1 Arm 150 cm    7.5579           2.8593
2 Arms 50 cm    27.4953          8.5456
2 Arms 100 cm   14.4562          7.6121
2 Arms 150 cm   9.5646           6.3456

Table 4.7: Event-based different dynamic frames for on and off events

Tables 4.8 and 4.9 show the results if just on or just off events are used to compute the disparity. Similar to the frame-based approach, the best results are achieved if just off events are used to compute the disparity.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     29.5131          4.1170
1 Arm 100 cm    13.2116          2.2972
1 Arm 150 cm    7.6049           2.7702
2 Arms 50 cm    27.3433          8.6564
2 Arms 100 cm   14.7372          7.6452
2 Arms 150 cm   10.4345          6.1242

Table 4.8: Event-time based with on events only


Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     29.6071          3.3446
1 Arm 100 cm    13.2824          1.8042
1 Arm 150 cm    7.3514           3.2176
2 Arms 50 cm    27.6578          8.1632
2 Arms 100 cm   13.5896          7.5645
2 Arms 150 cm   9.9342           6.5645

Table 4.9: Event-based with off events only

4.4 Area-Based Stereo Matching

This section presents the evaluation of the image disparity matching with the area-based stereo matching approaches described in section 3.5. A slight variation of the algorithms is also introduced and evaluated.

4.4.1 Frame-Area Based Stereo Matching

As table 4.10 reveals, the results for one arm in the recorded area are precise, whereas the test data sets with two arms lead to a high standard deviation. Due to the high amount of comparisons, the implementation of the algorithm is not able to process the data in real-time. The worst case run time was 32.98 s.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     29.1448          2.7139
1 Arm 100 cm    13.5993          1.1375
1 Arm 150 cm    7.5170           0.9830
2 Arms 50 cm    29.9583          15.6174
2 Arms 100 cm   17.2209          11.1710
2 Arms 150 cm   11.6246          8.2937

Table 4.10: Frame-area based with gaussian filter

Because of the Gaussian function, the details of the true transmitted data become blurred. This increases the possibility that a pixel block is matched to a similar pixel block from the wrong arm and leads to a large standard deviation.


Therefore, a modification of the algorithm was implemented to create more detailed dynamic frames. If a new event occurs, the dynamic frame is changed only at the pixel coordinates of the new event, instead of also increasing the 8 surrounding pixels. This update is defined as follows

dFrame_s(x, y) = dFrame_s(x, y) + Event_s(x, y, t)    (4.1)

The matching algorithm stays the same, except that the threshold was increased to 4. Table 4.11 shows the achieved results. It can be seen that this modification of the algorithm decreases the deviation if two arms are in the recorded area.
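A minimal sketch of the modified update (4.1), assuming the event enters the sum as ±1 according to its polarity (the thesis does not spell out the numeric value of Event_s):

```python
import numpy as np

def update_dframe_pointwise(dframe, x, y, polarity):
    """Modified dynamic-frame update of equation (4.1): only the pixel of
    the new event is changed; the 8 surrounding pixels are left untouched.
    polarity: +1 for an on event, -1 for an off event (an assumption)."""
    dframe[y, x] += polarity
    return dframe
```

Compared to the Gaussian-spread update, repeated events at the same pixel are needed before the raised threshold of 4 is exceeded, which is what keeps the frames more detailed.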

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     29.3577          3.5802
1 Arm 100 cm    13.6206          1.3319
1 Arm 150 cm    7.3105           1.5418
2 Arms 50 cm    25.3144          10.1802
2 Arms 100 cm   12.9463          5.1943
2 Arms 150 cm   8.9476           4.9863

Table 4.11: Area-based without gaussian filter

4.4.2 Event-Area Based Stereo Matching

Table 4.12 presents the results of the event-area based approach as described in section 3.5.2. This implementation is very computationally intensive and was not able to process any of the recordings in real-time. The measured worst case run time is 65.53 s.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     27.2442          3.3145
1 Arm 100 cm    13.0975          1.8601
1 Arm 150 cm    7.3984           0.7557
2 Arms 50 cm    20.9390          6.6455
2 Arms 100 cm   13.0454          3.2056
2 Arms 150 cm   9.8468           4.2576

Table 4.12: Event-area based with gaussian filter


The event-area based algorithm was also tested without the Gaussian function, as described in the previous section. However, in the event-area based approach the usage of the Gaussian function provides more precise results.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     29.0075          4.1300
1 Arm 100 cm    13.6646          2.1046
1 Arm 150 cm    7.9977           4.3323
2 Arms 50 cm    25.8478          12.8807
2 Arms 100 cm   15.9297          10.0514
2 Arms 150 cm   12.5303          10.071

Table 4.13: Event-area based without gaussian filter

4.5 Event-Vector Based Stereo Matching

The following table presents the results of the algorithm introduced in section 3.6. This algorithm is able to process the data in real-time. The measured worst case run time is 10.43 s.

Test data       Mean Disparity   Standard Deviation of the disparity
1 Arm 50 cm     28.5471          3.5552
1 Arm 100 cm    12.7489          1.5997
1 Arm 150 cm    6.488            1.5296
2 Arms 50 cm    23.6270          7.1917
2 Arms 100 cm   13.6411          5.6242
2 Arms 150 cm   9.3184           4.9024

Table 4.14: Event-vector based with gaussian filter


Figure 4.6 depicts three different moments from a recording, similar to figure 4.3 but for the event-vector based approach. In that recording a hand was moved from a distance of 30 cm to 100 cm. The left and the middle picture show the state of the two dynamic frames at the end of a period of 40 ms. The right picture illustrates all calculated disparities in that period of time. The disparity values are visualized in a color-coded way according to the color bar on the right of the image.

(a) near

(b) medium

(c) far

Figure 4.6: Visualization of the results from the event-vector based approach


Figure 4.7 analyzes the calculated disparities for the test data set with one arm at a distance of 100 cm. The horizontal axis represents the time of occurrence of an event in µs. The vertical axis marks the calculated disparity. The results for this test data set are similar to the results of the event-time based approach.

[Plot: calculated disparity (5–20 pixels) over time (0–12 x 10^6 µs)]

Figure 4.7: Event-vector based approach with 1 arm at 100 cm

Figure 4.8 presents the calculated disparities for the recording with two arms at a distance of 100 cm. It can be seen that a few large deviations appear. However, the large deviations are much rarer than in the event-time based approach, whose results are shown in figure 4.5.

[Plot: calculated disparity (0–80 pixels) over time (0–12 x 10^6 µs)]

Figure 4.8: Event-vector based approach with 2 arms at 100 cm


Chapter 5

Discussion

The results presented in chapter 4 need further explanation. In this chapter I will clarify the results in detail and show the limitations of the presented algorithms. Additionally, the different approaches will be evaluated with respect to their possible scope of applications.

5.1 Frame-Based Stereo Matching

The main drawback of this approach is that the major benefit of the asynchronous data interface of the silicon retinas gets lost. The delivered data from the eDVS offer a temporal resolution of 1 µs. The algorithm translates the data into 25 frames per second, which means the temporal resolution decreases from 1 µs to 40 ms. That comes close to the data that conventional vision sensors deliver. However, the advantage of no redundancy remains: two successive frames contain no redundant information.

The computational intensity of this algorithm is almost independent of the amount of transmitted data. The downside of that fact is that, even if the event rate of the sensors is low, the algorithm still requires a lot of computing power.

Because the disparity is only computed if the number of objects in a horizontal line is equal for both sensors, this approach also delivers good results if many objects are moved simultaneously through the recorded area. Therefore, this algorithm is well suited for a mobile robot application: if a robot is rotating around its axis, every object in the field of view simultaneously changes its position in the image area, which leads to many simultaneously appearing events from different objects.


5.2 Event-Time Based Stereo Matching

The main idea of this algorithm is that, considering a short time interval, it is improbable that two events that do not belong to the same object appear at the same time and additionally have a similar dynamic frame value.

A problem with the event-time based stereo matching occurs if the stereo eDVS board itself is moved and not just some objects in the recorded area. In that case this algorithm delivers bad results, because the changes are then simultaneous in the whole image. This is problematic if many objects are close together, since the probability is high that an event is not matched to the correct object but instead to the nearest one, as the algorithm selects the nearest candidate if more than one event fulfills the matching condition.

This algorithm is better suited for a setup that stays in the same position. If just objects are moving through the recorded area, the probability is high that the found match belongs to the same object, as section 4.3 has shown.

A possibility to increase the performance of the event-time based approach in a mobile robot application would be to use a dynamic threshold that depends on the instantaneous event rate, since objects that are far away produce fewer events than near objects. That dynamic threshold could be used to suppress the events that are caused by far away objects. If it can be achieved that only the nearest objects exceed the threshold, this approach would also be suited for a robot application.
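A minimal sketch of such a rate-dependent threshold follows; the sliding-window length, the gain, and the base value are free parameters of the sketch, not values from the thesis:

```python
class DynamicThreshold:
    """Sketch of the proposed rate-dependent threshold: the threshold
    rises with the instantaneous event rate, so that only nearby,
    event-dense objects still exceed it."""

    def __init__(self, base=1.2, gain=1e-5, window_us=40_000):
        self.base, self.gain, self.window_us = base, gain, window_us
        self.timestamps = []

    def update(self, t_us):
        """Register an event at time t_us and return the current threshold."""
        self.timestamps.append(t_us)
        # keep only events inside the sliding window
        self.timestamps = [s for s in self.timestamps
                           if t_us - s <= self.window_us]
        rate = len(self.timestamps) / (self.window_us * 1e-6)  # events/s
        return self.base + self.gain * rate
```

The higher the event rate inside the window, the higher the threshold, so distant, event-sparse objects would drop below it first.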

5.3 Area-Based Stereo Matching

Because of the many computationally expensive block comparisons, this algorithm has the highest computational cost of the algorithms presented in this thesis. As chapter 4 revealed, the results of the frame-area based approach with more than one object in the recorded area are significantly worse than the results of the frame-based approach. The event-area based stereo matching algorithm with the aid of the Gaussian function delivers results that are comparable to the other event-based approaches. However, its computational cost is several times higher than that of the other event-based stereo matching approaches. Therefore, the area-based approaches can be regarded as not appropriate for an efficient stereo matching of the asynchronous events of the dynamic vision sensor.

To reduce the computational intensity, this algorithm was also tested with block sizes of 4x4 and 3x3, but this greatly increases the number of mismatches. For example, a block size of 3x3 leads to a standard deviation of 22.2742 if the test data set with two arms at a distance of 50 cm is used.


5.4 Event-Vector Based Stereo Matching

A problem with this algorithm is that it presupposes that the distance between objects in the image area is similar in both sensors. However, the distances between objects in the images are not the same in both sensors. For instance, if two objects appear near to each other in the image area of one sensor, and one object is near to the sensor while the other is far away, their distance in the image area of the second sensor can be quite different.

A possible solution to overcome this problem is to use more complex vectors, for instance vectors to all event accumulations in a horizontal line instead of just to the nearest neighbors. This was also implemented, but rejected because it became too computationally expensive.

Another drawback of this algorithm is that it is not guaranteed that the calculated match is from the same object, although the chance is very high that it is. Therefore a single calculated disparity is not very significant. But if the disparities are collected over time and the mean disparity is taken, it is a good approximation.

This approach is better suited for an implementation on a mobile robot than the event-time based approach, because the time of occurrence has only limited influence on the matching process. The temporal influence is limited to the inhibition of the dynamic frames.


Chapter 6

Conclusion

This thesis has presented different algorithms for the event-based framework of the eDVS. These algorithms are all designed to run with uncalibrated sensors. The advantages of these algorithms are that there is no need to determine the intrinsic parameters of the sensors and that they can be easily applied to different setups.

It was shown that the data from the eDVS can be translated into frames, which can then be treated like the output of a conventional vision sensor. This thesis has presented an efficient way to compute the image disparity with frames, which is able to process the data in real-time. However, the frame-based approaches reduce the advantage of the silicon retina technology. Due to that fact, event-based approaches were implemented to resolve the stereo matching. The event-based algorithms exploit the advantages of the eDVS technology and process the events directly when they appear.

Three event-based variations were introduced. One approach uses the time of occurrence of an event as the primary matching condition. A second event-based approach compares vectors to the neighboring event accumulations to find corresponding events. The implementations of both algorithms are able to compute the image disparity in real-time. The third event-based approach compares image blocks to compute the disparity. This algorithm turned out to be not well suited because of its computational cost.

However, these event-based approaches have some limitations. Because the used sensors are uncalibrated and distorted and the transmitted data contains noise, the individual events must be evaluated by the software. This is achieved with the aid of dynamic frames. This evaluation increases the computational intensity. Additionally, the presented algorithms are not able to find matches for every event, and not all matches are correct. That means single calculated disparities are not very significant. However, if the mean disparity for an area over time is used, these algorithms deliver a good approximation.


List of Tables

4.1  Number of events from the different test data . . . . . . . . . . . . . 26
4.2  Frame-based on and off events share one frame . . . . . . . . . . . . . 27
4.3  Results of the frame-based approach if the on and off events are used in different frames . . . 28
4.4  Results of the frame-based approach with just on events . . . . . . . . 28
4.5  Results of the frame-based algorithm with just off events . . . . . . . 28
4.6  Event-based with on and off events in one dynamic frame . . . . . . . . 29
4.7  Event-based different dynamic frames for on and off events . . . . . . 32
4.8  Event-time based with on events only . . . . . . . . . . . . . . . . . 32
4.9  Event-based with off events only . . . . . . . . . . . . . . . . . . . 33
4.10 Frame-area based with gaussian filter . . . . . . . . . . . . . . . . . 33
4.11 Area-based without gaussian filter . . . . . . . . . . . . . . . . . . 34
4.12 Event-area based with gaussian filter . . . . . . . . . . . . . . . . . 34
4.13 Event-area based without gaussian filter . . . . . . . . . . . . . . . 35
4.14 Event-vector based with gaussian filter . . . . . . . . . . . . . . . . 35


List of Figures

2.1 Pixel circuit
2.2 Comparison of a smartphone camera and a DVS128
2.3 Basic components of the eDVS board
2.4 Top view of the stereo eDVS board
2.5 Front view of the stereo eDVS board

3.1 Events in a period of 2 ms (red dots are off events, green dots are on events)
3.2 Events in a period of 40 ms
3.3 Results of the Gaussian filter (red areas represent on events, blue areas represent off events)
3.4 Comparison array
3.5 Visualization of the dynamic frames
3.6 Comparison vector for pixel 41

4.1 Visualization of the recording at a distance of 100 cm
4.2 Disparity computation
4.3 Visualization of the results from the event-time based approach
4.4 Event-time based approach with 1 arm at 100 cm
4.5 Event-time based approach with 2 arms at 100 cm
4.6 Visualization of the results from the event-area based approach
4.7 Event-vector based approach with 1 arm at 100 cm
4.8 Event-vector based approach with 2 arms at 100 cm

