
Target Tracking in Wireless Sensor Networks by Data Fusion with Video-based Object Detection

Uwe Gosda¹, Richard Weber¹, Oliver Michler¹, Sven Zeisberg² and Erik Mademann³

¹Technische Universität Dresden, "Friedrich List" Faculty of Transportation and Traffic Sciences, Dresden, Germany
²University of Applied Sciences Dresden, Department of Telecommunications Technology, Dresden, Germany
³ZIGPOS GmbH, Dresden, Germany

Localization techniques based on wireless sensor networks (WSNs) are an increasingly popular approach for estimating object positions in a wide area of applications. Nevertheless, accuracy and reliability of the WSN position estimates need to be increased for some applications, e.g. automated people registration in public transport vehicles. This goal can be achieved by incorporating additional sensors like cameras in the localization process. In this paper we introduce a novel data fusion approach for combining WSN position estimates and object positions detected in camera images. We project image coordinates to the WSN coordinate frame and modify an Extended Kalman Filter (EKF) for data fusion. We show how the object positions that arise from 2D camera images are used to reduce the variance of the combined position estimate. We test our method by tracking the movement of a person using WSN positioning and additional measurements obtained by people detection in the corresponding video scene.

Index Terms—positioning, wireless sensor networks, extended Kalman filter, data fusion, 2D-3D transformation, people tracking.

I. INTRODUCTION

Localization in Wireless Sensor Networks (WSNs) is becoming an increasingly popular approach for estimating object positions in modern telematics applications. Beside the high positioning accuracy, a main advantage of this technology is its applicability in indoor environments where using GNSS is not an option. There are many scenarios that demand accurate and reliable localization of objects or persons. In the field of transport telematics, WSN positioning can be used for improving railway logistics [1] or traffic control by localization of vehicles near traffic light intersections [2]. Another possible application of this approach is the registration of public transport usage, e.g. as part of an electronic ticketing system. Localization can be realized using leaky coaxial cables in this context [3]. As an alternative, each passenger's ticket could be equipped with a mobile sensor node that is located within a vehicle's fixed WSN. The position of the mobile sensor node can then help to register a person inside or outside the vehicle. Here, the requirements in terms of positioning accuracy and speed of WSN localization are very demanding, due to the (possibly high) spatial concentration of mobile sensor nodes in a public transport vehicle. Incorporating additional sensors in the localization process may help to meet these requirements. The surveillance cameras of modern public transport vehicles seem suitable for this purpose due to the complementary characteristics they provide in addition to purely WSN-based localization.

In this paper we present a novel data fusion method for incorporating the results of video-based object detection into WSN target tracking. In our sample application, we use a single camera observing the motion of a person and simultaneously track the position of a sensor node carried by that person. Measurements in the 2D image plane are generated by a state-of-the-art computer vision algorithm for people detection [4]. The resulting positions are then transformed into a 3D coordinate system using the current WSN position to account for the lack of depth information in the image measurements. We implement data fusion by modifying the observation model of an EKF-based tracking algorithm. This approach has several advantages:

• The tracking filter operates directly on a predefined number of measured WSN distances and is easy to implement.

• The measurement errors of the two sensors involved have complementary characteristics for certain camera positions w.r.t. the WSN geometry, i.e. positioning accuracy of the WSN is poor parallel to the image plane in some cases.

This paper is organized as follows: In the following section we give a brief overview of related work. In Section III we introduce the sensor nodes used for experiments and present the EKF approach to WSN target tracking. Our data fusion algorithm and the transformation from 2D image coordinates to 3D WSN coordinates and vice versa are introduced in Section IV. Section V contains the results of the practical implementation of the method, followed by conclusions and future work in Section VI.

II. RELATED WORK

Localization in Wireless Sensor Networks has received a lot of attention as of late. The majority of systems relies on the evaluation of received signal strength (RSS) between the sensor nodes [5]. Various EKF-based localization approaches use this principle, e.g. for indoor people tracking [6] or robotics [7]. Other measurement principles are based on time-of-flight (TOF), time-of-arrival (TOA) or time difference-of-arrival (TDOA). In [8] an ultrasonic TOF-based localization system for a mobile robot is introduced. Zhao et al. [9]



Fig. 1. Atmel IEEE 802.15.4 sensor nodes with antenna diversity.

compare different Kalman filter schemes for the TOF case, whereas in [10] TDOA is used for pedestrian localization.

WSN-based localization techniques are prone to limited accuracy due to the nature of radio frequency (RF) signal propagation, which is affected by multipath effects, interference, etc. Several data fusion approaches exist to overcome this disadvantage. Xiong et al. [11] suggest a combination of WSN and Radio Frequency Identification (RFID) technology using an EKF and Particle Filters for data fusion. Camera-based data fusion approaches are rarely applied in combination with WSN localization. In [12] a WSN with cameras attached to some of the nodes is introduced. The authors use the communication abilities of a WSN to establish object tracking in a multi-camera network. Finally, Gilbert et al. [13] propose a robotics application that includes tracking the position of a mobile sensor node in a WSN based on RSS values. Results of data fusion with camera-based object detection suggest an improvement in accuracy compared to WSN positioning only.

III. WIRELESS SENSOR NETWORK

A WSN typically consists of several spatially distributed autonomous RF sensor nodes. The individual nodes communicate with each other by multi-hop networking. For localization purposes, the WSN can provide distance measurements between the nodes in the network. This process is called ranging [14], [15]. Distances are measured between pairs of nodes by TOA, TOF or phase-of-arrival (POA) ranging techniques.

The sensor nodes used here are equipped with the IEEE 802.15.4-based Atmel AT86RF233 transceiver (Fig. 1), which is capable of measuring distances between the network's sensor nodes by POA evaluation [16]. The nodes operate at a center frequency of 2.44 GHz and a bandwidth of 80 MHz.

In the context of localization in WSNs, sensor nodes are usually divided into mobile nodes and anchor nodes. While the former (within the scope of this paper) are assumed to be attached to an object or carried by a person, the latter are fixed nodes with known position coordinates. Usually, the (unknown) position of the mobile node is computed by evaluating its distances to the anchor nodes.

A. WSN Tracking Algorithm

We apply an EKF-based approach in order to compute positions of a single mobile sensor node using distance measurements to several anchor nodes [9]. The prediction and update equations of the EKF algorithm are given by:

Predict:
$$x(k|k-1) = f\big(x(k-1|k-1),\, w(k-1)\big) \tag{1}$$
$$P(k|k-1) = F(k-1)\,P(k-1|k-1)\,F^T(k-1) + Q(k-1) \tag{2}$$

Update:
$$S(k) = H(k)\,P(k|k-1)\,H^T(k) + R(k) \tag{3}$$
$$K(k) = P(k|k-1)\,H^T(k)\,S^{-1}(k) \tag{4}$$
$$x(k|k) = x(k|k-1) + K(k)\big(z(k) - h(x(k|k-1))\big) \tag{5}$$
$$P(k|k) = \big(I - K(k)H(k)\big)\,P(k|k-1). \tag{6}$$
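For concreteness, the following is a minimal numpy sketch of these prediction and update equations; the function names and the callable measurement model h are our own illustrative choices, not part of the paper:

```python
import numpy as np

def ekf_predict(x, P, F, Q):
    """Prediction step, Eqs. (1)-(2), for a linear state transition (Eq. (7))."""
    x_pred = F @ x                          # predicted state
    P_pred = F @ P @ F.T + Q                # predicted error covariance
    return x_pred, P_pred

def ekf_update(x_pred, P_pred, z, h, H, R):
    """Update step, Eqs. (3)-(6); h is the (possibly nonlinear) measurement model."""
    S = H @ P_pred @ H.T + R                # innovation covariance, Eq. (3)
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain, Eq. (4)
    x = x_pred + K @ (z - h(x_pred))        # a-posteriori state, Eq. (5)
    P = (np.eye(len(x)) - K @ H) @ P_pred   # a-posteriori covariance, Eq. (6)
    return x, P
```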

In the prediction step, the state estimate x(k|k − 1) is predicted from the previous state vector x(k − 1|k − 1). In our approach the state transition is a linear function of the previous state and Equation (1) becomes:

$$x(k|k-1) = F(k)\,x(k-1|k-1). \tag{7}$$

We use a second order kinematic model for estimating position and velocity of a mobile node in two spatial dimensions. Thus, the state vector x(k) can be written as

$$x(k) = (s_x, v_x, s_y, v_y)^T, \tag{8}$$

where $s_x$ and $s_y$ are the positions in horizontal and vertical direction, and $v_x$ and $v_y$ refer to the corresponding velocities at time step k. The matrix F(k) refers to the system model. In our case it is time invariant and given by

$$F = \begin{pmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{9}$$

The predicted error covariance matrix P(k|k − 1) is then computed based on the system model, the a-posteriori covariance matrix P(k|k) and the system noise covariance matrix Q(k), which is assumed to be

$$Q = q \begin{pmatrix} \frac{1}{3}\Delta t^3 & \frac{1}{2}\Delta t^2 & 0 & 0 \\ \frac{1}{2}\Delta t^2 & \Delta t & 0 & 0 \\ 0 & 0 & \frac{1}{3}\Delta t^3 & \frac{1}{2}\Delta t^2 \\ 0 & 0 & \frac{1}{2}\Delta t^2 & \Delta t \end{pmatrix}, \tag{10}$$

with the known constant q determining the intensity of the process noise.
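A short sketch of how F and Q from Eqs. (9) and (10) could be assembled (the helper names are ours):

```python
import numpy as np

def make_F(dt):
    """Constant-velocity state transition matrix of Eq. (9)."""
    return np.array([[1., dt, 0., 0.],
                     [0., 1., 0., 0.],
                     [0., 0., 1., dt],
                     [0., 0., 0., 1.]])

def make_Q(dt, q):
    """Process noise covariance of Eq. (10) for the 2D kinematic model."""
    block = q * np.array([[dt**3 / 3., dt**2 / 2.],
                          [dt**2 / 2., dt        ]])
    Q = np.zeros((4, 4))
    Q[:2, :2] = block   # (s_x, v_x) block
    Q[2:, 2:] = block   # (s_y, v_y) block
    return Q
```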

The update step comprises the computation of the innovation covariance S(k), followed by the optimal Kalman gain K(k), which is used as a weighting matrix between the predicted state x(k|k − 1) and the measurement residual z(k) − h(x(k|k − 1)) in the computation of the a-posteriori state vector x(k|k). The measurement error covariance matrix

Page 3: Target tracking in wireless sensor networks by data fusion with video based object detection06533267

R(k) contains the variance of the error for each measurement $z_i$, i = 1, ..., n. The errors are assumed to be independent, thus R(k) can be written as:

$$R(k) = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_n^2 \end{pmatrix}. \tag{11}$$

A single measurement $z_i(k)$ is the distance between the fixed coordinates $(X_j, Y_j)^T$ of an anchor node j and the current position of the mobile node $(s_x(k), s_y(k))^T$, plus the measurement noise $v_j(k)$. It can be written as:

$$z_i(k) = h(x(k)) + v(k) = \big\|(s_x(k), s_y(k))^T - (X_j, Y_j)^T\big\| + v_j(k). \tag{12}$$

The observation matrix $H_i$ for the distance measurement between the anchor node j and the mobile node is given by the Jacobian

$$H_i(k) = \frac{\partial h}{\partial x} = \begin{pmatrix} \dfrac{s_x(k)-X_j}{\sqrt{(s_x(k)-X_j)^2+(s_y(k)-Y_j)^2}} & 0 & \dfrac{s_y(k)-Y_j}{\sqrt{(s_x(k)-X_j)^2+(s_y(k)-Y_j)^2}} & 0 \end{pmatrix}. \tag{13}$$
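In code, the range measurement model h and its Jacobian row could look as follows (a sketch; the state layout follows Eq. (8) and the function names are ours):

```python
import numpy as np

def range_measurement(x, anchor):
    """Predicted distance h(x) between the mobile node and anchor (X_j, Y_j), Eq. (12)."""
    dx, dy = x[0] - anchor[0], x[2] - anchor[1]   # state is (s_x, v_x, s_y, v_y)
    return np.hypot(dx, dy)

def range_jacobian_row(x, anchor):
    """One row of H(k) for a single anchor node, Eq. (13)."""
    dx, dy = x[0] - anchor[0], x[2] - anchor[1]
    d = np.hypot(dx, dy)
    return np.array([dx / d, 0., dy / d, 0.])
```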

IV. DATA FUSION APPROACH

The data fusion scheme proposed here is based on the assumption that the mobile node is attached to an object that visually appears in the images of an observing camera. In our approach, 2D object positions detected in video frames are back-projected into the 3D WSN coordinate system using the object's current WSN position to compensate for the lack of depth information in image coordinates.

A. Relation between 2D and 3D Measurements

Assuming a pinhole camera model, the relation between points in a local, Cartesian 3D world coordinate system and their mapping onto the 2D camera plane can be expressed using a perspective transformation [17]:

$$\lambda\,p_C = A\,[R|t]\,p_W. \tag{14}$$

Here, the homogeneous coordinates $p_W = (x_W, y_W, z_W, 1)^T$ and $p_C = (u, v, 1)^T$ refer to points in the 3D world coordinate frame and 2D image coordinates, respectively. The scalar value λ denotes the homogeneous scaling factor. A (simplified) matrix of the camera's intrinsic parameters A is given by

$$A = \begin{pmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{pmatrix}, \tag{15}$$

where f is the camera's focal length and the coordinates $(c_x, c_y)^T$ adjust the origin of the image sensor's coordinate system. The combined 3 × 4 rotation and translation matrix [R|t] is called the matrix of extrinsic parameters. It describes the rotation and translation of the camera coordinate system w.r.t. the world coordinate frame. It can be written as:

$$[R|t] = \begin{pmatrix} r_{00} & r_{01} & r_{02} & t_0 \\ r_{10} & r_{11} & r_{12} & t_1 \\ r_{20} & r_{21} & r_{22} & t_2 \end{pmatrix} \tag{16}$$

with $r_{ij}$ the components of the 3 × 3 rotation matrix R and $t_i$ the components of the translation vector $t = (t_0, t_1, t_2)^T$.

Equation (14) describes the mapping of 3D points to 2D image coordinates. Transforming image coordinates to the 3D world coordinate frame requires reversing this process. Pixel coordinates naturally do not provide sufficient information to remap to 3D coordinates. However, if the scaling factor λ is known, the back-projection of points in the camera coordinate frame to points $p_W$ in the world coordinate frame can be achieved by solving

$$(AR)\,p_W = \lambda\,p_C - A\,t, \tag{17}$$

which leads to the algebraic solution of the normal equation:

$$p_W = \big((AR)^T(AR)\big)^{-1}(AR)^T(\lambda\,p_C - A\,t). \tag{18}$$

Thus, any point in the world coordinate frame can be transformed to image coordinates using Equation (14) and projected back to world coordinates by Equation (18).
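Both directions of the transformation translate into a few lines of numpy. In this sketch, A, R and t are the intrinsic matrix, rotation and translation from Eqs. (15)-(16); the function names are our own:

```python
import numpy as np

def project_to_image(A, R, t, p_world):
    """Map a 3D world point to pixel coordinates via Eq. (14); returns (u, v) and λ."""
    p = A @ (R @ p_world + t)           # equals λ * (u, v, 1)^T
    lam = p[2]                          # homogeneous scaling factor λ
    return p[:2] / lam, lam

def back_project(A, R, t, uv, lam):
    """Back-project pixel coordinates to a 3D world point via Eqs. (17)-(18)."""
    p_c = np.array([uv[0], uv[1], 1.0])
    AR = A @ R
    # least-squares solution of (AR) p_W = λ p_C − A t, i.e. the normal equation (18)
    return np.linalg.lstsq(AR, lam * p_c - A @ t, rcond=None)[0]
```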

B. Adapted Filter Algorithm

The data fusion approach we suggest is an extension of the EKF-based tracking algorithm described in Section III. Our algorithm takes account of an additional position in the WSN coordinate frame that is obtained by object detection in the images of an observing camera. However, if no object is detected, the algorithm estimates positions based on the WSN distance measurements only. In this paper we refer to a scenario where the camera is aligned to the world coordinate frame as depicted in Figure 2b, i.e. the image plane is parallel to the $x_W$-axis of the world coordinate frame.

Furthermore, we consider the WSN position to be in the $(x_W, y_W)$-plane of the world coordinate frame, i.e. all coordinates are mapped to the same height $z_W$ over the ground. In order to back-project the detected object position to 3D, we first transform the current position of the mobile node onto the image plane using Equation (14). This yields the image coordinates $(u_{mobile}, v_{mobile})^T$ and the scaling factor $\lambda_{mobile}$. Then we apply Equation (18) to

$$p_C = \begin{pmatrix} u_{obj} \\ v_{obj} \\ 1 \end{pmatrix} \tag{19}$$

using $\lambda_{mobile}$ as a compensation for the lack of depth information in the camera image. Thus, the image coordinates $(u_{obj}, v_{obj})^T$ are projected to world coordinates, yielding $(x_{obj}, y_{obj})^T$, where $y_{obj} = y_{mobile}$ due to sharing a common scaling factor λ (see Fig. 2). Therefore, an object's pixel position in the camera coordinate frame (see Fig. 2a) directly contributes to the object's position on the $x_W$-axis in world coordinates.



Fig. 2. Relation between object appearance in image and world coordinate frames: (a) object in the image coordinate frame, (b) alignment of the camera w.r.t. the WSN coordinate frame.

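This substitution step can be sketched with the hypothetical projection helpers from the previous subsection; the mobile node's height $z_W$ over the ground is assumed known:

```python
import numpy as np

def object_position_3d(A, R, t, uv_obj, mobile_xy, z_w):
    """Back-project a detected object's pixel position using the mobile node's λ."""
    p_mobile = np.array([mobile_xy[0], mobile_xy[1], z_w])
    _, lam_mobile = project_to_image(A, R, t, p_mobile)  # Eq. (14) on the WSN position
    p_obj = back_project(A, R, t, uv_obj, lam_mobile)    # Eq. (18) applied to Eq. (19)
    return p_obj[:2]  # (x_obj, y_obj); with the camera alignment used here, y_obj = y_mobile
```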

The coordinates $(x_{obj}, y_{obj})^T$ resulting from back-projection can be considered a new measurement in addition to the WSN distance measurements. Feeding this into the EKF is straightforward assuming only one object to be tracked and a single camera observing the scene: a new measurement requires extending the n × 4 observation matrix H(k) in order to include the information gained. Since for geometrical reasons in our case only $x_{obj}$ adds reasonably to the position estimation of the mobile node, the measurement vector z(k) can be written as:

$$z(k) = \begin{pmatrix} d_1 \\ \vdots \\ d_n \\ x_{obj} \end{pmatrix}, \tag{20}$$

where $d_i$, i = 1, ..., n refers to the n measured distances of the mobile node to the anchor nodes. Thus, H(k) and R(k) need to be modified by adding one row in order to include $x_{obj}$ in the estimation process. The observation matrix for our data fusion approach can be written as:

$$H(k) = \begin{pmatrix} \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 0 \end{pmatrix}. \tag{21}$$

The measurement error covariance matrix becomes

$$R(k) = \begin{pmatrix} \sigma_1^2 + w_1 & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \sigma_n^2 + w_n & 0 \\ 0 & \cdots & 0 & \sigma_{obj}^2 \end{pmatrix}, \tag{22}$$

where $\sigma_{obj}^2$ is the measurement error variance of the back-projected result of the object detection algorithm. Furthermore, we add a scalar value $w_i = f(i)$ to each measurement's error variance to decrease the relevance of older measurements in the estimation process. The latest distance measured, n, is more up-to-date than its predecessors and thus the relation $w_{i-1} > w_i$ should hold.

1: Set initial mobile node position
2: repeat                                          ▷ start tracking
3:     EKF predict
4:     Start object detection
5:     if object found then
6:         Project predicted mobile node position into image
7:         Substitute λ into object's coordinates
8:         Back-project the object's image coordinates to 3D
9:         Adjust H(k), z(k), h(k) and R(k) for data fusion
10:    else
11:        Adjust H(k), z(k), h(k) and R(k) for WSN tracking
12:    end if
13:    EKF update
14: until stop

Fig. 3. Pseudo code of the fusion algorithm.

This approach has proven to be useful in scenarios with fast-moving targets.
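Assembling the augmented measurement vector, observation row and covariance of Eqs. (20)-(22) could then look as follows (a sketch; the per-range standard deviations and the ordering of the distances, oldest first, are our assumptions):

```python
import numpy as np

def camera_H_row():
    """Extra observation row of Eq. (21): the camera measurement observes s_x directly."""
    return np.array([1., 0., 0., 0.])

def build_fusion_measurement(distances, sigmas, x_obj=None, sigma_obj=None):
    """Assemble z(k) and R(k) per Eqs. (20) and (22), with aging weights w_i = n − i."""
    n = len(distances)
    w = np.arange(n - 1, -1, -1, dtype=float)      # oldest measurement gets the largest w_i
    z = np.asarray(distances, dtype=float)
    r_diag = np.asarray(sigmas, dtype=float) ** 2 + w
    if x_obj is not None:                          # camera measurement available
        z = np.append(z, x_obj)
        r_diag = np.append(r_diag, sigma_obj ** 2)
    return z, np.diag(r_diag)
```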

An outline of the complete algorithm is given in Figure 3. Note that we use the predicted coordinates of the mobile node from Equation (1) for computing the back-projection of the object position.
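Putting the pieces together, the loop of Fig. 3 could be sketched as below; it assumes the hypothetical helpers from the previous sections are in scope, one range per anchor, and purely illustrative noise parameters:

```python
import numpy as np

def track(anchors, get_ranges, detect_object, A, R_cam, t_cam, num_steps,
          dt=0.1, q=0.5, sigma_rng=0.3, sigma_obj=0.2, z_height=1.0):
    """Fusion loop following Fig. 3 (all parameter values are illustrative)."""
    x, P = np.zeros(4), np.eye(4)              # initial mobile node state and covariance
    F, Q = make_F(dt), make_Q(dt, q)
    for _ in range(num_steps):                 # "repeat ... until stop"
        x, P = ekf_predict(x, P, F, Q)         # EKF predict
        ranges = get_ranges()                  # latest WSN distances, oldest first
        H = np.array([range_jacobian_row(x, a) for a in anchors])
        h = lambda s: np.array([range_measurement(s, a) for a in anchors])
        det = detect_object()                  # (u_obj, v_obj) or None
        if det is not None:
            # project the predicted position into the image to obtain λ_mobile,
            # then back-project the detection and keep only its x_W component
            x_obj = object_position_3d(A, R_cam, t_cam, det, (x[0], x[2]), z_height)[0]
            z, R = build_fusion_measurement(ranges, [sigma_rng] * len(ranges),
                                            x_obj, sigma_obj)
            H = np.vstack([H, camera_H_row()])
            h = lambda s, h_rng=h: np.append(h_rng(s), s[0])
        else:
            z, R = build_fusion_measurement(ranges, [sigma_rng] * len(ranges))
        x, P = ekf_update(x, P, z, h, H, R)    # EKF update
    return x, P
```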

V. EXPERIMENTAL RESULTS

Experiments in real world scenarios show that our method has a significant influence on the resulting position estimates. Figure 4a shows the setup of our testing scenario. Five anchor nodes were fixed on the walls of a hallway. The test person carried the mobile sensor node in the right hand. We ran a standard people detection algorithm [4] on each image frame, yielding a bounding box if a person is detected (green box). The center of mass of the bounding box (green asterisk) is back-projected to 3D coordinates. The current position of the mobile node and its projection on the ground is depicted by the red line in Figure 4a.

In our experiments we computed EKF positions based on the n = 6 latest distance measurements between anchor nodes and the mobile node. We added $w_i = n - i$, i = 1, ..., n as correction term to the measurement noise in Eq. (22). Figures 4b and 4c show how the a-posteriori position estimates of the mobile node deviate under the influence of measurements in the corresponding video image. It can be seen that the resulting trajectory with our data fusion approach (blue track) is considerably closer to the true trajectory than purely WSN-based tracking (red track) in horizontal direction.

The effect of incorporating people detection into the estimation process can also be illustrated by analyzing the EKF's estimated error. Figures 5a and 5b show the difference in the resulting standard deviation of the a-posteriori error w.r.t. the x and y axis of the world coordinate frame. In our experiments the WSN's geometry implies larger positioning errors in horizontal direction, thus the camera viewing direction was aligned parallel to the y-axis of the WSN coordinate system. As a result, our data fusion approach leads to a significantly lower error covariance in the horizontal direction.



Fig. 4. Test of our data fusion method in a people tracking scenario: (a) camera image with result of people detection (green box) and estimated position of the mobile node projected to the ground (red line), (b) estimated and true trajectories for a person moving away from the camera position, (c) estimated and true trajectories for a person moving towards the camera position.


Fig. 5. Standard deviation of the position error in the EKF’s a-posteriori estimation: (a) in x-direction, (b) in y-direction.

The influence of the measurements obtained from the camera reduces the standard deviation of the estimation error in horizontal direction (Fig. 5a), whereas it remains approximately the same in vertical direction (Fig. 5b).

VI. CONCLUSION AND FUTURE WORK

In this paper we introduced a data fusion method for localization in Wireless Sensor Networks using an observing camera as auxiliary sensor. We assumed a single camera that observes the position of a person carrying a mobile sensor node. Furthermore, we aligned the camera parallel to one axis of the WSN coordinate frame. Based on the EKF approach to WSN positioning, we showed how to include a 2D pixel position obtained from a camera image into the estimation process. Testing the proposed algorithm in a real world scenario showed a significant improvement in the estimated trajectory.

Future work will comprise the generalization of our approach to multiple cameras and arbitrary camera positions as well as applying and testing our approach in a public transport vehicle like the experimental vehicle AutoTram® (Fig. 6), developed by the Fraunhofer Institute for Transportation and Infrastructure Systems IVI in Dresden.

ACKNOWLEDGMENT

This work was partly supported by the project Cool Public Transport Information (CPTI), a part of the leading-edge cluster COOL SILICON, co-funded by the European Union and the Free State of Saxony, Germany.


Fig. 6. The AutoTram® experimental vehicle.


The authors would like to thank ZIGPOS GmbH and Atmel Germany GmbH for their support.

REFERENCES

[1] R. Weber, E. Mademann, O. Michler, and S. Zeisberg, "Localization techniques for traffic applications based on robust wecols positioning in wireless sensor networks," in Positioning Navigation and Communication (WPNC), 2012 9th Workshop on, March 2012, pp. 215–219.

[2] R. Baumbach, R. Weber, and O. Michler, "Localization and communication techniques for road traffic applications based on IEEE 802.15.4 using wireless sensor networks," in Wireless Congress: Systems and Applications, 2012.

[3] J. Engelbrecht, G. Forster, O. Michler, and R. Collmann, "Positioning estimation in public transport systems by leaky coaxial cables," in Positioning Navigation and Communication (WPNC), 2012 9th Workshop on, March 2012, pp. 175–179.

[4] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in International Conference on Computer Vision & Pattern Recognition, C. Schmid, S. Soatto, and C. Tomasi, Eds., vol. 2, June 2005, pp. 886–893. [Online]. Available: http://lear.inrialpes.fr/pubs/2005/DT05

[5] M. Saxena, P. Gupta, and B. Jain, "Experimental analysis of RSSI-based location estimation in wireless sensor networks," in Communication Systems Software and Middleware and Workshops (COMSWARE 2008), 3rd International Conference on, Jan. 2008, pp. 503–510.

[6] J. Schmid, M. Völker, T. Gädeke, P. Weber, W. Stork, and K. Müller-Glaser, "An approach to infrastructure-independent person localization with an IEEE 802.15.4 WSN," in Indoor Positioning and Indoor Navigation (IPIN), 2010 International Conference on, 2010, pp. 1–9.

[7] D. Hai, Y. Li, H. Zhang, and X. Li, "Simultaneous localization and mapping of robot in wireless sensor network," in Intelligent Computing and Intelligent Systems (ICIS), 2010 IEEE International Conference on, vol. 3, Oct. 2010, pp. 173–178.

[8] J. Zhang, S. Li, G. Lu, and Q. Zhou, "A new wireless sensor localization and pose tracking system for an autonomous mobile robot," in Mechatronics and Automation (ICMA), 2010 International Conference on, Aug. 2010, pp. 1971–1975.

[9] Y. Zhao, Y. Yang, and M. Kyas, "Comparing centralized Kalman filter schemes for indoor positioning in wireless sensor network," in Indoor Positioning and Indoor Navigation (IPIN), 2011 International Conference on, Sept. 2011, pp. 1–10.

[10] M. Müller, J. Lategahn, and C. Röhrig, "Pedestrian localization using IEEE 802.15.4a TDOA wireless sensor network," in Wireless Systems (IDAACS-SWS), 2012 IEEE 1st International Symposium on, Sept. 2012, pp. 23–27.

[11] Z. Xiong, F. Sottile, M. Spirito, and R. Garello, "Hybrid indoor positioning approaches based on WSN and RFID," in New Technologies, Mobility and Security (NTMS), 2011 4th IFIP International Conference on, Feb. 2011, pp. 1–5.

[12] J. Sanchez-Matamoros, J.-d. Dios, and A. Ollero, "Cooperative localization and tracking with a camera-based WSN," in Mechatronics (ICM 2009), IEEE International Conference on, April 2009, pp. 1–6.

[13] A. Gilbert, J. Illingworth, R. Bowden, J. Capitan, and L. Merino, "Accurate fusion of robot, camera and wireless sensors for surveillance applications," in Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, Sept.–Oct. 2009, pp. 1290–1297.

[14] S. Lanzisera, D. Lin, and K. Pister, "RF time of flight ranging for wireless sensor network localization," in Intelligent Solutions in Embedded Systems, 2006 International Workshop on, June 2006, pp. 1–12.

[15] A. Bensky, Wireless Positioning Technologies and Applications. Norwood, MA, USA: Artech House, Inc., 2007.

[16] Atmel AVR2150: RTB Evaluation Application - User's Guide, Atmel Corporation, 2013. [Online]. Available: http://www.atmel.com/Images/Atmel-8441-RTB-Evaluation-Application-Users-Guide_Application-Note_AVR2150.pdf

[17] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004, ISBN: 0521540518.