
Collaborative Pedestrian Mapping of Buildings Using Inertial Sensors and FootSLAM

Patrick Robertson†, Maria Garcia Puyol†∗, Michael Angermann†

† German Aerospace Center (DLR), Institute of Communications and Navigation, PO Box 1116, D-82230 Oberpfaffenhofen, Germany. E-Mail: [email protected]

∗ University of Malaga, Spain

Biography

Patrick Robertson received a Ph.D. from the University of the Federal Armed Forces, Munich, in 1995. He is currently with DLR, where his research interests are navigation, sensor based context aware systems, signal processing, and novel systems and services in various mobile and ubiquitous computing contexts.

Maria Garcia Puyol received her Diploma in Telecommunications Engineering from the University of Malaga in September 2011. She developed her Diploma Thesis on cooperative FootSLAM at DLR during 2010/2011, where she will continue with her Ph.D. studies focusing on indoor navigation for pedestrians.

Michael Angermann received a Ph.D. from the University of Ulm in 2004. He is currently with DLR, where his research interests are advanced and integrated communication and navigation systems.

Abstract

The FeetSLAM technique builds on iterative processing of multiple sets of pedestrian odometry data, based on FootSLAM. The objective is to obtain maps of large areas based on many data sets. The central idea is that maps originating from other data sets are used as a so-called prior map for a given data set. We show that this follows from the optimal FeetSLAM derivation but is more suited to practical computational limitations such as limited memory. It also yields maps which are not overly dominated by one data set but rather balance the characteristics of each, with the effect of averaging out errors. Over iterations, FootSLAM maps are gradually combined to yield a high-accuracy global map; the iteration speed is controlled by employing concepts from simulated annealing. We validate our approach using two data sets from two locations, consisting of four and five walks respectively.

1 Introduction and FeetSLAM Principle

Pedestrian navigation has been drawing significant research and development interest over the last few years and encompasses a wide range of research communities. In addition to using satellite navigation receivers, or signals of opportunity such as mobile radio or WLAN, the use of other low-cost sensors has become one of the addressed topics. For a number of years it has been known that foot mounted MEMS based inertial sensors (IMUs) can, in combination with known building plans, allow for stable positioning in two and three dimensions even in the absence of other signals [1, 2, 3].

As an extension to this work we recently presented FootSLAM - Simultaneous Localization and Mapping for pedestrians - using foot mounted IMUs as the main sensors [4, 5]. Developed by the robotics community, traditional SLAM for indoor and urban environments has drawn on sensors such as laser scanners and cameras whereas FootSLAM uses only the odometry - the noisy IMU-based measurements of a person’s step vectors. We use the term odometry because we are in principle agnostic towards the actual mechanism used to obtain the step estimates. Researchers have used a wide array of approaches, ranging from foot mounted IMUs (e.g. [6]), stride detection with mobile phones, visual odometry, to electromyography based estimators. In a perfect world this odometry would be error free and the pedestrian’s pose (location and orientation) could be estimated within the relative coordinate system for an unlimited distance travelled. Since state-of-the-art odometry suffers from the gradual increase in errors, FootSLAM must search over many different odometry error hypotheses, finding one which best fits the previous pose history. Hypotheses in which the pedestrian revisits areas in the environment are rewarded and over time a reliable map of essentially “walkable areas” is constructed. Real data from people walking within office environments at two locations has so far been used to validate the map building and relative localization abilities of FootSLAM. The approach can use GPS as a provider of reference position before and after entering a building, thus anchoring the map with reasonable position accuracy. FootSLAM, and the generalization presented in this paper, are appealing because existing maps are often inaccurate, unavailable, outdated, or proprietary, and do not reflect important features such as furniture, stalls, displays and other features of a place that significantly limit or channel pedestrian motion.

Published in ION GNSS 2011.

In this paper we will present an extension of the FootSLAM method to the collaborative or multi-user case, hence the term FeetSLAM. We shall address the problem whereby different data sets (walks) are to be processed to generate a common map that is more accurate than any single map and encompasses the total area covered by all walks. First of all we distinguish these different cases of FeetSLAM:

1. A number of walks all starting at the same starting point and/or finishing at the same finishing point (or pose) and overlapping in the explored area to a certain degree.

2. Walks not necessarily starting/finishing in the same point (or pose) but overlapping in the explored area to a certain degree.

3. Walks not necessarily starting in the same point and not necessarily overlapping in the explored area.

All these cases may be formulated as real-time or offline mapping problems. An interesting real-time usage scenario is mapping of a building by multiple collaborating pedestrians with the objective of providing immediate map and position information of all collaborating pedestrians or others. Such a scenario may occur in emergency situations where multiple teams of fire-fighters enter a building through the same or different entrances, carry out search and rescue tasks, and want to avoid unnecessarily revisiting areas or involuntarily leaving out areas. In law enforcement applications, accurate determination of every team member’s position and providing this information on a map may significantly improve mutual situation awareness and potentially reduce the risk of accidentally harming a team member. In this application the real-time requirements may be severe and no a priori map data may be available.

In this contribution we will focus on non-real-time processing for the second case presented above, in the expectation that the techniques can be sped up to real-time capabilities over time. In offline applications we wish to derive a map that will later serve as a basis for localizing pedestrians by map-aided pedestrian dead reckoning. An example of this is collaborative mapping of airports, museums, shopping centers and other public buildings for use in tourism, travel, commerce, and any high-precision location based services. To support FeetSLAM, pedestrians roam through accessible rooms and areas on all levels of a building - perhaps as a deliberate mapping effort or during activities of everyday life. The pedestrians carrying out the mapping task need to be equipped with some form of odometry-generating sensor, such as a foot-mounted IMU, and most likely a GPS receiver for anchoring in an absolute coordinate system. In this scenario the measurement data needs to be recorded and will then be processed offline. The resulting map is then stored at a server or distributed to localization devices that use it to perform map-aided pedestrian dead reckoning. As more data are collected the maps can be refined to incorporate the new walks.

2 Proposed Iterative Processing

It can be shown that the optimal (in the Bayesian estimation sense) FeetSLAM estimator is a trivial extension of FootSLAM. In this case a single run of a sequential Bayesian estimator would process all data sets sequentially or in parallel, and the state would include the unknown starting pose (starting conditions, SC) of each walk. The common element linking all the walks is the map of the environment. The relationship for two walks can be seen in the Dynamic Bayesian Network (DBN) in Figure 1, which is an extension of the DBN from [4]. The main variables are:

• Pose $P_k$: the location and the orientation of the person in 2D at time step $k$.

• Step vector $U_k$: the change from the pose at time $k-1$ to the pose at time $k$.

• Inertial sensor errors $E_k$: all the correlated errors of the inertial system.

• Step measurement $Z_k$: a measurement subject to correlated errors $E_k$ as well as white noise.

• The visual cues which the person sees at time $k$: $Vis_k$.

• The Intention of the person at time $k$: $Int_k$.

• The Map $M$: it is time invariant and can include any features and information to let the pedestrian choose $Int$.

• The Starting Conditions $SC$: the starting pose of the pedestrian, heading angle and scale factor of the underlying step measurements.

The starting conditions may, of course, be different for both walks. These starting conditions are a vital component of the state space and need to be estimated if they are not known. In fact much of the work presented in this paper is devoted to estimation of these starting conditions.

The goal of the Bayesian formulation for a two-pedestrian scenario is to compute:

$$
p(P^1_{0:k}, P^2_{0:k}, U^1_{0:k}, U^2_{0:k}, E^1_{0:k}, E^2_{0:k}, SC^1, SC^2, M \mid Z^1_{1:k}, Z^2_{1:k}) = p(\{PUE\}^{1:2}_{0:k}, SC^{1:2}, M \mid Z^{1:2}_{1:k}), \tag{1}
$$

which can be easily extended to an $N_W$-pedestrian scenario as follows:

$$
p(\{PUE\}^{1:N_W}_{0:k}, SC^{1:N_W}, M \mid Z^{1:N_W}_{1:k}). \tag{2}
$$

Note that for this simple representation, the time indices $k$ are the same for the walks. This is not a requirement for our map merging algorithm, in which the data are processed off-line, and hence can be obtained from walks occurring at different times.

[Figure 1 diagram: for each of Pedestrian #1 and Pedestrian #2, nodes P, U, Z, E, Int and Vis at time slices k-1, k and k+1, together with SC nodes and the map node M.]

Figure 1. Dynamic Bayesian Network (DBN) for the estimation problem with two pedestrians during three time slices. We have omitted an index that would differentiate the two segments for Pedestrian #1 and Pedestrian #2 for clarity.

In a particle filter implementation, particles would have to explore the state space of all odometry error sequences and all starting conditions. A further practical complication for a finite number of particles and the resulting particle depletion is that in a sequential approach the trajectories will tend to favor early data, which will bias the map to data processed early in the sequential estimation process. Later data will then tend to follow the rut from early data. While this can be a problem for single-data FootSLAM, it will be compounded with the addition of more data sets.

The Dynamic Bayesian Network (DBN) of the two-pedestrian case from Figure 1 has some structural similarities to a family of error correction coding schemes from digital communications theory. In 1993 a family of codes called “Turbo” codes was developed; these codes are decoded iteratively at the receiver [7]. The codes are constructed by concatenating two or more simple component codes and are now used in a wide range of modern mobile communication standards. The optimal detector is prohibitively complex but a suboptimal, iterative variant exhibits very good error correction performance. From a Bayesian perspective this kind of iterative processing can be seen as a form of loopy belief propagation in a Bayesian network [8]. The name “Turbo” codes was chosen to reflect the nature of the iterative processing, as will now be explained in the context of FeetSLAM.

It can be shown by simple extension of the FootSLAM Bayesian Estimator derivation that other walks can be incorporated in a given FootSLAM estimation process in the form of prior counts in the FootSLAM maps. Ideally, we would start a FootSLAM run of a specific data set initialized with the posterior distribution of the maps computed from the other walks.

In FootSLAM, the transition map is learned by each particle $p$ by counting each transition it makes across the edges of the hexagons that lie within the coordinate system. Operating in this manner, each particle stores its whole path through the hexagon grid. Learning the map by each particle $p$ is based on Bayesian learning of multinomial distributions [4]. Each particle’s weight is updated as follows:

$$
w^p_k = w^p_{k-1} \cdot \left\{ \frac{C^e_h + \alpha^e_h}{C_h + \alpha_h} \right\}^p, \tag{3}
$$

where $C^e_h$ are the transition counts for edge $e$ of hexagon $h$ and $C_h = \sum_{e=0}^{5} C^e_h$, stored in the individual map of the particle $p$ that had been computed up to step $k-1$. The terms $\alpha^e_h$ and $\alpha_h = \sum_{e=0}^{5} \alpha^e_h$ represent the a priori knowledge regarding the transition counts across the edges of hexagon $h$ for particle $p$. Note that when no other prior information is available, $\alpha^e_h$ has been empirically chosen to be $0.8 \; \forall \{e, h, p\}$.

Including other walks in a given FootSLAM estimation process is only possible, however, if we are able to relate all walks within the same coordinate system. This has been illustrated in Figure 2, where two FootSLAM maps (one in blue and one in green) from the same floor of a building have been generated using two data sets coming from two different walks and are framed within different coordinate systems.
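The weight update of equation (3) can be sketched in a few lines. This is only an illustration under stated assumptions: the map of a particle is represented as a dictionary from `(hexagon, edge)` to a transition count, and the function name `weight_update`, the `alpha` argument and the constants are hypothetical names, not part of the original implementation. How a prior map from other walks is folded into the `alpha` values is left to the caller.

```python
ALPHA_EDGE = 0.8   # per-edge prior used when no other prior information is available
N_EDGES = 6        # edges e = 0..5 of a hexagon


def weight_update(weight, counts, hexagon, edge, alpha=None):
    """Return the particle weight after crossing `edge` of `hexagon` (equation (3)).

    counts: dict mapping (hexagon, edge) -> transition count C_h^e
            accumulated by this particle up to the previous step.
    alpha:  optional dict mapping (hexagon, edge) -> prior value alpha_h^e;
            if None, the uniform value ALPHA_EDGE is used (assumption).
    """
    def prior(e):
        return ALPHA_EDGE if alpha is None else alpha[(hexagon, e)]

    c_edge = counts.get((hexagon, edge), 0.0)
    c_hex = sum(counts.get((hexagon, e), 0.0) for e in range(N_EDGES))
    a_hex = sum(prior(e) for e in range(N_EDGES))
    return weight * (c_edge + prior(edge)) / (c_hex + a_hex)
```

With an empty map and the uniform prior, every edge is equally likely, so the factor applied to the weight is 1/6; revisiting an edge that already carries counts yields a larger factor, which is how revisiting hypotheses are rewarded.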

Figure 2. Two FootSLAM maps (one in blue and one in green) of the same building that are computed within different coordinate systems.

Therefore, before we can apply a map as an a priori map for another FootSLAM process we need to ensure that both data sets are within the same coordinate system. When this is ensured, our proposed iterative “Turbo” FeetSLAM algorithm, presented in detail later, starts by processing all data in individual FootSLAM runs and then combines all counts to a new map, which is used as a prior map in the next iteration of FootSLAM runs. This is repeated for a number of iterations. It should be noted that when constructing the map we must not include the map contribution that arose from a given walk the next time we process that particular walk. This is in exact analogy to the “prior” construction in Turbo decoding.

In the next section we will describe how we align maps within the same coordinate system.

3 Aligning Maps

3.1 Starting Conditions

At this point we must first differentiate two terms that have been used in the paper so far: starting conditions and the coordinate system in which two FootSLAM maps lie, since although related, these are different concepts. The starting conditions define the user’s pose that places the odometry measurements - which are always differential - into a defined coordinate system in two or three spatial dimensions, plus initial heading, as well as scale. It is the nature of FootSLAM with a finite number of particles to tend to snap into a particular resulting map or small set of maps. Two runs of FootSLAM with the same starting conditions may still lead to different maps that differ by way of a translation, rotation and scale difference. Of course, different starting conditions will also lead to different maps. To summarize, a part of the difference in the maps’ coordinate systems comes from different starting conditions, but also from the stochastic nature of FootSLAM as described above.

The starting conditions are specified by four Gaussian distributions, one for each of the following starting parameters: x coordinate, y coordinate, heading and scale factor. Each Gaussian distribution is defined by the mean (µ) and the standard deviation (σ). A smaller standard deviation indicates greater certainty of the starting conditions of the pedestrian.
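As an illustration of this parameterization, the following sketch stores the four Gaussians and draws one starting hypothesis from them, for example to initialize a particle. The class and field names (`Gaussian`, `StartingConditions`, `sample`) are hypothetical, and drawing the four parameters independently is an assumption made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: float  # mu
    std: float   # sigma; smaller means greater certainty


@dataclass
class StartingConditions:
    x: Gaussian
    y: Gaussian
    heading: Gaussian
    scale: Gaussian

    def sample(self, rng):
        """Draw one (x, y, heading, scale) hypothesis, each parameter independently."""
        return tuple(
            rng.gauss(g.mean, g.std)
            for g in (self.x, self.y, self.heading, self.scale)
        )


# Example: position known to about 1 m, heading to about 0.1 rad, scale to about 2 %.
sc = StartingConditions(
    x=Gaussian(0.0, 1.0),
    y=Gaussian(0.0, 1.0),
    heading=Gaussian(0.0, 0.1),
    scale=Gaussian(1.0, 0.02),
)
hypothesis = sc.sample(random.Random(0))
```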

Consider the case where three data sets (walks) are to be combined. The prior map for any one data set is the sum of the maps of the other two (see (3)); but to allow us to add the maps, they must be aligned to the same coordinate system. Transforming or aligning the maps to a common coordinate system is necessary in order to be able to use one map as the prior for another data set.

3.2 Transformation and Projection

When a map is transformed (i.e. translated, rotated and scaled), its hexagons are not necessarily aligned with the hexagons of the underlying grid of hexagons in another FootSLAM map. In order to be able to combine the counts of two maps, one of which has been transformed, a further manipulation of the map is needed: projection of the counts to a common hexagon grid.

In a 2D context and for our application, a transformation is the combination of a translation along the x and y axes, a rotation around a center of rotation and a uniform scaling around a center of scaling. In short, the four parameters involved in the transformation are referred to, respectively, as x shift, y shift, rotation, and scale factor.

The mathematical formula used for the transformation can be described by the following equation:

$$
\begin{cases}
x_t = (x - x_c) \cdot s \cdot \cos(r) - (y - y_c) \cdot s \cdot \sin(r) + x_c + \Delta x \\
y_t = (x - x_c) \cdot s \cdot \sin(r) + (y - y_c) \cdot s \cdot \cos(r) + y_c + \Delta y
\end{cases} \tag{4}
$$

where $(x, y)$ are the Cartesian coordinates of a given point in 2D before the transformation, $(x_t, y_t)$ the Cartesian coordinates after the transformation, $(x_c, y_c)$ the Cartesian coordinates of the center for rotation and scaling, and $r$, $s$, $\Delta x$ and $\Delta y$ the rotation, scale factor, x shift and y shift, respectively.

Note that the rotation and scaling use the mean x and y coordinates of the starting point of the walk as their center, that is, $x_c = \mu_x$ and $y_c = \mu_y$.
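A minimal sketch of the point transformation is given below. It assumes the standard counter-clockwise rotation matrix, so the cosine term in the y component carries a plus sign, and a rotation angle in radians; the function name `transform_point` and its argument order are hypothetical.

```python
import math


def transform_point(x, y, xc, yc, r, s, dx, dy):
    """Rotate by r (radians) and scale by s about (xc, yc), then shift by (dx, dy)."""
    cos_r, sin_r = math.cos(r), math.sin(r)
    xt = (x - xc) * s * cos_r - (y - yc) * s * sin_r + xc + dx
    yt = (x - xc) * s * sin_r + (y - yc) * s * cos_r + yc + dy
    return xt, yt


# A quarter turn with scale factor 2 about the origin maps (1, 0) close to (0, 2).
xt, yt = transform_point(1.0, 0.0, 0.0, 0.0, math.pi / 2, 2.0, 0.0, 0.0)
```

The center of rotation and scaling is only moved by the shift, which is why choosing the mean starting point as the center keeps the rotation and scale parameters decoupled from the position of the walk's start.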

After every point has been transformed, it is projected onto a target grid of hexagons, which serves as a common coordinate system for all the maps that are to be considered. The projection has been illustrated in Figure 3.

Figure 3. Illustration of the projection of a FootSLAM map onto a grid of hexagons (axes: x, y).

The projection involves two steps: the projection of the vertices of every edge of the map and the projection of the transition counts of every edge. The projection of the vertices is trivial since it is done from a 2D surface (a plane) to another 2D surface that is parallel to the first, and hence the coordinates for the transformed and projected vertices are the same. The projection of the edges is more complex and will be explained later.
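Since a projected vertex keeps its coordinates, what remains is to decide which hexagon of the target grid it falls into. The sketch below shows one common way to do this, under assumptions that are not taken from the paper: the grid uses flat-top hexagons addressed by axial coordinates (q, r), the grid origin is the center of hexagon (0, 0), and `radius` is the center-to-vertex distance. All function names are hypothetical.

```python
import math


def hexagon_center(q, r, radius):
    """Cartesian center of flat-top hexagon (q, r) in axial coordinates."""
    return radius * 1.5 * q, radius * math.sqrt(3.0) * (q / 2.0 + r)


def point_to_hexagon(x, y, radius):
    """Axial coordinates (q, r) of the flat-top hexagon containing point (x, y)."""
    q = (2.0 / 3.0) * x / radius
    r = (-x / 3.0 + math.sqrt(3.0) / 3.0 * y) / radius
    return _round_axial(q, r)


def _round_axial(q, r):
    """Round fractional axial coordinates to the nearest hexagon via cube coordinates."""
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    # Re-derive the component with the largest rounding error so that x + y + z == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return int(rx), int(rz)
```

Applying `point_to_hexagon` to the two transformed vertices of an edge gives the two hexagons in which the vertices lie; the actual grid layout used in FootSLAM may differ from the one assumed here.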

The transformation and projection of a map is performed on a hexagon-by-hexagon basis, and the following is applied:

1. Scaling: the hexagon is scaled by multiplying its radius by the scale factor as shown on the left side of Figure 4. Then each of the edges of the scaled hexagon is rotated and shifted as shown on the right side of Figure 4.

[Figure 4 labels: hexagon before transformation; scaled hexagon; edge 0 (vertices A and B) before rotation and shift; edge 0 after rotation and shift; transformed hexagon.]

Figure 4. Example of a transformation of the top edge of one hexagon. The hexagon is first scaled and then the edge is rotated and shifted.

2. Projection of the edge: each transformed edge is then projected onto the target grid of hexagons by projecting its two vertices (projection of the vertices A and B) as can be seen in Figure 5.

3. Projection of the transition counts: the transition counts - from now on referred to as $C$ - are shared among some of the edges of the target grid. To do that, first the target hexagons ($h_{tg}$) - the hexagons of the target grid that will receive some of the transition counts - are identified: these are the two hexagons where the two vertices of the transformed edge lie (points A and B) along with all their neighboring hexagons. See Figure 5.

[Figure 5 labels: target hexagon grid; transformed hexagon; edge 0 after transformation with vertices A and B; hexagon A, where vertex A lies; hexagon B, where vertex B lies; neighbours of hexagons A or B.]

Figure 5. Target hexagons for transformed edge 0 (in red): hexagons A (in purple) and B (in green) where the two vertices of the transformed edge lie and their neighbors (in blue).

Once the target hexagons are defined, the counts $C$ are shared among their edges. To compute how much of $C$ each target edge receives, two different factors are used: a distance factor and an angular factor. The distance factor takes into account the distance between a transformed hexagon and a target hexagon (see Figure 6). The angular factor takes into account the relative orientation between a transformed edge and a target edge (see Figure 7) using an alternative representation for the edges - a semicircle on the outer part of the edge - that approximates the probability of the pedestrian crossing it at different angles. These two factors are then used to compute a weight that states how much of the counts of the transformed edge each target edge receives.
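The text does not state how the two factors are combined into a weight. The sketch below therefore makes two explicit assumptions: the weight of a target edge is the product of its distance factor and its angular factor, and the weights are normalized so that the total number of counts is preserved. The function name `share_counts` and the dictionary layout are hypothetical.

```python
def share_counts(count, distance_factors, angular_factors):
    """Split the counts of one transformed edge among target edges.

    distance_factors, angular_factors: dicts mapping a target-edge key to a
    non-negative factor. The product-and-normalize rule is an assumption,
    not the rule stated in the paper.
    """
    raw = {
        key: distance_factors[key] * angular_factors.get(key, 0.0)
        for key in distance_factors
    }
    total = sum(raw.values())
    if total == 0.0:
        return {key: 0.0 for key in raw}
    return {key: count * weight / total for key, weight in raw.items()}


# Two equally distant target edges, one of them three times better aligned.
shares = share_counts(10.0, {"a": 1.0, "b": 1.0}, {"a": 3.0, "b": 1.0})
```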

3.3 Correlating Two Maps

In order to compute the transformation that we should apply to one map so that it fits another we need to find a measure of how well these two maps fit. When combining two maps we can try all possible transformations on one map and choose the one that leads to the best fit.

Figure 6. Illustration of how the distance factor is computed. The red dot is the center of the transformed hexagon ($h_{tr}$) with coordinates $(x_{h_{tr}}, y_{h_{tr}})$. The black dots are the centers of the target hexagons ($h_{tg}$).

[Figure 7 labels: target hexagon; target edge $e_{tg}$; transformed edge $e_{tr}$ with vertices A and B; overlapping area.]

Figure 7. Example of a transformed edge (in red) that has been made to coincide with the six target edges (in black) of a hexagon and how their corresponding semicircles overlap (in green). The area in green is the angular factor for each edge.

The correlation between two random variables describes their statistical dependence. In the context of FootSLAM maps, a measure of how much one map looks like another map is needed. To this purpose, an appropriate function that reflects how well two maps fit each other has been derived. We have named the two maps in terms of which one is transformed to fit the other:

• “Underneath Map”: This is the map that will stay fixed through the comparison of transformations, as a reference for the other map. It will also be referred to as $M_U$.

• “Accounted for Map”: This is the map that will be transformed to fit the underneath map and then projected onto its target grid of hexagons. From now on it will be referred to as “Accounted Map” or alternatively as $M_A$. We will use the term Transformed Map to refer to this map when it has been transformed.

We have chosen these names to reflect the roles of the two maps in the likelihood function presented as a justification of our correlation function. The counts of the underneath map are used to explain or account for the counts of the map that was transformed.

Likelihood Function Choice: Our justification of the correlation function relies on the DBN shown in Figure 1. We have two walks, and hence two histories of the two pedestrians’ pose trajectories, $P^A_{0:k}$ and $P^U_{0:k}$. We are interested in finding the transformation that will align the two pose trajectories. To do so, we will compute the posterior density function of a transformation $T$ conditioned on the pose trajectories: $p(T \mid P^A_{0:k}, P^U_{0:k})$. The transformation $T$ transforms the pose trajectory $P^A_{0:k}$ onto the map $M$. We are assuming that $P^U_{0:k}$ is aligned with the map $M$. Following the FootSLAM derivation of [4], we can apply Bayes’ Theorem and the chain rule to obtain:

$$
p(T \mid P^A_{0:k}, P^U_{0:k}) \propto \prod_{h \in P^A} \prod_{e=0}^{5} \left( \frac{C^e_{h_U} + \alpha^e_{h_U}}{C_{h_U} + \alpha_{h_U}} \right)^{T(C^e_{h_A})}, \tag{5}
$$

which can be used to compute how well $P^A_{0:k}$ fits $P^U_{0:k}$. Note that $T(C^e_{h_A})$ is the number of transition counts for edge $e$ of hexagon $h$ of the Accounted Map when it has undergone transformation $T$.

As stated above, we must try all different transformations $T$ and compute the likelihood value for each.

The logarithmic form for the likelihood function: So far we have been able to obtain a formula that states how well one map fits another. We can extend (5) to its symmetric form, that is, also taking into account how well the Underneath Map fits the transformed Accounted Map. So the likelihood value ($LV$) between two maps can be computed as:

$$
LV(M_A, M_U, T) = \prod_{h \in A} \prod_{e=0}^{5} \left( \frac{C^e_{h_U} + \alpha^e}{C_{h_U} + \alpha_h} \right)^{T(C^e_{h_A})} \cdot \prod_{h \in U} \prod_{e=0}^{5} \left( \frac{T(C^e_{h_A}) + \alpha^e}{T(C_{h_A}) + \alpha_h} \right)^{C^e_{h_U}}. \tag{6}
$$

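We have chosen to implement this function in its logarithmic form because it is numerically more robust, and we have normalized it with the total number of counts in each map:

$$
\log LV(M_A, M_U, T) =
\frac{\sum_{h \in A} \sum_{e=0}^{5} T(C^A_{h,e}) \cdot \log\left( \frac{C^U_{h,e} + \alpha^e}{C^U_h + \alpha_h} \right)}{\sum_{h \in A} \sum_{e=0}^{5} T(C^A_{h,e})}
+
\frac{\sum_{h \in U} \sum_{e=0}^{5} C^U_{h,e} \cdot \log\left( \frac{T(C^A_{h,e}) + \alpha^e}{T(C^A_h) + \alpha_h} \right)}{\sum_{h \in U} \sum_{e=0}^{5} C^U_{h,e}}. \tag{7}
$$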

Hexagon Correlation Factor: We found that the likelihood value formula needed to be augmented by incorporating a heuristic term ($\gamma$) that also takes into account the correlation of the total counts of each hexagon. We found that this increases the robustness of the likelihood function:

$$
\gamma = \beta \cdot \sum_{h \in A} T(C_{h_A}) \cdot C_{h_U}, \tag{8}
$$

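where the parameter $\beta$ - a hexagon correlation factor - has been empirically chosen to have a small value; in our experiments we chose $\beta = 0.04$. Finally, we obtain the augmented likelihood value:

$$
\begin{aligned}
\log LV_a(M_A, M_U, T) ={}&
\frac{\sum_{h \in A} \sum_{e=0}^{5} T(C^A_{h,e}) \cdot \log\left( \frac{C^U_{h,e} + \alpha^e}{C^U_h + \alpha_h} \right)}{\sum_{h \in A} \sum_{e=0}^{5} T(C^A_{h,e})}
+ \frac{\beta \cdot \sum_{h \in A} T(C^A_h) \cdot C^U_h}{\sum_{h \in A} \sum_{e=0}^{5} T(C^A_{h,e})} \\
&+ \frac{\sum_{h \in U} \sum_{e=0}^{5} C^U_{h,e} \cdot \log\left( \frac{T(C^A_{h,e}) + \alpha^e}{T(C^A_h) + \alpha_h} \right)}{\sum_{h \in U} \sum_{e=0}^{5} C^U_{h,e}}
+ \frac{\beta \cdot \sum_{h \in A} C^U_h \cdot T(C^A_h)}{\sum_{h \in U} \sum_{e=0}^{5} C^U_{h,e}}.
\end{aligned} \tag{9}
$$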
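The augmented likelihood value of equation (9) can be sketched as follows. The sketch assumes that the Accounted Map has already been transformed and projected onto the grid of the Underneath Map, that each map is a dictionary from a hexagon key to a list of six edge counts, and that $\alpha^e$ is uniform with $\alpha_h = 6\alpha^e$; the default value 0.8 for $\alpha^e$ is borrowed from the FootSLAM prior and is an assumption here. Function names are hypothetical.

```python
import math

N_EDGES = 6


def _one_sided(explained, explaining, alpha_e, beta):
    """Normalized log-likelihood of `explained` under `explaining`, plus the beta term."""
    alpha_h = N_EDGES * alpha_e
    zero = [0.0] * N_EDGES
    log_sum = corr = total = 0.0
    for hexagon, counts in explained.items():
        other = explaining.get(hexagon, zero)  # missing hexagon: zero counts
        other_total = sum(other)
        for e in range(N_EDGES):
            log_sum += counts[e] * math.log((other[e] + alpha_e) / (other_total + alpha_h))
        # The product of hexagon totals vanishes unless the hexagon is in both maps,
        # so summing over either map gives the same correlation term.
        corr += sum(counts) * other_total
        total += sum(counts)
    if total == 0.0:
        raise ValueError("map contains no transition counts")
    return (log_sum + beta * corr) / total


def augmented_log_likelihood(transformed_a, underneath, alpha_e=0.8, beta=0.04):
    """Equation (9): both directions of the comparison are added."""
    return (_one_sided(transformed_a, underneath, alpha_e, beta)
            + _one_sided(underneath, transformed_a, alpha_e, beta))
```

Setting `beta=0.0` reduces the function to the normalized logarithmic form of equation (7). Two maps that share no hexagon score $2\log(1/6)$, the value obtained when the prior alone has to explain every transition.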

3.4 Searching for the Best Transformation

Once we have the capability of comparing two maps, we need to look for the best fit between the maps. To do so, one of the maps is transformed using different values for the x and y shifts, rotation and scale factor, and the corresponding likelihood value is computed using (9). To do this in practice, we need to limit the range of transformations (i.e. the search space):

Restriction of the search space: An automatic restriction of the area of search is needed in order to develop a completely automated algorithm with manageable complexity. The area is easily restricted using the Starting Conditions of the two maps involved in the comparison, using the worst case situation in which each map could be located at the opposite ends of their Gaussian distributions for each one of the four Starting Conditions parameters.

Automated Search: The search is undertaken over the space of transformations in a discrete manner, using a fixed step size for each parameter. These step sizes, along with the limits of the area of search, determine which values of each of the four transformation parameters (rotation, scale, x shift and y shift) are used to transform one of the maps to make it fit the other.

The transformation that allows the Transformed Map and the Underneath Map to have the largest likelihood value (equation (9)) is called the winning transformation. This transformation is then used to transform the Accounted Map so that it can be used as a prior for the data corresponding to the Underneath Map or to compute a combined map.
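The discrete search for the winning transformation amounts to an exhaustive grid search. The sketch below assumes that the caller supplies a `score` function returning the likelihood value for a candidate transformation, together with the search limits and step sizes; the names `frange`, `search_best_transformation`, `limits` and `steps` are hypothetical.

```python
import itertools


def frange(low, high, step):
    """Inclusive list of values from low to high in increments of step."""
    n = int(round((high - low) / step))
    return [low + i * step for i in range(n + 1)]


def search_best_transformation(score, limits, steps):
    """Return the grid point with the largest score.

    score:  callable (dx, dy, rotation, scale) -> likelihood value
    limits: dict name -> (low, high) bounds of the restricted search space
    steps:  dict name -> step size used to discretize that parameter
    """
    names = ("dx", "dy", "rotation", "scale")
    grids = [frange(*limits[name], steps[name]) for name in names]
    best = max(itertools.product(*grids), key=lambda t: score(*t))
    return dict(zip(names, best))


# Toy score with a known maximum, standing in for the likelihood value of (9).
def toy_score(dx, dy, rotation, scale):
    return -((dx - 1.0) ** 2 + (dy + 2.0) ** 2 + (rotation - 0.1) ** 2 + (scale - 1.0) ** 2)


winner = search_best_transformation(
    toy_score,
    limits={"dx": (-3.0, 3.0), "dy": (-3.0, 3.0), "rotation": (-0.2, 0.2), "scale": (0.9, 1.1)},
    steps={"dx": 1.0, "dy": 1.0, "rotation": 0.1, "scale": 0.1},
)
```

The number of evaluated candidates is the product of the four grid sizes, which is why restricting the search space with the Starting Conditions matters for keeping the complexity manageable.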

3.5 Combining Maps

Combining the maps after transformation and projection of the counts is simple: we just add the counts of the contributing maps, which is possible because after projection they are aligned to the same hexagon grid.
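As a small illustration, adding the counts can be written with a counter keyed by `(hexagon, edge)`. The key layout and the function name `combine_maps` are assumptions made for the example; the maps are taken to be already aligned to the same grid.

```python
from collections import Counter


def combine_maps(*maps):
    """Add the transition counts of maps that share one hexagon grid."""
    total = Counter()
    for m in maps:
        total.update(m)  # Counter.update adds values instead of replacing them
    return total


map_1 = {((0, 0), 0): 2.0, ((0, 1), 3): 1.0}
map_2 = {((0, 0), 0): 1.5}
combined = combine_maps(map_1, map_2)
```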

4 Controlling the Convergence of FeetSLAM

As mentioned above we want to process the individual data sets in such a way that a single data set does not dominate the resulting joint or total map. We propose that modifications of the prior maps can be beneficial in alleviating such asymmetries. In particular, we want the iteration process to be gradual, allowing improvements of individual maps to propagate to others through the priors. In effect, we want the prior to gradually herd the particles into the region of the state space that is close to the true state. This means that particles need not explore areas of the state space that are extremely unlikely, allowing for more diversity in the likely regions. We achieve this by starting with a prior that is both weakened and smoothed, and gradually reducing these two modifications as iterations progress.

The weakening of a map consists of the division of its transition counts by a weakening factor >1, that is, making the map less strong. This can be used to control the influence the prior map has on the update of the particles’ weights.

The smoothing of a map is achieved by spreading the transition counts of each edge of each hexagon of the map among that same edge of the six neighboring hexagons and itself. This filtering process has the effect of making the map wider, giving more freedom to the particles that will explore the area using it as a prior.
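Both modifications can be sketched on a map stored as a dictionary from `(hexagon, edge)` to counts. The sketch assumes axial hexagon coordinates for locating the six neighbors and an equal split of the counts over the seven hexagons; the text does not state the split, so the equal shares are an assumption, as are the function names.

```python
# Offsets to the six neighbors of a hexagon in axial coordinates (assumed layout).
AXIAL_NEIGHBOURS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def weaken(counts, factor):
    """Divide all transition counts by the weakening factor (1 means no weakening)."""
    if factor < 1.0:
        raise ValueError("weakening factor must be at least 1")
    return {key: c / factor for key, c in counts.items()}


def smooth(counts):
    """Spread each edge count over the same edge of the hexagon and its six neighbors."""
    out = {}
    for ((q, r), edge), c in counts.items():
        share = c / 7.0  # equal shares are an assumption
        for dq, dr in [(0, 0)] + AXIAL_NEIGHBOURS:
            key = ((q + dq, r + dr), edge)
            out[key] = out.get(key, 0.0) + share
    return out


prior = smooth(weaken({((0, 0), 2): 14.0}, 2.0))
```

Smoothing keeps the total number of counts while widening the map, whereas weakening lowers the total; as iterations progress both effects would be reduced, for example by moving the weakening factor towards 1.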

5 Formal Description of the Iterative Algorithm: “Turbo” FeetSLAM

Simply put, the Turbo FeetSLAM technique builds on iterative processing of odometry data, using maps originating from other data sets as a prior map for a given data set. The algorithm serves two main goals:

1. Obtaining a complete map, or total map, of the walkable areas.

2. Obtaining more accurate individual maps.

5.1 The Algorithm

In Figure 8 the basic structure of the algorithm is presented:

[Figure 8 block diagram. Blocks: FootSLAM($D_i$), Combination($M_1 \dots M_n$), Transform($M_i$), Add counts, Transform($SC_i$), Weakening, Filtering. Signals: Data Sets $D_1 \dots D_n$, Starting Conditions $SC_1 \dots SC_n$, Individual Maps $M_1 \dots M_n$, Total Map, Transformations $T_1 \dots T_n$, transformed maps $M^T_1 \dots M^T_n$, Prior maps $P_1 \dots P_n$, and the Transformations and Starting Conditions from the previous iteration.]

Figure 8. Schematic illustration of the proposed algorithm for automatic map computation from several walks.

The algorithm operates as follows: at the zeroth iteration FootSLAM is run for each of the data sets to obtain the Individual Maps $\{M_1 \cdots M_n\}$, with $n$ being the number of data sets that is being considered at that iteration. Then, these maps are properly combined to obtain the Total Map, $M_C$. Next, the Individual Maps are transformed one by one to fit the Total Map to generate the Individual Transformed Maps $\{M^T_1 \cdots M^T_n\}$. The Individual Transformed Maps are then used to generate the prior maps $\{M^P_1 \cdots M^P_n\}$ for the next iteration: here, for each data set, the combination of the other Individual Transformed Maps is used to generate its prior map (that is, the map from that data set itself is excluded). The transformations $\{T_1 \cdots T_n\}$ found for the Individual Maps to fit the Total Map, along with the Starting Conditions $\{SC_1 \cdots SC_n\}$ at the end of each FootSLAM process, are used for the next iteration to obtain the new Starting Conditions for each data set. The prior maps are also used in the next iteration, appropriately weakened and filtered.
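The leave-one-out construction of the prior maps can be sketched as follows, assuming that the Individual Transformed Maps are already aligned to one hexagon grid and stored as dictionaries from `(hexagon, edge)` to counts. The function name `prior_maps` is hypothetical, and weakening and filtering are left out of the sketch.

```python
from collections import Counter


def prior_maps(transformed_maps):
    """For each data set, add the transformed maps of all other data sets."""
    priors = []
    for i in range(len(transformed_maps)):
        prior = Counter()
        for j, m in enumerate(transformed_maps):
            if j != i:  # a walk never contributes to its own prior
                prior.update(m)
        priors.append(prior)
    return priors


maps_t = [
    {((0, 0), 0): 1.0},
    {((0, 0), 0): 2.0, ((1, 0), 3): 1.0},
    {((1, 0), 3): 4.0},
]
priors = prior_maps(maps_t)
```

Excluding a walk's own map from its prior mirrors the construction used in Turbo decoding and keeps a data set from reinforcing its own errors across iterations.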

Our goal now is to explain the algorithm in greater detail, together with the remaining parameters and inputs. The index $i$ will refer to the iteration number.

5.2 Data

To run the Turbo FeetSLAM Algorithm with $N_W$ walks at iteration $i$ the following are needed:

1. Data Sets for the walks $D = \{D_1, D_2, \cdots, D_{N_W}\}$. These data sets are just the result of converting the raw data from the walks into odometry. These odometry data sets do not change over the iterations.

2. Starting Conditions $SC^i = \{SC^i_1, SC^i_2, \cdots, SC^i_{N_W}\}$. The Starting Conditions for each walk.

3. Transformations for the Starting Conditions $T^i = \{T^i_1, T^i_2, \cdots, T^i_{N_W}\}$. These transformations are computed at iteration $i-1$ and transform the Starting Conditions so that the map is located according to the Total Map computed at the previous iteration.

4. Prior Maps $M^{P^i} = \{M^{P^i}_1, M^{P^i}_2, \cdots, M^{P^i}_{N_W}\}$. The Prior Maps are computed at iteration $i$ so that at iteration $i+1$, each one of the Data Sets $D_d$ uses the information provided by the other $N_W - 1$ walks.

5.3 FootSLAM Map Computation

FootSLAM is run for each one of the $N_W$ data sets with the help of the Data Sets $D = \{D_1, D_2, \cdots, D_{N_W}\}$, the Starting Conditions $SC^i = \{SC^i_1, SC^i_2, \cdots, SC^i_{N_W}\}$, and the Prior Maps $M^{P^i} = \{M^{P^i}_1, M^{P^i}_2, \cdots, M^{P^i}_{N_W}\}$. As a result, the Individual Maps $M^i = \{M^i_1, M^i_2, \cdots, M^i_{N_W}\}$ are obtained.

At the end of each FootSLAM process, a new set of Starting Conditions is computed and is referred to as the winning Starting Conditions.

5.4 Computing the combined map

This is the most important part of the algorithm, where the $N_W$ Individual Maps are processed to generate a Total Map.

Comparing Maps Pairwise: We form a pool of maps that contains all the $N_W$ Individual Maps, each obtained by running FootSLAM on a single data set. In $N_W - 1$ stages a Total Map that encompasses the information provided by all the maps can be generated as follows:

• At each stage, the maps are taken two at a time and compared. The comparison is performed as explained in 3.4 for every possible combination of maps.

• At the end of each stage, the two maps that best fit together - that is, the ones that had the greatest likelihood value (9) - are removed from the pool and their combined map is added. This means that after every stage there is one fewer map in the pool.

• The comparisons that were already run in previous stages are not computed again.

The number of combinations $N_c$ that need to be tried is

$$N_c = \binom{N_W}{2} + \sum_{k=1}^{N_W-2} k = (N_W - 1)^2. \tag{10}$$

The first part of equation (10), $\binom{N_W}{2}$, is the combinatorial number of the $N_W$ individual maps taken two at a time (i.e. the number of handshakes between $N_W$ people). The second part of the equation, $\sum_{k=1}^{N_W-2} k$, accounts for the possible combinations that still need to be computed every time a new map is added to the pool, that is, the comparisons between the new map and the other available maps in the pool. Note that there is no need to recompute the comparison between the maps that were already in the pool, since they were computed at the previous stage.
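The count in equation (10) can be checked with a small simulation of the pool. The sketch below is a toy model under stated assumptions: a map is represented only by the set of walk indices merged into it, `greedy_merge` and `toy_likelihood` are hypothetical names, and the likelihood function is an arbitrary stand-in for the map comparison of Section 3.4. What matters here is only how many distinct comparisons are evaluated when earlier results are cached.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

MapId = FrozenSet[int]  # a map is identified by the walks merged into it


def greedy_merge(
    n_walks: int, likelihood: Callable[[MapId, MapId], float]
) -> Tuple[MapId, int]:
    """Merge the best-fitting pair at each stage; return (total map, comparisons run)."""
    if n_walks < 1:
        raise ValueError("need at least one walk")
    pool: List[MapId] = [frozenset([w]) for w in range(n_walks)]
    cache: Dict[FrozenSet[MapId], float] = {}
    while len(pool) > 1:
        # Compare every pair in the pool, skipping pairs compared in earlier stages.
        for a, b in combinations(pool, 2):
            pair = frozenset([a, b])
            if pair not in cache:
                cache[pair] = likelihood(a, b)
        # Remove the best-fitting pair and add their combined map.
        a, b = max(combinations(pool, 2), key=lambda p: cache[frozenset(p)])
        pool = [m for m in pool if m != a and m != b] + [a | b]
    return pool[0], len(cache)


def toy_likelihood(a: MapId, b: MapId) -> float:
    """Arbitrary stand-in score: maps with nearby walk indices fit better."""
    return -abs(min(a) - min(b))


for n in range(2, 7):
    total, n_comparisons = greedy_merge(n, toy_likelihood)
    print(n, n_comparisons, (n - 1) ** 2)
```

For every $N_W$ the number of cached comparisons equals $(N_W - 1)^2$, independent of which pair wins at each stage, because each stage adds exactly one new map that must be compared with the remaining maps in the pool.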

5.5 Transformations for Individual Maps

The transformations for the Individual Maps are obtained by running the search to make each of the $N_W$ Individual Maps fit the Total Map. The resulting Transformed Individual Maps $M^{T,i} = \{M^{T,i}_1, M^{T,i}_2, \cdots, M^{T,i}_{N_W}\}$ will be used to generate the Prior Maps for the next iteration.

For each Individual Map we use the transformation that fits it to the final Total Map, and not the transformation that was already computed while generating the Total Map, because the former takes into account a richer total map.

5.6 Prior Map Computation

The prior maps for the next iteration for each one of the $N_W$ data sets are very easily computed: for each Data Set, the other $(N_W - 1)$ Individual Transformed Maps are combined. This is done so that when FootSLAM is run for a given Data Set $D_d$ its own map is not explicitly included, but only the information provided by the rest of the data sets.

So the prior map can be seen as the transition counts of the other maps, properly combined.
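A minimal sketch of this leave-one-out construction follows. It assumes, purely for illustration, that each Individual Transformed Map is stored as a `Counter` of transition counts keyed by (hexagon, edge) and that "properly combined" means plain summation of counts; the actual combination rule may be more involved, and the name `prior_maps` is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Transition = Tuple[Hashable, int]  # (hexagon identifier, edge index 0..5)


def prior_maps(transformed_maps: Sequence[Counter]) -> List[Counter]:
    """For each data set d, sum the transition counts of all maps except map d."""
    priors: List[Counter] = []
    for d in range(len(transformed_maps)):
        prior: Counter = Counter()
        for j, other in enumerate(transformed_maps):
            if j != d:  # the data set's own map is deliberately left out
                prior.update(other)  # Counter.update adds counts
        priors.append(prior)
    return priors


maps = [
    Counter({((0, 0), 1): 2}),
    Counter({((0, 0), 1): 3, ((1, 0), 4): 1}),
    Counter({((1, 0), 4): 5}),
]
priors = prior_maps(maps)
```

Leaving out the data set's own map keeps its prior free of information that FootSLAM will extract from that data set anyway, in the same spirit as extrinsic information in turbo decoding.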

5.7 Weakening and Filtering

This block adjusts the parameters that control the influence of the Prior Maps on the FootSLAM estimation process. The parameters that are readjusted from one iteration to the next are:

• Prior map weakening factor: this factor is set at the zeroth iteration to a certain value greater than 1 and is slowly decreased to 1 over the iterations to make the prior map stronger. In our experiments 1.9 was empirically chosen as the starting value.

• Prior map filter factor: this factor is smaller than 1 and it is slowly increased to 1 over the iterations. This is because the prior map will be more accurate over the iterations, and can be given more importance during the FeetSLAM process. For our experiments it was chosen to start at 0.8.
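The two schedules can be illustrated as follows. The text only states the starting values (1.9 and 0.8) and that both factors move slowly towards 1; the linear shape and the helper name `linear_schedule` are assumptions made for this sketch, and other annealing schedules would fit the description equally well.

```python
from typing import List


def linear_schedule(start: float, n_iterations: int, end: float = 1.0) -> List[float]:
    """Values moving linearly from `start` (iteration 0) to `end` (last iteration)."""
    if n_iterations < 1:
        raise ValueError("n_iterations must be at least 1")
    if n_iterations == 1:
        return [start]
    span = end - start
    return [start + span * k / (n_iterations - 1) for k in range(n_iterations)]


# Weakening factor: starts above 1 and decreases, so the prior becomes stronger.
weakening = linear_schedule(1.9, 10)
# Filter factor: starts below 1 and increases as the prior becomes more accurate.
filtering = linear_schedule(0.8, 10)
```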

5.8 Starting Conditions Computation

The Starting Conditions for the next iteration for each of the $N_W$ data sets are computed by taking the winning Transformation computed in the Transformation of the Individual Maps block and applying it to the winning Starting Conditions for the FootSLAM process at the last iteration.
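As a rough illustration of this step, the sketch below applies a planar similarity transformation (rotation, scale, translation) to a starting pose. Both the `Pose` layout (position plus heading) and the `Similarity` parameterisation are assumptions made here; the exact content of the Starting Conditions and the form of the transformation are defined elsewhere in the paper and may differ.

```python
import math
from typing import NamedTuple


class Pose(NamedTuple):
    x: float
    y: float
    heading: float  # radians


class Similarity(NamedTuple):
    rotation: float  # radians, counter-clockwise
    scale: float
    tx: float
    ty: float


def transform_pose(t: Similarity, p: Pose) -> Pose:
    """Rotate and scale the position about the origin, translate it, rotate the heading."""
    c, s = math.cos(t.rotation), math.sin(t.rotation)
    x = t.scale * (c * p.x - s * p.y) + t.tx
    y = t.scale * (s * p.x + c * p.y) + t.ty
    h = p.heading + t.rotation
    # Wrap the heading into (-pi, pi].
    return Pose(x, y, math.atan2(math.sin(h), math.cos(h)))


new_sc = transform_pose(Similarity(math.pi / 2, 2.0, 1.0, 1.0), Pose(1.0, 0.0, 0.0))
```

With these example values the pose at (1, 0) facing along the x axis ends up at approximately (1, 3) facing along the y axis.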

5.9 Zeroth Iteration Initialization

Some characteristics of the zeroth iteration are:

• No use of prior: no previous knowledge of the transition counts is available.

• Manually written Starting Conditions: a description of the Starting Conditions for each walk is needed when no absolute Starting Conditions are available.

• One might not include all the $N_W$ data sets: data sets that do not converge without a prior might be included in the algorithm at a later stage, in the expectation that they will converge when a prior map is available after processing the other data sets. We have not implemented this, and deliberately chose one of our experiments to have a data set that did not converge at iteration zero.

6 Qualitative Performance Assessment of FootSLAM and FeetSLAM maps

In this section, a novel quantitative metric of performance evaluation will be presented that counts the number of violations of FeetSLAM or FootSLAM maps against a known ground truth map.

The ratio $R$ of crossed walls and furniture for a map $M$ is defined as follows:

$$R = \frac{\sum_{h \in M} \sum_{e_V} C_h^{e_V}}{\sum_{e=0}^{5} \sum_{h \in M} C_h^{e}}, \tag{11}$$

where $\sum_{h \in M} \sum_{e_V} C_h^{e_V}$ is the sum of all the transition counts that cross a wall or a piece of furniture ($V$ stands for violation) and $\sum_{e=0}^{5} \sum_{h \in M} C_h^{e}$ is the total transition counts in the map.

Since walls are lines and pieces of furniture are polygons, two different criteria are used to determine whether a transition count crosses a wall or a piece of furniture.

A transition from a source hexagon $h_s$ to a target hexagon $h_t$ across edge $e$ represents the probability of a pedestrian crossing it when walking from $h_s$ to $h_t$. The center of the hexagons can be used as starting and finishing points for the transition across the edge as an approximation:

Criteria for the Walls: A wall between two hexagon centers indicates a crossed (or violated) wall. The left side of Figure 9 illustrates an example of a transition that crosses a wall.

Criteria for the Furniture: The center of the target hexagon $h_t$ lying inside a piece of furniture indicates a wrong transition. The right side of Figure 9 illustrates an example of a transition count that would essentially allow the pedestrian to step over a piece of furniture.

Figure 9. Illustration of transition counts that cross a wall (on the left) and a piece of furniture (on the right). Each panel shows a source and a target hexagon.

Note that in the case when we have a transition count crossing a wall and a piece of furniture at the same time, it is only counted as one violation.
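The two criteria and the ratio of equation (11) can be sketched as below. This is an illustration under assumptions, not the evaluation program used for the results: transitions are given as (source center, target center, count) triples, walls as line segments, furniture as simple polygons, "a wall between two hexagon centers" is read as a proper intersection between the wall and the segment joining the centers, and all function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p: Point, q: Point, a: Point, b: Point) -> bool:
    """True if segment pq properly intersects segment ab (touching is ignored)."""
    return (_orient(a, b, p) * _orient(a, b, q) < 0
            and _orient(p, q, a) * _orient(p, q, b) < 0)


def point_in_polygon(pt: Point, poly: Sequence[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right from pt."""
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > pt[1]) != (y2 > pt[1]):
            x_cross = x1 + (pt[1] - y1) * (x2 - x1) / (y2 - y1)
            if pt[0] < x_cross:
                inside = not inside
    return inside


def violation_ratio(
    transitions: Sequence[Tuple[Point, Point, float]],
    walls: Sequence[Segment],
    furniture: Sequence[Sequence[Point]],
) -> float:
    """Violating transition counts divided by all transition counts."""
    violating = 0.0
    total = 0.0
    for src, dst, count in transitions:
        total += count
        crosses_wall = any(segments_cross(src, dst, a, b) for a, b in walls)
        on_furniture = any(point_in_polygon(dst, poly) for poly in furniture)
        if crosses_wall or on_furniture:  # both at once still counts only once
            violating += count
    return violating / total if total else 0.0


walls = [((1.0, -1.0), (1.0, 2.0))]
furniture = [[(2.0, 2.0), (4.0, 2.0), (4.0, 4.0), (2.0, 4.0)]]
transitions = [
    ((0.0, 0.0), (2.0, 0.0), 3),  # crosses the wall
    ((0.0, 2.0), (3.0, 3.0), 2),  # target lies inside the furniture
    ((0.0, 0.0), (0.5, 0.5), 5),  # no violation
    ((0.0, 0.0), (3.0, 3.0), 1),  # wall and furniture: counted once
]
ratio = violation_ratio(transitions, walls, furniture)
```

In this toy layout 6 of the 11 transition counts violate the ground truth, so the ratio is 6/11.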

We used an XML representation of the walls and pieces of furniture for the DLR scenario when we evaluated our Turbo FeetSLAM algorithm quantitatively. The walls and pieces of furniture can be easily transformed, using the same formula (4) that we use when transforming a FootSLAM map, to fit any given FootSLAM map and compute the corresponding value for the ratio of crossed walls and furniture. We used a computer program to search for the smallest violation ratio for a given map since FootSLAM maps are rotation, scale and translation invariant.

7 Results

Two sets of data were processed. The smaller set, called DLR, consisted of five walks with about 6 to 15 minutes of data each. The other set, called MIT, was collected in a larger building (at the CSAIL campus of MIT) and consisted of four walks of roughly 15 minutes duration. The second data set contained more complex and diverse geometric regions and the walks were not aligned to a main corridor region. For one of the MIT data sets we discarded the last ca. 15% of the odometry data since it was affected by a large singular error. Real-world implementations of FeetSLAM will have to identify strong deviations of a map from the total map automatically.

Figure 10 shows the results for the DLR experiments after nine iterations. The FootSLAM map that has been represented is that of the particle with the highest posterior likelihood. Our results show that FeetSLAM can reduce the FootSLAM hexagon transition error rate from roughly 20% (iteration zero) to less than 2% (FeetSLAM after nine iterations), as shown in Figure 14. We chose one of the DLR walks to be very short and provide almost no loop closure by itself. As expected, the map for this data set (without a prior map) did not converge to a single map. However, in combination with the other maps this individual map converged after one iteration and helped the overall mapping process.

Figure 10. Total Map after nine iterations for the DLR data and ground truth and furniture arrangement. The red line represents the wall in the original plan designed by the architect. The black arrow points to the real location of one of the walls, which was erroneous in our original ground truth map.

Figures 11(a) to 11(d) show the aggregated FootSLAM maps of all the particles for each one of the four data sets of the MIT data for the zeroth iteration - that is, when no prior was available. Figure 11(e) shows the best combination of those four maps at the end of that iteration and Figure 12 shows the total maps after iteration 1 (Figure 12(a)) and 2 (Figure 12(b)). The improvements of the quality of the Total Map from iteration 0 to iteration 1 are clearly visible. Videos of the FeetSLAM algorithm can be found in [9].

Figure 13 shows the results for the MIT experiments after 10 iterations. The FootSLAM map that has been drawn is the total map with the highest posterior likelihood. The ground truth has been manually transformed to fit the FootSLAM map.

With FeetSLAM we have even exposed an error in our building plan ground truth - the actual layout of the drywall construction had been recently changed and not reflected in the map, as shown in Figure 10, where we show that one of the walls had been incorrectly represented in the original reference map.

Both of our experiments have been run on the same processor, with six dual cores and a clock speed of 3.46 GHz. Running ten iterations with 90000 particles took 37 hours for the five DLR data sets and 42 hours for the four MIT data sets.

8 Conclusion and Outlook

We have presented a collaborative form of 2D FootSLAM (FeetSLAM) that allows multiple data sets to be combined in order to map larger areas. The proposed method significantly improves the mapping accuracy of a single data set in addition to providing maps for the entire area. This is because maps from all data sets support each other in the convergence process of FootSLAM.

Our approach is based on iterative processing, because the optimal estimator is expected to suffer from particle depletion due to the large state space for many data sets. We control the iteration process by gradually increasing the strength of the other maps (the prior) over iterations.

In a future application, the maps are expected to be useful not only for greatly improved positioning of pedestrians, but also as a basis for semantic maps where places have meanings and these can be learnt automatically from data.

Future work will address mapping in three dimensions, robust mapping of larger areas and complexity analysis.

Acknowledgements

We would like to thank Daniela Rus, Brian Julian, John Leonard and Michael Kaess from the MIT CSAIL for their kind support and fruitful discussions.

References

[1] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proc. of the UbiComp 2008, Seoul, South Korea, Sep. 2008.

[2] S. Beauregard, Widyawan, and M. Klepal, “Indoor PDR performance enhancement using minimal map information and particle filters,” in Proc. of the IEEE/ION PLANS 2008, Monterey, USA, May 2008.

[3] B. Krach and P. Robertson, “Cascaded estimation architecture for integration of foot-mounted inertial sensors,” in Proc. of the IEEE/ION PLANS 2008, Monterey, USA, May 2008.

[4] P. Robertson, M. Angermann, and B. Krach, “Simultaneous localization and mapping for pedestrians using only foot-mounted inertial sensors,” in Proc. UbiComp 2009, Orlando, Florida, USA.

[5] P. Robertson, M. Angermann, B. Krach, and M. Khider, “Inertial systems based joint mapping and positioning for pedestrian navigation,” in Proc. ION GNSS 2009, Savannah, Georgia, USA, Sep. 2009.

[6] E. Foxlin, “Pedestrian tracking with shoe-mounted inertial sensors,” IEEE Computer Graphics and Applications, vol. 25, no. 6, pp. 38–46, Nov. 2005.

[7] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes,” in Proc. ICC ’93, May 1993, pp. 1064–1070.

[8] R. McEliece, D. MacKay, and J. Cheng, “Turbo decoding as an instance of Pearl’s belief propagation algorithm,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 2, 1998.

[9] “FootSLAM videos and reference data sets download,” http://www.kn-s.dlr.de/indoornav.

[Figure 11 panels: (a) $M^0_1$, (b) $M^0_2$, (c) $M^0_3$, (d) $M^0_4$, (e) $M^0_C$]

Figure 11. (a) to (d) Individual maps obtained running FootSLAM with no prior (zeroth iteration) for the “MIT data”. (e) Total combined map at the end of the zeroth iteration.

[Figure 12 panels: (a) $M^1_C$, (b) $M^2_C$]

Figure 12. Best combination of the FootSLAM maps of the MIT experiment after (a) iteration 1 and (b) iteration 2.

Figure 13. Total Map after 10 iterations for the MIT data and an overlay of the ground truth map of the building where the walks took place.

Figure 14. Ratio of crossed walls and furniture for the “DLR data” FootSLAM map over the iterations.
