© 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. For more information, please see www.ieee.org/portal/pages/about/documentation/copyright/polilink.html. MOBILE AND UBIQUITOUS SYSTEMS www.computer.org/pervasive Scalable, Distributed, Real-Time Map Generation Jonathan J. Davies, Alastair R. Beresford, and Andy Hopper Vol. 5, No. 4 October–December 2006 This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.



1536-1268/06/$20.00 © 2006 IEEE. Published by the IEEE CS and IEEE ComSoc.

POSITIONING SYSTEMS

Scalable, Distributed, Real-Time Map Generation

Modern vehicles have a plethora of onboard computing equipment. Today’s cars have up to 50 microprocessors governing various aspects of their operation. We foresee a future in which vehicles’ sensor, communication, and computational resources are harnessed to improve the transportation infrastructure and have a positive impact on society.

To start working toward that vision, we can exploit modern vehicles’ computing facilities to provide a host of participative mobile applications. Such uses include vastly improved collection and dissemination of weather data and real-time measurement of road surface conditions. In particular, the sharing of movement data from vehicles might help facilitate city planning, improve fleet management, and enforce congestion charging (charging fees for driving in certain areas at certain times).

The map data in a vehicle’s navigation unit forms the basis of the unit’s routing decisions. This data is prone to cartographic errors and to inaccuracies due to recent changes in the road network, including temporary road closures. These problems can frustrate drivers when they cause the unit to give impossible driving instructions.

So, organizations that produce digital maps, such as Navteq and Tele Atlas, must invest significant effort in maintaining and improving their data’s accuracy. Details about new roads and the modification or closure of existing roads must be incorporated into the databases in a timely fashion. Mapping organizations obtain data about road network changes from various sources, including local authorities and building contractors, but this data tends to be highly inaccurate. Aerial photographs can help mapmakers deduce roads’ presence and shape but are prohibitively expensive to update frequently. Instead, mapping companies typically own fleets of probe vehicles that investigate discrepancies and explore new roads. Tele Atlas spends tens of millions of dollars each year in North America to keep its databases up-to-date, while in 2004 Navteq employed more than 500 analysts who drove a total of 3.5 million miles throughout North America and Europe.

To simplify this mapmaking process, vehicles could use their computational resources to directly generate accurate map data. We envisage vehicles on the road network forming a wide-scale mobile sensing and computing platform. Toward that goal, we’ve developed an algorithm for keeping digital maps up-to-date, using ordinary vehicles making normal journeys rather than fleets of dedicated probe vehicles. However, further development is required to address several remaining challenges, including the choice of architecture to support the algorithm.

Automatically generating digital maps

Information about a road network can be represented as a directed graph with metadata associated with its edges. Such a graph can be used to both render a graphical depiction of the road network and serve as an input to in-vehicle navigation systems. The graph’s edges represent roads (or road lanes), and its vertices represent junctions.

In this application, many vehicles participate in the collection, processing, and dissemination of data to automatically generate digital road maps.

INTELLIGENT TRANSPORTATION SYSTEMS

Jonathan J. Davies, Alastair R. Beresford, and Andy Hopper
University of Cambridge


Generating a directed graph

The application based on our algorithm transforms GPS traces from multiple vehicles into a road map. Processing involves four basic stages:

1. Generate a 2D histogram indicating the number of GPS fixes found in each cell.
2. Deduce the road edges’ positions.
3. Compute the positions of the roads’ centerlines.
4. Determine the direction of travel permitted along each road.

We now examine these stages in detail, using figure 1 as a running example.

A trace of location fixes obtained from a vehicle’s GPS unit will show which roads the vehicle has traveled along. The trace will contain errors due to uncertainty in the location fixes and missed sightings when the GPS satellite signal is obscured. From a single trace, distinguishing between junctions and bends in roads is difficult. Moreover, the GPS readings’ inherent errors might mean that the trace misrepresents the roads’ true positions. However, if we superimpose the traces from several vehicles traveling along different routes, junctions will soon become apparent. Also, the roads’ true positions will become clearer as noise due to errors becomes less significant.

Creating a histogram. Splitting up 2D space in the horizontal plane into cells (small, tessellating, square units of area), we wish to determine the likelihood that each cell is part of a road. The Nyquist-Shannon sampling theorem dictates that the cell width should be at most half the minimum road width to prevent aliasing. In practice, this equates to a few meters. A GPS reading falling in a cell is a good indication that the cell might be part of a road. So, if we associate with each cell the number of GPS points that fall in it, cells with higher frequencies will more likely be parts of a road. In this way, we group our 2D real-valued GPS fixes into discrete cells. (For a brief look at similar research and other research related to map generation, see the “Related Work in Map Generation” sidebar.)

At highway speeds, with GPS fixes obtained at 1 Hz, consecutive fixes will fall approximately 30 meters apart. Thus, with a cell size of a few meters, successive fixes won’t lie in adjacent cells, leaving us with disjoint regions of road. But if the GPS fixes are temporally ordered, we know that road exists between consecutive fixes. So, we can also increment the value in the cells that are between those cells in which the fixes lie. If the frequency of readings is greater than a few hertz, then linear interpolation between the GPS fixes will likely be acceptable. However, higher-order interpolation might yield more realistic results.

The value attached to each cell represents the confidence that the cell is part of a road. Our application increments a cell’s value by an amount proportional to the length of the line that passes through the cell. In this way, if the line traverses only a cell’s corner, then that cell’s value is incremented by only a small amount.

After the application has processed all the available GPS fixes in this way, we have a 2D histogram that estimates the confidence of each cell constituting part of a road, based on all the journeys traced out in the data (see figure 1a).
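The histogram construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `accumulate_trace`, the 3 m cell size, and the approach of walking each segment in short steps (which only approximates the exact length of line inside each cell) are all assumptions made for the example.

```python
import math
from collections import defaultdict

def accumulate_trace(hist, trace, cell=3.0, step_fraction=0.1):
    """Add one temporally ordered trace of (x, y) fixes, in meters, to hist.

    Consecutive fixes are joined by a straight line (linear interpolation).
    The line is walked in short steps, and each step's length is credited to
    the cell containing the step's midpoint. This approximates incrementing
    each cell in proportion to the length of line passing through it.
    """
    for (x0, y0), (x1, y1) in zip(trace, trace[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0.0:
            continue  # repeated fix: no line to distribute
        n_steps = max(1, math.ceil(length / (cell * step_fraction)))
        step_len = length / n_steps
        for i in range(n_steps):
            t = (i + 0.5) / n_steps  # midpoint of the i-th step
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            key = (math.floor(x / cell), math.floor(y / cell))
            hist[key] += step_len
    return hist

# Two fixes 30 m apart, as at highway speed with 1 Hz sampling.
hist = defaultdict(float)
accumulate_trace(hist, [(0.0, 1.0), (30.0, 1.0)])
```

With 3 m cells, the two fixes fill a contiguous run of ten cells rather than two isolated ones, which is the effect the interpolation is meant to achieve.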

Related Work in Map Generation

Discretizing the space covered by sensor readings into a grid of cells is an established practice. Robotics researchers use certainty grids¹ or occupancy grids² to store the probability that an obstacle exists in any particular cell in an environment that robots will explore.³ Unlike that research, we don’t have the luxury of being able to determine obstacles’ presence. Robotics research also heavily employs Voronoi graphs,⁴ particularly because they describe an environment’s topological features, making them suitable for route planning.

Robert Harle investigated generating descriptions of an indoor environment purely from records of location fixes.⁵ While it’s unrealistic to expect exhaustive coverage of a typical indoor environment, we can expect vehicles to exhaust all available road space. This is because the road network imposes greater constraints on the users’ movement.

REFERENCES

1. H.P. Moravec and A. Elfes, “High Resolution Maps from Wide Angle Sonar,” Proc. 1985 IEEE Int’l Conf. Robotics and Automation, IEEE Press, 1985, pp. 116–121.

2. A. Elfes, “Using Occupancy Grids for Mobile Robot Perception and Navigation,” Computer, June 1989, pp. 46–57.

3. S. Thrun, “Robotic Mapping: A Survey,” Exploring Artificial Intelligence in the New Millennium, G. Lakemeyer and B. Nebel, eds., Morgan Kaufmann, 2002, pp. 1–36.

4. H. Choset and K. Nagatani, “Topological Simultaneous Localization and Mapping (SLAM): Toward Exact Localization without Explicit Localization,” IEEE Trans. Robotics and Automation, vol. 17, no. 2, 2001, pp. 125–137.

5. R.K. Harle, “Maintaining World Models in Context-Aware Environments,” doctoral thesis, Laboratory for Communications Eng., Dept. of Eng., Univ. of Cambridge, 2004.

Despite the interpolation between successive GPS fixes, gaps will likely exist because a cell with a low frequency might be surrounded by cells with high frequencies. This might be due to a random paucity of data collected in that cell; systematic errors intrinsic in the original GPS data; real-world features that obstruct vehicles, such as lampposts or road barriers; or blackspots (areas without GPS coverage) such as under bridges. In any case, these gaps are undesirable and should be removed because we’re merely aiming for a directed graph of the road network’s topology where connectivity is important.

To remove such gaps in the histogram, we apply a blur filter. This removes small gaps because the filter averages cell values with neighboring cells, but larger gaps will persist. Similarly, the filter will smooth jagged edges. However, performing a blur might undesirably merge two nearby but distinct roads. Figure 1b depicts the data convolved with a three-by-three-cell uniform blur convolution filter.

Deducing edge positions. At this processing stage, we have a good idea of whether a road exists in any given cell. So, we binarize this metric to a Boolean value: is there a road or not? To do this, we apply a global threshold to our cells. The threshold’s value will relate to the degree of confidence we want in the road network deduced from our algorithm. A lower threshold will be more susceptible to noise due to GPS errors. Figure 1c shows the result of thresholding the data.
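As a rough illustration of the blur and global threshold just described, the following sketch applies a three-by-three uniform filter and then binarizes the result. The function names, the zero padding at the grid border, and the threshold level of 1.5 are assumptions chosen for the example; the article does not specify them.

```python
def box_blur_3x3(grid):
    """Return a new grid in which each cell is the mean of its 3x3 neighborhood.

    Cells outside the grid count as zero (an assumed border policy).
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += grid[rr][cc]
            out[r][c] = total / 9.0
    return out

def binarize(grid, level):
    """Apply one global threshold: True where the cell is taken to be road."""
    return [[value >= level for value in row] for row in grid]

# A horizontal road with a one-cell gap in the middle row.
road = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [9, 9, 0, 9, 9],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
blurred = box_blur_3x3(road)
mask = binarize(blurred, level=1.5)
```

In this toy grid the gap in the middle row is closed after blurring, but the road also spreads into the rows above and below it, which hints at why a blur can merge two nearby roads.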

After thresholding, we apply a contour follower¹ to the image. This extracts a set of closed polygons describing the road regions’ outline (see figure 1d). This outline might not coincide precisely with the roads’ real-life edges, owing to errors in the original GPS data. However, if these errors are symmetrically distributed, the centerline between the edges will coincide with the road’s actual centerline.

Computing centerlines. A Voronoi graph is the set of points equidistant from the nearest two points on the boundaries of a set of closed polygons.² We can compute the road centerlines by producing the Voronoi graph of the contours describing the roads’ edges and discarding the resulting edges that lie outside the roads.

However, because the roads’ edges aren’t convex, many short edges of the Voronoi graph will be attached to the main trunk of each road. This gives the graph a rather “hairy” appearance. These edges don’t correspond to real-life roads; they’re artifacts from our initial discretization of GPS fixes into square cells. We can remove the edges that are shorter than a threshold representing the minimum permitted road length, leaving just the main edges running the roads’ lengths. Figure 1e depicts the Voronoi graph generated from the contours, with the short edges in red.
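The shaving step might look something like the sketch below. It assumes the Voronoi graph has already been reduced to a list of `(u, v, length)` edges between branch points, and it only removes short edges that dangle (one endpoint touching no other edge). That restriction, the function name `shave_spurs`, and the 10 m threshold are assumptions for illustration; the article only states that edges shorter than a minimum road length are removed.

```python
from collections import Counter

def shave_spurs(edges, min_length):
    """Drop dangling edges shorter than min_length.

    edges: list of (u, v, length) tuples. An edge is treated as a spur when
    one of its endpoints touches no other edge.
    """
    degree = Counter()
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    kept = []
    for u, v, length in edges:
        is_spur = degree[u] == 1 or degree[v] == 1
        if is_spur and length < min_length:
            continue  # a "hair" left over from the square-cell discretization
        kept.append((u, v, length))
    return kept

edges = [
    ("a", "b", 40.0),   # trunk
    ("b", "c", 55.0),   # trunk
    ("b", "h1", 2.5),   # short hair
    ("c", "h2", 3.0),   # short hair
]
trunk = shave_spurs(edges, min_length=10.0)
```

A single pass is shown here; whether repeated passes are needed would depend on how the graph is represented.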

Determining road direction. The final stage is to deduce which edges of the undirected Voronoi graph represent unidirectional roads and which represent bidirectional roads. If the original GPS data was temporally ordered, we can produce a further data structure that will help us determine this. Again splitting space into cells, in each cell we now keep track of the number of journeys embodied in the GPS traces that pass through the cell in each of the eight compass directions. We generate this by quantizing the bearing of the displacement vector between each successive pair of fixes and incrementing the count for that direction.

Figure 1. Generating a map of the city center of Cambridge, UK: (a) a histogram, with one pixel per cell, (b) a blurred histogram, (c) a thresholded histogram, (d) contours, (e) a Voronoi graph, and (f) a directed graph. Legend labels in the figure: shaved edges, retained edges, one-way roads, two-way roads.

For each compass direction, we sum the counts of journeys associated with each cell on a road. Unidirectional roads have significantly more cars traveling parallel to the road in one direction than in the other. Figure 1f shows which edges the algorithm deduced as unidirectional and as bidirectional.
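A simplified sketch of the direction stage follows. It quantizes each displacement vector into one of eight compass directions and then compares opposing totals for a road. For brevity it counts displacement vectors for a whole road rather than journeys per cell, and the factor of 10 used to decide "significantly more" is an assumed value, not one given in the article.

```python
import math
from collections import Counter

COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def compass_direction(p, q):
    """Quantize the displacement from fix p to fix q into eight directions.

    Points are (east, north) in meters; bearings run clockwise from north.
    """
    bearing = math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360.0
    return COMPASS[int((bearing + 22.5) // 45) % 8]

def count_directions(traces):
    """Count displacement vectors per compass direction over all traces."""
    counts = Counter()
    for trace in traces:
        for p, q in zip(trace, trace[1:]):
            if p != q:
                counts[compass_direction(p, q)] += 1
    return counts

def classify(counts, forward, backward, ratio=10.0):
    """Label a road one-way if one direction outnumbers the other by `ratio`."""
    f, b = counts[forward], counts[backward]
    if f > ratio * b:
        return "one-way " + forward
    if b > ratio * f:
        return "one-way " + backward
    return "two-way"

# Twelve eastbound journeys and one westbound journey along an east-west road.
traces = [[(0.0, 0.0), (30.0, 0.0)]] * 12 + [[(30.0, 0.0), (0.0, 0.0)]]
counts = count_directions(traces)
label = classify(counts, "E", "W")
```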

Complexity. The algorithm’s overall time complexity is $O(n + m)$, where $n$ is the number of GPS readings and $m$ is the total number of cells. Individually, the stages involving the processing of GPS readings are $O(n)$ and the stages involving image processing are $O(m)$. However, if we can store the latest versions of the histogram and direction data, the time complexity of generating the new map incorporating an incremental set of GPS readings of size $\Delta n$ is merely $O(\Delta n + m)$.

Reflecting road changes

We wanted our application to produce digital maps that reflect the creation of new roads, the closure of old roads, and the change in geometry of existing roads. Our algorithm makes it trivial to establish that a new road has been opened: it acquires GPS traces of vehicles using the road and regenerates the map to show the new road. However, we can’t say that about the other two goals. Once we have some GPS traces traveling down a particular road, the digital maps that our basic application produces will always contain that road.

To reflect changes to existing roads over time, we must place lower trust in older data than in more recent data. This means that vehicles must continue to travel down roads regularly to maintain our level of trust in the roads’ existence. To adjust the trust levels accordingly, we could adjust the values in the histogram by smaller increments when processing an older GPS trace than when processing a more recent one. So, when a road is closed, we would no longer receive GPS traces showing it in use. Eventually the histogram cells’ value would fall below our binarization threshold, causing the road to disappear from the map. However, this implies that a certain latency would exist between when the road closes and when the map reflects the closure. We could reduce this delay if we can obtain more GPS traces from more vehicles.
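One way to realize "smaller increments for older traces" is an exponential decay on trace age, sketched below. The exponential form, the 30-day half-life, and the function names are assumptions for illustration; the article does not prescribe a particular weighting.

```python
def trace_weight(age_days, half_life_days=30.0):
    """Weight applied to a trace's histogram increments; halves every half-life."""
    return 0.5 ** (age_days / half_life_days)

def cell_confidence(contributions, half_life_days=30.0):
    """Sum age-weighted contributions for one cell.

    contributions: list of (age_days, length_in_cell_m) pairs, one per trace.
    """
    return sum(length * trace_weight(age, half_life_days)
               for age, length in contributions)

# Five recent passes through a cell versus the same five passes 120 days ago.
open_road = cell_confidence([(age, 3.0) for age in range(5)])
closed_road = cell_confidence([(120 + age, 3.0) for age in range(5)])
```

With these assumed numbers, the cell on the road that stopped receiving traces retains less than a tenth of the confidence of the road still in use, so a fixed binarization threshold between the two values would eventually remove the closed road.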

Map regeneration

For the map to be suitable for navigation, we need to associate metadata with each edge in the directed graph. However, this task might be relatively expensive because it might involve manual effort. While metadata such as the speed limit could be inferred from the GPS data, metadata such as the road name could be determined only by visiting each road or, perhaps in the future, from active road signs. Because we want the application to generate up-to-date maps, we must execute the algorithm repetitively as new GPS traces come to light. However, we need to avoid the cost of reassociating the metadata from scratch every time we regenerate a map.

On the basis of the knowledge obtained from additional GPS traces, roads in the old version might change their shape, and junctions might shift position. Furthermore, roads (and thus junctions) might appear or disappear. This makes it difficult to determine which roads in the old version correspond to which roads in the present version, thus making it difficult to transfer the old roads’ metadata to the new roads.

To estimate which roads in the old version correspond to which roads in the present version, and which roads exist only in one version, we can employ a weighted bipartite graph. A bipartite graph is a special type of undirected graph that splits vertices into two disjoint sets, with no edges between two vertices in the same set. A weighted bipartite graph has costs associated with its edges.

In this case, the two sets of vertices represent the roads in the old version and the roads in the present version. The edges have an associated cost relating to the similarity between a pair of roads from those sets. So, a low-cost edge between two vertices in the bipartite graph means that a road in the old version very closely corresponds to a road in the present version. The cost metric relates to the distance that the ends of the road network edges have moved between the two versions and to the similarity in the edges’ shape.

Figures 2a and 2b show a map’s old and new versions; figure 2c shows a bipartite graph constructed from the two versions.

A minimum-cost maximal matching on this bipartite graph will therefore indicate which edges in the old and present versions best correspond and will contain high-cost edges between the remaining vertices. (A matching is a subset of the graph’s edges with no vertices in common. A maximal matching employs as many edges as possible. A minimum-cost maximal matching minimizes the sum of its edges’ costs.) The application can ignore these high-cost edges, which correspond to pairs of roads in one version of the map but not the other. For the remaining low-cost edges, the application can transfer metadata between the roads corresponding to the vertices. In figure 2d, the red edge represents a discarded high-cost edge. This matching indicates that road B has closed and roads 1, 3, and 7 are new.
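The matching step can be illustrated with a small brute-force sketch. Roads are reduced here to straight segments given by two endpoints, and the cost is just how far the endpoints have moved (ignoring shape similarity, which the article also mentions). The names `endpoint_cost` and `match_roads`, the example coordinates, and the cutoff of 20 m are hypothetical. Exhaustive search is only feasible for a handful of roads; a real system would presumably use a polynomial-time assignment solver such as the Hungarian algorithm.

```python
import math
from itertools import permutations

def endpoint_cost(a, b):
    """Distance the two ends of a road have moved, in either orientation."""
    (a0, a1), (b0, b1) = a, b
    same = math.dist(a0, b0) + math.dist(a1, b1)
    swapped = math.dist(a0, b1) + math.dist(a1, b0)
    return min(same, swapped)

def match_roads(old, new, cutoff):
    """Minimum-cost matching using as many pairs as possible, then pruned.

    old, new: dicts mapping road name -> ((x0, y0), (x1, y1)).
    Returns (kept_pairs, closed_roads, opened_roads).
    """
    names_old, names_new = list(old), list(new)
    if len(names_old) <= len(names_new):
        candidates = (list(zip(names_old, perm))
                      for perm in permutations(names_new, len(names_old)))
    else:
        candidates = (list(zip(perm, names_new))
                      for perm in permutations(names_old, len(names_new)))
    best = min(candidates,
               key=lambda pairs: sum(endpoint_cost(old[o], new[n])
                                     for o, n in pairs))
    # Discard high-cost pairs: they join roads present in only one version.
    kept = [(o, n) for o, n in best if endpoint_cost(old[o], new[n]) <= cutoff]
    matched_old = {o for o, _ in kept}
    matched_new = {n for _, n in kept}
    closed = [o for o in names_old if o not in matched_old]
    opened = [n for n in names_new if n not in matched_new]
    return kept, closed, opened

old = {"A": ((0, 0), (100, 0)), "B": ((0, 50), (100, 50)), "C": ((200, 0), (200, 100))}
new = {"1": ((500, 500), (600, 500)), "2": ((2, 1), (99, 0)), "3": ((201, 0), (200, 98))}
kept, closed, opened = match_roads(old, new, cutoff=20.0)
```

In this example, metadata for old roads A and C would be transferred to new roads 2 and 3, while B is reported as closed and 1 as newly opened.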

Scalability

To process GPS traces from a vast number of vehicles covering a large geographical area, our algorithm must scale gracefully. Fortunately, it’s highly parallelizable, by dividing up space into tessellating regions or tiles. (A tile comprises many cells and might cover several square kilometers.) Then, an individual processing node can use the GPS traces falling in a tile to produce a directed graph of the road network in that tile.

Once each processing node has produced its tile’s graph, we need to stitch the results back together into one complete graph. However, we can’t expect that roads spanning the tiles’ edges will necessarily align and thus be contiguous when juxtaposed. This is because we can’t be certain about the results produced near a tile’s border when that tile has been processed in isolation.

To solve this problem, we process a set of tiles, each of which overlaps the adjacent tiles by the number of cells corresponding to the region of uncertainty. Then, to stitch together the resulting directed graphs from each tile, we simply clip the roads to the tile’s central region. A road that originally traversed two adjacent tiles will now be contiguous, so we join into a single edge the edges that meet at the seam between their respective tiles. The ratio of the overlap region’s area to that of the entire map corresponds to the parallelization overhead.

We’ve tested this technique successfully by partitioning the data used in figure 1 into four quadrants, mimicking separate processing nodes. Processing each quadrant individually and stitching the resulting graphs together produced the same output as processing all the data at once.
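The overlapping-tile scheme and its overhead can be sketched as follows. Each tile has a core rectangle (the region its results are clipped to) and a padded rectangle (the region actually processed). The 2 km map, 1 km tiles, and 50 m overlap are assumed values for illustration, as are the function names.

```python
def make_tiles(width, height, tile, overlap):
    """Return (core, padded) rectangle pairs as (x0, y0, x1, y1), in meters.

    The padded rectangle extends the core by `overlap` on every side,
    clipped to the map boundary.
    """
    tiles = []
    y = 0.0
    while y < height:
        x = 0.0
        while x < width:
            core = (x, y, min(x + tile, width), min(y + tile, height))
            padded = (max(core[0] - overlap, 0.0), max(core[1] - overlap, 0.0),
                      min(core[2] + overlap, width), min(core[3] + overlap, height))
            tiles.append((core, padded))
            x += tile
        y += tile
    return tiles

def area(rect):
    x0, y0, x1, y1 = rect
    return (x1 - x0) * (y1 - y0)

def overhead(tiles, width, height):
    """Area processed only because of overlap, as a fraction of the map area."""
    return sum(area(padded) - area(core) for core, padded in tiles) / (width * height)

# Four quadrants, loosely mirroring the experiment described above.
tiles = make_tiles(2000.0, 2000.0, tile=1000.0, overlap=50.0)
extra = overhead(tiles, 2000.0, 2000.0)
```

Under these assumed dimensions the overlap adds roughly 10 percent to the area processed; larger tiles relative to the overlap reduce that fraction.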

Possible system architectures

Many modern satellite navigation units are, in fact, small computers containing general-purpose processors and hard disks. Some (such as the TomTom GO) already use a GPRS (General Packet Radio Service) connection from an attached mobile telephone to download real-time traffic reports. Navigation units with Internet connectivity could support our mapping application by sharing GPS traces at regular intervals. Further in the future, vehicles will likely also carry IEEE 802.11 communications equipment, which could upload data when within range of a suitable access point.

A broad spectrum of system architectures could support map generation. We can view a network of vehicles and any infrastructure support as a set of nodes in a distributed-memory parallel computer. At one end of the spectrum is the fully centralized approach, where vehicles upload their raw GPS readings to a central server that generates new map data. Despite bringing benefits, particularly timely data delivery and homogeneous results, this approach will likely be impractical because of the communication bandwidth needed to transmit all the GPS readings to the server.

To distribute the communications bandwidth, we can employ multiple servers, each processing data for a different geographical region. Commercial operators seeking to gather and process data cheaply might prefer this approach. Furthermore, to avoid requiring a costly backhaul network between regional servers, the vehicles themselves could distribute data across region boundaries. This is a topic of ongoing research in the CarTel project, which uses vehicles as high-bandwidth “data mules.”³

Figure 2. Estimating which roads in an old map correspond to which roads in a new map: (a) an old version of a map, (b) a new version of a map, (c) a weighted bipartite graph of the maps, and (d) a minimum-cost maximal matching on the bipartite graph. Old roads are labeled A to E and new roads 1 to 7; bipartite edges are marked as low cost or high cost.

An increasingly common solution is to execute some processing on the vehicles themselves. For example, in the Vehicle Data Stream Mining (VEDAS) project, vehicles process their own sensor data, uploading the results to a remote central server over a low-bandwidth wireless network.⁴

Another architecture is to provide public data caches that are connected to the Internet and store data but don’t contain any processing facilities. In this scenario, the vehicles must acquire as many GPS traces as they can from the nearest public data cache (perhaps by IEEE 802.11 communications) and execute the map application locally. Consumers concerned about location privacy might prefer this approach; such a decentralized scheme can limit the transmission of personally identifiable data.

At the far end of the architecture spectrum is the fully peer-to-peer scenario in which vehicular ad hoc networks (VANETs) share GPS and map data. Although this solution won’t have any ongoing service costs to customers, it’s unlikely that it can gather sufficient data to produce up-to-date, reliable maps. VANETs will likely be more suited to small-scale cooperative driver assistance systems such as TrafficView⁵ and those enabled by Network-on-Wheels.⁶

Performance limitations

Our application has several fundamental performance limitations.

GPS reading errors are typically modeled by a bivariate normal distribution. The standard deviation, $\sigma$, in the error of the position estimate for a modern GPS receiver has recently been estimated as 4.25 m, giving a 95 percent confidence interval of 8.5 m.⁷ (Others have estimated the value to be 3.5 m.⁸) We believe that a reasonable minimum distance between two adjacent parallel roads should be $4\sigma$. This means that no more than approximately 2.5 percent of the position estimates of vehicles at one road’s edge can overlap with no more than approximately 2.5 percent of the position estimates of vehicles at the other road’s nearest edge. If this limit isn’t observed, the region between the roads might be filled with stray GPS fixes, causing no discernible gap between the roads after thresholding. For example, with $\sigma = 4$ m, the minimum road spacing tolerated is approximately 16 m.

The distribution of GPS readings for vehicles traveling in all lanes of a road won’t be normal. However, we can approximate it using a multimodal distribution consisting of one normal distribution per lane. We assume that this distribution’s underlying mean is on the road’s centerline (roughly, that the traffic volume is evenly distributed about the centerline). By the central limit theorem, the error on the mean of the GPS readings, when compared with the road’s actual centerline, will be normally distributed. Also, its standard deviation will decrease at a rate of $1/\sqrt{n}$ for an $n$-fold increase in the number of samples. So, with sufficient samples, GPS data becomes an accurate predictor of the real centerline.

More specifically, we can model the distribution of GPS samples collected from a two-lane road as a distribution $(N(\mu_1, \sigma^2) + N(\mu_2, \sigma^2))/2$, where $\mu_1$ and $\mu_2$ are the positions of the lanes’ centerlines and the mean is the road’s centerline. By the central limit theorem, with 73 samples, the estimate of the road’s centerline with the lanes’ centers 3 m apart will be within 1 m of the true position, 95 percent of the time.
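The sample-count claim can be checked with a short calculation. An equal mixture of two normals with common variance $\sigma^2$ and means a distance $d$ apart has variance $\sigma^2 + (d/2)^2$, so the standard error of the mean after $n$ samples is $\sqrt{(\sigma^2 + (d/2)^2)/n}$. The article does not state which $\sigma$ or which 95 percent multiplier it used; the sketch below assumes $\sigma = 4$ m (the value in the road-spacing example) and a multiplier of 2, which is one combination that reproduces the figure of 73. The function name is hypothetical.

```python
import math

def samples_needed(sigma, lane_separation, tolerance, z=2.0):
    """Samples required for the centerline estimate to lie within `tolerance`.

    sigma: per-lane GPS error standard deviation (m).
    lane_separation: distance between the two lane centers (m).
    z: multiplier for the confidence level (2.0 assumed for about 95 percent).
    """
    mixture_variance = sigma ** 2 + (lane_separation / 2.0) ** 2
    return math.ceil((z / tolerance) ** 2 * mixture_variance)

n = samples_needed(sigma=4.0, lane_separation=3.0, tolerance=1.0)
```

Other plausible inputs, such as $\sigma = 4.25$ m or a multiplier of 1.96, give somewhat different counts, so the result should be read as an order-of-magnitude check rather than a derivation of the authors' exact method.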

High GPS sampling rates are desirable. Ideally, the sampling rate should be such that when an abrupt change of direction occurs, consecutive position fixes are no more than one cell width apart. This corresponds to a frequency of v/w for maximum cornering speed v and cell width w. For a cell width of a few meters, a 1 Hz sampling rate typically gives adequate performance. With lower frequencies, when the samples are linearly interpolated, the change of direction won’t be as sharp as in reality. When a vehicle is traveling rapidly, the distance between consecutive fixes will be larger, but abrupt direction changes aren’t possible owing to physical limits on deceleration. To decrease the volume of GPS data that a vehicle collects, we could adopt a strategy such as recording only the points where the direction changes substantially.
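The data-reduction strategy mentioned at the end of the paragraph above could be sketched as a heading-change filter. This is one possible reading, not the authors' method: the function names and the 15-degree threshold are assumptions, and the filter keeps the first and last fixes plus any fix where the heading has drifted more than the threshold since the last kept fix.

```python
import math

def heading(p, q):
    """Bearing from p to q in degrees clockwise from north; points are (east, north)."""
    return math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360.0

def thin_trace(trace, min_turn_deg=15.0):
    """Keep only the endpoints and the fixes where direction changes substantially."""
    if len(trace) <= 2:
        return list(trace)
    kept = [trace[0]]
    last_heading = heading(trace[0], trace[1])
    for cur, nxt in zip(trace[1:], trace[2:]):
        h = heading(cur, nxt)
        # Smallest signed angle between the two headings, folded to [0, 180].
        turn = abs((h - last_heading + 180.0) % 360.0 - 180.0)
        if turn > min_turn_deg:
            kept.append(cur)
            last_heading = h
    kept.append(trace[-1])
    return kept

# Drive north for 30 m, then turn right and drive east for 20 m.
trace = [(0, 0), (0, 10), (0, 20), (0, 30), (10, 30), (20, 30)]
thinned = thin_trace(trace)
```

Because headings are compared against the last kept fix rather than the previous fix, a gradual bend still produces kept points once the accumulated turn exceeds the threshold.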

Roads with little traffic volume will require more time for changes to appear in the map. Furthermore, less popular roads might even fall under the threshold and thus not appear in the generated map because they aren’t distinguishable from erroneous road segments.

When the algorithm receives a new GPS trace, the degree to which the trace affects the generated map varies. If the trace uses roads that haven’t been visited recently, the map won’t likely be affected because the contributions to cells in the histogram won’t rise above the threshold. On the other hand, if it uses roads that have been recently heavily visited, the map will be minimally affected because the trace will have little impact on the centerlines’ positions. Between these two extremes, the trace will have a more significant effect. Because of the cost of regenerating the map, we can choose to regenerate it only when we have sufficient new data to significantly affect the output. We can further reduce the cost of executing the algorithm by regenerating only the map parts that have received new data.

Practical analysis

To investigate our algorithm’s efficacy with real-world data, we performed a practical experiment. Our application generated the images in figures 1 and 3 from GPS traces constituting nearly one million position readings collected by a single vehicle that we drive in and around Cambridge. (For more on that vehicle, see the Works in Progress department in this issue. We’ve also generated road maps from other GPS data sources.)

We compared our application’s output with a UK Ordnance Survey map of the same region (see figure 3). Our proof-of-concept implementation of the algorithm takes less than two minutes to complete on a standard desktop workstation, for a 40 km² region comprising approximately one million cells, with approximately one million GPS samples. In the histogram, 93 percent of the cells were empty, with 22 percent of the remainder falling below an empirically determined threshold.

According to the Cambridgeshire County Council, approximately 1,000 vehicles per hour use a typical city-center road.⁹ If we need 73 samples at a particular point along a road’s length to achieve acceptable accuracy, we should regenerate the map 14 times per hour (once every four minutes) with fresh data. Because the algorithm’s time complexity is linear with area, our implementation could process an area approximately 80 km² every four minutes. So, to process the entire UK, we would need approximately 3,000 processing nodes. However, we believe that an optimized implementation could execute at least an order of magnitude more quickly.
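The arithmetic behind these estimates can be reproduced in a few lines. The UK area of roughly 243,000 km² is an assumed round figure that does not appear in the article; the other inputs are taken from the text above.

```python
import math

VEHICLES_PER_HOUR = 1000      # typical city-center road, from the text
SAMPLES_NEEDED = 73           # samples per point for acceptable accuracy
AREA_PER_NODE_KM2 = 80        # area one node can process every four minutes
UK_AREA_KM2 = 243_000         # assumed round figure, not from the article

# How often enough fresh samples accumulate to justify regenerating the map.
regenerations_per_hour = round(VEHICLES_PER_HOUR / SAMPLES_NEEDED)
minutes_between_runs = 60 / regenerations_per_hour

# Number of nodes if each covers a fixed area within that interval.
nodes = math.ceil(UK_AREA_KM2 / AREA_PER_NODE_KM2)
```

Under these inputs the regeneration interval comes out slightly over four minutes and the node count slightly over 3,000, consistent with the rounded figures quoted above.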

On the roads that have a sufficient density of GPS readings, the generated road segments align well with the OS road segments. However, we noticed five main differences between our map and the OS map.

First, some road segments exist in the generated map but not in the OS map. These correspond to newly constructed roads that don’t yet appear in the OS data, justifying our claim that you can use our algorithm to highlight the creation of new roads. The yellow circle in figure 3 indicates such new roads.

Second, the generated segments are much more jagged than the OS segments. This is because the algorithm extracts the segments from a Voronoi graph of jagged boundaries. A topic for further research is whether a line simplification algorithm such as the Douglas-Peucker algorithm [10] could smooth the segments.

Third, the generated map interprets road bridges as crossroads (see the red circle in figure 3). This is because we discard altitude data when initially forming the 2D histogram. This problem has two

© Crown Copyright/database right 2005. An Ordnance Survey/EDINA supplied service.

Figure 3. A UK Ordnance Survey map of the Cambridge area, with a generated road map overlaid in black. The yellow circle highlights new roads. The red circle highlights a bridge misinterpreted as a junction. The blue circle highlights a misaligned junction. The green circle highlights two junctions that have merged into one.


potential solutions, which we plan to explore. By using a 3D histogram, extracting the 3D surface (using an algorithm such as Marching Cubes [11]), and producing a 3D Voronoi graph, we should find that the two roads no longer intersect. Alternatively, we could analyze each generated junction and determine which turns the vehicles can actually make. The algorithm would then interpret bridges as crossroads with no permissible turns.
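The second solution, analyzing which turns vehicles actually make, could be sketched along the following lines. This is a hypothetical illustration, not a tested part of the system: it assumes each trace through a four-arm junction has already been reduced to an (entry arm, exit arm) pair, and that the caller supplies which arms lie opposite each other. The function name, the arm labels, and the `min_traversals` value are all assumptions.

```python
def is_probable_bridge(traversals, opposite, min_traversals=20):
    """Flag a generated crossroads where no vehicle was ever seen to turn.

    traversals: list of (entry_arm, exit_arm) pairs observed at the junction.
    opposite:   dict mapping each arm to the arm directly across from it.
    """
    if len(traversals) < min_traversals:
        return False  # too little evidence to decide either way

    # Every observed traversal must go straight across the junction.
    if not all(opposite.get(entry) == exit_arm for entry, exit_arm in traversals):
        return False

    # Both crossing roads must actually be in use, otherwise nothing is known
    # about the second road.
    roads_used = {frozenset(pair) for pair in traversals}
    return len(roads_used) >= 2


OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
straight_only = [("N", "S")] * 15 + [("E", "W")] * 15
print(is_probable_bridge(straight_only, OPPOSITE))                 # True
print(is_probable_bridge(straight_only + [("N", "E")], OPPOSITE))  # False
```

A crossroads where all turns are banned would be flagged in the same way, which matches the interpretation in the text of a bridge as a crossroads with no permissible turns.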

Fourth, the generated map has some skewed junctions (see the blue circle in figure 3). This results from the errors inherent in the initial GPS fixes, causing the histogram to inaccurately reflect the true road layout. This problem is hard to fix but might be solved by having vehicles use accelerometers and sensor fusion techniques to improve the location fixes’ accuracy.

Finally, some pairs of nearby junctions have merged (see the green circle in figure 3). This results from an excessively coarse discretization, caused by too large a cell size or too heavy a blur. So, we can potentially fix this by adjusting these parameters.

Generating road maps in this way doesn’t produce a perfect result, but neither do traditional mapmaking techniques. By using data from privately owned vehicles, we have the advantage of being able to discover changes to the road network. We hope that the degree of automation will increase as we solve the issues outlined in this section.
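One of the issues listed in this section is the jaggedness of the generated segments, for which line simplification was suggested as a topic for further research. As a rough illustration of that idea, the sketch below is a plain recursive Douglas-Peucker simplification of a 2D polyline. It isn't part of the implementation evaluated here, and it measures distance to the chord as a line segment, a common variant of the perpendicular-distance test. The tolerance is in the same units as the coordinates and would need to be chosen empirically.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Return a simplified copy of the polyline, keeping its end points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]

    # Find the interior point farthest from the chord first-last.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d

    if max_dist <= tolerance:
        return [first, last]  # every interior point is close enough to drop

    # Otherwise keep the farthest point and simplify both halves.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the shared point


jagged = [(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)]
print(douglas_peucker(jagged, 0.5))  # [(0, 0), (4, 0)]
```

Small zigzags are removed while real corners survive, because a corner lies farther from the chord than the tolerance. Too large a tolerance could also flatten real bends in a road.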

We plan to investigate making the histogram’s cell sizes adaptive, so that our algorithm can more accurately inspect areas with many GPS readings. We also plan to analyze vehicles’ direction of approach to and departure from junctions to determine the junctions’ nature more precisely.

Future research on such a system’s architectural design will simulate various architectures and investigate their applicability to applications with other data and processing requirements. Research is also necessary on social and security issues related to such participative applications, such as protecting vehicle owners’ privacy and protecting the system from malicious users.

ACKNOWLEDGMENTS

We thank Andrew Rice, David Cottingham, Robert Harle, and Ripduman Sohan for useful discussions; Richard Gibbens for assistance with the statistical analysis; Andrew Rice for the use of his contour follower implementation; and Keith Farkas for his helpful advice.

REFERENCES

1. S. Yokoi, J.-I. Toriwaki, and T. Fukumura, “An Analysis of Topological Properties of Digitized Binary Pictures Using Local Features,” Computer Graphics and Image Processing, vol. 4, no. 1, 1975, pp. 63–73.

2. F. Aurenhammer, “Voronoi Diagrams—A Survey of a Fundamental Geometric Data Structure,” ACM Computing Surveys, vol. 23, no. 3, Sept. 1991, pp. 345–405.

3. B. Hull et al., “CarTel: A Distributed Mobile Sensor Computing System,” to be published in Proc. 4th ACM Conf. Embedded Network Sensor Systems (SenSys 06), ACM Press, 2006.

4. H. Kargupta et al., “VEDAS: A Mobile and Distributed Data Stream Mining System for Real-Time Vehicle Monitoring,” Proc. SIAM Int’l Data Mining Conf., Cambridge Univ. Press, 2004, pp. 300–311.

5. T. Nadeem et al., “TrafficView: Traffic Data Dissemination Using Car-to-Car Communication,” Mobile Computing and Comm. Rev., vol. 8, no. 3, 2004, pp. 6–19.

6. M. Torrent-Moreno, A. Festag, and H. Hartenstein, “System Design for Information Dissemination in VANETs,” Proc. 3rd Int’l Workshop Intelligent Transportation (WIT), Technische Universität Hamburg-Harburg, 2006, pp. 27–33.

7. R. Prasad and M. Ruggieri, Applied Satellite Navigation Using GPS, GALILEO, and Augmentation Systems, Artech House, 2005.

8. K.D. McDonald and C. Hegarty, “Post-modernization GPS Performance Capabilities,” Proc. IAIN World Congress and the ION 56th Ann. Meeting, Inst. of Navigation, 2000, pp. 242–249.

9. 2005 Traffic Monitoring Report, Cambridgeshire County Council, 2006, www.cambridgeshire.gov.uk/transport/monitoring/network/traffic+monitoring+report.htm.

10. D.H. Douglas and T.K. Peucker, “Algorithms for the Reduction of the Number of Points Required to Represent a Line or Its Caricature,” The Canadian Cartographer, vol. 10, no. 2, 1973, pp. 112–122.

11. W.E. Lorensen and H.E. Cline, “Marching Cubes: A High Resolution 3D Surface Construction Algorithm,” SIGGRAPH Computer Graphics, vol. 21, no. 4, 1987, pp. 163–169.

THE AUTHORS

Jonathan J. Davies is a PhD candidate in the University of Cambridge’s Computer Laboratory. His research interests include intelligent transportation systems and sentient computing. He received his BA in computer science from the University of Cambridge. Contact him at the Computer Laboratory, 15 JJ Thomson Ave., Cambridge, CB3 0FD, UK; [email protected]; www.cl.cam.ac.uk/~jjd27.

Alastair R. Beresford is a research associate in the University of Cambridge’s Computer Laboratory and a research fellow at Robinson College. His research interests include ubiquitous systems, computer security, and networking. He received his PhD in engineering from the University of Cambridge. Contact him at the Computer Laboratory, 15 JJ Thomson Ave., Cambridge, CB3 0FD, UK; [email protected]; www.cl.cam.ac.uk/~arb33.

Andy Hopper is a professor of computer technology and the department head in the University of Cambridge’s Computer Laboratory. His research interests include network design and sustainable and mobile computing. He received his PhD in computer science from the University of Cambridge. He’s a fellow of the Royal Academy of Engineering and the Royal Society and is a trustee of the Institution of Engineering and Technology. Contact him at the Computer Laboratory, 15 JJ Thomson Ave., Cambridge, CB3 0FD, UK; [email protected]; www.cl.cam.ac.uk/~ah12.