smashq: spatial mashup framework for k-nn queries in time …chiychow/papers/dapd_2012.pdf · 2012....

Distributed and Parallel Databases manuscript No.(will be inserted by the editor)

SMashQ: Spatial Mashup Framework for k-NN Queriesin Time-Dependent Road Networks

Detian Zhang · Chi-Yin Chow ·Qing Li · Xinming Zhang · Yinlong Xu

Received: date / Accepted: date

Abstract The k-nearest-neighbor (k-NN) query is one of the most popularspatial query types for location-based services (LBS). In this paper, we focus onk-NN queries in time-dependent road networks, where the travel time betweentwo locations may vary significantly at different time of the day. In practice,it is costly for a LBS provider to collect real-time traffic data from vehiclesor roadside sensors to compute the best route from a user to a spatial object

The work described in this paper was partially supported by grants from City University ofHong Kong (Project No. 7002686 and 7002606) and the National Natural Science Foundationof China under Grant No. 61073185 and 61073038.

Detian ZhangDepartment of Computer Science and Technology, University of Science and Technology ofChina, Hefei, ChinaDepartment of Computer Science, City University of Hong Kong, Hong Kong, ChinaUSTC-CityU Joint Advanced Research Center, Suzhou, ChinaE-mail: [email protected]

Chi-Yin ChowDepartment of Computer Science, City University of Hong Kong, Hong Kong, ChinaE-mail: [email protected]

Qing LiDepartment of Computer Science, City University of Hong Kong, Hong Kong, ChinaUSTC-CityU Joint Advanced Research Center, Suzhou, ChinaE-mail: [email protected]

Xinming ZhangDepartment of Computer Science and Technology, University of Science and Technology ofChina, Hefei, ChinaUSTC-CityU Joint Advanced Research Center, Suzhou, ChinaE-mail: [email protected]

Yinlong XuDepartment of Computer Science and Technology, University of Science and Technology ofChina, Hefei, ChinaUSTC-CityU Joint Advanced Research Center, Suzhou, ChinaE-mail: [email protected]

2 Detian Zhang et al.

of interest in terms of the travel time. Thus, we design SMashQ, a server-sidespatial mashup framework that enables a database server to efficiently evaluatek-NN queries using the route information and travel time accessed from anexternal Web mapping service, e.g., Microsoft Bing Maps. Due to the expensivecost and limitations of retrieving such external information, we propose threeshared execution optimizations for SMashQ, namely, object grouping, directionsharing, and user grouping, to reduce the number of external Web mappingrequests and provide highly accurate query answers. We evaluate SMashQusing Microsoft Bing Maps, a real road network, real data sets, and a syntheticdata set. Experimental results show that SMashQ is efficient and capable ofproducing highly accurate query answers.

Keywords Spatial mashups · location-based services · k-nearest-neighborqueries · time-dependent road networks · Web mapping services

1 Introduction

With the ubiquity of wireless Internet access, GPS-enabled mobile devices andthe advance in spatial database management systems, location-based services(LBS) have been realized to provide valuable information for their users basedon their locations [15, 17]. LBS are an abstraction of spatio-temporal queries.Typical examples of spatio-temporal queries include range queries (e.g., “Howmany vehicles in a certain area”) [9, 20, 24] and k-nearest-neighbor (k-NN)queries (e.g., “What are the three nearest gas stations to my car”) [13,20,22,27].

The distance between two locations in a road network is measured in termsof the network distance, instead of the Euclidean distance, to consider the phys-ical movement constrains of the road network [24]. It is usually defined by thedistance of their shortest path. However, this kind of distance measure wouldhide the fact that the user may take longer time to travel to his/her nearestobject of interest (e.g., restaurant and hotel) than other ones due to many real-istic factors, e.g., heterogeneous traffic conditions and traffic accidents. Traveltime (e.g., driving time) is in reality a more meaningful and reliable distancemeasure for LBS in the road network [6, 7, 10]. Figure 1 depicts an examplewhere Alice wants to find the nearest clinic for emergency medical treatment.A traditional shortest-path based NN query algorithm returns Clinic X. How-ever, a driving-time based NN query algorithm returns Clinic Y because Alicewill spend less time to reach Y (3 mins.) than X (10 mins.).

Since travel time is highly dynamic, e.g., the driving time on a segment ofI-10 freeway in Los Angeles, USA between 8:30AM to 9:30AM changes from30 minutes to 18 minutes, i.e., 40% decrease in driving time [6], it is almostimpossible to accurately predict the travel time between two locations in theroad network based on their network distance. The best way to provide real-time travel time computation is to continuously monitor the traffic in the roadnetwork; however, it is difficult for every LBS provider to do so due to veryexpensive deployment cost.

Title Suppressed Due to Excessive Length 3

Clinic X

Clinic Y

Distance: 500 meters

Travel time: 10 mins

Distance: 1000 meters

Travel time: 3 mins

Alice: Where is my nearest clinic?

Fig. 1 Shortest-path-based nearest object X versus travel-time-based nearest object Y .

A mashup, one of the key Web 2.0 technologies, is a web application thatcombines data, representation, and/or functionality from multiple web appli-cations to create a new application [30]. A server-side spatial mashup (or GISmashup) provides a cost-effective way for a database server to access the routeand travel time information between two locations in the road network fromexternal Web mapping services, e.g., Google Maps [11], Yahoo! Maps [32],Microsoft Bing Maps [19] and government agencies. Web mapping serviceproviders have many sources to collect data (e.g., historical traffic statistics andreal-time traffic conditions) to estimate the travel time and direction betweenany two locations. The application scenario of this work is that location-basedservice (LBS) providers can subscribe direction services from Web mappingservices through their APIs to provide services for their users based on cur-rent traffic conditions. In other words, our proposed framework enables theLBS provider to outsource the real-time traffic monitoring and traffic analysisoperations to the Web mapping service provider, and thus, the LBS providercan focus on its core business activities and reduce its operational costs.

However, existing spatial mashups suffer from the following limitations.(1) It is costly to access direction information from aWeb mapping service, e.g.,retrieving travel time from the Microsoft MapPoint web service to a databaseengine takes 502 ms while the time needed to read a cold and hot 8 KB bufferpage from disk is 27 ms and 0.0047 ms, respectively [18]. (2) There is a chargeon the number of requests issued to a Web mapping service, e.g., Google Mapsallows only 2,500 requests per day for evaluation users and 100,000 requestsper day for business license users [12]. A location-based service provider needsto pay more to have higher usage limits. (3) The use of retrieved route informa-tion is restricted, e.g., the route information must not be pre-fetched, cached,or stored, except only limited amount of content can be temporarily stored forthe purpose of improving system performance [12]. (4) Existing Web mappingservices only support primitive operations, e.g., the travel direction and timebetween two locations. A database server has to issue a large number of exter-


nal requests to collect small pieces of information from the supported simpleoperations to process relatively complex spatial queries, e.g., k-NN queries.

In this paper, we design SMashQ, a framework that uses a server-sideSpatial Mashup paradigm to process k-NN Queries in the time-dependentroad network. Given a set of objects R and a k-NN query with a user U ’slocation, SMashQ finds at most k objects from R with the shortest traveltime from U ’s location. The objectives of SMashQ are to reduce the numberof external Web mapping requests and provide highly accurate query answers.To accomplish these objectives, SMashQ first prunes the objects that are defi-nitely not part of a query answer from R to find a set of candidate objects forthe query answer. Then, three shared execution optimizations, namely, objectgrouping, direction sharing, and user grouping, are designed for SMashQ. Themain idea of the object grouping optimization is to group candidate objects toa set of intersections I. For each intersection I ∈ I, one external Web mappingrequest is issued to retrieve the route and travel time from U to I, and thenthe travel time from I to each of its objects is estimated. To further reduce thenumber of external requests, the direction sharing optimization aims at usingthe route and travel time information from U to an intersection in I to findthat from U to other intersections in I. The user grouping optimization groupsqueries to a set of intersections. For each intersection, its queries are processedat the same time, as they can share the route and travel time information fromthe intersection.

To evaluate the performance of SMashQ, we compare SMashQ with twobasic approaches using a real road network, two real-world data sets, syn-thetic data sets, and Microsoft Bing Maps [19]. Experimental results showthat SMashQ outperforms the basic approaches in terms of the number ofexternal Web mapping requests and it provides highly accurate travel timeestimation and k-NN query answers.

The remainder of this paper is organized as follows. Section 2 highlightsrelated work. Section 3 describes the system model. Section 4 presents anexact algorithm. Section 5 describes the three shared execution optimizationsfor SMashQ. Experimental results are analyzed in Section 6. Finally, Section 7concludes this paper.

2 Related Work

In this section, we first highlight related work in the areas of spatial queries indistributed systems and shared execution for spatial queries. Then, we focuson the related work in the areas of spatial queries in road networks and spatialqueries with external data or services.

Spatial queries in distributed systems. Location-based query pro-cessing algorithms have been designed for various distributed systems, e.g.,client-server systems [20, 23], peer-to-peer systems [26], mobile peer-to-peersystems [4,34], and wireless sensor networks [8,31]. In the client-server systems,a centralized database server stores all objects and employs query execution


optimizations to process location-based queries issued by mobile users [20,23].A distributed spatial index structure has been designed for the peer-to-peersystem to process location-based queries [26]. When a distributed server doesnot have enough spatial object information to process a location-based query,it uses the spatial index structure to enlist other peer servers that store therequired information for help. Mobile peer-to-peer systems enable users tocache their nearby spatial objects to process location-based queries withoutenlisting any server for help [4,34]. Wireless sensor networks (WSNs) are morechallenging than peer-to-peer systems, because WSNs have many limitations(e.g., limited storage and computational capacity, scarce battery power, andrestricted communication ranges and bandwidth). Spatial query processing al-gorithms designed for WSNs have to be optimized for such limitations [8,31].

Shared execution for spatial queries. Shared execution techniquesare one of the widely used query execution optimizations for spatial queries. Ashared execution scheme has been used to enable a system to scale up to a largenumber of concurrent continuous range and k-nearest-neighbor queries [20,21].The basic idea is to group similar queries together and then evaluate a groupof queries at the same time through a spatial join between moving objectsand queries. Another shared execution scheme has been designed to optimizethe query execution by clustering moving objects and queries based on theircommon spatial and temporal properties [23]. However, none of existing sharedexecution techniques is designed for sharing direction information retrievedfrom a Web mapping service to process spatial queries.

Spatial queries in road networks. Existing location-based query pro-cessing algorithms can be categorized into two main classes: location-basedqueries in time-independent road networks and in time-dependent road net-works. (1) Time-independent road networks. In this class, location-basedquery processing algorithms assume that the cost or weight of a road segment,which can be in terms of distance or travel time, is constant (e.g., [14,24,25]).These algorithms mainly rely on pre-computed distance or travel time infor-mation of road segments in road networks. However, the actual travel time ofa road segment may vary significantly during different times of the day due todynamic traffic on road segments [6,7]. (2)Time-dependent road networks.The location-based query processing algorithms designed for time-dependentroad networks have the ability to support dynamic weights of road segmentsand topology of a road network, which can change with time. Location-basedqueries in time-dependent road networks are more realistic but also more chal-lenging. George et al. [10] proposed a time-aggregated graph, which uses timeseries to represent time-varying attributes. The time-aggregated graph can beused to compute the shortest path for a given start time or to find the beststart time for a path that leads to the shortest travel time. In [6,7], Demiryureket al. proposed solutions for processing k-NN queries in time-dependent roadnetworks where the weight of each road segment is a function of time.

Spatial queries with external data or services. In this paper, we focuson k-NN queries in time-dependent road networks. This paper distinguishesitself from previous work [6, 7, 10] that it does not model the underlying road


network based on different criteria. Instead, it employs third party Web map-ping services, e.g., Microsoft Bing Maps, to compute the travel time of roadnetworks and provide the route and travel time information through server-side spatial mashups. Since the use of external Web mapping requests is moreexpensive than accessing local data [18], we propose three shared executionoptimizations for SMashQ to reduce the number of such external requests andprovide highly accurate query answers.

There are some query processing algorithms designed to deal with expen-sive attributes that are accessed from external Web services (e.g., [1, 2, 18]).To minimize the number of external requests, these algorithms mainly focusedon using either some cheap attributes that can be retrieved from local datasources [1, 18] or sampling methods [2] to prune a whole set of objects intoa smaller set of candidate objects, and they only issue absolutely necessaryexternal requests for candidate objects. The closest work to this paper is [18],where Levandoski et al. developed a framework for processing spatial skylinequeries by pruning candidate objects that are guaranteed not to be part of aquery answer and only issuing an external request for each remaining candi-date object to retrieve the travel time from the user to the object. However,all previous work did not study how to use shared execution techniques andthe road network topology to reduce the number of external requests that isexactly the problem that we solved in this paper.

3 System Model

In this section, we describe our system architecture, road network model, andproblem definition. Figure 2 depicts the system architecture of SMashQ thatconsists of three entities, users, a database server, and a Web mapping serviceprovider. Users send k-NN queries to the database server at a LBS providerthrough their mobile devices at anywhere, anytime. The database server pro-cesses queries based on local data (e.g., the location and basic informationof restaurants) and external data (i.e., route and travel time information) ac-cessed from the Web mapping service provider. In general, accessing externaldata is much more expensive than accessing internal data [18].

A road map is modeled as a graph G = (V,E) comprising a set V ofvertices with a set E of edges, where each road segment is an edge and eachintersection of road segments is a vertex. For instance, Figure 3a depicts a

1: Location-based queries

Database Server(e.g., restaurant data)

(Low Cost Local Data)

(High Cost External Data)3: Route and travel time

2: Web mapping requests

Web Mapping Service Provider

SMashQUsers

4: Answers

Fig. 2 System architecture.


(a) A road map

I15

I14

I12

I11

I8

I7

I10

I6

I13

I9

I5

I1

I2

I4

Road segment Intersection

I3

(b) A graph model

Fig. 3 Road network model.

real road map that is modeled as an undirected graph (Figure 3b), where anedge represents a road segment (e.g., I1I2 and I1I5) and a square representsan intersection of road segments (e.g., I1 and I2).

Our problem can be defined as follows. Given a set of objects R and a NNquery Q = (λ, k) from a user U , where λ is U ’s location, and k is the maximumnumber of returned objects, SMashQ returns U at most k objects in R withthe shortest travel time from λ based on the route and travel time informationaccessed from a Web mapping service provider, e.g., Microsoft Bing Maps [19].Since the access to the Web mapping service is expensive, our objectives areto reduce the number of external Web mapping requests and provide highlyaccurate a k-NN query answer.

4 Exact Algorithm

In this section, we present an exact algorithm that produces accurate answersfor k-NN queries. Since there could be a very large number of objects in aspatial data set, it is extremely inefficient to issue an external Web mappingrequest to retrieve the route and travel time from a user to each object in thedata set. To this end, the exact algorithm employs the existing network ex-pansion technique designed for processing range queries in a road network [24]and existing pruning techniques designed for query processing with externaldata [1,2,18] to minimize the number of external Web mapping requests. Givenan underlying road network modeled as a graph G = (V,E), a spatial objectset R, a query Q = (λ, k) from a user U , and U ’s maximum movement speedU.max speed, the exact algorithm performs an incremental search from U tofind an accurate k-NN query answer.


Algorithm 1 Exact Algorithm1: input: G = (V,E), R, Q = (λ, k), U.max speed2: Initialize a current answer set A, a node set S, and a priority queue Q3: distmax ←∞; Tmax ←∞4: Ni ← the nearest node from U5: NDist← Dist(U,Ni)6: Q.enqueue(Ni, NDist); S ← {Ni}7: while Q is not empty do8: (Ni, NDist)← Q.dequeue()9: if A contains k objects and NDist > distmax then10: break11: end if12: if Ni is an object (i.e., Ni ∈ R) then13: Issue an external Web mapping request to retrieve T ime(U → Ni)14: if A contains less than k objects then15: A ← A∪ {Ni}16: if A contains k objects then17: Tmax ← the largest travel time of the objects in A18: distmax ← U.max speed× Tmax

19: end if20: else if Time(U → Ni) < Tmax then21: Replace the object with Tmax with Ni in A22: Tmax ← the largest travel time of the objects in A23: distmax ← U.max speed× Tmax

24: end if25: end if26: for Each neighbor node Nj of Ni, where Nj /∈ S do27: Q.enqueue(Nj , NDist+Dist(Ni, Nj)); S ← S ∪ {Nj}28: end for29: end while30: return A

Algorithm 1 depicts the pseudo code of the exact algorithm. It considersthe intersections and objects in the graph as two types of nodes that will beincrementally inserted into a priority queue Q. Each entry in Q is defined as(Ni, NDist), where Ni is a node identifier and NDist is the network distancefrom U to Ni.

When we dequeue Q, the node with the smallest distance is returned andremoved from Q. The dequeue operation of Q returns the node with the small-est network distance from U and removes it from Q. Without loss of generality,we assume that U can only U-turn at an intersection. Initially, U ’s nearest nodeNi with their distance Dist(U,Ni) is enqueued to Q (Lines 4 to 6 in Algo-rithm 1). For each iteration, a node (Ni, NDist) is dequeued from Q. Ni isprocessed based on the following two cases:

Case 1: Ni is an intersection. For each neighbor node Nj of Ni thathas not been enqueued to Q, Nj is enqueued to Q along with the networkdistance from U to Nj , i.e., Dist(Ni, Nj) +NDist, where Dist(Ni, Nj) is thedistance from Ni to Nj (Lines 26 to 28 in Algorithm 1).

Case 2: Ni is an object. An external Web mapping request is issuedto retrieve the travel time from U to Ni, Time(U → Ni). If a current an-swer A contains less than k objects, Ni is inserted into A (Lines 15 to 18


Object

I15I14

I12I11

I8I7

I10

I6

I13

I9

I5

I1 I2 I4

R1

I3

R2

R10

R9

R8

R7

R6R5

R4

R3U

User Intersection

2

1

3

3

1

1

5

2

1

3

3

2

2

1

2

1

2

3

3

3 3 3

1 2

2 132 1

3 6

3

Fig. 4 An example for the exact algorithm.

in Algorithm 1). If Ni is the k-th object inserted into A, we set the largesttravel time of the objects in A to Tmax. In case that A contains k objectsand Time(U → Ni) is less than the largest travel time of the objects in A(i.e., Tmax), the object with Tmax is replaced with Ni and Tmax is updatedaccordingly (Lines 21 to 23 in Algorithm 1). Then, the neighbor nodes of Ni

are processed as in Case 1.

The algorithm repeatedly dequeues nodes from Q until an accurate answeris found. Whenever Tmax is updated, we calculate the maximum possible net-work distance distmax that U can travel within the time interval Tmax (i.e.,distmax = U.max speed × Tmax). For each iteration, if the network distanceof the node dequeued from Q is larger than distmax, the algorithm terminatesand returns the current answer A to U (Line 9 in Algorithm 1). This is be-cause the travel time from U to any unprocessed object is larger than Tmax,and thus, it cannot be part of A.

Example. Figure 4 shows a running example for the exact algorithm,where a user U issues a query Q = (λ, k = 2), U.λ is represented by a triangle,10 objects (i.e., R = {R1, . . . , R10}) are represented by circles, the distanceof each edge is indicated by a number beside it, and the maximum movementspeed U.max speed is 25 m/s (i.e., 90 km/h). Since U is moving towards I6 andthe distance from U to I6 is 1 km, (I6, 1) is enqueued to an empty Q. Figure 5depicts the 12 iterations performed by the algorithm to find an accurate k-NNquery answer.

1. The algorithm dequeues (I6, 1) from Q. Since I6 is an intersection, itsneighbor nodes (i.e., (R5, 4), (I5, 4), and (I10, 4)) are enqueued to Q.

2. (R5, 4) is dequeued. SinceR5 is an object, an external Web mapping requestis issued to get the travel time from U to R5, Time(U → R5) = 350seconds. Since the current answer A is empty, R5 is simply inserted into


Node Ni Time(U®Ni) Priority Queue CurrentAnswer A Tmax distmax

- - (I6,1) - ¥ ¥

(I6,1) - (R5,4), (I5,4), (I10,4) - ¥ ¥

(R5,4) 350 s (I5,4), (I10,4), (R4,5) R5 ¥ ¥

(I5,4) - (I10,4), (R4,5), (R2,6), (I1,9) R5 ¥ ¥

(I10,4) - (R4,5), (R1,5), (R2,6), (I11,7), (I14,7), (I1,9) R5 ¥ ¥

(R4,5) 500 s (R1,5), (R2,6), (I2,6), (I11,7), (I14,7), (I1,9) R5, R4 500 s 12.5 km

(R2,6) 200 s (I2,6), (I11,7), (I14,7), (I9,7), (I1,9) R5, R2 350 s 8.75 km

(R1,5) 400 s (R2,6), (I2,6), (I11,7), (I14,7), (I9,7), (I1,9) R5, R1 400 s 10 km

(I2,6) - (I11,7), (I14,7), (I9,7), (I1,9), (I3,9) R5, R2 350 s 8.75 km

(I11,7) - (I14,7), (I9,7), (R9,8), (I1,9), (I3,9), (R10,9) R5, R2 350 s 8.75 km

(I14,7) - (I9,7), (R9,8), (I1,9), (I3,9), (R10,9), (I13,10), (I15,13) R5, R2 350 s 8.75 km

(I9,7) - (R9,8), (I1,9), (I3,9), (R10,9), (I13,10), (I15,13) R5, R2 350 s 8.75 km

(R9,8) 600 s (I1,9), (I3,9), (R10,9), (I13,10), (R8,10), (I15,13) R5, R2 350 s 8.75 km

Fig. 5 The intermediate steps of the exact algorithm.

A. R4 is the only neighbor of R5 that has not been processed by thealgorithm, so (R4, 5) is enqueued to Q.

3. (I5, 4) is dequeued fromQ and two of its neighbor nodes, (R2, 6) and (I1, 9),are enqueued to Q.

4. This iteration dequeues another intersection (I10, 4) from Q and enqueuesthree of its neighbor nodes (R1, 5), (I11, 7), and (I14, 7) to Q.

5. The algorithm dequeues (R4, 5) from Q and issues an external request forR4 to get Time(U → R4) = 500 seconds. Since A contains less than k = 2objects, R4 is inserted into A. Then, Tmax is set to the largest travel timeof the objects in A, i.e., 500 seconds. As U.max speed = 25 m/s, distmax

is set to 25m/s × 500s=12.5 km. One of R4’s neighbor nodes, i.e., (I2, 6),is enqueued to Q.

6. Since (R1, 5) is an object, an external request is issued for R1. AsTime(U → R1) = 400 is less than Tmax, R4 is replaced with R1. Then,Tmax and distmax are updated to 400 seconds and 25 × 400 = 10 km, re-spectively. One of R1’s neighbor node, I9, has not been processed by thealgorithm, so (I9, 7) is enqueued to Q.

7. (R2, 6) is dequeued from Q and an external request is issued. SinceTime(U → R2) = 200 is less than Tmax, R1 is replaced with R2 in A. Then,Tmax and distmax are updated to 350 seconds and 25 × 350 = 8.75 km,respectively. As all the neighbor nodes of R2 have been processed by thealgorithm, no node is enqueued to Q.

8. (I2, 6) is dequeued from Q. (I3, 9) is enqueued to Q.9. (I11, 7) is dequeued from Q. (R9, 8) and (R10, 9) are enqueued to Q.10. (I14, 7) is dequeued from Q. (I13, 10) and (I15, 13) are enqueued to Q.11. (I9, 7) is dequeued from Q and no node is needed to be enqueued to Q.12. The algorithm dequeues (R9, 8) from Q and issues an external request for

R9 to get Time(U → R9) = 600. Since Time(U → R9) is larger than


Tmax = 350, no update is needed for A. One of R9’s neighbor node, i.e.,(R8, 10), is enqueued to Q.

After the above 12 iterations, (I1, 9) is dequeued from Q. SinceNDist(U, I1) = 9 is larger than distmax = 8.75, the algorithm terminatesand returns A = {R5, R2} to U as a query answer.

5 SMashQ

Although the exact algorithm can prune a large number of objects to reducethe number of external Web mapping requests, if the object density in thevicinity of a querying user is very high, the database server would still need toissue a large number of external requests. To this end, we propose three sharedexecution optimizations, namely, object grouping, direction sharing, and usergrouping, for SMashQ based on the road network topology to further reducethe number of external requests. This section is organized as follows: Sec-tion 5.1 presents the object grouping optimization for SMashQ. Section 5.2extends SMashQ to support the direction sharing optimization. Section 5.3discusses the user grouping optimization that further improves SMashQ. Fi-nally, Section 5.4 analyzes the performance of our algorithms.

5.1 Object Grouping Optimization

We observe that many spatial objects are generally located in clusters in realworld. For example, many restaurants are located in a downtown area (Fig-ure 3a). Thus, it makes sense to group nearby objects to a representative point.The database server issues only one external request to the Web mapping ser-vice for an object group. Then, it shares the retrieved travel time and routeinformation from a user to the representative point among the objects in thegroup to estimate the travel time from the user to each object.

There are two challenges in grouping objects: (a) How to select representa-tive points in a road network? and (b) How to group objects to representativepoints? Our solution is to select intersections as representative points in a roadnetwork and group objects to them. The reason is twofold: (1) Since there isonly one possible path from the intersection of a group G to each object in G,it is easy to estimate the travel time from the intersection to each object inG. (2) A road segment should have the same conditions, e.g., speed limit anddirection. The estimation of travel time would be more nature and accurateby considering a road segment as a basic unit [16].

Figure 6 gives an example for object grouping where objects R8, R9 andR10 are grouped together by intersection I11. The database server only needsone external Web mapping request to retrieve the route information and traveltime from U and I11, and then it estimates the travel time from I11 to each ofits group members.


Object

I15

I14

I12

I11

I8

I7

I10

I6

I13

I9

I5

I1

I2

I4

I3

R2

R10

R9

R8

R7

R6R

5

R4

R3U

User Intersection

Intersection of an object group

R1

Fig. 6 Object grouping optimization.

5.1.1 Greedy Object Grouping Algorithm

To minimize the number of external Web mapping requests, we should find aminimal set of intersections that cover all road segments having target objects.Given a k-NN query issued by a user U and a set R of candidate objects, whichis computed by Algorithm 2, a subgraph Gu = (Vu, Eu) is extracted from thegraph G of the underlying road network model, where Eu is an edge set whereeach edge contains at least one candidate object inR and Vu is a vertex set thatconsists of the intersections of the edges in Eu. Algorithm 2 is similar to theexact algorithm (Algorithm 1), except that after Algorithm 2 issues k externalWeb mapping requests for the k-nearest objects of U to find an initial answer,and then computes Tmax and distmax for the initial answer (Lines 13 to 18in Algorithm 2). After determining the initial answer, Algorithm 2 does notissue any external Web mapping requests for other candidate objects. Instead,Algorithm 2 adds such candidate objects to a candidate object set R (Line 21in Algorithm 2). The greedy object grouping algorithm returns a vertex setI ⊆ Vu that is a minimal one covering all the edges in Eu. The object groupingproblem is the vertex cover problem [5], which is NP-complete, so we use agreedy algorithm to find I.

Algorithm. Algorithm 3 depicts the pseudo code of a greedy object group-ing algorithm. The input consists of the graph G of the underlying road net-work model and a set R of candidate objects, which are found by Algorithm 2,for a k-NN query issued by a user U . The algorithm first extracts a subgraphGu from G and calculates the number of edges connected to a vertex as thedegree of that vertex (Lines 3 to 4 in Algorithm 3). Then, it selects a vertexv from Vu with the largest degree to I. In case of a tie, the vertex with theshortest distance to U is selected. Each object on each edge connected to v


Algorithm 2 Candidate object search algorithm1: input: G = (V,E), R, Q = (λ, k), U.max speed2: Initialize a current answer set A, a node set S, a priority queue Q, and a candidate

object set R3: distmax ←∞; Tmax ←∞4: Ni ← the nearest node from U5: NDist← Dist(U,Ni)6: Q.enqueue(Ni, NDist); S ← {Ni}7: while Q is not empty do8: (Ni, NDist)← Q.dequeue()9: if A contains k objects and NDist > distmax then10: break11: end if12: if Ni is an object (i.e., Ni ∈ R) then13: if A contains less than k objects then14: Issue an external Web mapping request to retrieve Time(U → Ni)15: A ← A∪ {Ni}16: if A contains k objects then17: Tmax ← the largest travel time of the objects in A18: distmax ← U.max speed× Tmax

19: end if20: else21: R← R∪ {Ni}22: end if23: end if24: for Each neighbor node Nj of Ni, where Nj /∈ S do25: Q.enqueue(Nj , NDist+Dist(Ni, Nj)); S ← S ∪ {Nj}26: end for27: end while28: return R

Algorithm 3 Greedy object grouping algorithm1: input: Graph G = (V,E) and a set of candidate objects R from Algorithm 22: I ← {∅}3: Extract a subgraph Gu = (Vu, Eu) from G = (V,E) such that each edge in Eu contains

some objects in R and Vu contains the intersections of the edges in Eu

4: Calculate the degree of each vertex in Vu

5: while Vu is NOT empty do6: v ← the vertex with the largest degree in Vu

7: Assign each object on each edge connected to v to the object group of v8: Insert v into I and remove v from Vu

9: Remove edges connected to v from Eu

10: Update the degree of each vertex in Vu

11: Remove the vertex without any edge from Vu

12: end while

is assigned to the object group of v. Note that each candidate object is as-signed to one or two object groups. v is removed from Vu and the edges ofv are removed from the Eu. After that, the degrees of the vertices of the re-moved edges are updated accordingly. The selection process is repeated untilVu becomes empty (Lines 5 to 12 in Algorithm 3).

Example. Figures 7 and 8 depict an example for the greedy object group-ing algorithm, where a subgraph Gu = (Vu, Eu) extracted from G, as depicted


I15I14

I12I11

I8I7

I10

I6

I13

I9

I5

I1 I2 I4I3

R1

R2

R10

R9

R8

R6R5

R4

R3U

ObjectUser Intersection

R7

Intersection of a group

Current Answer={R4, R5}

Fig. 7 Example of the object grouping optimization.

I12

I11

I7

I5

I9

d=2

d=1

d=1

d=2

d=1

(a) I = { I11 }

I12

I7

I5

I9

d=2

d=1

d=0

d=0

(b) I = { I11, I9 }

I10

d=1

I10

d=1

I5

d=0

(c) I = { I11, I9 }

I10

d=0

Fig. 8 Example of the greedy object grouping algorithm.

in Figure 8a. Note that since R4 and R5 is the current answer A, which isfound by Algorithm 2, the route and travel time from U to R4 and R5 havebeen retrieved from the Web mapping service. Thus, R4 and R5 are not can-didate objects in R = {R1, R2, R8, R9, R10} (Figure 7), which are found byAlgorithm 2. Since edges I5I9, I9I10, I7I11 and I11I12 contain some candidateobjects in R, these four edges constitute Eu and their vertices constitute Vu.Then, the greedy algorithm calculates the degree for each vertex in Vu. Forexample, the degree of I11 is two because two edges I7I11 and I11I12 are con-nected to I11. Figure 8a shows that I11 has the largest degree and is closer toU than I9, I11 is selected to I and the edges connected to I11 are removed fromGu. Figure 8b shows that the degrees of I7 and I12 of the two deleted edgesare updated accordingly. Any vertex in Gu without any edge (i.e., I7 and I12)is removed from Vu. Since I9 has the largest degree, I9 is selected to I andthe edges connected to I9 are removed. Since I5 and I10 have no edge, theyare removed from Vu. Finally, Vu becomes empty, so the greedy algorithm ter-minates and clusters the candidate objects into two object groups (indicatedby dotted rectangles), I9 = {R1, R2} and I11 = {R8, R9, R10}, as indicated inFigure 7.


5.1.2 Travel Time Estimation for Grouped Objects

Most Web mapping service providers, e.g., Microsoft Bing Maps, returnturn-by-turn route information for a request. For example, if the databaseserver sends a request to the Web mapping service to retrieve the routeand travel time information from user U to intersection I9 in Figure 7,the service returns the turn-by-turn route information, i.e., Route(U →I9) = {⟨Dist(U → I6), T ime(U → I6)⟩, ⟨Dist(I6 → I5), T ime(I6 → I5)⟩,⟨Dist(I5 → I9), T ime(I5 → I9)⟩}, where Dist(A → B) and Time(A → B) arethe distance and travel time from location A to location B, respectively.

If a group contains only one object, the database server simply issues oneexternal request to retrieve the route information from the user to that object.Otherwise, our algorithm performs two steps to estimate the travel time foreach object and select the best route (if appropriate) as follows.

Step 1: Travel time calculation. Based on whether a candidate objectis on the last road segment of a retrieved route, we can distinguish two cases.

Case 1: An object is on the last road segment. In this case, we assumethat the speed of the last road segment is constant. For example, given theinformation of the route from U to I9 (i.e., Route(U → I9)) retrieved from theWeb mapping service, object R2 is on the last road segment of the route, i.e.,I5I9. Hence, the travel time from U to R2 is calculated as: Time(U → R2) =

Time(U → I9)− Time(I5 → I9)× Dist(R2→I9)Dist(I5→I9)

.

Case 2: An object is NOT on the last road segment. In this case, a candi-date object is not on a retrieved route, i.e., the object is on a road segment Sconnected to the last road segment S′ of the route. We assume that the travelspeed of S is the same as that of S′. For example, given the retrieved informa-tion of the route from U to I9 (i.e., Route(U → I9)), object R1 is not on the lastroad segment of the route, i.e., I5I9. Hence, the travel time from U to R1 is cal-

culated as: Time(U → R1) = Time(U → I9) + Time(I5 → I9)× Dist(I9→R1)Dist(I5→I9)

.

Step 2: Route selection. The greedy object grouping algorithm mayassign a candidate object to two object groups. In this case, the databaseserver can find the routes from U to the object via each of the intersections.This step selects the route with the shortest travel time. For example, if boththe intersections of edge I9I10, i.e., I9 and I10, are selected (Figure 7), the traveltime from user U to R1 is selected as Time(U → R1) = min(Time(U → I9 →R1), T ime(U → I10 → R1)).

5.1.3 Algorithm

Algorithm 4 depicts the pseudo code of SMashQ with the object groupingoptimization. It first executes Algorithm 2 to find a candidate object set R,a current answer A, the maximum travel time Tmax of the objects in A, andthe maximum travel distance distmax for a querying user U moving at itsmaximum possible speed (i.e., U.max speed) for time period Tmax (Line 2 inAlgorithm 4). Then, it executes the greedy object grouping algorithm (i.e., Al-gorithm 3) to find an intersection set I. The intersections in I are sorted based


Algorithm 4 SMashQ with the object grouping optimization1: input: G = (V,E), R, Q = (λ, k), U.max speed2: Execute Algorithm 2 to find a set of candidate objects R, a current answer A, Tmax

and distmax

3: Execute the Greedy Object Grouping Algorithm to get a set of intersections I (Algo-rithm 3)

4: I is sorted according to the intersection’s network distance from U in non-decreasingorder

5: while I is not empty do6: Ii ← the intersection in I with the shortest network distance from U7: Issue a Web mapping request to retrieve Route(U → Ii) and T ime(U → Ii)8: for Each object Rj in the object group of Ii do9: Estimate the time from U to Rj , i.e., T ime(U → Rj)10: if Time(U → Rj) < Tmax then11: Replace the object with Tmax in A with Rj

12: Tmax ← max{Time(U → Rl)|Rl ∈ A}13: distmax ← U.max speed× Tmax

14: for Each object Rk ∈ R with the shortest network distance from U > distmax

do15: Remove Rk from R and its associated object group(s)16: end for17: end if18: end for19: Remove Ii and the intersection with an empty object group from I20: end while21: return A

on their shortest network distance from U in non-decreasing order (Line 4 inAlgorithm 4). The algorithm repeatedly processes the intersection with theshortest network distance Ii ∈ I from U because Ii has a higher chance tofind objects that would be part of a query answer and prune more objectsin R. For each Ii, the algorithm issues an external Web mapping requestto retrieve the route and travel time information from U to Ii, denoted asRoute(U → Ii) and Time(U → Ii), respectively (Line 7 in Algorithm 4). Foreach object Rj in the object group of Ii, the travel time from U to Rj isestimated. If Time(U → Rj) < Tmax, the object with Tmax in A is replacedwith Rj . Tmax and distmax are then updated accordingly. After that, for eachobject Rk ∈ R with the shortest network distance from U larger than distmax

(i.e., distmax = U.max speed×Tmax), Rk is pruned from R and its associatedobject group(s) (Lines 14 to 16 in Algorithm 4). Ii and the intersection withan empty object group are removed from I. This pruning process is repeateduntil I becomes empty (Lines 5 to 20 in Algorithm 4).

In the running example, the exact algorithm has to issue five externalWeb mapping requests, i.e., U → R1, U → R2, U → R4, U → R5, andU → R9.. With the object grouping optimization, SMashQ only needs to issuefour requests, i.e., U → R4, U → R5, U → I9, and U → I11.


5.2 Direction Sharing

As described in Section 5.1, the object grouping optimization can effectivelyreduce the number of external Web mapping requests. To further reduce thenumber of requests, we design the direction sharing optimization to furtherboost shared execution in SMashQ. This section is organized as follows. Wepresent the main idea and challenges of the direction sharing optimization (Sec-tion 5.2.1), its technical details (Section 5.2.2), and the algorithm of SMashQwith the object grouping and direction sharing optimizations (Section 5.2.3).

5.2.1 Main Idea and Challenges

Most Web mapping service providers will return the route Route(A → B) withthe shortest travel time, the turn-by-turn route and travel time information fora request from location A to location B. Let X be an intermediate intersectionin Route(A → B), i.e., T = Route(A . . . → X → . . . B). We observe that everyprefix in T , i.e., Route(A → X), is also the route with the shortest travel timefrom A to X. The correctness of this observation is shown in the followingtheorem.

Theorem 1 Given a route T = Route(A . . . → X → . . . B), where X is anintermediate intersection in T , with the shortest travel time from location Ato location B, every prefix in T (i.e., Route(A → X)) is also the route withthe shortest travel time from location A to intersection X.

Proof A simple cut-and-paste argument is used to prove this theorem [5].Suppose that there is a route T ′ = Route(A . . . → Y → . . . X), where Y /∈ T ,with a travel time shorter than the prefix Route(A → X) in T , then we couldcut Route(A → X) out of T and paste in T ′, resulting in a route with a shortertravel time than T , thus contradicting the optimality of T . ⊓⊔

Based on Theorem 1, given a set of intersections I computed by the greedyobject grouping algorithm for a k-NN query issued from a user U , we shouldfirst process the intersection Ii ∈ I such that the direction information fromU to Ii can be shared with the largest subset I ′ of other intersections in I(i.e., Ii has the highest sharing ability). The sharing ability of an intersection

Ii ∈ I is measured as |Set(Route(U→Ii))∩I||I| , where Set(Route(U → Ii)) is a set

of intersections in Route(U → Ii). The direction information from U to Ii willbe used to determine the route and travel time from U to each intersection inI ′ without issuing additional external Web mapping requests.

Consider an example that the greedy object grouping algorithm finds I ={I5, I6, I8, I9, I11, I15} which are represented by solid squares (Figure 9). If weknow that Route(U → I9) passes through I1, I2, I5, I6, and I9, its directioninformation can be shared with intersections I5, I6, and I9, and thus the

sharing ability of I9 is|{I5,I6,I9}|

6 = 36 = 0.5. In this example, only three external

Web mapping requests are needed, i.e., U → I8, U → I9, and U → I15, becausetheir direction information can be shared with all other intersections in I.


I15I14

I12I11

I8I7

I10

I6

I13

I9

I5

I1 I2 I4I3

U

User Intersection

Intersection of an object group

Route(U I9)

Route(U I8)

Route(U I11)

Fig. 9 Direction sharing optimization.

Since the traffic condition would be very dynamic, as discussed in Section 1,we could not accurately predict the route between two locations. We will de-scribe a histogram approach (Section 5.2.2) that is employed to estimate thesharing ability for an intersection based on the historical route informationretrieved from the Web mapping service.

5.2.2 Histogram Approach

In this section, we describe a histogram approach that is used to maintainhistorical route information and provide a heuristic measure of sharing abilityfor an intersection. Since querying users are mobile users, it is impossibleto store an entry from every possible location to an intersection. Coupledwith the fact that the sharing ability measure is not based on the queryinguser’s location, the histogram approach only maintains the historical routeinformation from one intersection to another one. Given a querying user U whois moving towards an intersection Iu, Iu is considered as a starting locationfor the sharing ability measure.

The histogram approach basically maintains a two-dimensional matrixH inwhich an entry H[Ii, Ij ] (1 ≤ i, j ≤ |V |, where V is the number of intersectionsin the graph G of the road network model) stores the number of intersectionsfrom intersection Ii to intersection Ij , inclusive of Ii and Ij , based on the latestroute information from Ii to Ij . The rationale behind the histogram approachis that H[Ii, Ij ] is used to estimate the sharing ability from Ii to Ij becausea route with more intersections has a higher chance to be shared with otherintersections. Initially, if Ii = Ij , H[Ii, Ij ] = 1; otherwise, H[Ii, Ij ] = −1.

Given a route from Iu to Id, we can distinguish two cases for the sharingability measure based on the value of H[Iu, Id]:

Case 1: H[Iu, Id] > 0. The sharing ability of Id is simply equal to H[Iu, Id].


Algorithm 5 SMashQ with the object grouping and direction sharing opti-mizations1: input: G = (V,E), R, Q = (λ, k), U.max speed, H2: Execute Algorithm 2 to find a set of candidate objects R, a current answer A, Tmax

and distmax

3: Execute the Greedy Object Grouping Algorithm to get a set of intersections I (Algo-rithm 3)

4: I is sorted according to the intersection’s sharing ability in non-decreasing order5: while I is not empty do6: Ih ← the intersection in I with the highest sharing ability7: Issue a Web mapping request to retrieve Route(U → Ih) and Time(U → Ih)8: I′ ← the intersections included in Set(Route(U → Ih)) ∩ I9: for Each intersection Ii ∈ I′ do10: for Each object Rj in the object group of Ii do11: Estimate the time from U to Rj , i.e., Time(U → Rj)12: if T ime(U → Rj) < Tmax then13: Replace the object with Tmax in A with Rj

14: Tmax ← max{Time(U → Rl)|Rl ∈ A}15: distmax ← U.max speed× Tmax

16: for Each object Rk ∈ R with the shortest network distance from U >distmax do

17: Remove Rk from R and its associated object group(s)18: end for19: end if20: end for21: end for22: Remove Ii, the intersection with an empty object group, and I′ from I23: Update H based on Route(U → Ih) accordingly24: end while25: return A

Case 2: H[Iu, Id] = −1. Since the histogram has no historical route informa-tion from Iu to Id, we employ a shortest path algorithm [5] (e.g., Dijkstra’salgorithm and A∗ algorithm) to find the shortest network path from Iu andId. Then, H[Iu, Id] and the sharing ability of Id are set to the number ofintersections in the shortest network path, inclusive of Iu and Id.After SMashQ receives the route and travel time information from U toan selected intersection Id (i.e., Route(U → Id) and Time(U → Id)),it updates the entry in H corresponds to the route from the first inter-section to Id in Route(U → Id) and every entry in H corresponds toeach prefix route in Route(U → Id). For example, the database server re-ceives Route(U → I9) = {⟨Dist(U → I6), T ime(U → I6)⟩, ⟨Dist(I6 →I2), T ime(I6 → I2)⟩, ⟨Dist(I2 → I1), T ime(I2 → I1)⟩, ⟨Dist(I1 →I5), T ime(I1 → I5)⟩, ⟨Dist(I5 → I9), T ime(I5 → I9)⟩} from the Web map-ping service provider, it will update H[I6, I2] = 2, H[I6, I1] = 3, H[I6, I5] = 4,and H[I6, I9] = 5.

5.2.3 Algorithm

Algorithm 5 depicts the pseudo code of SMashQ with the object group-ing and direction sharing optimizations. Similar to Algorithm 4, Algorithm 5


executes Algorithm 2 to find a set R of candidate objects, a current answerA, the maximum travel time Tmax of the objects in A, and the maximumtravel distance distmax for a querying user U moving at its maximum possi-ble speed (i.e., U.max speed) for time period Tmax (Line 2 in Algorithm 5).Then, Algorithm 3 is executed to find a set of intersections I. There are threemodifications in the pruning process for the direction sharing optimization.

The first modification is that the intersections in I are sorted by their shar-ing ability in non-decreasing order, instead of their shortest network distancefrom U in Algorithm 4 (Line 4 in Algorithm 5), and the algorithm repeatedlyprocesses the intersection Ih with the highest sharing ability (Line 6 in Algo-rithm 5). Since the route and travel time information retrieved for a requestfrom U to Ih can also be shared with the prefix intersections included in I, thesecond modification is to use such information to process the object groups ofa set intersections I ′ included in both the set of intersection in Route(U → Ih)and I (i.e., I ′ = Set(Route(U → Ih))∩I) (Lines 8 to 22 in Algorithm 5). Thethird modification is to update the histogram H based on the retrieved routeinformation, i.e., Route(U → Ih), accordingly.

5.3 User Grouping

To further reduce the number of external Web mapping requests for a databaseserver with a high workload, e.g., a large number of users or continuous queries,we design another shared execution for grouping users. Similar to object group-ing, we consider intersections in the road network as representative points. Thedatabase server only needs to issue one external request from the intersectionof a user group Gu to the intersection of an object group Go to estimate thetravel time from each user Ui in Gu to each object Rj in Go.

The key difference between user grouping optimization and object groupingoptimization is that users are moving. Grouping users has to consider theirmovement direction, so a user is grouped to the nearest intersection to whichthe user is moving. Figure 10 depicts an example, where user A on edge I9I10is moving towards I10, so A is grouped to the user group of I10. Similarly,users D and E are also grouped to the user group of I10. Users C and B onedges I1I2 and I6I2, respectively, are both moving to I2, so they are groupedto the user group of I2.

After having grouped users, we process queries in terms of user groupsinstead of users. The database server finds an overall query answer A for allusers in a user group, and then extract the query answer from A for each userin the group. For example, given a user group Gu and the intersection Iu ofGu, the database server first search the kmax nearest objects to Iu, where kmax

is the largest value of k of the k-NN queries issued by the users in Gu. Then,kmax and Iu are used to replace k and U in the algorithm of SMashQ with theobject grouping optimization (i.e., Algorithm 4) or the algorithm of SMashQwith the object grouping and direction sharing optimizations (Algorithm 5) tofind an overall query answer A for all the users in Gu. For the ki-NN query of


each user Ui in Gu, the database server takes the ki objects with the shortesttravel time in A as Ui’s query answer Ai, and estimates the travel time fromthe user’s location to Iu as in Section 5.1.2 to calculate the travel time from Ui

to each object in Ai. Therefore, each query in Gu can be effectively processed.

5.4 Performance Analysis

We now evaluate the performance of our algorithms. Since accessing traveltime and direction information from an external Web mapping service is moreexpensive than local data processing [18], we compare the performance of ouralgorithms in terms of the number of external Web mapping requests. Let Mand N be the number of querying users and the average number of candidateobjects per query, respectively.

– The exact algorithm needs to issue N ×M external Web mapping queriesto answer all M queries; hence, its cost is CostExact = N ×M

– By using the object grouping optimization, SMashQ can reduce the averagenumber of candidate objects from N to N ′ through grouping objects tointersections, where N ′ ≤ N ; hence, CostO = N ′ ×M .

– If SMashQ employs the object grouping and direction sharing optimizationtechniques, it can further reduce N ′ to N ′′ by sharing the direction infor-mation among intersections, where N ′′ ≤ N ′; hence, CostOD = N ′′ ×M .

– When SMashQ employs all the optimization techniques (i.e., object group-ing, direction sharing, and user grouping), it can reduce M to M ′ by group-ing users to intersections, where M ′ ≤ M ; hence, CostODU = N ′′ ×M ′.

I15I14

I12I11

I8I7

I10

I6

I13

I9

I5

I1 I2 I4I3

R1

R2

R10

R9

R8

R6R5

R4

R3

A

ObjectUser Intersection

R7

Intersection of a user group

B

C

D

E

Fig. 10 User grouping optimization.


In general, N ′′ ≪ N ′ ≪ N and M ′ ≪ M , so CostExact ≪ CostO ≪CostOD ≪ CostODU . We will verify our performance analysis through theexperiments in Section 6.

6 Performance Evaluation

In this section, we first give our evaluation model in Section 6.1, including abasic approach, performance metrics, and default experiment settings. Then,we analyze the experiment result in Section 6.2.

6.1 Evaluation Model

Basic approach. To our best knowledge, there is no other existing algorithmsabout k-NN queries using spatial mashups in time-dependent road networks.We design a basic approach (denoted as BasicExact), as proposed in our previ-ous work [33]. The basic idea of BasicExact is to issue external Web mappingrequests for the k-nearest objects to find an initial answer, calculate distmax

based on the largest travel time in the initial answer and the user’s maximummovement speed, and find other objects within the network distance distmax

from the user as a candidate object set R for a query answer. Then, BasicEx-act issues one external Web mapping request from the querying user to eachcandidate object in R to get an exact k-NN query answer. We also considerthe exact algorithm as another basic approach (denoted as Exact) to evaluatethe performance of our SMashQ with only the object grouping and directionsharing optimizations (denoted as SMashQ-OD) and our SMashQ with all thethree shared execution optimizations (i.e., object grouping, direction sharing,and user grouping) (denoted as SMashQ-ODU).

Performance metrics. We evaluate the performance of the basic ap-proaches and our SMashQ algorithms in terms of four metrics. (1) The aver-age number of external Web mapping requests per query and (2) the averageresponse time per query are metrics evaluating the effectiveness of our algo-rithms. The response time of a query is defined as the sum of the local CPUprocessing time and the response time from Microsoft Bing Maps (includ-ing the communication cost and the CPU processing time at Microsoft BingMaps). It is calculated as the sum of its local CPU query processing timeand the multiplication of the number of external requests and the average re-sponse time per external request. From our experimental results, the averageresponse time per external request from Microsoft Bing Maps is 14.3 millisec-onds. (3) The accuracy of estimated travel time (denoted as Acc T ime) is ametric computed by:

Acc T ime = 1−min

(|T − T |

T, 1

), (1)


(a) The restaurant data (b) The hotel data

Fig. 11 Two real data sets.

where T and T are the actual travel time (retrieved by the exact algorithm)and the estimated one, respectively, and 0 ≤ Acc T ime ≤ 1. (4) The accuracyof k-NN query answers (denoted as Acc Ans) evaluates the accuracy returnedby our SMashQ algorithms is calculated by:

Acc Ans =|A ∩A||A|

, (2)

where A is an exact query answer returned by the exact algorithm and A is aquery answer returned by our SMashQ algorithms and 0 ≤ Acc Ans ≤ 1.

Experiment settings. We evaluated the performance of the basic ap-proaches (i.e., BasicExact and Exact) and the SMashQ algorithms (i.e.,SMashQ-OD and SMashQ-ODU) using C++ and Microsoft Bing Maps [19]with a real road network of Hennepin County, MN, USA [29]. We select an areaof 8×8 km2 that contains 6,109 road segments and 3,593 intersections, and thelatitude and longitude of its left-bottom and right-top corners are (44.898441,-93.302791) and (44.970094, -93.204015), respectively. We retrieved two realdata sets within the selected area from Google Maps, i.e., a restaurant dataset of 1,835 restaurants and a hotel data set of 363 hotels, as depicted in Fig-ures 11a and 11c, respectively. Unless mentioned otherwise, we used the realrestaurant data set and uniformly generated 5,000 querying users in the roadnetwork for all the experiments, and the default requested number of nearestobjects of k-NN queries (i.e., the value of k) is 10. The maximum allowedspeed is 90 km per hour.


0

200

400

600

800

1000

1200

1400

1 10 20 30 40 50

Ave

rage

No.

of E

xter

nal R

eque

sts

Requested Number of Nearest Objects (k)

BasicExactExact

SMashQ-ODSMashQ-ODU

(a) Number of external requests

0

2

4

6

8

10

12

14

16

18

20

1 10 20 30 40 50

Ave

rage

Que

ry R

espo

nse

Tim

e (s

)


BasicExactExact

SMashQ-ODSMashQ-ODU

(b) Average query response time

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

1 10 20 30 40 50

Acc

urac

y of

Est

imat

ed T

rave

l Tim

e


SMashQ-ODSMashQ-ODU

(c) Accuracy of travel time

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

1 10 20 30 40 50

Acc

urac

y of

Que

ry A

nsw

ers


SMashQ-ODSMashQ-ODU

(d) Accuracy of query answers

Fig. 12 Effect of the Requested Number of Nearest Objects (k).

6.2 Experiment Results

We evaluated the scalability and efficiency of the basic approaches (i.e.,BasicExact and Exact) and the SMashQ algorithms (i.e., SMashQ-OD andSMashQ-ODU) with respect to the number of required number of nearest ob-jects, i.e., the value of k (Section 6.2.1), the number of objects (Section 6.2.2),and the number of querying users (Section 6.2.3).

6.2.1 Effect of the Requested Number of Nearest Objects

Figure 12 shows the performance of the basic approaches and our SMashQalgorithms with respect to various numbers of requested number of nearestobjects (i.e., k) from 1 to 50. As depicted in Figure 12a, the SMashQ algo-rithms (i.e., SMashQ-OD and SMashQ-ODU) outperform the basic approachesin terms of the average number of external Web mapping requests. The ex-act algorithm (Exact) reduces the number of external requests by 32% onaverage compared to BasicExact. SMashQ-OD further significantly reduces thenumber of external requests by 62%. With the user grouping optimization,SMashQ-ODU yields the least number of external requests. This is because


0

200

400

600

800

1000

1200

1400

1000 2000 3000 4000 5000

Ave

rage

No.

of E

xter

nal R

eque

sts

Number of Objects

BasicExactExact

SMashQ-ODSMashQ-ODU


0

2

4

6

8

10

12

14

16

18

20

1000 2000 3000 4000 5000

Ave

rage

Que

ry R

espo

nse

Tim

e (s

)

Number of Objects

BasicExactExact

SMashQ-ODSMashQ-ODU


0.84

0.85

0.86

0.87

0.88

0.89

0.9

0.91

0.92

0.93

0.94

1000 2000 3000 4000 5000

Acc

urac

y of

Est

imat

ed T

rave

l Tim

e

Number of Objects

SMashQ-ODSMashQ-ODU


0.85

0.86

0.87

0.88

0.89

0.9

0.91

0.92

1000 2000 3000 4000 5000

Acc

urac

y of

Que

ry A

nsw

ers

Number of Objects

SMashQ-ODSMashQ-ODU


Fig. 13 Effect of the number of objects using synthetic data.

SMashQ-ODU has the ability to process queries from several users at the sametime instead of processing them one by one in SMashQ-OD. It is expected thatwhen k gets larger, there are more candidate objects, and thus the databaseserver needs to issue more external requests. Since processing external requestsis more expensive than local query processing, the query response time has thesame trend as the number of external requests (Figure 12b).

The two basic approaches BasicExact and Exact provide the exact traveltime and k-NN query answers. Conversely, the shared execution optimizationsdesigned for SMashQ require travel time estimation to reduce the number ofexternal requests, so we need to evaluate the accuracy of estimated travel time(measured by Equation 1) and the accuracy of k-NN query answers (measuredby Equation 2) provided by SMashQ. Figures 12c and 12d depict that bothSMashQ-OD and SMashQ-ODU provide highly accurate travel time (at least84% by SMashQ-OD and 83% by SMashQ-ODU) and query answers (at least83% by SMashQ-OD and 82% by SMashQ-ODU), respectively. Both SMashQ-OD and SMashQ-ODU need to estimate the travel time from an intersectionto each object in an object group, but SMashQ-ODU also needs to estimatethe travel time from each querying user in a user group to an intersection.As a result, the accuracy of SMashQ-ODU is slightly worse than SMashQ-


0

100

200

300

400

500

600

363(hotels) 1835(restaurants)

Ave

rage

No.

of E

xter

nal R

eque

sts

Number of Objects

BasicExactExact

SMashQ-ODSMashQ-ODU


0

1

2

3

4

5

6

7

8


Ave

rage

Que

ry R

espo

nse

Tim

e (s

)

Number of Objects

BasicExactExact

SMashQ-ODSMashQ-ODU


0.89

0.9

0.91

0.92

0.93

0.94

0.95

0.96


Acc

urac

y of

Est

imat

ed T

rave

l Tim

e

Number of Objects

SMashQ-ODSMashQ-ODU


0.88

0.89

0.9

0.91

0.92

0.93


Acc

urac

y of

Que

ry A

nsw

ers

Number of Objects

SMashQ-ODSMashQ-ODU


Fig. 14 Effect of the number of objects using real data.

OD. When k increases, the accuracy of our SMashQ algorithms improves. Thereason for this is that the larger k value requires them to process objectsfurther away from the querying user, so the ratio of the distance that weneed to estimate its travel time to the total distance between two locationsdecreases. Thus, the accuracy of the travel time estimation improves, as k getslarger (Figure 12c). The more accurate travel time estimation leads to moreaccurate query answers, as depicted in Figure 12d.

The results of this experiment confirm that our SMashQ algorithms notonly effectively reduce the number of external Web mapping requests, but theyalso provide highly accurate k-NN query answers.

6.2.2 Effect of the Number of Objects

Both synthetic data and real data sets were used to evaluate the performanceof SMashQ with respect to the number of objects.

Synthetic data. In the synthetic data set, the objects are randomly dis-tributed in the road network. Figure 13 depicts the performance of the basicapproaches and our SMashQ algorithms with respect to the increase of thenumber of synthetic objects from 1,000 to 5,000. Figure 13a depicts that our


0

100

200

300

400

500

600

700

800

900

2000 4000 6000 8000 10000

Ave

rage

No.

of E

xter

nal R

eque

sts

Number of Querying Users

BasicExactExact

SMashQ-ODSMashQ-ODU


0

2

4

6

8

10

12

14

2000 4000 6000 8000 10000

Ave

rage

Que

ry R

espo

nse

Tim

e (s

)


BasicExactExact

SMashQ-ODSMashQ-ODU


0.87

0.88

0.89

0.9

0.91

0.92

0.93

2000 4000 6000 8000 10000

Acc

urac

y of

Est

imat

ed T

rave

l Tim

e


SMashQ-ODSMashQ-ODU


0.87

0.88

0.89

0.9

0.91

0.92

0.93

2000 4000 6000 8000 10000

Acc

urac

y of

Que

ry A

nsw

ers


SMashQ-ODSMashQ-ODU


Fig. 15 Effect of the number of querying users.

SMashQ algorithms require much less external requests than the basic ap-proaches, when the number of objects increases, and thus our SMashQ algo-rithms can reduce the query response time (Figure 13b). The accuracy of traveltime estimation and query answers gets worse, when the number of objects in-creases, as depicted in Figures 13c and 13d, respectively. This is because whenthere are more objects, the required search range is smaller; hence, the ratioof the distance that we need to estimate its travel time to the total distancebetween two locations increases. As a result, a small error in the travel timeestimation has a larger impact on the accuracy measure. However, SMashQ-OD and SMashQ-ODU still provide accurate travel time estimation and queryanswers (over 85% accuracy).

Real data. The experimental results of the two real data sets (363 ho-tels and 1,835 restaurants) exhibit the same trend as the synthetic data.The SMashQ algorithms perform much better than the basic approaches inboth the data sets, when there are more objects (Figures 14a and 14b). TheSMashQ algorithms also achieve high accuracy for travel time estimation andquery answers. The accuracy of travel time estimation is over 90% and theaccuracy of query answer is over 88%, as depicted in Figures 14c and 14d,respectively.


The experimental results of both the synthetic and real data sets confirmthat our SMashQ algorithms are able to scale up to a large number of objects.

6.2.3 Effect of the Number of Querying Users

Figure 15 depicts the performance of the basic approaches and our SMashQalgorithms with respect to the increase of the number of querying users from2,000 to 10,000. The number of querying users only slightly affects the per-formance of the basic approaches and SMashQ-OD because these algorithmsprocess k-NN queries one by one (Figures 15a and 15b). Conversely, SMashQ-ODU is able to process a group of queries issued from different users at thesame time. When the number of users increases, there is a higher chance toform larger user groups, and thus, the number of external Web mapping re-quests per query of SMashQ-ODU reduces. The number of querying users hasno obvious impact on the accuracy of travel time estimation and query answer,as depicted in Figures 15c and 15d, respectively. The results of this experi-ment also verify that the SMashQ algorithms can scale up to a large numberof users.

7 Conclusion

In this paper, we proposed a spatial mashup framework SMashQ for a databaseserver to process k-nearest-neighbor (NN) queries in time-dependent road net-works. SMashQ issues external requests to the Web mapping service, e.g.,Google Maps and Microsoft Bing Maps, to retrieve the route and travel timebetween two locations. Based on the retrieved route and travel time informa-tion, SMashQ is able to answer k-NN queries. An exact algorithm was designedfor SMashQ based on the existing pruning techniques for external data. Sinceexternal Web mapping requests are expensive, three shared execution tech-niques, namely, object grouping, direction sharing, and user grouping, wereproposed for SMashQ to effectively reduce the number of external requests.We conducted extensive experiments to evaluate the performance of SMashQwith the shared execution optimizations using Microsoft Bing Maps, the realroad network, the real data sets, and the synthetic data set. The experimen-tal results show that SMashQ significantly reduces the number of externalrequests, it provides highly accurate query answers, and it is able to scale upto large numbers of objects and users.

7.1 Future Research Directions

We have two future research directions for this work. In this paper, we assumedthat the location-based service provider is trusted, so there is no privacy issuefor the user. However, if the user does not trust the service provider, one possi-ble solution is place a location anonymizer between the user and SMashQ that


is responsible for blurring a user’s exact location into a cloaked road segmentset S that satisfies the user’s privacy requirements, namely, k-anonymity andminimum road segment length [3]. The location anonymizer can transform theuser’s exact location into a set of external intersections of S, and then send allthese external intersections to SMashQ as the user’s location to find a k-NNquery answer for each external intersection. After the location anonymizer re-ceives the query answers, since it knows the user’s location, it can computethe exact answer and send it to the user. As a result, SMashQ and the Webmapping service provider should not be able to identify the actual queryinguser with a probability larger than 1/k or pinpoint the user’s location withinS. Stricter privacy requirements lead to more external intersections that haveto be processed by SMashQ. Thus, new query optimization techniques shouldalso be developed for SMashQ to find query answers for external intersectionsefficiently.

Google Maps API has recently provided a new service, called distancematrix, that returns the network distance and travel time of every combinationfrom a set of source locations to a set of destination locations defined in adistance matrix request [28]. Unfortunately, we have chosen Microsoft BingMaps to implement SMashQ, but it does not support the distance matrixservice. The second research direction is to investigate how to incorporatethe distance matrix service into SMashQ, overcome its limitations (e.g., theevaluation license of Google Distance Matrix API limits to 100 locations perquery and 10 seconds and 2,500 locations per 24 hours), and design new queryoptimization techniques for SMashQ.

References

1. Bruno, N., Gravano, L., Marian, A.: Evaluating top-k queries over web-accessibledatabases. In: Proceedings of the IEEE International Conference on Data Engineer-ing (2002)

2. Chang, K.C.C., Hwang, S.W.: Minimal probing: Supporting expensive predicates fortop-k queries. In: Proceedings of the ACM Conference on Management of Data (2002)

3. Chow, C.Y., Mokbel, M.F., Bao, J., Liu, X.: Query-aware location anonymization forroad networks. GeoInformatica 15(3), 571–607 (2011)

4. Chow, C.Y., Mokbel, M.F., Leong, H.V.: On efficient and scalable support of continuousqueries in mobile peer-to-peer environments. IEEE Trans. on Mobile Computing 10(10),1473–1487 (2011)

5. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms (3.ed.). MIT Press (2009)

6. Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: Efficient k-nearest neighbor searchin time-dependent spatial networks. In: Proceedings of the International Conference onDatabase and Expert Systems Applications (2010)

7. Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: Towards k-nearest neighbor searchin time-dependent spatial network databases. In: International Workshop on Databasesin Networked Systems (2010)

8. Fu, T.Y., Peng, W.C., Lee, W.C.: Parallelizing itinerary-based KNN query processingin wireless sensor networks. IEEE Trans. on Knowledge and Data Engineering 22(5),711–729 (2010)

9. Gedik, B., Liu, L.: Mobieyes: A distributed location monitoring service using movinglocation queries. IEEE Trans. on Mobile Computing 5(10), 1384–1402 (2006)


10. George, B., Kim, S., Shekhar, S.: Spatio-temporal network databases and routing al-gorithms: A summary of results. In: Proceedings of the International Symposium onSpatial and Temporal Databases (2007)

11. Google Maps: URL http://maps.google.com12. Google Maps/Google Earth APIs Terms of Service: URL http://code.google.com/

apis/maps/terms.html13. Hu, H., Xu, J., Lee, D.L.: A generic framework for monitoring continuous spatial queries

over moving objects. In: Proceedings of the ACM Conference on Management of Data(2005)

14. Huang, X., Jensen, C.S., Saltenis, S.: The islands approach to nearest neighbor queryingin spatial networks. In: Proceedings of the International Symposium on Spatial andTemporal Databases (2005)

15. Jensen, C.S.: Database aspects of location-based services. In: Location-Based Services,pp. 115–148. Morgan Kaufmann (2004)

16. Kanoulas, E., Du, Y., Xia, T., Zhang, D.: Finding fastest paths on a road networkwith speed patterns. In: Proceedings of the IEEE International Conference on DataEngineering (2006)

17. Lee, D.L., Zhu, M., Hu, H.: When location-based services meet databases. MobileInformation Systems 1(2), 81–90 (2005)

18. Levandoski, J.J., Mokbel, M.F., Khalefa, M.E.: Preference query evaluation over ex-pensive attributes. In: Proceedings of the International Conference on Information andKnowledge Management (2010)

19. Microsoft Bing Maps: URL http://www.bing.com/maps/20. Mokbel, M.F., Xiong, X., Aref, W.G.: SINA: scalable incremental processing of contin-

uous queries in spatio-temporal databases. In: Proceedings of the ACM Conference onManagement of Data (2004)

21. Mokbel, M.F., Xiong, X., Hammad, M.A., Aref, W.G.: Continuous query processing ofspatio-temporal data streams in PLACE. Geoinformatica 9(4), 343–365 (2005)

22. Mouratidis, K., Papadias, D., Hadjieleftheriou, M.: Conceptual partitioning: An effi-cient method for continuous nearest neighbor monitoring. In: Proceedings of the ACMConference on Management of Data (2005)

23. Nehme1, R.V., Rundensteiner, E.A.: SCUBA: Scalable cluster-based algorithm for eval-uating continuous spatio-temporal queries on moving objects. Proceedings of the Inter-national Conference on Extending Database Technology (2006)

24. Papadias, D., Zhang, J., Mamoulis, N., Tao, Y.: Query processing in spatial networkdatabases. In: Proceedings of the International Conference on Very Large Data Bases(2003)

25. Samet, H., Sankaranarayanan, J., Alborzi, H.: Scalable network distance browsing inspatial databases. In: Proceedings of the ACM Conference on Management of Data(2008)

26. Tanin, E., Harwood, A., Samet, H.: Using a distributed quadtree index in peer-to-peernetworks. The International Journal on Very Large Data Bases 16(2), 165–178 (2007)

27. Tao, Y., Papadias, D., Shen, Q.: Continuous nearest neighbor search. In: Proceedingsof the International Conference on Very Large Data Bases (2002)

28. The Google Distance Matrix API (Last visited: May 18, 2012): URL https://

developers.google.com/maps/documentation/distancematrix/29. TIGER/Line Shapefiles 2009 for: Hennepin County, Minnesota: URL http://www2.

census.gov/cgi-bin/shapefiles2009/county-files?county=2705330. Vancea, A., Grossniklaus, M., Norrie, M.C.: Database-driven web mashups. In: IEEE

ICWE (2008)31. Wu, S.H., Chuang, K.T., Chen, C.M., Chen, M.S.: DIKNN: An itinerary-based KNN

query processing algorithm for mobile sensor networks. In: Proceedings of the IEEEInternational Conference on Data Engineering (2007)

32. Yahoo! Maps: URL http://maps.yahoo.com33. Zhang, D., Chow, C.Y., Li, Q., Zhang, X., Xu, Y.: Efficient evaluation of k-NN queries

using spatial mashups. In: Proceedings of the International Symposium on Spatial andTemporal Databases (2011)

34. Zhu, Q., Lee, D.L., Lee, W.C.: Collaborative caching for spatial queries in mobile P2Pnetworks. In: Proceedings of the IEEE International Conference on Data Engineering(2011)

smashq: spatial mashup framework for k-nn queries in time …chiychow/papers/dapd_2012.pdf · 2012....

Documents