ieee transactions on knowledge and data engineering, vol...

15
Effective and Efficient: Large-Scale Dynamic City Express Siyuan Zhang, Lu Qin, Yu Zheng, Senior Member, IEEE, and Hong Cheng Abstract—Due to the large number of requirements for city express services in recent years, the current city express system is found to be unsatisfactory for both the service providers and customers. In this paper, we are the first to systematically study the large-scale dynamic city express problem. We aim to increase both the effectiveness and the efficiency of the scheduling algorithm. The challenges of the problem stem from the highly dynamic environment, the NP-completeness with respect to the number of requests, and real-time demands for the scheduling result. We introduce a basic algorithm to assign a request to a courier on a first-come, first-served basis. To improve the effectiveness of the basic algorithm, we adopt a batch assignment strategy that computes the pickup-delivery routes for a group of requests received in a short period rather than dealing with each request individually. To improve the efficiency of the algorithm, we further design a two-level priority queue structure to reduce redundant shortest distance calculation and repeated candidate generation. We develop a simulation system and conduct extensive performance studies on the real road network of Beijing city. The experimental results demonstrate the high effectiveness and efficiency of our algorithms. Remarkably, our system can achieve much better service quality and largely reduce the operation cost of a city express company simultaneously. Index Terms—City express service, logistics, batch assignment Ç 1 INTRODUCTION W ITH the development of logistics industry and the rise of E-commerce, city express services have become increasingly popular in recent years [1]. We demonstrate how current city express systems work in Fig. 1: A city is divided into several regions (e.g., R 1 and R 2 ), each of which covers some streets and neighborhoods. A transit station is built in a region to temporarily store the parcels received in the region (e.g., ts 1 in R 1 ). The received parcels in a transit station are further organized into groups according to their destinations. Each group of parcels will be sent to a corre- sponding transit station by trucks regularly (e.g., from ts 1 to ts 2 ). In each region, there are a team of couriers delivering parcels to and receiving parcels from specific locations in the region. When a truck carrying parcels arrives at a transit station, each courier will send a portion of these parcels to their final destinations by a small delivery van, or a bike, or a motorcycle, which has a limited capacity. Before departing from the transit station, they will pre-compute the delivery routes (e.g., the blue lines in Fig. 1) usually based on their own knowledge. During the delivery, each courier could receive pickup requests (e.g., r 5 , r 6 and r 7 ) from a central dispatch system or directly from end users. Each pickup request is associated with a location and a deadline of pickup time. A courier may change the originally planned route to fetch the new parcels, or decline the pickup request due to the constraints in their schedule or their vehicle capacity. All the couriers are required to return to their own transit station by some specific time (so as to fit the schedule of trucks that travel between transit stations regularly), or when fully loaded. The service quality and operational efficiency of current express services have never been found satisfactory due to the following three reasons. First, current central dispatch systems process each pickup request individually without a global optimization. For example, a new pickup request (e.g., r 5 ) is usually assigned to the nearest courier (e.g., c 1 ) to the pickup location. In the meantime, each courier makes a decision on whether to pick up a new parcel solely based on his own situation without knowing the status of other cou- riers in the same region (e.g., c 3 can pick up a parcel at r 5 instead of c 1 ). Second, the dispatch systems do not know the current status of a courier either, e.g., the remaining capac- ity, the number of parcels that have not been delivered, and the following pickup-delivery route, before assigning a new pickup request. Note that these statuses would change dynamically due to the new pickup requests. Third, requests near the boundary of a region (e.g., r 7 ) are ignored by cou- riers (e.g., c 1 ) in other regions because couriers only pick up parcels in their own regions. Motivated by the huge number of requirements and drawbacks in current city express systems, in this paper, we study the dynamic city express problem and aim to design a central dispatch system with an effective scheduling algo- rithm for couriers to deliver and pick up parcels in real time. Each courier in the system carries a handheld device record- ing his location, uploading his status when he finishes a delivery or pickup task, and receiving new pickup requests. The central dispatch system receives information of couriers from their handheld devices and manages their schedules S. Zhang and H. Cheng are with The Chinese University of Hong Kong, Hong Kong. E-mail: {syzhang, hcheng}@se.cuhk.edu.hk. L. Qin is with the University of Technology Sydney, Ultimo, 2007, Australia. E-mail: [email protected]. Y. Zheng is with Microsoft Research, Beijing 100080, China. E-mail: [email protected]. Manuscript received 11 Jan. 2016; revised 27 July 2016; accepted 21 Aug. 2016. Date of publication 31 Aug. 2016; date of current version 2 Nov. 2016. Recommended for acceptance by F. Afrati. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TKDE.2016.2604806 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12, DECEMBER 2016 3203 1041-4347 ß 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Upload: others

Post on 01-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

Effective and Efficient: Large-ScaleDynamic City Express

Siyuan Zhang, Lu Qin, Yu Zheng, Senior Member, IEEE, and Hong Cheng

Abstract—Due to the large number of requirements for city express services in recent years, the current city express system is found

to be unsatisfactory for both the service providers and customers. In this paper, we are the first to systematically study the large-scale

dynamic city express problem. We aim to increase both the effectiveness and the efficiency of the scheduling algorithm. The challenges

of the problem stem from the highly dynamic environment, the NP-completeness with respect to the number of requests, and real-time

demands for the scheduling result. We introduce a basic algorithm to assign a request to a courier on a first-come, first-served basis. To

improve the effectiveness of the basic algorithm, we adopt a batch assignment strategy that computes the pickup-delivery routes for a

group of requests received in a short period rather than dealing with each request individually. To improve the efficiency of the

algorithm, we further design a two-level priority queue structure to reduce redundant shortest distance calculation and repeated

candidate generation. We develop a simulation system and conduct extensive performance studies on the real road network of Beijing

city. The experimental results demonstrate the high effectiveness and efficiency of our algorithms. Remarkably, our system can achieve

much better service quality and largely reduce the operation cost of a city express company simultaneously.

Index Terms—City express service, logistics, batch assignment

Ç

1 INTRODUCTION

WITH the development of logistics industry and the riseof E-commerce, city express services have become

increasingly popular in recent years [1]. We demonstratehow current city express systems work in Fig. 1: A city isdivided into several regions (e.g., R1 and R2), each of whichcovers some streets and neighborhoods. A transit station isbuilt in a region to temporarily store the parcels received inthe region (e.g., ts1 in R1). The received parcels in a transitstation are further organized into groups according to theirdestinations. Each group of parcels will be sent to a corre-sponding transit station by trucks regularly (e.g., from ts1 tots2). In each region, there are a team of couriers deliveringparcels to and receiving parcels from specific locations inthe region. When a truck carrying parcels arrives at a transitstation, each courier will send a portion of these parcels totheir final destinations by a small delivery van, or a bike, ora motorcycle, which has a limited capacity. Before departingfrom the transit station, they will pre-compute the deliveryroutes (e.g., the blue lines in Fig. 1) usually based on theirown knowledge. During the delivery, each courier couldreceive pickup requests (e.g., r5, r6 and r7) from a centraldispatch system or directly from end users. Each pickuprequest is associated with a location and a deadline ofpickup time. A courier may change the originally planned

route to fetch the new parcels, or decline the pickup requestdue to the constraints in their schedule or their vehiclecapacity. All the couriers are required to return to their owntransit station by some specific time (so as to fit the scheduleof trucks that travel between transit stations regularly), orwhen fully loaded.

The service quality and operational efficiency of currentexpress services have never been found satisfactory due tothe following three reasons. First, current central dispatchsystems process each pickup request individually without aglobal optimization. For example, a new pickup request(e.g., r5) is usually assigned to the nearest courier (e.g., c1) tothe pickup location. In the meantime, each courier makes adecision on whether to pick up a new parcel solely based onhis own situation without knowing the status of other cou-riers in the same region (e.g., c3 can pick up a parcel at r5instead of c1). Second, the dispatch systems do not know thecurrent status of a courier either, e.g., the remaining capac-ity, the number of parcels that have not been delivered, andthe following pickup-delivery route, before assigning a newpickup request. Note that these statuses would changedynamically due to the new pickup requests. Third, requestsnear the boundary of a region (e.g., r7) are ignored by cou-riers (e.g., c1) in other regions because couriers only pick upparcels in their own regions.

Motivated by the huge number of requirements anddrawbacks in current city express systems, in this paper, westudy the dynamic city express problem and aim to design acentral dispatch system with an effective scheduling algo-rithm for couriers to deliver and pick up parcels in real time.Each courier in the system carries a handheld device record-ing his location, uploading his status when he finishes adelivery or pickup task, and receiving new pickup requests.The central dispatch system receives information of couriersfrom their handheld devices and manages their schedules

� S. Zhang and H. Cheng are with The Chinese University of Hong Kong,Hong Kong. E-mail: {syzhang, hcheng}@se.cuhk.edu.hk.

� L. Qin is with the University of Technology Sydney, Ultimo, 2007,Australia. E-mail: [email protected].

� Y. Zheng is with Microsoft Research, Beijing 100080, China.E-mail: [email protected].

Manuscript received 11 Jan. 2016; revised 27 July 2016; accepted 21 Aug.2016. Date of publication 31 Aug. 2016; date of current version 2 Nov. 2016.Recommended for acceptance by F. Afrati.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference the Digital Object Identifier below.Digital Object Identifier no. 10.1109/TKDE.2016.2604806

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12, DECEMBER 2016 3203

1041-4347� 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Page 2: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

consisting of pickup and delivery time and routes. After col-lecting pickup requests from customers in a short period, thesystem processes the requests in a batch according to theminimum incurred distance. Finally, the system sends theupdated schedules to all the couriers and the confirmation ordeclinemessages to the customers.

Challenges. In the literature, related problems such as vehi-cle routing with time window [2], [3], [4], [5], [6], and taxiridesharing [7], [8], [9] have been studied. However, the cityexpress problem is challenging due to the following reasons.

(1) Real-Time Demand and Dynamic Scheduling for a LargeNumber of Requests and Couriers. The real-time property of thecity express problem requires the couriers to adjust their routedynamically when new pickup requests are issued. Mean-while, the schedules of the couriers change dynamically ascouriers finish a delivery or pickup task. However, most exist-ing solutions for vehicle routing problem with time windoware based on the static assumption [2], [3] where all requestsare given in advance. Moreover, existing solutions fordynamic vehicle routing with time window [4], [5], [6] canonly handle a small number of requests per hour (less than100). In fact, thousands of pickup requests can be issuedwithin an hour in real time. A city express system shouldwork out the schedule of all couriers in a short time before thegathered information of couriers and requests is outdated.

(2) Exponential Search Space to Find a Feasible Schedule. Asshown in Section 2.2, the city express problem is an NP-complete problem even if all requests are given in advance.The optimal algorithm is usually with time complexityexponential to the capacity of a courier, i.e., the maximumnumber of parcels that can be carried by a courier. On theother hand, the spatio-temporal constraints of the cityexpress problem are looser than those of the taxi ridesharingproblem [7], [8], [9], which studies the routing problem oftaxis under specific time windows. More candidate couriersfor a pickup request and more feasible scheduling possibili-ties of a courier exist in the city express problem. It reducesthe pruning power of existing spatio-temporal index [7], [8]and search space index [9].

Contributions. In this paper, we tackle the above chal-lenges and make the following contributions.

(1) Systematic Study of Large-Scale Dynamic City ExpressProblem. We formally define the dynamic city express

problem which involves the scheduling of multiple couriersto serve pickup requests and delivery tasks in real time. Tothe best of our knowledge, this is the first work that system-atically studies the large-scale dynamic city express prob-lem on road networks.

(2) High Effectiveness Using Incurred Distance and BatchAssignment. We design a basic algorithm that handles pickuprequests on a first-come, first-served basis, and assigns a newrequest to the courier with the minimum incurred distance(the increased cost for a courier to serve the request). Such astrategy has been proven to be effective for the TravellingSalesman Problem (TSP) [10], [11], [12], which is a relaxationof the city express problem. We further observe that the shortresponse time from issuing the request to the confirmation ofthe request allows the system to gather multiple requests andassign the requests in a batch mode other than individually.Therefore, we propose a batch assignment algorithm toimprove the effectiveness of the basic solution.

(3) High Efficiency Using Space Reduction Strategies. Wefurther improve the computational efficiency of our solutionbased on a two-level priority queue structure. The proposedalgorithm, SIDF�, can reduce redundant shortest distancecomputation by utilizing a global priority queue, and avoidrepeated candidate generation by maintaining a local prior-ity queue for each courier.

(4) Extensive Performance Studies. We conduct extensiveperformance studies with a city express simulation systemon the real Beijing city road network. The experimentalresults show that our proposed algorithms can achieve bothhigh effectiveness and efficiency.

Outline. Section 2 provides the preliminaries, formallydefines the dynamic city express problem, and proves thecomplexity of the problem. Section 3 introduces our basicsolution based on the incurred distance with a simple lazypath computation strategy to reduce the computationalcost. Section 4 presents a new solution to improve the effec-tiveness of the algorithm using batch assignment. Section 5designs a two-level priority queue to improve the efficiency.Section 6 presents extensive experimental results on a realcity road network. Section 7 reviews related work, and Sec-tion 8 concludes the paper.

2 OVERVIEW

2.1 Preliminaries

We model a road network as a directed weighted graphGðV;EÞ, where V is a set of nodes (road intersections), and Eis a set of edges (roads). An edge ðu; vÞ 2 E connects twonodes u and v in V . We also define I to be the set of intermedi-ate nodes (with degree 2) which lie in edges. Each edgeðu; vÞ 2 E is associated with a positive weight wðu; vÞ denot-ing the time to travel along the edge. Note that travel distancecan be easily converted to travel timewhen the average travelspeed is given. Thus, for the simplicity of presentation, we usetravel time to denote travel cost in the rest of the paper.

For any two nodes v0 and vk in V [ I, a path from v0 to vk inG is a sequence p ¼ ðv0; v1; v2; . . . ; vk�1; vkÞ such that vi 2 Vfor any 1 � i � k� 1, and ðvi; viþ1Þ is an edge in G for any0 � i � k� 1. Note that ðv0; v1Þ (or ðvk�1; vkÞ) can be a partialedge when v0 2 I (or vk 2 I). Given a path p ¼ ðv0; v1; . . . ; vkÞ,the cost of the path, denoted as costðpÞ, is calculated as

Fig. 1. Dynamic city express.

3204 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12, DECEMBER 2016

Page 3: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

costðpÞ ¼Pk�1i¼0 wðvi; viþ1Þ. Given two nodes u and v inG, the

shortest path from u to v is the path pmin with the minimumcost among all paths from u to v in G. We denote the corre-sponding cost of pmin as costðu; vÞ.Definition 2.1 (Request). Given a road network GðV;EÞ, a

request is denoted as r ¼ ðl; dÞ, where lðrÞ is the location of therequest that lies in the road network G, and dðrÞ is the deadlineto pick up the parcel in the request.

We suppose that each request has the same service time ts,which is the time spent on serving a customer (e.g., filling theforms and wrapping up the parcels) for a courier. However,our proposed algorithms can easily handle requests with dif-ferent service time. After a request is issued, a staff will con-tact the customer within a short period tr to confirm ordecline the request. If the request is accepted, it will beassigned to a courier.

Let C ¼ fc1; c2; . . . ; cng be the set of couriers. As describedin introduction, each courier also has parcels for delivery onboard. We use the same form for pickup request to representa delivery task. The delivery tasks in each trip are assignedto the courier before the trip, and the pickup requests areassigned to the courier in real time during the trip. All thedelivery tasks are required to be finished before their dead-lines, whereas a new pickup request may ormay not be satis-fied. If it is not satisfied by any courier, the request isdeclined or asked tomodify its deadline for consideration.

Definition 2.2 (Schedule). For a courier ci 2 C, a schedule forci, denoted as Si ¼ ðri;1; ri;2; . . . ; ri;mi

Þ, is a sequence ofunserved tasks, such that if following the sequence to pick up/deliver the parcels, the courier can (1) arrive at the locationlðri;jÞ before the deadline dðri;jÞ for every 1 � j � mi, and (2)return to the transit station after serving ri;mi

within tmax timeafter setting off.

Note that Si only keeps the unserved tasks. Once a task isserved by courier ci, it is removed from the schedule of ci.Suppose lðciÞ is the current location of courier ci and tsðciÞ isthe location of the transit station where ci sets off. Given theschedule Si for courier ci, the cost of Si, denoted as costðSiÞ,is calculated as

costðSiÞ ¼ costðlðciÞ; lðri;1ÞÞ þX

1�j<mi

costðlðri;jÞ; lðri;jþ1ÞÞ

þ ts �mi þ costðlðri;miÞ; tsðciÞÞ;

where costðlðciÞ; lðri;1ÞÞ is the time spent on traveling fromthe current location of courier ci to the location of the firsttask in the schedule Si;

P1�j<mi

costðlðri;jÞ; lðri;jþ1ÞÞ is time

spent on traveling to the locations of all the tasks in Si fol-lowing the order in the schedule; ts �mi is the total servicetime for all the mi tasks in Si; and costðlðri;mi

Þ; tsðciÞÞ is thetime to travel from the location of the last task to the transitstation. For ease of representation, we use lðri;0Þ to denotelðciÞ and use lðri;miþ1Þ to denote tsðciÞ. Then costðSiÞ can be

simplified as follows:

costðSiÞ ¼X

0�j�mi

costðlðri;jÞ; lðri;jþ1ÞÞ þ ts �mi: (1)

In other words, we treat the current location of ci and thetransit station of ci as the locations of two special tasks ri;0and ri;miþ1 without any service time. The deadline of the

special task ri;miþ1 can be calculated as dðri;miþ1Þ ¼tsetðciÞ þ tmax, where tsetðciÞ is the set-off time for courier cifrom the transit station tsðciÞ.

A list of notations used in this paper is summarized inTable 1.

2.2 The Dynamic City Express Problem

In this paper, we study the dynamic city express problem(DCEP), which is defined as follows: Given a set of n couriersC ¼ fc1; c2; . . . ; cng, a set of delivery tasks at different transit sta-tions, and a stream of pickup requests, DCEP aims to update theschedule for each courier dynamically as new pickup requestsstream in, such that (1) all delivery tasks are satisfied, and (2) thepickup requests are satisfied as many as possible.

The following lemma shows the complexity of the prob-lem when all requests are given in advance. In the dynamiccase, the problem becomes even more difficult to handle.

Lemma 2.1. Given the set of couriers C and all requests inadvance, the problem to decide whether a certain ratio P ofrequests can be satisfied by the couriers is an NP-completeproblem.

Proof Sketch. Given the set of couriers C and all requests,we show that the problem to decide whether a certainpercentage P of requests can be satisfied by the couriersis a generalization of the decision version of the TravelingSalesman Problem (TSP) which is proven to be NP-com-plete [10]. Given a set of m nodes and the travel costbetween each pair of the m nodes, the problem is todecide whether it is possible to find a route shorter than L

TABLE 1List of Notations

Notation Definition

wðu; vÞ The weight of edge ðu; vÞcostðu; vÞ The shortest traveling time from u to vr ¼ ðl; dÞ A request for city expresslðrÞ The location of the request rdðrÞ The deadline to serve the request rC ¼ fc1; . . . ; cng The set of couriersSi ¼ fri;1; . . . ; ri;mi

g The schedule for courier cicostðSiÞ The cost of the schedule Si

lðciÞ or lðri;0Þ The current location of courier citsðciÞ or lðri;miþ1Þ The location of the transit station of citsetðciÞ The set-off time for ci from tsðciÞts Service time for a requesttr The maximum time to confirm/decline

the requesttmax The maximum time spent for a triptcur The current timeajðSiÞ The time for ci to arrive lðri;jÞ in Si

djðSiÞ The latest time for ci to arrive lðri;jÞ in Si

candðrÞ The candidate couriers to serve rcandðciÞ The candidate requests that ci can servecostðu; vÞ A lower bound of costðu; vÞDdistjðr; SiÞ The incurred distance of r to segment j

of Si

Ddistðr; SiÞ The incurred distance of r to Si

Ddistjðr; SiÞ A lower bound of Ddistjðr; SiÞ

ZHANG ETAL.: EFFECTIVE AND EFFICIENT: LARGE-SCALE DYNAMIC CITY EXPRESS 3205

Page 4: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

that visits every node once and returns to the originalnode. Given an instance of TSP, we can construct a corre-sponding instance of DCEP as follows.

(1) We construct a graph G of m nodes each of whichcorresponds to a node in TSP. For each pair ofnodes in G, we add an edge in G, and the weightof the edge is the corresponding travel costbetween the pair of nodes in TSP.

(2) We addm pickup requests on them nodes respec-tively, and each request sets its deadline to beinfinity.

(3) We set the number of couriers to be 1, P to be 100percent, ts to be 0, and tmax to be L. After the con-struction, solving the instance of TSP is exactlyequivalent to solving the corresponding con-structed instance of DCEP. This completes theproof. tu

2.3 Solution Overview

We first introduce a basic solution in Section 3. After receiv-ing a request r1, the system identifies candidate couriersthat can possibly handle it using the courier index. Amongthe candidate couriers, the basic algorithm finds the onewith the minimum incurred distance, and then updates thecourier’s schedule.

In contrast to the basic solution that handles requests ona first-come, first-served basis, our proposed solution inSection 4 collects a set of requests fr1; r2; . . . ; rmg that areissued within a short time, and processes them in a batch.The batch strategy allows more flexibility in assignment,thus can find more cost-effective pickup-delivery routesand potentially serve more requests. To improve thecomputational efficiency, we use a two-level priority queuestructure described in Section 5 to reduce redundant short-est distance calculations and avoid repeated candidategeneration.

In real practice, an express company may decline arequest either because it is impossible to serve it or the ser-vice cost is too high [2]. Existing solutions [5], [7], [8], [9]consider accepting or declining a request once after itarrives, which share a similar idea of our basic solution. Butin this work we will show that a batch processing mode cansatisfy more requests than the basic solution.

3 THE BASIC SOLUTION

In this section we first introduce a basic solution for DCEP.Note that our dynamic city express problem shares similarspatio-temporal constraints with the dynamic taxi rideshar-ing problem [7], [8], where a taxi ridesharing query asks ifthere exists a taxi in a city that can take the customer fromhis/her current location to a specific destination under timewindow and capacity constraints. We modify the existingmethod for taxi ridesharing service [8] to solve our problemand introduce a basic solution in this section. Our basicsolution adopts the same framework of taxi ridesharing,which is processed as:

(1) A list of candidate taxis that can satisfy pickup anddelivery time window constraints is first returned bya grid spatio-temporal index.

(2) The pickup and delivery locations are inserted intothe schedule of a candidate taxi with the minimumincurred distance.

Based on the same framework, in this section, we firstintroduce the spatio-temporal index we use for candidategeneration. Then we describe the basic solution to process anew request r in three steps: (1) identifying candidate cou-riers that can possibly serve r; (2) finding the courier thatcan serve r with the minimum incurred distance; and(3) updating the schedule of that courier. These componentsalso serve as basic building blocks in our improved algo-rithms introduced in Sections 4 and 5. These componentsand the related notations are first introduced in this sectionfor the ease of understanding.

3.1 Indexing

In order to compute the candidate couriers to serve a pickuprequest and avoid unnecessary shortest path computations,we build a Network Voronoi Diagram (NVD) to index allthe couriers in the road network G. Specifically, we selectthe set of transit stations as the generators and build a NVDon the road network using the algorithm introduced in [13].NVD partitionsG into a set of Voronoi regions each of whichis represented by a generator tsi. Note that an edge e 2 Emay belong to two different Voronoi regions. In this case,we add a new node in V that splits e into two edges, toensure that each edge in E belongs to only one Voronoiregion. For each location l in G, we use tsðlÞ to denote thegenerator for the region that l lies in. For each node v 2 V ,we precompute costðv; tsðvÞÞ and costðtsðvÞ; vÞ. For any twogenerators tsi and tsj, we precompute costðtsi; tsjÞ. We alsoprecompute the radius of each generator tsi, denoted asradiusðtsiÞ, which is the maximum cost from tsi to any nodein the Voronoi region of tsi. With the NVD index, given anylocation l in G, suppose l lies on the edge ðu; vÞ, the distancefrom l to tsðlÞ can be calculated as

costðl; tsðlÞÞ ¼minfwðl; uÞ þ costðu; tsðuÞÞ;wðl; vÞ þ costðv; tsðvÞÞg: (2)

For any two locations li and lj in G, the lower bound ofcostðli; ljÞ can be calculated as

costðli; ljÞ ¼maxf0; costðtsðliÞ; tsðljÞÞ� costðtsðliÞ; liÞ � costðlj; tsðljÞÞg:

(3)

Obviously, with the NVD index, for any two locations li andlj in G, costðli; ljÞ can be calculated in constant time. Notethat we can update the NVD index periodically to handlethe dynamic update of travel cost in road network.

Example 3.1. Fig. 2a shows the NVD index for a road net-work G with three transit stations ts0; ts1 and ts2 as

Fig. 2. The NVD index.

3206 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12, DECEMBER 2016

Page 5: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

generators. Accordingly, G is partitioned into threeregions by the bold lines. Roads in different Voronoiregions are depicted in different line styles. The distancematrix for generators is shown in Fig. 2b. The lowerbound of costðv1; v2Þ can be calculated as

costðv1; v2Þ ¼ maxf0; d02 � costðts0; v1Þ � costðv2; ts2Þg:

3.2 Candidate Courier Generation

Given a new request r, as the first step, we compute the setof candidate couriers for r, which is the set of couriers thatcan possibly arrive at the location lðrÞ before the deadlinedðrÞ from their current locations. We define the candidateset candðrÞ for a request r as follows

candðrÞ ¼ fcijcostðlðri;0Þ; lðrÞÞ � dðrÞ � tcurg: (4)

Here, costðlðri;0Þ; lðrÞÞ is calculated by Eq. (3) using theNVD index. In order to efficiently identify candðrÞ, wefirst compute the set of candidate Voronoi regionscandtsðrÞ as

candtsðrÞ ¼ ftsjjcostðlðtsjÞ; lðrÞÞ � radiusðtsjÞ� dðrÞ � tcurg:

After computing candtsðrÞ, for each tsj 2 candtsðrÞ, we enu-merate all couriers ci that lie in the Voronoi region of tsj andadd ci into candðrÞ if costðlðri;0Þ; lðrÞÞ � dðrÞ � tcur.

3.3 Incurred Distance Calculation

For each courier ci 2 candðrÞ, we insert request r into theschedule Si of ci and calculate the incurred distance for theinsertion. Due to the spatio-temporal constraints of ourproblem, we only calculate the incurred distance of validinsertion. In the rest of this paper, we denote any two conse-cutive requests ri;j and ri;jþ1 as segment j of Si (0 � j � mi).We call a segment j in Si a valid segment w.r.t. request r iffafter inserting r between ri;j and ri;jþ1, the deadlines of allrequests in the new Si can be satisfied. Specifically, two con-ditions need to be satisfied:

(1) Condition 1: After inserting r into segment j of Si,courier ci can serve r before its deadline dðrÞ, whichis formalized as

ajðSiÞ þ ts þ costðlðri;jÞ; lðrÞÞ � dðrÞ; (5)

where ajðSiÞ is the courier’s arrival time at lðri;jÞ, andts is the service time for ri;j.

(2) Condition 2: After inserting r into segment j of Si,other requests ri;k with j � k � mi þ 1 on Si can stillbe served on time, which can be formalized as

ajðSiÞ þ 2� ts þ costðlðri;jÞ; lðrÞÞþ costðlðrÞ; lðri;jþ1ÞÞ � djþ1ðSiÞ;

(6)

where we define djþ1ðSiÞ as the latest time for ci toarrive at lðri;jÞ in Si, such that each request ri;k withj � k � mi þ 1 can be served before its deadlinedðri;kÞ. For each 0 � j � mi þ 1, we can compute

djðSiÞ recursively as follows:

djðSiÞ ¼dðri;miþ1Þ j ¼ mi þ 1

minfdðri;jÞ; djþ1ðSiÞ�ajþ1ðSiÞ þ ajðSiÞg otherwise.

8><>:

(7)

Note that there is another condition that the number ofparcels carried by a courier cannot exceed the capacity ofthe courier. Since such a condition is trivial to handle, weomit it for the ease of discussion in the rest of the paper.

After we find the valid segment(s) in Si of courier ci, wecalculate the shortest incurred distance Ddistðr; SiÞ wheninserting r into the valid segment(s) of Si. Ddistðr; SiÞ isdefined as follows:

Ddistðr; SiÞ ¼ min0�j�mi; segment j of Si is valid w:r:t:r

Ddistjðr; SiÞ; (8)

where Ddistjðr; SiÞ is the incurred distance if we insertrequest r between ri;j and ri;jþ1 in Si, and is computed as

Ddistjðr; SiÞ ¼ costðlðri;jÞ; lðrÞÞ þ costðlðrÞ; lðri;jþ1ÞÞ� costðlðri;jÞ; lðri;jþ1ÞÞ:

(9)

For each candidate courier in candðrÞ, we calculate the short-est incurred distance when inserting r into the valid segment(s) of his schedule. Among all candidates, we find the courierci with the minimum incurred distance Ddistðr; SiÞ andupdate the schedule Si by inserting r into the corresponding

valid segment j. The values ajðSiÞ and djðSiÞ for 0 � j � mi

should be updated due to the insertion of r in Si.

3.4 Lazy Path Computation

According to Eq. (9), calculating the incurred distanceDdistjðr; SiÞ involves calculating costðlðri;jÞ; lðrÞÞ and costðlðrÞ;lðri;jþ1ÞÞ using the costly Dijkstra’s algorithm (or its variantA*) [14], with time complexity Oðmþ n � log ðnÞÞ, on a roadnetwork G with n nodes and m edges. In order to minimizethe number of exact shortest path computations, we computea lower bound ofDdistjðr; SiÞ for any segment j as

Ddistjðr; SiÞ ¼ maxf0; costðlðri;jÞ; lðrÞÞ þ costðlðrÞ; lðri;jþ1ÞÞ� costðlðri;jÞ; lðri;jþ1ÞÞg;

where costðl; l0Þ for any two locations l and l0 can be calcu-lated by Eq. (3) using the NVD index. The rationale is that ifthe Ddist of a segment is large, we can prune it without com-puting its Ddist. We call this strategy lazy shortest path com-putation. Accordingly, we design an algorithm BestR to findthe best courier for a new request r using lazy shortest pathcomputation, which is shown in Algorithm 1. We use a pri-ority queue H to maintain the candidate segments. Eachentry ðci; ri;j;Ddist; flagÞ in H denotes segment j in Si withincurred distance Ddist. flag is either true or false, denotingwhether Ddist is the exact Ddistjðr; SiÞ or the lower boundDdistjðr; SiÞ respectively. H maintains the segment with the

minimum Ddist, and is initialized to be ; (line 1). For eachcandidate ci in candðrÞ, we insert into H every segment jthat satisfies conditions 1 (Eq. (5)) and 2 (Eq. (6)) by replac-ing cost with cost (lines 2-6). The incurred distance of eachsegment j inH is the lower bound Ddistjðr; SiÞ. At this stage,

no exact distance has been computed. Next, we iteratively

ZHANG ETAL.: EFFECTIVE AND EFFICIENT: LARGE-SCALE DYNAMIC CITY EXPRESS 3207

Page 6: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

pop out the entry ðci; ri;j;Ddist; flagÞ inHwith the minimumDdist. If Ddist is the lower bound Ddistjðr; SiÞ (flag is false),

we compute the exact incurred distance Ddistjðr; SiÞ usingcostðlðri;jÞ; lðrÞÞ and costðlðrÞ; lðri;jþ1ÞÞ (line 11), check theconditions 1 (Eq. (5)) and 2 (Eq. (6)) using the exact cost(line 12), and push the entry with the exact incurred dis-tance into H if both conditions are satisfied (line 13). Other-wise, the segment ði; jÞ is the one with the minimumDdistjðr; SiÞ and is returned by the algorithm (lines 9-10). If

no segment is returned when H becomes ; (line 7), wereturn ð�1;�1Þ indicating that request r cannot be satisfied.

Algorithm 1. BestRðRequest r, Candidate Set candðrÞÞ1: H ;;2: for all ci 2 candðrÞ do3: compute costðlðrÞ; lðri;jÞÞ for all ri;j 2 Si;4: for all segment j in Si do5: if ValidLBðSi; j; rÞ then6: H:Pushððci; ri;j;Ddistjðr; SiÞ; falseÞÞ;7: whileH 6¼ ; do8: ðci; ri;j;Ddist; flagÞ H:PopðÞ;9: if flag ¼ true then10: return ðci; jÞ;11: compute costðlðri;jÞ; lðrÞÞ and costðlðrÞ; lðri;jþ1ÞÞ;12: if ValidðSi; j; rÞ13: H:Pushððci; ri;j;Ddistjðr; SiÞ; trueÞÞ;14: return ð�1;�1Þ;15: Procedure ValidLBðschedule Si, segment j, request rÞ16: return ajðSiÞ þ ts þ costðlðri;jÞ; lðrÞÞ � dðrÞ and ajðSiÞ þ

2ts þ costðlðri;jÞ; lðrÞÞ þ costðlðrÞ; lðri;jþ1ÞÞ � djþ1ðSiÞ;17: Procedure Validðschedule Si, segment j, request rÞ18: return ajðSiÞ þ ts þ costðlðri;jÞ; lðrÞÞ � dðrÞ and ajðSiÞ þ 2tsþ

costðlðri;jÞ; lðrÞÞ þ costðlðrÞ; lðri;jþ1ÞÞ � djþ1ðSiÞ;

3.5 Basic Algorithm

The overall algorithm is outlined in Algorithm 2. Suppose foreach courier ci, ajðSiÞ and djðSiÞ for 0 � j � mi þ 1 have beencomputed respectively (line 1-2). When a new request r isreceived, the algorithm loads the updated locations of all cou-riers (line 4), and invokes UpdateðC; rÞ to find the courierci 2 C to serve r and update the corresponding schedule Si

(line 5). InUpdateðC; rÞ (line 6-11), as step 1, the algorithm com-putes candðrÞ (line 7); as step 2, it finds the courier ci and seg-ment j in Si with the minimum incurred distance by invokingprocedure BestR (line 8) using lazy path computation; and asstep 3, it either declines r if no valid segment is found (lines 9-10) or updates the corresponding schedule using the Insert

procedure (line 11). In the InsertðSi; j; rÞ procedure (lines 12-17), after inserting r between ri;j and ri;jþ1 (line 13), akðSiÞ forall jþ 1 � k � mi þ 1 need to be updated in increasing order

of k (line 14-15), and dkðSiÞ for all 1 � k � jþ 1 need to beupdated in decreasing order of k (line 16-17).

4 EFFECTIVENESS: BATCH ASSIGNMENT

The Basic solution (Algorithm 2) adopts a first-come, first-served strategy in which early requests have high priorityto be assigned. However, using request arrival time as thefirst priority may result in inferior scheduling result. Weuse an example depicted in Fig. 3 for explanation.

Algorithm2.BasicðCouriersC and Their Current SchedulesÞ1: for all courier ci 2 C do2: compute ajðSiÞ and djðSiÞ for 0 � j � mi þ 1;3: while there is a new request r issued in real time do4: load updated locations of all couriers;5: UpdateðC; rÞ;6: Procedure Updateðcourier set C, request rÞ7: compute candðrÞ;8: ðci; jÞ BestRðr; candðrÞÞ;9: if i ¼ �1 then10: decline request r; return;11: InsertðSi, j, r);12: Procedure Insertðschedule Si, segment j, request rÞ13: insert r between ri;j and ri;jþ1 of Si;14: for k ¼ jþ 1 tomi þ 1 do15: akðSiÞ ak�1ðSiÞ þ ts þ costðlðri;k�1Þ; lðri;kÞÞ;16: for k ¼ jþ 1 down to 1 do17: dkðSiÞ minfdðri;kÞ; dkþ1ðSiÞ � akþ1ðSiÞ þ akðSiÞg;

Example 4.1. A courier c has two tasks r1 and r2 in hisschedule S. Four new requests r3; r4; r5; r6 arrive insequence. The basic solution first considers r3, but cannotsatisfy it. Then it processes r4 and updates the courier’sschedule to be r0 ! r1 ! r4 ! r2. The next step is tocheck r5 and r6. But neither of them can be served by cou-rier c due to the large incurred distance of r4 toschedule S.

In contrast, if we process the four requests accordingto their incurred distance in a batch model, we can firstsatisfy r6 because it has the minimum incurred distanceamong the four requests. Thus we update the schedule tobe r0 ! r1 ! r6 ! r2. We perform another search andinsert r5 into the new schedule. After that there is no sat-isfiable request on the map. It is obvious that with

Fig. 3. An example with one courier and four new requests.

3208 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12, DECEMBER 2016

Page 7: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

incurred distance as the first priority, we can satisfy morerequests with less incurred distance comparing to thebasic solution.

This example motivates us to consider processing a batchof requests using incurred distance as the first priority.Batch assignment is feasible in practice because theresponse period tr allows the algorithm to collect multiplerequests and assign them in any order. Moreover, as shownin [12], using the minimum incurred distance as node inser-tion order for TSP is a 2-approximation algorithm. Byadopting this strategy, we propose a new algorithm SIDF,which stands for shortest incurred distance first.

SIDF is outlined in Algorithm 3. After computing ajðSiÞand djðSiÞ for the current schedule Si of each courier ci 2 C

(lines 1-2), the algorithm loads the updated information ofall couriers, collects and processes the set of requests Rissued in every tr minutes (lines 3-4). Two steps, candidatecomputation (line 6) and schedule updating (line 7), areused to process the requests in R. In candidate computation,the set of candidate requests for each courier and the set ofcandidate couriers for each request are computed. In sched-ule updating, the requests are processed in the increasingorder of their minimum incurred distances to couriers.

Candidate Computation. The procedure ComputeCand tocompute the candidates is shown in lines 8-13 of Algo-rithm 3. We first compute the candidates candðrÞ for eachrequest r 2 R (lines 10-11) using the same method in Algo-rithm 2. Then we compute the candidate set candðciÞ, whichis a set of requests that courier ci can reach before theirdeadlines, for each ci 2 C. candðciÞ can be considered as aninverse set of candðrÞ, thus, for each ci 2 candðrÞ, we simplyadd r into candðciÞ (lines 12-13).

Schedule Updating. In the schedule updating procedureUpdateR (line 14-29 of Algorithm 3), we use a global prior-ity queue Hg to maintain the priorities of all requests. Eachentry inHg has the form:

ðr; ci; ri;j; ri;jþ1;Ddistjðr; SiÞÞ;

which means that the minimum incurred distance for therequest r is Ddistjðr; SiÞ when we insert r into segmentri;j ! ri;jþ1 of ci’s schedule Si. We compute the initial priori-ties of requests in R (lines 16-19), and assign the requestsaccording to their priorities in Hg (lines 20-28). The priorityof a request r is the minimum Ddistðr; SiÞ among the sched-ules of all candidate couriers, which can be computed usingthe procedure BestRðr; candðrÞÞ (line 17). After computingthe priorities of all requests and adding them into Hg

(lines 18-19), we pop out the request r with the highest pri-ority (minimum Ddist) from Hg iteratively, assign r to cou-rier ci by inserting it into the corresponding segment j of Si

(line 22), and remove r from the unprocessed request set R(line 23). After a request r is assigned to segment j of Si, thepriority of each request r0 2 candðciÞ needs to be updated inHg (lines 24-28). This is processed by removing r0 from Hg

(line 25), calculating the updated priority (line 26), andinserting it again intoHg if it is still valid (lines 27-28). Here,after inserting r into segment j of schedule Si, for eachrequest r0 2 candðciÞ, the priority of r0 needs to be recalcu-lated due to the following three factors:

(1) Segment Deletion. The original segment j is removedfrom Si. Therefore, the priority of a requestr0 2 candðciÞmay decrease due to the removal.

(2) Segment Insertion. Two new segments j and jþ 1 arecreated in Si. Therefore, the priority of a requestr0 2 candðciÞ may increase due to the insertion oftwo new segments.

(3) Segment Updating. The values of akðSiÞ (for

jþ 1 � k � mi þ 1) and dkðSiÞ (for 1 � k � jþ 1) areupdated after the insertion of r. Therefore, the prior-ity of a request r0 2 candðciÞ needs to be updatedaccordingly because some segments become invalidw.r.t. r0.

The algorithm terminates when all possible requests in Hg are

assigned. The remaining unassigned requests in R are declined

(line 29).

Algorithm3.SIDFðCouriersC andTheir Current SchedulesÞ1: for all courier ci 2 C do2: compute ajðSiÞ and djðSiÞ for 0 � i � mi þ 1;3: for every tr minutes do4: load updated locations of all couriers;5: R the set of all requests issued in the last tr minutes;6: ComputeCandðC;RÞ;7: UpdateRðC;RÞ;8: Procedure ComputeCandðcourier set C, request set RÞ9: candðciÞ ; for all ci 2 C;10: for all r 2 R do11: compute candðrÞ;12: for all ci 2 candðrÞ do13: candðciÞ candðciÞ [ frg;14: Procedure UpdateRðcourier set C, request set RÞ15: Hg ;;16: for all r 2 R do17: ðci; jÞ BestRðr; candðrÞÞ;18: if i 6¼ �1 then19: Hg:Pushððr; ci; ri;j; ri;jþ1;Ddistjðr; SiÞÞÞ;20: whileHg 6¼ ; do21: ðr; ci; ri;j; ri;jþ1;DdistÞ Hg:PopðÞ;22: InsertðSi; j; rÞ;23: R R n frg;24: for all r0 2 candðciÞ and r0 2 Hg do25: Hg:Removeðr0Þ;26: ðci0 ; j0Þ BestRðr0; candðr0ÞÞ;27: if i0 6¼ �1 then28: Hg:Pushððr0; ci0 ; ri0;j0 ; ri0 ;j0þ1;Ddistj0 ðr0; Si0 ÞÞÞ;29: decline all requests in R;

5 EFFICIENCY: TWO-LEVEL PRIORITY QUEUE

STRUCTURE

SIDF improves the effectiveness of request assignment witha batch mode, but it is computationally expensive due toredundant shortest distance computations. Specifically, theredundant computations come from two aspects. First, inthe UpdateR procedure, when a request r is assigned to cou-rier ci (line 22), its schedule Si is updated. Then it invokesBestRðr0; candðr0ÞÞ to compute a new assignment for everyrequest r0 2 candðciÞ from scratch, which involves the costlyshortest distance computation for every r0 2 candðciÞ w.r.t.the schedule of every candidate courier c0 2 candðr0Þ.

ZHANG ETAL.: EFFECTIVE AND EFFICIENT: LARGE-SCALE DYNAMIC CITY EXPRESS 3209

Page 8: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

Second, SIDF only makes a new assignment after all theexact incurred distances of all requests are computed, whichmay not be necessary as some requests with very large dis-tances to some couriers can be easily pruned.

In this section, we aim to improve the efficiency of oursolution with a new algorithm SIDF� from two aspects.First, we observe that once a request r is assigned to theschedule Si of a courier ci, only those requests in candðciÞmay be affected in terms of their incurred distance to Si.The schedules of other couriers are not affected by thisassignment. Thus we can confine the update to courier ci.Second, we design a mechanism to delay and reduce theexact shortest distance computation as much as possible,while at the same time, still maintain the same effectivenessof SIDF. With these considerations, we design a two-levelpriority queue structure, which is illustrated in Fig. 4. At thebottom level, for each courier ci, we maintain a local priorityqueue Hi that keeps all the candidate requests candðciÞ; andat the top level, we maintain a global priority queue Hg thatkeeps the topmost entries popped from each local priorityqueue. With the two-level priority queue structure, after anew request r is assigned to a courier ci, we only need toupdate the local priority queue Hi and the global priorityqueue incrementally. For other couriers that also contain rin their local priority queues, they can discard r in a lazymanner. In this way, the total number of shortest distancecomputation can be largely reduced.

In the following, we first introduce the components of thetwo-level priority queue before describing the new schedul-ing algorithm SIDF�.

5.1 Local Priority Queue for Courier

For each courier ci, we maintain a local priority queue Hi

that keeps all the candidate requests candðciÞ. Each entry inthe local priority queueHi has the form:

ðr; ci; ri;j; ri;jþ1;Ddist; flagÞ:

It represents a possible assignment that assigns a requestr into segment ri;j ! ri;jþ1 of ci’s schedule. Different fromthe entry inHg of SIDF, a flag is added in each entry indicat-ing the incurred distance Ddist is an exact value (ifflag ¼ true) or a lower bound (if flag ¼ false).

The operations on the local priority queue include thefollowing.

1. Initialization. In the initialization step, for each requestr 2 candðciÞ, and each valid segment j in Si, we insert anentry ðr; ci; ri;j; ri;jþ1;Ddistjðr; SiÞ; falseÞ to Hi. Here we only

calculate the lower bound of the incurred distance, but notthe exact incurred distance.

2. Pop. We pop the topmost entry in Hi and push it intothe global priority queue Hg. The rationale is, the request inthe top entry of a local priority queue is more likely to havea short incurred distance. Thus it will be first pushed intothe global priority queue for further verification.

3. Push. There are two cases that we will push an entryinto the local priority queueHi.

(1) For a request r and a valid segment j inSi, oncewe cal-culate the exact incurred distance Ddistjðr; SiÞ, wepush the entry ðr; ci; ri;j; ri;jþ1;Ddistjðr; SiÞ; trueÞ backinto Hi for further comparison with other candidaterequests in the queue.

(2) When Si is updated with a new request inserted,there may be some requests that can possibly beserved by the courier ci due to this insertion. Accord-ingly, we push the entries for those new candidateswith the lower bound of their incurred distance intothe queueHi.

5.2 Global Priority Queue

On top of the local priority queues for couriers, we maintaina global priority queue Hg that keeps the topmost entriespopped from each local priority queue. The entries in Hg

have the same representation as those in local priorityqueues. The operations on the global priority queue includethe following.

1. Initialization. In the initialization step, the topmostentry from each local priority queue is popped out andpushed intoHg.

2. Pop. The top entry ðr; ci; ri;j; ri;jþ1;Ddist; flagÞ is poppedout from Hg. We will first check whether the request r is notassigned yet and whether the concerned segmentri;j ! ri;jþ1 is still valid (i.e., not altered by any earlierassignment). If the entry is still valid, we process it in one ofthe following two cases.

(1) If flag ¼ true, we will assign the request into the con-cerned segment j and update courier ci’s scheduleSi.

(2) If flag ¼ false, we will calculate the exact incurreddistance Ddistjðr; SiÞ and push this entry back to the

local priority queueHi for further comparison.3. Push. After the top entry is popped out from Hg, a new

entry related to the same courier is popped out from the cor-responding local priority queue and pushed intoHg.

Fig. 4. Global priority queue and local priority queues.

3210 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12, DECEMBER 2016

Page 9: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

5.3 Improved Algorithm SIDF�The new scheduling algorithm SIDF� is shown in Algo-rithm 4. We describe how it works in 5 steps.

Step 1. We initialize the global priority queue Hg and thelocal priority queue Hi for each courier ci 2 C (lines 1-6).The initialization of Hi is done in procedure InitH (lines 20-27), which computes the candidate entries for ci and pusheseach candidate along with the lower bound of its incurreddistance into the priority queue. To initialize Hg, we simplypop out the top element from each Hi (line 5) and push itinto Hg (line 6). We add a flag in each entry of Hg to markwhether the entry keeps the exact Ddist.

Step 2. We pop the top element ðr; ci; ri;j; ri;jþ1;Ddist; flagÞ(line 8) from the global priority queue Hg and attempt toassign request r to segment j in the schedule of courier ci.Before the assignment, we first check whether r has not beenassigned yet (i.e., r 2 R) and segment ri;j ! ri;jþ1 is still valid.If the entry is not valid at this step (line 9 returns false), we goto step 5. Otherwise, we go to step 3 if Ddist is a lower bound(flag ¼ false), or go to step 4 ifwe have exactDdist.

Step 3. We need to check ValidLBðSi; j; rÞ due to segmentupdating (line 10). If ValidLBðSi; j; rÞ, we compute the exactcostðlðri;jÞ; lðrÞÞ and costðlðrÞ; lðri;jþ1ÞÞ (line 11), and pushthe entry with exact Ddistjðr; SiÞ into Hi if ValidðSi; j; rÞ(lines 12-13), as described in case 1 of the push operation ofthe local priority queue before.

Step 4. We need to check ValidðSi; j; rÞ due to segmentupdating (line 14). If ValidðSi; j; rÞ, we can assign r to seg-ment j of Si by invoking AssignðSi; j; r; R;HiÞ. In theAssignðSi; j; r; R; HÞ procedure (lines 28-32), in addition toinserting r into segment j of Si (line 29) and removing rfrom R (line 30), we invoke InsertSegðH; Si; j; RÞ andInsertSegðH; Si; jþ 1; RÞ to push new entries into Hi due tothe insertion of r (lines 31-32), as described in case 2 of thepush operation of the local priority queue before.

Step 5. We simply pop out the top entry from the corre-sponding local priority queue Hi and push it into Hg

(lines 16-18). Then we repeat step 2 to step 5 until Hg

becomes ;. Finally, all the remaining requests in R aredeclined (line 19).

5.4 Analysis of SIDF�The benefits of keeping a local priority queue for each cou-rier include:

(1) All the updating entries after we make a new assign-ment are contained in one local priority queue. Usinga local priority queue avoids computing the newincurred distance of every affected request, whichconsiders a lot of repeated possible entries due to thelarge number of candidate couriers.

(2) The local priority queue enables incremental entryupdating. Once we make an assignment related tocourier ci, only new possible entries related to thetwo new segments in Si are pushed into Hi. Othervalid entries with exact incurred distance w.r.t. ciremain in Hi, which avoids repeated exact incurreddistance computation. No exact distance computa-tion is performed in the local priority queue.

The benefit of keeping a global priority queue is, it sup-ports lazy path computation among all assignments. We

only compute the exact incurred distance of an assignmentonce it is popped out of Hg, which reduces a lot of exactincurred distance computations.

Thus with the two-level priority queue structure, we cansubstantially improve the efficiency of the scheduling algo-rithm. Note that, SIDF� can achieve the same effectivenessas SIDF, since it still adopts the batch assignment strategyand the shortest incurred distance first criterion for requestassignment.

6 PERFORMANCE STUDIES

Data set. We use a real road network in Beijing city forexperimental study. We mainly focus on a 15� 5 km areathat lies between northeastern 4th ring road and 5th ringroad of Beijing. The road network contains 8,840 nodes and11,331 edges. In the scalability test, we use a series of roadnetworks with different sizes.

Algorithm 4. SIDF � ðCourier Set C, Request Set RÞ1: Hg ;;2: for all ci 2 C do3: Hi InitHðci; RÞ;4: ifHi 6¼ ; then5: ðr; ci; ri;j; ri;jþ1;Ddistjðr; SiÞ; flagÞ Hi:PopðÞ;6: Hg:Pushððr; ci; ri;j; ri;jþ1;Ddistjðr; SiÞ; flagÞÞ;7: whileHg 6¼ ; do8: ðr; ci; ri;j; ri;jþ1;Ddist; flagÞ Hg:PopðÞ;9: if r 2 R and ri;j ! ri;jþ1 is still a segment in Si then10: if flag ¼ false and ValidLBðSi; j; rÞ then11: compute costðlðri;jÞ; lðrÞÞ and costðlðrÞ; lðri;jþ1ÞÞ;12: if ValidðSi; j; rÞ then13: Hi:Pushððr; ci; ri;j; ri;jþ1;Ddistjðr; SiÞ; trueÞÞ;14: if flag ¼ true and ValidðSi; j; rÞ then15: AssignðSi; j; r; R;HiÞ;16: ifHi 6¼ ;17: ðr0; ci; ri;j0 ; ri;j0þ1;Ddistj0 ðr0; SiÞ; flag0Þ Hi:PopðÞ;18: Hg:Pushðr0; ci; ri;j0 ; ri;j0þ1;Ddistj0 ðr0; SiÞ; flag0Þ;19: decline all requests in R;20: Procedure InitHðcourier ciÞ21: H ;;22: for all r 2 candðciÞ do23: compute costðlðrÞ; lðri;jÞÞ for all ri;j 2 Si;24: for all segment j in Si do25: if ValidLBðSi; j; rÞ then26: H:Pushððr; ci; ri;j; ri;jþ1;Ddistjðr; SiÞ; falseÞÞ;27: returnH;28: Procedure Assignðschedule Si, segment j, request r, request

set R, priority queueHÞ29: InsertðSi; j; rÞ;30: R R n frg;31: InsertSegðH; Si; j; RÞ;32: InsertSegðH; Si; jþ 1; RÞ;33: Procedure InsertSegðpriority queueH,schedule Si,segment

j,request set RÞ34: for all r 2 candðciÞ \R do35: compute costðlðri;jÞ; lðrÞÞ and costðlðrÞ; lðri;jþ1ÞÞ;36: if ValidLBðSi; j; rÞ37: H:Pushððr; ci; ri;j; ri;jþ1;Ddistjðr; SiÞ; falseÞÞ;

Simulation with Parameter Settings. The set of parameters,their meanings, as well as the default values of all parame-ters are shown in Table 2. In our experiments, when we

ZHANG ETAL.: EFFECTIVE AND EFFICIENT: LARGE-SCALE DYNAMIC CITY EXPRESS 3211

Page 10: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

vary a certain parameter, all other parameters are set to betheir default values if not otherwise specified. Given a cer-tain set of parameter setting, we develop a simulation sys-tem to simulate the city express process as follows.

Transit Station and Courier Generation.We chooseK nodesas transit stations by performing the k-medoids algorithm onall the nodes in the road network. Then we build a networkVoronoi diagram with the transit stations as centers. Thecourier number in a transit station is proportional to thenumber of nodes in the corresponding Voronoi region, andthe total number of couriers is n. All couriers are located attheir corresponding transit stations at the beginning of thesimulation.

Request Generation. We assume that requests are gener-ated on all nodes in our experiments. Similar to most queu-ing systems [15], we consider the arrival of a pickup requeston a node as a Poisson process with intensity �, whichdenotes the average number of pickup requests arriving perhour on the node. We sample the � values for all nodes

from a normal distributionNð�; s2Þ, where the � is the aver-

age request intensity on a node. We set � to be 0.6 and s2 tobe 0.01 by default, and as a result, the average number ofpickup requests in an hour in the area is 5,400 by default.Such a setting is reasonable, since according to the statisticsof the State Post Bureau of China (http://www.chinapost.gov.cn/), the total number of parcels including pickup anddelivery in Beijing in the first quarter of 2015 is 283 million.Under the assumption that all parcels are uniformly distrib-uted in a region of 50 km � 50 km every day from 6:00 a.m.to 11:59 p.m., the expected number of pickup requests perhour in our experiment region should be 5,185, which isclose to our default value. The deadline for a pickup requestis set to be 30 minutes after the request is issued. Comparedwith the current city express company such as S.F. Express(http://www.sf-express.com/us/en/), which requires toreach the location of a pickup request in an hour after it isissued, our setting imposes a stricter requirement on the ser-vice quality.

Initial Delivery Assignment. The set of delivery tasksshould be generated at each transit station at the beginning.Thus, we use our request generation algorithm describedabove to generate the delivery in a period of 20 minutes andassign them to their corresponding transit stations. Thedelivery tasks are randomly assigned to the couriers in eachtransit station and the initial schedule of each courier is cal-culated based on Algorithm 3 by processing all delivery

tasks together. After setting off, each courier should returnto his transit station within tmax time and finish all theassigned delivery in this period. We set tmax to be 2 hours.Since there is no strict temporal constraint on each deliverytask, we can easily calculate the initial delivery routes at thebeginning of the simulation.

Service Process Simulation. We assume that the servicetime of each task follows an exponential distribution withmean ts, which is set to be 3 minutes by default. We updatethe locations of all couriers every 1 minute according totheir schedules and average speed r. Once the schedule of acourier is changed, we change the route of the courierimmediately.

Measurements. We test both the effectiveness and the effi-ciency of our proposed algorithms. All the experimentalresults are based on a simulation for 2 hours of the cityexpress process using our simulation system. For effective-ness testing, we compare the satisfaction ratio SR of differ-ent algorithms, which is computed as

SR ¼ # accepted pickup requests

# issued pickup requests:

We also compare the average incurred distance AID of dif-ferent algorithms, which is computed as

AID ¼X

r is satisfied by ci

Ddistðr; SiÞ=# accepted pickup requests:

The average incurred distance AID is used to measure theaverage increased cost to serve an accepted pickup request.

For efficiency testing, we compare the average process-ing time per request, which is defined as the time spent oncomputing the schedules for all requests within the 2 hoursdivided by the number of issued requests. We also comparethe average number of access nodes per request, which isthe average number of node access operations involved toprocess a new request within the 2 hours.

Evaluation Algorithms. For effectiveness testing, we com-pare three algorithms, namely, Nearest, Basic and SIDF�.Nearest is a simple baseline that assigns each request r to itsnearest courier ci that can possibly serve r. If no such cou-rier is found, the request is rejected. Basic (Algorithm 2)processes requests on a first-come, first-served basis, andSIDF� (Algorithm 4) is the efficient version of SIDF (Algo-rithm 3) with the same effectiveness. For efficiency testing,we first compare Nearest, Basic, and SIDF�. Then we com-pare SIDF and SIDF�.

TABLE 2Parameters and Default Values

Parameter Meaning Default

K The number of transit stations 7

� The average request intensity on a node 0:6=hr

n The number of couriers 500ts The average service time 3 minutestr The maximum time to confirm a request 15 minutestmax The maximum time for each trip 120 minutesmd The number of delivery tasks in tmax 1,600mp The number of pickup requests in tmax 10,800r The average speed of each courier 15 km/hdðrÞ The deadline for a pickup request r 30 minutes

Fig. 5. Vary n (effectiveness).

3212 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12, DECEMBER 2016

Page 11: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

6.1 Effectiveness Testing

(Exp-1: Varying n). In this experiment, we vary the numberof couriers n from 100 to 800. The experimental results forSR and AID are shown in Figs. 5a and 5b respectively.When n increases, SR and AID for all three algorithmsNearest, Basic, and SIDF� increase. When n is small, SRfor all three algorithms increase fast, and when n is large,SR for all algorithms increase slowly. The reason is that,when n is large, the potential for all three algorithms to sat-isfy more requests becomes small since most requests thatcan result in small incurred distance can still be satisfiedby the algorithms with small n. Basic increases SR by 20percent on average compared to Nearest, and SIDF�increases SR by 10 percent on average compared to Basic.Remarkably, as illustrated by the dashed line in Fig. 5a, tosatisfy the same number of requests (with SR ¼ 0:8), Basicrequires 800 couriers while SIDF� only requires 500 cou-riers. In other words, our algorithm can potentially helpthe express company to save substantial cost (i.e., 37.5 per-cent) while achieving the same satisfaction ratio (i.e.,SR ¼ 0:8). Regarding AID, Basic has a smaller AID thanNearest when n is less than 600 and a larger AID when n islarger than 600. It shows that trying to find an optimalschedule immediately after a request streams in may notlead to a smaller average incurred distance. The AID ofSIDF� is the shortest among the three algorithms under alln values, which demonstrates the high effectiveness of ourproposed batch algorithm SIDF�.

(Exp-2: Varying tr). We vary tr, the maximum time to con-firm a request, from 1 to 24 minutes and compare SR andAID for the three algorithms Nearest, Basic, and SIDF�. Theexperimental results for SR and AID are shown in Figs. 6aand 6b respectively. Nearest and Basic are independent oftr. Thus, both SR and AID remain unchanged for Nearestand Basic when we vary tr. For SR, SIDF� is better thanNearest and Basic for all tr values. When tr increases from 1to 12, SR for SIDF� increases. This is because the batch

assignment strategy in SIDF� can have a large room forimprovement when tr is large. However, when tr furtherincreases, SR for SIDF� starts to decrease. The reason is asfollows. When tr approaches dðrÞ (30 minutes by default),the time left for couriers to satisfy early arrival requestsbecomes short. For example, if request r arrives at the begin-ning of the tr period when tr is 24 minutes, only 6 minutesare left for couriers to satisfy it. Therefore, the overall SRdecreases when tr increases in this case. Based on this result,a city express company can choose the best tr using our sim-ulation system given a certain parameter setting. For AID,SIDF� has the shortest average incurred distance among thethree algorithms under all tr values. The AID of SIDF�decreases when tr increases from 1 to 10, but begins toincrease when tr is larger than 15. This is because when tr istoo large, SIDF� has to reject some requests that have asmall incurred distance with a strict deadline constraint(i.e., the requests issued at the beginning of tr period).

(Exp-3: Varying r). In this experiment, we vary the speedr of couriers from 15 to 25 (km/h). The trends of SR for thethree algorithms Nearest, Basic, and SIDF� are shown inFig. 7a. When r increases, SR for all three algorithmsincrease. This is because the increasing of speed means thedecreasing of cost to travel between any two nodes in thenetwork, which allows more requests to be served. When r

reaches 25, SR of Nearest, Basic and SIDF� is 0.67, 0.75 and0.92, respectively. The curves for AID when we vary r areshown in Fig. 7b. SIDF� has the shortest AID under allspeeds, while Basic has a smaller AID than Nearest onlywhen the speed is lower than 17 km/h. When the speed islarger than 17 km/h, both Basic and Nearest can satisfymore requests with small incurred distance, but Basic canalso satisfy those requests with larger incurred distance thatare rejected by Nearest, which leads to the higher AID asshown in Fig. 7b. The results are consistent with our analy-sis that SIDF� can choose a schedule with less incurred dis-tance than Basic.

6.2 Efficiency Testing

(Exp-4: Varying n). In this experiment, we vary the number ofcouriers n and test the average processing time and the num-ber of access nodes per request by the three algorithms. Theresults are shown in Figs. 8a and 8b respectively. When thenumber of couriers n increases, the total processing time aswell as the number of access nodes increase for all algo-rithms, because more couriers result in a larger search space.All the algorithms can process a request in less than 15 msand thus are suitable for realtime applications. Remarkably,SIDF� has a smaller average processing time per request

Fig. 6. Vary tr in minutes (effectiveness).

Fig. 7. Vary r in km/h (effectiveness).

Fig. 8. Vary n (efficiency).

ZHANG ETAL.: EFFECTIVE AND EFFICIENT: LARGE-SCALE DYNAMIC CITY EXPRESS 3213

Page 12: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

than Basic. This is because the average incurred distance perrequest by SIDF� is less than Basic, which means finding asuitable schedule inSIDF� needs less node access thanBasic.

(Exp-5: Varying tr). We vary tr, the maximum time to con-firm a request, from 1 to 24 minutes and test the efficiency ofNearest, Basic and SIDF�. The results for the average proc-essing time and the number of access nodes are shown inFigs. 9a and 9b respectively. When tr increases, the averagetime and the number of access nodes increase for SIDF�because more search space incurs for processing morerequests received in a longer tr period. The efficiency of twostreaming algorithms Basic and Nearest is not affected bythe window size tr. Nearest is the fastest among the threealgorithms while Basic needs more processing time for arequest than SIDF�. Remarkably, SIDF� can process about190 requests per second on average when tr ¼ 15 minutesand only 8 seconds are needed to process all requestsreceived in 15 minutes.

(Exp-6: Comparing SIDF and SIDF�). We compare theaverage processing time per request by SIDF and SIDF� toshow the efficiency improvement by the two-level priorityqueue. The results for varying n and tr are shown in Figs. 10and 11 respectively. SIDF� is six times faster than SIDF onaverage, which demonstrates the advantage of the globalpriority queue in reducing exact incurred distance calcula-tion, as well as the advantage of the local priority queues inmaintaining the information of candidate entries. Theresults for the number of access nodes are similar to theaverage processing time.

6.3 Performance Comparison with the ExactAlgorithm

Since SIDF� is an approximate algorithm, it is interesting tocompare its performance with an exact algorithm whichperforms exhaustive search to find the optimal schedule toserve the pickup requests. Given a set of pickup requests

and a set of n couriers, the exact algorithm first uses a bruteforce search strategy to explore all possible ways of requestassignments to the n couriers (without considering where toinsert them into the schedule). For every assignmentscheme, the exact algorithm uses dynamic programming toobtain the optimal schedule of each courier with the mini-mum travel cost. We assume that the couriers can satisfy allthe requests and compare the AID and average processingtime for SIDF� and the exact algorithm Exact.

(Exp-7: Varying the Number of Request mp When the CourierNumber n=2). We fix the number of couriers n ¼ 2. Theresults are shown in Figs. 12a and 12b respectively. It isobvious that SIDF� has very close AID to the exact algorithmwhile running at least two orders of magnitude faster thanthe exact algorithm when the number of request varies from4 to 14. When the number of request is larger than 14, theexact algorithm can not finish in 24 hours, thus we omit theresults when mp is larger than 14. This results show that: (1)the exact algorithm is computationally infeasible due to itsexponential cost, and (2) our method SIDF� can computethe request assignments with very close AID to that of theexact algorithm, and is orders of magnitude faster.

6.4 Scalability Test

(Exp-8: Varying the Number of Nodes N in the Road Networks).To evaluate the scalability of SIDF�, we compare the perfor-mance of the three algorithms, SIDF�, Basic and Nearest onroad networks of increasing size. Specifically, we run thethree algorithms on road networks corresponding to thesecond ring, third ring, fourth ring, fifth ring and sixth ringof Beijing. The largest road network consists of 81,000 nodesand 104,000 edges, and covers the main urban district ofBeijing. The number of couriers, transit stations and deliv-ery tasks are increased in proportion to the node number ofthe road network. For instance, the largest road networkcontains 5,000 couriers, 70 transit stations and 16,000

Fig. 10. Efficiency improvement by SIDF� (vary n).

Fig. 11. Efficiency improvement by SIDF� (vary tr).

Fig. 12. Varyingmp (effectiveness and efficiency).

Fig. 9. Vary tr in minutes (efficiency).

3214 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12, DECEMBER 2016

Page 13: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

delivery tasks. Other parameters are the same as the defaultvalue. The results for SR, AID, the average processing timeand the number of access nodes are shown in Figs. 13 and14 respectively. The results show that when the number ofnodes N increases from 8,400 to 81,000, SIDF� has the high-est SR and the smallest AID among the three algorithms.Besides, it uses less average processing time than thestreaming algorithm Basic according to Fig. 14. It is notedthat the SR of the three algorithms decrease when the roadnetwork contains 81,000 nodes. This is because the densityof the nodes is different in different regions of Beijing, andit takes more time to travel between nodes outside the fifthring region of Beijing. Overall, our method SIDF� scaleswell with the road network size in terms of both servicequality and computational efficiency.

7 RELATED WORKS

Query Processing on Spatial Networks. A lot of work has beendone on spatial network query processing. The most relatedworks include shortest path computation, kNN search, andbest point detour on a road network.

Shortest path and distance queries on road networkshave been experimentally evaluated by Wu et al. [16]. In[16], it is shown that the vertex importance based approachCH [17] is the most space-economic technique and spatialcoherence based method SILC [18] is the most efficientmethod. Inspired by CH, Zhu et al. [19] proposed the Arte-rial Hierarchy (AH) method that can outperform CH interms of both asymptotic and practical performance.

The kNN query in a spatial network was first studied byPapadias et al. [20]. In [20], two classical methods were pro-posed: INE (Incremental Network Expansion) and IER (Incre-mental euclidean Restriction). The LBC method proposed byDeng et al. [21] improved the INE method by maintaining apath distance lower bound for each candidate data point andsearching toward the candidateswithminimumpathdistancelower bound. Nutanong et al. [22] further improved LBC bydevising a novel heuristic function that considered all the can-didates for a query point as a single unit.

The best point detour problem is proposed by Shang et al.[23], which aims to find a detour that can pass a certain type ofnode with the minimum additional traveling cost. The bestpoint detour problem is different from our problem studiedin this paper sincewe aim at optimizing the route for multiplerequests andmultiple couriers in a dynamic environment.

Dynamic Vehicle Routing. Our problem is a restricted vari-ant of the dynamic vehicle routing problem [2], [24], [25],[26], [27], where each request that includes both pickup and

delivery location constraints are satisfied by a set ofvehicles. The dynamic vehicle routing problem has beenstudied extensively in the Operational Research literature.Our work is different from existing studies in three aspects:

(1) To the best of our knowledge, there is no existingwork that studies the dynamic vehicle routing prob-lem on a real road network, where it is costly to com-pute the shortest distance between two nodes, and isimpractical to precompute and store all-pair shortestdistances in the large road network. Our solutionexploits a spatio-temporal index to estimate thetravel cost lower bound and reduces the number ofexact shortest path computations. Therefore, we areable to handle real-world pick-up requests work-loads (e.g., 5,400 requests per hour with 800 couriersin a road network of Beijing). In contrast, the largestsimulation instance on dynamic vehicle routingproblem from the OR literature is 60 requests perhour with 100 couriers in euclidean space [28].

(2) Most works on dynamic vehicle routing problemassume that all the requests can be satisfied by thefixed numbers of couriers and their objective is tominimize the operational cost [4], [29], [30], [31], [32],[33], [34], [35] (e.g., the sum of the total travel cost forcouriers and the delayed cost for all requests). Ourobjective is to satisfy incoming pickup requests asmany as possible under the temporal constraints onpickup requests and couriers (i.e., each request mustbe satisfied on time once assigned to a courier andcouriers must come back to their transit stations ontime). This setting is more realistic, because in prac-tice, a city express company may not be able to sat-isfy all the pickup requests arriving in real-time dueto limited manpower.

(3) Compared with existing works [5], [28] that rejectrequests immediately due to hard time constraints,we adopt a batch solution that considers the schedul-ing problem for requests received in a short periodinstead of making a decision immediately after a newrequest arrives. As we demonstrate in our experi-ments, such a strategy can satisfy more requests thana basic streaming algorithm as we gather short termrequests information for further optimization.

Dynamic Taxi Ridesharing. Recently, dynamic taxi ride-sharing has been studied in the database community. A taxiridesharing query asks if there exists a taxi in a city that cantake the customer from his/her current location to a specificdestination under time window constraints. Our dynamic

Fig. 14. Varying N (efficiency).Fig. 13. Varying N (effectiveness).

ZHANG ETAL.: EFFECTIVE AND EFFICIENT: LARGE-SCALE DYNAMIC CITY EXPRESS 3215

Page 14: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

city express problem is different from the taxi ridesharingproblem in three aspects. First, though the number of taxisis larger than the couriers in our problems, the capacity of acourier is much larger than a taxi so that finding the optimalroute of a courier using kinetic tree structure [9] proposedfor taxi ridesharing is impractical. Second, the deadline ofprocessing a pickup request (e.g., 30 minutes after issuing)is larger than the maximal waiting time of a customer (e.g.,5 minutes) in ridesharing service. Moreover, there is a tighttime window for a taxi to reach customer’s destination but aparcel can usually wait a longer time at the transit stationbefore being delivered to its destination. Thus the spatial-temporal index proposed in [7], [8] is less effective in solv-ing city express problem resulting in more candidate cou-riers for a pickup request. Third, there is response time for arequest in city express service while a taxi ridesharingrequest needs instant response. Once a customer issues apickup request on the website or through a smartphone, astaff from the express company will contact the customer toconfirm or decline the request within a short period (e.g.,10 minutes). Such a period allows the system to gather mul-tiple requests for better scheduling. On the other hand, wepropose a new algorithm that can accelerate the requestassignment on a two-level priority queue structure, whichcan benefit the research in urban computing [1] such as theridesharing system.

8 CONCLUSION

In this paper, we propose a solution to batch processrequests in dynamic city express. We also develop a simula-tion platform to confirm both the effectiveness and effi-ciency of our solution. The new city express system hasthree advantages: First, when the intensity of pickuprequests is 72/km2 per hour (i.e., 5,400/75 km2), our solu-tion can increase the satisfaction ratio of pickup requests bymore than 10 percent compared to a basic solution and 30percent compared to an existing straightforward solution.On the other hand, our solution can save 37.5 percent man-power to achieve 80 percent satisfaction ratio compared tothe basic solution as shown in the experiment. Second, oursolution enjoys high efficiency with the help of the two-levelpriority queue structure. On average, we can process about190 pickup requests per second under default settings,which is 15 percent faster than the basic streaming algo-rithm. Third, the developed simulation platform can be usedto estimate the number of couriers a city express companyneeds to achieve a certain satisfaction ratio in urban area.For instance, by estimating the satisfaction ratio under dif-ferent courier numbers in the simulation, we can find that at

least 7 couriers/km2 (i.e., 500 couriers/75 km2) are neededto keep the satisfaction ratio above 80 percent, or equiva-lently, serve 8,640 pickup requests in 2 hours. In the future,we plan to consider more factors that can affect the servicequality of city express service such as the time-dependenttravel cost of road segments and the stochastic informationof requests’ locations.

ACKNOWLEDGMENTS

This work is supported by a Microsoft Research grant onurban informatics, the National Natural Science Foundation

of China (Grant No. 61672399), and CUHK Direct Grant No.4055015 and 4055048. Lu Qin is supported by ARCDE140100999 andARCDP160101513.

REFERENCES

[1] Y. Zheng, L. Capra, O. Wolfson, and H. Yang, “Urban computing:Concepts, methodologies, and applications,” ACM Trans. Intell.Syst. Technol., vol. 5, no. 3, pp. 38–55, 2014.

[2] V. Pillac, M. Gendreau, C. Gu�eret, and A. L. Medaglia, “A reviewof dynamic vehicle routing problems,” Eur. J. Oper. Res., vol. 225,no. 1, pp. 1–11, 2013.

[3] O. Br€aysy and M. Gendreau, “Vehicle routing problem with timewindows, part I: Route construction and local search algorithms,”Transp. Sci., vol. 39, no. 1, pp. 104–118, 2005.

[4] Z.-L. Chen and H. Xu, “Dynamic column generation for dynamicvehicle routing with time windows,” Transp. Sci., vol. 40, no. 1,pp. 74–88, 2006.

[5] M. Gendreau, F. Guertin, J.-Y. Potvin, and E. Taillard, “Paralleltabu search for real-time vehicle routing and dispatching,” Transp.Sci., vol. 33, no. 4, pp. 381–390, 1999.

[6] M. M. Solomon, “Algorithms for the vehicle routing and schedul-ing problems with time window constraints,” Oper. Res., vol. 35,no. 2, pp. 254–265, 1987.

[7] S. Ma, Y. Zheng, and O. Wolfson, “T-share: A large-scale dynamictaxi ridesharing service,” in Proc. IEEE 29th Int. Conf. Data Eng.,2013, pp. 410–421.

[8] S. Ma, Y. Zheng, and O. Wolfson, “Real-time city-scale taxiridesharing,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 7,pp. 1782–1795, Jul. 2015.

[9] Y. Huang, F. Bastani, R. Jin, and X. S. Wang, “Large scale real-timeridesharing with service guarantee on road networks,” Proc.VLDB Endowment, vol. 7, no. 14, pp. 2017–2028, 2014.

[10] M. M. Flood, “The traveling-salesman problem,” Oper. Res., vol. 4,no. 1, pp. 61–75, 1956.

[11] M. Jsnger, S. Thienel, and G. Reinelt, “Provably good solutions forthe traveling salesman problem,” Zeitschrift Oper. Res., vol. 40,no. 2, pp. 183–217, 1994.

[12] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis, II, “An analysisof several heuristics for the traveling salesman problem,” SIAMjournal on computing, vol. 6, no. 3, pp. 45–69, 1977.

[13] M. R. Kolahdouzan and C. Shahabi, “Voronoi-based K nearestneighbor search for spatial network databases,” in Proc. 30th Int.Conf. Very Large Data Bases, 2004, pp. 840–851.

[14] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Intro-duction to Algorithms, 3rd ed. Cambridge, MA, USA: MIT Press,2009.

[15] S. M. Ross, Stochastic Processes, vol. 2. New York, NY, USA: Wiley,1996.

[16] L. Wu, X. Xiao, D. Deng, G. Cong, A. D. Zhu, and S. Zhou,“Shortest path and distance queries on road networks: An experi-mental evaluation,” Proc. VLDB Endowment, vol. 5, no. 5, pp. 406–417, 2012.

[17] R. Geisberger, P. Sanders, D. Schultes, and D. Delling,“Contraction hierarchies: Faster and simpler hierarchical routingin road networks,” in Proc. 7th Int. Conf. Exp. Algorithms, 2008,pp. 319–333.

[18] H. Samet, J. Sankaranarayanan, and H. Alborzi, “Scalable networkdistance browsing in spatial databases,” in Proc. ACM SIGMODInt. Conf. Manage. Data, 2008, pp. 43–54.

[19] A. D. Zhu, H. Ma, X. Xiao, S. Luo, Y. Tang, and S. Zhou, “Shortestpath and distance queries on road networks: Towards bridgingtheory and practice,” in Proc. ACM SIGMOD Int. Conf. Manage.Data, 2013, pp. 857–868.

[20] D. Papadias, J. Zhang, N. Mamoulis, and Y. Tao, “Query process-ing in spatial network databases,” in Proc. 29th Int. Conf. VeryLarge Data Bases, 2003, pp. 802–813.

[21] K. Deng, X. Zhou, H. T. Shen, S. W. Sadiq, and X. Li, “Instanceoptimal query processing in spatial networks,” VLDB J., vol. 18,no. 3, pp. 675–693, 2009.

[22] S. Nutanong and H. Samet, “Memory-efficient algorithms for spa-tial network queries,” in IEEE 29th Int. Conf. Data Eng., 2013,pp. 649–660.

[23] S. Shang, K. Deng, and K. Xie, “Best point detour query in roadnetworks,” in Proc. 18th SIGSPATIAL Int. Conf. Advances Geo-graphic Inf. Syst., 2010, pp. 71–80.

3216 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12, DECEMBER 2016

Page 15: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL …yli15/courses/DS595CS525Spring17/includes/… · IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 12,

[24] G. Laporte, “Fifty years of vehicle routing,” Transp. Sci., vol. 43,no. 4, pp. 408–416, 2009.

[25] J.-F. Cordeau and G. Laporte, “The dial-a-ride problem: modelsand algorithms,” Ann. Oper. Res., vol. 153, no. 1, pp. 29–46, 2007.

[26] D. S�aez, C. E. Cort�es, and A. N�u~nez, “Hybrid adaptive predictivecontrol for the multi-vehicle dynamic pick-up and delivery prob-lem based on genetic algorithms and fuzzy clustering,” Comput.Oper. Res., vol. 35, no. 11, pp. 3412–3438, 2008.

[27] C. E. Cort�es, A. N�unez, and D. S�aez, “Hybrid adaptive predictivecontrol for a dynamic pickup and delivery problem including traf-fic congestion,” Int. J. Adaptive Control Signal Process., vol. 22, no. 2,pp. 103–123, 2008.

[28] A. Fabri and P. Recht, “On dynamic pickup and delivery vehiclerouting with several time windows and waiting times,” Transp.Res. Part B: Methodological, vol. 40, no. 4, pp. 335–350, 2006.

[29] M. Savelsbergh and M. Sol, “Drive: Dynamic routing of indepen-dent vehicles,” Oper. Res., vol. 46, no. 4, pp. 474–490, 1998.

[30] O. B. Madsen, H. F. Ravn, and J. M. Rygaard, “A heuristic algo-rithm for a dial-a-ride problem with time windows, multiplecapacities, and multiple objectives,” Ann. Oper. Res., vol. 60, no. 1,pp. 193–208, 1995.

[31] D. Teodorovic and G. Radivojevic, “A fuzzy logic approach todynamic dial-a-ride problem,” Fuzzy Sets Syst., vol. 116, no. 1,pp. 23–33, 2000.

[32] M. Gendreau, F. Guertin, J.-Y. Potvin, and R. S�eguin,“Neighborhood search heuristics for a dynamic vehicle dispatch-ing problem with pick-ups and deliveries,” Transp. Res. Part C:Emerging Technol., vol. 14, no. 3, pp. 157–174, 2006.

[33] S. Mitrovi�c-Mini�c, R. Krishnamurti, and G. Laporte, “Double-horizon based heuristics for the dynamic pickup and deliveryproblem with time windows,” Transp. Res. Part B: Methodological,vol. 38, no. 8, pp. 669–685, 2004.

[34] S. Mitrovi�c-Mini�c and G. Laporte, “Waiting strategies for thedynamic pickup and delivery problem with time windows,”Transp. Res. Part B: Methodological, vol. 38, no. 7, pp. 635–655, 2004.

[35] L. Coslovich, R. Pesenti, and W. Ukovich, “A two-phase insertiontechnique of unexpected customers for a dynamic dial-a-rideproblem,” Eur. J. Oper. Res., vol. 175, no. 3, pp. 1605–1615, 2006.

Siyuan Zhang is currently working toward thePhD degree in the Department of Systems Engi-neering and Engineering Management, TheChinese University of Hong Kong. His majorresearch interests include spatial database andgeographic information systems.

Lu Qin received the bachelor’s degree from theRenmin University of China, in 2006, and thePhD degree from the Department of SystemsEngineering and Engineering Management, Chi-nese University of Hong Kong, in 2010. He is nowa senior lecturer in the Centre of Quantum Com-putation and Intelligent Systems, University ofTechnology Sydney.

Yu Zheng is a research manager with MicrosoftResearch, passionate about using big data totackle urban challenges. He currently serves asthe editor-in-chief of the ACM Transactions onIntelligent Systems and Technology. He is alsothe founding secretary of SIGKDD China Chapterand has served as chair on more than 10 presti-gious international conferences, e.g., as the pro-gram co-chair of ICDE 2014 (Industrial Track).He received five best paper awards from ICDE’13and ACM SIGSPATIAL10, etc. His book, titled

Computing with Spatial Trajectories, has been used as a text book in uni-versities world-widely and was awarded the Top 10 Most Popular Com-puter Science Book authored by Chinese at Springer. In 2013, he wasnamed one of the Top Innovators under 35 by MIT Technology Review(TR35) and featured by Time Magazine for his research on urban com-puting. In 2014, he was named one of the Top 40 Business Elites under40 in China by Fortune Magazine, because of the business impact ofurban computing he has been advocating since 2008. He is also a visit-ing chair professor with Shanghai Jiao Tong University and an adjunctprofessor with the Hong Kong University of Science and Technology. Heis a senior member of the IEEE.

Hong Cheng received the PhD degree from theUniversity of Illinois at Urbana-Champaign, in2008. She is an associate professor in theDepartment of Systems Engineering and Engi-neering Management, Chinese University ofHong Kong. Her research interests include datamining, database systems, and machine learn-ing. She received research paper awards atICDE’07, SIGKDD’06, and SIGKDD’05, and thecertificate of recognition for the 2009 SIGKDDDoctoral Dissertation Award. She received the

2010 Vice-Chancellor’s Exemplary Teaching Award at the Chinese Uni-versity of Hong Kong.

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.

ZHANG ETAL.: EFFECTIVE AND EFFICIENT: LARGE-SCALE DYNAMIC CITY EXPRESS 3217