[ieee 2010 2nd international workshop on database technology and applications (dbta) - wuhan, china...



An Efficient Method for Processing Reverse Skyline Queries over Arbitrary Spatial Objects

Ah Han
Dept. of Computer Engineering, Myongji University, Yongin-si, Korea
[email protected]

Zhonghe Li
Dept. of Computer Engineering, Myongji University, Yongin-si, Korea
[email protected]

Dongseop Kwon
Dept. of Computer Engineering, Myongji University, Yongin-si, Korea
[email protected]

Youngbae Park
Dept. of Computer Engineering, Myongji University, Yongin-si, Korea
[email protected]

Abstract—Although several algorithms for computing reverse skyline queries have been proposed, they were developed only for point datasets and cannot handle reverse skyline queries over arbitrary spatial objects such as regions, polygons, or lines. In this paper, we introduce a novel method for processing reverse skyline queries over arbitrary spatial objects. The proposed method processes these queries efficiently because it reduces the number of disk accesses by pruning unnecessary node traversals. Since arbitrary spatial objects, unlike points, may overlap each other, the proposed method allows users to choose a precedence among overlapping objects, which is useful for applications such as decision support systems and data mining systems. Extensive experiments under various settings are conducted to demonstrate the superiority of the proposed method.

Keywords-skyline; reverse skyline; spatial objects.

I. INTRODUCTION

A reverse skyline of a query point is the set of objects whose dynamic skyline contains the query point. This means that the reverse skyline points are the objects that have the strongest interest in the query point [1-2]. For example, suppose that the hotel preferences of each customer are stored as records in a database. Fig. 1(a) plots the hotels preferred by seven customers as points (a~g) in a two-dimensional space (price and distance to a beach). If a hotel manager wants to advertise a hotel in the market, the manager can find the customers whose preferences are close to the hotel by issuing a reverse skyline query for the hotel, which is represented as a query point q. In this case, the query result is {a, c, e}, which represents the customers who would prefer to stay in the hotel. Consequently, the hotel manager can advertise more effectively by contacting only these three customers, and the customers receive an interesting offer rather than spam. The reverse skyline query has been widely used not only in market analysis, as in the above example, but also in many new applications such as environmental surveillance and quantitative economics research [3-4].

Although several algorithms for computing reverse skyline queries have been proposed, they were developed only for point objects. As far as we know, there are no algorithms for computing reverse skyline queries over arbitrary spatial objects such as regions, polygons, or lines. An algorithm for processing reverse skyline queries over point objects cannot be directly applied to arbitrary spatial objects, because objects in such datasets may overlap each other or overlap the pruning spaces used to discard non-reverse-skyline points. These overlaps make the domination relationship ambiguous: an object may dominate only a part of another object, so it cannot be determined whether the former dominates the latter. Fig. 1(b) depicts the hotels preferred by customers as arbitrary spatial objects (a~g), each of which has a range of values from A to B in each dimension; for example, a price of $1 ~ $3 and a distance to the beach of 100m ~ 300m. In this case, the query result is {a, c, e, d}, which differs from the query result over point objects with the same query point in Fig. 1(a). The reason is that object c, which clearly dominates the query point in the case of point objects, only partially dominates the query point with respect to object d in the case of shape objects. Consequently, object d is included in the reverse skyline, since its dynamic skyline can contain the query point depending on which coordinates within object d are taken as the reference.

In this paper, we introduce an efficient method for processing reverse skyline queries over arbitrary spatial objects. It reduces the number of disk accesses and minimizes the search space by pruning unnecessary node traversals. Furthermore, unlike the point case, our algorithm computes the probability that an object belongs to the reverse skyline from the ratio of its dominating region, and can prioritize the query result by this probability. This allows users to choose a precedence among the query results. To demonstrate the efficiency and effectiveness of our approach, extensive experiments under various settings are conducted. In particular, the experiments focus on identifying the factors that affect performance.

[Figure 1: two plots with axes Prices and Distance, each showing objects a~g, the query point q, and the reverse skyline points of q. (a) In case of point objects. (b) In case of shape objects.]

Figure 1. Reverse skyline

978-1-4244-6977-2/10/$26.00 ©2010 IEEE

The rest of this paper is organized as follows. Section 2 reviews the skyline query and previous work related to reverse skyline query processing. Section 3 formally defines our problem and presents the proposed method, and Section 4 illustrates the whole procedure using simple pseudocode. Section 5 presents the performance of the proposed approach and the decisive factors that affect it. Finally, Section 6 concludes this paper.

II. RELATED WORK

Given a set P of d-dimensional points, a skyline query returns all points in P that are not dominated by another point. This means that skyline points are the best tuples according to any preference function that is monotone in each dimension [3-6]. For example, suppose that information about hotels is stored as tuples with the same attributes (e.g., a price and a distance to a beach). When a user wants to find a suitable hotel, the skyline query supports the decision by retrieving the set of hotels that are likely to interest the user.
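As a point of reference, the domination test and a skyline query can be sketched in a few lines of Python. This is a minimal quadratic-time illustration, assuming that smaller values are better in every dimension; the names `dominates` and `skyline` and the sample hotel tuples are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical hotels as (price, distance to the beach).
hotels = [(50, 300), (80, 100), (90, 400), (60, 200)]
result = skyline(hotels)
```

Here `(90, 400)` is dropped because `(50, 300)` is both cheaper and closer to the beach; the other three hotels are incomparable with each other and remain in the skyline.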

Recently, Dellis and Seeger introduced the reverse skyline query and proposed two algorithms for processing it: branch-and-bound reverse skyline (BBRS) and reverse skyline using skyline approximations (RSSA) [1]. Given a set P of d-dimensional points, a reverse skyline query returns all points in P that have a given query point in their dynamic skyline. This means that the reverse skyline points are the tuples that have an interest in the query, and the query point is one of the best tuples with respect to each reverse skyline point [1-2]. While the skyline query takes the user's perspective (selecting the products that users like), the reverse skyline query takes the company's perspective. In the hotel example, suppose that information about the hotels in which customers prefer to stay is stored as tuples. When a hotel manager wants to promote a hotel in the market, the reverse skyline query returns the set of customers who have an interest in that hotel. By advertising the hotel only to the customers in the query result, the manager can expect a higher commercial effect than by broadcasting to all known customers, and the customers receive useful information without any effort.
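The definition of the reverse skyline over points can be made concrete with a brute-force sketch. It is not BBRS or RSSA; it simply tests, for every point p, whether some other point is at least as close to p as q in every dimension and strictly closer in at least one. The function names and the sample data are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def dyn_dominates(r: Point, q: Point, p: Point) -> bool:
    """True if r dynamically dominates q with respect to p, i.e. r is
    at least as close to p as q in every dimension and strictly closer
    in at least one."""
    dr = [abs(a - b) for a, b in zip(r, p)]
    dq = [abs(a - b) for a, b in zip(q, p)]
    return (all(x <= y for x, y in zip(dr, dq))
            and any(x < y for x, y in zip(dr, dq)))


def reverse_skyline(points: List[Point], q: Point) -> List[Point]:
    """Naive O(n^2) reverse skyline: p qualifies when no other point
    dynamically dominates q with respect to p."""
    result = []
    for i, p in enumerate(points):
        others = (r for j, r in enumerate(points) if j != i)
        if not any(dyn_dominates(r, q, p) for r in others):
            result.append(p)
    return result


# Hypothetical customer preferences and a query hotel q.
customers = [(1, 1), (2, 2), (6, 6), (10, 1)]
rs = reverse_skyline(customers, (4, 4))
```

For the sample data, `(1, 1)` and `(2, 2)` are excluded because each is closer to the other than to q in both dimensions, while `(6, 6)` and `(10, 1)` keep q in their dynamic skyline.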

III. PROPOSED METHOD FOR ARBITRARY SPATIAL OBJECTS

Although there are many algorithms for computing reverse skyline queries, they cannot handle queries over arbitrary spatial objects (shape objects), which have a range of values from A to B in each dimension, since they were developed only for precise objects (point objects). In this section, we propose a novel method for processing reverse skyline queries, namely Reverse Skyline over Arbitrary Spatial Objects (RSASO), which covers not only point datasets but also shape datasets.

A. A Pruning Method to Get Minimal Candidates

To achieve good performance, it is important to identify a minimal set of objects that have the potential to be in the reverse skyline. To this end, we introduce a pruning method that obtains minimal candidates by using a search area, which is the space in which reverse skyline points are retrieved. It prunes objects that cannot be in the reverse skyline because of already selected candidates. The following is the formal definition of the search area.

Definition 1. (A search area for arbitrary spatial objects) Assume an arbitrary spatial object p has a region $(p_1^-, p_1^+; p_2^-, p_2^+; \ldots; p_d^-, p_d^+)$, where $[p_i^-, p_i^+]$ is the interval of the object along the i-th dimension ($1 \le i \le d$). Given a query point q, (1) if $q_i < p_i^-$, SA(q,p) is $[q_i, (p_i^+ + q_i)/2]$; (2) if $q_i > p_i^+$, SA(q,p) is $[(p_i^- + q_i)/2, q_i]$.
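The two cases of Definition 1 translate directly into a per-dimension computation. The following Python sketch is an illustration only: the function name `search_area` and the representation of an object as two coordinate tuples (`p_lo`, `p_hi`) are assumptions, and because Definition 1 does not state what happens when a query coordinate falls inside the object's interval, the sketch raises an error in that situation rather than guessing.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def search_area(q: Sequence[float],
                p_lo: Sequence[float],
                p_hi: Sequence[float]) -> List[Interval]:
    """Per-dimension intervals of SA(q, p) following Definition 1.

    p_lo[i] and p_hi[i] play the role of p_i^- and p_i^+.
    """
    intervals = []
    for qi, lo, hi in zip(q, p_lo, p_hi):
        if qi < lo:
            # Case (1): q lies below the object in this dimension.
            intervals.append((qi, (hi + qi) / 2))
        elif qi > hi:
            # Case (2): q lies above the object in this dimension.
            intervals.append(((lo + qi) / 2, qi))
        else:
            # Not covered by Definition 1; left undefined in this sketch.
            raise ValueError("q_i lies inside [p_i^-, p_i^+]")
    return intervals


# Hypothetical object with region [2, 4] x [6, 10].
sa_below = search_area((0, 0), (2, 6), (4, 10))
sa_mixed = search_area((10, 0), (2, 6), (4, 10))
```

With q at the origin, the upper bound in each dimension is the midpoint between q and the far side of the object, which corresponds to the point Sp halfway between q and the top-right corner Tp in Fig. 2.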

Fig. 2 shows the search area of a candidate object p, which is the nearest neighbor of the query point in a 2-dimensional dataset. The search area of object p is the space not dominated by the point Sp, the midpoint between q and the top-right point (Tp) of p; it is the shaded region. Objects that lie entirely outside this search area, like object s, obviously cannot be in the reverse skyline, since object p always dynamically dominates q with respect to object s, which means that the dynamic skyline of s contains object p rather than the query point. Conversely, if an object is fully included in the search area, like object u, it is selected as a candidate for the reverse skyline. Based on Definition 1, we propose the following pruning method to get minimal candidates.

Lemma 1. A pruning method to get minimal candidates: Given a query point q and a candidate p, an object s can be pruned if s is fully contained in the non-search area of p.

Proof. Objects that are not included in the search area of a candidate do not have the query point in their dynamic skyline, since the candidate dominates the query point. Therefore, these objects cannot be in the reverse skyline according to the definition of the reverse skyline.

Lemma 1 clearly separates objects that are fully contained in a search area from those that are fully outside it, but objects that are only partially contained, i.e., overlapped objects like object k in Fig. 2(b), cannot be pruned. The reason is that object k can be a non-reverse-skyline object with respect to Tp and a reverse skyline object with respect to Bp. Such objects are therefore selected as candidates, since a query operator cannot ignore any data within a shape object, but their preference is re-computed to distinguish them from fully contained objects. The method to re-compute the preference according to an overlapping relation is discussed in detail in Subsection C.

[Figure 2: the query point q, a candidate object p with its top-right point Tp, the midpoint Sp, and the pruning line for SA(q,p); objects s, u, and k; panel (b) also marks Bp and Tp'. (a) SA(q,p). (b) SA(q,p) and an object k.]

Figure 2. Search area about object p

B. A Pruning Method to Determine Results

The previous subsection explained the pruning method to get minimal candidates. In this subsection, we introduce a pruning method that determines the results among the selected candidates by using a check area. It prunes candidates that cannot be in the reverse skyline because of newly found objects. The following is the formal definition of the check area.

Definition 2. (A check area for arbitrary spatial objects) Assume an arbitrary spatial object p has a region $(p_1^-, p_1^+; p_2^-, p_2^+; \ldots; p_d^-, p_d^+)$, where $[p_i^-, p_i^+]$ is the interval of the object along the i-th dimension ($1 \le i \le d$). Given a query point q, (1) if $q_i < p_i^-$, CA(q,p) is $[q_i, 2p_i^+ - q_i]$; (2) if $q_i > p_i^+$, CA(q,p) is $[2p_i^- - q_i, q_i]$; finally, (3) if $p_i^- < q_i < p_i^+$, CA(q,p) is $[2p_i^- - q_i, 2p_i^+ - q_i]$.
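The check area can be computed per dimension in the same way as the search area. The Python sketch below is a hedged reading of Definition 2: the name `check_area` and the (`p_lo`, `p_hi`) representation are assumptions, and for case (3) it takes the interval obtained by reflecting q_i across both ends of the object's interval, which matches the description of the check area as the combination of the two areas generated by Tp and Tp'. Query coordinates that fall exactly on an interval boundary are treated as case (3), which is also an assumption.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def check_area(q: Sequence[float],
               p_lo: Sequence[float],
               p_hi: Sequence[float]) -> List[Interval]:
    """Per-dimension intervals of CA(q, p), one reading of Definition 2.

    p_lo[i] and p_hi[i] play the role of p_i^- and p_i^+.
    """
    intervals = []
    for qi, lo, hi in zip(q, p_lo, p_hi):
        if qi < lo:
            # Case (1): reflect q_i across the far (upper) side of p.
            intervals.append((qi, 2 * hi - qi))
        elif qi > hi:
            # Case (2): reflect q_i across the far (lower) side of p.
            intervals.append((2 * lo - qi, qi))
        else:
            # Case (3), plus boundary equality (assumption): reflect
            # q_i across both sides and take the spanned interval.
            intervals.append((2 * lo - qi, 2 * hi - qi))
    return intervals


# Hypothetical object with region [2, 4] x [6, 10].
ca_outside = check_area((0, 0), (2, 6), (4, 10))
ca_inside = check_area((3, 0), (2, 6), (4, 10))
```

In the first call, the far corner of the object is (4, 10), so the check area extends to (8, 20), twice the distance from q to that corner, in line with the description of the point Cp.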

Fig. 3 shows the check area of a candidate object p with respect to a query point. The check area is constructed differently depending on the location of the query point. If, in every dimension, the coordinate of the query point lies outside the interval of object p (as in Fig. 3(a)), the check area is the space between q and a point Cp. If the coordinate of the query point lies inside the interval of object p in at least one dimension (as in Fig. 3(b)), the check area is the combination of the two check areas generated by Tp and Tp' of object p; it is the shaded region. The point Cp is located at twice the distance from the query point to the vertex of p farthest from the query point (e.g., Tp in this example).

Obviously, a candidate object whose check area fully contains another object cannot be in the reverse skyline, like object p in Fig. 3(b): object u always dynamically dominates the query point with respect to object p, which means that the dynamic skyline of p contains object u rather than the query point. Based on Definition 2, we propose the following pruning method to determine results.

Lemma 2. A pruning method to determine results: Given a query point q and a candidate p, the candidate p can be pruned if any object s is fully contained in the check area of p.

Proof. Candidates whose check area includes some object do not have the query point in their dynamic skyline, since that object dominates the query point. Therefore, these candidates cannot be in the reverse skyline according to the definition of the reverse skyline.

Lemma 2 clearly separates objects that are fully contained in a check area from those that are fully outside it, but there are also objects that are only partially contained, i.e., overlapped objects like object k in Fig. 3(b). Candidates whose check area includes such overlapped objects cannot simply be removed from the query results, for the same reason as in the overlap case of the search area. Therefore, instead of simply remaining as candidates, they have their preference re-computed.

C. A Method to Re-compute Preference of Objects

An arbitrary spatial object has the same preference throughout its region, which means that all values within the object must be respected because they are equally important. Therefore, if an object only partially has the potential to be in the reverse skyline, i.e., it is partially included in the search area of some candidate or its check area partially includes another object, we expect the object to have a preference proportional to that potential. Based on this idea, we re-compute the preference of such objects. This feature of our algorithm lets users choose a precedence among the query results; that is, users can weigh the results differently and use them selectively.

Fig. 4 shows overlapping relations with the search area and the check area. In the case of the search area, objects u and k can be in the reverse skyline, but their preference is re-computed: object u receives 40 and object k receives 85, which is the percentage of each object's region contained in the search area of the candidate object p. The search area is then updated using k's Sp and u's Sp. In the case of the check area, the candidate object p can be in the reverse skyline, but its preference is re-computed as 20, which is obtained by subtracting the largest included percentage among the overlapped objects (e.g., 80 percent for object u) from the percentage of object p.
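The arithmetic behind the re-computed preferences can be sketched as follows. This is a simplified illustration that treats both the object and the area as axis-aligned boxes, which is an assumption (the actual search area in Fig. 2 need not be a single box). The names `overlap_percent` and `check_preference`, the sample coordinates, and the clamping of the result at zero are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def overlap_percent(obj_lo: Sequence[float],
                    obj_hi: Sequence[float],
                    area: List[Interval]) -> float:
    """Percentage of the object's box volume that lies inside the area box.

    Assumes a non-degenerate object (obj_hi[i] > obj_lo[i] for all i).
    """
    total = 1.0
    inside = 1.0
    for lo, hi, (a_lo, a_hi) in zip(obj_lo, obj_hi, area):
        total *= hi - lo
        inside *= max(0.0, min(hi, a_hi) - max(lo, a_lo))
    return 100.0 * inside / total


def check_preference(current: float, overlapped: List[float]) -> float:
    """Subtract the largest overlapped percentage from the candidate's
    current preference (clamped at 0, an assumption of this sketch)."""
    if not overlapped:
        return current
    return max(0.0, current - max(overlapped))


# An object covering [0, 10] x [0, 10] against an area [0, 8] x [0, 10].
pct = overlap_percent((0, 0), (10, 10), [(0, 8), (0, 10)])
# Candidate p starts at 100; the overlapped percentages 80 and 55 are
# the values shown for the check area in Fig. 4.
pref = check_preference(100.0, [80.0, 55.0])
```

The second call reproduces the value 20 given in the text for the candidate object p, obtained as 100 minus the largest overlapped percentage of 80.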

IV. THE PROCEDURE OF THE PROPOSED ALGORITHM

In the previous sections, we introduced two pruning methods and a method to re-compute the preference according to overlapping relations. Using these methods, we can process reverse skyline queries efficiently. Algorithm 1 shows the whole procedure of our proposed algorithm.

[Figure 3: the query point q, a candidate object p with Tp and Tp', the points Cp and Cp' (CPx = qx), the pruning line for CA(q,p), and objects s, u, and k. (a) The first case of CA(q,p). (b) The second case of CA(q,p).]

Figure 3. A check area about object p

[Figure 4: (a) Overlap relation with search area, showing objects u and k with contained percentages 40% and 85% and the points p.Sp, k.Sp, and u.Sp. (b) Overlap relation with check area, showing objects k and u with contained percentages 55% and 80%.]

Figure 4. Overlap relation and preference

V. EXPERIMENTS AND ANALYSIS

In this section, we conduct extensive experiments and analyze the results to evaluate the performance of the proposed approach. In particular, we focus on which factors affect the query result and the query response time. For the experiments, we generated integer datasets of various types using our own generation program. All experiments were performed on a 32-bit Windows XP PC with an Intel Quad Q6600 2.4GHz CPU, 4GB of memory, and a SATA ST340062 400GB hard disk.

Fig. 5 shows the number of result objects and the query response time with respect to the rate of shape objects contained in a dataset. It shows that a dataset with more point objects yields more results and that a dataset with more shape objects takes a longer processing time. In other words, as the number of shape objects in a dataset increases, the query response time increases but the result size decreases; a dataset having only shape objects yields 38 result objects and spends 4.5s on processing, but a dataset having only point objects yields 283 result objects and spends 22.7s. The rate is also an important factor in the distribution of the results by preference. Fig. 6 shows the preference distribution of the results according to the containing rate; a dataset with more shape objects yields more results with a low preference, although the overall result size decreases.

Fig. 7 shows the query response time of the two algorithms over 100K objects under the uniform distribution; note that the RSSA algorithm can handle only point objects. Over a dataset containing only point objects, RSASO outperforms RSSA in all dimensions. Over a dataset containing only shape objects, RSASO outperforms RSSA on high-dimensional datasets.

VI. CONCLUSION

Although there are various algorithms for processing reverse skyline queries, they cannot handle queries over a dataset that contains arbitrary spatial objects. In this paper, we proposed an efficient method for processing reverse skyline queries that covers arbitrary shape objects. Because the proposed algorithm is based on a pruning approach that discards non-reverse-skyline points, it does not pre-compute anything, unlike prior algorithms based on pre-processing. It minimizes the number of disk accesses by pruning unnecessary node traversals and reduces the search space in which reverse skyline points are retrieved. Furthermore, it allows the user who issues the query to choose a precedence among the query results. To demonstrate the superiority of the proposed method, extensive experiments under various settings were conducted. As a result, we found that increasing the rate of shape objects decreases the number of result objects and increases the query response time, and that our algorithm outperforms the RSSA algorithm over a high-dimensional, large database regardless of the object type.

REFERENCES

[1] E. Dellis and B. Seeger, "Efficient Computation of Reverse Skyline Queries," Very Large Data Base, pp. 291-302, 2007.

[2] X. Lian and L. Chen, "Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases," ACM Special Interest Group on Management Of Data, pp. 213-226, 2008.

[3] S. Borzsonyi, D. Kossmann, and K. Stocker, "The Skyline Operator," International Conference on Data Engineering, pp. 421-430, 2001.

[4] D. Papadias, Y. Tao, G. Fu, and B. Seeger, "An optimal and progressive algorithm for skyline queries," ACM Special Interest Group on Management Of Data, pp. 467-478, 2003.

[5] X. Lian and L. Chen, "Dynamic Skyline Queries in Metric Spaces," ACM International Conference Proceeding Series, vol. 261, pp. 333-343, 2008.

[6] D. Papadias, Y. Tao, G. Fu, and B. Seeger, "Progressive skyline computation in database systems," ACM Transactions on Database Systems, pp. 41-82, 2005.

Algorithm 1. The RSASO Algorithm

procedure RSASO (R*-tree R, Query point q)
    RSL; // result list
    C; // candidate list
    Insert all entries of the root into a heap sorted by distance from q
    while the heap is not empty do
        Remove the top entry e from the heap
        if e is clearly and globally dominated by some object in C then
            Among the candidates already found in RSL, retrieve those whose check space is overlapped by e
            If any exist, store the overlapped percent
            Discard e
        else [e is not globally dominated]
            if e is an intermediate entry then
                for all children ei of e do
                    Check whether ei is globally dominated by C
                    If yes, discard ei. Otherwise, insert ei into the heap
                end for
            else [e is a leaf entry]
                Check whether e is not contained in the search space
                If yes, discard e. Otherwise, insert e into RSL and update the search space with e. (Then, if e only partially overlaps the search space, e's preference is re-computed)
                Among the candidates already found in RSL, retrieve those whose check space is overlapped by e
                If any exist, store the overlapped percent
            end if
        end if
    end while
    Re-compute the preferences of the result objects in RSL according to the collected overlapped percents with the check spaces of other objects
    return RSL
end procedure
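The heap-driven traversal that Algorithm 1 is built around can be sketched in Python. The sketch covers only the best-first skeleton (pop the nearest entry, discard pruned entries, expand intermediate entries, emit leaves); the domination tests, search space updates, and preference bookkeeping are abstracted into a hypothetical `should_prune` callback. The dictionary-based tree stands in for an R*-tree, and all names are assumptions made for illustration.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Iterator, Sequence


def mindist(q: Sequence[float], lo: Sequence[float],
            hi: Sequence[float]) -> float:
    """Squared minimum distance from point q to the box [lo, hi]."""
    return sum(max(l - x, 0, x - h) ** 2 for x, l, h in zip(q, lo, hi))


def best_first(root: Dict, q: Sequence[float],
               should_prune: Callable[[Dict], bool]) -> Iterator[Dict]:
    """Yield leaf entries in increasing order of mindist from q,
    skipping every entry for which should_prune returns True."""
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(mindist(q, root["lo"], root["hi"]), next(tie), root)]
    while heap:
        _, _, e = heapq.heappop(heap)
        if should_prune(e):
            continue  # the state may have changed since e was inserted
        if "children" in e:  # intermediate entry: expand its children
            for c in e["children"]:
                if not should_prune(c):
                    d = mindist(q, c["lo"], c["hi"])
                    heapq.heappush(heap, (d, next(tie), c))
        else:  # leaf entry: hand it to the caller
            yield e


# A tiny hypothetical tree with three leaf objects a, b, and c.
a = {"name": "a", "lo": (1, 1), "hi": (2, 2)}
b = {"name": "b", "lo": (5, 5), "hi": (6, 6)}
c = {"name": "c", "lo": (3, 0), "hi": (4, 1)}
n1 = {"lo": (1, 1), "hi": (6, 6), "children": [a, b]}
tree = {"lo": (1, 0), "hi": (6, 6), "children": [n1, c]}

order = [e["name"] for e in best_first(tree, (0, 0), lambda e: False)]
```

Visiting entries nearest-first matters for the pruning approach because the closest objects become candidates early and can shrink the search space before more distant entries are expanded.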

Figure 5. Rate of shape objects vs. result size and processing time (uniform, 100K, 5d): (a) Result size; (b) Query response time

Figure 6. Distribution of results according to the preference (uniform, 5d, 100K)

Figure 7. Query response time RSASO vs. RSSA (uniform, 100K)