DAP Join: Produce Massive and Immediate Result in
Multi Join Query using Flushing
Pranjal Bansal1, Dr. Rakesh Rathi2, Mr. Vinesh Jain3
[email protected] [email protected] [email protected]
1,2,3 – CS & IT Department, Govt. Engineering College, Ajmer, Rajasthan, India
Abstract- This paper introduces DAP Join, a method for producing
massive and immediate results in multi-join queries. Most
previous research is based on a single join operator, whereas
DAP Join is designed for multiple join operators. DAP Join
distinguishes itself from previous techniques by combining the
benefits of Hash Merge Join, Rate-based Progressive Join and
State Spilling with a new approach of its own. DAP Join employs
a new flushing technique that moves N amount of data from
memory to disk when the memory allotment is exhausted, and only
N/2 amount of data when the data arrival rate is slow in an
online environment. DAP Join flushes to disk the data that is
least useful and uses Symmetric Hash Join for the in-memory
join, which maximizes overall throughput and produces early
results.
Keywords: DAP Join; Hash Merge Join; RPJ; State Spilling;
Operator State Manager;
1. Introduction
The design of traditional join algorithms (e.g. [1], [2]) is
based on the assumption that all input data is available before
the join operation starts. These algorithms are well suited to
producing the complete query result, but not to applications
that require results immediately. Non-blocking join algorithms
target such applications, for example when i) input data
arrives in an online environment, or ii) the user is interested
in the first few results rather than waiting for the complete
result.
To produce a large number of results early, several research
efforts have been dedicated to the development of non-blocking
join operators, such as XJoin [3], Hash Merge Join [4],
Rate-based Progressive Join (RPJ) [5] and the state spilling
method [6]. These algorithms perform the join at a single join
operator, and their main focus is to maximize the result
produced locally at each operator. However, optimizing the
local result at each operator does not necessarily maximize the
throughput of the overall query result.
In this paper we propose a new efficient non-blocking join
algorithm (DAP Join, for short) for producing immediate results
in multi-join query plans; it extends the Adaptive Global Flush
algorithm [7]. DAP Join differs from previous techniques in
that it flushes less memory when the data arrival rate is slow,
in order to maximize throughput in a multi-join query. DAP Join
selects the N amount of data to flush to disk based on the
expected contribution of the data to the result, estimated from
previous results, and flushes the data with the least
contribution.
DAP Join distinguishes itself from other non-blocking join
algorithms ([3], [4], [5], [6], [9]) in two aspects. i) DAP
Join employs a new memory flushing technique, which flushes N
amount of data when memory is full and the data arrival rate is
normal, or N/2 amount of data when the data arrival rate is
slow. DAP Join maximizes overall query throughput for a
multi-join query plan by predicting the throughput contribution
of in-memory data. ii) DAP Join employs a new operator state
manager that can switch any operator between joining in-memory
data and joining on-disk data, choosing the state most
beneficial to query throughput. Such an operator state manager
does not exist in previous techniques ([3], [4], [5], [6],
[9]).
2. Related Work
Several non-blocking join algorithms ([3], [4], [5], [6]) build
on the symmetric hash join, the most widely used non-blocking
join algorithm for producing immediate results. When memory
becomes full, these methods flush a certain percentage of the
hash buckets to disk, either individually [3], [5] or in groups
[4], [9]. All of them rely on the symmetric hash join to
achieve non-blocking behavior; the main difference between them
is the flushing algorithm used to free memory.
These techniques focus only on the case of a single join
operator, whereas DAP Join targets multi-join query plans. The
closest work to DAP Join is the state spilling method [6], the
only one of these works that addresses multi-join query plans.
The main idea of state spilling is to score each hash partition
group based on its contribution to past query results and to
flush the partition group with the lowest score.
The main idea of Hash Merge Join (HMJ) [4] is to minimize the
time to produce the first few results and to keep producing
join results even when both sources of the operator are
blocked. HMJ considers only a single join operator. It builds
on XJoin [3] and Progressive Merge Join (PMJ) [8] and works in
two phases: hashing and merging. RPJ [5] bases its flushing
policy on two factors: the data arrival pattern and the data
properties. RPJ also considers only a single join operator.
Proceedings of 2013 IEEE Conference on Information and Communication Technologies (ICT 2013)
978-1-4673-5758-6/13/$31.00 © 2013 IEEE 356
DAP Join combines the benefits of all three techniques ([4],
[5], [6]) and adds a new approach that avoids their drawbacks.
First, DAP Join employs a new flushing technique that takes
both input and output characteristics at each join operator
into account, which helps to predict the contribution of the
data in the hash buckets. Second, DAP Join employs an operator
state manager that switches each operator between the
in-memory, on-disk and blocked states, depending on which state
is most beneficial for producing results at high overall
throughput.
Figure 1 Overview of the DAP Join
Figure 2 Example Plan
3. DAP Join: An Overview
This section gives an overview of the DAP Join algorithm, which
introduces the following new methods:
i) A new memory flushing algorithm whose goal is to flush the N
amount of data that contributes least to overall throughput, or
N/2 if the data arrival rate is slow.
ii) An operator state manager whose goal is to place each
operator in the state that most benefits result throughput.
Each operator can be placed in an in-memory, on-disk or
blocking state.
Figure 1 provides the DAP Join state diagram with the new
aspects highlighted in bold. In this work we consider left-deep
query plans with n join operators (n>1), as depicted in figure
2. As depicted in figure 1, a tuple SR arrives at the input
buffer from source R of operator O. If operator O is not
currently in the in-memory state, SR is temporarily stored in
the buffer and waits until O is brought back. Otherwise SR
produces immediate results by joining with in-memory data. When
memory becomes full, DAP Join frees N amount of memory by
flushing in-memory data to disk.
The operator state manager (depicted by the bold diamond) is
the core of DAP Join. It places each join operator in one of
three states: i) joining in-memory data, ii) joining on-disk
resident data, iii) low priority (blocked). The operator state
manager can switch a join operator from one state to another at
any time.
There are four components of DAP Join:
A. Flush the Memory
To produce immediate results in an online environment, DAP Join
uses the symmetric hash join [10]. If memory becomes full, a
certain amount of data (N) is flushed to disk to make room for
new input, so that immediate results can still be produced. In
addition, if the data arrival rate is slow, only N/2 memory is
flushed instead of N. To decide whether the data arrival rate
is slow or normal, we observe the total input that arrives over
a time interval [ti, tj] (i < j) and compare it with a fixed
minimum number of inputs held in a variable. We propose a DAP
Join memory flush algorithm that aims to produce immediate
results at high throughput in a multi-join query plan.
The DAP Join memory flush algorithm flushes the corresponding
hash partitions from both hash tables together; such a pair is
called a partition group. The main idea behind the algorithm is
to score partition groups based on their expected contribution
to the overall result and to flush the groups with the lowest
scores.
To achieve this goal, DAP Join considers together three
characteristics taken from previous flushing techniques:
i) Contribution to the global result: calculated for each
partition group. The contribution of a partition group is the
number of overall results it has produced.
ii) Arrival pattern of data: reflects changes in the data
arrival rate. A join operator (and its partition groups) with a
higher data arrival rate contributes more to overall result
throughput, and one with a low arrival rate contributes less.
iii) Properties of data: e.g. whether the data is sorted, or
the join attribute distribution.
By considering all three properties together, the DAP Join
memory flush algorithm overcomes the drawbacks of previous
flushing techniques.
B. Operator state manager
The goal of the operator state manager is to place each
operator in the state that is most beneficial for that operator
in terms of producing results, so as to achieve efficient query
throughput.
The operator state manager traverses the pipeline of the
multi-join query from top to bottom and looks for the operator
O closest to the root that would produce more results in the
on-disk state than in its in-memory state. If such an operator
exists, it is immediately placed in the on-disk state. All
operators above O in the pipeline continue processing in
memory, and all operators below O in the pipeline are placed in
the low-priority (blocked) state.
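The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how the in-memory and on-disk productivity of an operator is estimated, so the two estimate fields, the class names and the function name below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    IN_MEMORY = "in-memory"
    ON_DISK = "on-disk"
    BLOCKED = "blocked"  # the low-priority state


@dataclass
class Operator:
    name: str
    # Hypothetical estimates; how they are obtained is outside this sketch.
    est_memory_output: float  # expected results if joining in-memory data
    est_disk_output: float    # expected results if joining on-disk data
    state: State = State.IN_MEMORY


def assign_states(pipeline: List[Operator]) -> Optional[Operator]:
    """Assign a state to every operator; pipeline[0] is the root operator.

    Returns the operator O placed in the on-disk state, or None if no
    operator is expected to do better on disk than in memory.
    """
    chosen: Optional[Operator] = None
    for op in pipeline:  # top-to-bottom traversal
        if chosen is not None:
            op.state = State.BLOCKED      # below O: low priority
        elif op.est_disk_output > op.est_memory_output:
            chosen = op
            op.state = State.ON_DISK      # O: closest such operator to the root
        else:
            op.state = State.IN_MEMORY    # above O: keep joining in memory
    return chosen
```

If no operator qualifies, every operator simply stays in the in-memory state.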
The operator state manager differs from the DAP Join memory
flush algorithm. The goal of the memory flush algorithm is to
predict the least valuable data in memory and flush it to disk.
In an online environment, however, query behavior changes over
time, and data that was least valuable when flushed may become
valuable later. The operator state manager is therefore used to
find the on-disk resident data that is most beneficial to
overall throughput.
Figure 3 Symmetric hash Join
Figure 4 On-Disk Merge
C. In-memory join
DAP Join uses the symmetric hash join algorithm [10] to produce
immediate results. Figure 3 gives the main idea of the
symmetric hash join. Each of the two input sources (R and S)
maintains a hash table with hash function h and n buckets. When
a tuple r arrives from R, its hash value h(r) is used to probe
the hash table of S and produce join results; then r is stored
in hash bucket h(r) of the hash table of R. The same process
occurs, with the roles reversed, when an input tuple arrives
from S.
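A minimal Python sketch of this probe-then-store step is given below. It is an illustration rather than the DAP Join implementation: the class and method names are hypothetical, Python's built-in hash stands in for the hash function h, and flushing to disk is left out.

```python
from typing import Any, Dict, Hashable, List, Tuple


class SymmetricHashJoin:
    """In-memory symmetric hash join over two sources, "R" and "S"."""

    def __init__(self, num_buckets: int = 8) -> None:
        self.num_buckets = num_buckets
        # One hash table per source: bucket number -> list of (key, tuple).
        self.tables: Dict[str, Dict[int, List[Tuple[Hashable, Any]]]] = {
            "R": {},
            "S": {},
        }

    def _h(self, key: Hashable) -> int:
        # Stand-in for the hash function h (an assumption of this sketch).
        return hash(key) % self.num_buckets

    def insert(self, source: str, key: Hashable, tup: Any) -> List[Tuple[Any, Any]]:
        """Process one arriving tuple and return the join results it produces."""
        if source not in self.tables:
            raise ValueError("source must be 'R' or 'S'")
        other = "S" if source == "R" else "R"
        bucket = self._h(key)
        # 1. Probe the opposite hash table for tuples with an equal join key.
        matches = [t for k, t in self.tables[other].get(bucket, []) if k == key]
        # 2. Store the new tuple in its own hash table for future probes.
        self.tables[source].setdefault(bucket, []).append((key, tup))
        # Results are always reported as (R tuple, S tuple).
        if source == "R":
            return [(tup, m) for m in matches]
        return [(m, tup) for m in matches]
```

Because every tuple probes before it is stored, each matching pair is reported exactly once, at the moment its second member arrives.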
D. On-disk join
DAP Join uses a sort-merge join on the operators that are in
the on-disk state (as in [4], [6]). The sort-merge join can be
understood through the example depicted in figure 4, in which
hash partition group h1 has been flushed twice (each time from
both hash tables R and S). The first flushed partitions were
a1,1 and b1,1 and the second flushed partitions were a1,2 and
b1,2. In the on-disk join we join a1,1 with b1,2 and a1,2 with
b1,1. Partitions a1,1 with b1,1 and a1,2 with b1,2 were already
joined while in memory, so there is no need to merge them. A
major benefit of the sort-merge join is that it does not
require timestamps to remove duplicate results.
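The pairing rule in this example can be sketched as follows. The sketch assumes that each flush writes one run per side and that runs flushed together carry the same index; the function names and the (key, payload) row layout are hypothetical, and the runs are held in Python lists rather than read from disk.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload); an assumed layout


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[str, str]]:
    """Sort-merge join of two runs on the join key."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out: List[Tuple[str, str]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the equal-key group on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            out.extend((l[1], r[1]) for l in left[i:i_end] for r in right[j:j_end])
            i, j = i_end, j_end
    return out


def on_disk_join(a_runs: List[List[Row]], b_runs: List[List[Row]]) -> List[Tuple[str, str]]:
    """Join every run of A with every run of B from a different flush."""
    results: List[Tuple[str, str]] = []
    for x, a_run in enumerate(a_runs):
        for y, b_run in enumerate(b_runs):
            if x != y:  # runs flushed together were already joined in memory
                results.extend(merge_join(a_run, b_run))
    return results
```

Skipping the pairs with equal flush index is what removes the need for per-tuple timestamps in this sketch: no result that was already produced in memory is generated again.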
Class   Statistic         Definition
Size    partitionSizeRi   Size of hash partition i for input R
        groupSizei        Size of partition group i
        tupleSizeR        Tuple size for input R
Input   inputtotal        Total input count
        uniquei           No. of unique values in partition group i
        partitionInputRi  Input count at partition i for input R
Output  outLocali         Local output of partition group i
        outGlobali        Global output of partition group i
Table 1 Statistics
Figure 5 Flush Example Plan
4. DAP Join memory flush algorithm
The main idea of the DAP Join memory flush algorithm is to
collect and observe statistics (summarized in table 1) during
query runtime, which help the algorithm choose the least useful
partition groups. Once the least useful groups have been
flushed from memory, newly arriving data joins with more useful
partitions, which increases the probability of producing more
results. The collected statistics can be grouped into the
following classes:
Size statistics: describe the data held inside a particular
operator at a particular time. For each operator O we keep the
size of each hash partition i at each input R, denoted
partitionSizeRi. As depicted in figure 5, the operator JoinAB
has two partition groups, 1 and 2 (i.e. four partitions A1, A2,
B1 and B2), with sizes partitionSizeA1=150, partitionSizeB1=30,
partitionSizeA2=100 and partitionSizeB2=15. The corresponding
group sizes are groupSize1 = partitionSizeA1 + partitionSizeB1
= 150+30 = 180 and groupSize2 = partitionSizeA2 +
partitionSizeB2 = 100+15 = 115. We assume the tuple size is
constant at each join input (here tupleSizeA=10 and
tupleSizeB=8).
Input statistics: describe the properties of the incoming data
at each operator. Because data in an online environment is
dynamic, the input statistics are collected over a time
interval [ti,tj] (i < j). For each operator we observe the
total number of input tuples received from all inputs,
inputtotal, and the statistic uniquei, which indicates the
number of unique join attribute values in partition group i.
Finally, for each input R and hash partition i, we observe the
number of tuples received at that partition, partitionInputRi.
For example, we assume the following values: inputtotal=100,
unique2=5, partitionInputA2=6 and partitionInputB2=4.
Output statistics: describe the results produced by the join
operator. Like the input statistics, the output statistics are
collected and updated over a predefined time interval. We
observe two output statistics for each partition group i: local
output (outLocali) and global output (outGlobali). In figure 5,
all tuples that flow from JoinAB to JoinABC are the local
output of JoinAB. The contribution of JoinAB to the output of
JoinABC (which is local output for JoinABC) is the global
output of JoinAB.
DAP Join memory flush algorithm:
The DAP Join memory flush algorithm takes a parameter N, the
amount of memory that must be flushed to disk. It takes a
second parameter I, a threshold on the number of tuples: the
number of tuples arriving in a given period of time is compared
with I to decide whether the data arrival rate is slow.
The DAP Join memory flush algorithm calculates, from past query
results, both the global contribution of each partition group
and the data arrival rate, and flushes the least useful
partition groups to disk. Using the input statistics we
calculate expectedPartitionInputRi; from these estimates and
the size statistics we calculate
expectedLocalPartitionResultRi. Using the output statistics we
then calculate expectedGlobalResulti and assign a groupScorei
to each partitionGroupi. Finally, if the data arrival rate is
normal, the partition groups with the lowest groupScorei are
flushed until N has been flushed to disk; if the data arrival
rate is slow, only N/2 memory is flushed, again starting with
the lowest groupScorei.
To calculate expectedDataArrivalRate we observe the input
received in the 100 most recent time intervals (i.e. a
time-series prediction) and use their average as the expected
data arrival rate for new incoming data. The number of
observations can be changed as conditions require. In the
algorithm a loop counter runs 100 times for this calculation.
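A minimal sketch of this estimate and of the resulting choice between N and N/2 is shown below. The function names are hypothetical, and the per-interval input counts are assumed to be available as a list ordered from oldest to newest. Unlike the paper's fixed divisor of 100, the sketch divides by the number of intervals actually available, which is an assumption made to handle short histories.

```python
from typing import Sequence


def expected_arrival_rate(window_counts: Sequence[int], history: int = 100) -> float:
    """Average input count over the most recent `history` intervals."""
    recent = window_counts[-history:]
    if not recent:
        raise ValueError("need at least one observed interval")
    return sum(recent) / len(recent)


def flush_budget(window_counts: Sequence[int], n: float, threshold_i: float,
                 history: int = 100) -> float:
    """Return N when the arrival rate is normal, N/2 when it is slow."""
    rate = expected_arrival_rate(window_counts, history)
    return n if rate > threshold_i else n / 2
```

With N = 2000 and I = 500, for example, a history averaging 600 tuples per interval gives a budget of 2000, while one averaging 400 gives 1000.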
The detailed algorithm can be described as follows:
DAP_Join_memory_flush(N, I)
  for all operators O do
    A = O.sideA; B = O.sideB
    for all partition groups i of O do
      /* Input estimation */
      expectedPartitionInputAi
        = (N * (partitionInputAi / inputtotal)) / tupleSizeA
      expectedPartitionInputBi
        = (N * (partitionInputBi / inputtotal)) / tupleSizeB
      /* Output estimation */
      expectedLocalPartitionResultAi
        = (partitionSizeAi * expectedPartitionInputBi) / uniquei
      expectedLocalPartitionResultBi
        = (partitionSizeBi * expectedPartitionInputAi) / uniquei
      expectedLocalResultNEWi
        = (expectedPartitionInputAi * expectedPartitionInputBi) / uniquei
      expectedLocalResulti
        = expectedLocalPartitionResultAi
          + expectedLocalPartitionResultBi
          + expectedLocalResultNEWi
      /* Partition group scoring */
      expectedGlobalResulti
        = expectedLocalResulti * (outGlobali / outLocali)
      groupScorei = expectedGlobalResulti / groupSizei
    end for
  end for
  /* Data arrival rate */
  overallInputtotal = 0
  for k = 1 to 100 do
    t = tj - ti
    tj = tj - t
    ti = ti - t
    overallInputtotal = overallInputtotal + inputtotal[ti, tj]
  end for
  expectedDataArrivalRate = overallInputtotal / 100
  /* Partition group flushing */
  FlushedSize = 0
  if expectedDataArrivalRate > I then
    while FlushedSize < N do
      Psi = in-memory partition group with the lowest groupScorei
      flush Psi to disk
      FlushedSize += sizeof(Psi at R) * tupleSizeR, for each input R
    end while
  else
    while FlushedSize < N/2 do
      Psi = in-memory partition group with the lowest groupScorei
      flush Psi to disk
      FlushedSize += sizeof(Psi at R) * tupleSizeR, for each input R
    end while
  end if
As an example, we take the values observed for figure 5 and
calculate the resulting estimates. The observed and calculated
values for partition group 2 of JoinAB are shown in table 2.
Statistic                       Value
unique2                         5
partitionInputA2                6
partitionInputB2                4
inputtotal                      100
tupleSizeA                      10
tupleSizeB                      8
N                               2000
I                               500
expectedPartitionInputA2        [2000*(6/100)]/10 = 12
expectedPartitionInputB2        [2000*(4/100)]/8 = 10
expectedLocalPartitionResultA2  (100*10)/5 = 200
expectedLocalPartitionResultB2  (15*12)/5 = 36
expectedLocalResultNEW2         (12*10)/5 = 24
expectedLocalResult2            200+36+24 = 260
expectedGlobalResult2           260/2 = 130
Table 2 Observed and Calculated Value (JoinAB)
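The scoring and flushing steps can be sketched in Python as follows, using the values of Table 2 as a check. Several points are assumptions of the sketch and not statements from the paper: the class and function names are hypothetical, the ratio outGlobal2/outLocal2 = 1/2 is inferred from the last row of Table 2 (the individual values 50 and 100 used in the usage note are made up), and the memory freed by flushing a group is taken to be partition size times tuple size, summed over both inputs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GroupStats:
    partition_size_a: float
    partition_size_b: float
    partition_input_a: float
    partition_input_b: float
    unique: float
    out_local: float
    out_global: float

    @property
    def group_size(self) -> float:
        return self.partition_size_a + self.partition_size_b


def score_group(g: GroupStats, n: float, input_total: float,
                tuple_size_a: float, tuple_size_b: float) -> Dict[str, float]:
    """Compute the estimates of the flush algorithm for one partition group."""
    # Input estimation: share of the next N units of input, in tuples.
    exp_in_a = n * g.partition_input_a / (input_total * tuple_size_a)
    exp_in_b = n * g.partition_input_b / (input_total * tuple_size_b)
    # Output estimation: stored tuples joined with new tuples, plus new with new.
    exp_res_a = g.partition_size_a * exp_in_b / g.unique
    exp_res_b = g.partition_size_b * exp_in_a / g.unique
    exp_res_new = exp_in_a * exp_in_b / g.unique
    exp_local = exp_res_a + exp_res_b + exp_res_new
    # Scoring: scale by the observed global/local output ratio.
    exp_global = exp_local * (g.out_global / g.out_local)
    return {
        "expected_input_a": exp_in_a,
        "expected_input_b": exp_in_b,
        "expected_local": exp_local,
        "expected_global": exp_global,
        "group_score": exp_global / g.group_size,
    }


def select_groups_to_flush(scores: Dict[int, float], freed_by_group: Dict[int, float],
                           budget: float) -> List[int]:
    """Pick the lowest-scored groups until at least `budget` memory is freed."""
    flushed: List[int] = []
    freed = 0.0
    for gid in sorted(scores, key=lambda k: scores[k]):
        if freed >= budget:
            break
        flushed.append(gid)
        freed += freed_by_group[gid]
    return flushed
```

Calling score_group(GroupStats(100, 15, 6, 4, 5, out_local=100, out_global=50), 2000, 100, 10, 8) reproduces the table: expected inputs of 12 and 10, an expected local result of 260 and an expected global result of 130. Dividing by groupSize2 = 115 then gives a group score of about 1.13.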
5. Experimental Result
This section provides experimental results that show the
performance of the proposed DAP Join memory flush algorithm.
Figure 6 Experimental Result 1
We compare this algorithm with the most widely used
non-blocking join algorithms. Figure 6 shows the performance of
the DAP Join memory flush algorithm against HMJ, RPJ and the
state spilling method. Figure 7 compares it with the state
spilling method, the only previous technique designed for
multi-join queries, in a 5-join query plan.
Figure 7 Experimental Result 2
6. Conclusion and Future Work
This paper introduces an algorithm, DAP Join, for producing
massive and immediate results in multi-join query plans. The
algorithm calculates the contribution of data to the overall
result and flushes the least useful data to disk to make room
for new incoming data in an online environment. The operator
state manager switches operators between states to maximize
overall immediate throughput. Experimental results show that
the proposed method is more efficient and scalable than
previous non-blocking join algorithms at producing immediate
and massive results. Future work includes improving the scoring
of partition groups, so that the least useful partitions are
flushed more reliably, and estimating the optimal data arrival
rate, which would help to produce results efficiently in
multi-join queries.
REFERENCES
[1] P. Mishra and M. H. Eich, “Join Processing in Relational Databases,”
ACM Computing Surveys, vol. 24, no. 1, pp. 63-113, 1992.
[2] L. D. Shapiro, “Join Processing in Database Systems with Large Main
Memories,” ACM Trans. Database Systems, vol. 11, no. 3, pp. 239-264, 1986.
[3] T. Urhan and M. J. Franklin, “XJoin: A Reactively-Scheduled Pipelined
Join Operator,” IEEE Data Engineering Bulletin, vol. 23, no. 2, pp. 27–33, 2000.
[4] M. F. Mokbel, M. Lu, and W. G. Aref, “Hash-Merge Join: A Nonblocking
Join Algorithm for Producing Fast and Early Join Results,” in ICDE, 2004.
[5] Y. Tao, M. L. Yiu, D. Papadias, M. Hadjieleftheriou, and N. Mamoulis,
“RPJ: Producing Fast Join Results on Streams through Rate-based
Optimization,” in SIGMOD, 2005.
[6] B. Liu, Y. Zhu, and E. A. Rundensteiner, “Run-time Operator State Spilling
for Memory Intensive Long-Running Queries,” in SIGMOD, 2006.
[7] Levandoski et al., “On Producing High and Early Result Throughput in
Multi-join Query Plans,” IEEE Trans. Knowledge and Data Engineering,
vol. 23, no. 12, Dec. 2011.
[8] J.-P. Dittrich, B. Seeger, D. S. Taylor, and P. Widmayer, “Progressive
Merge Join: A Generic and Non-blocking Sort-based Join Algorithm,” in
VLDB, pp. 299–310, Hong Kong, Aug. 2002.
[9] S. Viglas, J. F. Naughton, and J. Burger, “Maximizing the Output Rate of
Multi-Way Join Queries over Streaming Information Sources,” in VLDB, 2003.
[10] A. N. Wilschut and P. M. G. Apers, “Dataflow Query Execution in a
Parallel Main-Memory Environment,” in PDIS, 1991.