66.query planning for continuous aggregation

Upload: prabakar663

Post on 06-Apr-2018

226 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    1/14

    IEEE TRANSACTIONS ON JOURNAL KNOWLEDGE AND DATA ENGINEERING, MANUSCRIPT ID 1

    Query Planning for Continuous AggregationQueries over a Network of Data Aggregators

    Rajeev Gupta, and Krithi Ramamritham, Fellow IEEE

    AbstractContinuous queries are used to monitor changes to time varying data and to provide results useful for online

    decision making. Typically a user desires to obtain the value of some aggregation function over distributed data items, for

    example, to know value of portfolio for a client; or the AVG of temperatures sensed by a set of sensors. In these queries a client

    specifies a coherency requirement as part of the query. We present a low-cost, scalable technique to answer continuous

    aggregation queries using a network of aggregators of dynamic data items. In such a network of data aggregators, each data

    aggregator serves a set of data items at specific coherencies. Just as various fragments of a dynamic web-page are served by

    one or more nodes of a content distribution network, our technique involves decomposing a client query into sub-queries and

    executing sub-queries on judiciously chosen data aggregators with their individual sub-query incoherency bounds. We provide a

    technique for getting the optimal set of sub-queries with their incoherency bounds which satisfies client querys coherency

    requirement with least number of refresh messages sent from aggregators to the client. For estimating the number of refresh

    messages, we build a query cost model which can be used to estimate the number of messages required to satisfy the client

    specified incoherency bound. Performance results using real-world traces show that our cost based query planning leads to

    queries being executed using less than one third the number of messages required by existing schemes.

    Index TermsAlgorithms, Continuous queries,Distributed query processing, Data dissemination, Coherency, Performance.

    1 INTRODUCTION

    xxxx-xxxx/0x/$xx.00 200x IEEE

    Digital Object Indentifier 10.1109/TKDE.2011.12 1041-4347/11/$26.00 2011 IEEE

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    2/14

    2 IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID

    1.1 Aggregate Queries and their Execution

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    3/14

    GUPTA ET AL.: QUERY PLANNING FOR CONTINUOUS QUERIES IN DYNAMIC DATA DISSEMINATION NETWORKS 3

    1.2 Problem Statement and Contributions

    ))(()(1

    ==

    qn

    iqiqiq wtvtV

    qiw

    qqi

    n

    iqiqi Cwtutv

    q

    =

    |))()((|1

    =

    qn

    iqqiqi CwC

    1

    )(

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    4/14

    4 IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID

    1.3 Outline of the Paper

    2 DATA DISSEMINATION COST MODEL

    2.1 Incoherency Bound Model

    2/1)|)()(|( CCtutvP >

    Figure 1. Number of pushes vs. incoherency bounds

    Symbols Description

    A Set of aggregators in the network.

    N Number of data aggregators (DAs).

    D Set of data items disseminated by the network.

    C Incoherency bounds of data items.

    ak kth data aggregator, 1kN

    Dk Set of data items disseminated by the kth DA.

    dkj jth data item disseminated by the k

    th DA.

    tkj Incoherency bound which akcan ensure for dkj.

    q Client query.

    Cq Incoherency bound for q.

    nq Number of data items in q.

    dqi ith data item of the query q.

    vqi(t) Value of the ith data item of the query q at time t.

    wqi Weight of the data item dqi for the query q.

    Vq(t) Value of the query q at time t.

    qk Sub-query ofq to be executed at ak .

    Cqk Incoherency bound ofqk.

    Rq Sumdiffof the query q.

    Correlation measure between data items

    Query satisfiability parameter

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    5/14

    GUPTA ET AL.: QUERY PLANNING FOR CONTINUOUS QUERIES IN DYNAMIC DATA DISSEMINATION NETWORKS 5

    2.2 Data Dynamics Model

    = iiis ssR || 1

    2.3 Combining Data Dissemination Models

    (a) C=0.001 (b) C=0.01 (c) C=0.1

    Figure 2. Number of pushes vs. data sumdiff

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    6/14

    6 IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID

    3 COST MODEL FOR ADDITIVE AGGREGATIONQUERIES

    |||| 11 +=+= iiqiipqqppdata qqwppwRwRwR

    |)()(| 11 += iiqiipquery qqwppwR

    3.1 Modeling Correlation between Data Dynamics

    )2( 22222 qqppqqppquery RwRwRwRwR ++

    ))()(/()))(((2

    12

    111 = iiiiiiii qqppqqpp

    3.2 Query based Normalization

    )2/()2(2222222

    qpqpqqppqqppquery

    wwwwRwRwRwRwR ++++=

    qpqp wwww 2/122 ++

    +

    +

    =

    = = =

    = ==

    q q q

    q qq

    n

    i

    n

    i

    n

    ijjqjqiijqi

    n

    i

    n

    ijjjiqjqiij

    n

    iiqi

    Q

    www

    RRwwRw

    R

    1 1 ,1

    2

    1 ,11

    22

    2

    3.3 Validating the Query Cost Model

    4 QUERY PLANNING FOR WEIGHTEDADDITIVE AGGREGATION QUERIES

    Figure 3: Query cost validation with varying (a) Sumdiff (b) Incoherency bound

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    7/14

    GUPTA ET AL.: QUERY PLANNING FOR CONTINUOUS QUERIES IN DYNAMIC DATA DISSEMINATION NETWORKS 7

    kjt

    kj

    d

    qiqidw qiw qid

    2qkC

    ==

    N

    k qk

    qk

    qC

    RZ

    12

    qkid

    qkid

    qqk CC

    kjt kjd

    )( kjn

    qiqjqiqk ddtwT

    qk

    =

    qkqk TC

    4.1 Finding Optimal Query Plan is NP-hard

    4.2 Optimal Allocation of Query IncoherencyBound among Sub-queries

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    8/14

    8 IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID

    +==

    N

    kqqk

    N

    kqkqk CCCR

    11

    2 )()/(

    qkC

    ==

    N

    kqkqkqqk RRCC

    1

    3/13/1 )/(

    3/1qkR

    4.3 Greedy Heuristics for Deriving the Sub-queries

    4.3.1 Minimum Cost Heuristic

    3/1

    kR

    ==

    N

    kqk

    q

    q RC

    Z1

    3/1

    3/2

    3/1 1

    3/1

    qkR

    3/1mR

    qkqk TC

    4.3.2 Satisfiability of sub-query incoherency bound

    3/1qkR

    Figure 4: Greedy algorithm for query plan selection

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    9/14

    GUPTA ET AL.: QUERY PLANNING FOR CONTINUOUS QUERIES IN DYNAMIC DATA DISSEMINATION NETWORKS 9

    )(3/1

    3/1

    mq

    mm

    RC

    TR

    +

    3/1/ mqm RCT

    3/1mqRC

    4.3.3 Maximum Gain Heuristic

    iidw

    122

    +

    =

    i j ijijiijii

    iii

    m

    RRwwRw

    Rw

    G

    3/1

    '

    )(

    mq

    iii

    mmRC

    Tw

    GG

    =

    5 PERFORMANCE EVALUATION

    5.1 Comparison of Algorithms

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    10/14

    10 IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID

    5.2 Effects of Algorithmic Parameters

    5.2.1 Effect of data dynamics

    5.2.2 Effect of correlation between data dynamics

    Figure 5: Performance evaluation of algorithms

    (a) Query size=3 (b) Query size=5

    Figure 6: Effect of data sumdiffon sub-query size

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    11/14

    GUPTA ET AL.: QUERY PLANNING FOR CONTINUOUS QUERIES IN DYNAMIC DATA DISSEMINATION NETWORKS 11

    5.2.3 Effect of query satisfiability parameter

    5.3 Overheads of Query Planning

    6 QUERY PLANNING FOR MAX QUERIES

    )1),(max()( qqiq nitvtV =

    qqi niiCC 1,,

    (a) Comparison of algorithm (b) Effect ofdata dynamics orderon performance

    Figure 7: Effect of on query satisfiability Figure 8: Performance of MAX queries

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    12/14

    12 IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID

    6.1 Query Cost Model

    >= =

    q qn

    i

    n

    ijjqijiiq niRxxpRR

    1 ,1

    )1|max())((

    6.2 Optimized Execution

    6.2.1 Optimal query planning problem is NP-hard

    6.2.2 Greedy Heuristics

    =qid

    diqdiq RRG )max(

    6.2.3 Simulation results

    7RELATED WORK

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    13/14

    GUPTA ET AL.: QUERY PLANNING FOR CONTINUOUS QUERIES IN DYNAMIC DATA DISSEMINATION NETWORKS 13

    8DISCUSSION &CONCLUSION

    o

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8/3/2019 66.Query Planning for Continuous Aggregation

    14/14

    14 IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID

    o

    o

    REFERENCES

    [8] Query cost model validation for sensor data.www.cse.iitb.ac.in/~grajeev/sumdiff/RaviVijay_BTP06.pdf.

    Rajeev Gupta got his BTech from Indian Institute of Technology (IIT)Kharagpur, India in Electronics Engineering. He is currently pursuinghis PhD from IIT Mumbai, India in Computer Science. He is workingas Researcher at IBM Research, New Delhi, India for last 10 years.Krithi Ramamritham received the PhD in Computer Science fromUniversity of Utah and then joined the University of Massachusetts.He is currently at IIT Bombay as Professor in the Department ofComputer Science. He is a fellow of IEEE and a fellow of ACM. Hehas served on numerous program committees of conferences andworkshops. His editorial board contributions include IEEE Transac-tions, the Real Time Systems Journal, and the VLDB Journal.

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.