thecqlconnuousquery$ language:$seman*c$foundaons ......image$sources$(in$order)$ • hp://...

44
The CQL Con*nuous Query Language: Seman*c Founda*ons and Query Execu*on Arvind Arasu, Shivnath Babu, Jennifer Widom Presented by: YoungRae Kim

Upload: others

Post on 01-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

The  CQL  Con*nuous  Query  Language:  Seman*c  Founda*ons  

and  Query  Execu*on  Arvind  Arasu,  Shivnath  Babu,  Jennifer  Widom  

 Presented  by:  Young-­‐Rae  Kim  

Page 2: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

INTRODUCTION  You  are  here!  

Page 3: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

CQL  –    Con*nuous  Query  Language  

Page 4: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

2003:  A  CQL  Odyssey  

 “Many  papers  include  example  con3nuous  

queries  expressed  in  some  declara3ve  language  ….  However,  …  a  precise  language  seman3cs,  …  

o<en  is  le<  unclear.”[1]  

 

Page 5: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

2003:  A  CQL  Odyssey  

 “Furthermore,  very  liDle  has  been  published  to  date  covering  execu3on  details  of  general-­‐

purpose  con3nuous  queries.”  [1]  

Page 6: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

2003:  A  CQL  Odyssey  

 “It  may  appear  ini3ally  that  defining  a  

con3nuous  query  language  over  (rela3onal)  streams  is  not  difficult  ….  

However,  as  queries  get  more  complex  …  the  situa3on  becomes  much  murkier.”  [1]  

Page 7: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

It’s  all  about  the  abstract  seman*cs  

2  data  types  •  Streams  •  Rela*ons  

3  classes  of  operators  •  Stream-­‐to-­‐Rela*on  •  Rela*on-­‐to-­‐Rela*on  •  Rela*on-­‐to-­‐Stream  

CQL  abstract  seman*cs  is  based  on    

Page 8: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

It’s  all  about  the  abstract  seman*cs    –  Goals  

   

1.  Make  it  easy  to  understand;  familiar.    

2.  Simple  queries  should  be  easy  to  write,  compact,  and  shouldn’t  be  visually  deceiving.  

Page 9: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Query  Execu*on  Plans  MaTer  Too!  –  Goals  

1.  Plans  should  consist  of  modular  and  pluggable  components  based  on  generic  interfaces  

2.  An  execu3on  model  that  efficiently  captures  the  combina3on  of  streams  and  rela3ons  

3.  An  architecture  that  makes  performance-­‐based  experimenta3on  easy.  

Page 10: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

A  Roadmap  

Other  Language  Comparisons  

CQL  in  STREAM  

CQL  

Defini*ons  

Introduc*on  

Page 11: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

DEFINITIONS  The  technical  stuff.  

Page 12: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Streams  vs.  Rela*ons  

Streams  •  “a  (possibly  infinite)  bag  (mul3set*)  of  elements    <s,  τ>”[2]  

Rela*ons  •  “A  rela3on  R  is  a  mapping  from  Τ  to  a  finite  but  unbounded  bag  of  tuples  belonging  to  the  schema  of  R.”[3]  

Page 13: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Abstract  Seman*cs  Con*nuous  Seman*cs  •  Assume  a  discrete,  ordered  *me  domain  Τ.  •  Inputs  are  either  streams  or  rela*ons  •  In  discussing  the  result  of  a  con*nuous  query  Q  at  a  3me  τ  s,  there  are  2  possibili*es:  1.  The  outermost  operator  in  Q  is  rela*on-­‐to-­‐stream.  The  

result  of  Q  at  *me  τ  is  S  (the  produced  stream)  up  to  τ.  2.  The  outermost  operator  in  Q  is  stream-­‐to-­‐rela*on  or  

rela*on-­‐to-­‐rela*on.  The  result  of  Q  at  *me  τ  is  R(τ)  (the  produced  rela*on).  

•  Time  only  “advances”  from  τ  from  (τ  –  1)  when  all  inputs  up  to  τ  –  1  have  been  processed.  

Page 14: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

CQL  What  we’re  all  here  for.  

Page 15: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Operators:  Stream-­‐to-­‐Rela*on  

•  Based  on  the  concept  of  a  sliding  window  over  a  stream.  

•  SQL-­‐99  deriva*ve.  •  3  classes:  

1.  Time-­‐based      “S  [Range  T]”  

2.  Tuple-­‐based      “S  [Rows  N]”  

3.  Par**oned      “S  [Par**on  By  A1,  …  ,  Ak  Rows  N]”  

 

Page 16: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Operators:  Rela*on-­‐to-­‐Rela*on  

•  Derived  from  tradi*onal  SQL.  

Page 17: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Operators:  Rela*on-­‐to-­‐Stream  

•  3  operators:  –  Istream  (“insert  stream”)    – Dstream  (“delete  stream”)    – Rstream  (“rela*on  stream”)    

 

Page 18: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Operators:  Example  

       SELECT  Istream(*)      FROM  PosSpeedStr  [Range  Unbounded]      WHERE  speed  >  65  

Rela*on-­‐to-­‐Stream  

Stream-­‐to-­‐Rela*on  

Rela*on-­‐to-­‐Rela*on  

Page 19: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Shortcuts  &  Defaults  

Default  Windows  •  Unbounded  windows  are  applied  to  streams  by  default.  

 Default  Rela*on-­‐to-­‐Stream  Operators  •  An  intended  Istream  operator  may  be  omiTed  from  a  CQL  query.  

Page 20: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Post  Shortcut  Query  Example  

             SELECT  *            FROM  PosSpeedStr            WHERE  speed  >  65  

Page 21: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Equivalences  

•  Important  for  query-­‐rewrite  op*miza*ons  •  All  equivalences  that  hold  in  SQL  with  standard  rela*onal  seman*cs  carry  over  to  the  rela*onal  por*on  of  CQL.  

•  2  stream-­‐based  equivalences  in  CQL:      1.  Window  reduc*on      2.  Filter-­‐window  commuta*vity  

Page 22: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Equivalences:  Window  Reduc*on  

       SELECT  Istream(L)          FROM  S  [Range  Unbounded]          WHERE  C  

 

is  equivalent  to    

       SELECT  Rstream(L)          FROM  S  [Now]          WHERE  C  

Page 23: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Equivalences:  Window  Reduc*on  

       SELECT  Istream(L)          FROM  S  [Range  Unbounded]          WHERE  C  

 

is  equivalent  to    

       SELECT  Rstream(L)          FROM  S  [Now]          WHERE  C  

Page 24: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Equivalences:  Filter-­‐Window  Commuta*vity  

         (SELECT  L            FROM  S            WHERE  C)  [Range  T]  

 

is  equivalent  to    

         SELECT  L            FROM  S  [Range  T]            WHERE  C  

Page 25: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Equivalences:  Filter-­‐Window  Commuta*vity  

         (SELECT  L            FROM  S            WHERE  C)  [Range  T]  

 

is  equivalent  to    

         SELECT  L            FROM  S  [Range  T]            WHERE  C  

Page 26: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Equivalences:  Filter-­‐Window  Commuta*vity  

         (SELECT  L            FROM  S            WHERE  C)  [Range  T]  

 

is  equivalent  to    

         SELECT  L            FROM  S  [Range  T]            WHERE  C  

Page 27: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Time  Management  

•  More  realis*c  condi*ons:  we  make  the  aforemen*oned  assump*on.    – The  network  conveying  the  stream  elements  to  the  DSMS  may  not  guarantee  in-­‐order  transmission  

– Streams  pause  and  restart  •  Use  addi*onal  “meta-­‐input”  to  the  system  to  cope  –  ‘heartbeats’  in  STREAM  

Page 28: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Time  Management:  Heartbeats  

•  A  heartbeat  consists  simply  of  a  *mestamp  τ∈Τ.  

•  Aoer  arrival  of  the  heartbeat,  the  system  will  reject  stream  elements  with  *mestamp  ≤  τ.  

•  Various  ways  to  generate  heartbeats  

Page 29: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Environment  Overview  

Page 30: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

CQL  IMPLEMENTATION  IN  STREAM  Hot  and  STREAM-­‐y.  

Page 31: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

STREAM  Query  Plans  

•  Each  query  plan  runs  con*nuously  and  is  composed  of  3  different  types  of  components:  – Operators  – Queues  – Synopses  

Page 32: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Operators  

•  Read  from  one  or  more  input  queues,  processes  the  input  based  on  its  seman*cs,  and  writes  its  output  to  an  output  queue.  

•  In  STREAM,  every  operator  is  either  a  CQL  operator  or  a  system  operator.  

Page 33: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Operators  

Page 34: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Queues  

•  Connect  its  input  operator  to  its  output  operator.  

•  Stored  en*rely  in  memory.*  

Page 35: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Synopses  

•  Store  the  intermediate  state  needed  by  con*nuous  query  plans.  – E.g.  performing  a  windowed  join  of  two  streams  

•  Many  synopses  are  logical  “stubs”  that  primarily  point  into  other  synopses.  

•  Most  common  use  of  a  synopsis  is  to  materialize  the  current  state  of  a  rela*on.  

•  Also  stored  en*rely  in  memory.*  

Page 36: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Example  Query  Plan  Q1:  

 SELECT  B,  max(A)    FROM  S1  [Rows  50,000]    GROUP  BY  B  

 Q2:  

 SELECT  Istream(*)    FROM  S1  [Rows  40,000],      S2  [Range  600  Seconds]    WHERE  S1.A  =  S2.A  

Page 37: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Query  Op*miza*on  

•  Naïve  query  plan  generator.  •  Commonly  applied  heuris*cs:  – Push  selec*ons  below  joins  – Maintain  and  use  indexes  for  synopses  on  binary-­‐join,  mjoin,  and  aggregate  operators.  

– Share  synopses  within  query  plans  whenever  possible.  

Page 38: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

COMPARISON  WITH  OTHER  LANGUAGES  

We’re  #1.  

Page 39: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Tapestry  

•  Expressed  using  SQL  syntax.  •  Does  not  support  sliding  windows  over  streams  or  any  rela*on-­‐to-­‐stream  operators.  

Page 40: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Tribeca  

•  Based  on  stream-­‐to-­‐stream  operators.  •  Queries  take  a  single  stream  as  input  and  produce  a  single  stream  as  output,  with  no  no*on  of  rela*on.  

Page 41: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Aurora  

•  Difficult  to  compare  the  procedural  query  interface  of  Aurora  against  a  declara*ve  language  (CQL).  

•  Some  dis*nc*ons:  – Aggrega*on  operators  defined  by  user-­‐defined  func*ons  and  have  op*onal  parameters  set  by  the  user  

– Aurora  does  not  explicitly  support  rela*ons.  

Page 42: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

TelegraphCQ  (Stream-­‐Only)  

•  Note  that  we  can  derive  a  stream-­‐only  language  in  CQL  anyways.  

•  Mo*va*ons  for  CQL’s  dual  approach:  – Make  it  easy  to  understand;  familiar.  – More  intui*ve  queries.  – Use  of  both  rela*ons  and  streams  cleanly  generalizes  materialized  views.  

Page 43: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

THE  END  El  finito.  

Page 44: TheCQLConnuousQuery$ Language:$Seman*c$Foundaons ......Image$Sources$(in$order)$ • hp:// finkelstein.com/pptblog/wordpress/wpEcontent/uploads/2011/theEbeginningEroadEsign.jpg$ •

Image  Sources  (in  order)  •  hTp://www.ellenfinkelstein.com/pptblog/wordpress/wp-­‐content/uploads/2011/the-­‐beginning-­‐road-­‐sign.jpg  •  StartFragment  EndFragment  •  hTp://medcitynews.com/wp-­‐content/uploads/Black-­‐Box-­‐Art.png  •  StartFragment  EndFragment  •  hTp://www.harborfreight.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5|8d27136e95/i/

m/image_16157.jpg  •  StartFragment  EndFragment  •  hTp://bethesdalutherancommuni*es.org/view.image?Id=2216  •  StartFragment  EndFragment  •  hTp://wiki.treck.com/images/d/d4/Fig1.39_Sliding_Window_Protocol.gif  •  StartFragment  EndFragment  •  U.  Srivastava  and  J.  Widom.  Flexible  *me  management  in  data  stream  systems.  Page  264  •  StartFragment  EndFragment  •  hTps://edc2.healthtap.com/ht-­‐staging/user_answer/avatars/211738/large/

What_is_a_Steam_Shower_8666546_460.jpeg?1386553186  •  The  CQL  Con*nuous  Query  Language:  Seman*c  Founda*ons  and  Query  Execu*on,  Page  25  •  The  CQL  Con*nuous  Query  Language:  Seman*c  Founda*ons  and  Query  Execu*on,  Page  22  •  StartFragment  EndFragment  •  hTp://28.media.tumblr.com/tumblr_le900rkVhp1qzexono1_500.png  •  hTps://40.media.tumblr.com/3290de4ee413685c60f08dc775310524/tumblr_mydnbyvVBr1rz41fxo2_400.jpg