![Page 1: PipelineDB Streaming SQL with - PostgreSQL · Benefits of continuous SQL on streams CREATE CONTINUOUS VIEW v AS SELECT COUNT(DISTINCT x) FROM never_ending_stream Probabilistic computations](https://reader034.vdocument.in/reader034/viewer/2022042307/5ed36e39dee6c419bf4f1df8/html5/thumbnails/1.jpg)
Streaming SQL with PipelineDB
Derek Nelson
What is PipelineDB?
● Continuous SQL on streams (continuous views)
● High-throughput, incremental materialized views
● Based on PostgreSQL 9.5
● No special client libraries
● Free and open-source (GPLv3)
● (30-second demo)
When is PipelineDB not useful?
● SQL isn’t a fit
● Ad-hoc queries on granular data
When is PipelineDB useful?
● High-throughput aggregations (realtime reporting/analytics)
● Computations over sliding windows (continuous monitoring/ops)
● Queries are known in advance
100,000 feet
Produce → Process → Consume
Process: SQL
Aggregation, filtering, sliding windows
= Reduction
Why did we build it?
Produce → Process → Consume
Produce → PipelineDB (continuous view) → Consume
Simplicity is nice, but what else?
Benefits of continuous SQL on streams
● Aggregate before writing to disk
(Chart: total data ingested vs. database size)

```sql
CREATE CONTINUOUS VIEW v AS SELECT COUNT(*) FROM stream;
```
(winning)
● Sliding window queries
● Any information outside of the window is excluded from results and deleted from disk
● Essentially an automatic TTL
```sql
CREATE CONTINUOUS VIEW v WITH (max_age = '1 hour') AS SELECT COUNT(*) FROM stream;
```
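One way to picture what `max_age = '1 hour'` does is a ring of per-minute partial counts: when time moves forward, buckets that fall outside the hour are zeroed, which is the "excluded from results and deleted" behavior in miniature. This is a simplified sketch under that assumption, not PipelineDB's implementation; `window_t`, `window_add` and `window_count` are hypothetical names, and minute granularity is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_MINUTES 60   /* max_age = '1 hour' at minute granularity (assumed) */

typedef struct {
    uint64_t counts[WINDOW_MINUTES]; /* partial COUNT(*) per minute bucket */
    int64_t  newest_minute;          /* latest minute the window has advanced to */
} window_t;

void window_init(window_t *w) {
    memset(w->counts, 0, sizeof w->counts);
    w->newest_minute = -1;
}

/* Move the window forward to `minute`, clearing buckets that aged out. */
static void window_advance(window_t *w, int64_t minute) {
    assert(minute >= 0);
    if (w->newest_minute < 0) { w->newest_minute = minute; return; }
    int64_t gap = minute - w->newest_minute;
    if (gap <= 0) return;
    if (gap > WINDOW_MINUTES) gap = WINDOW_MINUTES;
    for (int64_t i = 1; i <= gap; i++)      /* each reused bucket is emptied first */
        w->counts[(w->newest_minute + i) % WINDOW_MINUTES] = 0;
    w->newest_minute = minute;
}

/* Record one stream event that happened at `minute`. */
void window_add(window_t *w, int64_t minute) {
    window_advance(w, minute);
    if (w->newest_minute - minute >= WINDOW_MINUTES) return; /* already expired */
    w->counts[minute % WINDOW_MINUTES]++;
}

/* COUNT(*) over the last WINDOW_MINUTES minutes as of `now_minute`. */
uint64_t window_count(window_t *w, int64_t now_minute) {
    window_advance(w, now_minute);
    uint64_t total = 0;
    for (int i = 0; i < WINDOW_MINUTES; i++) total += w->counts[i];
    return total;
}
```

Storage stays fixed at 60 counters however many events arrive, which is the TTL effect the slide describes.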
```sql
CREATE CONTINUOUS VIEW v AS SELECT COUNT(DISTINCT x) FROM never_ending_stream;
```
● Probabilistic computations on infinite inputs
● Streaming top-K, percentiles, distincts, large set cardinalities
● Constant space
● No sorting
● Small margin of error
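HyperLogLog is the standard structure behind a constant-space `COUNT(DISTINCT x)` like the one above, and as far as we know it is what PipelineDB uses for it. The minimal C sketch below shows why no sorting is needed: each value only raises one small register, and the estimate comes from the registers alone. The hash (splitmix64), precision (1024 registers, about 3.25% standard error) and all names are our own choices. The small-range correction is omitted, so the estimate is only meaningful for counts well above a few thousand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HLL_P 10                /* 2^10 = 1024 registers */
#define HLL_M (1u << HLL_P)

typedef struct { uint8_t reg[HLL_M]; } hll_t;

/* splitmix64 finalizer: spreads even sequential inputs over all 64 bits. */
static uint64_t mix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

void hll_init(hll_t *h) {
    assert(h != NULL);
    memset(h->reg, 0, sizeof h->reg);
}

void hll_add(hll_t *h, uint64_t value) {
    uint64_t x = mix64(value);
    uint32_t idx = (uint32_t)(x >> (64 - HLL_P)); /* top P bits choose a register */
    uint64_t rest = x << HLL_P;                   /* remaining bits, left-aligned */
    uint8_t rank = 1;                             /* 1-based position of first set bit */
    while (rank <= 64 - HLL_P && !(rest >> 63)) { rank++; rest <<= 1; }
    if (rank > h->reg[idx]) h->reg[idx] = rank;   /* duplicates never change state */
}

/* Merging two sketches is a register-wise max, so partial states combine. */
void hll_merge(hll_t *dst, const hll_t *src) {
    for (unsigned i = 0; i < HLL_M; i++)
        if (src->reg[i] > dst->reg[i]) dst->reg[i] = src->reg[i];
}

/* Raw HyperLogLog estimate (harmonic mean of 2^reg, bias-corrected by alpha). */
double hll_estimate(const hll_t *h) {
    double sum = 0.0;
    for (unsigned i = 0; i < HLL_M; i++)
        sum += 1.0 / (double)(1ULL << h->reg[i]);
    double alpha = 0.7213 / (1.0 + 1.079 / HLL_M);
    return alpha * HLL_M * HLL_M / sum;
}
```

The sketch is 1 KiB whether the stream holds a thousand or a billion distinct values, and `hll_merge` is the kind of operation that makes such a state combinable across microbatches.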
Internals (part 1/2)
Streams and Workers
Streams
● Internally, a stream is a Foreign Table
```sql
CREATE STREAM stream (x int, y int, z int);
INSERT INTO stream (x, y, z) VALUES (0, 1, 2);
```
● System-wide Foreign Server called pipeline_streams
● stream_fdw reads from/writes to the Stream Buffer
● No permanent storage
● Stream rows only exist until they’ve been fully read
stream buffer → query on microbatch → incremental table update
[HeapTuple] [HeapTuple] [HeapTuple] [HeapTuple] [HeapTuple]
● INSERT INTO ...
● Concurrent circular buffer
● Preallocated block of shared memory
HeapTuple {0,1,0,1,1}
HeapTuple {1,1,1,1,1} ✗
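We read the `{0,1,0,1,1}` notation as a per-tuple bitmap of which continuous queries have consumed the tuple. Once every bit is set (`{1,1,1,1,1}`), the slot can be reclaimed (✗). The single-threaded sketch below illustrates only that bookkeeping on a fixed, preallocated ring. It leaves out shared memory, locking and variable-length HeapTuples, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUF_SLOTS 8   /* preallocated capacity (illustrative) */

typedef struct {
    int      value;    /* stand-in for a HeapTuple */
    uint32_t readers;  /* bit q set => continuous query q has read this tuple */
    bool     used;
} slot_t;

typedef struct {
    slot_t   slots[BUF_SLOTS];
    unsigned head, tail;   /* append at head, reclaim from tail */
    unsigned count;        /* live tuples, always slots tail..tail+count-1 */
    uint32_t all_readers;  /* one bit per continuous query reading the stream */
} stream_buffer_t;

void buf_init(stream_buffer_t *b, unsigned n_queries) {
    assert(n_queries > 0 && n_queries < 32);
    memset(b, 0, sizeof *b);
    b->all_readers = (1u << n_queries) - 1;
}

/* INSERT INTO stream: false means the ring is full and a writer would wait. */
bool buf_append(stream_buffer_t *b, int value) {
    if (b->count == BUF_SLOTS) return false;
    b->slots[b->head] = (slot_t){ .value = value, .readers = 0, .used = true };
    b->head = (b->head + 1) % BUF_SLOTS;
    b->count++;
    return true;
}

/* Query q marks slot i as read; fully read tuples at the tail are reclaimed. */
void buf_mark_read(stream_buffer_t *b, unsigned i, unsigned q) {
    assert(i < BUF_SLOTS && b->slots[i].used);
    assert(q < 32 && ((b->all_readers >> q) & 1u));
    b->slots[i].readers |= 1u << q;
    while (b->count > 0 && b->slots[b->tail].readers == b->all_readers) {
        b->slots[b->tail].used = false;   /* the ✗ step: tuple no longer exists */
        b->tail = (b->tail + 1) % BUF_SLOTS;
        b->count--;
    }
}
```

A tuple read by only some of its continuous queries stays pinned, so a slow reader holds back space reuse. That is consistent with "stream rows only exist until they've been fully read."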
```c
/* At Postmaster startup time ... */
BackgroundWorker worker;
BackgroundWorkerHandle *handle;
worker.bgw_main = any_function;
worker.bgw_main_arg = (Datum) arg;
RegisterDynamicBackgroundWorker(&worker, &handle);
```
[HeapTuple] [HeapTuple] [HeapTuple] ... [HeapTuple]

```sql
SELECT count(*), avg(x) FROM stream;
```
AGG on top of stream_fdw#GetStreamScanPlan, which scans one microbatch:

```
while (!BatchDone(node))
{
    tup = PinNext(buf);
    yield MarkAsRead(tup);
}
```
microbatch_result: count = 1000, avg = {1000, 4000}
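The `{1000, 4000}` next to avg is an aggregate transition state rather than a final value. For an integer `avg`, PostgreSQL keeps a {count, sum} pair and only divides when the result is read. The minimal C sketch below shows a worker reducing one microbatch to `count(*)` plus that state. The struct and function names are illustrative, not PipelineDB's. Feeding 1000 rows whose `x` values sum to 4000 reproduces the slide's numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Transition state for avg(x): a PostgreSQL-style {count, sum} pair. */
typedef struct { int64_t count; int64_t sum; } avg_state_t;

/* One row of microbatch_result for SELECT count(*), avg(x) FROM stream. */
typedef struct {
    int64_t     count;  /* count(*) */
    avg_state_t avg;    /* avg(x), not yet finalized */
} microbatch_result_t;

/* Apply both transition functions to every tuple read in this batch. */
microbatch_result_t aggregate_microbatch(const int32_t *x, size_t n) {
    assert(x != NULL || n == 0);
    microbatch_result_t r = { 0, { 0, 0 } };
    for (size_t i = 0; i < n; i++) {
        r.count += 1;       /* count(*) transition */
        r.avg.count += 1;   /* avg transition: accumulate count and sum */
        r.avg.sum += x[i];
    }
    return r;
}

/* Final function for avg (SQL would return NULL for an empty input). */
double avg_final(avg_state_t s) {
    return s.count == 0 ? 0.0 : (double)s.sum / (double)s.count;
}
```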
Worker process parallelism
Stream buffer → tuples round-robin'd across n worker procs (Worker proc 0, Worker proc 1, ..., Worker proc n)
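A minimal sketch of round-robin dispatch, assuming each tuple is assigned by its arrival sequence number modulo the number of workers. The slides do not say exactly how the assignment is computed, and `worker_for_tuple`/`distribute` are hypothetical. Because any worker may see rows from any group, each worker only produces a partial microbatch result, which is one reason partial results have to be combined later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dispatcher: tuple number `seq` goes to worker seq mod n. */
size_t worker_for_tuple(size_t seq, size_t n_workers) {
    assert(n_workers > 0);
    return seq % n_workers;
}

/* Hand out `n_tuples` tuples and count how many each worker receives. */
void distribute(size_t n_tuples, size_t n_workers, size_t *per_worker) {
    for (size_t w = 0; w < n_workers; w++) per_worker[w] = 0;
    for (size_t seq = 0; seq < n_tuples; seq++)
        per_worker[worker_for_tuple(seq, n_workers)]++;
}
```

Round-robin keeps load even (worker loads never differ by more than one tuple) without looking at tuple contents.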
Internals (part 2/2)
Incremental Updates
● transition_state = combine(microbatch_tstate, existing_tstate)
● pipeline_combine catalog table maps combine functions to aggregates
● No changes to pg_aggregate catalog table or existing aggregate functions
● User-defined aggregates just need a combinefunc to be combinable
```sql
CREATE AGGREGATE combinable_agg(x) (
    sfunc = sfunc,
    finalfunc = finalfunc,
    combinefunc = combinefunc
);
```
combine() of the microbatch result with the existing on-disk row:

| | count | avg |
|---|---|---|
| microbatch_result | 1000 | {1000, 4000} |
| existing on-disk row | 5000 | {5000, 10000} |
| updated_result | 6000 | {6000, 14000} |
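Combining transition states rather than final values is what keeps the result exact. The microbatch's average is 4000/1000 = 4 and the stored average is 10000/5000 = 2, so averaging the averages would give 3, while the true value is 14000/6000 ≈ 2.33. A minimal C sketch of combine functions for `count` and `avg`, using the slide's numbers. The names are illustrative and are not PipelineDB's catalog entries.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int64_t count; int64_t sum; } avg_state_t;

/* combinefunc for count(*): partial counts simply add. */
int64_t count_combine(int64_t a, int64_t b) { return a + b; }

/* combinefunc for avg: add the {count, sum} pairs component-wise. */
avg_state_t avg_combine(avg_state_t a, avg_state_t b) {
    avg_state_t r = { a.count + b.count, a.sum + b.sum };
    return r;
}

/* finalfunc for avg: divide only when the value is read. */
double avg_final(avg_state_t s) {
    assert(s.count > 0);
    return (double)s.sum / (double)s.count;
}
```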
```
lookup_plan = get_plan(SELECT * FROM matrel WHERE hash_group(x, y, z) IN (...));

/* dynamically generate a VALUES node */
foreach(row, microbatch)
    values = lappend(values, hash_group(row));

set_values(lookup_plan, values);
existing = PortalRun(lookup_plan, ...);
/* now we're ready to combine these on-disk tuples with the incoming batch result */
```
● This query needs to be as fast as possible
● Continuous views are indexed on a 32-bit hash of the grouping
● Pro: maximizes the cardinality of the index keyspace, which is great for random-access performance
● Con: must deal with collisions programmatically
```
SELECT * FROM matrel WHERE hash_group(x, y, z) IN (hash(microbatch group), ...)
```
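Because different groups can share a 32-bit hash, a lookup by `hash_group(...)` only narrows the candidates, and each candidate's real grouping columns still have to be compared. The sketch below shows that check. It assumes FNV-1a as the hash, though PipelineDB's actual hash function may differ, and uses a plain array as a stand-in for the index scan. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t x, y, z; } group_key_t;

typedef struct {
    uint32_t    hash;   /* indexed column: 32-bit hash of the grouping */
    group_key_t key;    /* the actual grouping values */
    int64_t     count;  /* aggregate state for the group */
} matrel_row_t;

/* FNV-1a over the bytes of the grouping columns (illustrative hash). */
uint32_t hash_group(group_key_t k) {
    const int32_t cols[3] = { k.x, k.y, k.z };
    uint32_t h = 2166136261u;
    for (int c = 0; c < 3; c++) {
        uint32_t v = (uint32_t)cols[c];
        for (int byte = 0; byte < 4; byte++) {
            h ^= (v >> (8 * byte)) & 0xFFu;
            h *= 16777619u;
        }
    }
    return h;
}

/* Match on the hash first (the index lookup), then compare the real
   grouping columns so a collision never returns the wrong group. */
matrel_row_t *lookup_group(matrel_row_t *rows, size_t n, group_key_t key) {
    assert(rows != NULL || n == 0);
    uint32_t h = hash_group(key);
    for (size_t i = 0; i < n; i++) {
        if (rows[i].hash != h) continue;
        if (rows[i].key.x == key.x && rows[i].key.y == key.y && rows[i].key.z == key.z)
            return &rows[i];
        /* same hash, different group: a collision, keep scanning */
    }
    return NULL;
}
```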
● If the grouping contains a time-based column, we can do better
```sql
CREATE ... AS SELECT day(timestamp), count(*) FROM stream GROUP BY day
```
● These continuous views are indexed on a 64-bit key: timestamp from group (32 bits) | regular 32-bit grouping hash
● Pro: most incoming groups will have a similar timestamp, so index caching is better
● Con: larger index footprint
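Putting the group's timestamp in the high 32 bits makes keys from the same day sort next to each other, so a microbatch of recent groups touches a small, hot region of the index. The sketch below composes such a key. It assumes the timestamp is stored as whole seconds since the Unix epoch, truncated to the day; the real encoding may differ, and `day_bucket`/`time_group_key` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* day(timestamp): truncate seconds-since-epoch to the start of its day. */
uint32_t day_bucket(uint32_t epoch_seconds) {
    return epoch_seconds - epoch_seconds % 86400u;
}

/* 64-bit index key: [timestamp from group (32 bits) | 32-bit grouping hash]. */
uint64_t time_group_key(uint32_t group_timestamp, uint32_t group_hash) {
    return ((uint64_t)group_timestamp << 32) | group_hash;
}
```

Any key from an earlier day sorts before any key from a later day, whatever the hashes are. The price is 8 bytes per key instead of 4, which is the larger index footprint noted above.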
microbatch result generated from stream by worker ✔
existing result retrieved from disk ✔
```
combine_plan = get_plan(SELECT group, combine(count), combine(avg)
                        FROM (microbatch_result UNION ALL existing)
                        GROUP BY group);
combined = PortalRun(combine_plan, ...);

foreach(row, combined)
{
    if (new_tuple(row))
        heap_insert(row, …);
    else
        heap_update(row, …);
}
```
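The combine-then-write loop can be sketched in plain C, with an array standing in for the materialization table. Rows whose group already exists are combined in place (the `heap_update` path), and unseen groups are appended (the `heap_insert` path). The fixed-size table, the single integer grouping column and all names are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int64_t count; int64_t sum; } avg_state_t;

typedef struct {
    int32_t     group;  /* grouping column (a single int for brevity) */
    int64_t     count;  /* count(*) state */
    avg_state_t avg;    /* avg(x) transition state */
} cv_row_t;

typedef struct {
    cv_row_t rows[64];
    size_t   n;
    size_t   inserts, updates;  /* counters showing which path ran */
} matrel_t;

static cv_row_t *find_group(matrel_t *m, int32_t group) {
    for (size_t i = 0; i < m->n; i++)
        if (m->rows[i].group == group) return &m->rows[i];
    return NULL;
}

/* Fold one microbatch_result into the table. */
void combine_microbatch(matrel_t *m, const cv_row_t *batch, size_t n) {
    for (size_t i = 0; i < n; i++) {
        cv_row_t *existing = find_group(m, batch[i].group);
        if (existing == NULL) {                 /* new_tuple(row): heap_insert */
            assert(m->n < sizeof m->rows / sizeof m->rows[0]);
            m->rows[m->n++] = batch[i];
            m->inserts++;
        } else {                                /* otherwise: heap_update */
            existing->count     += batch[i].count;
            existing->avg.count += batch[i].avg.count;
            existing->avg.sum   += batch[i].avg.sum;
            m->updates++;
        }
    }
}
```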
Combiner process parallelism
● On-disk groupings of a continuous view, e.g. (a, b, c), (d, e, f), (g, h, i), (j, k, l), are sharded over combiners by group
● Each row is guaranteed to be updated by only one combiner process
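Routing by group rather than round-robin is what gives the one-writer-per-row guarantee. Every partial result for a given group lands on the same combiner, so no two combiners ever update the same row. The sketch below assumes routing by `hash mod n_combiners` over a 32-bit grouping hash. The slides do not show the actual routing function, and `combiner_for_group` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The owning combiner is a pure function of the group's hash, so partial
   results for one group from any worker always reach the same process. */
size_t combiner_for_group(uint32_t group_hash, size_t n_combiners) {
    assert(n_combiners > 0);
    return (size_t)(group_hash % n_combiners);
}
```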
Just released! Continuous transforms
● Worker-only continuous queries
● Arbitrary procedure called on its output rows
● Enables work sharing between continuous views
```sql
CREATE CONTINUOUS TRANSFORM xform AS
SELECT foo(col), bar(col) FROM raw_stream
THEN EXECUTE PROCEDURE pipeline_stream_insert('normalized_stream');
```
```sql
CREATE CONTINUOUS VIEW v0 AS SELECT … FROM normalized_stream;
CREATE CONTINUOUS VIEW v1 AS SELECT … FROM normalized_stream;
```
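The work-sharing idea can be illustrated outside PipelineDB. The transform runs once per raw row and hands its output to a procedure. Here a plain function stands in for `pipeline_stream_insert('normalized_stream')`, and every continuous view reading `normalized_stream` consumes the already-transformed row. In this sketch `normalize` is a hypothetical stand-in for `foo(col), bar(col)`, and the two views are toy aggregates. It shows the data flow only, not PipelineDB's executor.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int normalized; } norm_row_t;

/* Two downstream "continuous views" over normalized_stream. */
typedef struct { long v0_count; long v1_sum; } views_t;

static size_t transform_calls = 0;   /* how many times the transform ran */

/* Stand-in for SELECT foo(col), bar(col) FROM raw_stream (hypothetical). */
static norm_row_t normalize(int raw) {
    transform_calls++;
    norm_row_t r = { raw < 0 ? -raw : raw };   /* e.g. absolute value */
    return r;
}

/* Stand-in for pipeline_stream_insert('normalized_stream'): every view
   reading that stream consumes the same transformed row. */
static void insert_normalized(views_t *views, norm_row_t row) {
    views->v0_count += 1;               /* v0: count(*) */
    views->v1_sum   += row.normalized;  /* v1: sum(normalized) */
}

void run_transform(views_t *views, const int *raw, size_t n) {
    assert(views != NULL);
    for (size_t i = 0; i < n; i++)
        insert_normalized(views, normalize(raw[i]));
}
```

With two views over `normalized_stream`, the normalization cost is paid once per raw row instead of once per view.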