Deterministic Model for Distributed Speculative Stream Processing
DISCAN 2018
Igor Kuralenok, Artem Trofimov, Nikita Marshalkin, and Boris Novikov
JetBrains Research, Saint Petersburg State University
Outline
• Deterministic computations
• Stream processing computational model
• Optimistic determinism: drifting state
• Experiments
• Exactly-once on top of determinism
• Yet another experiment
Deterministic computations
Given a particular input, the same output will be produced after any number of reruns
For streaming it means: $F(I_n \ldots I_0):\ \forall k,\ F(I_k I_{k-1} \ldots I_0) = J_k$
Usually, it is written as: $F(I_n, S_n):\ \forall k,\ \exists S_k = S(I_{k-1} \ldots I_0),\ F(I_k, S_k) = J_k$
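Below is a minimal sketch, in Python, of why the two formulations agree. It uses a running mean chosen purely for illustration. `f_full` plays the role of $F$ applied to the whole prefix, `s` computes $S_k$ from $I_{k-1} \ldots I_0$, and `f_step` is $F(I_k, S_k)$. All of these names are hypothetical.

```python
from typing import List, Tuple

State = Tuple[int, int]  # S_k: (sum, count) of the items before I_k

def f_full(prefix: List[int]) -> float:
    """F(I_k I_{k-1} ... I_0): recompute the running mean from the whole prefix."""
    return sum(prefix) / len(prefix)

def s(prefix: List[int]) -> State:
    """S_k = S(I_{k-1} ... I_0): summarize the items seen before I_k."""
    return sum(prefix), len(prefix)

def f_step(item: int, state: State) -> float:
    """F(I_k, S_k): the same output, computed from the new item and the state."""
    total, count = state
    return (total + item) / (count + 1)

inputs = [4, 8, 6, 2]
full = [f_full(inputs[: k + 1]) for k in range(len(inputs))]
stepwise = [f_step(inputs[k], s(inputs[:k])) for k in range(len(inputs))]
assert full == stepwise
print(stepwise)  # [4.0, 6.0, 6.0, 5.0]
```

Both functions are pure, so rerunning either computation on the same input always yields the same $J_k$.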
Why is determinism important?
Determinism is a desirable property in many areas of computer science
• Natural for users (people are used to thinking sequentially)
• Computations are reproducible and predictable
• Implies consistency [Stonebraker et al. The 8 requirements of real-time stream processing. ACM SIGMOD Record 2005]
Determinism: simple way
Determinism can be easily achieved if
• All computations are sequential
• All transformations are pure functions
Stream processing
• Shared-nothing distributed runtime
• Record-at-a-time model
• Latency is a key performance metric
Determinism in stream processing
• Determinism is generally considered too difficult to achieve
• Systems usually provide low-level interfaces that do not guarantee any level of determinism
• Trade-off between determinism and latency [Zacheilas et al. Maximizing Determinism in Stream Processing Under Latency Constraints. DEBS 2017]
What about batch processing?
• MapReduce is usually implemented deterministically
• Micro-batching (Spark Streaming, Storm Trident) is also deterministic
The ultimate question of life, the universe, and everything
Is it possible to combine low latency and determinism in distributed stream processing?
• In spite of asynchronous distributed processing
• Avoiding input buffering
Dataflow
• Dataflow is a potentially unlimited sequence of data items
• Timestamps can be assigned to data items to define an order
• Dataflow is expressed in the form of a graph
• Vertices are operations, which are implemented by user-defined functions
• Edges declare an order between operations
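The sketch below models a tiny record-at-a-time dataflow graph. The `Dataflow` and `Item` classes are hypothetical and are not FlameStream's API. The sketch assumes an acyclic graph and a single entry point that assigns timestamps. It also assumes that derived items inherit the timestamp of the input they came from.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass(frozen=True)
class Item:
    timestamp: int  # assigned once, when the item enters the system
    payload: Any

# A vertex is an operation implemented by a user-defined function:
# it maps one input payload to zero or more output payloads.
Vertex = Callable[[Any], List[Any]]

class Dataflow:
    def __init__(self, vertices: Dict[str, Vertex], edges: Dict[str, List[str]], source: str) -> None:
        self.vertices = vertices
        self.edges = edges  # edges[v]: operations that consume the output of v
        self.source = source
        self.next_timestamp = 0

    def push(self, payload: Any) -> List[Item]:
        """Process one input record and return the items that leave the graph (DAG only)."""
        entry = Item(self.next_timestamp, payload)
        self.next_timestamp += 1
        pending = deque([(self.source, entry)])
        results: List[Item] = []
        while pending:
            vertex, item = pending.popleft()
            for out in self.vertices[vertex](item.payload):
                derived = Item(item.timestamp, out)  # derived items keep the entry timestamp
                downstream = self.edges.get(vertex, [])
                if not downstream:
                    results.append(derived)  # no outgoing edges: item leaves the graph
                for nxt in downstream:
                    pending.append((nxt, derived))
        return results

flow = Dataflow(
    vertices={"split": lambda text: text.split(), "lower": lambda word: [word.lower()]},
    edges={"split": ["lower"]},
    source="split",
)
print(flow.push("Hello World"))
# [Item(timestamp=0, payload='hello'), Item(timestamp=0, payload='world')]
```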
Physical deployment
What do we require to achieve determinism?
• Total order and transformations as pure functions
– We can define a synthetic order by assigning timestamps at system entry
• Order matters only in order-sensitive operations and before output
• Calculations are partitioned, and the order between items from different partitions does not influence the result (as long as they are not merged)
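A small sketch can check the partitioning argument. It uses a per-key running sum as a hypothetical order-sensitive operation. Any interleaving that preserves the order inside each partition gives the same per-key results. Changing the order inside one partition does not.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def per_key_running_sums(stream: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Order-sensitive within a key: each output depends on that key's earlier items only."""
    totals: Dict[str, int] = defaultdict(int)
    outputs: Dict[str, List[int]] = defaultdict(list)
    for key, value in stream:
        totals[key] += value
        outputs[key].append(totals[key])
    return dict(outputs)

# Same order inside partitions "a" and "b", different interleaving across them.
one = [("a", 1), ("b", 10), ("a", 2), ("b", 20)]
two = [("b", 10), ("b", 20), ("a", 1), ("a", 2)]
assert per_key_running_sums(one) == per_key_running_sums(two)

# Reordering items inside partition "a" changes the result.
three = [("a", 2), ("b", 10), ("a", 1), ("b", 20)]
assert per_key_running_sums(one) != per_key_running_sums(three)
```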
Unrealistic requirement
• Total order preservation
• Let’s try to rethink streaming computations
Drifting state: idea
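$I_k \rightarrow \mathrm{Op}(I_k, S_k) \rightarrow J_k$, where $\mathrm{Op}$ reads the state $S_k$ and writes the updated state $S_{k+1}$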
A typical stateful operator:
newState = combine(prevState, newItem)
handler.update(newState)
return newState
What if we put state directly into the stream?
$(I_k, S_k) \rightarrow \mathrm{Op}(I_k, S_k) \rightarrow (J_k, S_{k+1})$
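The sketch below contrasts the two styles using a hypothetical distinct-word counter. The conventional operator keeps its state inside and mutates it, as in the pseudocode above. The drifting-state version is a pure function: it receives $S_k$ together with $I_k$ and returns $J_k$ together with $S_{k+1}$.

```python
from typing import FrozenSet, List, Tuple

class ConventionalDistinctCounter:
    """State lives inside the operator and is updated as a side effect."""

    def __init__(self) -> None:
        self.seen: set = set()

    def process(self, item: str) -> int:
        self.seen.add(item)  # plays the role of handler.update(newState)
        return len(self.seen)

def distinct_counter(item: str, state: FrozenSet[str]) -> Tuple[int, FrozenSet[str]]:
    """Drifting state: Op(I_k, S_k) -> (J_k, S_{k+1}), a pure function."""
    new_state = state | {item}
    return len(new_state), new_state

words = ["a", "b", "a", "c"]
conventional = ConventionalDistinctCounter()
expected = [conventional.process(w) for w in words]

state: FrozenSet[str] = frozenset()
drifting: List[int] = []
for w in words:
    out, state = distinct_counter(w, state)  # the new state travels on with the stream
    drifting.append(out)

assert drifting == expected == [1, 2, 2, 3]
```

Because `distinct_counter` has no hidden state, any $(I_k, S_k)$ pair can be reprocessed and will produce exactly the same $(J_k, S_{k+1})$.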
Drifting state: implementation
• Any stateful transformation can be decomposed into a map and a windowed grouping operation connected by a cycle
• The map operation is stateless, hence order-insensitive and pure
• The grouping operation is pure (and does not even contain user-defined logic)
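Here is one way the decomposition might look, sketched under explicit assumptions. Grouping buffers elements per key and emits the last two (a window of size 2), so each window pairs the previous state with the next data item. The map step holds the only user logic and turns such a pair into the next state. That state goes both to the output and back into grouping through the cycle. The window size, the `Data`/`State` classes and the running-sum logic are illustrative. The sketch also processes elements sequentially, so it does not show reordering.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass(frozen=True)
class Data:
    key: str
    value: int

@dataclass(frozen=True)
class State:
    key: str
    total: int

Element = Union[Data, State]

def combine_map(group: List[Element]) -> List[State]:
    """Stateless user logic: [previous state, new item] -> [next state]."""
    last = group[-1]
    if not isinstance(last, Data):
        return []  # a window that ends with a state carries no new input
    previous = group[0] if len(group) == 2 and isinstance(group[0], State) else None
    base = previous.total if previous is not None else 0
    return [State(last.key, base + last.value)]

def run(stream: List[Data], window: int = 2) -> List[State]:
    buffers: Dict[str, List[Element]] = defaultdict(list)  # grouping's per-key buffer
    queue = deque(stream)
    outputs: List[State] = []
    while queue:
        element = queue.popleft()
        buffer = buffers[element.key]
        buffer.append(element)      # grouping: no user logic, only buffering by key
        group = buffer[-window:]    # the window of the last `window` elements
        for state in combine_map(group):
            outputs.append(state)    # the new state goes to the output...
            queue.appendleft(state)  # ...and back into grouping through the cycle
    return outputs

print(run([Data("a", 1), Data("b", 5), Data("a", 2)]))
# [State(key='a', total=1), State(key='b', total=5), State(key='a', total=3)]
```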
Drifting state: optimistic grouping
• Grouping operation is pure, but order-sensitive
• Buffers before each grouping can increase latency [Li et al. Out-of-order processing: a new architecture for high-performance stream systems. VLDB 2008]
• Grouping can be implemented optimistically without blocking
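The sketch below shows non-blocking grouping. It assumes that wrong output is retracted with explicit invalidation messages (tombstones). This is our illustration of the idea, not necessarily how FlameStream encodes it. Only timestamps are kept, and the window size is a parameter.

```python
import bisect
from typing import List, Tuple

Group = Tuple[int, ...]  # timestamps of the items in one window

class OptimisticGrouping:
    """Non-blocking grouping: emit windows immediately, correct them if needed."""

    def __init__(self, window: int = 2) -> None:
        self.window = window
        self.buffer: List[int] = []  # timestamps seen so far, kept sorted

    def _window_ending_at(self, end: int) -> Group:
        return tuple(self.buffer[max(0, end - self.window + 1): end + 1])

    def insert(self, ts: int) -> List[Tuple[str, Group]]:
        pos = bisect.bisect_left(self.buffer, ts)
        # Windows ending at the next window-1 positions span the gap that ts fills,
        # so they were wrong; windows further right are unaffected.
        stale_ends = range(pos, min(len(self.buffer), pos + self.window - 1))
        stale = [self._window_ending_at(i) for i in stale_ends]
        self.buffer.insert(pos, ts)
        fresh_ends = range(pos, min(len(self.buffer), pos + self.window))
        fresh = [self._window_ending_at(i) for i in fresh_ends]
        return [("invalidate", g) for g in stale] + [("emit", g) for g in fresh]

grouping = OptimisticGrouping(window=2)
print(grouping.insert(1))  # [('emit', (1,))]
print(grouping.insert(3))  # [('emit', (1, 3))]
print(grouping.insert(2))  # [('invalidate', (1, 3)), ('emit', (1, 2)), ('emit', (2, 3))]
```

In-order items cost nothing extra. Only a late item triggers invalidations, and at most `window - 1` of them.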
Drifting state: the only buffer
• The optimistic approach produces invalid items
• Invalid items must be filtered out before they are sent to the consumer
• Punctuations (low watermarks) allow releasing items
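Here is a minimal sketch of such a barrier. It assumes output items are identified by value and that an invalidation never overtakes the item it retracts. The `Barrier` class is hypothetical. The example replays the retraction that occurs when item 2 arrives after item 3.

```python
from typing import Dict, List, Tuple

Group = Tuple[int, ...]

class Barrier:
    """The only buffer: holds optimistic output until a low watermark makes it final."""

    def __init__(self) -> None:
        self.pending: Dict[Group, int] = {}  # output item -> its timestamp

    def emit(self, item: Group, ts: int) -> None:
        self.pending[item] = ts

    def invalidate(self, item: Group) -> None:
        self.pending.pop(item, None)  # filter out an item that turned out to be invalid

    def low_watermark(self, watermark: int) -> List[Group]:
        """Release, in timestamp order, all pending items with timestamp < watermark."""
        ready = sorted(
            ((item, ts) for item, ts in self.pending.items() if ts < watermark),
            key=lambda pair: pair[1],
        )
        for item, _ in ready:
            del self.pending[item]
        return [item for item, _ in ready]

barrier = Barrier()
barrier.emit((1,), 1)
barrier.emit((1, 3), 3)
barrier.invalidate((1, 3))  # retracted after item 2 arrived late
barrier.emit((1, 2), 2)
barrier.emit((2, 3), 3)
print(barrier.low_watermark(3))  # [(1,), (1, 2)]
print(barrier.low_watermark(4))  # [(2, 3)]
```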
Something is wrong here
• Dataflow graphs are cyclic
• It is unclear how to send low watermarks through the cycles
Implementation notes: acker
Implementation notes: modified acker
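The slides show the acker only as a diagram. For orientation, here is a sketch of XOR-based tracking in the style of Apache Storm's acker, keyed by entry timestamp so that it can produce low watermarks. This is an assumption about the general technique, not a description of the modified acker.

Each item gets a random 64-bit id. When an operation consumes an item, it reports the consumed id and the ids it produced. A timestamp is finished once the XOR of its ids returns to zero. Items that travel around a cycle are just more ids to cancel out, so the watermark is computed centrally rather than passed through the cycle. This holds as long as the cycle eventually stops producing items.

```python
import random
from typing import Dict, Iterable

def new_item_id() -> int:
    return random.getrandbits(64)  # random ids make accidental cancellation negligible

class Acker:
    """Tracks in-flight items per entry timestamp by XOR-ing their ids."""

    def __init__(self) -> None:
        self.xors: Dict[int, int] = {}  # entry timestamp -> XOR of live item ids
        self.next_ts = 0                # every timestamp below this has been issued

    def register_input(self, ts: int, item_id: int) -> None:
        self.xors[ts] = self.xors.get(ts, 0) ^ item_id
        self.next_ts = max(self.next_ts, ts + 1)

    def ack(self, ts: int, consumed: int, produced: Iterable[int] = ()) -> None:
        """An operation consumed one item and produced zero or more new ones."""
        value = self.xors[ts] ^ consumed
        for item_id in produced:
            value ^= item_id
        self.xors[ts] = value

    def low_watermark(self) -> int:
        """All items with an entry timestamp below the result are fully processed."""
        unfinished = [ts for ts, value in self.xors.items() if value != 0]
        return min(unfinished, default=self.next_ts)

acker = Acker()
acker.register_input(0, 11)
acker.register_input(1, 22)
acker.ack(0, consumed=11, produced=[33, 44])  # e.g. a map emitted two items
print(acker.low_watermark())  # 0
acker.ack(0, consumed=33)
acker.ack(0, consumed=44)
print(acker.low_watermark())  # 1
```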
Discussion: drifting state pros
• Determinism is closer than you think!
• We moved from “state as a special item” to “state as an ordinary item”
– Business logic becomes stateless
– All guarantees that the system provides for ordinary items also hold for state
– Any stateful dataflow graph can be expressed using the drifting state model
– Transformations can be non-commutative, but should be pure
– A single buffer before output per dataflow graph
Discussion: drifting state cons
• It is harder to write code
– A convenient API is needed
• The optimistic technique can potentially generate many additional items
• How does drifting state behave on a real-world problem?
Experiments: prototype
• FlameStream [https://github.com/flame-stream]
• Java + Akka, ZooKeeper
Experiments: task
Incremental inverted index building
– Requires stateful operations
– Contains network shuffle
– Workload is unbalanced due to Zipf’s law
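The task can be sketched under simplifying assumptions. A stateless map splits a document into per-word postings. An order-sensitive, pure update then appends each posting to that word's list. The function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Posting = Tuple[int, Tuple[int, ...]]  # (document id, positions of the word in that document)

def tokenize(doc_id: int, text: str) -> Dict[str, Posting]:
    """Stateless map: one document -> one posting per distinct word."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for pos, word in enumerate(re.findall(r"\w+", text.lower())):
        positions[word].append(pos)
    return {word: (doc_id, tuple(ps)) for word, ps in positions.items()}

def update_posting_list(postings: Tuple[Posting, ...], posting: Posting) -> Tuple[Posting, ...]:
    """Pure but order-sensitive: the result depends on the order documents arrive in."""
    return postings + (posting,)

index: Dict[str, Tuple[Posting, ...]] = {}
for doc_id, text in enumerate(["The cat sat", "the dog and the cat"]):
    # In a distributed deployment, this is where items are shuffled by word.
    for word, posting in tokenize(doc_id, text).items():
        index[word] = update_posting_list(index.get(word, ()), posting)

print(index["the"])  # ((0, (0,)), (1, (0, 3)))
```

Word frequencies follow Zipf's law, so a few keys such as "the" receive most of the updates. This skew is what makes the workload unbalanced across nodes.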
Experiments: setup
• 10 EC2 micro instances
– 1 GB RAM
– 1 core CPU
• Wikipedia documents as a dataset
Experiments: overhead
Experiments: latency scalability
Experiments: throughput scalability
[Figure: throughput (documents/sec) vs. number of nodes]
Experiments: comparison with conservative approach
• Updating posting lists is an order-sensitive operation
• Buffer elements before this operation
• The buffer is flushed on low watermarks
• Low watermarks are sent after each input element to minimize overhead
• [Li et al. Out-of-order processing: a new architecture for high-performance stream systems. VLDB 2008]
• Apache Flink as the stream processing system
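For comparison, the conservative buffer can be sketched as a priority queue keyed by timestamp. It releases items in order once a low watermark passes them. The `ReorderingBuffer` class is hypothetical and is not the Flink implementation used in the experiment.

```python
import heapq
from typing import Any, List, Tuple

class ReorderingBuffer:
    """Conservative approach: hold items until a low watermark, then release them sorted."""

    def __init__(self) -> None:
        self.heap: List[Tuple[int, int, Any]] = []
        self.seq = 0  # tie-breaker so payloads never need to be comparable

    def add(self, ts: int, payload: Any) -> None:
        heapq.heappush(self.heap, (ts, self.seq, payload))
        self.seq += 1

    def on_low_watermark(self, watermark: int) -> List[Tuple[int, Any]]:
        """Release every buffered item with timestamp < watermark, in timestamp order."""
        released = []
        while self.heap and self.heap[0][0] < watermark:
            ts, _, payload = heapq.heappop(self.heap)
            released.append((ts, payload))
        return released

buf = ReorderingBuffer()
buf.add(2, "b")
buf.add(1, "a")
buf.add(5, "e")
print(buf.on_low_watermark(3))  # [(1, 'a'), (2, 'b')]
```

Here every item waits for a watermark before the order-sensitive operation runs. In the drifting-state model, items are processed immediately and wait only at the final barrier.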
Experiments: comparison with conservative approach (at most once)
[Figure panels: 10 nodes, 5 nodes]
Experiments: comparison with conservative approach
[Figure: throughput (documents/sec) vs. number of nodes]
Drifting state: conclusion
• Determinism and low latency are achieved
• Overhead is low
• Throughput is not significantly degraded
• The model is suitable for any stateful dataflow
– If all transformations are pure
What if…
• Map is not pure?
– Determinism is lost by definition
– Correctness is lost
• Multiple input nodes?
– Timestamp = timestamp@node_id
– Latency ≥ out-of-sync time
– Possible to sync in ~10ms
• Acker fails?
– Replication
– Separate ackers for timestamp ranges
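The `timestamp@node_id` scheme can be sketched as a composite, lexicographically ordered timestamp. The `Timestamp` class and its field names are hypothetical. Local time is compared first, and the node id breaks ties. A node whose clock lags can still emit smaller timestamps, so the barrier cannot release output until the clock offset has passed. This is why latency is bounded below by the out-of-sync time.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Timestamp:
    """Global order across input nodes: compare local time first, then node id."""
    time: int     # e.g. milliseconds from the node's (synchronized) clock
    node_id: int  # breaks ties between nodes that stamp items at the same time

    def __str__(self) -> str:
        return f"{self.time}@{self.node_id}"

stamps = sorted([Timestamp(10, 2), Timestamp(10, 1), Timestamp(9, 3)])
print([str(t) for t in stamps])  # ['9@3', '10@1', '10@2']
```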
What do we need to add to achieve exactly-once?
• Input replay
• Restore consistent state in groupings
• Deduplicate items only at the barrier
– Output items atomically
– The sink stores the timestamp of the last received item
– Simply compare timestamps!
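Here is a minimal sketch of the timestamp comparison. It assumes each output item carries a unique timestamp that increases along the output stream. It also assumes that storing the item and `last_ts` happens atomically (here trivially, in memory). The `DeduplicatingSink` class is hypothetical. Determinism is what makes the check sound: after a replay, the same items reappear with the same timestamps.

```python
from typing import Any, List, Optional

class DeduplicatingSink:
    """Exactly-once delivery by comparing timestamps (a sketch)."""

    def __init__(self) -> None:
        self.last_ts: Optional[int] = None  # persisted together with the delivered output
        self.delivered: List[Any] = []

    def receive(self, ts: int, item: Any) -> bool:
        """Deliver the item unless it is a replayed duplicate; return whether it was delivered."""
        if self.last_ts is not None and ts <= self.last_ts:
            return False  # already delivered before the failure
        # In a real sink these two updates must be one atomic write.
        self.delivered.append(item)
        self.last_ts = ts
        return True

sink = DeduplicatingSink()
sink.receive(1, "a")
sink.receive(2, "b")
sink.receive(1, "a")  # replayed after a failure: dropped
sink.receive(2, "b")  # replayed: dropped
sink.receive(3, "c")
print(sink.delivered)  # ['a', 'b', 'c']
```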
Exactly-once: discussion
• Pure streaming
• Deduplication only at the barrier
• Snapshotting and outputting are independent (they are not bound into a single transaction)
Exactly-once: roadmap
Experiments: latency (50 ms between checkpoints)
50 ms between state snapshots
Experiments: latency (1000 ms between checkpoints)
1000 ms between state snapshots
Experiments: throughput
[Figure: throughput (documents/sec) vs. number of nodes]
Conclusions
• Single extra requirement: all transformations are pure
• Results look promising
• A lot of work remains
– Understand properties and limitations
– Real-life deployment
• We are open to collaboration
Future work
• Real-life deployment
• Efficient determinism and exactly-once can be used for system-level acceptance testing
– [Trofimov. Consistency maintenance in distributed analytical stream processing. ADBIS DC 2018]
Papers
• Kuralenok et al. FlameStream: Model and Runtime for Distributed Stream Processing. BeyondMR@SIGMOD 2018
• Kuralenok et al. Deterministic model for distributed speculative stream processing. ADBIS 2018
• Long paper about exactly-once is on the anvil