![Page 1: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/1.jpg)
A Practical Guide to Selecting a Stream Processing Technology
Michael G. Noll
Product Manager, Confluent
![Page 2: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/2.jpg)
Kafka Talk Series

| Date | Title |
| --- | --- |
| Sep 27 | Introduction To Streaming Data and Stream Processing with Apache Kafka |
| Oct 06 | Deep Dive into Apache Kafka |
| Oct 27 | Data Integration with Apache Kafka |
| Nov 17 | Demystifying Stream Processing with Apache Kafka |
| Dec 01 | A Practical Guide to Selecting a Stream Processing Technology |
| Dec 15 | Streaming in Practice: Putting Apache Kafka in Production |
https://www.confluent.io/apache-kafka-talk-series
![Page 3: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/3.jpg)
Agenda
• Recap: What is Stream Processing?
• The Three Pillars of Stream Processing in Practice
• Key Selection Criteria
  • Organizational/Non-Technical Dimensions
  • Technical Dimensions
• Summary
![Page 4: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/4.jpg)
![Page 5: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/5.jpg)
![Page 6: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/6.jpg)
![Page 7: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/7.jpg)
![Page 8: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/8.jpg)
![Page 9: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/9.jpg)
![Page 10: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/10.jpg)
![Page 11: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/11.jpg)
![Page 12: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/12.jpg)
![Page 13: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/13.jpg)
![Page 14: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/14.jpg)
Powered by Kafka (thousands more)
![Page 15: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/15.jpg)
![Page 16: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/16.jpg)
![Page 17: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/17.jpg)
![Page 18: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/18.jpg)
![Page 19: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/19.jpg)
![Page 20: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/20.jpg)
Spark Streaming API (2.0)
![Page 21: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/21.jpg)
Kafka's Streams API (0.10)
![Page 22: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/22.jpg)
![Page 23: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/23.jpg)
![Page 24: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/24.jpg)
![Page 25: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/25.jpg)
![Page 26: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/26.jpg)
![Page 27: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/27.jpg)
![Page 28: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/28.jpg)
![Page 29: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/29.jpg)
![Page 30: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/30.jpg)
![Page 31: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/31.jpg)
![Page 32: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/32.jpg)
![Page 33: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/33.jpg)
![Page 34: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/34.jpg)
![Page 35: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/35.jpg)
![Page 36: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/36.jpg)
![Page 37: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/37.jpg)
Example: Streams and Tables in Kafka
Word Count
hello 2
kafka 1
world 1
… …
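The word-count table above can be read as the latest state of a stream of updates. The following is a minimal Python sketch of that idea, not Kafka's API. The input words are a hypothetical stream chosen so that the result matches the table on the slide, and the function names are illustrative only.

```python
from collections import Counter


def word_count_changelog(words):
    """Turn a stream of words into a stream of (word, running count) updates."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def materialize(changelog):
    """Replay a stream of updates into a table holding the latest value per key."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


# Hypothetical input stream (an assumption; the slide shows only the result).
words = ["hello", "kafka", "hello", "world"]

changelog = list(word_count_changelog(words))  # the stream view
table = materialize(changelog)                 # the table view
```

The stream keeps every update, including the intermediate `("hello", 1)`, while the table keeps only the latest count per word.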
![Page 38: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/38.jpg)
![Page 39: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/39.jpg)
![Page 40: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/40.jpg)
![Page 41: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/41.jpg)
![Page 42: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/42.jpg)
Streams & Databases
• A stream processing technology must have first-class support for Streams and Tables
  • With scalability, fault tolerance, …
• Why? Because most use cases require not just one, but both!
• Support, or lack thereof, strongly impacts the resulting technical architecture and development efforts
• No support means:
  • Painful Do-It-Yourself
  • Increased complexity, more moving pieces to juggle
![Page 43: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/43.jpg)
![Page 44: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/44.jpg)
![Page 45: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/45.jpg)
Organizational/Non-Tech Dimensions
• Can your org understand and leverage the technology?
  • Familiarity with languages; intuitive concepts and APIs; trainings
• Are you permitted to use it in your organization?
  • Security features, licensing, open source vs. proprietary
• Can you continue to use it in the future?
  • Longevity of technology, licensing, vendor strength
![Page 46: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/46.jpg)
Organizational/Non-Tech Dimensions
• Do you believe in the long-term vision?
  • Switching technologies in an organization is often expensive/slow: legacy migration, re-training, resistance to change, etc.
• What is the path and time to success?
  • Can you move smoothly and quickly from proof-of-concept to production?
• Areas and range of applicability in your organization
  • General-purpose vs. niche technology
  • Viable for S/M/L/XL use cases vs. for XL use cases only
  • Building core business apps vs. doing backend analytics
![Page 47: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/47.jpg)
Organizational/Non-Tech Dimensions
• Licensing
• Vision/Roadmap
• ROI
• Impact on Organization
• Broad vs. Niche Applicability
• Time to Market
• Professional Services
• Documentation
• Examples
• User Community
• Learning Curve
• Impact on Tools, Infrastructure, …
![Page 48: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/48.jpg)
![Page 49: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/49.jpg)
Technical Dimensions
• Reprocessing
• Scalability & Elasticity
• Fault Tolerance
• API
• Dev/Ops Lifecycle
• Security
• Processing Model
• Out of Order Data
• Abstractions
• Time Model
• Windowing
• State
![Page 50: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/50.jpg)
State
• Stateful processing of any kind requires … state
• Many (most?) use cases for stream processing are stateful
  • Joins, aggregations, windowing, counting, ...
• Is state performant? Local vs. remote state?
![Page 51: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/51.jpg)
![Page 52: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/52.jpg)
![Page 53: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/53.jpg)
State
• Stateful processing of any kind requires … state
• Many (most?) use cases for stream processing are stateful
  • Joins, aggregations, windowing, counting, ...
• Is state performant? Local vs. remote state?
• Is state fault-tolerant? How fast is recovery/failover?
![Page 54: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/54.jpg)
![Page 55: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/55.jpg)
State
• Stateful processing of any kind requires … state
• Many (most?) use cases for stream processing are stateful
  • Joins, aggregations, windowing, counting, ...
• Is state performant? Local vs. remote state?
• Is state fault-tolerant? How fast is recovery/failover?
• Is state interactively queryable?
  • Kafka: ready for use (GA)
  • Spark, Flink: under development (alpha)
  • Storm, Samza, and others: not available
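The three questions about state (performance, fault tolerance, queryability) can be illustrated with a small Python sketch. It assumes one common design: state is kept locally for fast access, and every update is also appended to a durable changelog that is replayed after a failure. The class `LocalStateStore` and the list standing in for the durable log are hypothetical, not any product's API.

```python
class LocalStateStore:
    """In-memory key-value state that records every update in a changelog."""

    def __init__(self, changelog=None):
        # The list stands in for a durable, replicated log (an assumption).
        self.changelog = changelog if changelog is not None else []
        self.state = {}
        # Recovery: rebuild the local state by replaying the changelog.
        for key, value in self.changelog:
            self.state[key] = value

    def put(self, key, value):
        self.changelog.append((key, value))  # log the update first
        self.state[key] = value              # then apply it locally

    def get(self, key, default=None):
        # Reads are served from local memory; an interactive query
        # would read the same structure.
        return self.state.get(key, default)


durable_log = []
store = LocalStateStore(durable_log)
store.put("alice", 3)
store.put("bob", 1)
store.put("alice", 4)

# Simulate a crash: the in-memory state is lost, the changelog survives.
recovered = LocalStateStore(durable_log)
```

In this sketch, recovery time grows with the length of the changelog, which is one reason to ask how fast recovery and failover are.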
![Page 56: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/56.jpg)
![Page 57: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/57.jpg)
![Page 58: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/58.jpg)
Abstractions
• What are the data model and the available abstractions?
  • Most common abstraction: stream of records, events
    • Kafka, Spark, Storm, Samza, Flink, Apex, ...
  • New, very powerful: table of records
    • Currently unique to Kafka
    • Represents latest state and materialized views
    • State must have a first-class abstraction because, as we just saw in the previous section, state is crucial for stream processing!
![Page 59: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/59.jpg)
![Page 60: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/60.jpg)
Time Model
• Different use cases require different time semantics
• Great majority of use cases require event-time semantics
• Other use cases may require processing-time (e.g. real-time monitoring) or special variants like ingestion-time
• A stream processing technology should, at a minimum, support event-time to cover most use cases in practice
  • Examples: Kafka, Beam, Flink
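To see why the choice of time semantics matters, the sketch below counts the same events per 5-minute window twice: once by the time each event happened and once by the time it arrived. The events and the field names `event_time` and `arrival_time` are hypothetical, with timestamps in seconds.

```python
from collections import Counter

WINDOW_SECONDS = 300  # 5-minute windows


def window_start(timestamp):
    """Start of the fixed-size window that contains the timestamp."""
    return timestamp - (timestamp % WINDOW_SECONDS)


def count_per_window(events, time_field):
    """Count events per window, using the given timestamp field."""
    counts = Counter()
    for event in events:
        counts[window_start(event[time_field])] += 1
    return dict(counts)


# Hypothetical events: when each one happened vs. when it was processed.
events = [
    {"event_time": 10, "arrival_time": 12},
    {"event_time": 290, "arrival_time": 310},  # delayed past the window boundary
    {"event_time": 320, "arrival_time": 325},
]

by_event_time = count_per_window(events, "event_time")
by_processing_time = count_per_window(events, "arrival_time")
```

The delayed event is counted in the first window under event-time semantics but in the second window under processing-time semantics, so the two results differ.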
![Page 61: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/61.jpg)
Time Model
![Page 62: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/62.jpg)
![Page 63: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/63.jpg)
Windowing
• Windowing is an operation that groups events
![Page 64: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/64.jpg)
Windowing
[Figure: windowing of input data plotted against event-time and processing-time. Colors represent different users' events (alice, bob, dave); rectangles denote different event-time windows.]
![Page 65: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/65.jpg)
Windowing
• Windowing is an operation that groups events
• Most commonly needed: time windows, session windows
• Examples:
  • Real-time monitoring: 5-minute averages
  • Reader behavior on a website: user browsing sessions
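Session windows group a user's events into bursts of activity separated by periods of inactivity. The following is a minimal sketch under stated assumptions: the click timestamps (in seconds) and the 300-second inactivity gap are hypothetical, and real engines compute sessions incrementally rather than by sorting a finished list.

```python
def session_windows(timestamps, gap):
    """Group timestamps into sessions; a pause longer than `gap` starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # inactivity gap exceeded: new session
    return sessions


# Hypothetical click timestamps per user, in seconds.
clicks = {"alice": [0, 60, 100, 2000, 2030], "bob": [500]}
GAP = 300  # assumed inactivity gap

sessions = {user: session_windows(ts, GAP) for user, ts in clicks.items()}
```

Unlike fixed time windows, session boundaries depend on the data itself, so each user ends up with windows of different lengths.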
![Page 66: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/66.jpg)
Windowing
![Page 67: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/67.jpg)
![Page 68: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/68.jpg)
Out-of-order and late-arriving data
• Is very common in practice, not a rare corner case
  • Related to time model discussion
![Page 69: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/69.jpg)
Out-of-order and late-arriving data
Users with mobile phones enter airplane, lose Internet connectivity
Emails are being written during the 10h flight
Internet connectivity is restored, phones will send queued emails now
![Page 70: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/70.jpg)
Out-of-order and late-arriving data
• Is very common in practice, not a rare corner case
  • Related to time model discussion
• We want control over how out-of-order data is handled
  • Example:
    • We process data in 5-minute windows, e.g. compute statistics
    • When event arrives 1 minute late: update the original result!
    • When event arrives 2 hours late: discard it!
• Handling must be efficient because it happens so often
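The update-or-discard policy from the example can be sketched as a windowed counter with a grace period. Two things here are assumptions, not taken from the slides: lateness is measured against the highest event time seen so far, and the 10-minute `GRACE` value is hypothetical. All times are in seconds.

```python
WINDOW = 5 * 60  # 5-minute windows, as in the example
GRACE = 10 * 60  # hypothetical: accept events up to 10 minutes late


class WindowedCounter:
    """Counts events per event-time window and drops events that are too late."""

    def __init__(self):
        self.counts = {}      # window start -> number of events
        self.stream_time = 0  # highest event time seen so far
        self.discarded = 0

    def add(self, event_time):
        self.stream_time = max(self.stream_time, event_time)
        start = event_time - event_time % WINDOW
        window_end = start + WINDOW
        if self.stream_time > window_end + GRACE:
            self.discarded += 1  # beyond the grace period: discard
            return False
        # On time, or late but within the grace period: update the result.
        self.counts[start] = self.counts.get(start, 0) + 1
        return True


counter = WindowedCounter()
counter.add(100)   # on time, window starting at 0
counter.add(360)   # stream time is now 1 minute past the first window's end
counter.add(290)   # 1 minute late: the first window's count is updated
counter.add(7500)  # stream time jumps ahead by about 2 hours
counter.add(295)   # 2 hours late: discarded
```

Keeping the check to a single comparison per event reflects the point that late-data handling has to be cheap because it happens often.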
![Page 71: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/71.jpg)
![Page 72: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/72.jpg)
Reprocessing
• Re-process data by rewinding a stream back in time
• Use cases in practice include
  • Correcting output data after fixing a bug
  • Facilitate iterative and explorative development
  • A/B testing
  • Processing historical data
  • Walking through "What If?" scenarios
• Also: often used behind-the-scenes for fault tolerance
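The bug-fix use case can be sketched as follows, assuming the input is retained in an append-only log that can be read again from any offset. The log contents and the functions `buggy` and `fixed` are hypothetical.

```python
def run(log, start_offset, process):
    """Process all records of an append-only log, starting at start_offset."""
    return [process(record) for record in log[start_offset:]]


log = ["3", "4", "5"]  # hypothetical retained input records


def buggy(record):
    return int(record) + int(record)  # bug: doubles the value instead of squaring it


def fixed(record):
    return int(record) ** 2


first_output = run(log, 0, buggy)

# After deploying the fix, rewind to offset 0 and recompute the output.
corrected_output = run(log, 0, fixed)
```

Reprocessing only works if the input is still available, so retention of the source stream is part of the design.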
![Page 73: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/73.jpg)
![Page 74: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/74.jpg)
![Page 75: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/75.jpg)
Scalability, Elasticity, Fault Tolerance
• Can the technology scale according to your needs?
  • Desired latency, throughput?
  • Able to process millions of messages per second?
  • What is the minimum footprint?
• Expand/shrink capacity dynamically during operations?
  • Helps with resource utilization because most stream apps run continuously
• Resilience and fault tolerance
  • Which guarantees for data delivery and for state? "At-least-once", "exactly-once", "effectively-once", etc.
  • Failover behavior and recovery time? Automated or manual?
  • Any negative impact of fault tolerance features on performance?
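The difference between delivery guarantees shows up as soon as a record is redelivered. The sketch below simulates at-least-once delivery with one duplicate and compares a naive consumer with one that deduplicates by record ID, which is one common way to get an "effectively-once" result. The records, the `id` field, and the function names are hypothetical.

```python
def deliver_at_least_once(records, fail_ack_for):
    """Yield every record; redeliver those whose acknowledgement was lost."""
    for record in records:
        yield record
        if record["id"] in fail_ack_for:
            yield record  # redelivery after a lost ack creates a duplicate


def naive_sum(deliveries):
    """Sums every delivery, so duplicates are counted twice."""
    return sum(r["amount"] for r in deliveries)


def idempotent_sum(deliveries):
    """Sums each record ID at most once, so redeliveries have no effect."""
    seen, total = set(), 0
    for r in deliveries:
        if r["id"] not in seen:
            seen.add(r["id"])
            total += r["amount"]
    return total


records = [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}, {"id": 3, "amount": 7}]
deliveries = list(deliver_at_least_once(records, fail_ack_for={2}))
```

The deduplicating consumer has to remember which IDs it has seen, which ties delivery guarantees back to the question of fault-tolerant state.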
![Page 76: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/76.jpg)
![Page 77: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/77.jpg)
![Page 78: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/78.jpg)
![Page 79: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/79.jpg)
![Page 80: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/80.jpg)
Security
• To meet internal security policies, legal compliance, etc.
• Typical base requirements for stream processing applications:
  • Encrypt data-in-transit (e.g. from/to Kafka)
  • Authentication: "only some applications may talk to production"
  • Authorization: "access to sensitive data such as PII is restricted"
• The easier it is to use security features, the more likely they are to actually be used in practice
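As an illustration of the first two requirements, the sketch below shows what a client configuration for encrypted, authenticated access to Kafka might look like. The setting names follow Kafka's client configuration; every value is a placeholder, and the two helper functions are hypothetical checks written for this example. Authorization is not shown because it is typically enforced on the broker side rather than in client settings.

```python
# Setting names follow Kafka's client configuration; all values are placeholders.
secure_client_config = {
    "bootstrap.servers": "broker.example.com:9093",        # hypothetical host
    "security.protocol": "SASL_SSL",                       # TLS encryption plus SASL authentication
    "sasl.mechanism": "PLAIN",
    "ssl.truststore.location": "/path/to/truststore.jks",  # placeholder path
}


def encrypts_in_transit(config):
    """True if the client would talk to the brokers over TLS."""
    return config.get("security.protocol") in ("SSL", "SASL_SSL")


def authenticates_with_sasl(config):
    """True if the client authenticates via SASL (TLS client certificates are not checked here)."""
    return config.get("security.protocol", "").startswith("SASL_")
```

A check like this could run in a deployment pipeline, in line with the point that security features get used when they are easy to use.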
![Page 81: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/81.jpg)
![Page 82: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/82.jpg)
Processing Model
• True stream processing is record-at-a-time processing
  • Benefits include low latency (millisecs), dealing efficiently with out-of-order data
  • Can provide both low latency and high throughput via internal optimizations
  • Examples: Kafka, Storm, Samza, Flink, Beam
• Some processing technologies opt for (micro)batching
  • Micro-batching has no true benefits: consider it a technical workaround to shoehorn stream-like functionality into a tool
  • Suffers from significant overhead when dealing with e.g. out-of-order/late-arriving data, when performing windowed analyses (e.g. session windows)
  • Typically a strong blocker for use cases such as fraud detection or anything where "a few seconds" of latency is prohibitive
  • Examples: Spark, Storm (Trident), Hadoop*
![Page 83: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/83.jpg)
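The latency argument on the Processing Model slide can be made concrete with a deliberately simplified model. It assumes a record in a micro-batch system waits until its batch closes at the next interval boundary, and it ignores scheduling and per-batch overhead. The numbers (a 2-second batch interval, 5 ms of processing) are hypothetical, and all times are in milliseconds.

```python
def record_at_a_time_latency(processing_cost):
    """A record is processed as soon as it arrives."""
    return processing_cost


def micro_batch_latency(arrival, batch_interval, processing_cost):
    """A record waits until its batch closes at the next interval boundary."""
    batch_close = (arrival // batch_interval + 1) * batch_interval
    return (batch_close - arrival) + processing_cost


# Hypothetical numbers: record arrives at t=1200 ms, batches close every 2000 ms.
streaming = record_at_a_time_latency(processing_cost=5)
batched = micro_batch_latency(arrival=1200, batch_interval=2000, processing_cost=5)
```

In this model the wait for the batch boundary dominates the latency, which is why use cases that cannot tolerate a few seconds of delay are hard to serve with micro-batching.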
![Page 84: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/84.jpg)
API
• Choice of API is a subjective matter: skills, preference, …
• Typical options
  • Declarative, expressive API: operations like map(), filter()
  • Imperative, lower-level API: callbacks like process(event)
  • Streaming SQL: STREAM SELECT … FROM … WHERE …
  • In the best case you get not just one, but all three
• "Abstractions are great!"
• "Abstractions considered harmful!"
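The first two API styles can be contrasted with the same small task written both ways. This is a sketch in plain Python, not any stream processor's API: the events, the threshold of 50, and the `Processor` class are hypothetical.

```python
events = [
    {"user": "alice", "amount": 120},
    {"user": "bob", "amount": 30},
    {"user": "dave", "amount": 75},
]

# Declarative style: describe what to compute with map() and filter().
declarative = list(
    map(lambda e: e["user"].upper(), filter(lambda e: e["amount"] > 50, events))
)


# Imperative style: a callback invoked once per event, with explicit control flow.
class Processor:
    def __init__(self):
        self.output = []

    def process(self, event):
        if event["amount"] > 50:
            self.output.append(event["user"].upper())


processor = Processor()
for event in events:
    processor.process(event)
```

Both versions produce the same result. The declarative one is shorter, while the imperative one leaves room for arbitrary logic and state inside `process`.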
![Page 85: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/85.jpg)
![Page 86: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/86.jpg)
Developer/Operations Lifecycle
• What should your daily work look and feel like?
  • "I like to do quick, iterative development" (modify/test/repeat)
  • "I want to decouple team roadmaps, project schedules"
• Big difference between App Model <-> Cluster Model
  • Testing, packaging, deployment, monitoring, operations
  • "Do I need to know Java (app) or YARN (cluster) for this?"
  • "I want reactive processing in containers that run on Mesos!"
• Rolling, no-downtime upgrades?
• Integration with existing Ops infra, tools, processes?
![Page 87: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/87.jpg)
![Page 88: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/88.jpg)
Summary
• What we covered is a good starting point
• But, no free lunch!
  • Understand what you need, and weigh criteria appropriately
  • Think end-to-end: idea, development, operations, troubleshooting
  • Think big-picture: future use cases, architecture, security, training, …
  • Do your own internal hackathons, proof-of-concepts
  • Do your own benchmarks
• If in doubt: simplicity beats complexity
  • Faster to learn, easier to understand, less likely to fail, …
![Page 89: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/89.jpg)
Q&A Session
![Page 90: A Practical Guide to Selecting a Stream Processing Technology](https://reader031.vdocument.in/reader031/viewer/2022030305/587137331a28abf0568b5fdf/html5/thumbnails/90.jpg)
Coming Up Next

| Date | Title | Speaker |
| --- | --- | --- |
| Dec 15 | Streaming in Practice: Putting Apache Kafka in Production | Roger Hoover |

https://www.confluent.io/apache-kafka-talk-series