Stream Analytics in the Enterprise
About Us
• Emerging technology firm focused on helping enterprises build breakthrough software solutions
• Building software solutions powered by disruptive enterprise software trends
  - Machine learning and data science
  - Cyber-security
  - Enterprise IoT
  - Powered by cloud and mobile
• Bringing innovation from startups and academic institutions to the enterprise
• Award winning agencies: Inc 500, American Business Awards, International Business Awards
Agenda
• The elements of stream analytic solutions
• Stream analytic platforms: on-premise vs. cloud
• On-premise stream analytic platforms
• Cloud stream analytic services
• Complementary technologies
The elements of enterprise stream analytic solutions
Capabilities of Stream Analytic Solutions
• Real time data ingestion
• Execute SQL queries on dynamic streams of data
• Time window queries
• Connect query outputs to new data streams
• Leverage reference data in the stream queries
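The capabilities listed above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a stream query engine: events are grouped into 10-second tumbling windows and joined with reference data. The event fields, device names and window size are assumptions made for illustration. A real stream engine would evaluate the query continuously rather than over a stored table.

```python
import sqlite3

# Hypothetical events: (epoch_seconds, device_id, temperature)
EVENTS = [
    (0, "d1", 20.0),
    (3, "d2", 30.0),
    (7, "d1", 22.0),
    (12, "d1", 25.0),
    (14, "d2", 35.0),
]
# Hypothetical reference data: device_id -> site
DEVICES = [("d1", "plant-a"), ("d2", "plant-b")]

WINDOW_SECONDS = 10


def windowed_averages(events, devices, window_seconds=WINDOW_SECONDS):
    """Average temperature per site per tumbling window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts INTEGER, device_id TEXT, temp REAL)")
    conn.execute("CREATE TABLE devices (device_id TEXT, site TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    conn.executemany("INSERT INTO devices VALUES (?, ?)", devices)
    rows = conn.execute(
        """
        SELECT (e.ts / ?) * ? AS window_start, d.site, AVG(e.temp)
        FROM events e JOIN devices d ON d.device_id = e.device_id
        GROUP BY window_start, d.site
        ORDER BY window_start, d.site
        """,
        (window_seconds, window_seconds),  # integer division picks the window
    ).fetchall()
    conn.close()
    return rows


RESULTS = windowed_averages(EVENTS, DEVICES)
```

Each row of RESULTS is (window start, site, average temperature). In a real pipeline these rows could be published to a new data stream.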
Stream analytic platforms
Cloud vs. On-premise stream analytic platforms
On-premise stream analytic platforms
Benefits
• Extensibility
• Control
• Rich programming model
• Integration with on-premise big data pipeline
Challenges
• Complex infrastructure
• Scalability
• Maintenance and monitoring
Cloud stream analytic services
Benefits
• Simple provisioning
• Elastic scalability
• Integrated with PaaS offerings
• Rich monitoring and management experience
Challenges
• Integration with on-premise systems
• Extensibility
• Lack of customization
On-premise stream analytic platforms
Lead Platforms
Apache Storm
Apache Spark
Apache Samza
Apache Flink
Akka
Apache Storm
• Stream processing framework with micro-batching capabilities
• Included in most Hadoop distributions
• Main model (spouts and bolts)
  - One at a time
  - Lower latency
  - Operates on tuple streams
• Trident
  - Micro-batching
  - Higher throughput
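The difference between the two models can be sketched in plain Python. This is not the Storm API. The function names, the sample sentences and the batch size are assumptions made for illustration. Generators stand in for spouts and bolts, and a small helper groups tuples into batches the way a micro-batching layer would.

```python
from collections import Counter
from itertools import islice


def sentence_spout():
    """Spout: emits one tuple at a time (finite here so the sketch terminates)."""
    for sentence in ["storm processes streams", "streams of tuples"]:
        yield (sentence,)


def split_bolt(tuples):
    """Bolt: consumes sentence tuples and emits one word tuple per word."""
    for (sentence,) in tuples:
        for word in sentence.split():
            yield (word,)


def count_bolt(tuples):
    """Bolt: keeps a running count and emits (word, count) after every tuple."""
    counts = Counter()
    for (word,) in tuples:
        counts[word] += 1
        yield (word, counts[word])


def micro_batches(tuples, batch_size):
    """Micro-batching: hand tuples downstream in small groups."""
    iterator = iter(tuples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# One at a time: each tuple flows through the whole topology as it arrives.
ONE_AT_A_TIME = list(count_bolt(split_bolt(sentence_spout())))

# Micro-batching: the same word tuples, counted per batch of three.
BATCH_COUNTS = [
    Counter(word for (word,) in batch)
    for batch in micro_batches(split_bolt(sentence_spout()), 3)
]
```

In the one-at-a-time run a result is available after every tuple, which is where the lower latency comes from. In the batched run a result is available only once per batch, which trades latency for throughput.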
Apache Storm: Benefits vs. Challenges
Benefits
• Broad adoption
• Included in Hadoop distributions
• Vibrant community
• Extensibility
• Support for different programming languages
Challenges
• Increasing competition from newer stacks
• Performance limitations at very large scale
Apache Spark
• Micro-batching processing framework
• Elastic scalability models
• Receivers split data into batches
• Spark Streaming processes batches and produces results
• High throughput at the cost of higher latency
• Functional APIs
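The receiver and batch steps described above can be sketched without a cluster. The code below is plain Python, not the Spark API. The record layout, the one-second batch interval and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical (arrival_time_seconds, log_level) records.
RECORDS = [(0.2, "INFO"), (0.9, "ERROR"), (1.1, "INFO"), (1.5, "INFO"), (2.7, "ERROR")]


def split_into_batches(records, batch_interval):
    """Receiver step: group records by the batch interval they arrived in.

    Intervals with no records are skipped in this sketch.
    """
    batches = defaultdict(list)
    for arrival_time, value in records:
        batches[int(arrival_time // batch_interval)].append(value)
    return [batches[i] for i in sorted(batches)]


def count_by_key(batch):
    """Processing step: map each value to (value, 1), then reduce by key."""
    pairs = map(lambda value: (value, 1), batch)

    def merge(acc, pair):
        key, n = pair
        acc[key] = acc.get(key, 0) + n
        return acc

    return reduce(merge, pairs, {})


BATCH_RESULTS = [
    count_by_key(batch)
    for batch in split_into_batches(RECORDS, batch_interval=1.0)
]
```

A record's result only appears once its whole batch has been processed, so the batch interval sets a floor on latency.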
Spark Streaming: Benefits vs. Challenges
Benefits
• MPP infrastructure
• Interoperability with other Spark programming models (Java, Python, SQL)
• Integration with messaging frameworks
• Extensibility
• Included in most Hadoop distributions
Challenges
• Time window queries
• Complex infrastructure setup
• Integration with line of business systems
Apache Samza
• Built to address some of the limitations of Apache Storm
• Deep integration with Kafka and Yarn
• Simple API comparable to map-reduce
• Leverages Yarn for task distribution, fault tolerance and scalability
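A callback-style API of this kind can be sketched as follows. This is plain Python, not the Samza API. The class, the message fields and the run_task helper are hypothetical. The task receives one message per call, updates local state and emits a result.

```python
class PageViewCounterTask:
    """Hypothetical task: one process() call per incoming message."""

    def __init__(self):
        # Local state owned by this task. A real deployment would back this
        # with a durable store so it can be restored after a failure.
        self.views_per_user = {}

    def process(self, message, collector):
        user = message["user"]
        self.views_per_user[user] = self.views_per_user.get(user, 0) + 1
        # Emit the updated count to an output stream.
        collector.append({"user": user, "views": self.views_per_user[user]})


def run_task(task, messages):
    """Stand-in for the framework loop that feeds a partition to a task."""
    output = []
    for message in messages:
        task.process(message, output)
    return output


OUTPUT = run_task(
    PageViewCounterTask(),
    [{"user": "ana"}, {"user": "bo"}, {"user": "ana"}],
)
```

The framework, not the task, owns distribution and recovery. The task author only writes the per-message logic.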
Apache Samza: Benefits vs. Challenges
Benefits
• Highly scalable, fault-tolerant model
• Stateful stream data processing
• Extensibility
• Simple infrastructure
Challenges
• Small adoption
• Low level API
• Heavy IO operations
Apache Flink Streaming
• Alternative to Spark
• Everything is a stream
• Platform to unify batch and stream processing
• True streaming with adjustable latency and throughput
• Supports different stream sources and transformations
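The "everything is a stream" idea can be sketched in plain Python. This is not the Flink API, and the temperature readings and function names are assumptions made for illustration. A single lazy pipeline is applied unchanged to a bounded input (a batch) and to an unbounded one (a stream).

```python
from itertools import count, islice


def pipeline(stream):
    """One transformation chain, lazily applied element by element."""
    readings = (float(x) for x in stream)
    celsius = ((f - 32.0) * 5.0 / 9.0 for f in readings)
    return (round(c, 1) for c in celsius if c > 0.0)


# Bounded input (a "batch"): a finite list of Fahrenheit readings.
BATCH_RESULT = list(pipeline([32, 50, 14, 212]))


# Unbounded input (a "stream"): an endless generator of readings.
def endless_readings():
    for n in count():
        yield 41 + 9 * n  # 41, 50, 59, ...


# Take only the first three results so the sketch terminates.
STREAM_RESULT = list(islice(pipeline(endless_readings()), 3))
```

The batch case is simply a stream that happens to end, so no separate batch code path is needed.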
Apache Flink Streaming: Benefits vs. Challenges
Benefits
• Combine batch and stream data processing
• Expressive APIs
• Data flows and transformation
• Extensibility
Challenges
• Small adoption
• Limited state management
• High availability models
Akka Streams
• Micro-service, actor oriented model
• Messaging driven
• Isolated failures
• Reactive programming model based on sources, sinks and flows
• DSL for stream data manipulation
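The source, flow and sink vocabulary can be sketched with three small Python classes. This is not the Akka Streams API. The class and method names are hypothetical and only mirror the general shape of such a DSL. The sketch runs synchronously and leaves out actors, asynchronous boundaries and back-pressure.

```python
class Source:
    """Where elements come from."""

    def __init__(self, iterable):
        self._iterable = iterable

    def via(self, flow):
        """Attach a flow; the result is still a source."""
        return Source(flow.apply(self._iterable))

    def run_with(self, sink):
        """Run the graph by draining the source into the sink."""
        return sink.consume(self._iterable)


class Flow:
    """A reusable chain of transformation stages."""

    def __init__(self, stages=()):
        self._stages = tuple(stages)

    def map(self, fn):
        return Flow(self._stages + (lambda items: (fn(x) for x in items),))

    def filter(self, predicate):
        return Flow(self._stages + (lambda items: (x for x in items if predicate(x)),))

    def apply(self, items):
        for stage in self._stages:
            items = stage(items)
        return items


class Sink:
    """Where elements end up."""

    def __init__(self, consume):
        self.consume = consume

    @staticmethod
    def fold(zero, fn):
        def consume(items):
            acc = zero
            for item in items:
                acc = fn(acc, item)
            return acc

        return Sink(consume)


even_squares = Flow().filter(lambda n: n % 2 == 0).map(lambda n: n * n)
TOTAL = Source(range(1, 7)).via(even_squares).run_with(Sink.fold(0, lambda a, b: a + b))
```

Because the flow is a value of its own, the same even_squares chain could be attached to any other source.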
Akka Streams: Benefits vs. Challenges
Benefits
• Rich stream data processing model
• Extensibility
• Concurrency and thread safety
• Leverages mainstream Java and Scala programming models
Challenges
• Small adoption
• Dependent on Akka’s architecture style
• Support for languages outside the JVM
Cloud stream analytic platforms
Lead Platforms
AWS Kinesis Analytics
Azure Stream Analytics
Bluemix Streaming Analytics
AWS Kinesis
• Native stream data services in AWS
• Combines three products in a single platform
  - Kinesis Streams
  - Kinesis Firehose
  - Kinesis Analytics
• Kinesis Streams collects data streams from any application
• Kinesis Firehose provides a model to load streaming data into AWS
• Kinesis Analytics allows the execution of SQL queries over data streams
AWS Kinesis: Benefits vs. Challenges
Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms
Challenges
• AWS Kinesis Analytics hasn’t been released
• Interoperability with on-premise data streams
Azure Stream Analytics
• Native stream analytic service in the Azure platform
• Allows the execution of SQL queries over dynamic streams of data
• Integrates with the other components of the Cortana Analytics suite
• Leverages Azure Event Hubs for high volume data ingestion
• Very rich monitoring and analytic capabilities
Azure Stream Analytics: Benefits vs. Challenges
Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms
• Rich SQL query and analytics model
Challenges
• Interoperability with on-premise data streams
• Extensibility
Bluemix Streaming Analytics
• Native stream analytic service in the IBM Bluemix platform
• Built upon IBM Streams technology
• Allows the execution of SQL queries over dynamic streams of data
• Supports interactive and programmatic query models
• Rich analytic and monitoring capabilities
• Stream visualization graph
Bluemix Streaming Analytics: Benefits vs. Challenges
Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Rich SQL query and analytics model
Challenges
• Adoption
• Interoperability with on-premise data streams
• Extensibility
You can’t buy everything!
Capabilities of Enterprise Stream Analytic Solutions
• Stream tracking
• Replay and simulation
• Stream data testing
• Integration with line of business systems
• Stream data search
• Integration with mainstream analytic tools
Complementary technologies
Other Relevant Technologies in Stream Analytic Solutions
• Enterprise messaging platforms
• Time series databases
• Stream data connectors
Enterprise Messaging Platforms
• Persistent messaging
• Pub-sub messaging
• Support for multiple messaging patterns
• Ordered messaging
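These messaging properties can be sketched with a small in-memory broker. The class, topic and subscriber names are hypothetical, and a Python list stands in for the durable log a real platform would keep on disk. Each topic is an append-only log, which preserves order, and each subscriber reads at its own offset, which gives pub-sub delivery and allows replay.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical broker: an append-only log per topic, read by offset."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> ordered messages
        self._offsets = defaultdict(int)  # (topic, subscriber) -> next offset

    def publish(self, topic, message):
        self._logs[topic].append(message)  # append preserves publish order
        return len(self._logs[topic]) - 1  # offset of the stored message

    def poll(self, topic, subscriber, max_messages=10):
        """Return the next unread messages for this subscriber, in order."""
        start = self._offsets[(topic, subscriber)]
        batch = self._logs[topic][start:start + max_messages]
        self._offsets[(topic, subscriber)] = start + len(batch)
        return batch

    def seek(self, topic, subscriber, offset):
        """Rewind a subscriber, for example to replay messages."""
        self._offsets[(topic, subscriber)] = offset


broker = InMemoryBroker()
for reading in ["r1", "r2", "r3"]:
    broker.publish("sensor-readings", reading)

BILLING_FIRST = broker.poll("sensor-readings", "billing", max_messages=2)
AUDIT_ALL = broker.poll("sensor-readings", "audit")
BILLING_REST = broker.poll("sensor-readings", "billing")
broker.seek("sensor-readings", "billing", 0)
BILLING_REPLAY = broker.poll("sensor-readings", "billing")
```

The two subscribers progress independently, and rewinding the offset is enough to replay the stream from the start.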
Time Series Databases
• Store time stamped data
• Time series query functions
• Integrate real time and reference data
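Two typical time series query functions, a range query and a downsampling average, can be sketched over a sorted in-memory series. The class and method names are hypothetical and do not belong to any particular database.

```python
import bisect


class TimeSeries:
    """Hypothetical in-memory series of (timestamp, value) points kept sorted."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def insert(self, timestamp, value):
        index = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(index, timestamp)
        self._values.insert(index, value)

    def range(self, start, end):
        """Points with start <= timestamp < end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))

    def downsample_mean(self, start, end, bucket):
        """Mean value per fixed-size bucket; empty buckets are omitted."""
        sums = {}
        for timestamp, value in self.range(start, end):
            bucket_start = start + ((timestamp - start) // bucket) * bucket
            total, n = sums.get(bucket_start, (0.0, 0))
            sums[bucket_start] = (total + value, n + 1)
        return [(b, total / n) for b, (total, n) in sorted(sums.items())]


series = TimeSeries()
for ts, value in [(5, 1.0), (0, 3.0), (12, 4.0), (18, 6.0), (25, 9.0)]:
    series.insert(ts, value)  # out-of-order arrival is handled by the sorted insert

RANGE = series.range(0, 20)
MEANS = series.downsample_mean(0, 20, bucket=10)
```

Keeping the points sorted by timestamp is what makes the range lookup a binary search rather than a full scan.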
Stream data connectors
• Develop stream data sources from line of business systems
• Integrate real time and reference data from enterprise systems into the stream data pipeline
• Combine real time data from multiple line of business systems into single data streams
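Combining feeds from several systems into a single stream can be sketched with the standard library's heapq.merge. The two feeds, their event names and the owner lookup are hypothetical. Each feed is assumed to be ordered by timestamp already.

```python
import heapq

# Hypothetical feeds from two line of business systems, each ordered by timestamp.
CRM_EVENTS = [(1, "crm", "lead-created"), (4, "crm", "lead-qualified")]
ERP_EVENTS = [(2, "erp", "order-placed"), (3, "erp", "order-shipped"), (6, "erp", "invoice-sent")]

# Hypothetical reference data from an enterprise system.
SYSTEM_OWNERS = {"crm": "sales", "erp": "operations"}


def combined_stream(*feeds):
    """Lazily merge time-ordered feeds into one time-ordered stream."""
    for timestamp, system, event in heapq.merge(*feeds, key=lambda e: e[0]):
        yield {
            "ts": timestamp,
            "system": system,
            "event": event,
            "owner": SYSTEM_OWNERS[system],  # enrich with reference data
        }


MERGED = list(combined_stream(CRM_EVENTS, ERP_EVENTS))
```

The merge is lazy, so it also works when the feeds are generators that never end.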
Summary
• Stream data processing and analytics is a key element of modern enterprise data pipelines
• Some of the leading on-premise stream analytic stacks include: Apache Storm, Apache Samza, Spark Streaming, Flink Streaming, Akka…
• Some of the leading cloud stream analytic services include: AWS Kinesis, Azure Stream Analytics, Bluemix Streaming Analytics…
• You can’t buy everything! Stream analytic solutions require custom implementations
• When building stream analytic solutions, consider complementary technologies such as enterprise messaging stacks or time series databases
Thanks