Stream Analytics in the Enterprise

Uploaded by: jesus-rodriguez

Posted on 16-Apr-2017


Page 1: Stream Analytics in the Enterprise

Stream Analytics in the Enterprise

Page 2: Stream Analytics in the Enterprise

About Us

• Emerging technology firm focused on helping enterprises build breakthrough software solutions

• Building software solutions powered by disruptive enterprise software trends:
  - Machine learning and data science
  - Cyber-security
  - Enterprise IoT
  - Powered by cloud and mobile

• Bringing innovation from startups and academic institutions to the enterprise

• Award-winning firm, recognized by Inc 500, the American Business Awards and the International Business Awards

Page 3: Stream Analytics in the Enterprise

Agenda

• The elements of stream analytic solutions
• Stream analytic platforms: on-premise vs. cloud
• On-premise stream analytic platforms
• Cloud stream analytic services
• Complementary technologies

Page 4: Stream Analytics in the Enterprise

The elements of enterprise stream analytic solutions

Page 5: Stream Analytics in the Enterprise

Capabilities of Stream Analytic Solutions

• Real-time data ingestion
• Execute SQL queries on dynamic streams of data
• Time window queries
• Connect query outputs to new data streams
• Leverage reference data in the stream queries
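To make the capabilities above concrete, here is a minimal sketch in plain Python (not the API of any platform discussed later) of a time window query: it computes a per-site average over tumbling windows, enriches each reading with reference data, and emits results as a new stream. The names `Reading`, `DEVICE_SITE` and `tumbling_avg` are hypothetical, and the sketch assumes readings arrive in event-time order.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Reading(NamedTuple):
    device_id: str
    ts: int  # event time, in seconds
    value: float


# Hypothetical reference data joined into the query (device -> site).
DEVICE_SITE: Dict[str, str] = {"d1": "plant-a", "d2": "plant-b"}


def _emit(window_start: int, sums: Dict[str, List[float]]) -> Iterator[Tuple[int, str, float]]:
    # One output record per site for the window that just closed.
    for site, (total, count) in sorted(sums.items()):
        yield window_start, site, total / count


def tumbling_avg(readings: Iterable[Reading], window_s: int) -> Iterator[Tuple[int, str, float]]:
    """Roughly: SELECT site, AVG(value) GROUP BY site, tumbling window of window_s.

    Assumes readings arrive in event-time order; late data is not handled.
    """
    current = None
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])
    for r in readings:
        start = r.ts - r.ts % window_s
        if current is not None and start != current:
            yield from _emit(current, sums)  # window closed: push results downstream
            sums.clear()
        current = start
        site = DEVICE_SITE.get(r.device_id, "unknown")  # reference-data lookup
        sums[site][0] += r.value
        sums[site][1] += 1
    if current is not None:
        yield from _emit(current, sums)  # flush the last open window
```

Because `tumbling_avg` returns a generator, its output can be handed straight to another stage, which mirrors the idea of connecting query outputs to new data streams.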

Page 6: Stream Analytics in the Enterprise

Stream analytic platforms

Page 7: Stream Analytics in the Enterprise

Cloud vs. On-premise stream analytic platforms

Page 8: Stream Analytics in the Enterprise

Capabilities of Stream Analytic Solutions

On-premise stream analytic platforms

Benefits
• Extensibility
• Control
• Rich programming model
• Integration with on-premise big data pipelines

Challenges
• Complex infrastructure
• Scalability
• Maintenance and monitoring

Cloud stream analytic services

Benefits
• Simple provisioning
• Elastic scalability
• Integration with PaaS offerings
• Rich monitoring and management experience

Challenges
• Integration with on-premise systems
• Extensibility
• Lack of customization

Page 9: Stream Analytics in the Enterprise

On-premise stream analytic platforms

Page 10: Stream Analytics in the Enterprise

Leading Platforms

Apache Storm

Apache Spark

Apache Samza

Apache Flink

Akka

Page 11: Stream Analytics in the Enterprise

Apache Storm

• Stream processing framework with micro-batching capabilities

• Included in most Hadoop distributions

• Main model (spouts and bolts)
  - One tuple at a time
  - Lower latency
  - Operates on tuple streams
• Trident
  - Micro-batching
  - Higher throughput
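The difference between Storm's one-at-a-time model and Trident's micro-batching can be sketched as plain Python generators. This is not Storm's API (topologies are written against Storm's own Java interfaces); `sentence_spout`, `split_bolt`, `count_bolt` and `micro_batches` are hypothetical names that only model the shape of the dataflow: a spout emits tuples, bolts consume them one by one, and a Trident-style stage groups them into small batches.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical spout: the topology's source, emitting one tuple at a time."""
    yield from lines


def split_bolt(tuples: Iterable[str]) -> Iterator[str]:
    """Hypothetical bolt: handles each tuple as soon as it arrives."""
    for sentence in tuples:
        yield from sentence.split()


def count_bolt(words: Iterable[str]) -> Dict[str, int]:
    """Hypothetical terminal bolt: keeps a running word count."""
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def micro_batches(tuples: Iterable[str], size: int) -> Iterator[List[str]]:
    """Trident-style grouping: pass tuples downstream in small batches."""
    it = iter(tuples)
    while batch := list(islice(it, size)):
        yield batch


lines = ["to be or not", "to be"]
per_tuple = count_bolt(split_bolt(sentence_spout(lines)))
# per_tuple == {"to": 2, "be": 2, "or": 1, "not": 1}
batches = list(micro_batches(split_bolt(sentence_spout(lines)), 4))
# batches == [["to", "be", "or", "not"], ["to", "be"]]
```

Per-tuple processing lets each word reach `count_bolt` immediately; batching amortizes per-call overhead at the cost of waiting for a batch to fill.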

Page 12: Stream Analytics in the Enterprise

Apache Storm: Benefits vs. Challenges

Benefits
• Broad adoption
• Included in Hadoop distributions
• Vibrant community
• Extensibility
• Support for different programming languages

Challenges
• Increasing competition from newer stacks
• Performance limitations at very large scale

Page 13: Stream Analytics in the Enterprise

Apache Spark

• Micro-batch processing framework
• Elastic scalability model
• Receivers split incoming data into batches
• Spark Streaming processes the batches and produces results
• High throughput, higher latency
• Functional APIs
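The receiver and batch flow described above can be illustrated with a small Python sketch. It is not Spark's DStream API; `receiver` and `errors_in_batch` are hypothetical helpers that show the micro-batch shape: incoming events are cut into fixed-interval batches, and each batch is processed with functional operators (filter, map, reduce).

```python
from functools import reduce
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, log line)


def receiver(events: Iterable[Event], batch_interval: float) -> Iterator[List[str]]:
    """Hypothetical receiver: cuts an arrival-ordered stream into fixed-interval batches.

    Intervals with no events produce no batch in this sketch.
    """
    for _, group in groupby(events, key=lambda e: int(e[0] // batch_interval)):
        yield [line for _, line in group]


def errors_in_batch(batch: List[str]) -> int:
    """Functional-style processing of one batch: filter -> map -> reduce."""
    errors = filter(lambda line: line.startswith("ERROR"), batch)
    ones = map(lambda _: 1, errors)
    return reduce(lambda a, b: a + b, ones, 0)


events = [(0.1, "ERROR disk"), (0.5, "INFO ok"), (1.2, "ERROR net"), (1.8, "ERROR cpu"), (2.5, "INFO ok")]
counts = [errors_in_batch(b) for b in receiver(events, batch_interval=1.0)]
# counts == [1, 2, 0]
```

The event that arrives at 1.2 s is only counted once its whole batch closes at 2.0 s, which is the latency cost that buys the higher throughput of batch processing.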

Page 14: Stream Analytics in the Enterprise

Spark Streaming: Benefits vs. Challenges

Benefits
• MPP infrastructure
• Interoperability with other Spark programming models (Java, Python, SQL)
• Integration with messaging frameworks
• Extensibility
• Included in most Hadoop distributions

Challenges
• Time window queries
• Complex infrastructure setup
• Integration with line of business systems

Page 15: Stream Analytics in the Enterprise

Apache Samza

• Built to address some of the limitations of Apache Storm

• Deep integration with Apache Kafka and YARN

• Simple API comparable to MapReduce

• Leverages YARN for task distribution, fault tolerance and scalability
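Samza's low-level programming model centers on a task whose process callback is invoked once per incoming message, much like a map function, with the option of keeping local state between calls. The Python sketch below is loosely modeled on that style and is not Samza's Java API; `PageViewCounterTask`, the `send` callback and the `page-view-counts` stream name are hypothetical, and a plain dict stands in for a real, fault-tolerant state store.

```python
from typing import Callable, Dict, Tuple

Message = Tuple[str, str]  # (user_id, page)
Send = Callable[[str, Tuple[str, int]], None]  # (output stream name, record)


class PageViewCounterTask:
    """Hypothetical task: process() is called once per incoming message,
    similar in spirit to a map function, but with local state kept between calls."""

    OUTPUT_STREAM = "page-view-counts"  # hypothetical output stream name

    def __init__(self) -> None:
        # Plain dict standing in for a task-local key-value store (not fault tolerant here).
        self.store: Dict[str, int] = {}

    def process(self, message: Message, send: Send) -> None:
        user_id, _page = message
        count = self.store.get(user_id, 0) + 1
        self.store[user_id] = count
        send(self.OUTPUT_STREAM, (user_id, count))


out = []
task = PageViewCounterTask()
for msg in [("u1", "/home"), ("u2", "/cart"), ("u1", "/checkout")]:
    task.process(msg, lambda stream, record: out.append((stream, record)))
# out == [("page-view-counts", ("u1", 1)), ("page-view-counts", ("u2", 1)), ("page-view-counts", ("u1", 2))]
```

In a real deployment the framework, not user code, handles distributing task instances across the cluster and restoring their state after failures.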

Page 16: Stream Analytics in the Enterprise

Apache Samza: Benefits vs. Challenges

Benefits
• Highly scalable, fault-tolerant model
• Stateful stream data processing
• Extensibility
• Simple infrastructure

Challenges
• Low adoption
• Low-level API
• Heavy IO operations

Page 17: Stream Analytics in the Enterprise

Apache Flink Streaming

• Alternative to Spark
• Everything is a stream
• Platform to unify batch and stream processing
• True streaming with adjustable latency and throughput
• Supports different stream sources and transformations
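The "everything is a stream" idea, where a batch job is just a stream that happens to end, can be shown with a short Python sketch. It does not use Flink's API; `pipeline` is a hypothetical transformation chain that is written once and applied unchanged to a bounded list and to an unbounded generator.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def pipeline(source: Iterable[int]) -> Iterator[int]:
    """One transformation chain (keep even numbers, then square them),
    written once and applied to bounded or unbounded input alike."""
    for x in source:
        if x % 2 == 0:
            yield x * x


# "Batch": a bounded stream that simply ends.
batch_result = list(pipeline([1, 2, 3, 4]))  # [4, 16]

# "Streaming": an unbounded source, consumed incrementally.
first_three = list(islice(pipeline(count()), 3))  # [0, 4, 16]
```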

Page 18: Stream Analytics in the Enterprise

Apache Flink Streaming: Benefits vs. Challenges

Benefits
• Combined batch and stream data processing
• Expressive APIs
• Data flows and transformations
• Extensibility

Challenges
• Low adoption
• Limited state management
• High availability models

Page 19: Stream Analytics in the Enterprise

Akka Streams

• Micro-service, actor-oriented model
• Message driven
• Isolated failures
• Reactive programming model based on sources, sinks and flows
• DSL for stream data manipulation
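The source/flow/sink model above can be sketched in Python. The vocabulary (`Source`, `Flow`, `via`, and `run_with` in place of Akka's `runWith`) loosely echoes Akka Streams, but these classes are hypothetical and omit what makes Akka distinctive: actor-based execution and backpressure. The point shown is composition: a pipeline is a reusable blueprint that does nothing until it is run against a sink.

```python
from typing import Any, Callable, Iterable, Iterator


class Flow:
    """A reusable processing stage that turns one iterator into another."""

    def __init__(self, stage: Callable[[Iterator[Any]], Iterator[Any]]) -> None:
        self.stage = stage

    @staticmethod
    def map(f: Callable[[Any], Any]) -> "Flow":
        return Flow(lambda it: (f(x) for x in it))

    @staticmethod
    def filter(pred: Callable[[Any], bool]) -> "Flow":
        return Flow(lambda it: (x for x in it if pred(x)))


class Source:
    """A blueprint that produces elements; nothing runs until run_with() is called."""

    def __init__(self, make: Callable[[], Iterator[Any]]) -> None:
        self.make = make

    @staticmethod
    def from_iterable(items: Iterable[Any]) -> "Source":
        return Source(lambda: iter(items))

    def via(self, flow: Flow) -> "Source":
        # Returns a new blueprint; the original Source is left unchanged.
        return Source(lambda: flow.stage(self.make()))

    def run_with(self, sink: Callable[[Iterator[Any]], Any]) -> Any:
        # Materialize: create a fresh iterator and let the sink pull elements through.
        return sink(self.make())


# A sink is any function that consumes the iterator.
graph = (Source.from_iterable(range(10))
         .via(Flow.filter(lambda x: x % 3 == 0))
         .via(Flow.map(lambda x: x * 10)))
total = graph.run_with(sum)  # 0 + 30 + 60 + 90 = 180
```

Because `graph` is only a description, it can be run again with a different sink, for example `graph.run_with(list)`.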

Page 20: Stream Analytics in the Enterprise

Akka Streams: Benefits vs. Challenges

Benefits
• Rich stream data processing model
• Extensibility
• Concurrency and thread safety
• Leverages mainstream Java and Scala programming models

Challenges
• Low adoption
• Dependent on Akka's architecture style
• Support for languages outside the JVM

Page 21: Stream Analytics in the Enterprise

Cloud stream analytic platforms

Page 22: Stream Analytics in the Enterprise

Leading Platforms

AWS Kinesis Analytics

Azure Stream Analytics

Bluemix Streaming Analytics

Page 23: Stream Analytics in the Enterprise

AWS Kinesis

• Native stream data service in AWS
• Combines three products in a single platform:
  - Kinesis Streams
  - Kinesis Firehose
  - Kinesis Analytics
• Kinesis Streams collects data streams from any application
• Kinesis Firehose provides a way to load streaming data into AWS
• Kinesis Analytics allows the execution of SQL queries over data streams

Page 24: Stream Analytics in the Enterprise

AWS Kinesis: Benefits vs. Challenges

Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms

Challenges
• AWS Kinesis Analytics hasn't been released
• Interoperability with on-premise data streams

Page 25: Stream Analytics in the Enterprise

Azure Stream Analytics

• Native stream analytic service in the Azure platform

• Allows the execution of SQL queries over dynamic streams of data

• Integrates with the other components of the Cortana Analytics suite

• Leverages Azure Event Hubs for high-volume data ingestion

• Very rich monitoring and analytic capabilities

Page 26: Stream Analytics in the Enterprise

Azure Stream Analytics: Benefits vs. Challenges

Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms
• Rich SQL query and analytics model

Challenges
• Interoperability with on-premise data streams
• Extensibility

Page 27: Stream Analytics in the Enterprise

Bluemix Streaming Analytics

• Native stream analytic service in the IBM Bluemix platform

• Built upon IBM Streams technology

• Allows the execution of SQL queries over dynamic streams of data

• Supports interactive and programmatic query models

• Rich analytic and monitoring capabilities

• Stream visualization graph

Page 28: Stream Analytics in the Enterprise

Bluemix Streaming Analytics: Benefits vs. Challenges

Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Rich SQL query and analytics model

Challenges
• Adoption
• Interoperability with on-premise data streams
• Extensibility

Page 29: Stream Analytics in the Enterprise

You can’t buy everything!

Page 30: Stream Analytics in the Enterprise

Capabilities of Enterprise Stream Analytic Solutions

• Stream tracking
• Replay and simulation
• Stream data testing
• Integration with line of business systems
• Stream data search
• Integration with mainstream analytic tools
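Replay, simulation and stream data testing, which platforms rarely provide out of the box, often come down to recording a stream and feeding it back through the same logic deterministically. A minimal Python sketch, assuming a hypothetical `detect_overheat` rule as the logic under test and a `replay` helper that orders recorded events by timestamp:

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, str, float]  # (timestamp, sensor_id, temperature)


def detect_overheat(events: Iterable[Event], threshold: float) -> Iterator[Tuple[int, str]]:
    """Stream logic under test: flag readings above the threshold."""
    for ts, sensor, temp in events:
        if temp > threshold:
            yield ts, sensor


def replay(recorded: Iterable[Event]) -> Iterator[Event]:
    """Replay a captured stream in timestamp order so that test runs are repeatable."""
    yield from sorted(recorded, key=lambda e: e[0])


recorded: List[Event] = [(3, "s2", 95.0), (1, "s1", 70.0), (2, "s1", 101.5)]

# Test: the same recorded input always yields the same alerts.
assert list(detect_overheat(replay(recorded), threshold=90.0)) == [(2, "s1"), (3, "s2")]

# Simulation: replay the same data against a different parameter.
assert list(detect_overheat(replay(recorded), threshold=100.0)) == [(2, "s1")]
```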

Page 31: Stream Analytics in the Enterprise

Complementary technologies

Page 32: Stream Analytics in the Enterprise

Other Relevant Technologies in Stream Analytic Solutions

• Enterprise messaging platforms
• Time series databases
• Stream data connectors

Page 33: Stream Analytics in the Enterprise

Enterprise Messaging Platforms

• Persistent messaging
• Pub-sub messaging
• Support for multiple messaging patterns
• Ordered messaging
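Several of these properties can be seen together in a tiny in-memory model of a topic: messages are appended to a retained log (a stand-in for persistence), every subscriber reads independently from its own offset (pub-sub), and all subscribers see the same order. `Topic`, `publish`, `subscribe` and `poll` are hypothetical names; a real messaging platform writes the log to durable storage rather than a Python list.

```python
from typing import Dict, List


class Topic:
    """Hypothetical in-memory topic: a retained, append-only log plus per-subscriber offsets."""

    def __init__(self) -> None:
        self.log: List[str] = []  # retained messages; stands in for durable storage
        self.offsets: Dict[str, int] = {}  # next position to read, per subscriber

    def publish(self, message: str) -> int:
        self.log.append(message)
        return len(self.log) - 1  # offset of the new message

    def subscribe(self, subscriber: str, from_beginning: bool = True) -> None:
        # Late subscribers can still read retained history, or start at the tail.
        self.offsets[subscriber] = 0 if from_beginning else len(self.log)

    def poll(self, subscriber: str, max_messages: int = 10) -> List[str]:
        start = self.offsets[subscriber]
        batch = self.log[start:start + max_messages]  # always in publish order
        self.offsets[subscriber] = start + len(batch)
        return batch
```

Usage: a subscriber that starts from the beginning receives the full history in publish order, while one that joins at the tail only receives new messages.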

Page 34: Stream Analytics in the Enterprise

Time Series Databases

• Store time-stamped data
• Time series query functions
• Integrate real-time and reference data
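The core of a time series store is keeping points sorted by timestamp so that range queries and aggregation functions stay cheap. The sketch below is a hypothetical `SeriesStore` in plain Python, not the API of any particular time series database (those usually expose their own query languages); it shows a time-range query and a downsampling function that averages values into fixed-size buckets.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class SeriesStore:
    """Hypothetical minimal time series store: points kept sorted by timestamp."""

    def __init__(self) -> None:
        self.ts: List[int] = []
        self.values: List[float] = []

    def insert(self, t: int, v: float) -> None:
        # Keep both lists ordered by timestamp, even for out-of-order inserts.
        i = bisect.bisect_right(self.ts, t)
        self.ts.insert(i, t)
        self.values.insert(i, v)

    def range(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Points with start <= t < end, found by binary search."""
        lo = bisect.bisect_left(self.ts, start)
        hi = bisect.bisect_left(self.ts, end)
        return list(zip(self.ts[lo:hi], self.values[lo:hi]))

    def downsample(self, start: int, end: int, bucket: int) -> Dict[int, float]:
        """Average value per fixed-size time bucket, a typical time series query function."""
        sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0])
        for t, v in self.range(start, end):
            b = t - t % bucket
            sums[b][0] += v
            sums[b][1] += 1
        return {b: total / n for b, (total, n) in sorted(sums.items())}
```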

Page 35: Stream Analytics in the Enterprise

Stream data connectors

• Develop stream data sources from line of business systems

• Integrate real time and reference data from enterprise systems into the stream data pipeline

• Combine real-time data from multiple line of business systems into a single data stream
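A stream data connector typically does two jobs: it maps each line of business system's native records to a common shape, and it merges several such feeds into one time-ordered stream. The sketch below assumes hypothetical CRM and ERP row layouts (all field names are illustrative) and uses the standard library's `heapq.merge`, which requires each input feed to already be ordered by timestamp.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator, Tuple

Record = Tuple[int, str, Dict[str, Any]]  # (timestamp, source system, payload)


def crm_connector(rows: Iterable[Dict[str, Any]]) -> Iterator[Record]:
    """Hypothetical CRM connector: maps change rows to the common record shape."""
    for row in rows:
        yield row["updated_at"], "crm", {"customer": row["customer_id"], "event": row["change"]}


def erp_connector(rows: Iterable[Dict[str, Any]]) -> Iterator[Record]:
    """Hypothetical ERP connector with different source field names."""
    for row in rows:
        yield row["ts"], "erp", {"customer": row["cust"], "event": row["order_status"]}


def unified_stream(*sources: Iterable[Record]) -> Iterator[Record]:
    """Merge sources that are each already time-ordered into one time-ordered stream."""
    return heapq.merge(*sources, key=lambda r: r[0])


crm_rows = [{"updated_at": 1, "customer_id": "c1", "change": "address"},
            {"updated_at": 4, "customer_id": "c2", "change": "phone"}]
erp_rows = [{"ts": 2, "cust": "c1", "order_status": "shipped"}]
merged = list(unified_stream(crm_connector(crm_rows), erp_connector(erp_rows)))
# [r[1] for r in merged] == ["crm", "erp", "crm"]
```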

Page 36: Stream Analytics in the Enterprise

Summary

• Stream data processing and analytics is a key element of modern enterprise data pipelines

• Some of the leading on-premise stream analytic stacks include: Apache Storm, Apache Samza, Spark Streaming, Flink Streaming, Akka…

• Some of the leading cloud stream analytic services include: AWS Kinesis, Azure Stream Analytics, Bluemix Streaming Analytics…

• You can't buy everything! Stream analytic solutions require custom implementations

• When building stream analytic solutions, consider complementary technologies such as enterprise messaging stacks or time series databases

Page 37: Stream Analytics in the Enterprise

Thanks