stream processing as game changer for big data and internet of things by kai wahner

100

Upload: big-data-spain

Post on 16-Apr-2017

274 views

Category:

Technology


1 download

TRANSCRIPT

Kai WähnerTechnology Evangelist

[email protected]

LinkedIn

@KaiWaehner

www.kai-waehner.de

Big Data Spain @ Madrid (November 2016)

Comparison of Streaming Analytics Frameworks

© Copyright 2000-2016 TIBCO Software Inc.

Key Take-Aways

• Streaming Analytics processes Data while it is in Motion!

• Automation and Proactive Human Interaction are BOTH needed!

• Streaming Analytics is Complementary to Hadoop and Machine Learning!

© Copyright 2000-2016 TIBCO Software Inc.

Agenda

• Real World Use Cases• Introduction to Streaming Analytics

• Market Overview• Relation to other Big Data Components• Live Demo

© Copyright 2000-2016 TIBCO Software Inc.

Agenda

• Real World Use Cases• Introduction to Streaming Analytics

• Market Overview• Relation to other Big Data Components• Live Demo

© Copyright 2000-2016 TIBCO Software Inc.

Analyze and Act on Critical Business Moments

© Copyright 2000-2016 TIBCO Software Inc.

Success Story

Predictive Fault Management

© Copyright 2000-2013 TIBCO Software Inc.

“An outage on one well can cost $10M per hour. We have 20-100 outages per year.“

- Drilling operations VP, major oil company

Data Monitoring• Motor temperature• Motor vibration• Current• Intake pressure• Intake

temperature

Ø Flow

Electrical power cablePumpIntakeProtectorESP motorPump monitoring unit

Electric Submersible Pumps (ESP)

Predictive Analytics - Fault Management

Voltage

Temperature

Vibration

Devicehistory

Temporal analytic: “If vibration spike is followed by temp spike then voltage spike [within 4 hours] then flag high severity alert.”

Predictive Analytics - Fault Management

© Copyright 2000-2016 TIBCO Software Inc.

Live Surveillance of Equipment

Continuous, live geospatial display of pump health and predictive signal breeches

Alerts based on predictive signals

Compare live readings and signals to historical average and means

Continuous, live visualization of stats per 100’s of wells

© Copyright 2000-2016 TIBCO Software Inc.

Success Story

Crowd Management

© Copyright 2000-2013 TIBCO Software Inc.

“Turn the customer into a fan and increase revenue significantly.“

© Copyright 2000-2016 TIBCO Software Inc.

World’s Smartest Building

© Copyright 2000-2015 TIBCO Software Inc.

© Copyright 2000-2016 TIBCO Software Inc.

All Customers are different… Treat them that way…

14

Capture – Engage – Expand - Monetize

Patterns – Real time

MO

RE P

ERSO

NAL

MORE CONTEXT

social

CRM

POS mobilewebe-mails

© Copyright 2000-2016 TIBCO Software Inc.

Success Story

Smart Manufacturing

© Copyright 2000-2013 TIBCO Software Inc.

““For every 1% increase in shipped product, we make $11MM in profit. The demand is there, we just need to fulfill it.“

- Head of Quality, Solar Panel Manufacturer

Scenario: Predictive Scrapping of Parts in an Assembly Line

Goal: Scrap parts as early as possible automatically to reduce costs in a manufacturing process.

Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2?

Station 1 Station 2

Cost Before9€ 7€ 13€ Total Cost

29€(or more)

Scrap? Scrap?

Machine Learning Applied to Sensor Events in Real Time

© Copyright 2000-2016 TIBCO Software Inc.

Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)

© Copyright 2000-2016 TIBCO Software Inc.

Great success stories, but …

… how to realize these use cases?

© Copyright 2000-2016 TIBCO Software Inc.

Agenda

• Real World Use Cases• Introduction to Streaming Analytics

• Market Overview• Relation to other Big Data Components• Live Demo

© Copyright 2000-2016 TIBCO Software Inc.

Traditional Data Processing: ”Request – Response”

Store

Analyze

Act

© Copyright 2000-2016 TIBCO Software Inc.

Traditional Data Processing: ”Request – Response”

• Data is collected from a variety of sources, and placed in a persistent store.

– Relational database.– NoSQL store.– Hadoop environment.

• Analytical processes are executed against the stored data to detect opportunities or threats.

• Actions are identified, delivered, and executed across various business channels.

Store

Analyze

Act

© Copyright 2000-2016 TIBCO Software Inc.

Traditional Data Processing: Challenges

Store

Analyze

Act• Introduces too much “decision

latency” into the business.• Responses are delivered “after-the-

fact”.• Maximum value of the identified

situation is lost.– Cross-sell / up-sell opportunities are

lost, impending equipment failure is missed, business processes are slow to respond and lack timely context.

• Decisions are made on old and stale data.

© Copyright 2000-2016 TIBCO Software Inc.

Event Value Decreases Over TimeV

alue

Time

© Copyright 2000-2016 TIBCO Software Inc.

Event Value Decreases Over TimeV

alue

Time

• Events are often most valuable “close to” the point of collection.

• As time passes, events tend to lose their value.• The ability to proactively

identify “threats” or “opportunities” will typically decrease.

• Real-time capability is needed to maximize event value.

© Copyright 2000-2016 TIBCO Software Inc.

The New Era: Streaming Analytics

Act & Monitor

Analyze

Store

© Copyright 2000-2016 TIBCO Software Inc.

The New Era: Streaming Analytics

• Events are analyzed and processed in real-time as they arrive.

• Decisions are timely, contextual, and based on fresh data.

• Decision latency is eliminated, resulting in:ü Superior Customer Experienceü Operational Excellenceü Instant Awareness and Timely Decisions

Act & Monitor

Analyze

Store

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics: What Is A “Stream”?

Clickstream

Sensors

Social Data

Logs

• Consists of pieces of data typically generated due to a change of state.

• One or more identifiers• Timestamp & payload• Immutable

• Typically unbounded; there is no end to the data.

• Batch dataset: “bounded”.

• Can be raw or derived.

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics Processing Pipeline

APIs

Adapters / Channels

Integration

Messaging

Stream Ingest

Transformation

Aggregation

Enrichment

Filtering

StreamPreprocessing

Process Management

Analytics (Real Time)

Applications& APIs

Analytics / DW Reporting

StreamOutcomes

• Contextual Rules

• Windowing

• Patterns

• Deep ML

• Analytics

• …

Stream Analytics & Processing

Index / SearchNormalization

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics Processing Pipeline

Separation of concerns to easily adjust one part in response to

changing business requirementswithout the need for rewriting other parts!

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics: Ingest

APIs

Adapters / Channels

Integration

Messaging

Stream Ingest

• Stream data may come from a number sources, either at the edge, in the data center, or via the cloud.

• Need to handle a variety of data formats and protocols, all at global scale.

• Pay attention to “event time” vs. “processing time”!!

• Event Time: Time the event was created.• Processing Time: Time the event was received or processed.

• Event time is typically more relevant, and will lead to more predictable results.

• Eliminate time skew associated with clock synchronization, system outages, processing latency, network issues, etc.

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics: Preprocessing

Transformation

Aggregation

Enrichment

Filtering

StreamPreprocessing

Normalization • Stream data often needs to be manipulated before it is processed by downstream components.

• Normalization• Transformation

• May filter unwanted events close to the source to eliminate “noise”.

• Events may also be enriched with additional context to provide additional data for further processing.

• Customer details, equipment details, location information, etc.• Data may be stored in a high-speed cache or other store for rapid

access.

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics: ProcessingBa

tch • Transform

• Deep ML• Analytics• Data Lake• …

Stream Analytics & Processing

Real

-Tim

e • RT Analytics• Contextual

Rules• Windowing• Patterns• …

• Streams may be immediately pushed to a data lake.• May be raw or preprocessed.• Used for subsequent analysis as part of an immutable data layer.• Typically processed in batch in this part of the architecture.

• In parallel, streams may be processed in real-time against a number of constructs.

• Real-time analytics.• Graph analysis / Geo Analysis• Rules.

• Results from the real-time processing may be fed into the batch component.

• The results of batch processing may also be pushed into the real-time layer.

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics Processing Pipeline

APIs

Adapters / Channels

Integration

Messaging

Stream Ingest

Transformation

Aggregation

Enrichment

Filtering

StreamPreprocessing

Process Management

Analytics (Real Time)

Applications& APIs

Analytics / DW Reporting

StreamOutcomes

• Contextual Rules

• Windowing

• Patterns

• Deep ML

• Analytics

• …

Stream Analytics & Processing

Index / SearchNormalization

© Copyright 2000-2016 TIBCO Software Inc.

Dataflow Streaming Pipeline – Extract, Transform, Load in Real Time

https://www.linkedin.com/pulse/data-pipeline-hadoop-part-1-2-birender-saini

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics Processing Pipeline

APIs

Adapters / Channels

Integration

Messaging

Stream Ingest

Transformation

Aggregation

Enrichment

Filtering

StreamPreprocessing

Process Management

Analytics (Real Time)

Applications& APIs

Analytics / DW Reporting

StreamOutcomes

• Contextual Rules

• Windowing

• Patterns

• Deep ML

• Analytics

• …

Stream Analytics & Processing

Index / SearchNormalization

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics: “Windows”

https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

© Copyright 2000-2016 TIBCO Software Inc.

Automation and Augmented Intelligence for Humans

Actions by OperationsHumandecisionsinrealtimeinformed

byuptodateinformation

38

Automatedactionbasedonmodelsofhistorycombinedwithlivecontextandbusinessrules

Machine-to-Machine Automation

Big Data Reference Architecture

AugmentedIntelligence

Operations

SENSOR DATA

TRANSACTIONS

MESSAGE BUS

MACHINE DATA

SOCIAL DATA

StreamingAnalyticsAction

Aggregate

Rules

StreamProcessing

Analytics

Correlate

Continuousqueryprocessing

Alerts

Manualaction,escalation

DataDiscoveryPython

R

DataScientists

CleansedData

History

VisualAnalytics

Spark

Integration

ERP MDM DB WMS

SOA/Microservices

BIGDATA

DataWarehouse,Hadoop

InternalData

IntegrationBus

API

EventServer

H2O.ai

LiveUI

© Copyright 2000-2016 TIBCO Software Inc.

Agenda

• Real World Use Cases• Introduction to Streaming Analytics

• Market Overview• Relation to other Big Data Components• Live Demo

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics Market Growing Significantly

“Everything Flows:The value of stream processing

and streaming integration”(September 2016)

http://hortonworks.com/info/value-streaming-integration/

© Copyright 2000-2016 TIBCO Software Inc.

Alternatives for Stream Processing

Timeto

Market

StreamingFrameworks

StreamingProducts

Slow Fast

StreamingConcepts

IncludesIncludes

© Copyright 2000-2016 TIBCO Software Inc.

Alternatives for Stream Processing

Concepts (Continuous Queries, Sliding Windows)Patterns (Counting, Sequencing, Tracking, Trends)

Build everything by yourself! L

Timeto

Market

StreamingFrameworks

StreamingProducts

Slow Fast

StreamingConcepts

© Copyright 2000-2016 TIBCO Software Inc.

Usually not an option ...

… as there are a lot of

Frameworks and

Products available!

© Copyright 2000-2016 TIBCO Software Inc.

Alternatives for Stream Processing

Library (Java, .NET, Python)Query Language (often similar to SQL)

Scalability (horizontal and vertical, fail over) Connectivity (technologies, markets, products)

Operators (Filter, Sort, Aggregate)

Timeto

Market

StreamingFrameworks

StreamingProducts

Slow Fast

StreamingConcepts

Different frameworks (ingest, preprocess, analytics)

combined!

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics Processing Pipeline

APIs

Adapters / Channels

Integration

Messaging

Stream Ingest

Transformation

Aggregation

Enrichment

Filtering

StreamPreprocessing

Process Management

Analytics (Real Time)

Applications& APIs

Analytics / DW Reporting

StreamOutcomes

• Contextual Rules

• Windowing

• Patterns

• Deep ML

• Analytics

• …

Stream Analytics & Processing

Index / SearchNormalization

© Copyright 2000-2016 TIBCO Software Inc.

Example for an Open Source Streaming Pipeline

http://hortonworks.com/hadoop-tutorial/realtime-event-processing-nifi-kafka-storm

“Realtime Event Processing in Hadoop with Apache NiFi, Kafka and Storm”

Dataflow Streaming Pipeline (Ingest, Preprocess)

AugmentedIntelligence

Operations

SENSOR DATA

TRANSACTIONS

MESSAGE BUS

MACHINE DATA

SOCIAL DATA

StreamingAnalyticsAction

Aggregate

Rules

StreamProcessing

Analytics

Correlate

Continuousqueryprocessing

Alerts

Manualaction,escalation

DataDiscoveryPython

R

DataScientists

CleansedData

History

VisualAnalytics

Spark

Integration

ERP MDM DB WMS

SOA/Microservices

BIGDATA

DataWarehouse,Hadoop

InternalData

IntegrationBus

API

EventServer

H2O.ai

LiveUI

© Copyright 2000-2016 TIBCO Software Inc.

Open Source Dataflow Streaming Pipelines

Streaming Analytics

AugmentedIntelligence

Operations

SENSOR DATA

TRANSACTIONS

MESSAGE BUS

MACHINE DATA

SOCIAL DATA

StreamingAnalyticsAction

Aggregate

Rules

StreamProcessing

Analytics

Correlate

Continuousqueryprocessing

Alerts

Manualaction,escalation

DataDiscoveryPython

R

DataScientists

CleansedData

History

VisualAnalytics

Spark

Integration

ERP MDM DB WMS

SOA/Microservices

BIGDATA

DataWarehouse,Hadoop

InternalData

IntegrationBus

API

EventServer

H2O.ai

LiveUI

© Copyright 2000-2016 TIBCO Software Inc.

Frameworks and Products (no complete list!)

OPEN SOURCE CLOSED SOURCE

PRODUCT

FRAMEWORK

Azure MicrosoftStream Analytics

Google CloudDataflow

© Copyright 2000-2016 TIBCO Software Inc.

Frameworks and Products (no complete list!)

OPEN SOURCE CLOSED SOURCE

PRODUCT

FRAMEWORK

Azure MicrosoftStream Analytics

Google CloudDataflow

© Copyright 2000-2016 TIBCO Software Inc.

Apache Storm

Spout Bolt

© Copyright 2000-2016 TIBCO Software Inc.

Apache Storm – Hello World

http://wpcertification.blogspot.ch/2014/02/helloworld-apache-storm-word-counter.html

© Copyright 2000-2016 TIBCO Software Inc.

AWS Kinesis – Integration with other AWS Components

https://aws.amazon.com/kinesis/

AWS S3 RedShift DynamoDB

© Copyright 2000-2016 TIBCO Software Inc.

AWS Kinesis – Hello World

© Copyright 2000-2016 TIBCO Software Inc.

AWS Kinesis – Public Cloud Trade-Off

… is easy to setup and scale.

But you do not have full control! L

• Any data that is older than 24 hours is automatically deleted• Every Kinesis application consists of just one procedure, so you can’t use Kinesis

to perform complex stream processing unless you connect multiple applications• Kinesis can only support a maximum size of 50KB for each data item

http://diamondstream.com/amazon-kinesis-big-real-time-data-processing-solution/

(blog post from 2014, might be outdated, but shows that you do not have full control over a cloud service)

© Copyright 2000-2016 TIBCO Software Inc.

Apache Spark

General Data-processing Frameworkà However, focus is especially on Analytics (these days)

x

© Copyright 2000-2016 TIBCO Software Inc.

Apache Spark – Focus on Analytics

http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/http://fortune.com/2015/09/09/cloudera-spark-mapreduce/http://www.ebaytechblog.com/2014/05/28/using-spark-to-ignite-data-analytics/http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/

“[IBM’s initiatives] include:• deepening the integration between Apache

Spark and existing IBM products like the Watson Health Cloud;

• open sourcing IBM’s existing SystemMLmachine learning technology;

© Copyright 2000-2016 TIBCO Software Inc.

Spark Streaming

Spark Streaming• is no real streaming solution• uses micro-batches• cannot process data in real-time (i.e. no ultra-low latency)• allows easy combination with other Spark components (SQL, Machine Learning, etc.)

© Copyright 2000-2016 TIBCO Software Inc.

Apache Spark – Hello World

Spark Streaming API

Spark Core API

© Copyright 2000-2016 TIBCO Software Inc.

Apache Spark – as a Cloud Service

© Copyright 2000-2016 TIBCO Software Inc.

Apache Flink

Spark Streaming• „Newcomer“• Looks very similar to Spark• But „Streaming First“ concept

© Copyright 2000-2016 TIBCO Software Inc.

Apache Beam

Generic API with unified programming model for stream processing frameworks

http://www.slideshare.net/DataTorrent/apache-beam-incubating-67428372

© Copyright 2000-2016 TIBCO Software Inc.

Frameworks and Products (no complete list!)

OPEN SOURCE CLOSED SOURCE

PRODUCT

FRAMEWORK

Azure MicrosoftStream Analytics

Google CloudDataflow

Alternatives for Stream Processing

Library (Java, .NET, Python)Query Language (often similar to SQL)

Scalability (horizontal and vertical, fail over) Connectivity (technologies, markets, products)

Operators (Filter, Sort, Aggregate)

Timeto

Market

StreamingFrameworks

StreamingProducts

Slow Fast

StreamingConcepts

Single Tool (Complete Processing Pipeline)Visual IDE (Dev, Test, Debug)

Simulation (Feed Testing, Test Generation)Live UI (monitoring, proactive interaction)

Maturity (24/7 support, consulting)Integration (out-of-the-box: ESB, MDM, Analytics, etc.)

© Copyright 2000-2016 TIBCO Software Inc.

Streaming Analytics Processing Pipeline

APIs

Adapters / Channels

Integration

Messaging

Stream Ingest

Transformation

Aggregation

Enrichment

Filtering

StreamPreprocessing

Process Management

Analytics (Real Time)

Applications& APIs

Analytics / DW Reporting

StreamOutcomes

• Contextual Rules

• Windowing

• Patterns

• Deep ML

• Analytics

• …

Stream Analytics & Processing

Index / SearchNormalization

Dataflow Streaming Pipeline + Streaming Analytics

AugmentedIntelligence

Operations

SENSOR DATA

TRANSACTIONS

MESSAGE BUS

MACHINE DATA

SOCIAL DATA

StreamingAnalyticsAction

Aggregate

Rules

StreamProcessing

Analytics

Correlate

Continuousqueryprocessing

Alerts

Manualaction,escalation

DataDiscoveryPython

R

DataScientists

CleansedData

History

VisualAnalytics

Spark

Integration

ERP MDM DB WMS

SOA/Microservices

BIGDATA

DataWarehouse,Hadoop

InternalData

IntegrationBus

API

EventServer

H2O.ai

LiveUI

© Copyright 2000-2016 TIBCO Software Inc.

IBM Streams

© Copyright 2000-2016 TIBCO Software Inc.

TIBCO StreamBase

• Performance: Latency, Throughput, Scalability• Multi-threaded and clustered server from version 1• High throughput: Millions of messages, 100,000s of quotes, 10,000s of orders• Low-latency: microsecond latency for algo trading, pre-trade risk, market data

• Take Advantage of High Performance Hardware• Multicore (12, 24, 32 core) large memory (10s of gigabytes)• 64-bit Linux, Windows, Solaris deployment• Hardware acceleration (GPU, Solace, Tervela)

• Enterprise Deployment• High availability and fault tolerance• Distributed state management for large data sets• Management and monitoring tools• Security and entitlements Integration• Continuous deployment and QA Process Support

StreamSQLcompilerandstaticoptimizer

Inprocess,inthreadadapterarchitecture

Visualparallelismandscaling

In-MemoryDataGridintegrationfor

distributedsharedstate

Dataparallelismanddispatch

StreamBaseServerInnovations

© Copyright 2000-2016 TIBCO Software Inc.

TIBCO StreamBase - Visual Programming

Aggregate

Capturecardactivationsperlocation

Salestoohighà Fraud

Logtoanydatabase NoFraud

Salestoohigh?

Visual DebuggerFeed Simulation

Unit Testing

StreamBase Development StudioTIBCO StreamBase - Visual Programming

Live UI for Augmented Intelligence

AugmentedIntelligence

Operations

SENSOR DATA

TRANSACTIONS

MESSAGE BUS

MACHINE DATA

SOCIAL DATA

StreamingAnalyticsAction

Aggregate

Rules

StreamProcessing

Analytics

Correlate

Continuousqueryprocessing

Alerts

Manualaction,escalation

DataDiscoveryPython

R

DataScientists

CleansedData

History

VisualAnalytics

Spark

Integration

ERP MDM DB WMS

SOA/Microservices

BIGDATA

DataWarehouse,Hadoop

InternalData

IntegrationBus

API

EventServer

H2O.ai

LiveUI

© Copyright 2000-2016 TIBCO Software Inc.

Live User Interface

Live UI

Continuous Query Processor Alerts

CEP

MQTT

JMS

In-MemoryDataGrid

Integration

SocialMediaData

MarketData

SensorData

HistoricalData

In-MemoryDataGrid

EnterprisedataMarket Data

IoT

Mobile

Social

Browser / AppCommand & Control

ACTION

ContinuousQuery

© Copyright 2000-2016 TIBCO Software Inc.

Live UI in Desktop / Web Browser / Mobile App

Dynamic aggregation

Live visualization

Ad-hoc continuous query

Alerts

Action

© Copyright 2000-2016 TIBCO Software Inc.

Live UI - ProductsCharacteristics to Check• Alternative clients (rich client, browser,

mobile app)• Maturity for enterprise use cases• Performance and scalability• “Big data native” deployment (YARN, Mesos)• Monitoring and proactive actions• Streaming engine under the hood (not just

visualization layer)• New Ad-hoc queries by the business user

(without the help of IT department) • Various visual components• Extendibility (graphical designer vs. coding)

… or build your own solution using Websockets, Angular JS, etc.

© Copyright 2000-2016 TIBCO Software Inc.

Spoilt for Choice

Does it make sense to combine frameworks

and products?

© Copyright 2000-2016 TIBCO Software Inc.

Customer Example: Apache Storm + TIBCO Live Datamart

External Data

Snapshot Results

Continuous Query Processor

Query

TIBCO Live Datamart

Continuous Alerting

Active Tables Active Tables

Continuous Updates

Clients

Message Bus

Public Data

Customer Data

StreamBaseBolt

StreamBaseSpout

OperationalData

StreamBase Bolt and Spout connect Apache Storm to StreamBase to provide real-time analytics on operational data

© Copyright 2000-2016 TIBCO Software Inc.

Agenda

• Real World Use Cases• Introduction to Streaming Analytics

• Market Overview• Relation to other Big Data Components• Live Demo

© Copyright 2000-2016 TIBCO Software Inc.

Closed Loop: Understand – Anticipate – Act

© Copyright 2000-2016 TIBCO Software Inc.

Closed Loop: Understand – Anticipate – Act

Insights Actions

MONITOR

PREDICT

ACT

DECIDE

MODEL

ORGANIZE

ANALYZE

WRANGLE

Data Discovery via Visual Analytics, Big Data and Machine Learning

AugmentedIntelligence

Operations

SENSOR DATA

TRANSACTIONS

MESSAGE BUS

MACHINE DATA

SOCIAL DATA

StreamingAnalyticsAction

Aggregate

Rules

StreamProcessing

Analytics

Correlate

Continuousqueryprocessing

Alerts

Manualaction,escalation

DataDiscoveryPython

R

DataScientists

CleansedData

History

VisualAnalytics

Spark

Integration

ERP MDM DB WMS

SOA/Microservices

BIGDATA

DataWarehouse,Hadoop

InternalData

IntegrationBus

API

EventServer

H2O.ai

LiveUI

Find Insights and Patterns in Historical Data

Visual Analytics + Machine Learning

Apply Insights and Analytic Models to Proactive Actions

Streaming AnalyticsH20.ai

Open Source R

TERR

Spark ML

MATLAB

SAS

PMML

© Copyright 2000-2013 TIBCO Software Inc.

80% of betting happens

AFTER the game begins

TODAY

Case Study: Streaming Analytics for Betting

• Situation: Today, 80% of Betting is Done After the Game Starts

• It’s not your father’s bookie anymore!

• Problem: How to Analyze Big Betting Data?• Thousands of concurrent games, constantly adjusting odds, dozens of

betting networks – firms must correlate millions of events a day to find the best betting opportunities in real-time

• Solution: TIBCO for Fast Data Architecture• TXOdds uses TIBCO to correlate, aggregate, and analyze large

volumes of streaming betting data in real-time and publish innovative predictive betting analytics to their customers

• Result: TXOdds First to Market with Innovative Zero Latency Betting Analytics

• Innovative real-time analytics help players who can process electronic data in real-time the edge

“With StreamBase, in two months we had our first betting analytics feed live, and we continually deploy new ideas and evolve our old ones.”

- Alex Kozlenkov, VP of technology, TXOdds

© Copyright 2000-2016 TIBCO Software Inc.

Big Data Architecture for Streaming Betting Analytics

Event Processing

MONITOR

REAL-TIME ANALYTICS

AGGREGATE

HISTORICAL COMPARISON

Predictive odds analytics

Zero Latency Betting Analytics

GLOBAL, DISTRIBUTED INFRASTRUCTURE

Historical odds deviations

BUS

BETTING LINES

SCORES

NEWS

HADOOP

Context: Historical Betting

Data, Odds, Outcomes

BUS

CACHE CACHE CACHE

Real-Time Analytics

CORRELATE

Live Datamart

SOCIAL

Real-Time Social Media Analytics

Twitter (#TomBradyBrokenLeg)

Twitter (#Boston)

Brady’s Stats

Actionable Insights

Twitter (#NFL)Something relevant happening?

Every second counts!

Change Odds (automated or manually triggered):

Stop live-betting for the current running game?• Who will win the game?• How many interceptions will the Quarterback throw?• Will the Patriots win the Super Bowl?• …

© Copyright 2000-2016 TIBCO Software Inc.

Real-Time Social Media Analytics

© Copyright 2000-2016 TIBCO Software Inc.

Agenda

• Real World Use Cases• Introduction to Streaming Analytics

• Market Overview• Relation to other Big Data Components• Live Demo

Scenario: Predictive Scrapping of Parts in an Assembly Line

Goal: Scrap parts as early as possible automatically to reduce costs in a manufacturing process.

Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2?

Station 1 Station 2

Cost Before9€ 7€ 13€ Total Cost

29€(or more)

Scrap? Scrap?

Big Data Architecture for Predictive Maintenance

OperationalAnalytics

OperationsLiveUI

CSV Batch

JSON Real Time

XML Real Time

StreamingAnalyticsAction

Aggregate

Rules

Analytics

Correlate

LiveDatamart

Continuousqueryprocessing

Alerts

Manualaction,escalation

HISTORICALANALYSIS DataScientists

FlumeHDFS

Spotfire

R/TERRHDFS

Hadoop (Cloudera)

StreamBase

TIBCO Fast Data Platform

H2O

OracleRDBMS

Avro Parquet … PMML

InternalData

Find Patterns à TIBCO Spotfire with H2O Integration

© Copyright 2000-2016 TIBCO Software Inc.

Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)

© Copyright 2000-2016 TIBCO Software Inc.

Apply Patterns à TIBCO StreamBase Connector for H2O.ai

Monitor Patterns à TIBCO Live Datamart

Augmented Intelligence (“Monitor the manufacturing process and change rules in real time!”)

Live Dartmart Desktop Client

Monitor Patterns à TIBCO Live Datamart

Augmented Intelligence (“Monitor the manufacturing process and change rules in real time!”)

Live Dartmart Web API

TIBCO Spotfire + StreamBase + Live Datamart + H2O.ai

Live DemoLive Demo

© Copyright 2000-2016 TIBCO Software Inc.

Key Take-Aways

• Streaming Analytics processes Data while it is in Motion!

• Automation and Proactive Human Interaction are BOTH needed!

• Streaming Analytics is Complementary to Hadoop and Machine Learning!

Questions? Please contact me!

Kai WähnerTechnology Evangelist

[email protected]@KaiWaehnerwww.kai-waehner.deLinkedIn