Streaming Visualization

Guido Schmutz DOAG Big Data 2018 – 20.9.2018

@gschmutz guidoschmutz.wordpress.com

Guido Schmutz

Working at Trivadis for more than 21 years

Oracle ACE Director for Fusion Middleware and SOA

Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data

Head of Trivadis Architecture Board

Technology Manager @ Trivadis

More than 30 years of software development experience

Contact: guido.schmutz@trivadis.com

Blog: http://guidoschmutz.wordpress.com

Slideshare: http://www.slideshare.net/gschmutz

Twitter: gschmutz

Agenda

1. Visualization in Big Data Reference Architecture

2. How to implement "Data-in-Motion"?

3. Blueprints for Streaming Visualization

4. Blueprints for Stream Visualization – Implementation

Visualization in Big Data Reference Architecture

Data Value Chain

Milliseconds • Place Trace • Serve ad • Enrich Stream • Approve Trans

Hundredths of Seconds • Calculate Risk • Leaderboard • Aggregate • Count

Second(s) • Retrieve Click Stream • Show orders

Minutes • Backtest algo • BI • Daily Reports

Hours • Algo discovery • Log analysis • Fraud pattern match

Architectures of Big Data Applications

Traditional BI Infrastructures

[Diagram: Bulk Source, Extract, ETL / Stored Procedures, Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps; annotated "high latency"]

[Diagram: Bulk Source, File Import / SQL Import, Extract, Big Data Platform (Hadoop Cluster with Parallel Processing, Storage, Results), Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps; annotated "high latency"]

Big Data solves Volume and Variety – not Velocity

Introduction to Stream Processing

[Diagram: Bulk Source, Extract, Event Source (Location, Telemetry, Mobile, Social), Event Stream, Big Data Platform (Parallel Processing, Storage, Results), Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps; annotated "high latency"]

[Diagram: Bulk Source, Extract, Event Source (Location, Mobile, Social, Telemetry), Event Stream, Event Hub, Big Data Platform (Parallel Processing, Storage, Results; Machine Learning, Graph Algorithms, Natural Language Processing), Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps; annotated "high latency"]

"Data at Rest" vs. "Data in Motion"

[Diagram: Data at Rest compared with Data in Motion, built from the steps Store, Analyze and Act]

Stream Processing Architecture solves Velocity

[Diagram: Bulk Source, Extract, Event Source (Location, Mobile, Social, Telemetry), Stream, Event Hub, Stream Analytics Platform, Stream Analytics, Reference / Models, Search, Results, Dashboard, Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps; annotated "Low(est) latency, no history"]

Big Data for all historical data analysis

[Diagram: Bulk Source, Extract, Event Source (Location, Mobile, Social, Telemetry), Event Stream, Data Flow, Stream Analytics Platform, Reference / Models, Search, Dashboard, Stream, Big Data Platform (Parallel Processing, Storage, Results), Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps]

Integrate existing systems through CDC

[Diagram: Traditional Silo-based System, User Interface, Logic, Data Store, CDC Connector, Event Hub, Integration, Consuming Systems, State, Logic]

Capture changes directly on the database

Change Data Capture (CDC): think of it as a global database trigger

Transform existing systems into event producers
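To make the "global database trigger" analogy concrete, here is a minimal sketch in Python. It is not how a real CDC connector works internally (those typically read the database's transaction log instead of wrapping writes). The `ChangeCapturingTable` class, the `ChangeEvent` shape and the in-memory `event_hub` list are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured row change (hypothetical event shape)."""
    op: str                      # "insert", "update" or "delete"
    key: Any
    before: Optional[dict]       # row image before the change, if any
    after: Optional[dict]        # row image after the change, if any


class ChangeCapturingTable:
    """Toy table that publishes every change, like a trigger on all writes."""

    def __init__(self, event_hub: list):
        self._rows: dict = {}
        self._event_hub = event_hub   # stand-in for a topic on the Event Hub

    def upsert(self, key, row: dict) -> None:
        before = self._rows.get(key)
        self._rows[key] = dict(row)
        op = "insert" if before is None else "update"
        self._event_hub.append(ChangeEvent(op, key, before, dict(row)))

    def delete(self, key) -> None:
        before = self._rows.pop(key, None)
        if before is not None:
            self._event_hub.append(ChangeEvent("delete", key, before, None))


event_hub: list = []
orders = ChangeCapturingTable(event_hub)
orders.upsert(1, {"status": "new"})
orders.upsert(1, {"status": "shipped"})
orders.delete(1)
cdc_ops = [event.op for event in event_hub]
```

The existing application keeps writing to its table as before, while every change also becomes an event that consuming systems can subscribe to.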

Integrate existing systems with lower latency through CDC

[Diagram: Bulk Source, Extract, Event Source (Location, Mobile, Social, Telemetry), Event Stream, Data Flow, Stream Analytics Platform, Reference / Models, Search, Dashboard, Stream, Big Data Platform (Parallel Processing, Storage, Results), Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps]

Unified Architecture for Modern Data Analytics Solutions

[Diagram: Bulk Source, Extract, Event Source (Location, Mobile, Social, Telemetry), Event Stream, Edge Node, Event Hub, Storage, Stream Analytics, Microservices, Microservice and Stream Processor (each with State), Service, Big Data (Parallel Processing, Storage, Results), Search, Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps]

Two Types of Stream Processing (from Gartner)

Stream Data Integration

• primarily focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases

• filters and enriches the data

• optionally calculates time-windowed aggregations before storing the results in a database or file system
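The three steps named above (filter, enrich, time-windowed aggregation) can be sketched as follows. This is a simplified batch-style illustration over a finite list, not a streaming engine. The event fields (`ts`, `sensor`, `value`), the `SENSOR_LOCATION` lookup and the 60-second tumbling window are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical reference data used for enrichment.
SENSOR_LOCATION = {"s1": "Basel", "s2": "Bern"}


def integrate(events, window_seconds=60):
    """Filter, enrich and aggregate events into tumbling time windows.

    Each event is a dict with "ts" (epoch seconds), "sensor" and "value".
    Returns {(window_start, location): (count, average_value)}.
    """
    sums = defaultdict(lambda: [0, 0.0])   # key -> [count, sum of values]
    for event in events:
        if event["value"] is None:
            continue                        # filter: drop incomplete readings
        # Enrich: add the location from reference data.
        location = SENSOR_LOCATION.get(event["sensor"], "unknown")
        # Assign the event to the tumbling window that contains its timestamp.
        window_start = event["ts"] - event["ts"] % window_seconds
        bucket = sums[(window_start, location)]
        bucket[0] += 1
        bucket[1] += event["value"]
    return {key: (count, total / count) for key, (count, total) in sums.items()}


readings = [
    {"ts": 0, "sensor": "s1", "value": 10.0},
    {"ts": 30, "sensor": "s1", "value": 20.0},
    {"ts": 45, "sensor": "s2", "value": None},
    {"ts": 61, "sensor": "s1", "value": 30.0},
]
windowed = integrate(readings)
```

In a stream data integration pipeline the resulting rows would then be written to a database or file system.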

Stream Analytics

• targets analytics use cases

• calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events)

• Complex events may signify threats or opportunities that require a response from the business through real-time dashboards, alerts or decision automation
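As an illustration of deriving a complex event from a pattern in lower-level events, the sketch below raises an alert when one user produces several failed logins within a short time. The scenario, the tuple layout, the thresholds and the `suspicious_login_burst` event name are all assumptions for the example, not taken from the talk.

```python
from collections import defaultdict, deque


def detect_bursts(events, threshold=3, within_seconds=60):
    """Yield a complex event when one user has `threshold` failures in the window.

    `events` is an iterable of (timestamp_seconds, user, outcome) in time order.
    """
    recent = defaultdict(deque)   # user -> timestamps of recent failures
    for ts, user, outcome in events:
        if outcome != "failed":
            continue
        failures = recent[user]
        failures.append(ts)
        while failures and ts - failures[0] > within_seconds:
            failures.popleft()    # forget failures that fell out of the window
        if len(failures) >= threshold:
            yield {"type": "suspicious_login_burst", "user": user, "at": ts}
            failures.clear()      # start counting again after raising the alert


logins = [
    (0, "alice", "failed"),
    (10, "bob", "failed"),
    (20, "alice", "failed"),
    (25, "alice", "ok"),
    (40, "alice", "failed"),
    (200, "bob", "failed"),
]
alerts = list(detect_bursts(logins))
```

The single alert summarizes three raw events and is the kind of higher-level information a dashboard or an automated response would consume.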

How to implement "Data-in-Motion"?

"Data-in-Motion" Ecosystem

[Diagram: landscape of open source and closed source products, grouped into Stream Analytics, Event Hub and Stream Data Integration. Source: adapted from Tibco]

Apache Kafka – A Streaming Platform

High-Level Architecture

Distributed Log at the Core

Scale-Out Architecture

Logs do not (necessarily) forget
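The idea of a log at the core, where reading does not remove data, can be shown with a toy in-memory model. This is not Kafka's API and it ignores partitions, replication and retention. The `Log` and `Consumer` classes are hypothetical names used only to show append-only storage with per-consumer offsets.

```python
class Log:
    """Append-only log; records keep their offset and are not removed on read."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)
        return len(self._records) - 1      # offset of the new record

    def read(self, offset: int, max_records: int = 10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own position, so readers do not affect each other."""

    def __init__(self, log: Log, offset: int = 0):
        self._log = log
        self.offset = offset

    def poll(self):
        records = self._log.read(self.offset)
        self.offset += len(records)
        return records


log = Log()
for reading in ["t=1", "t=2", "t=3"]:
    log.append(reading)

dashboard = Consumer(log)
first_poll = dashboard.poll()          # reads all three records
second_poll = dashboard.poll()         # nothing new yet

late_joiner = Consumer(log, offset=0)  # a new consumer can replay from the start
replayed = late_joiner.poll()
```

Because the records stay in the log, a visualization that connects later can still rebuild its view from earlier events, as long as they are retained.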

Blueprints for Stream Visualization

1) Direct Streaming to the Consumer

[Diagram, labelled "Data in Motion": Data Sources, Data Flow, Event Hub, Integration, Stream Analytics, Channel, Streaming Visualization, Consumer]

2) Use a fast datastore and do regular polling from consumer

[Diagram: Data Sources, Data Flow, Event Hub, Integration, Stream Analytics, Data Store, API, Streaming Visualization, Consumer]
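A minimal sketch of this polling blueprint, assuming an in-memory stand-in for the data store: the `FastStore` and `PollingDashboard` classes and the sequence-number scheme are hypothetical and only illustrate how a consumer can fetch just the rows it has not yet seen.

```python
class FastStore:
    """In-memory stand-in for the fast data store written by stream analytics."""

    def __init__(self):
        self._rows = []          # list of (sequence_number, payload)
        self._next_seq = 1

    def write(self, payload) -> None:
        self._rows.append((self._next_seq, payload))
        self._next_seq += 1

    def read_since(self, last_seq: int):
        """API call used by the consumer: everything newer than `last_seq`."""
        return [row for row in self._rows if row[0] > last_seq]


class PollingDashboard:
    def __init__(self, store: FastStore):
        self._store = store
        self._last_seq = 0       # highest sequence number seen so far
        self.points = []

    def poll_once(self) -> int:
        """One polling cycle; a real client would call this on a timer."""
        new_rows = self._store.read_since(self._last_seq)
        for seq, payload in new_rows:
            self.points.append(payload)
            self._last_seq = seq
        return len(new_rows)


store = FastStore()
dashboard = PollingDashboard(store)
store.write({"orders_per_minute": 12})
store.write({"orders_per_minute": 15})
fetched_first = dashboard.poll_once()
fetched_again = dashboard.poll_once()
store.write({"orders_per_minute": 9})
fetched_later = dashboard.poll_once()
```

The trade-off is visible in the second poll: the consumer pays for a round trip even when nothing has changed, and new data is only seen at the next polling interval.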

3) Use stateful Stream Analytics and query it directly

[Diagram: Event Hub, Integration, Stream Analytics, API, Streaming Visualization]
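In this third blueprint there is no separate data store: the stream processor keeps its aggregates as local state and an API exposes that state to the visualization. The sketch below assumes a simple page-view counter. The `StatefulCounter` class and its `query` and `top` methods are hypothetical and do not correspond to any specific product API.

```python
from collections import Counter


class StatefulCounter:
    """Stream processor that keeps its aggregate as queryable local state."""

    def __init__(self):
        self._counts = Counter()

    def process(self, event: dict) -> None:
        """Called for every event arriving from the event hub."""
        self._counts[event["page"]] += 1

    def query(self, page: str) -> int:
        """What an API in front of the processor would expose to the dashboard."""
        return self._counts[page]

    def top(self, n: int = 3):
        """Most viewed pages so far, as (page, count) pairs."""
        return self._counts.most_common(n)


processor = StatefulCounter()
for click in [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]:
    processor.process(click)

home_views = processor.query("/home")
ranking = processor.top(2)
```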

Blueprints for Stream Visualization – Implementation

Visualization: many many options! But do they support Streaming Data?

Oracle Stream Analytics

[Diagram: Data Sources, Data Flow, Event Hub, Integration, Stream Analytics, Channel, Streaming Visualization, Consumer]

Oracle Stream Analytics

• Stream Analytics and Visualization in

• offers real-time actionable business insight on streaming data

• automates action to drive today’s agile businesses (business user)

• Runs on top of Spark Streaming

• Cloud and on-premises

• Data Sources: Kafka, JMS, GoldenGate,

WebSockets / SSE / Custom JavaScript Application

[Diagram: Data Flow, Event Hub, Integration, Stream Analytics, Channel, Streaming Visualization, Consumer; labelled "Server-Sent Events (SSE)"]

Slack / WhatsApp / Twitter / …

[Diagram: Data Flow, Event Hub, Integration, Stream Analytics, Channel, Streaming Visualization, Consumer]

WebSockets vs. Server-Sent Events (SSE)

WebSockets

• provide a richer protocol to perform bi-directional, full-duplex communication

• require full-duplex connections and new WebSocket servers to handle the protocol

• Having a two-way channel is more attractive for things like games, messaging apps, and for cases where you need near real-time updates in both directions

Server-Sent Events (SSE)

• SSEs are sent over traditional HTTP

• do not require a special protocol or server implementation to get working

• sufficient if only one direction (server to client) is necessary

• Server-Sent Events, on the other hand, have been designed from the ground up to be efficient
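Because SSE is plain text over HTTP (content type `text/event-stream`), the wire format is easy to show. The sketch below encodes and decodes messages using the standard `id:`, `event:` and `data:` fields. The helper names `format_sse` and `parse_sse` are made up for this example, and the parser is deliberately simplified (it ignores comment lines and the `retry:` field).

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload with line breaks is sent as several "data:" lines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"    # a blank line terminates the message


def parse_sse(stream: str):
    """Decode messages again (simplified: ignores comments and "retry:")."""
    messages = []
    for block in stream.split("\n\n"):
        if not block:
            continue
        message = {"data": []}
        for line in block.split("\n"):
            field, _, value = line.partition(": ")
            if field == "data":
                message["data"].append(value)
            else:
                message[field] = value
        message["data"] = "\n".join(message["data"])
        messages.append(message)
    return messages


wire = (format_sse('{"speed": 80}', event="telemetry", event_id="1")
        + format_sse("line1\nline2"))
decoded = parse_sse(wire)
```

A server would write such strings to an open HTTP response, and the browser's `EventSource` object would deliver each message to the dashboard code.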

KSQL / REST API / Custom App

[Diagram: Event Hub, Integration, Stream Analytics, API, Streaming Visualization]

KSQL & Arcadia Data

[Diagram: Event Hub, Integration, Stream Analytics, API, Streaming Visualization]

Arcadia Data

• Combines Batch and Streaming Visualization in one

• Streaming Visualizations based on Confluent KSQL (Kafka)

• Arcadia Instant and Arcadia Enterprise

Druid & Superset / Imply

[Diagram: Data Flow, Event Hub, Integration, Stream Analytics, Visualization]

What is Druid?

• Open Source Time Series DB by Metamarkets

• Apache Incubating

• Column-Oriented Storage

• Streaming and Batch Ingest

• Time optimized partitioning

• SQL Support

• Deep Storage can be HDFS / S3

Imply

• Commercial offering of Druid

• Built around Apache Druid

• Analytics, search and intelligence for event-driven data

Superset

• Open source data visualization tool by Airbnb

• Apache incubator

• Superset supports 30 types of visualizations

• easy-to-use interface for exploring and visualizing data

• Create and share dashboards

• Deep integration with Druid

• Integration with most SQL-speaking RDBMS through SQLAlchemy

Elasticsearch / Kibana

[Diagram: Data Flow, Event Hub, Integration, Stream Analytics, Visualization]

Elasticsearch / Kibana

Elasticsearch

• NoSQL store

• a distributed, RESTful search and analytics engine

• centrally stores your data so you can discover the expected and uncover the unexpected

• lets you perform and combine many types of searches: structured, unstructured, geo, metric

• aggregations let you zoom out to explore trends and patterns in your data

Kibana

• Window into Elasticsearch

• Enables visual exploration and analysis of data stored in Elasticsearch

InfluxDB / Grafana or Chronograf

[Diagram: Data Flow, Event Hub, Integration, Stream Analytics, Visualization]

InfluxDB

• Popular Time Series Database

• Open source as well as Commercial offering

Chronograf

Grafana

Grafana lets you query, visualize, alert on and understand metrics, independent of where they are stored

Supports various data sources

• Elasticsearch

• InfluxDB

• Prometheus

• OpenTSDB

• MySQL

• …

Technology on its own won't help you. You need to know how to use it properly.
