Post on 28-Oct-2019

BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF

HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH

Streaming Visualization

Guido Schmutz DOAG Big Data 2018 – 20.9.2018

@gschmutz guidoschmutz.wordpress.com

Guido Schmutz

Working at Trivadis for more than 21 years

Oracle ACE Director for Fusion Middleware and SOA

Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data

Head of Trivadis Architecture Board

Technology Manager @ Trivadis

More than 30 years of software development experience

Contact: guido.schmutz@trivadis.com

Blog: http://guidoschmutz.wordpress.com

Slideshare: http://www.slideshare.net/gschmutz

Twitter: gschmutz

Agenda

1. Visualization in Big Data Reference Architecture

2. How to implement „Data-in-Motion“?

3. Blueprints for Streaming Visualization

4. Blueprints for Stream Visualization – Implementation

Visualization in Big Data Reference Architecture

Data Value Chain

• Milliseconds: Place Trade, Serve ad, Enrich Stream, Approve Trans

• Hundredths of Seconds: Calculate Risk, Leaderboard, Aggregate, Count

• Second(s): Retrieve Click Stream, Show orders

• Minutes: Backtest algo, BI, Daily Reports

• Hours: Algo discovery, Log analysis, Fraud pattern match

Architectures of Big Data Applications

Traditional BI Infrastructures

[Diagram: Bulk Source (DB, Extract, File) feeds ETL / Stored Procedures, which load the Enterprise Data Warehouse; BI Tools, Search / Explore and Enterprise Apps (Logic, API) sit on top. The path is labelled "high latency".]

Big Data solves Volume and Variety – not Velocity

[Diagram: Bulk Source (DB, Extract, File) reaches a Big Data Platform on a Hadoop Cluster via File Import / SQL Import. The platform combines Parallel Processing with Storage (Raw, Refined, Results) and serves the Enterprise Data Warehouse, BI Tools, Search / Explore and Enterprise Apps (Logic, API) through SQL. The path is labelled "high latency".]

Introduction to Stream Processing

[Diagram, next build of the same slide: an Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social) is added next to the Bulk Source.]

[Diagram, final build of the same slide: an Event Stream and Event Hubs are added between the Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social) and the Big Data Platform; Parallel Processing now also lists Machine Learning, Graph Algorithms and Natural Language Processing.]

"Data at Rest" vs. "Data in Motion"

[Diagram: the steps Store, Analyze and Act, shown once for Data at Rest and once for Data in Motion, each next to a stream of bits.]

Stream Processing Architecture solves Velocity

[Diagram: Bulk Source (DB, Extract, File) and Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social) send Event Streams through Event Hubs to a Stream Analytics Platform on a Hadoop Cluster (Stream Analytics, Reference / Models). Results reach a Dashboard, Search, BI Tools, the Enterprise Data Warehouse, Search / Explore and Enterprise Apps (Logic, API). Callout: "Low(est) latency, no history".]

Big Data for all historical data analysis

[Diagram: the Stream Analytics Platform (Stream Analytics, Reference / Models, Dashboard, Search) is complemented by a Big Data Platform on a Hadoop Cluster (Parallel Processing; Storage for Raw, Refined and Results). A Data Flow from the Event Hub and File Import / SQL Import from the Bulk Source feed the Big Data Platform.]

Integrate existing systems through CDC

• Capture changes directly on the database

• Change Data Capture (CDC) => think of it as a global database trigger

• Transform existing systems into event producers

[Diagram: a Traditional Silo-based System (User Interface, Logic, State) whose Data Store is tapped by CDC and a CDC Connector; the changes flow as an Event Stream through Integration and the Event Hub to the Consuming Systems.]
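The "global database trigger" picture of CDC can be made concrete with a small sketch. It is only an illustration of the mental model: the `orders` table, the `change_log` table and the function names are hypothetical, and it uses SQLite triggers, whereas CDC products typically read the database's transaction log instead of installing triggers.

```python
import json
import sqlite3


def open_db():
    """Create an in-memory database with an orders table and a change log."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
        -- Change log filled by the triggers; seq gives changes a total order.
        CREATE TABLE change_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT, order_id INTEGER, status TEXT
        );
        CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
            INSERT INTO change_log (op, order_id, status)
            VALUES ('INSERT', NEW.id, NEW.status);
        END;
        CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
            INSERT INTO change_log (op, order_id, status)
            VALUES ('UPDATE', NEW.id, NEW.status);
        END;
    """)
    return db


def capture_changes(db, last_seq):
    """Return (events, new_last_seq) for all changes recorded after last_seq."""
    rows = db.execute(
        "SELECT seq, op, order_id, status FROM change_log "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    # Each change becomes a JSON event that could be published to an event hub.
    events = [
        json.dumps({"op": op, "id": order_id, "status": status}, sort_keys=True)
        for _, op, order_id, status in rows
    ]
    new_last_seq = rows[-1][0] if rows else last_seq
    return events, new_last_seq


db = open_db()
db.execute("INSERT INTO orders (id, status) VALUES (1, 'NEW')")
db.execute("UPDATE orders SET status = 'SHIPPED' WHERE id = 1")
events, last_seq = capture_changes(db, 0)
```

The application that writes to `orders` is unchanged; it becomes an event producer only because its database changes are captured, which is the point the slide makes.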

Integrate existing systems with lower latency through CDC

[Diagram: the combined Stream Analytics Platform and Big Data Platform (both on a Hadoop Cluster), fed by Event Streams from the Bulk Source and the Event Source, a Data Flow from the Event Hub and File Import / SQL Import.]

Unified Architecture for Modern Data Analytics Solutions

[Diagram: Bulk Source (DB, Extract, File) and Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social) feed an Event Hub, directly or via an Edge Node (Rules, Event Hub, Storage). Behind the Event Hub sit Big Data on a Hadoop Cluster (Parallel Processing; Storage for Raw, Refined and Results; SQL, Search), Stream Analytics (Stream Processor with State and API) and Microservices (Microservice with State and API). Consumers are BI Tools, the Enterprise Data Warehouse, Search / Explore and Enterprise Apps (Logic, API).]

Two Types of Stream Processing (from Gartner)

Stream Data Integration

• primarily focuses on the ingestion and processing of data sources targeting real-time extract-transform-load (ETL) and data integration use cases

• filter and enrich the data

• optionally calculate time-windowed aggregations before storing the results in a database or file system
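The three steps listed for Stream Data Integration (filter, enrich, time-windowed aggregation) can be sketched as a chain of Python generators. This is a minimal illustration, not how any particular product works: the event fields (`ts`, `sensor`, `value`), the `SENSOR_LOCATION` lookup and the function names are assumptions, and the aggregation runs over a finite list, whereas a streaming engine would emit each window as time advances.

```python
from collections import defaultdict

# Hypothetical reference data used for enrichment.
SENSOR_LOCATION = {"s1": "Basel", "s2": "Bern"}


def filter_valid(events):
    """Filter: drop events that carry no numeric reading."""
    for e in events:
        if isinstance(e.get("value"), (int, float)):
            yield e


def enrich(events):
    """Enrich: add the sensor's location from reference data."""
    for e in events:
        yield {**e, "location": SENSOR_LOCATION.get(e["sensor"], "unknown")}


def tumbling_avg(events, window_s=60):
    """Aggregate: average value per (window start, location),
    using fixed, non-overlapping (tumbling) windows of window_s seconds."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for e in events:
        window_start = e["ts"] - e["ts"] % window_s
        acc = sums[(window_start, e["location"])]
        acc[0] += e["value"]
        acc[1] += 1
    return {key: total / n for key, (total, n) in sorted(sums.items())}


events = [
    {"ts": 0, "sensor": "s1", "value": 10},
    {"ts": 30, "sensor": "s1", "value": 20},
    {"ts": 45, "sensor": "s2", "value": None},  # filtered out
    {"ts": 61, "sensor": "s1", "value": 40},
    {"ts": 70, "sensor": "s3", "value": 5},     # no reference data
]
result = tumbling_avg(enrich(filter_valid(events)))
```

The resulting dictionary is what would be written to the database or file system at the end of the pipeline.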

Stream Analytics

• targets analytics use cases

• calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events)

• Complex events may signify threats or opportunities that require a response from the business through real-time dashboards, alerts or decision automation
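As a sketch of what "detecting patterns to generate complex events" can mean, the following Python generator turns a run of raw readings into one higher-level event. The pattern (three consecutive readings above a threshold from the same sensor), the event fields and the `OVERHEAT` type are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque


def detect_overheat(events, threshold=80.0, run_length=3):
    """Yield one complex event whenever a sensor reports run_length
    consecutive readings above threshold."""
    recent = defaultdict(lambda: deque(maxlen=run_length))
    for e in events:
        window = recent[e["sensor"]]
        window.append(e)
        if len(window) == run_length and all(
            x["value"] > threshold for x in window
        ):
            yield {
                "type": "OVERHEAT",
                "sensor": e["sensor"],
                "from_ts": window[0]["ts"],
                "to_ts": e["ts"],
                "avg": sum(x["value"] for x in window) / run_length,
            }
            window.clear()  # start a fresh run so overlapping runs do not re-alert


readings = [
    {"ts": 1, "sensor": "a", "value": 70},
    {"ts": 2, "sensor": "a", "value": 85},
    {"ts": 3, "sensor": "a", "value": 90},
    {"ts": 4, "sensor": "b", "value": 95},
    {"ts": 5, "sensor": "a", "value": 88},
    {"ts": 6, "sensor": "a", "value": 91},
]
alerts = list(detect_overheat(readings))
```

Six raw readings produce a single alert for sensor `a`; that alert, not the raw data, is what a real-time dashboard or an automated decision would react to.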

How to implement „Data-in-Motion“?

"Data-in-Motion" Ecosystem

[Diagram: product landscape grouped into Edge, Event Hub, Stream Data Integration and Stream Analytics, split into Open Source and Closed Source. Source: adapted from Tibco]

Apache Kafka – A Streaming Platform

High-Level Architecture

Distributed Log at the Core

Scale-Out Architecture

Logs do not (necessarily) forget
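The ideas behind "Distributed Log at the Core" and "Logs do not (necessarily) forget" can be illustrated with a toy append-only log in Python. This is a single-process sketch under strong simplifications: the class and method names are hypothetical and are not Kafka's API, and partitioning, replication and retention are not modelled.

```python
class Log:
    """Append-only log: records are only ever appended, never changed."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset (position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own offset; reading does not remove data."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


log = Log()
for record in ["a", "b", "c"]:
    log.append(record)

dashboard = Consumer(log)
first = dashboard.poll()    # reads everything written so far
log.append("d")
second = dashboard.poll()   # only the new record

late_joiner = Consumer(log)  # starts at offset 0 and replays the history
replayed = late_joiner.poll()
```

Because consuming only moves an offset, a visualization that connects later can still replay what was written earlier, as long as the log retains it.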

Blueprints for Stream Visualization

1) Direct Streaming to the Consumer

[Diagram: Data Sources feed the "Data in Motion" layer (Integration, Event Hub, Stream Analytics); the Data Flow continues over a Channel to the Streaming Visualization at the Consumer.]

2) Use a fast datastore and do regular polling from the consumer

[Diagram: Data Sources feed the "Data in Motion" layer (Integration, Event Hub, Stream Analytics); the Data Flow writes into a Data Store, which the Streaming Visualization at the Consumer reads through an API.]
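A minimal sketch of this polling blueprint in Python, assuming an in-memory stand-in for the fast datastore. `ResultStore`, `PollingDashboard` and their methods are hypothetical names; a real setup would query a time-series or search store over its API on a timer.

```python
import bisect


class ResultStore:
    """Stand-in for the fast data store written by the stream processor."""

    def __init__(self):
        self._ts = []    # sorted timestamps
        self._rows = []  # rows aligned with _ts

    def write(self, ts, row):
        i = bisect.bisect_right(self._ts, ts)
        self._ts.insert(i, ts)
        self._rows.insert(i, row)

    def query_since(self, ts):
        """Return (timestamp, row) pairs strictly newer than ts."""
        i = bisect.bisect_right(self._ts, ts)
        return list(zip(self._ts[i:], self._rows[i:]))


class PollingDashboard:
    """Consumer side: remembers the newest timestamp it has seen."""

    def __init__(self, store):
        self.store = store
        self.last_seen = float("-inf")
        self.points = []

    def refresh(self):
        """One polling cycle; a real client would call this on a timer."""
        new = self.store.query_since(self.last_seen)
        if new:
            self.last_seen = new[-1][0]
            self.points.extend(new)
        return len(new)


store = ResultStore()
store.write(1, {"count": 3})
store.write(2, {"count": 5})

dashboard = PollingDashboard(store)
fetched_first = dashboard.refresh()
fetched_idle = dashboard.refresh()   # nothing new since the last poll
store.write(3, {"count": 8})
fetched_next = dashboard.refresh()
```

The polling interval bounds how fresh the chart can be, and a row that arrives late with a timestamp at or below `last_seen` would be missed by this simple cursor; both are trade-offs of polling compared with pushing data directly to the consumer.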

3) Use stateful Stream Analytics and query the store directly

[Diagram: Data Sources feed the "Data in Motion" layer (Integration, Event Hub, Stream Analytics); the Streaming Visualization at the Consumer queries the state held by Stream Analytics through an API.]
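The third blueprint removes the separate datastore: the stream processor keeps the aggregate as its own state and exposes it for queries. A minimal Python sketch, with a hypothetical `CountByKey` operator whose `get` and `top` methods stand in for the query API:

```python
from collections import Counter


class CountByKey:
    """Stateful stream operator: keeps a running count per key."""

    def __init__(self, key_field):
        self.key_field = key_field
        self._state = Counter()

    def process(self, event):
        """Stream side: update the state for every incoming event."""
        self._state[event[self.key_field]] += 1

    def get(self, key):
        """Query side: current count for one key (0 if never seen)."""
        return self._state.get(key, 0)

    def top(self, n):
        """Query side: the n most frequent keys with their counts."""
        return self._state.most_common(n)


orders_by_city = CountByKey("city")
for event in [{"city": "Bern"}, {"city": "Basel"}, {"city": "Bern"}]:
    orders_by_city.process(event)

bern = orders_by_city.get("Bern")
leader = orders_by_city.top(1)
```

The visualization always sees the latest state without an extra write to a store, at the price of coupling the dashboard to the stream processor's availability.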

Blueprints for Stream Visualization – Implementation

Visualization: many, many options! But do they support Streaming Data?

Oracle Stream Analytics

[Diagram: blueprint 1 (Direct Streaming to the Consumer): Data Sources, "Data in Motion" (Integration, Event Hub, Stream Analytics), Data Flow, Channel, Streaming Visualization, Consumer.]

• Stream Analytics and Visualization in one

• offers real-time actionable business insight on streaming data

• automates action to drive today’s agile businesses (business user)

• Runs on top of Spark Streaming

• Cloud and on-premises

• Data Sources: Kafka, JMS, GoldenGate, File

Web Sockets / SSE / Custom JavaScript Application

[Diagram: blueprint 1 (Direct Streaming to the Consumer), with the Data Flow reaching the Consumer as Server-Sent Events (SSE).]

Slack / WhatsApp / Twitter / …

[Diagram: blueprint 1 (Direct Streaming to the Consumer): Data Sources, "Data in Motion" (Integration, Event Hub, Stream Analytics), Data Flow, Channel, Streaming Visualization, Consumer.]

WebSockets vs. Server Sent Events (SSE)

WebSockets

• provide a richer protocol to perform bi-directional, full-duplex communication

• require full-duplex connections and new WebSocket servers to handle the protocol

• Having a two-way channel is more attractive for things like games, messaging apps, and for cases where you need near real-time updates in both directions

SSE

• SSEs are sent over traditional HTTP

• do not require a special protocol or server implementation to get working

• If only one direction is necessary, Server-Sent Events have been designed from the ground up to be efficient
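That SSE needs no special protocol shows in its wire format: a `text/event-stream` response is plain text in which each message consists of `field: value` lines and ends with a blank line. The Python sketch below formats and parses such messages; the helper names `format_sse` and `parse_sse` are hypothetical, and the parser is simplified (it ignores the `retry` field and assumes `\n` line endings).

```python
def format_sse(data, event=None, event_id=None):
    """Serialize one message in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes one "data:" field per line.
    for part in str(data).split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # the blank line terminates the message


def parse_sse(stream_text):
    """Parse a complete event-stream body into a list of dicts (simplified)."""
    messages = []
    for block in stream_text.split("\n\n"):
        if not block.strip():
            continue
        msg = {"event": "message", "data": []}  # "message" is the default type
        for line in block.split("\n"):
            if line.startswith(":"):
                continue  # comment line, often used as a keep-alive
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "data":
                msg["data"].append(value)
            elif field in ("event", "id"):
                msg[field] = value
        msg["data"] = "\n".join(msg["data"])
        messages.append(msg)
    return messages


wire = format_sse('{"speed": 87}', event="truck", event_id=7)
wire += format_sse("line1\nline2")
messages = parse_sse(wire)
```

In a browser the parsing side is handled by the built-in `EventSource` object, so a custom JavaScript dashboard only needs to register event handlers.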

KSQL / REST API / Custom App

[Diagram: blueprint 3 (stateful Stream Analytics queried through an API): Data Sources, "Data in Motion" (Integration, Event Hub, Stream Analytics), API, Streaming Visualization, Consumer.]

KSQL & Arcadia Data

[Diagram: blueprint 3 (stateful Stream Analytics queried through an API): Data Sources, "Data in Motion" (Integration, Event Hub, Stream Analytics), API, Streaming Visualization, Consumer.]

Arcadia Data

• Combines Batch and Streaming Visualization in one

• Streaming Visualizations based on Confluent KSQL (Kafka)

• Arcadia Instant and Arcadia Enterprise

Druid & Superset / Imply

[Diagram: blueprint 2 (fast Data Store polled by the consumer): Data Sources, "Data in Motion" (Integration, Event Hub, Stream Analytics), Data Flow, Data Store, API, Streaming Visualization, Consumer.]

What is Druid?

• Open Source Time Series DB by Metamarkets

• Apache Incubating

• Column-Oriented Storage

• Streaming and Batch Ingest

• Time optimized partitioning

• SQL Support

• Deep Storage can be HDFS / S3

Imply

• Commercial offering of Druid

• Built around Apache Druid

• Analytics, search and intelligence for event-driven data

Superset

• Open source data visualization tool by Airbnb

• Apache incubator

• Superset supports 30 types of visualizations

• easy-to-use interface for exploring and visualizing data

• Create and share dashboards

• Deep integration with Druid

• Integration with most SQL-speaking RDBMS through SQLAlchemy

Elasticsearch / Kibana

[Diagram: blueprint 2 (fast Data Store polled by the consumer): Data Sources, "Data in Motion" (Integration, Event Hub, Stream Analytics), Data Flow, Data Store, API, Streaming Visualization, Consumer.]

Elasticsearch

• NoSQL store

• a distributed, RESTful search and analytics engine

• centrally stores your data so you can discover the expected and uncover the unexpected

• lets you perform and combine many types of searches: structured, unstructured, geo, metric

• aggregations let you zoom out to explore trends and patterns in your data
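To give an idea of what "zooming out" with aggregations looks like, the sketch below builds the JSON body of a search request that returns counts per time bucket instead of individual documents. The field names (`@timestamp`, `vehicle.keyword`) and the helper function are assumptions; `fixed_interval` is the parameter name in recent Elasticsearch versions, while older releases used `interval`.

```python
import json


def events_per_interval(ts_field="@timestamp", interval="1m", group_field=None):
    """Build a search body that returns bucket counts instead of documents."""
    histogram = {
        "date_histogram": {"field": ts_field, "fixed_interval": interval}
    }
    if group_field:
        # Nested terms aggregation: split each time bucket by a keyword field.
        histogram["aggs"] = {
            "by_group": {"terms": {"field": group_field, "size": 5}}
        }
    # "size": 0 suppresses the hit list; only the aggregations are returned.
    return {"size": 0, "aggs": {"over_time": histogram}}


body = events_per_interval(group_field="vehicle.keyword")
print(json.dumps(body, indent=2))
```

A dashboard that polls with such a request every few seconds receives one small result per time bucket, which is the kind of query a tool like Kibana issues on the user's behalf.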

Kibana

• Window into Elasticsearch

• Enables visual exploration and analysis of data stored in Elasticsearch

InfluxDB / Grafana or Chronograf

[Diagram: blueprint 2 (fast Data Store polled by the consumer): Data Sources, "Data in Motion" (Integration, Event Hub, Stream Analytics), Data Flow, Data Store, API, Streaming Visualization, Consumer.]

InfluxDB

• Popular Time Series Database

• Open source as well as Commercial offering

Chronograf

Grafana

Grafana lets you query, visualize, alert on and understand metrics independent of their storage

Supports various data sources:

• Elasticsearch

• InfluxDB

• Prometheus

• OpenTSDB

• MySQL

• …

Technology on its own won't help you. You need to know how to use it properly.