
Federated Stream Processing Support for Real-Time Business Intelligence Applications

Irina Botan, Younggoo Cho, Roozbeh Derakhshan, Nihal Dindar, Laura Haas, Kihong Kim, Nesime Tatbul

Page 2: Federated Stream Processing Support for Real-Time Business ... · MaxStream Architecture: From 30,000 ft •Key ideas: –Uniform query language and API –Relational database infrastructure

Introduction

• Business Intelligence (BI) enables better decision-making for businesses.

• In operational BI, real-time response to business events is critical, which requires:

– reducing latency

– providing rich contextual information

We propose the MaxStream federated stream processing system as a platform to meet these needs.

VLDB BIRTE Workshop, 2009, Nesime Tatbul, ETH Zurich

Talk Outline

• Example Use Cases & Motivation

• MaxStream System

– Architecture

– Usage

– Feasibility

• Conclusions & Open Challenges


Example Use Cases

• Supply-Chain Optimization

• Call Center Management

• Quality Management in Manufacturing

• SLA Monitoring and Maintenance

• Global Shipment & Delivery Monitoring

• Fraud Detection in Financial Companies

• Real-time Marketing

• …

Different levels of latency and data persistence requirements


e.g., Call Center Management

• Multiple centers across the globe

• Every incoming call is captured with arrival time, service start and end times

• Main BI tasks:

– Run statistics on wait time, service duration, etc. for different regions

– Generate reports, analyzing problems and proposing strategic improvements


MaxStream Architecture: From 30,000 ft

• Key ideas:

– Uniform query language and API

– Relational database infrastructure as the basis for the federation layer (in our case: SAP MaxDB and SAP MaxDB Federator)

– “Just enough” streaming capability inside the federation layer

[Figure: architecture overview. Components shown: Client Application, Federation Layer, Data Agent, Wrappers, SPEs, DBs.]


Putting MaxStream into Context

• vs. Federated Databases

– Less focus on data locality, more focus on functional heterogeneity

• vs. Stream Processing Engines (SPEs)

– Unlike distributed SPEs, there may be heterogeneity

– Unlike stream-relational SPEs, MaxStream federator is not a full-fledged SPE

• vs. Business Intelligence Software

– Tighter integration between (possibly heterogeneous) SPEs and databases


MaxStream Architecture: A Closer Look

[Figure: detailed MaxStream architecture. Components shown: Client Application; MaxStream Federator (SQL Parser, Query Rewriter, Query Optimizer, Query Executer, SQL Dialect Translator, Metadata, Input Event Tables, Output Event Tables); Data Agents for the DBs; Data Agents for the SPEs (MaxDB ODBC on one side, the SPE's SDK on the other). Edge labels: DDL/DML statements in MaxStream's SQL dialect, DDL/DML in the SPE's SQL, Input Events, Output Events.]

MaxStream Architecture: Two Key Building Blocks

• Streaming Inputs through MaxStream

– ISTREAM Operator for Persistent input events

– Tuple Queues for Transient input events

• Streaming Outputs through MaxStream

– Monitoring Select over Event Tables

• Persistent Event Tables for Persistent output events

• In-Memory Event Tables for Transient output events


Streaming Persistent Input Events

• The ISTREAM (“Insert STREAM”) Operator

– A relation-to-stream operator, first proposed by Widom et al. [STREAM Project], that streams the new tuples inserted into a given relation.

– Example:

    INSERT INTO STREAM CallStream
    SELECT OpCode, ArrivalTime, StartTime, EndTime
    FROM ISTREAM(CallTable);

[Figure: CallTable contains rows r1, r2, r3 at time T and rows r1, r2, r3, r4, r5 at time T+1.]

ISTREAM(CallTable) at T+1 returns: <r4, T+1>, <r5, T+1>
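The ISTREAM semantics illustrated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation is modeled as a list of row identifiers, `istream` is a hypothetical helper (not a MaxStream API), and timestamps are plain integers.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def istream(previous: Iterable[Hashable],
            current: Iterable[Hashable],
            timestamp: int) -> List[Tuple[Hashable, int]]:
    """Return <row, timestamp> pairs for rows in `current` but not in `previous`.

    Hypothetical helper: models ISTREAM as the difference between two
    consecutive snapshots of a relation, preserving the order of `current`.
    """
    seen: Set[Hashable] = set(previous)
    return [(row, timestamp) for row in current if row not in seen]


# Snapshots of CallTable from the slide: time T and time T+1 (T = 0 here).
T = 0
table_at_t = ["r1", "r2", "r3"]
table_at_t_plus_1 = ["r1", "r2", "r3", "r4", "r5"]

new_events = istream(table_at_t, table_at_t_plus_1, T + 1)
print(new_events)  # [('r4', 1), ('r5', 1)]
```

A real implementation would presumably hook into the insert path of the database rather than compare snapshots; the snapshot difference is used here only because it mirrors the figure.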


Streaming Output Events

• Opposite of streaming input events, but…

– Unlike the SPE interface, the client application interface is not push-based.

• Alternative solutions:

– Each client monitors its own alerts on a given table.

• Cumbersome and error-prone

– A monitoring program does so for all registered clients using periodic select queries (i.e., polling) or triggers.

• Not event-driven, inefficient, not scalable


Streaming Output Events

• Our Solution: Monitoring Select

– Select operation blocks until there is at least one row to return.

– For continuous monitoring, the client program re-issues Monitoring Select in a loop.

– Monitoring Select operates on “Event Tables”.

• Example: Detect calls with unusually long waiting times.

    SELECT *
    FROM /*+ EVENT */ CallAnalysis
    WHERE AvgWait > 10;
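The blocking behaviour of Monitoring Select, and the client loop that re-issues it, can be sketched as follows. This is a toy model rather than the MaxStream implementation: `EventTable`, `monitoring_select`, and the sample rows are hypothetical, and the sketch assumes that rows handed to a client are removed from the event table so that the next call does not return them again.

```python
import threading
from typing import Callable, Dict, List

Row = Dict[str, object]


class EventTable:
    """Toy in-memory event table (hypothetical; not the MaxStream implementation)."""

    def __init__(self) -> None:
        self._rows: List[Row] = []
        self._changed = threading.Condition()

    def insert(self, row: Row) -> None:
        with self._changed:
            self._rows.append(row)
            self._changed.notify_all()  # wake up blocked monitoring selects

    def monitoring_select(self, predicate: Callable[[Row], bool]) -> List[Row]:
        """Block until at least one row satisfies `predicate`, then return the matches.

        Assumption: returned rows are removed, so re-issuing the call in a
        loop does not deliver the same event twice.
        """
        with self._changed:
            while True:
                matches = [r for r in self._rows if predicate(r)]
                if matches:
                    self._rows = [r for r in self._rows if not predicate(r)]
                    return matches
                self._changed.wait()


call_analysis = EventTable()
alerts: List[Row] = []


def client(expected: int) -> None:
    # The client re-issues the monitoring select in a loop.
    while len(alerts) < expected:
        alerts.extend(call_analysis.monitoring_select(lambda r: r["AvgWait"] > 10))


worker = threading.Thread(target=client, args=(2,))
worker.start()
for region, avg_wait in [("EMEA", 4), ("APAC", 12), ("AMER", 15)]:
    call_analysis.insert({"Region": region, "AvgWait": avg_wait})
worker.join(timeout=5)
print([a["Region"] for a in alerts])  # ['APAC', 'AMER']
```

The client thread never polls: it sleeps on the condition variable until an insert wakes it, which is the event-driven behaviour that the polling and trigger alternatives lack.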


Hybrid Queries in MaxStream

• Hybrid queries are continuous queries that join Streams with Tables

– Similar to joining Fact tables with Dimension tables in data warehouses

• One can conveniently use hybrid queries in MaxStream in two ways:

– To enrich the input stream before it is passed to the SPE

– To enrich the output stream after it is received from the SPE


Hybrid Queries: Call Center Example

Enriching the input in MaxStream:

    CREATE TABLE CallTable (Opcode, ArrivalTime, StartTime, EndTime);

    INSERT INTO STREAM CallStream
    SELECT o.RegionNm AS Region,
           c.StartTime - c.ArrivalTime AS WaitTime,
           c.EndTime - c.StartTime AS Duration
    FROM ISTREAM(CallTable) c, OperatorsbyRegion o
    WHERE c.Opcode = o.Operator;

Continuous query in SPE:

    INSERT INTO TABLE CallAnalysis
    SELECT Region, COUNT(*) AS Cnt, AVG(WaitTime) AS AvgWait,
           AVG(Duration) AS CallLength
    FROM CallStream
    GROUP BY Region
    KEEP 1 HOUR;

Enriching the output in MaxStream:

    SELECT a.Region, a.AvgWait, a.CallLength, r.NOps, r.Training
    FROM /*+ EVENT */ CallAnalysis a, Regions r
    WHERE a.AvgWait > 10
      AND a.Region = r.RegName;
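The three stages of this example (input enrichment, continuous aggregation, output enrichment) can be mimicked over in-memory data to make the data flow concrete. The sketch below is plain Python, not MaxStream SQL; the sample rows, region names, and attribute values are invented for illustration, and the one-hour window is simplified to a single batch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data; times are minutes since midnight.
call_table = [
    {"Opcode": 1, "ArrivalTime": 0, "StartTime": 12, "EndTime": 20},
    {"Opcode": 1, "ArrivalTime": 5, "StartTime": 19, "EndTime": 25},
    {"Opcode": 2, "ArrivalTime": 3, "StartTime": 5, "EndTime": 9},
]
operators_by_region = {1: "North", 2: "South"}          # Operator -> RegionNm
regions = {"North": {"NOps": 4, "Training": "basic"},   # RegName -> attributes
           "South": {"NOps": 9, "Training": "advanced"}}

# Stage 1 (input enrichment): join each new call with its operator's region.
call_stream = [
    {"Region": operators_by_region[c["Opcode"]],
     "WaitTime": c["StartTime"] - c["ArrivalTime"],
     "Duration": c["EndTime"] - c["StartTime"]}
    for c in call_table
]

# Stage 2 (continuous query, evaluated once here over a single window).
by_region = defaultdict(list)
for event in call_stream:
    by_region[event["Region"]].append(event)
call_analysis = [
    {"Region": region, "Cnt": len(events),
     "AvgWait": mean(e["WaitTime"] for e in events),
     "CallLength": mean(e["Duration"] for e in events)}
    for region, events in by_region.items()
]

# Stage 3 (output enrichment): keep long waits and join with region data.
alerts = [
    {**a, **regions[a["Region"]]}
    for a in call_analysis if a["AvgWait"] > 10
]
print(alerts)
```

With this sample data only the North region exceeds the 10-minute threshold (average wait of 13 minutes), so a single enriched alert is produced.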


Initial Feasibility Study

• Goal: to show

– whether MaxStream is useful in supporting real-time BI applications

– whether MaxStream’s performance overhead is acceptable

• Setup: SAP Sales and Distribution Benchmark

– Persistent events, Throughput critical

– Original benchmark: No streaming

– We add streaming and compare the following two setups:

• SD vs. SD with MaxStream/ISTREAM + SPE “X”

• SD vs. SD with MaxStream/Monitoring-Select


SAP Sales and Distribution (SD) Benchmark

• It is a business benchmark that models a sell-from-stock scenario that consists of 6 transactions, each with 1-4 dialog steps and around 10 seconds of think-time for each.

– Example transactions: Create customer order document, Create order delivery document, Create invoice, etc.

• Measure: throughput in the number of processed dialog steps per minute (SAPs).


Use of MaxStream in SAP SD Benchmark

MaxStream/ISTREAM + SPE “X”

• Stream incoming orders.

• Forward sales orders to SPE “X” via MaxStream in order to continuously compute the daily sum of sales orders for each product and region.

    INSERT INTO STREAM SalesOrderStream
    SELECT A.MANDT, A.VBELN, A.NETWR,
           B.POSNR, B.MATNR, B.ZMENG
    FROM ISTREAM(VBAK) A, VBAP B
    WHERE A.MANDT = B.MANDT
      AND A.VBELN = B.VBELN;

MaxStream/Monitoring-Select

• Monitor big sales.

• Continuously monitor big sales orders (i.e., with amount > 95) by storing purchase orders in an event table and running Monitoring Select over it.

    SELECT A.MANDT, A.VBELN, B.KWMENG
    FROM /*+ EVENT */ VBAK A, VBAP B
    WHERE A.NETWR > 95
      AND A.MANDT = B.MANDT
      AND A.VBELN = B.VBELN;


MaxStream SAP SD Benchmark Performance

                                 SD        SD with ISTREAM    SD with Monitoring-Select
# of SD Users                    16,000    16,000             16,000
Throughput (SAPs)                95,910    95,910             95,846
Dialog Response Time (msec)      13        13                 13
DB Server CPU Utilization (%)    49.8      50.6               50.1

SD with streaming features achieves similar performance as the standard one.
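To put "similar performance" into numbers, the relative differences can be computed directly from the figures in the table above. The short Python sketch below does only that arithmetic; the dictionary layout and the `overhead` helper are illustrative choices, not part of the benchmark.

```python
# Figures copied from the benchmark table above.
baseline = {"throughput": 95_910, "cpu": 49.8}
variants = {
    "SD with ISTREAM": {"throughput": 95_910, "cpu": 50.6},
    "SD with Monitoring-Select": {"throughput": 95_846, "cpu": 50.1},
}


def overhead(variant: dict) -> tuple:
    """Return (throughput change in %, CPU change in percentage points)."""
    throughput_pct = (100.0 * (variant["throughput"] - baseline["throughput"])
                      / baseline["throughput"])
    cpu_points = variant["cpu"] - baseline["cpu"]
    return round(throughput_pct, 3), round(cpu_points, 1)


results = {name: overhead(v) for name, v in variants.items()}
for name, (throughput_pct, cpu_points) in results.items():
    print(f"{name}: throughput {throughput_pct:+.3f} %, CPU {cpu_points:+.1f} points")
```

Under these figures, ISTREAM leaves throughput unchanged at a cost of 0.8 percentage points of DB server CPU, and Monitoring Select lowers throughput by about 0.067 % while adding 0.3 percentage points of CPU.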


Conclusions

• Real-time BI requires new platforms that offer

– the low latencies of stream processing

– the analytics support of data warehouses

– the flexible, dynamic data access of data federation engines

• The MaxStream stream federation engine provides

– access to heterogeneous SPEs and DBs

– flexible persistence and data federation capabilities

• MaxStream is low-overhead and useful in various operational BI scenarios.


Open Challenges

• Unified continuous query execution model and semantics

• Cost- and Capability-based query optimization and dispatching over multiple SPEs

• Transactional aspects of federated stream processing

• Distributed operation aspects (e.g., load balancing, high availability)


Thanks!

• You

• MaxStream team

• Chan Young Kwon (SAP Labs, Korea)

• ETH Zurich Enterprise Computing Center (ECC)

• More information: http://www.systems.ethz.ch/research/projects/maxstream/
