Towards a Web-scale Data Management Ecosystem Demonstrated by SAP HANA
Stefan Bäuerle, Jonathan Dees, Franz Faerber, Wolfgang Lehner
© 2015 SAP SE or an SAP affiliate company. All rights reserved. Public
Agenda
• Motivation & Requirements
• Different Processing Engines and Integration
• Scale-out edition engine
Application requirements for a modern DBMS
Different:
• data types
• consumption models
• data models
• notions of consistency
• application and query languages
• levels of scaling
• hardware capabilities
HANA Platform
HANA System
Beyond relational data processing (1/3)
Bringing OLAP and OLTP together
• Proven: works in thousands of customer systems
• Simplicity: get rid of extracts, loads and redundancy; one system
• OLAP dominates OLTP in real-world systems: optimize accordingly
Data mining and prediction
• Examples: basket analysis, different forecasting algorithms, …
• Easy interaction with R and SAS
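To make "basket analysis" concrete, the following is a minimal sketch of what such an operator computes, restricted to item pairs: support (the share of baskets containing both items) and confidence (how often the second item appears given the first). The function name `pair_rules`, the threshold, and the sample baskets are illustrative assumptions; this is the textbook first step of association-rule mining, not HANA's own implementation.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for frequent item pairs a -> b."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields one rule in each direction.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
]
rules = pair_rules(baskets)
print(rules[("milk", "butter")])  # (0.5, 1.0)
```

Every basket with milk also contains butter, hence confidence 1.0 for milk -> butter; the pair bread/jam falls below the support threshold and produces no rule.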
Unstructured data
• Text search support for more than 30 languages, with features including:
  • Stemming, part-of-speech tagging, noun extraction, …
  • Classification, clustering, named entity recognition, sentiment analysis
Planning extensions
• Planning: define and align business figures for the foreseeable future
• Data-heavy operators like disaggregation or logical snapshots
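Disaggregation pushes a planned figure from an aggregate level down to its members. A common variant distributes the total proportionally to a reference distribution, such as the previous period's actuals. The sketch below assumes that proportional rule, with an equal split as fallback; the function name and numbers are illustrative, and real planning engines additionally apply rounding and locking rules that are not modeled here.

```python
def disaggregate(total, reference):
    """Split `total` across the keys of `reference` proportionally to its values."""
    if not reference:
        return {}
    ref_sum = sum(reference.values())
    if ref_sum == 0:
        # No usable reference distribution: fall back to an equal split.
        share = total / len(reference)
        return {key: share for key in reference}
    return {key: total * value / ref_sum for key, value in reference.items()}


# Illustrative numbers: a planned total distributed like last period's actuals.
actuals = {"EMEA": 500.0, "APJ": 300.0, "AMER": 200.0}
plan = disaggregate(1200.0, actuals)
print(plan)  # {'EMEA': 600.0, 'APJ': 360.0, 'AMER': 240.0}
```

Running this inside the engine avoids shipping every leaf row to the application just to write it back scaled, which is the "data heavy" aspect the slide refers to.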
• Integrate as deeply as possible into the engine
Beyond relational data processing (2/3)
Graph processing
• Real-world business data often resembles graphs
• Model as graph: more explicit and more efficient operators
• Distance, siblings, shortest path, reachability, transitive closure, …
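Two of the listed graph operators, shortest path and reachability (one row of the transitive closure), can be sketched over an adjacency list as follows. This is a generic breadth-first and depth-first traversal, assuming unweighted edges; the function names and the sample network are hypothetical and do not reflect HANA's graph API.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest edges, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def reachable(graph, start):
    """All nodes reachable from `start` via at least one edge."""
    seen = set()
    stack = [start]
    while stack:
        for neighbor in graph.get(stack.pop(), ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


# Hypothetical supply network: edge X -> Y means "X delivers to Y".
network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"]}
print(shortest_path(network, "A", "F"))  # ['A', 'B', 'D', 'F']
print(sorted(reachable(network, "C")))   # ['D', 'E', 'F']
```

Expressed in plain SQL, the same questions need recursive joins; a native operator can run the traversal directly on the stored edges.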
Hierarchy processing
• Special type of general graphs
• Used by almost every business application
• Support for time-dependent and versioned hierarchies
• Extended graph operators: level, neighbor, is_ancestor, …
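The hierarchy operators `level` and `is_ancestor` named above can be sketched on a parent-child table, the usual relational encoding of a hierarchy. This minimal version assumes an acyclic, non-versioned hierarchy stored as a child-to-parent mapping and numbers roots as level 1; the sample organization is invented for illustration.

```python
def level(parent_of, node):
    """Depth of `node` in the hierarchy; in this sketch roots are level 1."""
    depth = 1
    while parent_of.get(node) is not None:
        node = parent_of[node]
        depth += 1
    return depth


def is_ancestor(parent_of, ancestor, node):
    """True if `ancestor` lies strictly above `node` on its path to the root."""
    node = parent_of.get(node)
    while node is not None:
        if node == ancestor:
            return True
        node = parent_of.get(node)
    return False


# Hypothetical parent-child table (child -> parent); None marks the root.
org = {"Board": None, "Sales": "Board", "R&D": "Board", "EMEA Sales": "Sales"}
print(level(org, "EMEA Sales"))                 # 3
print(is_ancestor(org, "Board", "EMEA Sales"))  # True
print(is_ancestor(org, "R&D", "EMEA Sales"))    # False
```

Each call here walks up the tree; an engine that knows the data is a hierarchy can instead precompute an index (for example, pre/post-order numbers) so that such checks become constant-time comparisons.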
Geospatial processing & Time series
• Native relational data types
• Existing compression techniques + powerful specializations for sensor data
• Spatial: WithinDistance, Contains, Area, …
• Time series: Group by time interval, Interpolate Missing Values, …
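The two time series operations listed, grouping by time interval and interpolating missing values, are sketched below. The assumptions are integer timestamps in seconds, fixed-width buckets aligned at zero, and linear interpolation over equally spaced positions, with leading and trailing gaps left empty. The function names are hypothetical.

```python
def bucket_means(samples, interval):
    """Average (timestamp, value) samples per fixed-width time bucket."""
    sums, counts = {}, {}
    for ts, value in samples:
        start = ts - ts % interval  # bucket start time
        sums[start] = sums.get(start, 0.0) + value
        counts[start] = counts.get(start, 0) + 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


def interpolate(series):
    """Fill None entries linearly between the nearest known neighbours."""
    result = list(series)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


print(bucket_means([(0, 10.0), (30, 14.0), (60, 20.0), (150, 26.0)], 60))
# {0: 12.0, 60: 20.0, 120: 26.0}
print(interpolate([1.0, None, None, 4.0, None]))  # [1.0, 2.0, 3.0, 4.0, None]
```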
Beyond relational data processing (3/3)
Scientific processing
• Bring prominent operators into the engine
• Simplifies and speeds up operations in the scientific and financial areas
• Matrix operators: Eigenvalue, Multiply, …
• Financial operators: Interest Rates, GarmanKohlagenProcess, …
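`GarmanKohlagenProcess` presumably refers to the Garman-Kohlhagen model, the foreign-exchange analogue of Black-Scholes, which accounts for separate domestic and foreign interest rates. As an illustration of the kind of math such an operator encapsulates, here is the model's closed-form price for a European FX option. This is a generic textbook sketch; the function and parameter names are assumptions, and it does not describe what HANA's operator actually computes.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def garman_kohlhagen(spot, strike, t, r_dom, r_for, vol, call=True):
    """European FX option price under the Garman-Kohlhagen model.

    spot/strike: price of one unit of foreign currency in domestic currency,
    t: time to expiry in years, r_dom/r_for: continuously compounded domestic
    and foreign interest rates, vol: annualized volatility of the FX rate.
    """
    d1 = (log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    disc_for = exp(-r_for * t)  # discount factor for the foreign rate
    disc_dom = exp(-r_dom * t)  # discount factor for the domestic rate
    if call:
        return spot * disc_for * norm_cdf(d1) - strike * disc_dom * norm_cdf(d2)
    return strike * disc_dom * norm_cdf(-d2) - spot * disc_for * norm_cdf(-d1)


price = garman_kohlhagen(1.10, 1.12, 0.5, 0.01, 0.005, 0.1)
print(price)
```

With a foreign rate of zero the formula reduces to Black-Scholes, and call and put prices satisfy put-call parity, C - P = S·e^(-r_for·t) - K·e^(-r_dom·t).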
NoSQL processing
• Document-based models, XML, JSON, …
• Key-value stores
• Flexible schema, in HANA via a specific flexible table type
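The idea behind a flexible table type can be illustrated with a toy row store whose column set grows when a row arrives with an attribute the schema does not know yet; older rows simply read as NULL (`None`) for the new column. The class below is a hypothetical sketch of that behavior only and is not HANA's flexible table API.

```python
class FlexibleTable:
    """Row store whose column set grows as rows with new attributes arrive."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def insert(self, record):
        for name in record:
            if name not in self.columns:
                # Unknown attribute: extend the schema instead of rejecting the row.
                self.columns.append(name)
        self.rows.append(dict(record))

    def select(self, *names):
        # Missing attributes read as None, like NULL in a sparse column.
        return [tuple(row.get(name) for name in names) for row in self.rows]


table = FlexibleTable(["id", "name"])
table.insert({"id": 1, "name": "pump"})
table.insert({"id": 2, "name": "valve", "pressure_bar": 16})
print(table.columns)                       # ['id', 'name', 'pressure_bar']
print(table.select("id", "pressure_bar"))  # [(1, None), (2, 16)]
```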
Massive scale-out
• Conventional business applications fit on a single box, but a new kind of application requires massive scale-out
• Deep and seamless integration with the Hadoop system
• Scale-out and single-box applications act as one system
Application integration (examples)
Currency conversion
Hierarchy handling
Aging / dynamic tiering
Dictionary maintenance
Graph optimizations
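Currency conversion is a typical application function that benefits from running next to the data: each amount needs the exchange rate valid on its posting date. The sketch below assumes a rate table keyed by currency pair with rates sorted by validity start date, and uses a binary search to find the rate in effect. The rates, the `RATES` layout, and the `convert` function are made up for illustration; real business rules (rate types, inverse rates, rounding) are richer and are not modeled.

```python
from bisect import bisect_right
from datetime import date

# Made-up rate table: (source, target) -> [(valid_from, rate), ...] sorted by date.
RATES = {
    ("USD", "EUR"): [(date(2015, 1, 1), 0.85), (date(2015, 3, 1), 0.90)],
}


def convert(amount, source, target, on, rates=RATES):
    """Convert `amount` with the latest rate whose validity starts on or before `on`."""
    if source == target:
        return amount
    history = rates[(source, target)]
    starts = [valid_from for valid_from, _ in history]
    index = bisect_right(starts, on) - 1
    if index < 0:
        raise LookupError(f"no {source}->{target} rate valid on {on}")
    return amount * history[index][1]


print(convert(100.0, "USD", "EUR", date(2015, 2, 15)))
```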
HANA Data Platform: Dynamic Tiering
HANA Dynamic Tiering
• Declare a table to use disk storage
• Cost-efficient for big data
• Optimized disk-based processing powered by IQ
• New warm option besides Hot (in-memory) and Cold (Near-Line Storage)
CREATE TABLE "demo"."SalesOrders_WARM" (
  ID INTEGER NOT NULL,
  CustomerID INTEGER NOT NULL,
  OrderDate DATE NOT NULL,
  …,
  PRIMARY KEY (ID)
) USING EXTENDED STORAGE;
INSERT INTO "demo"."SalesOrders_WARM" VALUES ( … );
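Choosing which rows belong in the hot, warm or cold tier is usually an age-based aging rule. The sketch below classifies order rows by the age of their `OrderDate`, mirroring the column in the table definition above. The thresholds (90 and 730 days) and the function names are hypothetical; actual aging policies are application-specific, and this sketch does not move any data.

```python
from datetime import date

# Hypothetical aging thresholds in days; real policies are application-specific.
WARM_AFTER_DAYS = 90
COLD_AFTER_DAYS = 730


def temperature(record_date, today):
    """Classify a record as hot, warm or cold by its age."""
    age = (today - record_date).days
    if age >= COLD_AFTER_DAYS:
        return "cold"  # e.g. near-line storage
    if age >= WARM_AFTER_DAYS:
        return "warm"  # e.g. a table USING EXTENDED STORAGE
    return "hot"       # in-memory


def partition_by_temperature(rows, today, date_column="OrderDate"):
    """Group rows into tiers according to `temperature`."""
    tiers = {"hot": [], "warm": [], "cold": []}
    for row in rows:
        tiers[temperature(row[date_column], today)].append(row)
    return tiers


orders = [
    {"ID": 1, "OrderDate": date(2015, 5, 1)},
    {"ID": 2, "OrderDate": date(2015, 1, 1)},
    {"ID": 3, "OrderDate": date(2013, 1, 1)},
]
tiers = partition_by_temperature(orders, today=date(2015, 6, 1))
print({tier: [row["ID"] for row in rows] for tier, rows in tiers.items()})
# {'hot': [1], 'warm': [2], 'cold': [3]}
```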
HANA Data Platform: Big Data | Vision
Figure: HANA native big data (Dynamic Tiering, Smart Data Streaming, NoSQL | Graph | Geo | TimeSeries); HANA & Hadoop (SDA, Hive | Spark, MapReduce | HDFS, Admin & Monitoring, User Mgmt / Security); Hadoop extension (Velocity engine, integrated with HANA and Hadoop).
Figure: HANA Data Management Platform. Instant results: SAP HANA in-memory (0.1 sec); warm data: HANA Dynamic Tiering; infinite storage (∞), raw data: Hadoop; HANA Scale Out; Information Management | Text | Search | Graph | Geospatial | Predictive; Smart Data Streaming; Administration | Monitoring | Operations | User Management | Security.
SAP HANA Massive Scale Out Edition (Velocity)
Motivation:
• Engine for massive scale out and big data
Key Features:
• Scale to thousands of nodes
• Different data freshness and consistency levels
• Efficient fail-safety design
• First class citizen within Hadoop (Spark)
• Support for a variety of hardware and operating systems
• Extreme query performance by compiling SQL to native code
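"Compiling SQL to native code" means generating specialized code for a query once, instead of having an interpreter walk the query's expression tree for every row. The sketch below contrasts the two styles on a filter predicate. Python's built-in `compile` producing bytecode stands in for native machine code here, and the tuple-based predicate format and all function names are invented for this illustration; it is not how the Velocity engine is implemented.

```python
# Tiny predicate language (an assumption for this sketch):
#   ("and", left, right) | (op, column, constant) with op in OPS.
OPS = {"eq": "==", "gt": ">", "lt": "<"}


def interpret(expr, row):
    """Interpreter style: walk the expression tree again for every row."""
    if expr[0] == "and":
        return interpret(expr[1], row) and interpret(expr[2], row)
    op, column, constant = expr
    value = row[column]
    if op == "eq":
        return value == constant
    if op == "gt":
        return value > constant
    if op == "lt":
        return value < constant
    raise ValueError(f"unknown operator: {op}")


def to_source(expr):
    """Translate the expression tree into equivalent Python source text."""
    if expr[0] == "and":
        return f"({to_source(expr[1])} and {to_source(expr[2])})"
    op, column, constant = expr
    return f"(row[{column!r}] {OPS[op]} {constant!r})"


def compile_predicate(expr):
    """Compilation style: generate code once, then run it without tree walking."""
    code = compile(f"lambda row: {to_source(expr)}", "<predicate>", "eval")
    return eval(code)


predicate = ("and", ("gt", "price", 100), ("eq", "region", "EMEA"))
rows = [
    {"price": 150, "region": "EMEA"},
    {"price": 50, "region": "EMEA"},
    {"price": 200, "region": "APJ"},
]
compiled = compile_predicate(predicate)
print([compiled(row) for row in rows])              # [True, False, False]
print([interpret(predicate, row) for row in rows])  # [True, False, False]
```

Both styles return the same answers; the compiled version pays the code-generation cost once per query and then avoids per-row dispatch on the operator, which is where the speedup of real query compilers comes from.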
SAP HANA SOE (Velocity) and Hadoop (1/2)
Figure: Hadoop ecosystem: Ambari (cluster management), ZooKeeper (coordination), Pig (scripting), MLlib (machine learning), Hive (SQL), Spark SQL (SQL), YARN (processing), Spark (processing), HDFS (distributed file system), HBase (database).
SAP HANA SOE (Velocity) and Hadoop (2/2)
Steps
• Stage 1: Integration with Spark (2015)
• Stage 2: Independent execution cluster
Benefits
• Integration of SAP data with data lakes
• HANA features add value to Hadoop (e.g. SQL extensions like time series, hierarchies, …)
• Performance
• Holistic data platform
Architecture to Support Different Data Freshness Levels
Figure: components DTX, Transaction Broker (Version Table), Query Engines 1 to 3, DQP, Storage 1 to n, Distributed Log, Storage (checkpoints), and Connections 1 to n (session data); the diagram also carries the labels R and the data sets A, B, C; A, D; and A, C, D.
• Options:
  • read your own writes
  • up-to-date data vs. data of a certain age
• Separate component for transactions
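The freshness options above can be read as a routing decision: given which log version each storage replica has applied, pick one that is fresh enough for the query. The sketch below assumes a simplified model in which every replica reports its last applied commit version and that version's commit time; the `Replica` class, the mode names, and the "least fresh acceptable replica" policy are hypothetical illustrations, not the architecture's actual protocol.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    applied_version: int  # last commit version applied from the distributed log
    applied_at: float     # commit time (seconds) of that version


def is_fresh_enough(replica, mode, latest_version, session_version, max_age, now):
    if mode == "up_to_date":
        return replica.applied_version >= latest_version
    if mode == "read_your_writes":
        return replica.applied_version >= session_version
    if mode == "max_age":
        return now - replica.applied_at <= max_age
    raise ValueError(f"unknown freshness mode: {mode}")


def pick_replica(replicas, mode, latest_version=0, session_version=0, max_age=0.0, now=0.0):
    """Return an acceptable replica for the requested freshness level, or None."""
    candidates = [r for r in replicas
                  if is_fresh_enough(r, mode, latest_version, session_version, max_age, now)]
    # One possible policy: use the least fresh acceptable replica and keep the
    # most recent ones free for queries with stricter requirements.
    return min(candidates, key=lambda r: r.applied_version, default=None)


replicas = [Replica("R1", 10, 100.0), Replica("R2", 8, 95.0), Replica("R3", 5, 60.0)]
print(pick_replica(replicas, "up_to_date", latest_version=10).name)        # R1
print(pick_replica(replicas, "read_your_writes", session_version=7).name)  # R2
print(pick_replica(replicas, "max_age", max_age=10.0, now=102.0).name)     # R2
```

"Read your own writes" only needs the version of the session's last commit, so it can often be served by a lagging replica, while "up-to-date" forces the query onto replicas that have caught up with the latest commit.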
SAP HANA scale out integration
Conclusion
• Today’s applications have a multidimensional set of specialized requirements
• Gains from moving these requirements into a (single) DBMS:
  • Simplified and more explicit data modeling and processing for applications
  • Increased performance
  • No complicated data transfer between specialized engines
• Powerful orchestration required
• Web-scale processing is key to supporting new applications
SAP HANA strives to answer all these requirements in a single data management platform.
Thank you