sap hana vora sitmty 20160707
TRANSCRIPT
SAP HANA Vora
Bridging the gap between Corporate and Big Data
Henrique Pinto, Global HANA COE
July 2016
© 2016 SAP SE or an SAP affiliate company. All rights reserved. 2
The Five Megatrends Driving Our Digitized World
And Their Implications for Distributed Big Data Management
Hyper Connectivity
Everybody has access
Super Computing
Super computers power everywhere
Cloud Computing
The cloud is where we compute
Smart World
Your fridge knows what you want for dinner
Cyber-Security
High-powered security is now the norm
Hadoop and Spark at a Glance
What is it?
• Scalable, fault-tolerant, distributed file system
• Sits on top of a native file system
• HDFS (Hadoop Distributed File System) is an append-only file system, designed for batch, not real-time
• Splits files into blocks and distributes them to data nodes
Why?
• Organizations want more business value from Big Data
• Hadoop configurations scale and perform at very low cost
• Hadoop complements Data Warehouses, Data Integration and Analytics, but doesn’t replace them
Data Processing
• MapReduce was invented to query data residing in a Hadoop file system
• MapReduce was not designed for interactive queries but for long-running batch jobs
• For more details see http://hortonworks.com/hadoop/mapreduce/
Spark
• An open source in-memory analytics execution engine for fast, large-scale data processing
• Used on top of Hadoop
• Does not replace Hadoop
• Built to replace MapReduce
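The batch model described on this slide can be sketched in plain Python. This is only an illustration of the map, shuffle and reduce phases over file blocks, not Hadoop or Spark code; the function names and the tiny block size are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_into_blocks(lines: List[str], block_size: int) -> List[List[str]]:
    """Split a file (a list of lines) into fixed-size blocks, standing in for HDFS blocks."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_phase(block: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word found in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: List[str], block_size: int = 2) -> Dict[str, int]:
    blocks = split_into_blocks(lines, block_size)
    # Each block is mapped independently; on a cluster this would run on separate data nodes.
    mapped = [pair for block in blocks for pair in map_phase(block)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data on hadoop", "spark on hadoop", "big data"]
    print(word_count(sample))
```

Every phase has to finish before the next one starts, which is one way to see why this model suits long-running batch jobs better than interactive queries.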
What’s Stopping Us?
The Digital Divide between Enterprise and Big Data
Too Complex
Too Slow
Unable to Work Together
ENTERPRISE BIG DATA
ENTERPRISE BIG DATA
Bridging the Digital Divide
Introducing
SAP HANA Vora
SAP HANA Vora
What’s Inside and What Does It Do?
Democratize Data Access
Make Precision Decisions
Simplify Big Data Ownership
SAP HANA Vora is an in-memory query engine which leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.
Drill Downs on HDFS
Mashup API Enhancements
Compiled Queries
HANA-Spark Controller
Unified Landscape
Open Programming
Any Hadoop Clusters
Enable Precision Decisions
With Contextual Insights in Enterprise Systems
Gain business coherence with business data and big data
Drill Downs: Familiar OLAP experience on Hadoop to derive business insights from big data, such as drill-down into HDFS data
Compiled Queries: Compiled queries enable applications and data analysis to work more efficiently across nodes
Spark Controller: HANA-Spark Controller for improved performance between distributed systems
[Architecture diagram. Labels: SAP HANA in-memory platform (In-Memory Store; Application, Database, Integration and Processing Services), HANA Smart Data Access, Spark Controller, Vora and Spark nodes, YARN, HDFS files, Other Apps]
Democratize Data Access for Data Science Discovery
Pursue new inquiries without compromise on data and easily integrate these insights with all data
Open Programming: Extensive programming support for Scala, Python, C, C++, R, and Java allows data scientists to use their tool of choice.
Mashup Enhancements: Enable data scientists and developers who prefer SparkR or Spark ML to mash up corporate data with Hadoop/Spark data easily.
Optional Use of SAP HANA for Delegated, Multi-engine Pre-processing: Optionally, leverage HANA’s multiple data processing engines for developing new insights from business and contextual data.
[Architecture diagram. Labels: SAP HANA Platform (In-Memory Store; Application, Database, Integration and Processing Services), Spark Data-source API enhancement, Vora and Spark nodes, YARN, HDFS files]
Vora Modeling Tool
• Vora Tools use the Thriftserver to provide access to the Modeler under http://<DNS_NAME_OF_JUMPBOX_NODE>:9225
• Perspectives:
  • Data Browser
  • SQL Editor
  • Modeler
SAP HANA Vora: Use Cases
Fraud Detection
Get access to all your data, including historical and contextual trends and current business data, to analyze anomalies
Risk Mitigation
Be assured of more precise data to perform Monte Carlo simulations to produce distributions of possible outcome values with more precise context
Targeted Marketing Campaigns
React rapidly to customer sentiment and pinpoint targeting for sales and marketing campaigns with a more complete view of customer needs and wants
360° Customer Service
Ensure a more complete picture of the customer with analysis of unstructured customer data, such as social media profiles, emails, calls, complaint logs, discussion forums, and website history
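For the Risk Mitigation use case, a Monte Carlo simulation that produces a distribution of possible outcome values might look like the sketch below. It is a generic illustration, not Vora functionality: the normally distributed per-period returns, the parameter names and the chosen quantiles are all assumptions.

```python
import random
import statistics
from typing import Dict, List


def simulate_outcomes(mean_return: float, volatility: float,
                      periods: int, trials: int, seed: int = 42) -> List[float]:
    """Return the final value of a position (starting at 1.0) for each simulated trial."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    outcomes = []
    for _ in range(trials):
        value = 1.0
        for _ in range(periods):
            # Assumption: per-period returns are independent and normally distributed.
            value *= 1.0 + rng.gauss(mean_return, volatility)
        outcomes.append(value)
    return outcomes


def summarize(outcomes: List[float]) -> Dict[str, float]:
    """Describe the simulated distribution with a few quantiles."""
    ordered = sorted(outcomes)

    def quantile(q: float) -> float:
        index = min(len(ordered) - 1, int(q * len(ordered)))
        return ordered[index]

    return {
        "p05": quantile(0.05),
        "median": statistics.median(ordered),
        "p95": quantile(0.95),
    }


if __name__ == "__main__":
    results = simulate_outcomes(mean_return=0.01, volatility=0.05, periods=12, trials=10_000)
    print(summarize(results))
```

The input parameters would normally be estimated from historical data, which is where access to a larger and more complete data set matters.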
Use Cases of Existing Vora Customers
DW / Tiering
Challenges:
• Current DW with more than 100 TB of data at end of life and no longer cost-effective
• Regulatory requirement to retain data for 10 years
Solution:
• SAP HANA for most recent data, Hadoop for historical data
• SAP HANA Vora accesses and queries data across all tiers
Why Vora:
• SAP HANA Vora provides enterprise analytics and an OLAP-like experience across data warehouse and HDFS
IoT
Challenges:
• Perform detailed predictive analytics throughout the manufacturing processes based on sensor data
• More than 1 PB of data
Solution:
• SAP HANA Vora rapidly processes sensor data in HDFS and combines it with data in SAP HANA for predictive analytics
Why Vora:
• SAP HANA Vora processing of HDFS data combined with HANA data reduced query runtime dramatically
Data Lake
Challenges:
• Demand forecast accuracy for flu-related products is relatively low
• Difficult to anticipate demand spikes created by outbreaks and to react quickly and intelligently
Solution:
• Data lake using Vora, combining internal and external data sources (internal: shipment; external: weather, Twitter, Google Search, Centers for Disease Control)
Why Vora:
• SAP HANA Vora enables fast analysis and forecasting of all types of data in HDFS
SAP HANA Multi-temperature Data Management
Big Data: HANA In-Memory + HANA Dynamic Tiering + Hadoop
Hot Data: HANA In-Memory
• Modern in-memory platform
• Transact/analyze data in real time
• Native predictive, text, graph and spatial algorithms
• Real-time analytics on top of streaming data
Warm Data: HANA DT
• Disk-backed, smart column store
• High performance and efficient compression
• Transparent for all operations. No changes required for BW operations
• Excels at queries on structured data from terabyte to petabyte scale
• No data duplication
Cold Data: Hadoop
• Hadoop virtualization possible with Smart Data Access (read only), via Hive or Spark (SP10+)
• Also possible to access HDFS & MR jobs directly via HANA vUDFs, which can be embedded in SQL queries
• Future roadmap and new functionalities available on top of SAP HANA Vora:
  • Native bi-directional communication between HANA & Hadoop via Spark for fast analytical scenarios
  • Added "BI-like" features on top of Hadoop (hierarchies, UoM & currency conversions, etc.)
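The hot, warm and cold split above can be pictured as a routing rule based on data age. The sketch below is a simplified illustration only: the age thresholds, the record layout and the function names are hypothetical, and real tiering decisions are configured in the respective products rather than coded like this.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical age thresholds in days; real cut-offs depend on the workload.
HOT_MAX_AGE_DAYS = 90
WARM_MAX_AGE_DAYS = 730

Record = Tuple[str, date]  # (record id, business date)


def choose_tier(record_date: date, today: date) -> str:
    """Map the age of a record to a storage tier."""
    age_days = (today - record_date).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"   # HANA in-memory
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"  # HANA Dynamic Tiering
    return "cold"      # Hadoop


def partition_by_tier(records: List[Record], today: date) -> Dict[str, List[str]]:
    """Group record ids by the tier their date falls into."""
    tiers: Dict[str, List[str]] = {"hot": [], "warm": [], "cold": []}
    for record_id, record_date in records:
        tiers[choose_tier(record_date, today)].append(record_id)
    return tiers


if __name__ == "__main__":
    today = date(2016, 7, 7)
    sample = [("a", date(2016, 6, 1)), ("b", date(2015, 7, 7)), ("c", date(2010, 1, 1))]
    print(partition_by_tier(sample, today))
```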
Data Tiering w/ HANA & Vora
Comparison of the different strategies
Component: HANA In-Memory
Cost Factor: $$$$ (4 out of 4)
Volume: Up to several TBs (no technical limit)
Processing:
• ACID compliant
• SQL, SQLScript, predictive, time series, spatial, text, …
Component: HANA Dynamic Tiering
Cost Factor: $$$ (3 out of 4)
Volume: 100s of TB, natively integrated in HANA
Processing:
• ACID compliant
• SQL
Component: Hadoop / Vora
Cost Factor: $$ (2 out of 4)
Volume: 100s of TB (depending on available memory in Hadoop cluster)
Processing:
• In-memory OLAP engine for Hadoop
• Compiled SQL code
Component: Hadoop / Spark
Cost Factor: $ (1 out of 4)
Volume: 100s of PB or more
Processing:
• General-purpose in-memory engine
• Transformations and Actions
Performance (rated graphically on the slide; ratings listed in extraction order): 4 out of 4, 3 out of 4, 1 out of 4, 2 out of 4
Vora 1.3 Highlights (Beta Program)
• Simplified installation
• Enhanced modeler
• New engines (graph, time-series, doc store, disk store)
• Kerberos support
• UoM conversion, currency conversion
SAP HANA Vora – Latest innovations
• Graph engine – SAP HANA Vora embeds an in-memory graph database for real-time graph analysis. The primary focus is on complex read-only analytical queries on very large graphs.
• Time Series – SAP HANA Vora provides a highly distributed time series analysis engine which supports storing and analyzing time series data. It offers memory- and speed-efficient time series compression and supports features like standard aggregation, granularization, and advanced analysis. SAP HANA Vora allows you to join relational data with series data to build efficient SQL models in Hadoop and other Big Data environments.
• Document Store – SAP HANA Vora introduces NoSQL features like storing JSON documents using the new Document Store as part of the SAP HANA Vora 1.3 release. The new DocStore supports schema-less tables, allowing you to flexibly add or remove fields from any document, and helps you scale horizontally.
• Disk store – SAP HANA Vora provides relational capabilities without loading all the data into memory, for data sets too large to fit in memory.
[Chart: Temperature (°C) for Halifax and Waterloo; axis from -30 to 10]
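Granularization and standard aggregation over series data, as mentioned for the time series engine, can be illustrated with a small plain-Python sketch. It does not use any Vora API; the reading format, the daily granularity and the sample temperatures are made up for the example.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean
from typing import Dict, List, Tuple

Reading = Tuple[datetime, float]  # (timestamp, temperature in °C)


def granularize_daily(readings: List[Reading]) -> Dict[date, Dict[str, float]]:
    """Roll raw readings up to one row per day with min, max and average."""
    buckets: Dict[date, List[float]] = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp.date()].append(value)
    return {
        day: {"min": min(values), "max": max(values), "avg": mean(values)}
        for day, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Made-up sample readings, for illustration only.
    sample = [
        (datetime(2016, 1, 1, 6, 0), -12.0),
        (datetime(2016, 1, 1, 18, 0), -8.0),
        (datetime(2016, 1, 2, 6, 0), -15.0),
    ]
    for day, stats in granularize_daily(sample).items():
        print(day, stats)
```

Once readings are rolled up to a common granularity, they can be joined with relational data on the day key.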
The SAP focus: End-to-end value chain
SAP HANA Platform
[Diagram. Layers: CONSUME, COMPUTE, STORAGE, SOURCE, INGEST]
Engines: spatial processing; analytics, text, graph, predictive; stream processing
Application Development Environment
Transformations & Cleansing: Smart Data Integration, Smart Data Quality
Stream Processing: Smart Data Streaming
Logs, Text, OLTP, Social, Machine, Geo, ERP, Sensor, Store & forward
Mobile applications and BI
Smart Data Access: Virtual Tables, User Defined Functions
Dynamic Tiering: Aged data on disk
In-Memory: Data model & data
Calculation engine: Fast computing
Column Storage: High performance analytics
Series Data Storage: Store time-series data
Reporting & Dashboards
High Performance Applications
Data Exploration & Visualization
Ad hoc & OLAP Analytics
Predictive Analysis
Business Planning & Forecasting
Lumira / BI
But there is more work to do…
Hadoop / Vora
MapReduce
YARN
HDFS
Thank You
Henrique Pinto
Director, Global HANA COE