SAP BusinessObjects & Hadoop
Brian Wrona, June 1, 2012
© 2011 SAP AG. All rights reserved. Confidential
What is Hadoop?
Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers.
Reliable
The software is fault tolerant: it expects and handles hardware and software failures.
Scalable
Designed to scale massively across processors, memory, and locally attached storage.
Distributed
Handles data replication and offers a massively parallel programming model, MapReduce.
The Hadoop framework handles partitioning, scheduling, dispatch, execution, communication, failure handling, monitoring, reporting, and more.
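To make the MapReduce model concrete, here is a minimal single-process Python sketch of the classic word-count job. It is not Hadoop's API: the function names are hypothetical, and everything the framework itself takes care of (partitioning across nodes, scheduling, restarting failed tasks) is reduced to plain loops.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    """Run one map task per input split, then shuffle and reduce the combined output."""
    # On a cluster each split would be handled by a separate map task in parallel;
    # here the tasks simply run one after another.
    mapped = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle_phase(mapped))


splits = [["the quick fox", "the lazy dog"], ["the end"]]
print(word_count(splits))
# prints {'the': 3, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

The split into map, shuffle, and reduce is what lets the framework parallelize the work: map tasks never talk to each other, and each reducer only needs the values for its own keys.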
Hadoop Technology Family: Logical View*
Hive: Data warehouse that provides a SQL interface. Data structure is projected ad hoc onto the unstructured underlying data.
Pig: Platform for manipulating and analyzing large data sets; a scripting language for analysts.
HBase: Column-oriented, schema-less, distributed database modeled after Google’s BigTable. Random real-time read/write.
Mahout: Machine learning libraries for recommendations, clustering, classification, and itemsets.
Hadoop Common
MapReduce: Distributes and monitors tasks, restarts failed work.
HDFS: Distributes and replicates data across machines.
• MapReduce: parallel programming; large-block data handling (e.g. 64 MB)
• Scripting
• Non-relational DB: fine-grained data handling
• Machine learning
* For simplicity, mapping to servers is omitted.
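The large-block design means a file is stored as a handful of big blocks, each replicated across machines. The short Python calculation below works through that arithmetic, taking the 64 MB block size from the slide and assuming a replication factor of 3 (a common HDFS default, configurable through dfs.replication); the function name is hypothetical. For a 1,000 MB file: 1,000 / 64 = 15.6, rounded up to 16 blocks, and 16 × 3 = 48 block replicas.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=64, replication=3):
    """Return (blocks, block replicas, raw storage in MB) for one file stored in HDFS."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    block_replicas = blocks * replication
    # A final, partially filled block only occupies the bytes it actually holds,
    # so raw storage is the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb


print(hdfs_footprint(1000))  # (16, 48, 3000)
```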
BI4 FP3: A solution leveraging the existing BI 4.0 architecture
Common user experience for all front-ends: Web Intelligence, Crystal Reports, Dashboards, Explorer
Empower all people, enable all workflows
All data sources: SAP BW, Hadoop Hive, any relational database, web services, files, SAP HANA, Sybase
The new information design tool is your point of entry to business intelligence solutions
Best access method for each specific data source: Direct Access, Universe Access
High-performance, feature-rich, and secure access
Empower business users with the autonomy they need to access, analyze, enrich, and share information freely and securely using familiar business terms
SAP BusinessObjects Front-end tools
Client tools that support the UNX universe on Hadoop in BI 4.0 FP3:
Dashboards
Crystal Reports
Explorer
Web Intelligence
Providing Richer Insight: SAP BusinessObjects Explorer
Provide all users with a simple and intuitive experience for immediate, interactive access to information to answer common business questions on the fly
Key New Capabilities
• Casual users create their own compositions of multiple Explorer visualizations
• Exploration views support iOS
• Geographic awareness: new semantic type for geographic dimensions
• Time awareness: new semantic type for time dimensions
• Natural visualization and navigation for time and geo dimensions
• Improved search: auto-correction and ‘did you mean’ feature
Explorer FP3 on Hadoop Hive
Information Design Tool on Apache Hadoop Hive
We connect to Hive using a relational connection and a JDBC driver.
The connectivity for Amazon EMR is planned for the 4.1 release.
Semantic Layer: Hadoop Support
New universe format (UNX) via a JDBC relational connection to Hive
Access via tables in the data foundation
Hive partitioned tables are supported.
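Partitioned tables matter because Hive stores each partition as its own column=value subdirectory, so a filter on a partition column lets Hive skip every directory that cannot match. The Python sketch below models only that directory-level pruning; the table sales, its partition columns dt and country, and the warehouse paths are hypothetical.

```python
def parse_partition(path):
    """Extract key=value partition segments from a path such as .../sales/dt=2012-06-01/country=US."""
    spec = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            spec[key] = value
    return spec


def prune_partitions(paths, predicate):
    """Keep only partition directories whose keys satisfy every equality predicate."""
    return [
        path
        for path in paths
        if all(parse_partition(path).get(key) == value for key, value in predicate.items())
    ]


partitions = [
    "/user/hive/warehouse/sales/dt=2012-05-31/country=US",
    "/user/hive/warehouse/sales/dt=2012-06-01/country=US",
    "/user/hive/warehouse/sales/dt=2012-06-01/country=FR",
]
# Roughly the directories a query filtered on dt = '2012-06-01' would need to read:
for path in prune_partitions(partitions, {"dt": "2012-06-01"}):
    print(path)
```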
Information Design Tool on Hadoop Hive
A data foundation against a Hive schema
One can draw joins between the Hive tables
We support Hive tables, aliases, derived tables, Hive views, and Hive partitioned tables
Support for multi-source is planned for 4.1.
Querying Hive data
Business users can get their data out of Hadoop in a non-technical manner using the query panel.
Under the covers, we generate a HiveQL statement that Hadoop Hive then translates into MapReduce tasks.
Subquery and ranking features are not supported.
Sampling is supported.
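As a rough illustration of the kind of statement a query panel can produce, here is a hypothetical Python generator that turns selected dimensions, measures, and equality filters into a single SELECT ... GROUP BY statement. It is not the product's query generator; the table sales and the columns country, year, and amount are assumptions.

```python
def build_hiveql(table, dimensions, measures, filters=None):
    """Build a SELECT ... GROUP BY statement from query-panel style selections.

    dimensions: column names to group by, e.g. ["country", "year"]
    measures:   output alias -> aggregate expression, e.g. {"revenue": "SUM(amount)"}
    filters:    column -> literal value, combined with AND (equality only)
    """
    select_items = list(dimensions) + [f"{expr} AS {alias}" for alias, expr in measures.items()]
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if filters:
        # Illustration only: values are not escaped, so never pass untrusted input.
        conditions = [f"{column} = '{value}'" for column, value in filters.items()]
        sql += " WHERE " + " AND ".join(conditions)
    if dimensions and measures:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql


print(build_hiveql("sales", ["country", "year"], {"revenue": "SUM(amount)"}, {"country": "US"}))
# SELECT country, year, SUM(amount) AS revenue FROM sales WHERE country = 'US' GROUP BY country, year
```

Hive then compiles a statement like this into one or more MapReduce jobs, so the report only receives the aggregated rows.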
Text Analysis
We have three natural-language speeches accessible from a Hive table.
Text Analysis
Finding recurrent words
Word extraction and counting are done by Hadoop Hive.
WebI only presents the aggregated counts in a chart.
The most frequent words found in the speech.
Text Analysis
Finding recurrent word combinations
Charts: group size 3 and group size 4
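The slides say Hive performs the word extraction and counting; the Python sketch below shows the same computation in memory so the idea is easy to follow. A group size of n means n consecutive words, and n = 1 gives the single-word counts. The tokenization rule (lower-casing, letters and apostrophes only) and the sample sentence are assumptions. Hive also ships built-in sentences() and ngrams() functions for this kind of task, although the slides do not say which HiveQL the demo used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into words made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_ngrams(text, n, k):
    """Return the k most frequent groups of n consecutive words, with their counts."""
    words = tokenize(text)
    # Zipping n shifted copies of the word list yields each window of n consecutive words.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams).most_common(k)


speech = "We shall fight, we shall fight. We shall never surrender!"
print(top_ngrams(speech, 1, 2))  # [(('we',), 3), (('shall',), 3)]
print(top_ngrams(speech, 3, 1))  # [(('we', 'shall', 'fight'), 2)]
```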
Statistical Analysis
We have numerical data such as salary or age accessible from a Hive table.
Statistical Analysis
Histogram on salary data
Bin definition and counting are done by Hadoop Hive.
WebI only presents the aggregated counts in a chart.
The salary data distribution.
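Fixed-width binning reduces to one grouping expression: each value maps to the lower bound of its bin, and the rows are counted per bin. In HiveQL that could be something like grouping by floor(salary / 10000) * 10000; the Python sketch below shows the same logic, with a 10,000 bin width and made-up salaries as assumptions.

```python
import math
from collections import Counter


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower bound, in ascending order."""
    counts = Counter(math.floor(value / bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


salaries = [32000, 38500, 41000, 45000, 47999, 52000]
print(histogram(salaries, 10000))  # {30000: 2, 40000: 3, 50000: 1}
```

Only the small per-bin result ever leaves the cluster, which is why the chart can be drawn in WebI regardless of how many rows the table holds.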
Statistical Analysis
Summarizing a data set
All the computations required for the descriptive statistics are pushed down to Hadoop.
WebI is used as the presentation layer.
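Descriptive statistics can be pushed down because each of them can be rebuilt from small partial aggregates (count, sum, sum of squares, min, max) computed independently on each slice of the data and then merged. Hive has its own aggregate functions for this (for example avg, stddev_pop, min, max); the Python sketch below only illustrates the partial-then-merge idea, and the Partial class and the two-partition split are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Partial:
    """Mergeable partial aggregate computed independently on each partition of the data."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0
    low: float = math.inf
    high: float = -math.inf

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x
        self.low = min(self.low, x)
        self.high = max(self.high, x)

    def merge(self, other):
        return Partial(self.n + other.n, self.total + other.total,
                       self.total_sq + other.total_sq,
                       min(self.low, other.low), max(self.high, other.high))


def summarize(partitions):
    """Combine per-partition partials into count, mean, population std dev, min and max."""
    combined = Partial()
    for part in partitions:
        partial = Partial()  # "map side": one partial aggregate per partition
        for x in part:
            partial.add(x)
        combined = combined.merge(partial)  # "reduce side": merge the partials
    if combined.n == 0:
        raise ValueError("no data")
    mean = combined.total / combined.n
    # Sum-of-squares formula: simple, but can lose precision for large, tightly clustered values.
    variance = max(combined.total_sq / combined.n - mean * mean, 0.0)
    return {"count": combined.n, "mean": mean, "stddev": math.sqrt(variance),
            "min": combined.low, "max": combined.high}


print(summarize([[2, 4, 4], [4, 5, 5, 7, 9]]))
# {'count': 8, 'mean': 5.0, 'stddev': 2.0, 'min': 2, 'max': 9}
```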
Time series
Climate data over time
Using HiveQL functions, we derive dimension objects from a timestamp.
We also request Hive to perform ad hoc aggregations.
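Deriving time dimensions means extracting parts such as year and month from the raw timestamp and grouping on them; HiveQL offers functions such as year() and month() for this. The Python sketch below mirrors that step plus a per-month average. The 'YYYY-MM-DD HH:MM:SS' timestamp format and the sample readings are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def monthly_average(readings):
    """Average readings per (year, month) derived from 'YYYY-MM-DD HH:MM:SS' timestamps."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in readings:
        moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        key = (moment.year, moment.month)  # derived "Year" and "Month" dimensions
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


readings = [
    ("2012-01-05 12:00:00", 2.0),
    ("2012-01-20 06:30:00", 4.0),
    ("2012-02-01 00:00:00", 10.0),
]
print(monthly_average(readings))  # {(2012, 1): 3.0, (2012, 2): 10.0}
```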
Hana + Hadoop + Data Services + BusinessObjects + SAR
POC objective
• Provide insight into how a real-life retail problem is solved by SAP HANA and Hadoop
• Analyze 10 years of transactional data (13 billion records)
• Showcase how we can leverage hot data (2 years) in SAP HANA with warm/cold data (10 years) in Hadoop
• Demonstrate at SAPPHIRE an efficient SAP HANA-Hadoop cloud solution leveraging SI for data migration and integration services
Solutions
• SAP HANA™
• Apache Hadoop™
• SAP BusinessObjects Data Services 4.1
• SAP BusinessObjects 4.0 FP3
• RDS Sales Analysis for Retail on HANA
Benefits
• Affordable solution for data storage
• Reduce capital expenditures with cloud management of data storage and SAP Analytics
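Architecture diagram: SAP HANA on Linux (hot data) and Hadoop (cold/warm data) on virtual private servers in the cloud; Data Services 4.1; BusinessObjects 4.0 FP3 with a Hadoop universe and an SAR universe; Web Intelligence and Explorer as front ends; Hive accessed via SQL over JDBC.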
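The slides do not describe how a report spanning both tiers is split, so the Python sketch below is a hypothetical date-based router for the POC's layout: the most recent 2 years are read from SAP HANA and anything older from Hadoop. The function names, the inclusive date ranges, and the boundary rule are assumptions, and merging the two partial results is not modeled.

```python
from datetime import date, timedelta

HOT_YEARS = 2  # the POC keeps the most recent 2 years of data in SAP HANA


def hot_boundary(today, hot_years=HOT_YEARS):
    """First day that still counts as hot data."""
    try:
        return today.replace(year=today.year - hot_years)
    except ValueError:  # 29 February when the target year is not a leap year
        return today.replace(year=today.year - hot_years, day=28)


def route(start, end, today):
    """Split an inclusive date range into a warm/cold part (Hadoop) and a hot part (SAP HANA)."""
    boundary = hot_boundary(today)
    parts = []
    if start < boundary:
        parts.append(("hadoop", start, min(end, boundary - timedelta(days=1))))
    if end >= boundary:
        parts.append(("hana", max(start, boundary), end))
    return parts


for target, first, last in route(date(2005, 1, 1), date(2012, 5, 31), today=date(2012, 6, 1)):
    print(target, first, last)
# hadoop 2005-01-01 2010-05-31
# hana 2010-06-01 2012-05-31
```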
SAP Database Strategy: SAP’s current innovative data assets
SAP Real-time Data Platform
• SAP HANA
• SAP Sybase SQL Anywhere: #1 mobile and embedded database
• SAP Sybase ASE: #1 transactional database with best TCO
• SAP Sybase IQ: #1 analytics database with best TCO
• Sybase ESP, Replication Server, PowerDesigner, + SAP EIM: #1 unified EIM platform for real time
Open for partners
Next-Generation SAP Real-time Data Platform
3rd-party BI client, SAP NetWeaver (on premise / cloud), custom apps, SAP Business Suite, SAP Business Warehouse, SAP Big Data Applications, SAP Analytics, SAP Mobile
Open developer APIs and protocols
Common landscape management
SAP Smart Data Services Platform
SAP HANA Platform
SAP Real-time Data Platform: SAP Sybase ASE, SAP Sybase SQLA, SAP Sybase ESP, SAP Sybase IQ, SAP Sybase Replication Server, Hadoop, 3rd-party DB, MPP scale-out
Common modeling: Sybase PowerDesigner
SAP Data Services, SAP MDG, MDM
SAP innovation without customer disruption
SAP HANA Vision/roadmap
Integrate Optimize Synthesize
Next-generation PaaS offering based on in-memory architecture
Best application and database experience through in-memory optimization
Higher performance through in-memory database
Customer value: leverage the power of SAP HANA OnDemand; seamless structure for OnPremise and OnDemand integration; innovative co-development across the community
Customer value: deep optimization across SAP applications, SAP HANA, and Sybase; SAP HANA becomes the single application platform for OLAP and OLTP for all applications; higher business value at reduced cost of operation
Customer value: state-of-the-art in-memory database platform crafted for BW, ERP, B1, ByDesign, and all other SAP applications; accelerates business by removing the most common processing bottlenecks between layers
SAP innovation without customer disruption
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
Microsoft, Windows, Excel, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation. IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, System z9, z10, z9, iSeries, pSeries, xSeries, zSeries, eServer, z/VM, z/OS, i5/OS, S/390, OS/390, OS/400, AS/400, S/390 Parallel Enterprise Server, PowerVM, Power Architecture, POWER6+, POWER6, POWER5+, POWER5, POWER, OpenPower, PowerPC, BatchPipes, BladeCenter, System Storage, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, Parallel Sysplex, MVS/ESA, AIX, Intelligent Miner, WebSphere, Netfinity, Tivoli and Informix are trademarks or registered trademarks of IBM Corporation.
Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or other countries. Oracle is a registered trademark of Oracle Corporation.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc. HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.
Java is a registered trademark of Sun Microsystems, Inc.
JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.
© 2012 SAP AG. All rights reserved
Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company.
Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase, Inc. Sybase is an SAP company.
All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of SAP AG.
This document is a preliminary version and not subject to your license agreement or any other agreement with SAP. This document contains only intended strategies, developments, and functionalities of the SAP® product and is not intended to be binding upon SAP to any particular course of business, product strategy, and/or development. Please note that this document is subject to change and may be changed by SAP at any time without notice.
SAP assumes no responsibility for errors or omissions in this document. SAP does not warrant the accuracy or completeness of the information, text, graphics, links, or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This limitation shall not apply in cases of intent or gross negligence.
The statutory liability for personal injury and defective products is not affected. SAP has no control over the information that you may access through the use of hot links contained in these materials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.