Reference Implementation for Application Development on HANA By TIP D&NA Data Management June 10th, 2011

© 2011 SAP AG. All rights reserved. 2

Agenda

• Motivations
• Languages available for application development
• Demo – CarShop, V1
• CarShop, V2
• Resource

We would like to run this session as an open discussion rather than a presentation; your feedback is very welcome.

Motivations

• What’s this?
It is a sample application built on HANA. It leverages HANA to provide features such as “Analysis”, “Forecast”, “What-if Planning”, and sales promotion (for example, cross-selling).

• Why do we do this?
HANA has great features that other DBMSs don’t have, such as column-based modeling, in-memory computing, a built-in business library, a built-in predictive library, and R integration. The official HANA documentation can’t cover all the details, especially sample code. We built this app so that other developers can use it as a reference and quickly develop new apps on HANA.

• What are the benefits for SAP?
Other HANA content/app developers can quickly “master” advanced HANA features such as column-based modeling, in-memory computing, the built-in business library, the predictive library, and R integration. Using the sample code, other teams and developers can learn how to build 2-tier and 3-tier applications based on HANA.

[Figure: CarShop features (Analysis, What-if, Cross-selling, Forecast) built on SQL, BFL, and R]

Motivations

Project Definition
Just as Sun provides “petStore” for the Java EE platform, the HANA Reference Implementation Application, named “carShop”, is designed to illustrate how HANA can be used to develop an amazing application. By studying this project, the learner can become familiar with the following:
• SQLScript
• SQLScript V2
• L
• IMSL
• R
• BFL / PAL
• .NET/Java frontend

Target Audience: Everyone who is interested in how to develop new applications on HANA

Virtual Business Scenario
This project is based on a virtual car sales scenario. A company has many salesmen in different cities selling cars. The company uses the carShop system to analyze historical sales data, forecast and plan future sales, set KPIs for salesmen based on the plan data, calculate volume-driven bonuses, analyze potential customer information, cluster potential customers, and find selling opportunities.

“Languages”

The following languages can be used to access HANA functionality:

• IMSL (International Mathematics and Statistics Library)
• R
• BFL (Business Function Library) / PAL (Predictive Analysis Library)
• L
• SQLScript V1 and V2
• CalcEngine
• a few others (e.g. logic/inference)

[Figure: languages available on HANA: BFL / PAL, SQL Script, IMSL, L, R]

IMSL

The IMSL Numerical Libraries have been a cornerstone of high-performance and desktop computing applications in scientific, technical, and business environments for well over three decades.

They are developed by Visual Numerics, which has an OEM agreement with SAP to embed the IMSL C Numerical Library into the TREX component to offer advanced analytics for SAP applications.

[Figure: IMSL C Math and IMSL C Statistics libraries]

IMSL

Functional areas included in the IMSL Numerical Libraries:

Mathematics
• Matrix Operations
• Linear Algebra
• Eigensystems
• Interpolation & Approximation
• Numerical Quadrature
• Differential Equations
• Nonlinear Equations
• Optimization
• Special Functions
• Finance & Bond Calculations
• Genetic Algorithm

Statistics
• Basic Statistics
• Time Series & Forecasting
• Nonparametric Tests
• Correlation & Covariance
• Data Mining
• Regression
• Analysis of Variance
• Transforms
• Goodness of Fit
• Distribution Functions
• Random Number Generation
• Neural Networks

IMSL

The IMSL sample code to access HANA

Note: Currently, IMSL functions are only available in the DEV branch of NewDB

IMSL

Benefits of Embedding the IMSL

• Accelerate Development
• Develop Better Software Applications
• Develop Flexible Software Applications
• Improve Quality and Reduce Uncertainty
• Reduce Costs (?)
• Fair or better results than other packages

Limitations of IMSL:
• OpenMP-based parallelism is not compatible with NewDB
• Does not work for partitioned tables
• Governance issue: cannot monitor its memory usage and threading

What is R?

• Aims at building an open source version of S (under the GNU GPL)
• Project home: http://www.r-project.org/
• Available on Windows, Linux, and Mac OS
• Latest version 2.12.1 (dated 16/12/2010)
• Now has a core team of about 19 people
• Support for multiple languages
• CRAN: a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R
• The R Journal: a refereed journal of the R project for statistical computing
• Some well-known weaknesses:
  • It is not particularly efficient in handling large data sets
  • It is rather slow in executing a large number of for loops
  • The learning curve is somewhat steep compared to point-and-click software

R Integration in NewDB (Available with HANA 1.0 GA)

[Figure: R integration architecture. NewDB space: Calc. Engine with Join OP, ROP, OLAP OP, and other plug-ins (forecasting, parallelism, statistics, etc.), plus the SHM channel plug-in and the TCP/IP channel plug-in. Open source R space: RClient and R runtime. Numbered paths: (1) SHM solution, single server; (2) TCP/IP solution, different servers; (3) “REngine” (parser, runtime, operators).]

NOTE:
1. SHM (shared memory) solution: we use LGPL to solve the potential IP issue. NewDB and R need to be on the same server.
2. TCP/IP solution: NewDB and R can be deployed on different machines.
3. “REngine”: in discussion.

Milestones

1. May 2010 – The NewDB team had JDBC and CSV versions of the R integration, but they were very slow. The D&NA team joined to develop better solutions.

2. Oct 2010 – Checked the SHM solution into the NewDB standard build. Achieved at least a 50x performance improvement over the old solution.

3. March 2011 – Checked in parallelism handling for data transfer between NewDB and R. Gained at least another 3x improvement.

4. (Planned) HANA 1.5 release – Release the SHM LGPL version to reduce possible IP issues.

5. (Planned) HANA 1.5 release – Release the TCP/IP solution to support the multi-server requirement.

Internal Customers

1. Oct, 2010 – DNA for SalesForecasting

2. Jan, 2011 – EPM SBC for spend analysis

3. Mar, 2011– PIO for personal financial analysis.

4. April, 2011 – IDDC PA for predictive analysis

Language and tools

Packages

Languages

SQLScript + R: determine the Poisson Regression Model

CREATE FUNCTION LR(IN input1 SUCC_PREC_TYPE, OUT output0 R_COEF_TYPE)
LANGUAGE RLANG AS '''
    # Fit a Poisson regression of SUCC_PREC on CHANGE_FREQ and return both coefficients
    CHANGE_FREQ <- input1$CHANGE_FREQ;
    SUCC_PREC <- input1$SUCC_PREC;
    coefs <- coef(glm(SUCC_PREC ~ CHANGE_FREQ, family = poisson));
    INTERCEPT <- coefs["(Intercept)"];
    CHANGEFREQ <- coefs["CHANGE_FREQ"];
    names(INTERCEPT) <- NULL;
    names(CHANGEFREQ) <- NULL;
    result <- as.data.frame(cbind(INTERCEPT, CHANGEFREQ))
''';

TRUNCATE TABLE r_coef_tab;
CALLS LR(SUCC_PREC_tab, r_coef_tab);
SELECT * FROM r_coef_tab;
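As a rough illustration of what the glm call in the R body estimates, the following Python sketch fits the same one-predictor Poisson model (log of the mean = intercept + slope × x) with Newton-Raphson. It is not part of the original deck: the function name fit_poisson and the sample data are hypothetical, and R's glm uses iteratively reweighted least squares, which for the Poisson log link amounts to the same update.

```python
import math

def fit_poisson(xs, ys, iterations=50, tol=1e-12):
    """Fit log(mean) = intercept + slope * x by Newton-Raphson.

    Returns (intercept, slope), the analogue of
    coef(glm(y ~ x, family = poisson)) in R.
    """
    # Start from the intercept-only model: the log of the mean response.
    intercept = math.log(sum(ys) / len(ys))
    slope = 0.0
    for _ in range(iterations):
        mus = [math.exp(intercept + slope * x) for x in xs]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Negative Hessian (Fisher information), a symmetric 2x2 matrix.
        h00 = sum(mus)
        h01 = sum(mu * x for x, mu in zip(xs, mus))
        h11 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 linear system for the Newton step.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        intercept += d0
        slope += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return intercept, slope

# Hypothetical, noise-free data generated from intercept 0.5 and slope 0.3.
change_freq = [0, 1, 2, 3, 4, 5]
succ_prec = [math.exp(0.5 + 0.3 * x) for x in change_freq]
INTERCEPT, CHANGEFREQ = fit_poisson(change_freq, succ_prec)
```

Because the sample data is generated without noise, the fit recovers the generating coefficients; with real counts the two values would be maximum-likelihood estimates.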

Business Function Library

The Business Function Library (BFL) is now the calculation library for applications built on top of NewDB. It resides in the NewDB CalcEngine, consists of many business functions that execute at the NewDB layer, and is written in C++.

Design Goals

Significant performance improvements for SAP apps
1. Utilizing new hardware (i.e. multi-core, built-in vector engine)
2. Massively parallel main-memory processing
3. Changing the boundaries between the application server and the data management layer

Simplification of the application programming model
1. Usage of extended SQL (SQLScript)
2. Rich functionality in the Calculation Engine
3. Quick app delivery

BFL Wiki

BFL Governance

Adam Their, Ralf Ehret, Wen-Syan Li, Volkmar Soehner (LiveCache, Planning Eng)

Kai Stammerjohann, Nico Bohnsack, Volkmar Soehner, Peter Goertz, Thorsten Glebe, Franz Faerber, Daniel Booß, Andrei Suvernev

Volkmar Soehner (LiveCache, Planning Eng, …), Wen-Syan Li (BFL)

BFL Framework

BFL Framework:

Core service plus runtime environment, which will reside in NewDB. Through it, BFL can be configured, plugged in, and invoked. With the core service, application teams can build BFL functions without the whole NewDB code base.

Future Release

As one proposal, we plan to develop the BDK (BFL Development Kit) as the BFL development environment and the BRE (BFL Runtime Environment) as the BFL runtime, covering memory allocation, error handling, and so on.

BDK plus BRE is the future BFL framework. With the new framework, clients don’t need to interact directly with the NewDB development environment.

Support the stateful execution of each function.

L Language

“L” is tailored to NewDB by SAP.

The programming language L is targeted as a robust, low-level, high-performance programming language inside NewDB.

“L” can be described as a safe subset of C++ with NewDB data types and additional support for processing table-like data.

“L” provides direct access to the table and column objects which are used in the Calculation Engine.

L Language

Llang — The L Programming Language

L Language

Type Mappings

SQL Type | Column Store Type | L Null Type        | L Non-Null Type | L Raw Type | Notes on L Type
         |                   | NullBool           | Bool            |            |
         |                   |                    | Size            |            |
TINYINT  | INT               | NullInt32          | Int32           |            |
SMALLINT | INT               | NullInt32          | Int32           |            |
INTEGER  | INT               | NullInt32          | Int32           |            |
BIGINT   | FIXED8            | NullFixed8<0>      | Fixed8<0>       |            | default 8 bytes length
REAL     | FLOAT             | NullFloat          | Float           | RawFloat   |
DOUBLE   | DOUBLE            | NullDouble         | Double          | RawDouble  |
DATE     | DAYDATE           | NullDate           | Date            |            |
CHAR(a)  | FIXEDSTRING(a)    | NullFixedString<a> | FixedString<a>  |            |
…        | …                 | …                  | …               | …          | …
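The legible rows of the mapping above can be held in a small lookup table. The sketch below is only an illustration in Python, not L or NewDB code: TYPE_MAP and l_type are hypothetical names, and rows whose cells are unclear or parameterized in the table (the Bool and Size rows, CHAR(a)) are left out.

```python
# SQL type -> (column store type, L nullable type, L non-null type),
# transcribed from the type-mapping table above.
TYPE_MAP = {
    "TINYINT": ("INT", "NullInt32", "Int32"),
    "SMALLINT": ("INT", "NullInt32", "Int32"),
    "INTEGER": ("INT", "NullInt32", "Int32"),
    "BIGINT": ("FIXED8", "NullFixed8<0>", "Fixed8<0>"),
    "REAL": ("FLOAT", "NullFloat", "Float"),
    "DOUBLE": ("DOUBLE", "NullDouble", "Double"),
    "DATE": ("DAYDATE", "NullDate", "Date"),
}

def l_type(sql_type, nullable=True):
    """Return the L type name listed for a SQL type.

    Raises KeyError for SQL types that are not in the transcribed rows.
    """
    _column_store_type, null_type, non_null_type = TYPE_MAP[sql_type.upper()]
    return null_type if nullable else non_null_type

bigint_type = l_type("BIGINT")
real_type = l_type("real", nullable=False)
```

Note that the three SQL integer types narrower than BIGINT all share the same column store type and the same pair of L types.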

L Language

Embed L code in the SQLScript

SQLScript

SQL is the main interface to applications. NewDB supports standard SQL with a set of NewDB-specific extensions.

SQLScript
• A new language for processing application-specific code in the database layer. The main goal of SQLScript is to allow the execution of data-intensive calculations inside NewDB.
• The main concept in SQLScript is the function. SQLScript functions can have multiple input and output parameters. They are composed of calls to other functions and of SQL queries.
• Intermediate results can be assigned to variables that are local to the function. Basic control flow is possible via if/else clauses, and error handling is supported via try/catch blocks.
• Recursion (direct or indirect) is not allowed.
• A SQLScript function is free of side effects: it computes the values of the output parameters but modifies no other data. DELETE, UPDATE, and INSERT statements are not allowed inside SQLScript functions. These restrictions ensure that two function calls that are not connected via data flows can be executed in parallel.
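The reasoning behind the side-effect restriction can be sketched outside the database. The Python example below is an analogy only, not SQLScript: the data and the function names total_by_city and largest_sale are hypothetical. Both functions only read their input and return new values, and neither consumes the other's output, so a scheduler is free to run them at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input rows: (city, sales amount).
SALES = [("Berlin", 120), ("Paris", 80), ("Berlin", 60), ("Rome", 40)]

def total_by_city(rows):
    """Side-effect free: builds and returns a new dict, modifies no input."""
    totals = {}
    for city, amount in rows:
        totals[city] = totals.get(city, 0) + amount
    return totals

def largest_sale(rows):
    """Side-effect free: only reads the rows and returns one value."""
    return max(amount for _, amount in rows)

# The two calls are not connected by any data flow, so their order does not
# matter and they can be submitted to run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    totals_future = pool.submit(total_by_city, SALES)
    largest_future = pool.submit(largest_sale, SALES)
    totals = totals_future.result()
    largest = largest_future.result()
```

If either function updated shared data, the result could depend on which call ran first, which is the situation the restrictions rule out.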

SQLScript

Datatype Extension

SQLScript’s datatype extension also allows the definition of table types. These table types are used to define parameters for functions

A table type is created using the CREATE TABLE TYPE statement

Functional Extension

The functional extension allows its users to describe complex data-flow logic using side-effect free table functions

Functions can be created using CREATE FUNCTION and dropped using DROP FUNCTION

SQLScript

Functional Extension

Built-in Functions
There are different categories of built-in functions:
• Tracing and debugging
• Data source access
• Relational operators

SQLScript

SQLScript version 2

Coming up soon (in a week)

Support loop flow control statements

Comparisons

Open source?
• IMSL: No
• R: Yes
• BFL: No
• L: No
• SQLScript: No

Directly called by clients
• IMSL: via L, via SQLScript, Excel (soon)
• R: via SQLScript, via R console, Excel (soon)
• BFL: via SQLScript/L, Excel
• L: No
• SQLScript: Yes

“Known” limitations
• IMSL: does not comply with IM-DB governance
• R: not particularly efficient in handling large data sets
• BFL: limited availability
• L: predefined input and output
• SQLScript: no flow control

Parallelism
• IMSL: limited, via OpenMP
• R: limited, via OpenMP etc.
• BFL: Yes
• L: No
• SQLScript: Partially / Yes?

Our suggestions

1. Use SQLScript as much as possible because
• It is reasonably safer than C
• You control the development process independently of NewDB
• It is good for reporting / simple aggregation

2. Use R if
• You need to develop algorithms and need to interact with the data
• You need quick prototyping / PoC / analysis of small data sets
• You need flow control / a GUI / a debugging tool

3. Use IMSL if the computation is complex.

4. Use BFL/PAL if
• The computation is complex, the data set is large, and the algorithms need customization
• Product-level quality is needed
• Partitioned table and cluster support are needed

Demo

CarShop

V1

CarShop, V2

In the next version of CarShop, the following features will be considered to enrich its functionalities to make it more useful for SAP internal users:

SQLscript V2 (control flow)

Support transaction

Planning capability via planning engine

Reduce the memory footprint during execution

Support map/reduce on cluster (HANA 1.5)

“Best practice” in terms of selecting the right languages to implement applications

Testing related features? Cancel flag, profiling, ….

Q / A ?