0603 - engines bflandpal
7/15/2019 0603 - Engines BFLandPAL
http://slidepdf.com/reader/full/0603-engines-bflandpal 1/39
© 2012 SAP AG. All rights reserved. 1Ramp-Up Knowledge Transfer Customer
November 2012
SAP HANA: Business Function Library (BFL) and
Predictive Analysis Library (PAL)
Agenda
1. Overview
2. Business Function Library
3. Predictive Analysis Library
Overview
Application Function Library (AFL)
[Figure: AFL architecture diagram. Labels: HANA Clients (App Server, Analytics Technology, etc); SAP HANA; SQLScript; RLANG; LLANG; AFLLANG; AFLLANG Generator; AFL Framework; Application Functions (C++); Business Function Library; Predictive Analysis Library; …]
AFL technology includes:
• AFL Framework: on-demand library loading; make process independent of the whole kernel
• AFLLANG: new language type similar to R and L; special implementation of L procedures
• AFLLANG Generator: users do not create AFL procedures through AFLLANG directly, but through a generator. The generator is a pre-defined common SQLScript procedure, which is ready when the system is up. Users need to have the proper permissions.
• Application Functions: written in C++ and delivered as AFL content. PAL (Predictive Analysis Library) and BFL (Business Function Library) will be released in SPS05 as AFL content.
Business Function Library
Business Function Library (BFL)
“Run Smarter Faster”
• Compiled analytic function library for business functionality in HANA SP5
• Supports various pre-built, parameter-driven algorithms
• Embedded into the calculation engine
Help Customers To
• Compute Quickly: Perform functions in real time with high-performance in-memory computation.
• Build Applications Quickly: Reuse common business functionality without developing it.
• Empower the Business: Bring decision support capabilities to business users through a simplified experience and pre-built scenarios.
BFL in SAP HANA
[Figure: SAP HANA landscape diagram. Labels: SAP Business Suite; Third-party systems; Real-time replication services; Data services; SAP HANA (In-memory database; Planning and calculation engine; Business Function libraries; Predictive Analysis Library; Application services; Information Composer; Modeling Studio); Real-time analytics; Real-time apps; Microsoft Excel; SAP Business Objects Solutions; Others… (Open)]
BFL Example: Cycles Function
• Definition: This function calculates seasonal factors by using Fourier coefficients. It combines sine and cosine waves to help you determine seasonality or other cyclical business factors.
• Parameters:
Parameter   Input/Output   Parameter type   Description
Amplitude   Input          Field Item       Amplitude of sine/cosine.
Length      Input          Field Item       Length (in years) over which the cycle repeats itself.
Startdate   Input          Field Item       Time in years at which the cycle starts.
Function    Input          Field Item       0 for a sine wave, and 1 for a cosine wave.
Time        Input          Field Item       Time periods.
Result      Output         Field Item       Result table that contains the expected result.
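To make the parameter list more concrete, here is a minimal Python sketch of a single sine or cosine seasonal factor. The formula is an assumption chosen for illustration (amplitude times a wave with the given period and start); it is not taken from the BFL documentation, and the name `cycle_factor` is hypothetical.

```python
import math

def cycle_factor(amplitude, length, start, function, time):
    """Hypothetical seasonal factor from one sine or cosine wave.

    Assumed formula (not from the BFL documentation):
        amplitude * wave(2*pi * (time - start) / length)
    where length, start and time are all expressed in years,
    and function is 0 for a sine wave, 1 for a cosine wave.
    """
    angle = 2.0 * math.pi * (time - start) / length
    wave = math.sin if function == 0 else math.cos
    return amplitude * wave(angle)

# Quarterly time points for a one-year cycle that starts at t = 0.
quarters = [0.0, 0.25, 0.5, 0.75]
sine_factors = [cycle_factor(10.0, 1.0, 0.0, 0, t) for t in quarters]
cosine_factors = [cycle_factor(10.0, 1.0, 0.0, 1, t) for t in quarters]
```

Under this assumed formula the sine wave peaks a quarter of the way through the cycle, while the cosine wave peaks at the start.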
BFL Example: Cycles Function
• Syntax:
[Figure: two screenshots, "Table Preparation" and "Execute statement"]
BFL: Functions -1-
Function: Description
Annual Depreciation: Calculates annual depreciation according to three common methods: diminishing balance depreciation, straight line depreciation and sum-of-year depreciation. It allows variable length of timescales for all assets/items.
Cumulate: Calculates the cumulative totals in one row based on the original numbers in another row.
Cycles: Calculates seasonal factors from Fourier coefficients. It combines sine and cosine waves to help you determine seasonality or other cyclical business factors.
Days: Returns the number of days in each period defined by each pair of From and To dates.
Days Outstanding: Calculates receipts or payments based on the level of days outstanding.
De-cumulate: Calculates the original series starting from the cumulated totals.
Delay: Calculates receivables or payables based on a delay between the time of invoice and the time of payment.
Delay Debt: Calculates cash receipts using actual sales. The closing debtor balance for each period is calculated by referring to historic sales levels for a specified number of days.
Delay Stock: Calculates purchases required to meet future demand.
Discounted Cash Flow: Converts a future stream of cash flow to constant prices. It calculates the inflated value of today's money.
Driver: Calculates the forecast for future periods using historical data and as many drivers as needed. A driver drives cost, such as headcount, floor space, units sold, and unit price.
Feed: Calculates the closing balance and "feeds" it to the opening balance of the next time period.
Feed Overflow: Calculates the closing balance and feeds it to the opening balance of the next period.
Forecast: Combines actual and forecast data to produce a rolling forecast. Eliminates scripting of feeds.
Forecast Agents: A specialized version of the Driver function focused on the entities required to meet service levels. Used primarily for labor in areas like call centers and mortgage processing based on interest rate.
Forecast Driver: A specialized version of the Driver function that calculates the forecast for future periods using historical data and one single driver.
Forecast Dual Driver: Calculates the forecast for future periods using historical data and two drivers. It also calculates the incremental effect of each driver on the historical base figure.
Forecast Mix: Mixes actual data prior to the switchover date with forecast data on and after the switchover date.
Forecast Sensitivity: Returns a calculation for the proportion of requests that will be queued because there are no agents available when the request was answered.
Funds: Calculates the use of funds or the source of funds.
Future: Calculates the closing balance of an account given the start balance and the conditions under which the account runs.
Grow: Grows a base figure by a specified percentage each period. It can be compound or linear.
Inflated Cash Flow: Calculates the amount of cash you must receive in a future period to compensate for inflation.
Internal Rate of Return: Calculates the Internal Rate of Return for a series of cash flow on specified dates.
Lag: Calculates a result in one row by lagging an input from another row by a specified number of periods.
Last: Looks back over the series of data of the input row and returns the most recent non-zero value.
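Several of the functions above are simple row transformations. The following Python sketch illustrates Cumulate, De-cumulate and Grow as described in the table. The function names, signatures and the exact compound and linear growth formulas are assumptions for illustration, not the BFL interface.

```python
from itertools import accumulate

def cumulate(values):
    """Running totals of the original numbers."""
    return list(accumulate(values))

def decumulate(totals):
    """Recover the original series from its running totals."""
    if not totals:
        return []
    return [totals[0]] + [b - a for a, b in zip(totals, totals[1:])]

def grow(base, rate, periods, compound=True):
    """Grow a base figure by `rate` per period (assumed formulas).

    compound: base * (1 + rate) ** k
    linear:   base * (1 + rate * k)
    for k = 1 .. periods.
    """
    if compound:
        return [base * (1 + rate) ** k for k in range(1, periods + 1)]
    return [base * (1 + rate * k) for k in range(1, periods + 1)]

monthly = [10, 20, 30]
running = cumulate(monthly)      # [10, 30, 60]
restored = decumulate(running)   # back to [10, 20, 30]
```

De-cumulate is the inverse of Cumulate, so applying one after the other returns the original series.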
BFL: Functions -2-
Function: Description
Lease: Calculates a payment schedule for a lease, loan, mortgage, annuity or savings account.
Lease Variable: Allows an account to be scheduled along a time scale representing the life of the loan.
Linear Average: Calculates a linear average that applies a larger weight to more recent periods. The weights applied decrease linearly as time goes back.
Max Value: Returns the maximum value of a range.
Minimum Value: Returns the minimum value of a specific range.
Moving Average & Moving Sum: Calculates a moving average or moving sum over specified periods. Key statistical component.
Moving Median: Takes the median value after sorting all input values into ascending sequence.
Number of Periods: Calculates the number of periods over which the account must run.
Net Present Value: Calculates the sum of a series of future cash flow values after discounting each to a present value based on the annual rate input for the period in which it is being calculated.
Outlook: Calculated by using actuals of past months and plan figures of future months.
Payment: Calculates the regular payment to an account for each period.
Present Value: Calculates the opening value from the given target closing balance and various parameters.
Proportion: Allows you to input a start and end date, and then calculates the proportion of the period length. Important for project planning with performance-to-plan calculations.
Rate: Calculates the percentage interest rate per period for an account, given its start balance, end balance, payment amount per period and the number of periods.
Repeat: Repeats data from a single period or group of periods through the time scale of the Dimension List.
Rounding: Calculates the rounded values for a specified input item according to a chosen rounding method.
Seasonal Complex: Performs seasonal adjustments of time to determine seasonal patterns in data.
Seasonal Simple: Performs seasonal adjustments of time to determine seasonal patterns in data.
Seasonal Simulation: Provides the building blocks to simulate seasonal data using a variety of characteristics.
Stock Flow: Works out the level of supply needed to meet target forecasts for stock cover.
Stock Flow Reverse: Allows you to input stock cover and work out what purchases were needed to meet the target stock levels.
Stock Flow Batch: Lets you use batch quantities in Stock Flow calculations. Key for constraint-based models or non-discrete manufacturing units of measure.
BFL: Functions -3-
Function: Description
Year over Year Difference: Calculates the Year over Year Difference between the current and previous time periods.
Year to Date: Calculates year to date totals based on original data.
Year to Date Statistical: Calculates the original numbers in one row based on the year-to-date figures in another row.
Predictive Analysis Library
Predictive Analysis Library (PAL)
“Run Smarter Faster”
• Compiled analytic function library for predictive analysis in HANA SPS05
• Supports multiple algorithms: K-Means, Association Analysis, C4.5 Decision Tree, Multiple Linear Regression, Exponential Smoothing…
Help Customers To
• Know Your Business: Uncover deep insights and patterns about the business, such as association rules, customer clustering, or sales prediction.
• Decide with Confidence: Drive more advanced analyses. Decisions are made with support from analysis numbers.
• Compute Quickly: Query and analyze data in real time with high-performance in-memory computation.
• Empower the Business: Bring decision support capabilities to business users through a simplified experience and pre-built scenarios.
PAL in SAP HANA
[Figure: SAP HANA landscape diagram. Labels: SAP Business Suite; Third-party systems; Real-time replication services; Data services; SAP HANA (In-memory database; Planning and calculation engine; Business Function libraries; Predictive Analysis Library; Application services; Information Composer; Modeling Studio); Real-time analytics; Real-time apps; Microsoft Excel; SAP Business Objects Solutions; Others… (Open)]
PAL - Algorithms
Association Analysis
• Apriori
• Apriori Lite
Cluster Analysis
• K-Means
• Kohonen Self Organized Maps *
Classification Analysis
• C4.5 Decision Tree Analysis
• CHAID Decision Tree Analysis
• K Nearest Neighbour
• Multiple Linear Regression
• Polynomial Regression *
• Exponential Regression
• Bi-Variate Geometric Regression
• Bi-Variate Logarithmic Regression
• Logistic Regression
Time Series Analysis
• Single Exponential Smoothing
• Double Exponential Smoothing
• Triple Exponential Smoothing
Outlier Detection
• Inter-Quartile Range Test (Tukey’s Test)
• Variance Test *
• Anomaly Detection *
Data Preparation
• Sampling *
• Binning *
• Scaling *
Other
• ABC Classification
• Weighted Scores Table
* New in SPS05
Association Analysis
Definition: Find the most frequent associations in a dataset.
Applications
• Clearly: shopping carts and supermarket shoppers
• Analysis of any product purchases, not just in shops
• Analysis of telecom service purchases
• Analysis of telephone calling patterns
• The ‘basket’ can be a household
• …
• Identification of fraudulent medical insurance claims: consider cases where common rules are broken.
• Differential analysis: compare results between different stores, between customers in different demographic groups, between different days of the week, different seasons of the year, etc.
Association Analysis
We use it in our everyday lives
Association Analysis: Example
Transaction ID   Records
0001             iPhone4s, Protector
0002             iPhone4s, Earphone, Protector
0003             iPhone4s, Protector
0004             Earphone
0005             iPad, iPhone4s

Item 1     Item 2      Support        Confidence
iPhone4s   Protector   3 / 5 = 60%    3 / 4 = 75%

Support – The association (iPhone4s -> Protector) can be found in 3 out of 5 = 60% of the transactions.
Confidence – When a customer buys an iPhone4s, they also buy a Protector 3 out of 4 = 75% of the time.
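The support and confidence figures in this example can be reproduced with a few lines of Python. This is only a counting sketch over the five transactions from the slide; the helper names are hypothetical and it is not the Apriori algorithm itself, which additionally searches for the frequent item sets.

```python
# The five transactions from the example, as sets of items.
transactions = {
    "0001": {"iPhone4s", "Protector"},
    "0002": {"iPhone4s", "Earphone", "Protector"},
    "0003": {"iPhone4s", "Protector"},
    "0004": {"Earphone"},
    "0005": {"iPad", "iPhone4s"},
}

def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in baskets.values() if wanted <= basket)

def support(baskets, antecedent, consequent):
    """Share of all baskets that contain both sides of the rule."""
    both = set(antecedent) | set(consequent)
    return count_containing(baskets, both) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return count_containing(baskets, both) / count_containing(baskets, antecedent)

rule_support = support(transactions, {"iPhone4s"}, {"Protector"})        # 3 / 5
rule_confidence = confidence(transactions, {"iPhone4s"}, {"Protector"})  # 3 / 4
```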
Cluster Analysis
Definition: Cluster analysis looks for clusters or groupings of objects.
Applications
• Customer segmentation
• Data reduction / problem refinement when faced with large, complex data sets
• Market segmentation and determining target markets
• Product positioning
• Test market selection
• Crime pattern analysis
• Medical research, social services, education, criminology, and so on
• Anomaly detection (converse of segmentation)
• …medical research, social services, psychiatry, education, archaeology, astronomy, taxonomy…
Cluster Analysis
The K-Means algorithm is used to partition a data set into K clusters. It is a very popular clustering algorithm.
Kohonen Self Organizing Maps are a type of neural network that performs clustering. When the network is fully trained, records that are similar should appear close together on the output map, while records that are different will appear far apart. This may give you a sense of the appropriate number of clusters.
[Figures: K-Means on the Iris data set; Kohonen Self Organizing Map]
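As an illustration of how K-Means partitions a data set, here is a minimal pure-Python sketch of the standard assign-and-update loop (Lloyd's algorithm). It is not the PAL implementation: the function name, the explicit starting centres and the fixed iteration count are simplifying assumptions, and the six sample points are made up for the example.

```python
import math

def kmeans(points, centres, iterations=10):
    """Plain K-Means: K is the number of starting centres supplied."""
    centres = [tuple(c) for c in centres]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(len(centres)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centres, labels

# Two visually separate groups of made-up 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centres, labels = kmeans(points, centres=[(1, 1), (8, 8)])
```

The result depends on the starting centres: a poor choice can leave the loop stuck in a partition that mixes the two groups, which is why practical implementations offer several initialization methods.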
Classification Analysis
Definition: A classification is a model that defines the relationships between inputs and an output. The output, in statistics referred to as the dependent variable, is a function of one or more inputs, the independent variables. We use known inputs and outputs to define a model, and then use the model to predict or ‘score’ unknown values. This is sometimes referred to as supervised learning or directed data mining.
Classification algorithms can be sub-divided into:
• Decision Tree algorithms: CNR Tree is one of the most well known. CHAID analysis and C5.0 are also popular.
• Regression algorithms: Multiple Linear Regression is the most well known.
• Neural Network algorithms: These are defined in terms of their ‘topology’, e.g. MLP, RBF…
• Other: Support Vector Machines, K Nearest Neighbour…
[Figure: independent / input variables mapped to a dependent / output variable]
Classification Analysis – Decision Tree Algorithms
A set of rules and a graphical, tree-shaped representation of the relationships between a dependent variable and a set of independent variables. The tree may be binary or multi-branching, depending upon the algorithm used to segment the data. Each node represents a test of a decision.
There are many use cases for decision tree analysis:
• Determining the best targets for a mail shot campaign
• Churn analysis
• Profiling high income earners from census data
• Identifying spam
• Loan applicant creditworthiness
Classification Analysis – Decision Tree Algorithms: Example
[Figure: Historical Data -> C4.5 Decision Tree -> Rules for A Class (the most important customers)]
New Customer Profile
Income: 40000
Age: 42
Gender: Female
House Loan: N
Group: ?
Classification Analysis – Regression Algorithms
In statistics, regression analysis is a collective name for techniques for the modelling and analysis of numerical data consisting of values of a dependent variable (also called response or target) and of one or more independent variables (also known as explanatory variables or predictors).
The dependent variable in the regression equation is modelled as a function of the independent variables, corresponding parameters ("constants"), and an error term.
The error term is treated as a random variable. It represents unexplained variation in the dependent variable.
The parameters are estimated so as to give a "best fit" of the data. Most commonly the best fit is evaluated by using the least squares method, but other criteria can also be used.
Classification Analysis – Regression Algorithms
The PAL supports:
• Multiple Linear Regression: $Y = a + bX' + cX'' + dX''' + \dots$
• Polynomial Regression: $Y = a + bX + cX^2 + dX^3 + \dots$
• Exponential Regression: $Y = a\,b^X$
• Bi-Variate Geometric Regression: $Y = a\,X^b$
• Bi-Variate Logarithmic Regression: $Y = a + b\log(X)$
• Logistic Regression
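The three bi-variate model shapes listed above can all be fitted with ordinary least squares on a straight line once the data are suitably transformed. The Python sketch below shows that idea. It is an assumption about one common fitting approach, not a description of how PAL estimates its parameters: fitting in log space minimizes squared error of the logarithms rather than of the original values. All function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def fit_logarithmic(xs, ys):
    """Y = a + b*log(X): regress Y on log(X)."""
    return fit_line([math.log(x) for x in xs], ys)

def fit_exponential(xs, ys):
    """Y = a * b**X: regress log(Y) on X, then transform back."""
    log_a, log_b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a), math.exp(log_b)

def fit_geometric(xs, ys):
    """Y = a * X**b: regress log(Y) on log(X), then transform a back."""
    log_a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_a), b

xs = [1, 2, 3, 4]
exp_a, exp_b = fit_exponential(xs, [3 * 2 ** x for x in xs])      # data follow 3 * 2**X
geo_a, geo_b = fit_geometric(xs, [2 * x ** 3 for x in xs])        # data follow 2 * X**3
log_a, log_b = fit_logarithmic(xs, [1 + 2 * math.log(x) for x in xs])
```

Because the sample data lie exactly on each curve, the fitted parameters recover the generating constants up to rounding.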
Time Series Analysis
Exponential smoothing is a method of forecasting that uses weighted values of previous series observations to predict future values. The principle is that the older a data point is, the less importance it should be given.
Single or Simple Exponential Smoothing – a weighted average of the past:
$F_{t+1} = \alpha X_t + \alpha(1-\alpha) X_{t-1} + \alpha(1-\alpha)^2 X_{t-2} + \alpha(1-\alpha)^3 X_{t-3} + \dots + \alpha(1-\alpha)^N X_{t-N}$
where $\alpha$ is a smoothing constant, $0 < \alpha < 1$.
Example: if $\alpha$ is 0.1, then the weights are 0.1, 0.09, 0.081, 0.0729… If $\alpha$ is 0.5, then the weights are 0.5, 0.25, 0.125, 0.0625… If $\alpha$ is 0.9, then the weights are 0.9, 0.09, 0.009, 0.0009…
The above equation can be shown to be equal to $F_{t+1} = \alpha X_t + (1-\alpha) F_t$.
So the computation becomes very easy, but the process has to be started with a first forecast, and different starting methods can lead to different fits and forecasts.
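The recursive form of single exponential smoothing is short enough to sketch in Python. The smoothing constant is written `alpha` here. Seeding the first forecast with the first observation is one common starting method and is an assumption of this sketch, as are the function names.

```python
def smoothing_weights(alpha, count):
    """Weights on X_t, X_(t-1), ...: alpha * (1 - alpha) ** k."""
    return [alpha * (1 - alpha) ** k for k in range(count)]

def single_exponential_smoothing(series, alpha, first_forecast=None):
    """Apply F_(t+1) = alpha * X_t + (1 - alpha) * F_t along the series.

    Returns the list of forecasts; the last entry is the forecast for the
    period after the series ends. If no first forecast is given, the first
    observation is used (an assumed starting method).
    """
    forecast = series[0] if first_forecast is None else first_forecast
    forecasts = [forecast]
    for x in series:
        forecast = alpha * x + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts

weights = smoothing_weights(0.5, 4)                            # 0.5, 0.25, 0.125, 0.0625
forecasts = single_exponential_smoothing([10, 20, 30], 0.5)    # ends at 22.5
```

A larger `alpha` puts almost all the weight on the most recent observation, so the forecast tracks the series closely; a smaller `alpha` spreads the weight over more history and smooths more heavily.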
Time Series Analysis
Three basic patterns: stationary, trend, seasonality. These equate to single, double and triple exponential smoothing.
Double Exponential Smoothing applies two smoothing constants, one for the stationary element and the other for the trend.
Holt’s Two-Parameter Model:
$S_t = \alpha X_t + (1-\alpha)(S_{t-1} + b_{t-1})$ … the stationary element
$b_t = \mu (S_t - S_{t-1}) + (1-\mu)\, b_{t-1}$ … the trend element
$F_{t+m} = S_t + b_t\, m$
Triple Exponential Smoothing – for stationary and trend and seasonality
Winters’ Three-Parameter Model:
$S_t = \alpha X_t / I_{t-L} + (1-\alpha)(S_{t-1} + b_{t-1})$ … the stationary element
$b_t = \mu (S_t - S_{t-1}) + (1-\mu)\, b_{t-1}$ … the trend
$I_t = \gamma X_t / S_t + (1-\gamma)\, I_{t-L}$ … the seasonality
$F_{t+m} = (S_t + b_t\, m)\, I_{t-L+m}$
Here $\alpha$, $\mu$ and $\gamma$ are the smoothing constants for the stationary element, the trend and the seasonality, and $L$ is the length of the season.
[Figure: two time series line charts]
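Holt's two-parameter model translates almost line for line into code. The Python sketch below writes the two smoothing constants as `alpha` (stationary element) and `mu` (trend). Starting the level at the first observation and the trend at the first difference is an assumed initialization; the function name is hypothetical.

```python
def holt_forecast(series, alpha, mu, horizon):
    """Double exponential smoothing (Holt's two-parameter model).

    level: S_t = alpha * X_t + (1 - alpha) * (S_(t-1) + b_(t-1))
    trend: b_t = mu * (S_t - S_(t-1)) + (1 - mu) * b_(t-1)
    forecast m periods ahead: S_t + b_t * m
    Needs at least two observations for the assumed start values.
    """
    level = series[0]
    trend = series[1] - series[0]
    for x in series[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = mu * (level - previous_level) + (1 - mu) * trend
    return [level + trend * m for m in range(1, horizon + 1)]

# A perfectly linear series should be extrapolated along the same line.
next_two = holt_forecast([10, 20, 30, 40], alpha=0.5, mu=0.5, horizon=2)
```

Winters' three-parameter model extends the same loop with a seasonal index per position in the season, divided out of the observation when updating the level and multiplied back in when forecasting.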
Outliers
Definition: An outlier is an observation that lies an ‘abnormal’ distance from other values in a random sample from a population.
• Outliers can occur because of measurement errors and might be removed from the data set or corrected.
• They can occur naturally and therefore must be treated carefully.
• Some statistics / algorithms can be heavily biased by outliers, for example the simple mean, correlation and linear regression. In contrast, the trimmed mean and median are not so affected.
• Outliers can be detected visually, for example with Scatter Plots and Box Plots.
Outlier Algorithms – Inter Quartile Range Test (IQR)
Outliers can be detected using various algorithms. The best known is the Inter Quartile Range Test, or Tukey Test, named after its author. It is the calculation behind the construction of Box Plots.
Given a time series $X_1$ to $X_n$, calculate the lower and upper quartiles (25th and 75th percentiles), denoted LQ and UQ. Calculate the mid spread as MID = UQ - LQ. An outlier is then defined to be any observation where
$X_i < LQ - n \cdot MID$ or $X_i > UQ + n \cdot MID$
The multiplier n (not the length of the series) is usually set to 1.5; however, for large time series, say more than 36 points, a value of 2 is recommended. The concept of very significant and significant outliers could be introduced by using values of n = 3 and n = 2 respectively.
The PAL supports:
• Inter-Quartile Range Test (Tukey’s Test)
• Variance Test – this is just the simple identification of values outside x standard deviations from the mean
• Anomaly Detection – this is conceptually the ‘reverse’ of cluster analysis. We look for values furthest away from their nearest cluster centre, measure the absolute and percentage distance and rank the largest ‘outliers’.
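The Inter Quartile Range Test described on this slide can be sketched with Python's standard library. Quartile conventions differ between tools; this sketch uses the "inclusive" method of `statistics.quantiles`, which is an assumption and may not match the quartile definition PAL uses. The function name and sample data are made up for illustration.

```python
import statistics

def iqr_outliers(values, multiplier=1.5):
    """Return the values outside [LQ - m*MID, UQ + m*MID]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    lq, _median, uq = statistics.quantiles(values, n=4, method="inclusive")
    mid = uq - lq  # the mid spread, i.e. the inter-quartile range
    lower = lq - multiplier * mid
    upper = uq + multiplier * mid
    return [x for x in values if x < lower or x > upper]

readings = [10, 12, 11, 13, 12, 11, 10, 50]
flagged = iqr_outliers(readings)
```

Raising the multiplier widens the accepted band, so fewer observations are flagged.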
Data Preparation
The PAL supports:
Sampling
• First N
• Middle N
• Last N
• Every Nth
• Simple Random Sampling with replacement
• Simple Random Sampling without replacement
• Systematic Sampling
• Stratified sampling
Binning
• Equal widths based on the number of bins
• Equal widths based on the bin width
• Equal number of records per bin
• Mean / Standard Deviation bin boundaries
Scaling
• Standardized Variable or Z-Score or Standard Score
• Normalization
• Standardized Variable - Median / Median Absolute Deviation
• Normalization by Decimal Scaling
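Three of the preparation steps listed on this slide are sketched below in Python: Z-score scaling, normalization, and equal-width binning by number of bins. The details are assumptions for illustration: the population standard deviation is used for the Z-score, "Normalization" is taken to mean min-max scaling to the range 0 to 1, and the maximum value is placed in the last bin. The function names are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def min_max(values):
    """Normalize to the range 0..1 (assumed meaning of 'Normalization')."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def equal_width_bins(values, bin_count):
    """Bin index per value, using bin_count bins of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / bin_count
    # The maximum would land one past the end, so clamp it into the last bin.
    return [min(int((v - low) / width), bin_count - 1) for v in values]

scaled = z_scores([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, standard deviation 2
normalized = min_max([0, 5, 10])
bins = equal_width_bins([0, 2, 5, 9, 10], 2)
```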
ABC Classification
Definition: Divide the data into 3 groups: A, B and C. In the example, the groups are 60%, 30% and 10%.
It’s a form of segmentation analysis: you can then examine the A group, and so on.
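One common way to form the three groups is to rank the items by value and label them by cumulative share of the total. The Python sketch below follows that convention with the 60% / 30% / 10% split from the slide. Whether PAL uses exactly this rule is not stated here, so treat the cut-off logic, the function name and the sample figures as assumptions.

```python
def abc_classify(values, a_share=0.6, b_share=0.3):
    """Label items A, B or C by cumulative share of the total value.

    `values` maps an item name to a numeric value. Items are ranked from
    largest to smallest; an item is A while the running share stays within
    a_share, B while it stays within a_share + b_share, and C otherwise.
    """
    total = sum(values.values())
    labels = {}
    running = 0.0
    for item, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        share = running / total
        if share <= a_share:
            labels[item] = "A"
        elif share <= a_share + b_share:
            labels[item] = "B"
        else:
            labels[item] = "C"
    return labels

# Made-up revenue per product.
revenue = {"p1": 50, "p2": 25, "p3": 10, "p4": 9, "p5": 6}
groups = abc_classify(revenue)
```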
Weighted Score Tables
Definition: The sum of each attribute score multiplied by its weight.
Score many records for comparison and sorting.

Field weights: Income 0.0005, Age 2, City 1

Customer   Income   Income score   Age       Age score   City     City score
Smith      18000    18000          20 – 29   6           Big      9
Lily       30000    30000          30 – 39   9           Small    3
Jorge      43000    43000          40 – 49   7           Medium   6

In the example, the calculated score for Lily is:
30000 * 0.0005 + 9 * 2 + 3 * 1 = 15 + 18 + 3 = 36
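The weighted score calculation for all three customers can be reproduced in a few lines of Python. The dictionary layout and the function name are illustrative choices; the weights and attribute scores are the ones shown in the example table.

```python
# Field weights and per-customer attribute scores from the example.
weights = {"income": 0.0005, "age": 2, "city": 1}
scores = {
    "Smith": {"income": 18000, "age": 6, "city": 9},
    "Lily": {"income": 30000, "age": 9, "city": 3},
    "Jorge": {"income": 43000, "age": 7, "city": 6},
}

def weighted_score(record, field_weights):
    """Sum of each attribute score multiplied by its weight."""
    return sum(record[field] * weight for field, weight in field_weights.items())

totals = {name: weighted_score(record, weights) for name, record in scores.items()}
# Rank customers from highest to lowest total score.
ranking = sorted(totals, key=totals.get, reverse=True)
```

Lily's total matches the 36 worked out on the slide; ranking the totals then puts Jorge first and Smith last.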
Working With PAL – 1 –
Step 1: Generate a PAL Procedure: First you need to generate a procedure by calling the AFL wrapper generator from SQLScript. The syntax is as follows:
CALL AFL_WRAPPER_GENERATOR(<procedure_name>, <area_name>, <function_name>, <signature_tab>);
<procedure_name>: user-defined procedure name.
<area_name>: 'AFLPAL'. This is used for all PAL functions and cannot be changed by users.
<function_name>: PAL built-in function name.
<signature_tab>: user-defined table variable. The table contains records that describe the input table type, parameter table type, and result table type. A typical table variable references a table with the following definition:
Working With PAL – 2 –
Step 2: Call a PAL Procedure: After generating a procedure, you can then call the procedure using the syntax below.
CALL <procedure_name> (<data_input_tab> {,…}, <parameter_tab>, <output_tab> {,…}) with overview;
<procedure_name>: the procedure name the user defined when generating the procedure in Step 1.
<data_input_tab>: user-defined name(s) of the current procedure’s input table(s).
<parameter_tab>: user-defined name of the current procedure’s parameter table.
<output_tab>: user-defined name(s) of the current procedure’s output table(s).
Consult the documentation for more details.
PAL: Functions Available for GA use by Customers and Partners in SP5 – 1 –
• K Means – A method of cluster analysis whereby the algorithm partitions N observations or records into K clusters in which each observation belongs to the cluster with the nearest center.
• K Nearest Neighbor – The K-Nearest Neighbor (KNN) algorithm is a method for classifying objects based on the closest K objects and their average classification / value.
• Multiple Linear Regression (MLR) – An approach to modeling the linear relationship between a variable Y, usually referred to as the dependent variable, and one or more other variables, usually referred to as independent variables, denoted X1, X2, X3...
• C4.5 Decision Tree – A classification algorithm, C4.5 builds decision trees from a set of training data, using the concept of information entropy. The training data is a set of already classified samples. At each node of the tree, C4.5 chooses one attribute of the data that most effectively splits it into subsets in one class or the other. Its criterion is the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data. The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then proceeds recursively until it meets some stopping criterion, such as a minimum number of cases in a leaf node.
• CHAID Analysis – This model is similar to the C4.5 decision tree. CHAID stands for CHi-squared Automatic Interaction Detection, and is a classification method for building decision trees by using chi-square statistics to identify optimal splits. CHAID examines the cross tabulations between each of the input fields and the outcome, and tests for significance using a chi-square independence test. If more than one of these relations is statistically significant, CHAID will select the input field that is the most significant (smallest p value). CHAID can generate non-binary trees.
• Apriori & Apriori Lite – Popular association discovery algorithm commonly associated with market basket analysis. The algorithm looks for rules to describe frequent associations between products and other items. Apriori Lite is a subset of Apriori for cases where only a single antecedent and a single consequent are required, and is therefore faster.
• ABC Classification – A dataset is divided into 3 groups – A, B, C – so that X% of a variable is in A, Y% in B and 100% - X - Y in C. It can be used for analyzing customer behavior and defining market segments.
PAL: Functions Available for GA use by Customers and
Partners in SP5 – 2 –
Weighted Score Tables – Each column / variable in a table is allocated a score, which may vary across its range of values, and then a weight. Each record is scored and the scores are multiplied by the weights and summed. The summedscores can then be ranked to identify the highest.
Exponential Regression - An approach to model the relationship between a variable Y and one or more variablesdenoted X1, X2, X3... In exponential regression, data are modeled using an exponential function and unknown modelparameters are estimated from the data using the criteria of least squares.
Logistic Regression - Predicts the outcome of a categorical variable (a variable that can take on a limited number of categories) based on one or more predictor variables. The probabilities describing the possible outcome are modeled as afunction of the explanatory variables, using a logistic function. It is analogous to linear regression but takes a categorical
target field instead of a numeric one.
Inter-Quartile Range Test - Given a series of numeric data, the Inter-Quartile Range is the difference between 3rd-quartile(Q3) and 1st-quartile(Q1) of that data series. Values which are several multiples of the IQR from the median areidentified as outliers.
Bi-Variate Geometric Regression - An approach to model the relationship between a dependent numeric variable Y andan independent numeric variable X. In geometric regression, data are modeled using a geometric function, and unknownmodel parameters are estimated from the data using least squares regression.
Bi-Variate Natural Logarithmic Regression – An approach to model the relationship between a dependent numeric
variable Y and an independent numeric variable X. In geometric regression, data are modeled using a natural logarithmicfunction, and unknown model parameters are estimated from the data using least squares regression.
Single, Double, Triple Exponential Smoothing - Techniques that can be applied to time series data, either to produce smoothed data for presentation or to make forecasts. Single smoothing is used when the time series has no trend or seasonality, double when there is a trend, and triple when there is also seasonality. Older values in the time series are given less importance, with the weights forming an exponential decay.
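A minimal sketch of single exponential smoothing follows. Each smoothed value is a weighted average of the newest observation and the previous smoothed value, which is what makes the weights on older observations decay exponentially. Initializing with the first observation is one common convention and is an assumption here.

```python
def single_exponential_smoothing(series, alpha):
    """Return the smoothed series for a smoothing factor 0 < alpha <= 1."""
    smoothed = [series[0]]  # assumption: start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed

smoothed = single_exponential_smoothing([10.0, 20.0, 30.0], alpha=0.5)
# smoothed is [10.0, 15.0, 22.5]
```

Double and triple smoothing extend the same recurrence with additional components for trend and seasonality.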
PAL: Functions Available for GA use by Customers and Partners in SP5 – 3 –
Polynomial Regression - An approach to modeling the relationship between a numeric variable Y and powers of a numeric variable X, denoted X^2, X^3, X^4, and so on. In polynomial regression, data are modeled using polynomial functions, and unknown model parameters are estimated from the data using the least squares criterion.
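One way to sketch the least squares fit is through the normal equations, solved here with a small Gaussian elimination routine so the example needs no external libraries. The function names are hypothetical, and this approach is only practical for low degrees, because the normal equations become ill-conditioned as the degree grows.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def fit_polynomial(xs, ys, degree):
    """Return coefficients [c0, c1, ...] of y = c0 + c1*x + c2*x**2 + ..."""
    k = degree + 1
    # Normal equations: sum(x**(i+j)) * c_j = sum(y * x**i) for each row i.
    lhs = [[float(sum(x ** (i + j) for x in xs)) for j in range(k)] for i in range(k)]
    rhs = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(k)]
    return solve(lhs, rhs)

# Synthetic, noise-free data from y = 1 + 2x + 3x^2.
coefficients = fit_polynomial([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], degree=2)
```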
Variance Test - Given a series of numeric data, the Variance Test calculates the mean and variance of the series. Values that lie more than a chosen multiple of the standard deviation (the square root of the variance) from the mean are identified as outliers.
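A short sketch of the test follows. It assumes the distance from the mean is measured in multiples of the standard deviation, so that the threshold has the same units as the data; the multiplier of 2 and the function name are illustrative assumptions.

```python
import statistics

def variance_test_outliers(values, multiplier=2.0):
    """Return values farther than multiplier * standard deviation from the mean."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) > multiplier * std_dev]

outliers = variance_test_outliers([10, 12, 11, 13, 12, 11, 50], multiplier=2.0)
```

Because an extreme value inflates the standard deviation it is tested against, a large multiplier can fail to flag it on small samples. With a multiplier of 3, the value 50 above is not reported.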
Anomaly Detection - This is conceptually the 'reverse' of cluster analysis. We look for the values furthest away from their nearest cluster centre, measure the absolute and percentage distance, and rank the largest 'anomalies' or outliers.
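The ranking step can be sketched as follows, assuming the cluster centres have already been produced by a prior clustering run. Only the absolute (Euclidean) distance is shown; the centres, points, and function name are hypothetical.

```python
import math

def rank_anomalies(points, centres, top_n=1):
    """Rank points by distance to their nearest cluster centre, largest first."""
    scored = []
    for point in points:
        nearest = min(math.dist(point, centre) for centre in centres)
        scored.append((point, nearest))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical centres from an earlier clustering step.
centres = [(1.0, 1.0), (5.0, 5.0)]
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (3.0, 9.0)]
top = rank_anomalies(points, centres, top_n=1)
```

The point (3.0, 9.0) is reported because it lies far from both centres, even though it is closer to the second one.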
Sampling - An aspect of statistics concerned with selecting an unbiased or random subset of individual observations from a population, intended to yield some knowledge about that population, especially for the purpose of making predictions based on statistical inference.
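As one simple instance, simple random sampling without replacement can be sketched with the Python standard library. The function name and the seed parameter are illustrative, and other sampling schemes (systematic, stratified, with replacement) are not shown.

```python
import random

def simple_random_sample(population, size, seed=None):
    """Draw `size` distinct items uniformly at random from the population."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, size)

population = list(range(100))
sample = simple_random_sample(population, 10, seed=42)
```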
Binning - A common requirement prior to running certain predictive algorithms. It generally reduces the complexity of the model; for example, the model in a decision tree can become very complex if every value of a numeric variable becomes a branch in the tree. Binning methods smooth a sorted data value by consulting its "neighborhood", that is, the values around it. The sorted values are distributed into a number of "buckets" or bins.
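A minimal sketch of one binning strategy follows: equal-width bins, followed by smoothing each value to the mean of its bin. The choice of strategy and the function names are assumptions for illustration, and the code assumes the values are not all identical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]

def smooth_by_bin_means(values, n_bins):
    """Replace each value with the mean of the values in its bin."""
    bins = equal_width_bins(values, n_bins)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for value, index in zip(values, bins):
        sums[index] += value
        counts[index] += 1
    return [sums[index] / counts[index] for index in bins]

values = [1, 2, 3, 10, 11, 12, 20, 21, 22]
bin_indices = equal_width_bins(values, 3)
smoothed = smooth_by_bin_means(values, 3)
```

After smoothing, the nine distinct inputs collapse to three distinct values, which is the reduction in model complexity described above.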
Scaling - This function is used where the data is to be scaled to fall within a specified range, such as -1.0 to 1.0, or 0.0 to 1.0. You can normalize an attribute by scaling its values to make them fall within a specified range. Normalization is particularly useful for classification algorithms involving neural networks, or distance measurements such as nearest-neighbor classification and clustering. This PAL algorithm includes three data normalization methods: min-max, z-score, and decimal scaling.
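The three normalization methods can be sketched using their textbook definitions. The function names are hypothetical, the z-score uses the population standard deviation as an assumption, and PAL's exact conventions may differ.

```python
import statistics

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so that min -> new_min and max -> new_max."""
    low, high = min(values), max(values)
    span = new_max - new_min
    return [(v - low) / (high - low) * span + new_min for v in values]

def z_score_scale(values):
    """Centre on the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    return [(v - mean) / std_dev for v in values]

def decimal_scale(values):
    """Divide by the smallest power of ten that brings every |value| below 1."""
    max_abs = max(abs(v) for v in values)
    power = 0
    while max_abs / 10 ** power >= 1:
        power += 1
    return [v / 10 ** power for v in values]

unit_range = min_max_scale([10, 20, 30])
signed_range = min_max_scale([10, 20, 30], -1.0, 1.0)
z_scores = z_score_scale([10, 20, 30])
decimals = decimal_scale([-250, 40, 999])
```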
Kohonen Self Organized Maps - A type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space.
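The training loop can be sketched with a deliberately small example: a one-dimensional map (a chain of nodes) trained on two-dimensional points. The Gaussian neighborhood function, the linear decay schedules, and all parameter values are assumptions chosen for illustration and are not taken from PAL.

```python
import math
import random

def best_matching_unit(weights, sample):
    """Index of the node whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], sample))

def train_som(data, n_nodes=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D self-organizing map and return the node weight vectors."""
    rng = random.Random(seed)
    # Initialise each node's weight vector as a copy of a random training sample.
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        fraction = epoch / epochs
        lr = lr0 * (1.0 - fraction)                       # learning rate decays to 0
        radius = max(radius0 * (1.0 - fraction), 0.5)     # neighborhood shrinks
        for sample in rng.sample(data, len(data)):        # visit samples in random order
            bmu = best_matching_unit(weights, sample)
            for i in range(n_nodes):
                # Nodes near the winner on the map are pulled more strongly.
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(sample)):
                    weights[i][d] += lr * influence * (sample[d] - weights[i][d])
    return weights

data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
som_weights = train_som(data)
```

Updating the winner's map neighbors together with the winner is what lets nearby nodes end up representing nearby regions of the input space.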
© 2012 SAP AG. All rights reserved.
Distribution and reproduction of this publication or parts thereof, for whatever purpose and in whatever form, are not permitted without the express written permission of SAP AG. Information contained in this publication may be changed without prior notice.
The software products offered by SAP AG or its distributors may also contain software components of other software vendors.
Microsoft, Windows, Excel, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.
IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, z10, z/VM, z/OS, OS/390, zEnterprise, PowerVM, Power Architecture, Power Systems, POWER7, POWER6+, POWER6, POWER, PowerHA, pureScale, PowerPC, BladeCenter, System Storage, Storwize, XIV, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, AIX, Intelligent Miner, WebSphere, Tivoli, Informix, and Smarter Planet are trademarks or registered trademarks of IBM Corporation.
Linux is a registered trademark of Linus Torvalds in the USA and other countries.
Adobe, the Adobe logo, Acrobat, PostScript, and Reader are trademarks or registered trademarks of Adobe Systems Incorporated in the USA and/or other countries.
Oracle and Java are registered trademarks of Oracle and/or its affiliates.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc.
HTML, XML, XHTML, and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.
Apple, App Store, iBooks, iPad, iPhone, iPhoto, iPod, iTunes, Multi-Touch, Objective-C, Retina, Safari, Siri, and Xcode are trademarks or registered trademarks of Apple Inc.
IOS is a registered trademark of Cisco Systems Inc.
RIM, BlackBerry, BBM, BlackBerry Curve, BlackBerry Bold, BlackBerry Pearl, BlackBerry Torch, BlackBerry Storm, BlackBerry Storm2, BlackBerry PlayBook, and BlackBerry AppWorld are trademarks or registered trademarks of Research in Motion Limited.
Google App Engine, Google Apps, Google Checkout, Google Data API, Google Maps, Google Mobile Ads, Google Mobile Updater, Google Mobile, Google Store, Google Sync, Google Updater, Google Voice, Google Mail, Gmail, YouTube, Dalvik, and Android are trademarks or registered trademarks of Google Inc.
INTERMEC is a registered trademark of Intermec Technologies Corporation.
Wi-Fi is a registered trademark of the Wi-Fi Alliance.
Bluetooth is a registered trademark of Bluetooth SIG Inc.
Motorola is a registered trademark of Motorola Trademark Holdings, LLC.
Computop is a registered trademark of Computop Wirtschaftsinformatik GmbH.
SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA, and other SAP products and services mentioned in the text, as well as their respective logos, are trademarks or registered trademarks of SAP AG in Germany and other countries.
Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned in the text, as well as their respective logos, are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company.
Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned in the text, as well as their respective logos, are trademarks or registered trademarks of Sybase Inc. Sybase is an SAP company.
Crossgate, m@gic EDDY, B2B 360°, and B2B 360° Services are registered trademarks of Crossgate AG in Germany and other countries. Crossgate is an SAP company.
All other product and service names are trademarks of their respective companies. The information in the text is non-binding and serves informational purposes only. Products may have country-specific differences.
The information contained in this publication is the property of SAP. Distribution and reproduction of this publication or parts thereof, for whatever purpose and in whatever form, are permitted only with the express written permission of SAP AG.