DMM117 – SAP HANA Processing Services: Text, Spatial, Graph, Series, and Predictive
TRANSCRIPT
© 2016 SAP SE or an SAP affiliate company. All rights reserved. Public
Speakers
Bangalore, October 5 - 7
Priyanka Nalakath
M S Poornapragna
Las Vegas, Sept 19 - 23
Anthony Waite
May Chen
Barcelona, Nov 8 - 10
Markus Fath
Anthony Waite
Disclaimer
The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. Except for your obligation to protect confidential information, this presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or any related document, or to develop or release any functionality mentioned therein.
This presentation, or any related document and SAP's strategy and possible future developments, products and or platforms directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information in this presentation is not a commitment, promise or legal obligation to deliver any material, code or functionality. This presentation is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This presentation is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this presentation, except if such damages were caused by SAP’s intentional or gross negligence.
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
Agenda
Introduction: a platform to analyze various data types
Text
Spatial
Graph
Series
Numbers
Example scenarios
Public Security
Generate real-time intelligence from multiple sources
• Case management, activities, master data
• Social media
• Phone monitoring
• Traffic data
Insurance
Analyze the impact of natural disasters from many perspectives
• Policy data, locations
• News/media
• Satellite imagery
• Business networks
SAP HANA – The Platform Powers the Digital Transformation
SAP HANA PLATFORM: ON-PREMISE | CLOUD | HYBRID
What types of text processing capabilities are supported?
Full-text search
In addition to string matching, SAP HANA features full-text search which works on content stored in tables or exposed via views. Just like searching on the Internet, full-text search finds terms irrespective of the sequence of characters and words.
Text analysis
Capabilities range from basic tokenization and stemming to more complex semantic analysis in the form of entity and fact extraction. Text analysis applies within individual documents and is the foundation for both full-text search and text mining.
Text mining
Text mining makes semantic determinations about the overall content of documents relative to other documents. Capabilities include key term identification and document categorization. Text mining is complementary to text analysis.
Full-text search
SAP HANA provides an in-database search engine
• Supports 32 languages and handles binary file formats
• Modeling tools for search
• Search queries via built-in procedure, SQL, and OData
• Linguistic and fuzzy (error tolerant) search
Full-text index and full-text search
CREATE COLUMN TABLE "RESEARCH_PAPERS" (
  "ID" INTEGER PRIMARY KEY,
  "AUTHOR" NVARCHAR(200),
  "MIMETYPE" NVARCHAR(200),
  "DOCUMENT" BLOB
);

CREATE FULLTEXT INDEX "FTI_RESEARCH_PAPERS_DOCUMENT"
  ON "RESEARCH_PAPERS"("DOCUMENT");

SELECT "ID", "AUTHOR", "DOCUMENT"
  FROM "RESEARCH_PAPERS"
  WHERE CONTAINS(("AUTHOR", "DOCUMENT"), 'roberd software', FUZZY(0.8));
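To illustrate what an error-tolerant query such as 'roberd software' with a threshold of 0.8 means conceptually, the following Python sketch accepts a text when every query term is similar enough to one of its tokens. It is an illustration only: the function name fuzzy_contains and the difflib similarity ratio are assumptions of this sketch, and SAP HANA computes its own fuzzy score differently.

```python
from difflib import SequenceMatcher


def fuzzy_contains(text, query, threshold=0.8):
    """Return True if every query term fuzzily matches some token of text.

    Similarity is difflib's ratio in [0, 1]; this is a stand-in for a
    database fuzzy score, not the score SAP HANA computes.
    """
    tokens = text.lower().split()
    for term in query.lower().split():
        best = max(
            (SequenceMatcher(None, term, token).ratio() for token in tokens),
            default=0.0,
        )
        if best < threshold:
            return False
    return True


if __name__ == "__main__":
    # The misspelled "roberd" still matches "robert" at threshold 0.8.
    print(fuzzy_contains("Robert writes software", "roberd software"))
```

Raising the threshold towards 1.0 makes the match stricter; at 1.0 only exact token matches pass.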
[Diagram: rows inserted into a table (ID, DOC) are processed by full-text indexing into a full-text index]
Search models
In a search model you define the structure of your “search object” and how it is exposed to an application
• Tables and joins
• Columns
– Default columns for search
– Weights for ranking
– Fuzziness
– Default columns for facets
[Diagram: tables, model, access]
Search models and data access
CALL ESH_SEARCH (query, ?);
Built-in procedure to search on multiple search models with an “OData” query and a “JSON” response
CALL ESH_CONFIG (config);
Built-in procedure to add search annotations (request/response, facets, UI areas etc.) to views
[Diagram: tables exposed through *any* view with search annotations; access via SQL, JSON response consumed by the UI]
Text analysis
SAP HANA provides in-database text analysis
Linguistic analysis
Entity extraction
e.g. persons, organizations
Fact extraction
e.g. sentiments, mergers & acquisitions
Grammatical role analysis
subject-predicate-object
Custom dictionaries and rules for domain adaptation
e.g. chemical substances, product launch
Text analysis
Text analysis as an optional processing step “on top” of full-text indexing
[Diagram: full-text indexing with TA: inserted rows (ID, DOC), full-text index, text analysis results table]
Text analysis on non-persisted data
[Diagram: text, text analysis, text analysis results; diagram labels: SAP HANA, Extended Application Services]
Text analysis: advanced configuration options
Custom dictionaries for domain specific entity extraction
• Dictionaries are stored in repository
• Updates to dictionaries are considered “immediately”

Standard Form          Variant  Type
Arnold Schwarzenegger  Arnie    American Film Actor
Sylvester Stallone     Sly      American Film Actor
SAP SE                 SAP AG   Company
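The idea behind a custom dictionary (standard form, variants, entity type) can be sketched in a few lines of Python. This is a conceptual illustration only: the DICTIONARY structure and the extract_entities function are hypothetical names for this sketch and do not reflect the dictionary file format or API that SAP HANA uses.

```python
import re

# Standard form -> (entity type, known variants); rows taken from the slide.
DICTIONARY = {
    "Arnold Schwarzenegger": ("American Film Actor", ["Arnie"]),
    "Sylvester Stallone": ("American Film Actor", ["Sly"]),
    "SAP SE": ("Company", ["SAP AG"]),
}


def extract_entities(text):
    """Return (matched text, standard form, type) tuples in order of position."""
    hits = []
    for standard, (entity_type, variants) in DICTIONARY.items():
        for name in [standard] + variants:
            # Word boundaries keep "Sly" from matching inside "Slyly".
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                hits.append((match.start(), match.group(0), standard, entity_type))
    return [(found, standard, etype) for _, found, standard, etype in sorted(hits)]


if __name__ == "__main__":
    for entity in extract_entities("Arnie and Sly visited SAP AG."):
        print(entity)
```

Each variant is normalized to its standard form, so downstream analytics can count "Arnie" and "Arnold Schwarzenegger" as the same entity.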
Text analysis: advanced configuration options
Custom rules for domain specific fact extraction
• Rules are stored in repository
• Updates to rules are considered “immediately”
Rule elements
• Tokens, stems, part-of-speech tags
• Iteration operators
• Wildcards, alternation, negation
• Character classifiers (case-sensitivity)
• Grouping and containment (regEx)
Example sentences
SAP acquired Sybase for $5.8 billion
IBM buys Softlayer for $2 billion
[Diagram: rule elements annotated over the example sentences: type: company, stem: acquire, buy, type: company, *, for: currency]
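A rough Python sketch of rule-based fact extraction for the two example sentences is shown below. It uses an ordinary regular expression as a stand-in for an extraction rule; the pattern, the group names, and the extract_acquisition function are assumptions of this sketch and are not SAP HANA rule syntax.

```python
import re

# company, verb with stem acquire/buy, company, "for", currency amount
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z]\w+)\s+"
    r"(?:acquire[sd]?|buys?|bought)\s+"
    r"(?P<target>[A-Z]\w+)\s+for\s+"
    r"(?P<price>\$\d+(?:\.\d+)?\s+(?:million|billion))"
)


def extract_acquisition(sentence):
    """Return a dict with buyer, target and price, or None if no fact is found."""
    match = ACQUISITION.search(sentence)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(extract_acquisition("SAP acquired Sybase for $5.8 billion"))
    print(extract_acquisition("IBM buys Softlayer for $2 billion"))
```

A real rule would rely on entity types and stems produced by linguistic analysis instead of capitalization and hard-coded verb forms.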
Text analysis: using text analysis results
Search-based applications
• Include text analysis results in a search model for navigation and filtering
Analytics
• Simple calculations like term frequencies and co-occurrence
• Clustering, topic modeling or other text mining techniques
– R, Predictive Analysis Library (PAL) functions
Geotagging
• Assign longitude/latitude coordinates to “location” entities
Graph Analysis
• Store co-occurrences or semantic triples as graph for pattern matching, reasoning etc.
[Screenshot: search result list showing one abstract per document]
Text mining
SAP HANA provides in-database text mining
Identify similar documents
Identify key terms of a document
Identify related terms
Categorize new documents based on a training corpus
Scenarios
Highlight the key terms when viewing a patent document
Identify similar incidents for faster problem solving
Categorize new scientific papers along a hierarchy of topics
[Diagram: term-document matrix with terms t1 … tn and documents d1, d2, …]
Text mining
The text mining table is built from the results of linguistic analysis.
Essentially, it is a large term-document matrix.
The matrix is fully accessible for custom algorithms.
[Diagram: full-text indexing with TA and TM: inserted rows (ID, DOC), full-text index, text analysis table, text mining table]
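As a conceptual illustration of a term-document matrix and of how "related documents" can be derived from it, the Python sketch below counts terms per document and ranks documents by cosine similarity. The function names and the whitespace tokenization are assumptions of this sketch; the text mining functions in SAP HANA build on linguistic analysis and offer more weighting options.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Map each document ID to a Counter of lower-cased whitespace tokens."""
    return {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}


def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_documents(matrix, doc_id):
    """Rank all other documents by similarity to doc_id, best first."""
    reference = matrix[doc_id]
    scores = [(other, cosine(reference, vector))
              for other, vector in matrix.items() if other != doc_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample documents.
    docs = {
        "d1": "graph engine shortest path",
        "d2": "shortest path in a graph",
        "d3": "weather station temperature",
    }
    print(related_documents(term_document_matrix(docs), "d1"))
```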
Text mining
Text mining functions
• Related documents
• Relevant terms
• Related terms
• Classify kNN
• and more
[Diagram: text mining tables in SAP HANA, accessed via TM SQL or via the Text Mining.js API in Extended Application Services]
Spatial
SAP HANA provides native spatial data processing
Store 2D and 3D vector datatypes
50+ geospatial functions and algorithms
Geocoding and reverse geocoding
Geo content (GAB) and mapping services
Open standards (OGC, 1999 SQL/MM)
SDK for custom geospatial algorithms
Bulk and streaming data integration capabilities
Integration with Esri, Pitney Bowes, HERE and more
Related session: Spatial Analytics with SAP HANA, DMM270 (H2)
Geographic data: categories
Vector data
• Point, Linestring, Polygon, MultiPoint, …
• Networks, Topologies, Point Clouds, …
• Metadata
– spatial reference systems (SRS)
– units of measure (UOM)
Raster data
• Gridded data, e.g. digital terrain elevation, weather information
• Image data, e.g. created from optical or spectral sensors
• Metadata
– Raster and grid information
– Spatial and band reference system
[Figure: vector geometry examples (Point, Linestring, Polygon, CircularString) and a sample raster grid of cell values]
Spatial predicates
Notation: I(g) = interior, E(g) = exterior of a geometry g

g1.ST_Touches(g2)
(g1 ∩ g2 ≠ ∅) ∧ (I(g1) ∩ I(g2) = ∅)

g1.ST_Within(g2)
(g1 ∩ g2 = g1) ∧ (I(g1) ∩ E(g2) = ∅)

g1.ST_Equals(g2)
g1 = g2

g1.ST_Crosses(g2)
(I(g1) ∩ I(g2) ≠ ∅) ∧ (g1 ∩ g2 ≠ g1) ∧ (g1 ∩ g2 ≠ g2)

g1.ST_Overlaps(g2)
(I(g1) ∩ I(g2) ≠ ∅) ∧ (I(g1) ∩ E(g2) ≠ ∅) ∧ (E(g1) ∩ I(g2) ≠ ∅)

g1.ST_Intersects(g2)
g1 ∩ g2 ≠ ∅

g1.ST_Disjoint(g2)
g1 ∩ g2 = ∅

g1.ST_Contains(g2)
(g1 ∩ g2 = g2) ∧ (I(g1) ∩ I(g2) ≠ ∅)

g1.ST_Covers(g2) *
g1 ∩ g2 = g2

* No OGC standard
[Figure: one sketch of two geometries g1 and g2 per predicate]
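For intuition, the set-based definitions of the spatial predicates can be tried out on the simplest possible geometries. The Python sketch below implements a few of them for axis-aligned rectangles with non-zero area only; the Rect class and the function names are assumptions of this sketch and are unrelated to the SAP HANA implementation, which handles arbitrary geometry types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Closed axis-aligned rectangle with xmin < xmax and ymin < ymax."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    """The rectangles share at least one point (boundaries included)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def interiors_intersect(a, b):
    """The open interiors share at least one point."""
    return (a.xmin < b.xmax and b.xmin < a.xmax
            and a.ymin < b.ymax and b.ymin < a.ymax)


def disjoint(a, b):
    return not intersects(a, b)


def touches(a, b):
    """Shared points exist, but none of them is an interior point of both."""
    return intersects(a, b) and not interiors_intersect(a, b)


def within(a, b):
    """Every point of a lies in b."""
    return (b.xmin <= a.xmin and a.xmax <= b.xmax
            and b.ymin <= a.ymin and a.ymax <= b.ymax)


def contains(a, b):
    return within(b, a)


def overlaps(a, b):
    """Interiors intersect, and each rectangle has interior points outside the other."""
    return interiors_intersect(a, b) and not within(a, b) and not within(b, a)


if __name__ == "__main__":
    a = Rect(0, 0, 2, 2)
    print(touches(a, Rect(2, 0, 4, 2)), overlaps(a, Rect(1, 1, 3, 3)))
```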
Spatial clustering and joins
Clustering: grid, k-means, DBSCAN
SELECT ST_ClusterId() AS CID, ST_ClusterCentroid() AS CENTROID, COUNT(*) AS C
  FROM "RESEARCH_ORGANIZATIONS"
  GROUP CLUSTER BY "LON_LAT" USING KMEANS CLUSTERS 5;
Join
SELECT *
  FROM "RESEARCH_ORGANIZATIONS" AS T1, "PROJECT_LOCATION" AS T2
  WHERE T2."LON_LAT".ST_DISTANCE(T1."LON_LAT", 'kilometer') < 100;
[Figure: spherical clusters vs. non-spherical clusters]
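What a distance-based spatial join computes can be sketched outside the database as a nested loop with a great-circle distance. The Python example below pairs organizations and project locations that lie less than 100 km apart. The haversine formula, the Earth radius constant, and the sample coordinates are assumptions of this sketch; the distance SAP HANA returns depends on the spatial reference system of the column.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_join(left, right, max_km):
    """Return (left name, right name) pairs closer than max_km.

    Both inputs are lists of (name, lon, lat) tuples.
    """
    return [
        (l_name, r_name)
        for l_name, l_lon, l_lat in left
        for r_name, r_lon, r_lat in right
        if haversine_km(l_lon, l_lat, r_lon, r_lat) < max_km
    ]


if __name__ == "__main__":
    # Hypothetical sample rows with approximate coordinates.
    organizations = [("org_walldorf", 8.64, 49.30)]
    projects = [("proj_heidelberg", 8.69, 49.41), ("proj_berlin", 13.40, 52.52)]
    print(distance_join(organizations, projects, 100))
```

A database engine would typically use a spatial index instead of comparing every pair.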
Spatial joins in Calculation View modeler
Spatial: geocoding
SAP HANA supports geocoding, reverse geocoding, and address cleansing.
This data transformation/enrichment can run either locally (reference data is stored in SAP HANA) or via a remote service.
Local geocoding and address cleansing are handled by SAP HANA smart data quality.
[Diagram: address data is converted to longitude/latitude in SAP HANA by a geocode transform or geocode index, using either local geocode reference data or a remote geocoding service, e.g. HERE]
Spatial: geo content and services
SAP HANA includes HERE mapping content and services
• Mapping services API/SDK
• Map content for “generalized administration boundaries” (GAB) and “postcode areas” (POC)
[Diagram: mapping service and map content in SAP HANA]
Sample spatial clients
[Diagram: SAP HANA connected to sample spatial clients. Client labels: Esri ArcGIS Server, Esri ArcGIS Portal, Esri ArcGIS Desktop, SAP BusinessObjects Cloud, native SAP UI5 app (Extended Application Services). Connection labels: ODBC, MapService, QueryLayer, shapefile upload]
Graph
SAP HANA provides a native graph engine
• property graph model
• full transactional (ACID) properties
• basic graph functions like shortest path and strongly connected components
• native graph viewer
• tightly integrated in SAP HANA operations (security, backup etc.)
Benefits
• Store and analyze graph data in real-time
• Tools and graph algorithms to navigate and extract insight from relationship data
• Combine text, spatial, and advanced analytics with relationship intelligence
Related session: SAP HANA Graph Processing: Information and Demonstration, DMM212 (L1)
Property graph
Powerful and flexible property graph model
vertices (nodes) and edges (relationships) tables
vertices connected via multiple edges of any type
dynamic graph workspace view
Up-to-date insights without replicating data
Enhance graph semantics by adding new attributes to vertices and edges

Workspace
Vertices
Key     Name           Birthdate
Herman  Herman Hesse   19270530
Samuel  Samuel Becket  19281001
Edges
Key  Source  Target  Type
1    Maria   Herman  hasSon
2    Maria   Samuel  hasSon
Graph algorithms
• Neighborhood Search
• Shortest Path
• Strongly Connected Components
• Pattern Matching
[Figure: example graph with vertices Aphrodite, Hera, Artemis, Cronus, Leto, Hades, Poseidon, Gaia]
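As a minimal illustration of the shortest path algorithm on an edge table with source and target columns, the Python sketch below runs a breadth-first search over unweighted, directed edges. The shortest_path function and the sample edges are hypothetical and serve only this sketch; the SAP HANA graph engine has its own interfaces and supports weighted paths.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the vertices of one shortest directed path, or None if unreachable.

    edges is an iterable of (source, target) pairs; every edge has length 1.
    """
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    # Hypothetical parent -> child edges for illustration.
    edges = [
        ("gaia", "cronus"), ("cronus", "zeus"), ("cronus", "hera"),
        ("cronus", "hades"), ("cronus", "poseidon"),
        ("zeus", "artemis"), ("leto", "artemis"), ("zeus", "aphrodite"),
    ]
    print(shortest_path(edges, "gaia", "artemis"))
```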
Graph modeler
With a calculation view, a graph node can be used which triggers a graph algorithm
When retrieving data from a calculation view, the graph algorithm is executed.
SELECT * FROM GET_SHORTEST_PATHS
  ORDER BY "WEIGHT"
  WITH PARAMETERS (
    'placeholder' = ('$start$', ['zeus']),
    'placeholder' = ('$level$', '5'));
Series data
SAP HANA provides native support for series data
Store and generate series data
SQL integration for query processing
Detect and correct errors or anomalies
“Horizontal” aggregation/disaggregation (e.g. hourly to daily)
Series analysis (similarity, regression, smoothing, binning etc.)
Benefits
Efficient, scalable storage of series data
Simple and concise SQL interface
Optimized series algorithms
Seamless integration into existing database
Series table
CREATE COLUMN TABLE "WEATHER"(
  "STATION_ID" varchar(3) not null references "WEATHER_STATION",
  "DATE" date not null,
  "MAXTEMP" decimal(3,1),
  primary key("STATION_ID", "DATE")
) SERIES (
  SERIES KEY("STATION_ID")
  EQUIDISTANT INCREMENT BY 1 DAY MISSING ELEMENTS NOT ALLOWED
  PERIOD FOR SERIES ("DATE", NULL)
);
Series data functions
Functions that make it easier to manipulate series data
SERIES_GENERATE – Generate a complete series
SERIES_DISAGGREGATE – Move from coarse units (day) to finer (hour)
SERIES_ROUND – Convert a single value to a coarser resolution
SERIES_PERIOD_TO_ELEMENT – Convert a timestamp in a series to its offset from start
SERIES_ELEMENT_TO_PERIOD – Convert an integer to the associated period
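The concepts behind these series functions can be illustrated on an equidistant daily series in plain Python. The helper names below are hypothetical, and the zero-based element numbering is an assumption of this sketch; consult the SAP HANA documentation for the exact semantics of the SERIES_* functions.

```python
from datetime import datetime, timedelta


def generate_series(start, end, step):
    """Yield the equidistant period starts in the half-open interval [start, end)."""
    current = start
    while current < end:
        yield current
        current += step


def round_down(ts, start, step):
    """Round a timestamp down to the start of its period (coarser resolution)."""
    return start + ((ts - start) // step) * step


def period_to_element(ts, start, step):
    """Convert a timestamp to its zero-based offset from the series start."""
    return (ts - start) // step


def element_to_period(element, start, step):
    """Convert a zero-based offset back to the start of its period."""
    return start + element * step


if __name__ == "__main__":
    start, day = datetime(2016, 1, 1), timedelta(days=1)
    print(list(generate_series(start, datetime(2016, 1, 4), day)))
    print(round_down(datetime(2016, 1, 2, 15, 30), start, day))
```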
Analytical functions
Functions for analyzing series data:
LINEAR_APPROX – Replace NULL values by linear interpolation between adjacent non-NULL values
CUBIC_SPLINE_APPROX – Replace NULL values by cubic spline interpolation through adjacent non-NULL values
CORR – Pearson product-moment correlation coefficient
CORR_SPEARMAN – Spearman rank correlation
DFT – Compute the discrete Fourier transform
MEDIAN
AUTO_CORR – Correlation of a (sub-)series with itself at varying lags
…
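Two of the listed functions are easy to illustrate in plain Python: filling gaps by linear interpolation and computing the Pearson correlation coefficient. The function names linear_approx and pearson are hypothetical helpers for this sketch (None stands in for NULL); they show the underlying arithmetic and not the SQL functions themselves.

```python
import math


def linear_approx(values):
    """Fill None gaps between two known values by linear interpolation.

    Leading and trailing None values are left untouched in this sketch.
    """
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            fraction = (i - left) / (right - left)
            result[i] = result[left] + fraction * (result[right] - result[left])
    return result


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    print(linear_approx([1.0, None, None, 4.0]))
    print(pearson([1, 2, 3], [2, 4, 6]))
```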
Advanced Analytics
SAP HANA provides in-database data mining
Application Function Library (AFL) contains packages for data mining and predictive analysis, e.g. Predictive Analysis Library (PAL)
– Native algorithms for advanced analysis
– In-database processing for fast results
– Support for common data mining tasks like clustering, classification, association, time series etc.
R integration for SAP HANA
– use the R open source environment in context of SAP HANA
– R integration via fast, parallelized connection
– R script is embedded within SAP HANA SQL Script
Related sessions: Introduction to Predictive Modeling and Application Deployment for SAP HANA, DMM271 (H2); BA101 (L1)
Advanced Analytics
[Diagram: SAP applications and SAP Predictive Analytics on top of the SAP HANA Platform. Platform labels: Application Function Library (APL, BFL, PAL, UDF, OFL, etc.), R, Text Analysis, Text Mining, Spatial, Graph, Rules Engine, Embedded Predictive. Integration Services: Smart Data Access, Smart Data Integration, Event Stream Processing. Tools: SAP HANA Studio & Application Function Modeler. Data: Transaction, Text, Location Data, Machine Data, Other]
Advanced Analytics: Predictive Analysis Library (PAL)
SAP HANA In-Memory Predictive Analytics
SAP HANA embeds multiple advanced analytics function libraries, optimized for massive parallel in-memory processing
Predictive Analysis Library
– Core of numerous powerful, native predictive algorithms for in-database & in-memory processing that fully exploit the power of SAP HANA, resulting in quicker insight and faster implementations
Content and Usage
– The library includes common as well as specialized algorithms targeting various data mining and machine learning areas
– Leveraged and embedded in native SAP applications and usage from within SAP HANA development tools as well as SAP Predictive Analytics
Scenarios & Use Cases
– Various LoB / industry scenarios making use of Association Analysis, Time Series Forecasting, Link Prediction, Predictive Modeling, etc.
[Diagram: Predictive Analysis Library within the SAP HANA Platform; continuous growth and enhancements]
Advanced Analytics: Predictive Analysis Library (PAL)
Association Analysis
– Apriori
– Apriori Lite
– FP-Growth
– KORD – Top K Rule Discovery
Classification Analysis
– CART
– C4.5 Decision Tree Analysis
– CHAID Decision Tree Analysis
– K Nearest Neighbor
– Logistic Regression (incl. SGD)
– Neural Network
– Naïve Bayes
– Random Forest
– Support Vector Machine
– Parameter Selection / Model Evaluation: Confusion Matrix, Area Under Curve
Regression
– Multiple Linear Regression
– Polynomial Regression
– Exponential Regression
– Bi-Variate Geometric Regression
– Bi-Variate Logarithmic Regression
Probability Distribution
– Distribution Fit
– Cumulative Distribution Function
– Quantile Function
– Kaplan-Meier Survival Analysis
Outlier Detection
– Inter-Quartile Range Test (Tukey’s Test)
– Variance Test
– Anomaly Detection
– Grubbs Outlier Test
Link Prediction
– Common Neighbors
– Jaccard’s Coefficient
– Adamic/Adar
– Katzβ
Data Preparation
– Sampling, Random Distribution S.
– Binning
– Scaling
– Partitioning
– Principal Component Analysis (PCA)
Statistic Functions (Univariate)
– Mean, Median, Variance, Standard Deviation
– Kurtosis
– Skewness
Statistic Functions (Multivariate)
– Covariance Matrix
– Pearson Correlations Matrix
– Chi-squared Tests: Test of Quality of Fit, Test of Independence
– F-test (variance equal test)
Other
– Weighted Scores Table
– Substitute Missing Values
Cluster Analysis
– ABC Classification
– DBSCAN
– K-Means
– K-Medoid Clustering
– K-Medians
– Kohonen Self Organized Maps
– Agglomerate Hierarchical
– Affinity Propagation
– Latent Dirichlet Allocation (LDA)
– Gaussian Mixture Model (GMM)
– Cluster Assignment
Time Series Analysis
– Single/Double/Triple Exponential Smoothing
– Forecast Smoothing
– ARIMA / Seasonal ARIMA
– Brown Exponential Smoothing
– Croston Method
– Linear Regression with Damped Trend and Seasonal Adjust
– Forecast Accuracy Measures, Test for White Noise, Trend, Seasonality
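To give a feel for one of the simpler time series techniques in this list, the Python sketch below implements textbook single exponential smoothing, where each smoothed value is alpha times the current observation plus (1 - alpha) times the previous smoothed value. The function name, the initialization with the first observation, and the parameter name alpha are assumptions of this sketch; the PAL procedure has its own signature and options.

```python
def single_exponential_smoothing(values, alpha):
    """Return the smoothed series s, with s[0] = values[0] and
    s[t] = alpha * values[t] + (1 - alpha) * s[t - 1] for t > 0."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))  # initialize with the first observation
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    print(single_exponential_smoothing([10, 20, 30], 0.5))
```

A small alpha reacts slowly to changes and smooths strongly; alpha = 1 reproduces the input series.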
SAP TechEd Online
Continue your SAP TechEd education after the event!
Access replays of
• Keynotes
• Demo Jam
• SAP TechEd live interviews
• Select lecture sessions
• Hands-on sessions
• …
http://sapteched.com/online
Further information
Related SAP TechEd sessions:
• DMM212 - SAP HANA Graph Processing: Information and Demonstration (L1)
• DMM270 - Spatial Analytics with SAP HANA (H2)
• DMM271 - Introduction to Predictive Modeling and Application Deployment for SAP HANA (H2)
SAP Public Web
scn.sap.com
www.sap.com
SAP Education and Certification Opportunities
www.sap.com/education
Watch SAP TechEd Online
www.sapteched.com/online
Feedback
Thanks for attending this session.
Please complete your session evaluation for DMM117.
Contact information:
Markus Fath