oracle openworld 2019...oracle machine learning for sql (oml4sql) python (oml4py) r (oml4r) empower...
TRANSCRIPT
Oracle OpenWorld 2019S A N F R A N C I S C O
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning Overview of New Features and Roadmap
Mark Hornick, Senior Director, Product Management
Marcos Arancibia, Product Manager
Charlie Berger, Senior Director, Product Management
Copyright © 2019 Oracle and/or its affiliates.
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
Safe Harbor
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning Key Attributes
Copyright © 2019 Oracle and/or its affiliates.
Automated
Get better results faster with less effort –
even non-expert users
Scalable
Handle big data volumes using parallel, distributed algorithms –
no data movement
Production-ready
Deploy and update data science solutions faster with
integrated ML platform
Increase productivity, Achieve enterprise goals, Innovate More
Why this matters
Machine Learning offers tremendous promise for enterprises, but data access, model scalability, and ultimate model deployment issues all too often derail even the best initiatives
Oracle Machine Learning integrates with key enterprise infrastructure, and delivers the performance, scalability, and automation required by enterprise-scale data science projects
Copyright © 2019 Oracle and/or its affiliates.
Agenda
• Oracle Machine Learning family of products
• Supporting multiple personas
• OML component details
• Enabling applications with ML
• Roadmap
Copyright © 2019 Oracle and/or its affiliates.
d
Oracle Machine Learning
OML Microservices*Supporting Oracle Applications
Image, Text, Scoring, Deployment,Model Management
* Coming soon
OML4SQLOracle Advanced Analytics
SQL API
OML4Py*Python API
OML4ROracle R Enterprise
R API
OML Notebookswith Apache Zeppelin on
Autonomous Database
OML4SparkOracle R Advanced Analytics
for Hadoop
Oracle Data MinerOracle SQL Developer extension
Copyright © 2019 Oracle and/or its affiliates.
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application / Dashboard Developers
Executives
OML empowers Enterprise Users
OracleMachineLearning
Copyright © 2019 Oracle and/or its affiliates.
Data Scientists
•Popular data science languages: Python, R, SQL•Augment with 3rd party packages•Scalability and performance•Automation-enhanced productivity•Greater enterprise collaboration•Integrate and analyze data across the enterprise
OracleMachineLearning
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application / Dashboard Developers
Executives
Copyright © 2019 Oracle and/or its affiliates.
Business and Data Analysts•Expand analytical tool set with ML•Enable non-ML experts with AutoML•Leverage domain knowledge for better results•Collaborate with Data Scientists and IT
OracleMachineLearning
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application / Dashboard Developers
Executives
Copyright © 2019 Oracle and/or its affiliates.
DBA and IT Professionals
OracleMachineLearning
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application / Dashboard Developers
Executives
•Even greater value from Oracle investment•Support scalability and performance•Simpler, streamlined infrastructure•Maintain data security, backup, recovery•Use SQL, expand to Python and R•Leverage Database and Big Data sources
Copyright © 2019 Oracle and/or its affiliates.
Application and Dashboard Developers
•Realize intelligent solutions faster through Oracle stack integration•Easily uptake data scientists’ R, Python, SQL scripts and rapidly deploy solutions •Embed ML in applications and dashboards using SQL, REST, and SODA APIs
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application / Dashboard Developers
Executives OracleMachineLearning
Copyright © 2019 Oracle and/or its affiliates.
Executives
•Benefit from world-class data management technology and support•Democratize ML across the enterprise to enable better data-driven decisions•Deploy solutions faster to realize ROI
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application / Dashboard Developers
Executives OracleMachineLearning
Copyright © 2019 Oracle and/or its affiliates.
Evolving into End-to-End Analytics Data Platform
Analytics
Machine Learning
Application Development
Data Acquisition &
Transformation
Data Virtualization
Autonomous Database Cloud Service
Tightly integrated analytics data platformCopyright © 2019 Oracle and/or its affiliates.
Cross-Platform Machine LearningMultiple user interfaces and APIsDeployed in cloud and on-premisesFrom database to entire data management ecosystem
Oracle Cloud SQLOML4R
OML4Python
REST
OML4SQL
SQL Developer
Popular RIDEs
Popular Python IDEs
OML Notebooks
SelectUser Interface, e.g.
APIOptions
Cloud or On-premises
Reach broaderData Sources
Oracle Object Storage
Big DataService (HDFS)
NoSQLDatabases
KafkaStreams
Amazon S3
Azure Blob Storage
Oracle Database
Data Lake
OML4Spark
Oracle Big Data SQL
Copyright © 2019 Oracle and/or its affiliates.
OCI Data Science
CLASSIFICATIONNaïve BayesLogistic Regression (GLM)Decision TreeRandom ForestNeural NetworkSupport Vector Machine (SVM)Explicit Semantic Analysis
CLUSTERINGHierarchical K-MeansHierarchical O-ClusterExpectation Maximization (EM)
ANOMALY DETECTIONOne-Class SVM
TIME SERIESForecasting - Exponential SmoothingIncludes popular models
e.g. Holt-Winters with trends, seasonality, irregularity, missing data
REGRESSIONLinear ModelGeneralized Linear Model (GLM)Support Vector Machine (SVM)Stepwise Linear regressionNeural NetworkLASSO
ATTRIBUTE IMPORTANCEMinimum Description LengthPrincipal Component Analysis (PCA)Unsupervised Pair-wise KL Div CUR decomposition for row & AI
ASSOCIATION RULESA priori/ market basket
PREDICTIVE QUERIESPredict, cluster, detect, features
SQL ANALYTICSSQL WindowsSQL PatternsSQL Aggregates
FEATURE EXTRACTIONPrincipal Comp Analysis (PCA)Non-negative Matrix FactorizationSingular Value Decomposition (SVD)Explicit Semantic Analysis (ESA)
TEXT MINING SUPPORTAlgorithms support text columnsTokenization and theme extractionExplicit Semantic Analysis (ESA) for
document similarity
STATISTICAL FUNCTIONSBasic statistics: min, max,
median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
R AND PYTHON PACKAGESThird-party R and Python Packages
through Embedded ExecutionSpark MLlib algorithm integration
Oracle Machine Learning Algorithms and Analytics
Copyright © 2019 Oracle and/or its affiliates.
OML algorithm features
Feature In-Database Spark
No data movement to separate analytical engines
Wide range of ML techniques supported Native Native and Spark MLlib
High performance from parallel, distributed executionSpark 2-based. Use all nodes from Hadoop
cluster
Greater scalability from improved memory utilization
Handle narrow, wide, and sparse data Plus star schema, nested data
AutomationData preparation,
text mining, partitioned models,
AutoML
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning Notebooks
for Oracle Autonomous Database
Oracle Machine Learning Notebooks
Collaborative UI Based on Apache Zeppelin
Supports data scientists, data analysts, application developers, DBAs
Easy sharing of notebooks and templates
Permissions, versioning, and execution scheduling
Included with Autonomous DatabaseAutomatically provisioned, managed, backed up
In-database SQL algorithms and analytics functions
Soon to be augmented with Python and R
Autonomous Database as a Data Science Platform
Copyright © 2019 Oracle and/or its affiliates.
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning forSQL (OML4SQL)Python (OML4Py)R (OML4R)
Empower SQL users with immediate access to ML in Oracle Database and Oracle Autonomous Database
Empower data scientists with open source environments
Traditional Analytics and Data Source Interaction
Access latency
Paradigm shift: R/Python Data Access Language R/Python
Memory limitation – data size, in-memory processing
Single threaded
Issues for backup, recovery, security
Ad hoc production deployment
DeploymentAd hoccron job
Data SourceFlat Filesextract / exportread
export load
Data source connectivity packages
Read/Write files using built-in tool capabilities
?
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning for SQL
In-database, parallel, distributed algorithms
ML models as first class database objects
Export / import models across databases
Batch and real-time scoring
Explanatory predictive details
Leverage ML across Oracle stack
Component of Oracle Autonomous Database and Oracle Advanced Analytics option to Oracle Database
SQL InterfacesSQL*PlusSQLDeveloper…
OracleAutonomous
Database
OML Notebooks
Oracle Databasewith OAA option
Copyright © 2019 Oracle and/or its affiliates.
OML4SQL: Model Build and Real-time Prediction
BEGIN
DBMS_DATA_MINING.CREATE_MODEL(
model_name => 'BUY_INSUR1',
mining_function => dbms_data_mining.classification,
data_table_name => 'CUST_INSUR_LTV',
case_id_column_name => 'CUST_ID',
target_column_name => 'BUY_INSURANCE',
settings_table_name => 'CUST_INSUR_LTV_SET');
END;
Simple SQL Syntax—Classification Model
SELECT prediction_probability(BUY_INSUR1, 'Yes'
USING 3500 as bank_funds, 825 as checking_amount, 400 as credit_balance, 22 as age,
'Married' as marital_status, 93 as MONEY_MONTLY_OVERDRAWN, 1 as house_ownership)
FROM dual;
Model build (PL/SQL)
Real-time scoring (SQL query)
Copyright © 2019 Oracle and/or its affiliates.
Oracle Data Miner User Interface
SQL Developer Extension
Automates typical data science steps
Easy to use drag-and-drop interface
Analytical workflows quickly defined and shared
Wide range of algorithms and data transformations
Generate SQL code for immediate deployment
Create analytical workflows – supports “Citizen Data Scientists”
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning for R and Python*
Oracle Database as HPC environment
In-database parallel and distributed machine learning algorithms
Manage scripts and objects in Oracle Database
Integrate results into applications
and dashboards via SQL
OML4Py automated machine learning
Components of Oracle Advanced Analytics option to Oracle Database
Database ServerMachine
Client SQL InterfacesSQL*Plus
SQLDeveloperOML4Py OML4R
Copyright © 2019 Oracle and/or its affiliates. * Coming soon
Oracle Machine Learning for R and Python
Transparency layerLeverage proxy objects so data remain in database
Overload native functions translating functionality to SQL
Use familiar R / Python syntax on database data
Parallel, distributed algorithmsScalability and performance
Exposes in-database algorithms available from OML4SQL
Embedded executionManage and invoke R or Python scripts in Oracle Database
Data-parallel, task-parallel, and non-parallel execution
Use open source packages to augment functionality
OML4Py AutoMLModel selection, feature selection, hyper-parameter tuning
Components of Oracle Advanced Analytics option to Oracle Database
Database ServerMachine
Client SQL InterfacesSQL*Plus
SQLDeveloperOML4Py OML4R
Copyright © 2019 Oracle and/or its affiliates.
Example using OML4R interface
Proxy objects
data.frame
Proxydata.frame
Inherits from
Copyright © 2019 Oracle and/or its affiliates.
Transparency Layer
Leverages proxy objects for database data: oml.DataFrame
# Create table from Pandas DataFrame data
DATA = oml.create(data, table = 'BOSTON')
# Get proxy object to DB table boston
DATA = oml.sync(table = 'BOSTON')
Uses familiar Python syntax to manipulate database data
Overloads Python functions translating functionality to SQL
In-database performance – indexes, query optimization, parallelism, partitioning
DATA.shape
DATA.head()
DATA.describe()
DATA.std()
DATA.skew()
TRAIN, TEST =
DATA.split()
TRAIN.shape
TEST.shape
Copyright © 2019 Oracle and/or its affiliates.
In-database modeling using Support Vector Machine
Parallel, Distributed Algorithms
User tables
Oracle Database
from oml import svm
# create proxy object
ONTIME_S = oml.sync(table='ONTIME_S')
# define model object
settings = {'svms_outlier_rate' : 0.01}
svm_mod = svm('anomaly_detection',
svms_kernel_function =
'dbms_data_mining.svms_linear',
**settings)
# build anomaly detection model
svm_mod = svm_mod.fit(x=ONTIME_S, y=None)
# view model object
svm_mod
OML4Py
Python Client
Copyright © 2019 Oracle and/or its affiliates.
Embedded Execution
User tables
pyq*eval () interface
extproc
Oracle Database
extproc
# user-defined function using sklearn
def build_lm(dat):
from sklearn import linear_model
lm = linear_model.LinearRegression()
X = dat[['PETAL_WIDTH']]
y = dat[['PETAL_LENGTH']]
lm.fit(X, y)
return lm
# select column(s) for partitioning data
index = oml.DataFrame(IRIS['SPECIES'])
# invoke function in parallel on IRIS table
mods = oml.group_apply(IRIS, index,
func=build_lm,
parallel=2)
mods.pull().items()
OML4Py
Python Engine
OML4Py
Python Engine
OML4Py
Python Client
Example of parallel execution for partitioned data flow using third party package
Copyright © 2019 Oracle and/or its affiliates.
AutoML – new with OML4Py
Auto Feature Selection
– Reduce # of features by identifying most predictive
– Improve performance and accuracy
Increase data scientist productivity – reduce overall compute time
Auto ModelSelection
Much faster than exhaustive search
Auto FeatureSelection
>50% reduction in features
AutoTune
Significant score improvement
MLModel
Auto Model Selection– Identify in-database
algorithm that achieves highest model quality
– Find best model faster than with exhaustive search
Auto Tune Hyperparameters– Significantly improve
model accuracy
– Avoid manual or exhaustive search techniques
Copyright © 2019 Oracle and/or its affiliates.
Enables non-expert users to leverage Machine Learning
DataTable
Reduce # features by identifying most relevantImprove performance and accuracy
Auto Feature Selection: examples
0
5
10
15
20
25
30
299 9
Trai
nin
g ti
me
(sec
on
ds)
ML training time
0.93
0.94
0.95
0.96
0.97
0.98
0.99
1
299 9
Acc
ura
cy
Prediction Accuracy
33xreduction
+4%
OpenML dataset 312 with 1925 rows, 299 columns OpenML dataset 40996 with 56K rows, 784 columns
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
784 309
Acc
ura
cy
Prediction Accuracy
+18%
60% reduction1.3X time reduction to build SVM Gaussian model
97% reduction
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning for Spark (OML4Spark)
supported by Oracle R Advanced Analytics for Hadoop
Oracle Machine Learning for Spark
Leverage Spark 2 environment for powerfuldata preparation and machine learning
Use data across range of Data Lake sources
Achieve scalability and performance using full Hadoop cluster
Parallel and distributed ML algorithms from native and Spark MLlib implementations
R Language API Component to Oracle Big Data Connectors
Java API
HDFS | Hive | Spark DF | Impala | JDBC Sources
BDABDSDIY
OML4Spark
R Client
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning for Spark
Transparency layerProxy objects reference data from file system, HDFS, Hive, Impala, Spark DataFrame and JDBC sources
Overloaded R functions translate functionality to native language, e.g., HiveQL for HIVE and Impala
Users manipulate data via standard R syntax
Parallel, distributed machine learning algorithmsScalability and performance leveraging full Hadoop cluster
Spark-based custom LM, GLM, NN, K-Means plus Spark MLlib
Use expressive R Formula specification
Compute framework with custom R mappers/reducers
Data-parallel and task-parallel execution
Allows for open source CRAN packages run on Cluster Nodes
R Language API Component to Oracle Big Data Connectors
Copyright © 2019 Oracle and/or its affiliates.
Java API
HDFS | Hive | Spark DF | Impala | JDBC Sources
BDABDSDIY
OML4Spark
R Client
OML4Spark PerformanceLogistic Regression (GLM)
Data fits in memoryUp to 7x faster than Spark MLlib
Data cannot fit memoryAble to solve a 10B row model
Benchmark environmentORAAH 2.8.0
Big Data Appliance X7-2
6 Nodes, 256GB of RAM per Node
Formula: cancelled ~ distance + origin + dest + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum)
1
10
100
1,000
10,000
100K 1M 10M 100M 1B 10B
Exec
uti
on
Tim
e (s
eco
nd
s)
Dataset Size (# rows)
OML4Spark vs. Spark MLlibfor GLM Logistic Regression
OML4Spark MLlib
Copyright © 2019 Oracle and/or its affiliates.
Enabling Oracle Applications
with OML Models and Microservices
HCM Cloud Workforce Predictions
CRM Sales Cloud Sales Prediction
Retail GBUCustomer Insights, Customer Segmentation
Adaptive Intelligent Applications for Manufacturing
Configure, Price, Quote Cloud
Content and ExperienceUnstructured Data Analytics
Oracle Integration CloudDigital Process Automation
Industry Data ModelsCommunications, SNA, Utilities, Airlines, Retail, …
EBS Spend Classification
Organize spend into logical categories
EBS Depot Repair
Optimize speed, cost, quality of product repair, reuse, recycling
Oracle Identity ManagementAdaptive Access Management
FSGBUAnalytical ApplicationsInfrastructure
Applications integrating OML
Copyright © 2019 Oracle and/or its affiliates.
Oracle Integration Cloud (OIC)
Digital Process AutomationHelp business users make better decisions by using recommendations from ML models
Increase automation of human-centric approval workflows
Used by Oracle SaaS process-centric apps
PaaS service that exposes OML featuresBuild models in ADBDeploy via OML Microservices
Oracle Content and Experience (OCE)
Improve Content DiscoverabilitySearch, organize content, reduce duplication
Find relevant images/docs during content creation
Automatic tagging and classification of images, videos, text
Visual search
Cloud-based content hub to drive omni-channel content management and accelerate experience delivery
Leverages OML cognitive microservices
Application platforms using OML Platform
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning Microservices
Model Management ServicesBuilding and deploying OML models
Cognitive ServicesFeature Extraction, Image and Text
Model repository Store, version, compare ML models
REST APIs for application integration
Currently available to Oracle Applications teams – GA coming soon
Copyright © 2019 Oracle and/or its affiliates.
Sample of Microservices APIs
Copyright © 2019 Oracle and/or its affiliates.
ModelRepository REST
ModelDeploy REST
CognitiveImage REST
CognitiveText REST
Model ManagementGET /models
GET /{model name}GET /{model name}/{version}
POST /{model name}POST /{model name}/{version}
DELETE /{model name}/{version}
Model DeploymentGET /models
GET /{uri}GET /{uri}/api
POST /{uri}POST /{uri}/score
DELETE /{uri}
Cognitive ImagePOST /imageClassification
POST /nsfwPOST /objectDetectionPOST /faceDetection
POST /imageSimilarityPOST /faceSimilarity
Cognitive TextPOST /ner
POST /topicsPOST /keywordsPOST /sentimentPOST /summaryPOST /similarity
Oracle Machine Learning RoadmapAlgorithmsOML NotebooksOML User InterfaceOML4R and OML4PyOML4SparkOML MicroservicesGPUs
Roadmap: Algorithms for Database 20c
eXtreme Gradient Boosting Trees (XGBoost)Classification, regression, ranking, survival analysis
Highly popular and powerful algorithm
MSET-SPRTAnomaly detection for sensors, IoT data sources
“Multivariate State Estimation Technique”
A non-linear, non-parametric anomaly detection ML technique
Based on Oracle Labs algorithm
Frequently requested algorithms
Copyright © 2019 Oracle and/or its affiliates.
Roadmap: Expand Autonomous Databasewith Python and R
OML Notebooks add support for Python and R
Python and R scripts managed in-databaseInvoke from OML Notebooks, and REST or SQL APIs
Deploy into SQL and Web applications easily
Scalable Python and R executionTransparency layer-enabled database functionality
In-database machine learning algorithms
AutoML functionality via OML4Py
OML4Py integrated with OCI Data Science (DataScience.com)
Autonomous Database as a Data Science Platform
DATA SCIENTISTS SQL Clients
REST Applications
$
SQL
Copyright © 2019 Oracle and/or its affiliates.
Roadmap: OML AutoML User Interface
Automate production and deployment of ML modelsEnhance Data Scientist productivity, user-experience
Enable non-expert users to leverage ML
Unify model deployment and monitoring
Support model management
FeaturesMinimal user input: data, target
Model leaderboard
Model deployment in applications via REST endpoint
Model monitoring: accuracy, prediction/predictor drift
Cognitive features for processing image and text
“Code-free” user interface supporting automated end-to-end machine learning
Copyright © 2019 Oracle and/or its affiliates.
Roadmap: OML4R and OML4Py
Expose additional OML4SQL algorithms to Python and R
Support for recent R and Python releases
Enhance analytics and data operations furtherwith in-database computation
Deliver high performance, scalable, auto-tuning algorithms and data analytics to Python and R users
Enable Oracle Database standard, integrated installation, patching, upgrade/downgrade
Expand support for open source languages and ecosystems
Copyright © 2019 Oracle and/or its affiliates.
Roadmap: OML4Spark
Support advanced machine learning activities on Big DataModel management and cognitive image and text processing
Model deployment and monitoring on Big Data (including Database models)
Cloud-oriented packaging (containers, REST APIs) of OML4Spark functionality
Enable OML4Py and OML4R for uniform experience across platforms
AlgorithmsNeural Network gradient descent enhancements avoid over-fitting
New native Support Vector Machine with linear and non-linear kernels
New native k-Means and k-Mode clustering algorithms
New cloud-based architecture with powerful Spark analytics
Copyright © 2019 Oracle and/or its affiliates.
Roadmap: OML Platform REST APIs
Extend Model Management MicroservicesModel Monitoring of accuracy and
prediction/predictor drift
User-defined scripts deploymentPython and R user-defined functions
REST API
ModelMonitoring REST
User-defined ScriptsDeployment
REST
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning Platform REST APIs
Cognitive Services
FeatureExtraction REST
CognitiveImage REST
CognitiveText REST
ModelRepository REST
ModelDeploy REST
ModelMonitoring REST
Model Management and Online Scoring Services
Build / Batch Scoring / Export
User-defined ScriptsDeployment
REST
Store Models and Metadata
Roadmap
Roadmap
Copyright © 2019 Oracle and/or its affiliates.
Oracle REST Data Services
REST
Roadmap: Enabling GPUs
Enable GPUs for in-database algorithms –replace MKL with cuBLAS
Leverage GPUs for user-defined R and Python functions
Include 3rd party packages leveraging GPUs, e.g., Tensorflow, Keras
Execute on GPU cloud compute shape
Support state-of-the-art ML processing, e.g., deep learning
Augment OML Microservices for GPU processing – key for images
Copyright © 2019 Oracle and/or its affiliates.
Oracle Machine Learning Key Attributes
Copyright © 2019 Oracle and/or its affiliates.
Automated Scalable Production-ready
Increase productivity, Achieve enterprise goals, Innovate More
For more information…
https://www.oracle.com/database/technologies/datawarehouse-bigdata/machine-learning.html
Copyright © 2019 Oracle and/or its affiliates.
Thank You
Mark Hornick
Senior Director, Product ManagementData Science and Machine Learning
Appendix
Business Value of Machine Learning
Deliver better outcomes
Understand the reasons behind outcomes
Capitalize on an enterprise’s unique data
Create sustainable competitive advantage
Increase the rate of better decision making
Respond to market conditions faster
Attain a higher level of customer satisfaction
Know which of your big data you should pay attention to
64
Traditional analytics tools examine historical data, Machine learning explains the past and predicts the future
On-premises configuration options
Oracle Machine Learning Components
Oracle Database Enterprise Edition
Oracle Advanced Analytics option
Oracle Advanced Analytics option
Oracle Data Mining
Oracle R Enterprise
Oracle Data Miner UI in SQL Developer
Oracle R Distribution
Standard R IDEs
Oracle Machine Learning for Python
Standard Python IDEs
Python
Oracle R Enterpriseclient library
Copyright © 2019 Oracle and/or its affiliates.
On-premises configuration options
Oracle Machine Learning Components
Oracle Database Enterprise Edition
Oracle Advanced Analytics option
Oracle Advanced Analytics option
OML4SQL
OML4R
Oracle Data Miner UI in SQL Developer
Oracle R Distribution
Standard R IDEs
OML4Py
Standard Python IDEs
Python
OML4Rclient library
Copyright © 2019 Oracle and/or its affiliates.
OML4R / ORE 1.5.1Machine Learning algorithms in-database
• Decision Tree• Logistic Regression • Naïve Bayes• Support Vector Machine• RandomForest
Regression
• Linear Model• Generalized Linear Model• Multi-Layer Neural Networks• Stepwise Linear Regression• Support Vector Machine
Classification
Attribute Importance
• Minimum Description Length• Expectation Maximization
Clustering
• Hierarchical k-Means • Orthogonal Partitioning• Expectation Maximization
Feature Extraction
• Nonnegative Matrix Factorization• Principal Component Analysis• Singular Value Decomposition• Explicit Semantic Analysis
Market Basket Analysis
• Apriori – Association Rules
Anomaly Detection
• 1 Class Support Vector Machine
Time Series
• Single Exponential Smoothing• Double Exponential Smoothing
…plus open source R packages for algorithms in combination with embedded R execution
Supports integrated partitioned models, text miningCopyright © 2019 Oracle and/or its affiliates.
OML4Py 1.0Machine Learning algorithms in-database
• Decision Tree• Naïve Bayes• Generalized Linear Model• Support Vector Machine• RandomForest• Neural Network
Regression
• Generalized Linear Model• Neural Network• Support Vector Machine
Classification
Attribute Importance
• Minimum Description Length
Clustering
• Expectation Maximization• Hierarchical k-Means
Feature Extraction
• Singular Value Decomposition• Explicit Semantic Analysis
Market Basket Analysis
• Apriori – Association Rules
Anomaly Detection
• 1 Class Support Vector Machine
…plus open source Python packages for algorithms in combination with embedded Python execution
Supports integrated partitioned models, text mining
ODB >= 18c
Copyright © 2019 Oracle and/or its affiliates.
R and Python Object PersistenceDatastore: *.save() and *.load()
Provide database storage to save/restore objects across sessions
Use casesPass R/Python predictive models for
embedded R/Python execution
Pass arguments to R/Python functions with embedded R/Python execution, especially when non-scalar for SQL invocation
Preserve objects across R/Python sessions
x1 <- ore.lm(...)x2 <- ore.frame(...)ore.save(x1,x2,name="ds1")
R Datastore
ore.load(name="ds1")ls()“x1” “x2”
ds1 {x1,x2}
Copyright © 2019 Oracle and/or its affiliates.
R and Python Object PersistenceDatastore: *.save() and *.load()
Provide database storage to save/restore objects across sessions
Use casesPass R/Python predictive models for
embedded R/Python execution
Pass arguments to R/Python functions with embedded R/Python execution, especially when non-scalar for SQL invocation
Preserve objects across R/Python sessions
x1 = rf_mod.fit(...)x2 = oml.push(...)oml.ds.save(objs={'x1': x1, 'x2': x2},
name="ds1")
oml.ds.load(name="ds1")[‘x1’, ‘x2’]
PythonDatastoreds1 {x1,x2}
Copyright © 2019 Oracle and/or its affiliates.
IoT Use Case: Sensor Data Analysis
Model each customer’s behavior and identify deviations in individual behaviorand overall aggregate demand
200 thousand households, each with a utility “smart meter”
1 reading / meter / hr
200K x 8760 hrs / yr 1.752B readings
3 years worth of data 5.256B readings
Each customer has 26280 readings
If each model takes 10 seconds to build, 555.6 hrs (23.2 days) …with 128 DOP 4.3 hrs
Massive Predictive Modeling
Copyright © 2019 Oracle and/or its affiliates.
f(dat,args,…) {
}
Oracle Database
Data c1 c2 ci cn
Scriptbuildmodel
f(dat,args,…) f(dat,args,…) f(dat,args,…) f(dat,args,…)
Model c1
Model c2
Model cn
Model ci
Datastore Script Repository
Scalable Sensor Data Analysis – Model BuildingIoT Use Case: Sensor Data Analysis
Copyright © 2019 Oracle and/or its affiliates.
Build models and store in database, partition on CUST_ID
OML4R embedded R execution
ore.groupApply (CUST_USAGE_DATA,
CUST_USAGE_DATA$CUST_ID,
function(dat, ds.name) {
cust_id <- dat$CUST_ID[1]
mod <- lm(Consumption ~ . -CUST_ID, dat)
mod$effects <- mod$residuals <- mod$fitted.values <- NULL
name <- paste("mod", cust_id,sep="")
assign(name, mod)
ds.name1 <- paste(ds.name,".",cust_id,sep="")
ore.save(list=paste("mod",cust_id,sep=""), name=ds.name1, overwrite=TRUE)
TRUE
},
ds.name="myDatastore", ore.connect=TRUE, parallel=TRUE
)
14 lines
Copyright © 2019 Oracle and/or its affiliates.
For model build and batch scoring
OML4R embedded R execution – SQL API
begin
--sys.rqScriptDrop('Example2')
sys.rqScriptCreate('Example2',
'function(dat,datastore_name) {
mod <- lm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
ore.save(mod,name=datastore_name, overwrite=TRUE)
TRUE
}');
end;
/
select *
from table(rqTableEval(
cursor(select ARRDELAY,
DISTANCE,
DEPDELAY
from ontime_s),
cursor(select 1 "ore.connect",
'myDatastore' as "datastore_name"
from dual),
'XML',
'Example2' ));
begin
--sys.rqScriptDrop('Example3')
sys.rqScriptCreate('Example3',
'function(dat, datastore_name) {
ore.load(datastore_name)
prd <- predict(mod, newdata=dat)
prd[as.integer(rownames(prd))] <- prd
res <- cbind(dat, PRED = prd)
res}');
end;
/
select *
from table(rqTableEval(
cursor(select ARRDELAY, DISTANCE, DEPDELAY
from ontime_s
where year = 2003
and month = 5
and dayofmonth = 2),
cursor(select 1 "ore.connect",
'myDatastore' as "datastore_name" from dual),
'select ARRDELAY, DISTANCE, DEPDELAY, 1 PRED from ontime_s',
'Example3'))
order by 1, 2, 3;
Copyright © 2019 Oracle and/or its affiliates.
Automatic Data Preparation
Supported transformations
• Binning: applies a supervised transformation to numeric data to generate categorical bins
• Normalization: normalizes numeric data to fit required range, e.g., 0..1
• Outlier treatment: removes values that deviate significantly from most other values in the column, which can affect normalization and binning
Transformation “instructions” embedded in model for automatic application during scoring
Can turn off automatic data preparation if user needs more control over preparation stages
Automatically performs transformations required by each algorithm
Partition Models
Builds ensemble model with multiple sub-models, one for each data partitionPotentially achieve better accuracy through multiple targeted models
Sub-models managed and used as one
Simplified scoring using top-level model onlyProper sub-model chosen by system based on row of data to be scored
Oracle Database
Table
Specify Partition Column(s)
Partition-1
Partition-2
Partition-3
Partition-n
…
Sub-Model-1
Sub-Model-2
Sub-Model-3
Sub-Model-n
Top Level Model
New Data
Score data using top level model
In-DBAlgorithm
…
Automates typical machine learning tasks for data scientists
Core APIs Feature Summary
Feature OML4SQL OML4Py OML4R
Transparency Layer n/a
Parallel, Distributed Algorithms
Embedded Execution n/a
Automated Data Preparation
Automated Text Processing
Automated Partitioned Models
Automated Machine Learning (AutoML)
PGX Integration for Graph Analytics implicit
DML table package transparency n/a
Extensible Algorithm Models
SQL plus two most popular open source languages for machine learning
n/a – not applicableCopyright © 2019 Oracle and/or its affiliates.