Model Deployment

Dr. Saed Sayad

University of Toronto

2010

[email protected]

http://chem-eng.utoronto.ca/~datamining/

Model Deployment

• Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use it.

• Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process.

• In many cases it will be the customer, not the data analyst, who carries out the deployment steps. Even so, it is important for the customer to understand up front what actions will be needed in order to actually make use of the created models.

Model Deployment - Poll

[Figure not reproduced: poll results on model deployment. Source: http://www.kdnuggets.com/, May 2009]

Model Deployments

• Use the data mining tool

• Programming Scripts

– Java, C, VB, …

– SAS, SPSS, …

• SQL Scripts

– T-SQL, PL/SQL, …

– SQL functions

• PMML (Predictive Model Markup Language)

Using Data Mining Tool (Orange)

[Figure not reproduced: Orange data mining tool, http://www.ailab.si/orange/]

Programming Scripts - Visual Basic

SQL Scripts - SQL Function

select RegressionModel(null,25000,'street')
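
The slide shows only the call, not the function body. As a rough sketch of the idea, the Python snippet below registers a scoring function under the name RegressionModel in an in-memory SQLite database and runs the same query. The meaning of the three arguments (age, income, address type), every coefficient, and the default used for a NULL input are assumptions made for illustration; they are not taken from the original model.

```python
import sqlite3

# All values below are hypothetical; the slide does not show the model itself.
INTERCEPT = 10.0
AGE_COEF = 0.5
INCOME_COEF_PER_1000 = 2.0
ADDRESS_EFFECT = {"street": 5.0, "avenue": 2.0}
DEFAULT_AGE = 40.0  # substituted when the first argument is NULL


def regression_model(age, income, address_type):
    """Score one record with a linear model; a NULL age falls back to a default."""
    if age is None:
        age = DEFAULT_AGE
    return (
        INTERCEPT
        + AGE_COEF * age
        + INCOME_COEF_PER_1000 * (income / 1000.0)
        + ADDRESS_EFFECT.get(address_type, 0.0)
    )


def score_with_sql(query):
    """Register the model as a SQL function and run a scoring query."""
    conn = sqlite3.connect(":memory:")
    try:
        # SQLite passes SQL NULL to the Python callable as None.
        conn.create_function("RegressionModel", 3, regression_model)
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


print(score_with_sql("select RegressionModel(null,25000,'street')"))
```

Once the function is registered, scoring a whole table is a plain SELECT over its columns, which is the main appeal of this deployment route.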

• PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.

• PMML defines a standard not only for representing data mining models, but also for data handling and data transformations (pre- and post-processing).

PMML

• PMML is developed by the Data Mining Group (DMG) to avoid proprietary issues and incompatibilities when deploying models.

• PMML eliminates the need for custom model deployment and allows for a clear separation of tasks: model development vs. model deployment.

Predictive Models supported by PMML

• Regression

• Neural Networks

• Support Vector Machines

• Decision Trees

• Naïve Bayes

• Clustering

• Sequences

• Rule Sets

• Association Rules

• Time-Series (as of PMML 4.0)

• Text Models

PMML Processes

1. Pre-Processing

– Data Dictionary: Allows for the explicit specification of valid, invalid and missing values.

– Mining Schema: Used to define the appropriate treatment to be applied to missing and invalid values.

– Transformations: Allow for variable discretization, normalization, and mapping with handling of missing and default values.

– Built-in Functions: Arithmetic expressions, handling of date and time as well as strings. Also used for implementing IF-THEN-ELSE logic and Boolean operations.

2. Models

– PMML allows for several predictive modeling techniques to be fully expressed.

3. Post-Processing

– Scaling of model outputs can be performed with PMML element Targets.

PMML Components

PMML Components - Header

• Header: contains general information about the PMML document, such as copyright information for the model, its description, and information about the application used to generate the model such as name and version.

• It also contains a Timestamp element, which can be used to specify the date of model creation.
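
To make the Header description concrete, here is a small Python sketch that parses a header fragment with the standard library. The fragment is illustrative only: the copyright, description, application name, version, and timestamp values are invented, and the PMML namespace that a full document would carry is omitted to keep the lookup simple.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; every value in it is made up for this sketch.
HEADER_XML = """
<Header copyright="Example Corp" description="Illustrative regression model">
  <Application name="ExampleMiner" version="1.0"/>
  <Timestamp>2010-01-15 10:30:00</Timestamp>
</Header>
"""


def read_header(xml_text):
    """Return the general information stored in a PMML Header element."""
    header = ET.fromstring(xml_text)
    app = header.find("Application")
    stamp = header.find("Timestamp")
    has_stamp = stamp is not None and stamp.text is not None
    return {
        "copyright": header.get("copyright"),
        "description": header.get("description"),
        "application": None if app is None else app.get("name"),
        "version": None if app is None else app.get("version"),
        "timestamp": stamp.text.strip() if has_stamp else None,
    }


print(read_header(HEADER_XML))
```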

PMML Components – Data Dictionary

• Data Dictionary: contains definitions for all the possible fields used by the model. It is here that a field is defined as continuous, categorical, or ordinal.

• Depending on this definition, the appropriate value ranges are then defined, as well as the data type (such as string or double).
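
As a rough illustration, the following Python sketch reads a two-field data dictionary and checks input values against it. The field names, the interval bounds, and the category list are hypothetical, and the helper functions load_fields and is_valid are names chosen for this example rather than part of any PMML library.

```python
import xml.etree.ElementTree as ET

# Hypothetical fields: one continuous with a range, one categorical with values.
DICTIONARY_XML = """
<DataDictionary numberOfFields="2">
  <DataField name="income" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="0" rightMargin="500000"/>
  </DataField>
  <DataField name="address_type" optype="categorical" dataType="string">
    <Value value="street"/>
    <Value value="avenue"/>
  </DataField>
</DataDictionary>
"""


def load_fields(xml_text):
    """Collect optype, data type, and allowed range or values for each field."""
    fields = {}
    for field in ET.fromstring(xml_text).findall("DataField"):
        interval = field.find("Interval")
        bounds = None
        if interval is not None:
            bounds = (float(interval.get("leftMargin")),
                      float(interval.get("rightMargin")))
        fields[field.get("name")] = {
            "optype": field.get("optype"),
            "dataType": field.get("dataType"),
            "range": bounds,
            "values": [v.get("value") for v in field.findall("Value")],
        }
    return fields


def is_valid(fields, name, value):
    """Check a value against the range or category list declared for its field."""
    spec = fields[name]
    if spec["optype"] == "continuous":
        if spec["range"] is None:
            return True  # no interval declared, so nothing to check
        low, high = spec["range"]
        return low <= value <= high
    return value in spec["values"]


fields = load_fields(DICTIONARY_XML)
print(is_valid(fields, "income", 25000), is_valid(fields, "address_type", "road"))
```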

PMML Components – Data Transformations

• Data Transformations: transformations allow for the mapping of user data into a more desirable form to be used by the mining model. PMML defines several kinds of simple data transformations.

– Normalization: map values to numbers; the input can be continuous or discrete.

– Discretization: map continuous values to discrete values.

– Value mapping: map discrete values to discrete values.

– Functions: derive a value by applying a function to one or more parameters.

– Aggregation: used to summarize or collect groups of values.
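
The transformation kinds listed above can be sketched in plain Python. This is not how a PMML engine is implemented; it only illustrates what each transformation does. The function names, bin edges, labels, and sample records are all assumptions chosen for the example, and normalization is shown in its simplest min-max form.

```python
from bisect import bisect_right


def normalize(value, low, high):
    """Normalization: map a continuous value onto the 0..1 scale."""
    return (value - low) / (high - low)


def discretize(value, edges, labels):
    """Discretization: map a continuous value to the label of its bin."""
    return labels[bisect_right(edges, value)]


def map_value(value, mapping, default=None):
    """Value mapping: map one discrete value to another, with a default."""
    return mapping.get(value, default)


def aggregate(records, group_field, value_field):
    """Aggregation: sum a field within each group of records."""
    totals = {}
    for record in records:
        key = record[group_field]
        totals[key] = totals.get(key, 0) + record[value_field]
    return totals


# Hypothetical inputs for illustration.
print(normalize(25000, 0, 100000))
print(discretize(25000, [30000, 80000], ["low", "medium", "high"]))
print(map_value("street", {"street": 1, "avenue": 2}, default=0))
print(aggregate(
    [{"customer": "a", "amount": 10}, {"customer": "a", "amount": 5},
     {"customer": "b", "amount": 7}],
    "customer", "amount"))
```

The fifth kind, functions, corresponds to calling an ordinary function on one or more fields, for example dividing one field by another to derive a ratio.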

Data Transformations

PMML Components – Model

• Model: contains the definition of the data mining model. For example, a feed-forward neural network is represented in PMML by a "NeuralNetwork" element which contains attributes such as:

– Model Name (attribute modelName)

– Function Name (attribute functionName)

– Algorithm Name (attribute algorithmName)

– Activation Function (attribute activationFunction)

– Number of Layers (attribute numberOfLayers)
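
The attributes listed above can be read from a NeuralNetwork element as in the sketch below. The fragment is heavily trimmed for illustration: a real model would also carry a mining schema, input and output definitions, and the neurons inside each layer. The model name, algorithm name, and layer sizes are invented.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative fragment; not a complete or validated PMML model.
NETWORK_XML = """
<NeuralNetwork modelName="ExampleNet" functionName="regression"
               algorithmName="backpropagation" activationFunction="logistic"
               numberOfLayers="2">
  <NeuralLayer numberOfNeurons="3"/>
  <NeuralLayer numberOfNeurons="1"/>
</NeuralNetwork>
"""

ATTRIBUTES = ("modelName", "functionName", "algorithmName", "activationFunction")


def describe_network(xml_text):
    """Read the descriptive attributes and compare declared vs. actual layers."""
    net = ET.fromstring(xml_text)
    summary = {name: net.get(name) for name in ATTRIBUTES}
    summary["numberOfLayers"] = int(net.get("numberOfLayers"))
    summary["layersFound"] = len(net.findall("NeuralLayer"))
    return summary


print(describe_network(NETWORK_XML))
```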

PMML Components – Mining Schema

• Mining Schema: the mining schema lists all fields used in the model. This can be a subset of the fields as defined in the data dictionary. It contains specific information about each field, such as:

– Name (attribute name): must refer to a field in the data dictionary

– Usage type (attribute usageType): defines the way a field is to be used in the model. Typical values are: active, predicted, and supplementary. Predicted fields are those whose values are predicted by the model.

– Outlier Treatment (attribute outliers): defines the outlier treatment to be used. In PMML, outliers can be treated as missing values, as extreme values (based on the definition of high and low values for a particular field), or as is.

– Missing Value Replacement Policy (attribute missingValueReplacement): if this attribute is specified, then a missing value is automatically replaced by the given value.

– Missing Value Treatment (attribute missingValueTreatment): indicates how the missing value replacement was derived (e.g. as value, mean or median).
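
A consuming application might apply these mining-schema settings to each input roughly as follows. This is a minimal sketch: the field settings are stored in a plain dictionary keyed by the PMML attribute names, the bounds and replacement value are hypothetical, and applying outlier treatment before missing-value replacement is an assumption that should be checked against the PMML specification.

```python
def prepare_value(value, field):
    """Apply outlier treatment, then missing-value replacement, to one input."""
    low, high = field.get("lowValue"), field.get("highValue")
    outliers = field.get("outliers", "asIs")
    if value is not None and low is not None and high is not None:
        if outliers == "asMissingValues" and not (low <= value <= high):
            value = None  # the outlier is handled like a missing value
        elif outliers == "asExtremeValues":
            value = min(max(value, low), high)  # clip to the declared bounds
    if value is None:
        value = field.get("missingValueReplacement")
    return value


# Hypothetical settings for an "income" field.
INCOME = {
    "name": "income",
    "usageType": "active",
    "outliers": "asExtremeValues",
    "lowValue": 0,
    "highValue": 200000,
    "missingValueReplacement": 45000,
    "missingValueTreatment": "asMean",  # documents how 45000 was derived
}

print(prepare_value(None, INCOME), prepare_value(500000, INCOME))
```

Note that missingValueTreatment is carried along only as documentation here, matching the description above: it records how the replacement value was obtained and does not change the computation.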

Model and Schema

PMML Components – Targets

• Targets: allow for post-processing of the predicted value in the form of scaling if the output of the model is continuous.

• Targets can also be used for classification tasks. In this case, the attribute priorProbability specifies a default probability for the corresponding target category. It is used if the prediction logic itself did not produce a result. This can happen, e.g., if an input value is missing and there is no other method for treating missing values.
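
Both uses of Targets can be sketched in a few lines of Python. The parameter names mirror the idea of a rescale factor, a rescale constant, and optional bounds; the order used here (clamp first, then scale) and the numbers in the example are assumptions for illustration and should be verified against the PMML specification before being relied on.

```python
def rescale(raw, rescale_factor=1.0, rescale_constant=0.0,
            minimum=None, maximum=None):
    """Post-process a continuous prediction: clamp it, then scale and shift."""
    value = raw
    if minimum is not None:
        value = max(value, minimum)
    if maximum is not None:
        value = min(value, maximum)
    return value * rescale_factor + rescale_constant


def classify(scores, prior_probabilities):
    """Pick the top category, falling back to priors when no result exists."""
    source = scores if scores else prior_probabilities
    return max(source, key=source.get)


# Hypothetical numbers for illustration.
print(rescale(12.0, rescale_factor=2.0, rescale_constant=1.0))
print(classify(None, {"yes": 0.3, "no": 0.7}))
```

The fallback branch corresponds to the case described above, where the prediction logic produced no result, for instance because an input value was missing and no other treatment applied.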

Targets

PMML 4.0 – New Features

• Improved Pre-Processing Capabilities: Additions to built-in functions include a range of Boolean operations and an If-Then-Else function.

• Time Series Models: New Exponential Smoothing models; also placeholders for ARIMA, Seasonal Trend Decomposition, and Spectral Analysis, which are to be supported in the near future.

• Model Explanation: Saving of evaluation and model performance measures to the PMML file itself.

• Multiple Models: Capabilities for model composition, ensembles, and segmentation (e.g., combining of regression and decision trees).

• Extensions of Existing Elements: Addition of multi-class classification for Support Vector Machines, improved representation for Association Rules, and the addition of Cox Regression Models.
