How to Build Your First Predictive Model

Post on 20-Aug-2015


How-To

Build Your First Model

A publication of

Building your first model with a new data mining tool can be intimidating.

Though some of us may have some intuition for model building, it’s pretty daunting to look at the default settings, knowing you have a ways to go before you have an accurate, explainable predictive model to hand over to your boss.

To make sure you’re set up for data mining success, follow these simple steps to build your first models in the SPM software suite.

INTRO

Want to skip ahead? Here’s what we’re going to cover.

IMPORT DATA: Prepare … Stay Organized

MODEL SETUP: Select an Engine … Analysis Type … Variables … Testing … Control Parameters

PERFORMANCE: What To Look For … What’s Next

IMPORT DATA

We’re going to walk you through best practices for preparing and uploading your data into the SPM software.

PREPARE

1 Make sure your data is in a ‘flat’ file (i.e., rows x columns).

2 Make sure you understand your variable labels! If you don’t understand what your variables represent, you’re going to have a heck of a time understanding your results.
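If you want to sanity-check a file before importing it, a few lines of Python can confirm it really is flat. This is only a sketch and has nothing to do with SPM itself: the function name `check_flat` and the sample columns (AGE, INCOME, REGION, CHURN) are made up for illustration.

```python
import csv
import io

def check_flat(csv_text):
    """Return (header, n_rows) if every row has as many fields as the header.

    Raises ValueError naming the first ragged row otherwise.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_rows = 0
    # The header is line 1, so data rows start at line 2
    # (assuming no field contains an embedded newline).
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        n_rows += 1
    return header, n_rows

# Hypothetical sample data, invented for this example.
SAMPLE = "AGE,INCOME,REGION,CHURN\n34,52000,West,0\n51,61000,East,1\n"

header, n_rows = check_flat(SAMPLE)
print(header, n_rows)
```

The printed header doubles as a checklist for step 2: if you can’t say what each name on it means, sort that out before you model.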

Want to read the nitty gritty? Check out the complete SPM User Guide.


STAY ORGANIZED

Save your data set, or sets, in one, easy-to-find folder. If you’re pulling in data from all over creation, you’re just making the process longer and more difficult to comprehend. Do yourself a favor and dedicate a directory to each data mining project you’re working on.
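One way to make the one-directory-per-project habit stick is to script it. The sketch below is an assumption about how you might lay things out, not an SPM requirement: the helper `make_project_dir`, the project name, and the `data` and `results` subfolders are all hypothetical.

```python
from pathlib import Path
import tempfile

def make_project_dir(root, project_name):
    """Create <root>/<project_name>/ with data and results subfolders."""
    project = Path(root) / project_name
    for sub in ("data", "results"):  # hypothetical layout; rename to taste
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project

# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project = make_project_dir(tmp, "churn_model")
    created = sorted(p.name for p in project.iterdir())

print(created)
```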

Model Setup

Once you have imported your data, you need to set a few parameters (leaving most of them at their default settings) before you click ‘start.’

10 parameters to pay attention to when building a model

Select an Engine.

CART

MARS

TreeNet

Random Forests

CART Ensembles

RuleLearner/Model Compression

Regression

Logit

GPS/Generalized Lasso

Data Binning

Select an Analysis Type.

Classification

Regression

Logistic Binary

Unsupervised

SELECT A TARGET VARIABLE AND PREDICTORS

1 You must have a target variable.

2 You should have multiple predictors.

3 You don’t need to use all of your predictors.

4 Take note of categorical vs. continuous variables.
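To get a feel for the categorical vs. continuous distinction, here is a rough sketch that separates a target from its predictors and guesses each predictor’s type. It is plain Python, not SPM, and the rule it uses (a column is continuous if every value parses as a number) is a simplifying assumption: a numeric code such as a ZIP code would be misread as continuous, so always check the guess against what you know about the variable. The function names and sample data are hypothetical.

```python
import csv
import io

def is_number(value):
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def describe_predictors(csv_text, target):
    """Map each non-target column to 'continuous' or 'categorical'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if target not in rows[0]:
        raise KeyError(f"target {target!r} not found in the file")
    kinds = {}
    for name in rows[0]:
        if name == target:
            continue  # the target is not a predictor
        values = [row[name] for row in rows if row[name] != ""]
        numeric = all(is_number(v) for v in values)
        kinds[name] = "continuous" if numeric else "categorical"
    return kinds

# Hypothetical sample data, invented for this example.
SAMPLE = "AGE,INCOME,REGION,CHURN\n34,52000,West,0\n51,61000,East,1\n"

kinds = describe_predictors(SAMPLE, target="CHURN")
print(kinds)
```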

SELECT A TESTING METHOD

No independent testing – exploratory tree

Fraction of cases selected at random for testing (%)

Test sample contained in a separate file

V-fold cross-validation (e.g., 10 folds)
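Two of the testing options above, the random test fraction and V-fold cross-validation, are easy to picture as index bookkeeping. The sketch below shows the general idea in plain Python; it is not how SPM implements them, and the function names and the fixed seed are assumptions made for the example.

```python
import random

def holdout_split(n_cases, test_fraction, seed=0):
    """Randomly assign a fraction of case indices to the test sample."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_test = round(n_cases * test_fraction)
    # Return (learn, test) index lists.
    return sorted(indices[n_test:]), sorted(indices[:n_test])

def v_fold_splits(n_cases, v=10, seed=0):
    """Yield (learn, test) index lists; each case is tested exactly once."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::v] for i in range(v)]  # v roughly equal folds
    for i, test in enumerate(folds):
        learn = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield sorted(learn), sorted(test)

learn, test = holdout_split(100, 0.20)
splits = list(v_fold_splits(100, v=10))
print(len(learn), len(test), len(splits))
```

With a holdout, each case lands in either the learn or the test sample once. With V folds, every case gets a turn in the test sample, at the cost of building V models.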

Salford Systems Recommends That You Manually Set Your:

• Learn rate

• Number of trees built

• Number of nodes in a tree

• Loss criterion

*These will vary depending on the modeling engine being used to build a model.
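To see why these settings matter, it can help to watch them at work in a toy boosted-tree model. The sketch below is generic gradient boosting written in plain Python, not SPM or TreeNet code, and it makes several simplifying assumptions: there is a single predictor, every tree has exactly two terminal nodes, and the loss criterion is squared error. All function names and the sample data are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the one split on x that minimizes squared error of the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def boost(x, y, learn_rate=0.1, n_trees=100):
    """Return learn-sample predictions after n_trees boosting steps."""
    pred = [sum(y) / len(y)] * len(y)  # start from the mean of the target
    for _ in range(n_trees):
        # With squared-error loss, each tree is fit to the current residuals.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        # The learn rate shrinks each tree's contribution.
        pred = [pi + learn_rate * (left_mean if xi <= threshold else right_mean)
                for xi, pi in zip(x, pred)]
    return pred

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical data: one predictor with a curved relationship to the target.
x = list(range(20))
y = [xi ** 2 for xi in x]

baseline = mse(y, [sum(y) / len(y)] * len(y))
few_trees = mse(y, boost(x, y, learn_rate=0.1, n_trees=10))
many_trees = mse(y, boost(x, y, learn_rate=0.1, n_trees=200))
print(baseline, few_trees, many_trees)
```

The learn-sample error keeps dropping as trees are added, which is exactly why you also need a test sample: more trees will always look better on the data the model has already seen.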

CLICK START!

YOU ARE NOW BUILDING YOUR FIRST MODEL

EVALUATING YOUR PERFORMANCE

Don’t get overwhelmed by all of the fancy reporting features available in the SPM software suite. Start slow. We will show you where to begin if you are new to using SPM and just want to understand what your model means.

What To Look For

• Mean Squared Error (MSE)

• R-Squared

• Test vs. Learn Performance

• Variable Performance

• Variable Dependence Plots (TreeNet)
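If MSE and R-Squared are new to you, computing them by hand once makes the reports easier to read. The sketch below uses the standard textbook definitions in plain Python; the actual and predicted values are invented for the example and do not come from any SPM model.

```python
def mean_squared_error(actual, predicted):
    """Average squared gap between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the target's variance explained: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total

# Hypothetical learn and test samples, invented for this example.
learn_actual = [1.0, 2.0, 3.0, 4.0]
learn_pred = [1.1, 1.9, 3.2, 3.8]
test_actual = [2.0, 3.0, 4.0, 5.0]
test_pred = [2.5, 2.4, 4.6, 4.3]

for name, actual, pred in [("learn", learn_actual, learn_pred),
                           ("test", test_actual, test_pred)]:
    print(name,
          round(mean_squared_error(actual, pred), 3),
          round(r_squared(actual, pred), 3))
```

In this made-up case the test sample scores noticeably worse than the learn sample. A gap like that is the thing to watch for when you compare test vs. learn performance, since it suggests the model fits the learn data better than it generalizes.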

If you have already downloaded the SPM software, build a model!

Once you’ve built your first model, start tweaking some of the control parameters we discussed.

What is your best model performance so far?

… AND YOU’RE DONE!
