Predictive Analytics Workshop

Grace Hopper Celebration India 2016 | #GHCI16 | Presented by the Anita Borg Institute and the Association for Computing Machinery India

Introduction to Predictive Analytics: Hands-On Workshop Using R & Python

Presenters: Python: Lavanya Sita Tekumalla, Sharmistha Jat; R: Maheshwari Dhandapani, Subramanian Lakshminarayanan, Sowmya Venugopal, Bindu


TRANSCRIPT

Page 1: Predictive Analytics Workshop

#GHCI16

Introduction to Predictive Analytics: Hands-On Workshop Using R & Python

Presenters:

Python: Lavanya Sita Tekumalla, Sharmistha Jat

R: Maheshwari Dhandapani, Subramanian Lakshminarayanan, Sowmya Venugopal, Bindu

Page 2: Predictive Analytics Workshop

Agenda

• Basics of Predictive Modeling Techniques (30m)
• Hands-on Workshop: Regression
 (1) Build Model: R (30m)
 (2) Build Model: Python (30m)

Page 3: Predictive Analytics Workshop

What is Predictive Analytics?

Learn from available data and make meaningful predictions.

Why Predictive Analytics?

Too much data, too many scenarios... It is hard for humans to explicitly describe predictive rules for every scenario.

Exercise: let's predict something…

Predict how long it takes to reach home

Page 4: Predictive Analytics Workshop

Common Analytics Tasks...

Supervised Learning - Regression: Predict a continuous target

Can I predict the time taken to get home from past history?
Can I predict the Sensex value from past market history?

Page 5: Predictive Analytics Workshop

Common Analytics Tasks...

Supervised Learning - Classification: Predict the class/type of an object

Can I classify images of cats vs. dogs from examples?
Can I identify handwritten digits by studying examples?

Page 6: Predictive Analytics Workshop

Common Analytics Tasks...

Unsupervised Learning - Clustering: Identify groups inherent in data

Given a set of news articles, what are the underlying topics or themes?

Page 7: Predictive Analytics Workshop

Predict Movie Success?

Page 8: Predictive Analytics Workshop

Predict Movie Success: Features

• Features:

 - Actors
 - Director
 - Gross budget
 - Social media feedback
 - Genre and keywords
 - Release date

Page 9: Predictive Analytics Workshop

Example: Predict Movie Sales?

Known Data: Available advertising dollars and corresponding sales for lots of prior movies

Prediction Task: For a new movie, given its advertising budget, can you forecast sales?

Regression:

Sales = f (Advertising budget)

How do we learn f?
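As a concrete sketch of what "learning f" means, here is a linear fit with scikit-learn; the budget and sales numbers are invented for illustration and are not from the workshop dataset.

```python
# A minimal sketch of learning f from (budget, sales) pairs.
# All numbers are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

budget = np.array([[1.0], [2.0], [3.0], [4.0]])  # advertising spend (millions)
sales = np.array([2.1, 3.9, 6.2, 7.8])           # observed sales (millions)

f = LinearRegression().fit(budget, sales)        # learn f from the examples
forecast = f.predict([[5.0]])                    # forecast sales for a new budget
print(forecast)
```

The learned f here is just a straight line; the rest of the workshop explores richer families of f.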

Page 10: Predictive Analytics Workshop

Example: Movie Hit/Flop from Budget and Trailer Facebook Likes?

Known Data: Budgets and Facebook statistics of various hit and flop movies...

Prediction Task: For a new movie, given its budget and the Facebook likes on its trailer, what is the probability of a hit?

Classification:

Can I learn the separating line between hit and flop movies? (Plot: trailer Facebook likes vs. budget, with hits and flops as points.)
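A toy sketch of learning such a separating line with logistic regression; all the data below are invented for illustration and this is not the workshop's code.

```python
# Toy hit/flop classifier from budget and trailer likes; data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: [budget (millions), trailer Facebook likes (thousands)]
X = np.array([[10, 5], [15, 8], [80, 90], [120, 150], [20, 60], [90, 12]])
y = np.array([0, 0, 1, 1, 1, 0])  # 1 = hit, 0 = flop

clf = LogisticRegression().fit(X, y)
p_hit = clf.predict_proba([[100, 120]])[0, 1]  # probability of a hit
print(p_hit)
```

The classifier's decision boundary is exactly the "separating line" the slide asks about.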

Page 11: Predictive Analytics Workshop

The Predictive Analytics Framework

Data/Examples → Feature Extraction → Learning Algorithm → Model

New Data Instance + Model → Prediction

Evaluation: How well is my algorithm working?
Model Selection: Which learning algorithm should I use?

Page 12: Predictive Analytics Workshop

Important Aspects of Analytics Framework:

• Feature Engineering: Finding the discerning characteristics

• Data Collection: Collecting the right data / combining multiple sources

• Cleanup: Huge effort - noise, missing data, format conversion...

"If you torture the data long enough, it will confess to anything." -- Ronald Coase

“The goal is to turn data into information and information into insight." -- Carly Fiorina

Page 13: Predictive Analytics Workshop

Regression Analysis: What?

• "Regression analysis is a way of finding and representing the relationship between two or more variables."

• A simple yet effective tool for prediction and estimates

Regression Analysis: Why?

• To predict an event/outcome using the attributes or features influencing it.

Examples:
• Why don't UPS truck drivers take left turns?
• Predict a movie's rating

Page 14: Predictive Analytics Workshop

Regression Analysis: How?

The key is to arrive at an equation that captures the relationship between the outcome and its influencing features.

It answers the questions:
• Which variables matter most, and which the least?
 - Independent variables / predictors / features
 - Dependent variable / outcome
• How do those variables interact with each other?

Y = β0 + β1x1 + β2x2 + ... + ε

(e.g., Movie Rating as a function of Budget and Duration)
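A small sketch of fitting this equation in Python: generate data from known coefficients and check that the fit recovers them. The data are synthetic, not the workshop dataset.

```python
# Fit Y = b0 + b1*x1 + b2*x2 + noise and read back the coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
budget = rng.uniform(1, 100, 200)      # x1
duration = rng.uniform(80, 180, 200)   # x2
rating = 2.0 + 0.03 * budget + 0.01 * duration + rng.normal(0, 0.1, 200)

X = np.column_stack([budget, duration])
model = LinearRegression().fit(X, rating)
print(model.intercept_, model.coef_)   # close to the true 2.0, [0.03, 0.01]
```

The learned intercept and coefficients are the β0, β1, β2 of the slide's equation; ε is the noise term.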

Page 15: Predictive Analytics Workshop

Data Exploration

Identify the nature of the data and the patterns in the underlying set.

Descriptive analysis: Describes or summarizes the raw data, making it more human-interpretable. It condenses data into nuggets of information (mean, median).

- Missing data: when to impute, when to omit (R packages: mice, VIM, Amelia)
- Nature of the data distribution (spread around the mean, skewness, outliers)

Data variables are either continuous (quantitative) or categorical (qualitative).
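In the Python half, the same descriptive pass is a few pandas calls; the tiny sample below is made up and merely stands in for the real dataset.

```python
# Descriptive analysis sketch with pandas; data are invented.
import pandas as pd

df = pd.DataFrame({'budget': [10.0, 12.0, 15.0, 200.0, None],
                   'duration': [95, 110, 100, 130, 105]})

print(df['budget'].describe())  # count, mean, std, min, quartiles, max
print(df['budget'].skew())      # strong positive skew from the 200 outlier
print(df.isnull().sum())        # one missing budget value to impute or omit
```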

Page 16: Predictive Analytics Workshop

Visualize Data Distribution

Page 17: Predictive Analytics Workshop

Visualizing Relationships Between Variables

- How are two features/variables related to one another? (correlation coefficient)
• -1.00 → an increase in one variable accompanies a decrease in the other
• +1.00 → an increase in one variable accompanies an increase in the other
• 0 → no linear correlation

- Is there redundancy?
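The redundancy question can be answered numerically with a correlation matrix; the near-duplicate pair below is synthetic, built for illustration.

```python
# Correlation sketch: one near-redundant pair, one unrelated column.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
budget = rng.uniform(1, 100, 100)
df = pd.DataFrame({
    'budget': budget,
    'gross': 1.5 * budget + rng.normal(0, 5, 100),  # nearly redundant with budget
    'duration': rng.uniform(80, 180, 100),          # unrelated to budget
})
print(df.corr())  # values near +1/-1 flag redundancy; near 0, little linear relation
```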

Page 18: Predictive Analytics Workshop

Data Cleansing

What is cleansing?
"Conversion of raw data → technically correct data → consistent data"

Why is cleansing important?
Incorrect or inconsistent data can lead to drawing false conclusions.

• Removal of outliers, which can skew your results
• Removal of missing data
• Removal of duplicates
• Transformation of data

R packages for data cleansing: mice, Amelia, missForest, Hmisc, mi
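The four cleansing steps can be sketched in pandas as well (the workshop itself does this in R); the data and the outlier threshold below are invented.

```python
# Duplicates, missing data, outliers, and a transformation, in pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({'budget': [10.0, 10.0, None, 5000.0, 20.0],
                   'rating': [6.5, 6.5, 7.0, 1.0, 8.0]})

df = df.drop_duplicates()                # removal of duplicates
df = df.dropna(subset=['budget'])        # removal of missing data
df = df[df['budget'] < 1000]             # removal of an (assumed) outlier
df['log_budget'] = np.log(df['budget'])  # transformation of data
print(df)
```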

Page 19: Predictive Analytics Workshop

Data Cleansing

Plotting missing data using the mice package in R

Page 20: Predictive Analytics Workshop

Feature Selection

Identify the important variables for building predictive models, so that the model is free from correlated variables, bias, and unwanted noise.

e.g. the Boruta package in R → identifies important variables using Random Forest

Page 21: Predictive Analytics Workshop

Building the Model

Page 22: Predictive Analytics Workshop

R - Workshop

Page 23: Predictive Analytics Workshop

R Setup

• Copy the install binaries and packages to your laptop
• Install R & RStudio
• Install the packages (ggplot2, VIM, mice, Hmisc, etc.)
• Copy the model code, RDS file, and the dataset
• Set the working directory using:

 setwd("<dir where you have the script, dataset, RDS file>")

Page 24: Predictive Analytics Workshop

Explore Data using R

Page 25: Predictive Analytics Workshop

Validate the Model

• Run the model against the "test" data set which was set aside for prediction after training
• Check the predicted vs. actual observed values
• (Cross-)validation is done to assess the fitness of the model
• The model should not under- (or) over-fit future unseen data
• Validate regression using:
 - R² (higher is better)
 - Residuals (ideally should have a random distribution, to avoid heteroscedasticity)
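The same R² and residual checks, sketched in Python with made-up test values (in R this information comes out of the fitted model's summary):

```python
# Validation sketch: R^2 and residuals on invented test values.
import numpy as np
from sklearn.metrics import r2_score

y_test = np.array([5.0, 6.0, 7.0, 8.0])  # actual observed values
y_pred = np.array([5.1, 5.8, 7.2, 7.9])  # model predictions

print(r2_score(y_test, y_pred))  # closer to 1 is better
residuals = y_test - y_pred
print(residuals)                 # should look randomly scattered around 0
```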

Page 26: Predictive Analytics Workshop

Python - Workshop

Page 27: Predictive Analytics Workshop

Basic Pipeline

1) Data loading and inspection
2) Cleaning and preprocessing
3) Train/test partitioning
4) Feature selection
5) Regression
6) Model selection, parameter tuning, regularisation

Page 28: Predictive Analytics Workshop

Data Loading

# loading imdb data into a python list format
import csv

imdb_data_csv = csv.reader(open('movie_metadata.csv'))
imdb_data = []
for item in imdb_data_csv:
    imdb_data.append(item)

Page 29: Predictive Analytics Workshop

Columns in Data: 'color', 'director_name', 'num_critic_for_reviews', 'duration', 'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name', 'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name', 'movie_title', 'num_voted_users', 'cast_total_facebook_likes', 'actor_3_name', 'facenumber_in_poster', 'plot_keywords', 'movie_imdb_link', 'num_user_for_reviews', 'language', 'country', 'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score', 'aspect_ratio', 'movie_facebook_likes'

Page 30: Predictive Analytics Workshop

Preprocessing of data

Steps:

1) Convert text fields to numbers
2) Convert strings (numbers in a CSV get read as strings) to float or int type
3) Remove NaNs
4) Remove uninteresting columns from the data
5) Feature selection

import numpy as np

data_float = preprocessing(imdb_data)  # workshop helper implementing the steps above
data_np = np.array(data_float)
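`preprocessing` is a workshop helper whose body isn't on the slide. A hypothetical sketch of what such a function might do follows; the column indices and the exact behavior are assumptions, not the presenters' code.

```python
# Hypothetical sketch of a preprocessing helper (not the workshop's code).
import math

def preprocessing(rows):
    """Coerce assumed numeric columns to float; drop rows with NaNs
    or unparseable values."""
    body = rows[1:]          # skip the header row
    numeric_cols = [2, 3]    # assumed positions of two numeric columns
    cleaned = []
    for row in body:
        try:
            vals = [float(row[i]) for i in numeric_cols]
        except (ValueError, IndexError):
            continue         # drop rows with text where a number is expected
        if any(math.isnan(v) for v in vals):
            continue         # drop NaNs
        cleaned.append(vals)
    return cleaned
```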

Page 31: Predictive Analytics Workshop

Train and Test data partitioning

from sklearn.model_selection import train_test_split

# remove label from data
data_np_x = np.delete(data_np, [20], axis=1)

# data partitioning
x_train, x_test, y_train, y_test = train_test_split(
    data_np_x, data_np[:, 20], test_size=0.25, random_state=0)

Page 32: Predictive Analytics Workshop

Regression

# apply regression and voila!
from sklearn.linear_model import Ridge

regr_0 = Ridge(alpha=1.0)
regr_0.fit(x_train, y_train)
y_pred = regr_0.predict(x_test)

# model evaluation
from sklearn.metrics import mean_absolute_error
print('absolute error:', mean_absolute_error(y_test, y_pred))

from sklearn.metrics import mean_squared_error
print('squared error:', mean_squared_error(y_test, y_pred))

Page 33: Predictive Analytics Workshop

Feature Selection

Select important columns that correlate well with the output:
1) Model learning and inference become faster
2) Accuracy improvement
3) Feature selection using PCA (here via truncated SVD)

from sklearn.decomposition import TruncatedSVD
from copy import deepcopy

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)

data_svd = deepcopy(data_np_onehot)  # one-hot encoded data from preprocessing
data_svd = svd.fit_transform(data_svd)

Page 34: Predictive Analytics Workshop

Model Selection: how to select the parameters of a model

Types of Regression

Popular regression models:
1) Linear Regression
2) Ridge Regression: L2 smoothing
3) Kernel Regression: higher-order/non-linear
4) Lasso Regression: L1 smoothing
5) Decision Tree Regression (CART)
6) Random Forest Regression

Page 35: Predictive Analytics Workshop

Ridge Regression: Regularization

Why Regularization?

- Less training data: avoid overfitting

- Noisy data: smoothing / robustness to outliers

Page 36: Predictive Analytics Workshop

Ridge Regression: Regularization

# apply Ridge regression!
from sklearn.linear_model import Ridge

regr_ridge = Ridge(alpha=10)
regr_ridge.fit(x_train, y_train)
y_pred = regr_ridge.predict(x_test)

# model evaluation
print('ridge absolute error:', mean_absolute_error(y_test, y_pred))
print('ridge squared error:', mean_squared_error(y_test, y_pred))

# alpha determines how much smoothing/regularization of the weights we want

Page 37: Predictive Analytics Workshop

How to select parameter alpha?

K-fold Cross-Validation:

Page 38: Predictive Analytics Workshop

How to select parameter alpha?

K-fold Cross-Validation:

verbose_level = 10
from sklearn.model_selection import GridSearchCV

regr_ridge = GridSearchCV(Ridge(), cv=3, verbose=verbose_level,
                          param_grid={"alpha": [10, 1, 0.1]})
regr_ridge.fit(x_train, y_train)
y_pred = regr_ridge.predict(x_test)
print(regr_ridge.best_params_)

# model evaluation
print('ridge absolute error:', mean_absolute_error(y_test, y_pred))
print('ridge squared error:', mean_squared_error(y_test, y_pred))

Page 39: Predictive Analytics Workshop

Lasso regression: Feature Sparsity

Another form of Regularization with L1 Norm:

# Lasso Regression
from sklearn.linear_model import Lasso

regr_0 = Lasso(alpha=1.0)
regr_0.fit(x_train, y_train)
y_pred = regr_0.predict(x_test)

# alpha determines how much sparsity-inducing regularization of the weights we want

Page 40: Predictive Analytics Workshop

Lasso regression: Feature Sparsity

Plotting the coefficients: Ridge Regression vs. Lasso Regression
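The contrast that the coefficient plot makes can also be checked numerically: lasso drives most weights to exactly zero, while ridge only shrinks them. The data below are synthetic, built so that only two of ten features matter.

```python
# Ridge shrinks coefficients; Lasso drives many to exactly zero.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, 100)  # only 2 features matter

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print(np.sum(ridge.coef_ == 0))  # ridge: no exact zeros
print(np.sum(lasso.coef_ == 0))  # lasso: most of the 10 weights are exactly 0
```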

Page 41: Predictive Analytics Workshop

Lasso Regression with Regularization

verbose_level = 1
from sklearn.linear_model import Lasso

regr_ls = GridSearchCV(Lasso(), cv=2, verbose=verbose_level,
                       param_grid={"alpha": [0.01, 0.1, 1, 10]})
regr_ls.fit(x_train, y_train)
y_pred = regr_ls.predict(x_test)
print(regr_ls.best_params_)

# model evaluation
print('Lasso absolute error:', mean_absolute_error(y_test, y_pred))
print('Lasso squared error:', mean_squared_error(y_test, y_pred))

Page 42: Predictive Analytics Workshop

Decision Tree Regression

Page 43: Predictive Analytics Workshop

Decision Tree Regression: Visualization with depth

(Plots of the fitted tree's predictions at depths 1, 2, and 5)
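What those plots show can be reproduced on a synthetic curve: a depth-d tree predicts at most 2**d distinct constant values, so the fit gets finer as depth grows. The data below are invented.

```python
# Depth controls granularity: a depth-d tree predicts at most 2**d values.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

for depth in (1, 2, 5):
    tree = DecisionTreeRegressor(max_depth=depth).fit(X, y)
    n_levels = len(np.unique(tree.predict(X)))
    print(depth, n_levels)  # the step function gets finer as depth grows
```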

Page 44: Predictive Analytics Workshop

Decision Tree Regression

from sklearn.tree import DecisionTreeRegressor

regr_dt = GridSearchCV(DecisionTreeRegressor(), cv=2, verbose=verbose_level,
                       param_grid={"max_depth": [2, 3, 4, 5, 6]})
# regr_dt = DecisionTreeRegressor(max_depth=2)
regr_dt.fit(x_train, y_train)
y_pred = regr_dt.predict(x_test)
print(regr_dt.best_params_)

# model evaluation
print('decision tree absolute error:', mean_absolute_error(y_test, y_pred))
print('decision tree squared error:', mean_squared_error(y_test, y_pred))

Page 45: Predictive Analytics Workshop

Random Forest for Regression

--> Learn multiple decision trees, each on a random subset of the data
--> Predict the value as the average of the predictions from the individual trees

Page 46: Predictive Analytics Workshop

Random Forest Regression

from sklearn.ensemble import RandomForestRegressor

regr_rf = GridSearchCV(RandomForestRegressor(), cv=2, verbose=verbose_level,
                       param_grid={"max_depth": [2, 3, 4, 5]})
# regr_rf = RandomForestRegressor(max_depth=2)
regr_rf.fit(x_train, y_train)
y_pred = regr_rf.predict(x_test)
print(regr_rf.best_params_)

# model evaluation
print('Random Forest absolute error:', mean_absolute_error(y_test, y_pred))
print('Random Forest squared error:', mean_squared_error(y_test, y_pred))

Page 47: Predictive Analytics Workshop

Other Forms Of Regression

# Support Vector Regression
from sklearn.svm import SVR

kfold_regr = GridSearchCV(SVR(), cv=5, verbose=10,
                          param_grid={"C": [10, 1, 0.1, 1e-2],
                                      "epsilon": [0.05, 0.1, 0.2]})

# Gaussian Process Regression
from sklearn.gaussian_process import GaussianProcessRegressor

kfold_regr = GridSearchCV(GaussianProcessRegressor(kernel=None), cv=5, verbose=10,
                          param_grid={"alpha": [10, 1, 0.1, 1e-2]})

Page 48: Predictive Analytics Workshop

Recap of Python Session

Preprocessing:
--> Feature selection
--> Handling missing data
--> Handling categorical data

Model Evaluation: making training and testing data

Model Selection:
--> Finding parameters: cross-validation
--> Various regression models:
 a. Simple model: Linear Regression
 b. Regularization (L2 norm): Ridge Regression
 c. Sparse regularization: Lasso Regression
 d. Interpretable: decision trees
 e. Random forests: ensembles of decision trees

Page 49: Predictive Analytics Workshop

Thank you