![Page 1: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/1.jpg)
Missing Data:
A Machine Learning
Approach
![Page 2: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/2.jpg)
DAMIAN MINGLE
CHIEF DATA SCIENTIST, WPC Healthcare
@DamianMingle
![Page 4: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/4.jpg)
What’s Imputation Anyway?
Many models cannot handle missing values, so filling the gaps with estimated
values can be useful.
Missing values can be replaced by the mean, median, or most frequent value of their column.
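As a minimal sketch of these three options, the snippet below fits scikit-learn's `SimpleImputer` once per strategy on a small made-up array. The array values and the `statistics` dictionary are assumptions for illustration only; they are not part of the original slides.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (hypothetical): two columns, one missing entry in each.
X = np.array([
    [1.0, 10.0],
    [1.0, np.nan],
    [np.nan, 10.0],
    [4.0, 30.0],
    [6.0, 50.0],
])

statistics = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    imputer.fit(X)
    # statistics_ holds the per-column fill value learned for this strategy.
    statistics[strategy] = imputer.statistics_
    print(strategy, imputer.statistics_)
```

The mean is pulled toward large values, while the median and the most frequent value are not, so the three strategies can fill the same gap quite differently.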
![Page 5: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/5.jpg)
Why Imputation Matters
Imputing the missing values can give better results than discarding the
samples containing any missing value.
Imputing does not always improve predictions, so compare the alternatives
with cross-validation.
In some cases, dropping rows or using marker values is more effective.
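One way to check this is sketched below, under stated assumptions: the data is synthetic (`make_classification` with about 10% of cells blanked at random), the model is a plain logistic regression, and "marker values" is interpreted here as scikit-learn's `add_indicator=True` option, which appends a missing-value flag column per affected feature. None of these choices come from the slides.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption): blank out roughly 10% of the cells at random.
rng = np.random.RandomState(0)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_missing = X.copy()
X_missing[rng.rand(*X.shape) < 0.1] = np.nan

# Option 1: mean imputation inside a pipeline, so the column means are
# learned from the training folds only.
impute_model = make_pipeline(
    SimpleImputer(strategy="mean"), LogisticRegression(max_iter=1000)
)
impute_scores = cross_val_score(impute_model, X_missing, y, cv=5)

# Option 2: mean imputation plus indicator columns that mark which
# entries were missing.
indicator_model = make_pipeline(
    SimpleImputer(strategy="mean", add_indicator=True),
    LogisticRegression(max_iter=1000),
)
indicator_scores = cross_val_score(indicator_model, X_missing, y, cv=5)

# Option 3: discard every row that contains a missing value.
complete_rows = ~np.isnan(X_missing).any(axis=1)
drop_scores = cross_val_score(
    LogisticRegression(max_iter=1000),
    X_missing[complete_rows],
    y[complete_rows],
    cv=5,
)

print("impute:   ", impute_scores.mean())
print("indicator:", indicator_scores.mean())
print("drop rows:", drop_scores.mean())
```

Note that the row-dropping option is scored on a smaller set of samples than the other two, so the comparison is only a rough guide.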
![Page 6: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/6.jpg)
Preprocessing
Clustering
Regression
Classification
Dimensionality Reduction
Model Selection
![Page 7: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/7.jpg)
Let’s Look
at an
ML Recipe
Imputation
![Page 8: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/8.jpg)
The Imports
import numpy as np
from urllib.request import urlopen
# SimpleImputer replaces the Imputer class removed in scikit-learn 0.22
from sklearn.impute import SimpleImputer
![Page 9: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/9.jpg)
Load Dataset with Missing Values
url = "https://goo.gl/3jvZXE"
raw_data = urlopen(url)
dataset = np.loadtxt(raw_data, delimiter=",")
print(dataset.shape)
![Page 10: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/10.jpg)
Separate Features from Target
# Columns 0-7 are the features; column 8 is the target
X = dataset[:, 0:8]
y = dataset[:, 8]
![Page 11: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/11.jpg)
Mark Zero Values as Missing
X[X == 0] = np.nan
![Page 12: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/12.jpg)
Impute Missing Values with Mean
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_X = imp.fit_transform(X)
![Page 13: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/13.jpg)
Imputation Recipe
# Impute missing values with the mean
import numpy as np
from urllib.request import urlopen
from sklearn.impute import SimpleImputer
# Load dataset from UCI Machine Learning Repo
url = "https://goo.gl/3jvZXE"
raw_data = urlopen(url)
dataset = np.loadtxt(raw_data, delimiter=",")
print(dataset.shape)
# Separate the features (columns 0-7) from the target (column 8)
X = dataset[:, 0:8]
y = dataset[:, 8]
# Treat every 0 as missing by marking it "not a number" (NaN)
X[X == 0] = np.nan
# Replace each missing value with the mean of its column
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_X = imp.fit_transform(X)
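The recipe can also be tried without a network connection. The sketch below wraps the same steps in a helper and runs it on a tiny hand-made array; the function name `impute_zero_as_missing`, its `n_features` parameter, and the `toy` data are all hypothetical and only stand in for the downloaded dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer


def impute_zero_as_missing(dataset, n_features):
    """Hypothetical helper: treat zeros in the feature columns as missing
    and fill them with the column mean.

    The first `n_features` columns are taken as features and the next
    column as the target.
    """
    X = dataset[:, :n_features].astype(float)  # astype returns a copy
    y = dataset[:, n_features]
    X[X == 0] = np.nan
    imp = SimpleImputer(missing_values=np.nan, strategy="mean")
    return imp.fit_transform(X), y, imp


# Toy stand-in for the real dataset: two feature columns, then the target.
toy = np.array([
    [2.0, 0.0, 1.0],
    [4.0, 6.0, 0.0],
    [0.0, 8.0, 1.0],
    [6.0, 10.0, 0.0],
])

imputed_X, y, imp = impute_zero_as_missing(toy, n_features=2)
print(imputed_X)
print(imp.statistics_)  # the column means used as fill values
```

Zeros in the target column are left alone because only the feature slice is marked and imputed, and `toy` itself is unchanged because the helper works on a copy.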
![Page 17: Scikit Learn: How to Deal with Missing Values](https://reader038.vdocument.in/reader038/viewer/2022100803/58ee72101a28ab0f1c8b46cd/html5/thumbnails/17.jpg)
Resources
Society of Data Scientists
scikit-learn
Also, the imputer's main methods:
fit(X[, y]): fit the imputer on X
fit_transform(X[, y]): fit to data, then transform it
transform(X): impute all missing values in X
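The difference between these three methods shows up most clearly with separate training and test data. The following is a small sketch using `SimpleImputer`; the `X_train` and `X_test` arrays are invented for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training and test arrays with missing entries.
X_train = np.array([
    [1.0, np.nan],
    [3.0, 4.0],
    [np.nan, 8.0],
])
X_test = np.array([
    [np.nan, np.nan],
    [5.0, 2.0],
])

imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit: learn the column means from the training data only.
imp.fit(X_train)

# transform: fill gaps in new data using the means learned above.
X_test_imputed = imp.transform(X_test)

# fit_transform: learn the means and fill the same array in one call.
X_train_imputed = imp.fit_transform(X_train)

print(X_test_imputed)
print(X_train_imputed)
```

Fitting on the training data and then calling `transform` on the test data keeps test-set values from influencing the fill values.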