scikit-learn: data normalization techniques that work
TRANSCRIPT
HELP YOUR DATA BE NORMAL
DAMIAN MINGLE
CHIEF DATA SCIENTIST
@DamianMingle
Want faster model run times and better accuracy?
Try Normalizing Your Data
What’s Normal Anyway?
Often stated as “scaling individual samples to have unit norm” or “scaling input vectors individually to unit norm (vector length).”
Adjusting values measured on different scales to a notionally common scale
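The “unit norm” idea above can be sketched with plain NumPy (toy numbers, not from the talk): divide each sample (row) by its L2 norm so every row ends up with vector length 1.

```python
import numpy as np

# Toy data: two samples, two features (illustrative values only).
X = np.array([[3.0, 4.0],
              [1.0, 1.0]])

# Compute each row's L2 norm and divide the row by it.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

print(X_unit[0])                        # [0.6, 0.8]
print(np.linalg.norm(X_unit, axis=1))   # every row now has norm 1.0
```

This is the same per-sample operation scikit-learn's preprocessing.normalize performs by default.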
Why Normalization Matters
In truth, not all machine learning models are sensitive to magnitude.
For the models that are, putting data on the same scale helps them learn (think distances in k-nearest neighbors and coefficients in regression)
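A toy illustration (made-up numbers, not from the talk) of why scale matters for distance-based models like k-nearest neighbors: on raw values, the large-magnitude feature dominates the Euclidean distance, and a simple rescale puts both features on equal footing.

```python
import numpy as np

# Hypothetical samples: column 0 in metres, column 1 in dollars.
X = np.array([[1.70, 70000.0],
              [1.60, 70000.0],
              [1.70, 71000.0]])

# Raw distances: the dollar feature swamps the height feature.
d_raw_height = np.linalg.norm(X[0] - X[1])  # rows differ only in height: 0.1
d_raw_salary = np.linalg.norm(X[0] - X[2])  # rows differ only in salary: 1000.0

# Rescale each column to [0, 1] (min-max scaling, done by hand here).
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d_scaled_height = np.linalg.norm(X_scaled[0] - X_scaled[1])  # 1.0
d_scaled_salary = np.linalg.norm(X_scaled[0] - X_scaled[2])  # 1.0
```

After scaling, an equal relative difference in either feature contributes equally to the distance, so the nearest neighbors are no longer decided by units of measurement.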
Power in SciKit Learn
Preprocessing
Clustering
Regression
Classification
Dimensionality Reduction
Model Selection
Let’s Look at an ML Recipe
Normalization
The Imports
from sklearn.datasets import load_iris
from sklearn import preprocessing
Separate Features from Target
iris = load_iris()
print(iris.data.shape)
X = iris.data
y = iris.target
Normalize the Features
normalized_X = preprocessing.normalize(X)
Normalization Recipe
# Normalize the data attributes for the Iris dataset.
from sklearn.datasets import load_iris
from sklearn import preprocessing

# load the iris dataset
iris = load_iris()
print(iris.data.shape)

# separate the data from the target attributes
X = iris.data
y = iris.target

# normalize the data attributes
normalized_X = preprocessing.normalize(X)
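A quick sanity check on the recipe's output (my addition, not part of the original slides): after preprocessing.normalize, every sample row of normalized_X should have an L2 norm of 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import preprocessing

# Re-run the recipe and verify each row is a unit vector.
iris = load_iris()
normalized_X = preprocessing.normalize(iris.data)

row_norms = np.linalg.norm(normalized_X, axis=1)
print(row_norms.min(), row_norms.max())  # both should be ~1.0
```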
Resources
Society of Data Scientists
SciKit Learn