machine learning rum - velocity 2016

Using machine learning to determine drivers of bounce and conversion 2016 Velocity Santa Clara

Upload: patrick-meenan

Post on 17-Feb-2017

TRANSCRIPT

Page 1: Machine Learning RUM - Velocity 2016

Using machine learning to determine drivers of bounce and conversion

2016 Velocity Santa Clara

Page 2: Machine Learning RUM - Velocity 2016

Pat Meenan (@patmeenan)

Tammy Everts (@tameverts)

Page 3: Machine Learning RUM - Velocity 2016

What we did

Page 4: Machine Learning RUM - Velocity 2016

Get the code

https://github.com/WPO-Foundation/beacon-ml

Page 5: Machine Learning RUM - Velocity 2016

Deep Learning

Weights

Page 6: Machine Learning RUM - Velocity 2016

Random Forest

Lots of random decision trees

Page 7: Machine Learning RUM - Velocity 2016

Vectorizing the data

• Everything needs to be numeric
• Strings converted to several inputs as yes/no (1/0)
• e.g. Device Manufacturer
  – “Apple” would be a discrete input
• Watch out for input explosion (UA String)
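The string-to-yes/no conversion described on this slide is commonly called one-hot encoding. The following is a minimal sketch of the idea, not the code from the beacon-ml repository; the field names (`load_time_ms`, `manufacturer`) and the helper functions are hypothetical and chosen only for illustration.

```python
def one_hot_columns(rows, field):
    """Return the sorted distinct values of `field`, one output column per value."""
    return sorted({row[field] for row in rows})


def vectorize(rows, numeric_fields, categorical_fields):
    """Turn dict records into purely numeric rows plus matching column names."""
    categories = {f: one_hot_columns(rows, f) for f in categorical_fields}
    names = list(numeric_fields)
    for f in categorical_fields:
        names += [f"{f}={value}" for value in categories[f]]
    vectors = []
    for row in rows:
        vec = [float(row[f]) for f in numeric_fields]
        for f in categorical_fields:
            # One 1/0 input per distinct string value.
            vec += [1.0 if row[f] == value else 0.0 for value in categories[f]]
        vectors.append(vec)
    return names, vectors


# Hypothetical beacon records; the field names are made up for illustration.
beacons = [
    {"load_time_ms": 1200, "manufacturer": "Apple"},
    {"load_time_ms": 3400, "manufacturer": "Samsung"},
    {"load_time_ms": 2100, "manufacturer": "Apple"},
]
names, vectors = vectorize(beacons, ["load_time_ms"], ["manufacturer"])
```

Each distinct string adds one column, which is why a high-cardinality field such as a raw UA string can blow up the input count.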

Page 8: Machine Learning RUM - Velocity 2016

Balancing the data

• 3% Conversion Rate
• 97% Accurate by always guessing no
• Subsample the data for 50/50 mix
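A rough sketch of the subsampling step, assuming binary labels where 1 means "converted". The `balance` helper and the synthetic data are hypothetical, not taken from the talk's code; the baseline figure reproduces the 97% arithmetic from the slide.

```python
import random


def balance(rows, labels, seed=0):
    """Subsample the majority class so both classes have equal counts."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(positives), len(negatives))
    keep = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(keep)  # avoid all positives appearing first
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Synthetic sessions: 3 conversions out of 100.
labels = [1] * 3 + [0] * 97
rows = [[float(i)] for i in range(100)]

# Always predicting "no conversion" is right on every negative row.
baseline_accuracy = labels.count(0) / len(labels)

balanced_rows, balanced_labels = balance(rows, labels)
```

Subsampling throws away most of the majority class, so it fits best when there is plenty of data to begin with.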

Page 9: Machine Learning RUM - Velocity 2016

Validation Data

• Train on 80% of the data
• Validate on 20% to prevent overfitting
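The 80/20 split can be sketched with a shuffled index list. This is an illustrative stand-in (the helper name and toy data are assumptions); scikit-learn's `train_test_split` does the same job in practice.

```python
import random


def train_val_split(rows, labels, val_fraction=0.2, seed=0):
    """Shuffle once, then hold out `val_fraction` of the rows for validation."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(indices) * val_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    x_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_val = [rows[i] for i in val_idx]
    y_val = [labels[i] for i in val_idx]
    return x_train, y_train, x_val, y_val


# Toy data: ten rows with alternating labels.
rows = [[i] for i in range(10)]
labels = [i % 2 for i in range(10)]
x_train, y_train, x_val, y_val = train_val_split(rows, labels)
```

A model that scores much better on the training rows than on the held-out rows is likely memorizing rather than generalizing.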

Page 10: Machine Learning RUM - Velocity 2016

Smoothing the data

• ML works best on normally distributed data

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)
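To make the scaling step concrete, here is a small standard-library sketch of what `StandardScaler` computes: each column is shifted to zero mean and divided by its population standard deviation. It rescales the data but does not change the shape of its distribution. The helper names are hypothetical, and the statistics are fitted on the training rows only and then reused for the validation rows, mirroring `fit_transform` followed by `transform`.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Constant columns get a scale of 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply previously fitted statistics to any set of rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


x_train = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
x_val = [[4.0, 10.0]]
means, stds = fit_scaler(x_train)
x_train_scaled = transform(x_train, means, stds)
x_val_scaled = transform(x_val, means, stds)
```

Fitting on the training rows alone keeps information about the validation rows out of the model's inputs.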

Page 11: Machine Learning RUM - Velocity 2016

Input/Output Relationships

• SSL highly correlated with Conversions
• Long sessions highly correlated with not bouncing
• Remove correlated features from training
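One possible way to spot such features is to measure the Pearson correlation between each input column and the label and flag anything above a cutoff. This is only a sketch: the slide does not say how the correlated features were found, and the 0.9 threshold, helper names, and toy data are all assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def leaky_features(names, rows, labels, threshold=0.9):
    """Return names of columns whose |correlation| with the label meets the threshold."""
    columns = list(zip(*rows))
    return [name for name, col in zip(names, columns)
            if abs(pearson(col, labels)) >= threshold]


# Toy data: is_ssl mirrors the label exactly, load_time_s does not.
names = ["is_ssl", "load_time_s"]
rows = [[1, 2.0], [1, 3.5], [0, 2.5], [0, 3.0]]
converted = [1, 1, 0, 0]
to_drop = leaky_features(names, rows, converted)
```

Flagged columns are candidates for removal; whether a given feature is a genuine driver or just a proxy for the outcome still needs a human judgment.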

Page 12: Machine Learning RUM - Velocity 2016

Training Deep Learning

model = Sequential()
model.add(...)
model.compile(optimizer='adagrad',
              loss='binary_crossentropy',
              metrics=["accuracy"])
model.fit(x_train, y_train,
          nb_epoch=EPOCH_COUNT,
          batch_size=32,
          validation_data=(x_val, y_val),
          verbose=2,
          shuffle=True)

Page 13: Machine Learning RUM - Velocity 2016

Training Random Forest

clf = RandomForestClassifier(n_estimators=FOREST_SIZE,
                             criterion='gini',
                             max_depth=None,
                             min_samples_split=2,
                             min_samples_leaf=1,
                             min_weight_fraction_leaf=0.0,
                             max_features='auto',
                             max_leaf_nodes=None,
                             bootstrap=True,
                             oob_score=False,
                             n_jobs=12,
                             random_state=None,
                             verbose=2,
                             warm_start=False,
                             class_weight=None)
clf.fit(x_train, y_train)

Page 14: Machine Learning RUM - Velocity 2016

Feature Importances

clf.feature_importances_
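`clf.feature_importances_` is an array with one value per input column, in the same order as the columns of `x_train`. A small sketch of pairing those values with column names and ranking them follows; the `rank_features` helper, the feature names, and the numbers are stand-ins, not results from the talk.

```python
def rank_features(names, importances, top=10):
    """Pair each column name with its importance and sort, highest first."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:top]


# Stand-in values; in practice pass clf.feature_importances_ from a fitted forest.
names = ["dom_ready_ms", "num_scripts", "manufacturer=Apple"]
importances = [0.2, 0.5, 0.3]
top_two = rank_features(names, importances, top=2)
```

The ranking depends on the column names being kept in the same order used when the training matrix was built.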

Page 15: Machine Learning RUM - Velocity 2016

What we learned

Page 16: Machine Learning RUM - Velocity 2016

Takeaways

Page 17: Machine Learning RUM - Velocity 2016
Page 18: Machine Learning RUM - Velocity 2016

Thanks!