Machine Learning RUM - Velocity 2016
TRANSCRIPT
Using machine learning to determine drivers of bounce and conversion
2016 Velocity Santa Clara
Pat Meenan (@patmeenan)
Tammy Everts (@tameverts)
What we did
Get the code
https://github.com/WPO-Foundation/beacon-ml
Deep Learning
Weights
Random Forest
• Lots of random decision trees
Vectorizing the data
• Everything needs to be numeric
• Strings converted to several inputs as yes/no (1/0)
• e.g. Device Manufacturer – “Apple” would be a discrete input
• Watch out for input explosion (UA String)
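The string-to-yes/no expansion above is one-hot encoding. A minimal sketch, assuming pandas and a hypothetical beacon frame with a `manufacturer` column (the column name and sample values are illustrative, not from the talk's dataset):

```python
import pandas as pd

# Hypothetical RUM beacon rows; "manufacturer" is a string feature.
df = pd.DataFrame({
    "manufacturer": ["Apple", "Samsung", "Apple", "LG"],
    "load_time_ms": [1200, 3400, 900, 2100],
})

# Expand the string column into one 1/0 input per distinct value,
# so "Apple" becomes its own discrete yes/no feature.
vectorized = pd.get_dummies(df, columns=["manufacturer"])
```

With a column like the raw UA string this explodes into thousands of near-unique inputs, which is the "input explosion" the slide warns about.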
Balancing the data
• 3% conversion rate
• 97% accurate by always guessing “no”
• Subsample the data for a 50/50 mix
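A sketch of the 50/50 subsampling step, using NumPy and synthetic labels at roughly the 3% conversion rate mentioned above (the variable names and data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced labels: ~3% conversions (1), ~97% non-conversions (0).
y = (rng.random(10000) < 0.03).astype(int)
x = rng.random((10000, 5))

pos = np.where(y == 1)[0]
neg = np.where(y == 0)[0]

# Downsample the majority class to match the minority count for a 50/50 mix.
neg_sample = rng.choice(neg, size=len(pos), replace=False)
keep = np.concatenate([pos, neg_sample])
rng.shuffle(keep)

x_bal, y_bal = x[keep], y[keep]
```

After this, "always guess no" scores only 50%, so accuracy on the balanced set actually reflects what the model learned.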
Validation Data
• Train on 80% of the data
• Validate on 20% to prevent overfitting
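The 80/20 split can be sketched with scikit-learn's `train_test_split` (the feature matrix and labels here are placeholder data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix (50 rows) and binary labels.
x = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# Hold out 20% of the rows for validation; train on the remaining 80%.
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.2, random_state=42)
```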
Smoothing the data
• ML works best on normally distributed data

from sklearn.preprocessing import StandardScaler

# Fit the scaler on training data only, then reuse it for validation
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)
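A tiny self-contained version of the scaling step above, with made-up numbers so the fit/transform split is visible (the slide's `x_train`/`x_val` come from the real beacon data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical split: four training rows, one validation row.
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
x_val = np.array([[2.5]])

# fit_transform learns mean/std from training data only;
# transform applies that same scaling to validation rows.
scaler = StandardScaler()
x_train_s = scaler.fit_transform(x_train)
x_val_s = scaler.transform(x_val)
```

Fitting on the validation rows too would leak their statistics into training, which is why the slide calls `transform`, not `fit_transform`, on `x_val`.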
Input/Output Relationships
• SSL highly correlated with conversions
• Long sessions highly correlated with not bouncing
• Remove correlated features from training
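One way to sketch that removal step: compute each feature's correlation with the label and drop the near-perfect ones before training. This is an illustration with synthetic data, not the talk's actual filtering code; the column names are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical training frame: "session_length" is constructed to track
# the label almost perfectly, like session length vs. bouncing.
bounced = rng.integers(0, 2, 200)
df = pd.DataFrame({
    "session_length": bounced + 0.1 * rng.random(200),
    "load_time_ms": rng.random(200),
    "bounced": bounced,
})

# Flag features whose absolute correlation with the label is very high
# and drop them before training, since they leak the answer.
corr = df.corr()["bounced"].drop("bounced").abs()
to_drop = corr[corr > 0.8].index.tolist()
x = df.drop(columns=to_drop + ["bounced"])
```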
Training Deep Learning

model = Sequential()
model.add(...)
model.compile(optimizer='adagrad',
              loss='binary_crossentropy',
              metrics=["accuracy"])
model.fit(x_train, y_train,
          nb_epoch=EPOCH_COUNT,
          batch_size=32,
          validation_data=(x_val, y_val),
          verbose=2,
          shuffle=True)
Training Random Forest

clf = RandomForestClassifier(n_estimators=FOREST_SIZE,
                             criterion='gini',
                             max_depth=None,
                             min_samples_split=2,
                             min_samples_leaf=1,
                             min_weight_fraction_leaf=0.0,
                             max_features='auto',
                             max_leaf_nodes=None,
                             bootstrap=True,
                             oob_score=False,
                             n_jobs=12,
                             random_state=None,
                             verbose=2,
                             warm_start=False,
                             class_weight=None)
clf.fit(x_train, y_train)
Feature Importances
clf.feature_importances_
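A runnable end-to-end sketch of the forest-plus-importances approach on synthetic data (the feature names, sizes, and hyperparameters here are illustrative, not the talk's):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical data: the first column actually drives the label,
# the second is pure noise, so the first should rank far higher.
x = rng.random((500, 2))
y = (x[:, 0] > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(x, y)

# feature_importances_ sums to 1.0; higher values mean the feature
# accounted for more of the impurity reduction across the trees.
importances = clf.feature_importances_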
What we learned
Takeaways
Thanks!