Machine Learning
Bayram Annakov, Empatika Open
Types of ML
Supervised machine learning
[Figure: points of two labeled classes, x and o]
Classification
Unsupervised learning
[Figure: unlabeled points (o) plotted on axes X1 and X2]
Users clustering
Process
Baby first steps
• GOAL: better purchase conversion from Trial Emails
• Knowledge: internal Empatika Open
• What's next?
• Load new data & apply
First results - I’m a genius!
KNN: 97% score on test set
… & first disappointment
Too many negatives: unbalanced dataset
Balanced: 64%
Better do something: email conversion is 2 times lower
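The gap between 97% and 64% is typical of an unbalanced dataset. A minimal sketch of the effect, on made-up data: a "model" that always answers "no purchase" still gets a high score, and one common fix is to undersample the majority class. The slides do not say how the dataset was balanced, so `undersample` and the 97/3 split below are assumptions for illustration only.

```python
import random

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Hypothetical data: 97 non-buyers (0) and 3 buyers (1).
labels = [0] * 97 + [1] * 3
rows = [[i] for i in range(100)]

# Always predicting "no purchase" looks accurate but finds no buyers at all.
always_negative = [0] * len(labels)
baseline_accuracy = accuracy(labels, always_negative)

# After undersampling, both classes have the same number of rows.
balanced_rows, balanced_labels = undersample(rows, labels)
```

On a balanced set the always-negative baseline drops to 50%, so a score measured there says more about what the model has actually learned.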
Start from scratch
+
How:
1. Small dataset (own)
• Time • Value
2. Balanced
3. Don’t hurry
lots of answers
Process
Data
Model 1, Model 2, … Model N
Results 1, Results 2, … Results N
Reducing features size
Scaling
… other data stuff
Best result
Train model (parameters) …
Results
Test dataset
New dataset
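The loop in this process (prepare the data, try several models, keep the best, then check it on held-out data) could be sketched with scikit-learn roughly as follows. The data is synthetic (`make_classification`), and the three candidate models and their parameters are assumptions; the slides do not list which models were actually compared.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the real trial-email data is not public.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each candidate scales the features first, then fits a different model.
candidates = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# Score every candidate with 5-fold cross-validation on the training part only.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# Keep the best candidate, retrain it, and check it once on the held-out test set.
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_score = best_model.score(X_test, y_test)
```

Keeping the test set out of model selection is what makes the final score comparable to the score on a genuinely new dataset.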
30% better
7% better
Email conversion 2 times better vs. previous model
416 different inputs
Next level
• Features
• Volume
• Understand Model parameters
• Train model harder (24/7)
• Whole picture: not only 1 score, but Precision, Recall, f1-score, etc.
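For the "whole picture" point, a small sketch of how precision, recall and f1-score are computed from predictions. The labels below are invented for illustration; in practice `sklearn.metrics.classification_report` reports the same quantities.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 buyers (1) and 6 non-buyers (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here accuracy is 70%, yet the model finds only half of the buyers (recall 0.5), which a single score would hide.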
Lessons & Knowledge source
• Think about the balance of features (valuable vs. many vs. few)
• Models are sensitive to different data
• Model tuning is important, but it is a long road
• Sources:
• O’REILLY: Introduction to Machine Learning with Python
• scikit-learn.org
• Github
Be patient
Process
Data collection & preparation
Modeling
Training
Evaluation
Data preparation!!!
Tasks
Image classification
Rhythmic Gymnastics
Approach
• Collect data: simple iPhone app that helps draw and export
• Prepare data: Image = Grid. Each cell = 1 (black) or 0 (white). Convert Grid to Line. Image = 000100011000011100011…
• Train + Analyze until satisfied with the score
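The "Grid to Line" step can be sketched in plain Python. The `binarize` threshold of 128 and the tiny 3x4 grid are assumptions for illustration; the slides do not show the actual image size or threshold.

```python
def binarize(pixels, threshold=128):
    """Map grayscale pixels (0-255) to cells: 1 for dark (black), 0 for light (white)."""
    return [[1 if value < threshold else 0 for value in row] for row in pixels]

def grid_to_line(grid):
    """Flatten a 2D grid of 0/1 cells row by row into one flat list."""
    return [cell for row in grid for cell in row]

def line_to_string(line):
    """Render the flat list the way the slide writes it, e.g. 0001..."""
    return "".join(str(cell) for cell in line)

# Hypothetical 3x4 drawing.
grid = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
line = grid_to_line(grid)
text = line_to_string(line)

# Hypothetical grayscale patch: dark pixels become 1, light pixels become 0.
cells = binarize([[0, 255], [200, 10]])
```

Each flattened line becomes one row of `x_train`, so every image must use the same grid size.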
Prepare data
import skimage
import numpy
Train and Analyze
1. K-neighbors: 78%
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(x_train, y_train)
clf.score(x_test, y_test)
Train and Analyze
2. K-neighbors + PCA: 81%
from sklearn.decomposition import PCA
pca = PCA(n_components=40, whiten=True)
pca.fit(x_train)
x_train_pca = pca.transform(x_train)
x_test_pca = pca.transform(x_test)
# repeat KNN on x_train_pca and x_test_pca
Maybe someone has already solved it?
3. SVM: 90%
from sklearn import svm
classifier = svm.SVC(gamma=0.001)
classifier.fit(x_train, y_train)
predicted = classifier.predict(x_test)
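The three snippets above assume `x_train`, `y_train`, `x_test` and `y_test` already exist. A self-contained sketch of the same comparison, using scikit-learn's bundled digits dataset as a stand-in (an assumption, since the gymnastics drawings are not available), might look like this. Scores on digits will differ from the 78%, 81% and 90% reported on the slides.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digit images, already flattened to 64 values each.
digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, stratify=digits.target, random_state=0
)

# Same three setups as on the slides, with the same parameters.
models = {
    "knn": KNeighborsClassifier(n_neighbors=1),
    "pca+knn": make_pipeline(
        PCA(n_components=40, whiten=True), KNeighborsClassifier(n_neighbors=1)
    ),
    "svm": SVC(gamma=0.001),
}

# Fit each model on the training split and record its accuracy on the test split.
scores = {
    name: model.fit(x_train, y_train).score(x_test, y_test)
    for name, model in models.items()
}
```

Wrapping PCA and KNN in one pipeline keeps the projection fitted on training data only, which matches the `pca.fit(x_train)` step on the slide.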
Neural networks?
Neuron
Perceptron
Multi-layered (deep)
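As a rough illustration of the neuron and perceptron named above: a single neuron computes a weighted sum of its inputs plus a bias and fires if the sum is positive, and the perceptron rule nudges the weights after each mistake. The AND task, learning rate and epoch count are assumptions chosen to keep the example small.

```python
def predict(weights, bias, inputs):
    """One neuron: weighted sum of inputs plus bias, then a step activation."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train_perceptron(samples, targets, epochs=20, lr=1.0):
    """Perceptron learning rule: shift weights toward the target after each error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, targets):
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Logical AND is linearly separable, so a single perceptron can learn it.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, targets)
outputs = [predict(weights, bias, s) for s in samples]
```

A single perceptron only separates classes with a straight line; stacking layers of such units is what makes a network "multi-layered".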
Problems with images
Vectors are too big (200x200x3 = 120,000)
Pixel position matters
Convolution
Pooling (sub-sampling)
CNN
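Convolution and pooling can be sketched without any library. The 5x5 image and the 2x2 edge kernel below are made up for illustration; a real CNN learns its kernels during training rather than using hand-written ones.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j] for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Hypothetical image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written kernel that responds where values increase from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)  # 4x4 map, non-zero only at the edge
pooled = max_pool(feature_map)           # 2x2 summary of the 4x4 map

# Size of a flattened 200x200 RGB image, as on the slide.
flattened_size = 200 * 200 * 3
```

The same small kernel is reused at every position, which is why a convolutional layer needs far fewer weights than a fully connected layer over 120,000 inputs, and why it still responds when the edge moves.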
Object recognition
ImageNet
Face recognition
Eigenfaces
LFW
Recommendation systems
NLP
Bag-of-words
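A minimal bag-of-words sketch: each text becomes a vector of word counts over a shared vocabulary, ignoring word order. The two example texts and the whitespace tokenizer are assumptions; scikit-learn's `CountVectorizer` does the same job with more options.

```python
from collections import Counter

def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def bag_of_words(documents):
    """Return the sorted vocabulary and one count vector per document."""
    vocabulary = sorted({word for doc in documents for word in tokenize(doc)})
    vectors = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

# Hypothetical email subject lines.
documents = ["free trial ends today", "trial offer free free"]
vocabulary, vectors = bag_of_words(documents)
```

The resulting vectors can be fed to the same classifiers used for the image task, since both are just fixed-length rows of numbers.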
Data is key
Competitive Advantage?
Costs
Why?
CPU vs GPU
Opportunities
Better than Google?
Attributes
Proprietary data sets
Domain-specific tasks
Domain-specific knowledge
So,
Useful links
“The Master Algorithm”
Andrew Ng: “AI is the new electricity”
fast.ai course
“Introduction to ML with Python”
“Python Machine Learning”
one more thing…
Please donate any sum to any fund
Plans
3 universities in Paris
Crowdfunding
Platform
Not only academics
New Tech
How can you help?
Finances
Introductions
Ideas
Expertise
Media
Tech
even frequent flyer miles :)
Thanks
Lucy Evstratova
+79165884397
Unicore.pro
AlfaBank
4154 8120 0093 9516
Sberbank
4276 3800 1234 3302