machine learning - empatika open

Machine Learning

Bayram Annakov, Empatika Open

Types of ML

Supervised machine learning

Supervised learning

[Figure: scatter plot of two labeled classes, marked x and o]

Classification

Unsupervised learning

Unsupervised learning

[Figure: scatter plot of unlabeled points (o) on axes X1 and X2]

User clustering
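Clustering users without labels can be sketched with k-means. This is a minimal illustration, not the method used in the talk: the six points, the two features (named after the X1 and X2 axes of the figure), and the hand-picked starting centroids are all assumptions.

```python
import numpy as np


def kmeans(points, centroids, n_iter=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = centroids.astype(float).copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return labels, centroids


# Hypothetical users described by two features (X1, X2)
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.0], [8.5, 9.5]])
start = points[[0, 5]]  # assumed starting centroids: first and last user
labels, centroids = kmeans(points, start)
print(labels, centroids)
```

No labels are given to the algorithm; the two groups come only from the distances between points.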

Process

Baby first steps

• GOAL: better purchase conversion from Trial Emails

• Knowledge: internal Empatika Open

• What’s next?

• Load new data & apply

First results - I’m a genius!

KNN: 97% score on test set

… & first disappointment

Too many negatives - unbalanced dataset

Balanced: 64%
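One common way a high score shrinks on an unbalanced dataset: a model that always answers "negative" still scores well on plain accuracy. The sketch below uses hypothetical labels (97 negatives, 3 positives), not the data from the talk, and compares plain accuracy with balanced accuracy (the mean of per-class recall).

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean recall over classes, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(y_pred[i] == c for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical unbalanced test set: 97 negatives (0), 3 positives (1)
y_true = [0] * 97 + [1] * 3
# A "model" that always predicts negative
y_pred = [0] * 100

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

The always-negative model never finds a single positive, which plain accuracy hides and balanced accuracy exposes.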

Better do something: email conversion is 2 times lower

Start from scratch

+

How:

1. Small dataset (own)

• Time

• Value

2. Balanced

3. Don’t hurry

lots of answers

Process

Data (reducing features, size, scaling, … other data stuff)

Model 1, Model 2, ………, Model N

Results 1, Results 2, ………, Results N

Best result

Train model (parameters) ………

Results
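The loop of preparing data, trying several models, and keeping the best result can be sketched with scikit-learn pipelines. Everything specific here is an assumption for illustration: the bundled digits dataset stands in for the real data, and the three candidates and their parameters are arbitrary choices, not the models from the talk.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption): 8x8 digit images flattened to 64 features
X, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each candidate bundles data preparation (scaling, feature reduction) with a model
candidates = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1)),
    "knn+pca": make_pipeline(
        StandardScaler(),
        PCA(n_components=40, whiten=True),
        KNeighborsClassifier(n_neighbors=1),
    ),
    "svm": make_pipeline(StandardScaler(), SVC(gamma="scale")),
}

# Model 1..N -> Results 1..N
results = {}
for name, model in candidates.items():
    model.fit(x_train, y_train)
    results[name] = model.score(x_test, y_test)

best_name = max(results, key=results.get)
print(results, best_name)
```

Putting the preparation steps inside the pipeline means the scaler and PCA are fitted on the training split only, so the test score is not inflated by leaked information.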

Test dataset

New dataset

30% better

7% better

Email conversion 2 times better than the previous model

416 different inputs

Next level

• Features

• Volume

• Understand Model parameters

• Train model harder (24/7)

• Whole picture: not only 1 score, but Precision, Recall, f1-score, etc.
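The "whole picture" metrics can be computed by hand from the counts of true and false positives and negatives. The labels below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    # Precision: of everything predicted positive, how much really is positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything really positive, how much was found
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 4 positives, 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

scikit-learn reports the same numbers per class through `sklearn.metrics.classification_report`.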

Lessons & Knowledge source

• Think about the balance of features (valuable vs. many vs. few)

• Models are sensitive to different data

• Model tuning is important, but it is a long road

• Sources:

• O’REILLY: Introduction to Machine Learning with Python

• scikit-learn.org

• Github

Be patient

Process

Data collection & preparation

Modeling

Training

Evaluation

Data preparation!!!

Image classification

Rhythmic Gymnastics

Approach

• Collect data: simple iPhone app that helps draw and export

• Prepare data: Image = Grid. Each cell = 1 (black) or 0 (white). Convert Grid to Line. Image = 000100011000011100011…

• Train + Analyze until satisfied with the score
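The "Image = Grid, Grid to Line" step can be sketched with NumPy. The 3x3 grayscale image, the threshold of 128, and the function name `image_to_line` are assumptions for illustration; the actual grid size and export format used by the app are not given in the slides.

```python
import numpy as np


def image_to_line(gray, threshold=128):
    """Binarize a grayscale image (dark pixel -> 1, light pixel -> 0) and flatten it."""
    grid = (np.asarray(gray) < threshold).astype(np.uint8)
    return grid.ravel()  # rows are laid end to end into one line


# Hypothetical 3x3 grayscale drawing: 0 is black ink, 255 is white paper
gray = np.array([
    [255, 255,   0],
    [255,   0,   0],
    [  0,   0, 255],
])

line = image_to_line(gray)
text = "".join(str(v) for v in line.tolist())
print(text)  # 001011110
```

Each drawing becomes one fixed-length vector of 0s and 1s, which is the input shape the classifiers expect.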

Prepare data

import skimage
import numpy

Train and Analyze

1. K-neighbors: 78%

from sklearn.neighbors import KNeighborsClassifier

# Classify each image by its single nearest training example
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(x_train, y_train)
# Accuracy on the held-out test set
clf.score(x_test, y_test)

Train and Analyze

2. K-neighbors + PCA: 81%

from sklearn.decomposition import PCA

# Project the pixel vectors onto 40 whitened principal components
pca = PCA(n_components=40, whiten=True)
pca.fit(x_train)
x_train_pca = pca.transform(x_train)
x_test_pca = pca.transform(x_test)

# Repeat KNN on x_train_pca and x_test_pca

Maybe someone has already solved it?

MNIST

http://yann.lecun.com/exdb/mnist/


3. SVM: 90%

from sklearn import svm

classifier = svm.SVC(gamma=0.001)
classifier.fit(x_train, y_train)

predicted = classifier.predict(x_test)

Neural networks?

Neuron

Perceptron
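A single perceptron can be sketched in a few lines: a weighted sum of the inputs, a threshold, and a weight update whenever the output is wrong. The example is an illustration under assumptions: it learns the logical AND function, and the learning rate and epoch count are arbitrary choices.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum of inputs plus bias is positive, else 0."""
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge the weights by the error on each sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND: linearly separable, so a single perceptron can learn it
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
outputs = [predict(weights, bias, x) for x in samples]
print(weights, bias, outputs)
```

A single perceptron can only separate classes with a straight line, which is one motivation for stacking them into layers.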

Multi-layered (deep)

Problems with images

Vectors are too big (200x200x3 = 120,000)

Pixel position matters

Convolution

Pooling (sub-sampling)
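Convolution and pooling can be sketched with plain NumPy loops. This is a teaching illustration, not how a deep learning library implements them: the 6x6 image, the 2x2 vertical-edge kernel, and the function names are assumptions, and the "convolution" is the unflipped sliding-window sum (cross-correlation) that neural network layers typically compute.

```python
import numpy as np


def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(feature_map, size=2):
    """Keep the maximum of each size x size block (sub-sampling)."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]  # drop rows/columns that do not fill a block
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


# Hypothetical 6x6 image: dark left half (0), bright right half (1)
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)
# Kernel that responds to a dark-to-bright step from left to right
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

features = convolve2d(image, kernel)  # 5x5 map, non-zero only at the edge
pooled = max_pool(features)           # 2x2 summary of the map
print(features)
print(pooled)
```

The same small kernel is reused at every position, so the number of weights no longer depends on the image size, and pooling shrinks the map while keeping the strongest responses.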

Object recognition

ImageNet

Face recognition

Eigenfaces

Recommendation systems

Bag-of-words
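A bag-of-words recommender can be sketched as word counts plus cosine similarity. The item descriptions and the helper names below are hypothetical, and real systems usually add weighting such as TF-IDF; this only shows the core idea.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, ignoring word order."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical catalogue of item descriptions
items = {
    "a": "flight booking app for frequent travellers",
    "b": "hotel booking app for travellers",
    "c": "machine learning course with python",
}


def most_similar(item_id):
    """Recommend the other item whose description is closest to this one."""
    query = bag_of_words(items[item_id])
    others = [k for k in items if k != item_id]
    return max(others, key=lambda k: cosine(query, bag_of_words(items[k])))


recommended = most_similar("a")
print(recommended)  # b
```

Items "a" and "b" share four words while "a" and "c" share none, so "b" is recommended to someone looking at "a".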

Data is key

Competitive Advantage?

CPU vs GPU

Opportunities

Better than Google?

Attributes

Proprietary data sets

Domain-specific tasks

Domain-specific knowledge

Useful links

“The Master Algorithm”

Andrew Ng “AI is the new electricity”

fast.ai course

“Introduction to ML with Python”

“Python Machine Learning”

one more thing…

Please donate any sum to any fund

Plans

3 universities in Paris

Crowdfunding

Platform

Not only academics

New Tech

How can you help?

Finances

Introductions

Ideas

Expertise

Media

Tech

even frequent flyer miles :)

Thanks

Lucy Evstratova

+79165884397

Unicore.pro

AlfaBank

4154 8120 0093 9516

Sberbank

4276 3800 1234 3302

