Practical Machine Learning and Rails, Part 1
Description: Part 2: http://www.slideshare.net/ryanstout/practical-machine-learning-and-rails-part2

Transcript:
Practical Machine Learning and Rails
Who are we?
Andrew Cantino, VP Engineering, Mavenlink (@tectonic)
Ryan Stout, Founder, Agile Productions (@ryanstout)
This talk will
- have examples
- introduce machine learning
- make you ML-aware
This talk will not
- cover collaborative filtering, optimization, clustering, advanced statistics, genetic algorithms, classical AI, NLP, ...
- give you a PhD
- implement algorithms
What is Machine Learning?
Many different algorithms that predict data from other data using applied statistics.
"Enhance and rotate 20 degrees"
What data?
The web is data.
APIs
Logs
Databases
Streams
Clicktrails
A/B Tests
Browser versions
User decisions
Reviews
Okay. We have data.
What do we do with it?
We classify it.
Classification
:) OR :(
Classification
• Documents
  o Sort email (Gmail's importance filter)
  o Route questions to the appropriate expert (Aardvark)
  o Categorize reviews (Amazon)
• Users
  o Expertise; interests; pro vs. free; likelihood of paying; expected future karma
• Events
  o Abnormal vs. normal
Algorithms: Decision Tree Learning
Email contains word "viagra"?
├─ yes → Email contains attachment?
│        ├─ yes → P(Spam) = 95%
│        └─ no  → P(Spam) = 70%
└─ no  → Email contains word "Ruby"?
         ├─ yes → P(Spam) = 5%
         └─ no  → P(Spam) = 10%

The branch tests are the features; the leaf probabilities are the labels.
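A tree like the one above is trivial to express in code once it has been learned. Here is a minimal sketch in Ruby, assuming an email arrives as a hash of boolean features; the feature names are made up for illustration, and the probabilities are the ones from the slide, not learned values:

```ruby
# Hand-coded version of the slide's decision tree.
# An email is represented as a hash of boolean features, e.g.
# { contains_viagra: true, has_attachment: false }.
def spam_probability(email)
  if email[:contains_viagra]
    email[:has_attachment] ? 0.95 : 0.70
  else
    email[:contains_ruby] ? 0.05 : 0.10
  end
end

spam_probability({ contains_viagra: true, has_attachment: true })  # => 0.95
spam_probability({ contains_viagra: false, contains_ruby: true })  # => 0.05
```

Decision tree *learning* is the process of inducing these branch tests and leaf probabilities from labeled training data, rather than writing them by hand.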
Algorithms: Support Vector Machines (SVMs)
Graphics from Wikipedia

SVMs learn the separating boundary (hyperplane) that leaves the widest possible margin between the two classes.
Algorithms: Naive Bayes
Graphics from Wikipedia
• Break documents into words and treat each word as an independent feature
• Surprisingly effective on simple text and document classification
• Works well when you have lots of data
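The "break documents into words" step is just bag-of-words feature extraction. A minimal sketch in Ruby (the method name is illustrative):

```ruby
# Turn a document into word-count features for a text classifier.
def word_counts(document)
  document.downcase.scan(/[a-z']+/).tally
end

word_counts("Buy viagra now! Viagra is great")
# => {"buy"=>1, "viagra"=>2, "now"=>1, "is"=>1, "great"=>1}
```

Each word becomes a feature, and Naive Bayes treats the features as independent of one another given the class.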
You received 100 emails, 70 of which were spam.

Word   | Spam emails containing it | Ham emails containing it
-------|---------------------------|-------------------------
viagra | 42 (60%)                  | 1 (3.3%)
ruby   | 7 (10%)                   | 15 (50%)
hello  | 35 (50%)                  | 24 (80%)

A new email contains "hello" and "viagra". Treating the words as independent (the "naive" assumption), the probability that it is spam is:

P(S|hello,viagra) = P(S) * P(hello,viagra|S) / P(hello,viagra)
                  = 0.7 * (0.5 * 0.6) / (0.59 * 0.43)
                  ≈ 82%
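The slide's arithmetic can be checked in a few lines of Ruby, plugging in the counts from the table above:

```ruby
# Bayes' rule for the spam example: 100 emails, 70 spam.
p_spam              = 0.70  # 70 / 100
p_hello_given_spam  = 0.5   # 35 / 70
p_viagra_given_spam = 0.6   # 42 / 70
p_hello             = 0.59  # (35 + 24) / 100
p_viagra            = 0.43  # (42 + 1) / 100

# Naive independence assumption:
#   P(hello, viagra | S) = P(hello|S) * P(viagra|S)
#   P(hello, viagra)     = P(hello)   * P(viagra)
p = p_spam * p_hello_given_spam * p_viagra_given_spam / (p_hello * p_viagra)
# p ≈ 0.828, i.e. about 82%
```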
Algorithms: Neural Nets
Graphics from Wikipedia
Input layer (features)
Hidden layer
Output layer (Classification)
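The three layers named above can be sketched as a toy forward pass in Ruby. The weights below are arbitrary illustrative numbers, not trained values; a real net would learn them from data:

```ruby
# Squashing function applied at each neuron.
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# One layer: each neuron takes a weighted sum of its inputs,
# then applies the sigmoid.
def layer(inputs, weights)
  weights.map do |neuron_weights|
    sigmoid(inputs.zip(neuron_weights).sum { |i, w| i * w })
  end
end

features = [1.0, 0.0]                               # input layer (features)
hidden   = layer(features, [[0.5, -0.5], [1.0, 1.0]]) # hidden layer, 2 neurons
output   = layer(hidden, [[1.5, -1.5]])               # output layer (classification)
```

The single output value lands between 0 and 1 and can be read as a class score; training consists of adjusting the weight arrays so that score matches the labels.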
Curse of Dimensionality
http://www.iro.umontreal.ca/~bengioy/yoshua_en/research_files/CurseDimensionality.jpg
The more features and labels you have, the more data you need.
Overfitting
• With enough parameters, anything is possible.
• We want our algorithms to generalize and infer, not memorize specific training examples.
• Therefore, we test our algorithms on different data than we train them on.