practical machine learning and rails part1

Practical Machine Learning and Rails

Upload: ryanstout

Post on 02-Dec-2014

DESCRIPTION

Part 2: http://www.slideshare.net/ryanstout/practical-machine-learning-and-rails-part2

TRANSCRIPT

Page 1: Practical Machine Learning and Rails Part1

Practical Machine Learning and Rails

Page 2: Practical Machine Learning and Rails Part1

Who are we?

Andrew Cantino
VP Engineering, Mavenlink @tectonic

Ryan Stout
Founder, Agile Productions @ryanstout

Page 3: Practical Machine Learning and Rails Part1

This talk will

- have examples

- introduce machine learning

- make you ML-aware

Page 4: Practical Machine Learning and Rails Part1

This talk will not

- cover collaborative filtering, optimization, clustering, advanced statistics, genetic algorithms, classical AI, NLP, ...

- give you a PhD

- implement algorithms

Page 5: Practical Machine Learning and Rails Part1

What is Machine Learning?

Many different algorithms

that predict data

from other data

using applied statistics.

Page 6: Practical Machine Learning and Rails Part1

"Enhance and rotate 20 degrees"

Page 7: Practical Machine Learning and Rails Part1

What data?

The web is data.

APIs

Logs

Databases

Streams

Clicktrails

A/B Tests

Browser versions

User decisions

Reviews

Page 8: Practical Machine Learning and Rails Part1

Okay. We have data.

What do we do with it?

We classify it.

Page 9: Practical Machine Learning and Rails Part1

Classification

Page 10: Practical Machine Learning and Rails Part1

Classification

OR

Page 11: Practical Machine Learning and Rails Part1

Classification

:) OR :(

Page 12: Practical Machine Learning and Rails Part1

Classification

• Documents
  o Sort email (Gmail's importance filter)
  o Route questions to appropriate expert (Aardvark)
  o Categorize reviews (Amazon)

• Users
  o Expertise; interests; pro vs free; likelihood of paying; expected future karma

• Events
  o Abnormal vs. normal

Page 13: Practical Machine Learning and Rails Part1

Algorithms: Decision Tree Learning

Page 14: Practical Machine Learning and Rails Part1

Algorithms: Decision Tree Learning

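[Diagram: a decision tree for spam filtering. The internal nodes test features, each with yes/no branches: Email contains word "viagra"; Email contains attachment?; Email contains word "Ruby". The leaves carry the labels: P(Spam)=5%, P(Spam)=10%, P(Spam)=70%, P(Spam)=95%.]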
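A learned decision tree is, in the end, a set of nested yes/no questions. The sketch below hand-writes one such tree using the three features from the slide. How the four leaf probabilities attach to the branches is an assumption (the transcript does not preserve the layout), the dictionary keys are hypothetical names, and the learning step that would choose the splits from data is not shown.

```python
def spam_probability(email: dict) -> float:
    """Walk a small hand-written decision tree and return P(spam) at the leaf.

    The features tested at each node come from the slide; the assignment of
    the four leaf probabilities to branches is an assumption.
    """
    if email["contains_viagra"]:
        # Assumed branch: "viagra" emails are split again on the word "Ruby".
        if email["contains_ruby"]:
            return 0.70
        return 0.95
    # Assumed branch: all other emails are split on having an attachment.
    if email["has_attachment"]:
        return 0.10
    return 0.05


example = {"contains_viagra": True, "contains_ruby": False, "has_attachment": False}
print(spam_probability(example))
```

Each path from the root to a leaf reads as a rule, which is why decision trees are comparatively easy to inspect.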

Page 15: Practical Machine Learning and Rails Part1

Algorithms: Support Vector Machines (SVMs)

Graphics from Wikipedia

Page 16: Practical Machine Learning and Rails Part1

Algorithms: Support Vector Machines (SVMs)

Graphics from Wikipedia

Page 17: Practical Machine Learning and Rails Part1

Algorithms: Naive Bayes

Graphics from Wikipedia

• Break documents into words and treat each word as an independent feature

• Surprisingly effective on simple text and document classification

• Works well when you have lots of data
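Breaking a document into words can be sketched in a few lines. This is a minimal illustration, not the speakers' code: the function name `word_features` and the simple tokenizing regex are assumptions, and real tokenizers handle punctuation and Unicode more carefully.

```python
import re
from collections import Counter


def word_features(text: str) -> Counter:
    """Lowercase the text, split it into words, and count each word.

    Word order is discarded, which matches treating every word as an
    independent feature.
    """
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


print(word_features("Hello, hello! Buy Viagra now"))
```

The resulting counts (or just the set of words present) are the features a Naive Bayes classifier would consume.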

Page 18: Practical Machine Learning and Rails Part1

Algorithms: Naive Bayes

Graphics from Wikipedia

You received 100 emails, 70 of which were spam.

Word     Spam with this word   Ham with this word
viagra   42 (60%)              1 (3.3%)
ruby     7 (10%)               15 (50%)
hello    35 (50%)              24 (80%)

A new email contains hello and viagra. The probability that it is spam is:

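P(S|hello,viagra) = P(S) * P(hello,viagra|S) / P(hello,viagra)
                  ≈ 0.7 * (0.5 * 0.6) / (0.59 * 0.43)
                  ≈ 83%

(The words are treated as independent, so P(hello,viagra|S) = P(hello|S) * P(viagra|S) and P(hello,viagra) ≈ P(hello) * P(viagra).)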
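The arithmetic on this slide can be reproduced from the table's counts. The sketch below is an illustration under assumptions: the function and constant names are hypothetical, and `normalized_estimate` is a variant that is not on the slide. The slide approximates the denominator P(hello,viagra) as P(hello) * P(viagra); the variant instead scores spam and ham under the same independence assumption and normalizes, so the two probabilities sum to one.

```python
# Counts from the slide: 100 emails, 70 spam and 30 ham.
TOTAL_SPAM, TOTAL_HAM = 70, 30
WORD_COUNTS = {  # word: (spam emails with the word, ham emails with the word)
    "viagra": (42, 1),
    "ruby": (7, 15),
    "hello": (35, 24),
}


def slide_estimate(words):
    """P(S) * product of P(w|S), divided by the product of P(w), as on the slide."""
    total = TOTAL_SPAM + TOTAL_HAM
    numerator = TOTAL_SPAM / total
    denominator = 1.0
    for w in words:
        spam_n, ham_n = WORD_COUNTS[w]
        numerator *= spam_n / TOTAL_SPAM
        denominator *= (spam_n + ham_n) / total
    return numerator / denominator


def normalized_estimate(words):
    """Variant (not on the slide): score both classes, then normalize."""
    total = TOTAL_SPAM + TOTAL_HAM
    spam_score = TOTAL_SPAM / total
    ham_score = TOTAL_HAM / total
    for w in words:
        spam_n, ham_n = WORD_COUNTS[w]
        spam_score *= spam_n / TOTAL_SPAM
        ham_score *= ham_n / TOTAL_HAM
    return spam_score / (spam_score + ham_score)


print(round(slide_estimate(["hello", "viagra"]), 3))
print(round(normalized_estimate(["hello", "viagra"]), 3))
```

With the slide's counts the first value is about 0.83 and the second about 0.96. Both agree that the email is very likely spam; they differ only in how the denominator is computed.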

Page 19: Practical Machine Learning and Rails Part1

Algorithms: Neural Nets

Graphics from Wikipedia

Input layer (features)

Hidden layer

Output layer (Classification)
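The three layers can be illustrated with a forward pass through a tiny network. This is only a sketch: the weights are hand-picked, hypothetical values (a real network learns them from data), the sigmoid activation is an assumed choice, and training is not shown.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per unit, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(features, hidden_w, hidden_b, output_w, output_b):
    """Input layer (features) -> hidden layer -> output layer (classification)."""
    hidden = layer(features, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical inputs: contains "viagra", contains "ruby", has attachment.
features = [1.0, 0.0, 1.0]
hidden_w = [[2.0, -1.5, 0.5], [-1.0, 2.0, 0.0]]  # two hidden units
hidden_b = [0.0, 0.0]
output_w = [[1.5, -1.5]]  # one output unit, read as a spam score
output_b = [0.0]
print(forward(features, hidden_w, hidden_b, output_w, output_b))
```

The single output lies between 0 and 1 and can be thresholded to obtain a class.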

Page 20: Practical Machine Learning and Rails Part1

Curse of Dimensionality

http://www.iro.umontreal.ca/~bengioy/yoshua_en/research_files/CurseDimensionality.jpg

The more features and labels you have, the more data you need.
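One way to see this: with k yes/no features there are 2^k possible feature combinations, so a fixed number of examples covers a rapidly shrinking share of them. The sketch below assumes binary features and a hypothetical dataset of 10,000 examples; the function name is illustrative.

```python
def fraction_of_combinations_seen(num_binary_features: int, num_examples: int) -> float:
    """Upper bound on the share of distinct feature combinations the data can cover."""
    combinations = 2 ** num_binary_features
    return min(1.0, num_examples / combinations)


for k in (5, 10, 20, 30):
    print(k, fraction_of_combinations_seen(k, num_examples=10_000))
```

With 5 or 10 features the data could in principle cover every combination; at 20 features it can cover less than 1% of them, and the share keeps halving with each added feature.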

Page 21: Practical Machine Learning and Rails Part1

Overfitting

• With enough parameters, anything is possible.

• We want our algorithms to generalize and infer, not memorize specific training examples.

• Therefore, we test our algorithms on different data than we train them on.
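The gap between memorizing and generalizing shows up as soon as some data is held out. The sketch below uses a deliberately extreme "model" that stores every training example verbatim. The dataset, the function names, and the default label are all hypothetical.

```python
def train_memorizer(examples):
    """'Learn' by storing every training example verbatim."""
    return dict(examples)


def memorizer_predict(model, text, default="ham"):
    """Only exact matches are recognised; anything unseen gets the default label."""
    return model.get(text, default)


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for text, label in examples if memorizer_predict(model, text) == label)
    return correct / len(examples)


# Tiny hypothetical dataset of (text, label) pairs.
data = [
    ("cheap viagra now", "spam"),
    ("ruby meetup tonight", "ham"),
    ("viagra discount", "spam"),
    ("rails upgrade notes", "ham"),
    ("buy viagra online", "spam"),
    ("lunch tomorrow?", "ham"),
    ("viagra viagra viagra", "spam"),
    ("ruby conference talk", "ham"),
]
train, test = data[:6], data[6:]  # hold out the last two examples for testing

model = train_memorizer(train)
print("train accuracy:", accuracy(model, train))
print("test accuracy:", accuracy(model, test))
```

The memorizer is perfect on the data it was trained on and no better than always guessing "ham" on the held-out examples, which is why accuracy measured on training data says little about how a model will behave on new input.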