machine learning on azure - azureconf
DESCRIPTION
Machine learning can be a daunting subject to tackle, much less use in a meaningful way. In this session, attendees will learn how to take their existing data, shape it, and create models that can automatically make principled business decisions directly in their applications. The discussion will include explanations of the data acquisition and shaping process. Additionally, attendees will learn the basics of machine learning, primarily the supervised learning problem.
TRANSCRIPT
Machine Learning on Azure
Seth Juarez
Analytics Program Manager, DevExpress
@sethjuarez
Questions? #azureconf on Twitter
Agenda
1) data science
2) prediction
3) process
4) models
5) AzureML
data science
• key word: “science”
• try stuff
• it (might not | won’t) work the first time
question
• this might work…
research
• wikipedia time
hypothesis
• I have an idea
experiment
• try it out
analysis
• did this even work?
conclusion
• time for a better idea
machine learning
• finding (and exploiting) patterns in data
• replacing “human writing code” with “human supplying data”
• system figures out what the person wants based on examples
• need to abstract from “training” examples to “test” examples
• most central issue in ML: generalization
machine learning
• split into two (ish) areas
• supervised learning
  • predicting the future
  • learn from past examples to predict future
• unsupervised learning
  • understanding the past
  • making sense of data
  • learning structure of data
  • compressing data for consumption
neat applications
• spam catchers
• ocr (optical character recognition)
• natural language processing
• machine translation
• biology
• medicine
• robotics (autonomous systems)
• etc…
prediction
making decisions
making decisions
• what kinds of decisions are we making?
• binary classification
  • yes/no, 1/0, male/female
• multi-class classification
  • {A, B, C, D, F} (Grade), {1, 2, 3, 4} (Class), {teacher, student, secretary}
• regression
  • number between 0 and 100, real value
process
1) data
2) clean / transform / maths
3) model
4) predict
data
Class    Outlook   Temp.  Windy
Play     Sunny     Low    Yes
No Play  Sunny     High   Yes
No Play  Sunny     High   No
Play     Overcast  Low    Yes
Play     Overcast  High   No
Play     Overcast  Low    No
No Play  Rainy     Low    Yes
Play     Rainy     Low    No
?        Sunny     Low    No
label (y): play / no play
features: outlook, temp, windy
values (x): [Sunny, Low, Yes]
Labeled dataset is a collection of (X, Y) pairs.
Given a new x, how do we predict y?
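As a rough sketch, the labeled dataset can be held in code as a list of (x, y) pairs. The names `dataset` and `query` are illustrative choices, not something from the talk.

```python
# Each labeled example is an (x, y) pair: x holds the feature values
# (outlook, temp, windy) and y is the label (play / no play).
dataset = [
    (("Sunny", "Low", "Yes"), "Play"),
    (("Sunny", "High", "Yes"), "No Play"),
    (("Sunny", "High", "No"), "No Play"),
    (("Overcast", "Low", "Yes"), "Play"),
    (("Overcast", "High", "No"), "Play"),
    (("Overcast", "Low", "No"), "Play"),
    (("Rainy", "Low", "Yes"), "No Play"),
    (("Rainy", "Low", "No"), "Play"),
]

# The unlabeled row from the table: the new x we want a y for.
query = ("Sunny", "Low", "No")

xs = [x for x, _ in dataset]
ys = [y for _, y in dataset]
print(len(dataset), "labeled examples,", ys.count("Play"), "of them Play")
```

The query row does not appear among the labeled examples, so a lookup is not enough; the model has to generalize.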
clean / transform / maths
Class    Outlook       Temp.   Windy
Play     Sunny         Lowest  Yes
No Play  ?             High    Yes
No Play  Sunny         High    KindOf
Play     Overcast      ?       Yes
Play     Turtle Cloud  High    No
Play     Overcast      ?       No
No Play  Rainy         Low     28%
Play     Rainy         Low     No
?        Sunny         Low     No
need to clean up data
need to convert to model-able form (linear algebra)
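A first cleaning pass might simply flag the cells that need attention. This is a minimal sketch: the `ALLOWED` sets are an assumption read off the clean version of the table, and `find_problems` is a hypothetical helper name. How each bad cell gets repaired is a separate decision the sketch does not make.

```python
# Assumed valid values per column (taken from the clean weather table).
ALLOWED = {
    "Outlook": {"Sunny", "Overcast", "Rainy"},
    "Temp.": {"Low", "High"},
    "Windy": {"Yes", "No"},
}

def find_problems(row):
    """Return the column names whose value is not one of the allowed values."""
    return [col for col, allowed in ALLOWED.items() if row.get(col) not in allowed]

# A few rows from the messy table.
rows = [
    {"Class": "Play", "Outlook": "Sunny", "Temp.": "Lowest", "Windy": "Yes"},
    {"Class": "No Play", "Outlook": "?", "Temp.": "High", "Windy": "Yes"},
    {"Class": "No Play", "Outlook": "Rainy", "Temp.": "Low", "Windy": "28%"},
    {"Class": "Play", "Outlook": "Rainy", "Temp.": "Low", "Windy": "No"},
]

for row in rows:
    print(row["Class"], "->", find_problems(row) or "ok")
```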
yak shaving
Any apparently useless activity which, by allowing you to overcome intermediate difficulties, allows you to solve a larger problem.
I was doing a bit of yak shaving this morning, and it looks like it might have paid off.
http://en.wiktionary.org/wiki/yak_shaving
clean / transform / maths
Class    Outlook   Temp.  Windy
Play     Sunny     Low    Yes
No Play  Sunny     High   Yes
No Play  Sunny     High   No
Play     Overcast  Low    Yes
Play     Overcast  High   No
Play     Overcast  Low    No
No Play  Rainy     Low    Yes
Play     Rainy     Low    No
?        Sunny     Low    No
need to clean up data
need to convert to model-able form (linear algebra)
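One common way to get a "model-able form" is to turn each row into a vector of numbers. The sketch below uses one-hot encoding for outlook and 0/1 flags for the two-valued columns; that encoding is an assumption for illustration, not necessarily the one used in the talk.

```python
OUTLOOKS = ["Sunny", "Overcast", "Rainy"]

def encode(outlook, temp, windy):
    """Turn one row of categorical values into a list of numbers."""
    # One slot per outlook value; exactly one of them is 1.0.
    one_hot = [1.0 if outlook == o else 0.0 for o in OUTLOOKS]
    # Two-valued columns become a single 0/1 flag each.
    return one_hot + [1.0 if temp == "High" else 0.0,
                      1.0 if windy == "Yes" else 0.0]

print(encode("Sunny", "Low", "No"))     # [1.0, 0.0, 0.0, 0.0, 0.0]
print(encode("Rainy", "High", "Yes"))   # [0.0, 0.0, 1.0, 1.0, 1.0]
```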
model
Class    Outlook   Temp.  Windy
Play     Sunny     Low    Yes
No Play  Sunny     High   Yes
No Play  Sunny     High   No
Play     Overcast  Low    Yes
Play     Overcast  High   No
Play     Overcast  Low    No
No Play  Rainy     Low    Yes
Play     Rainy     Low    No
?        Sunny     Low    No
predict
Class    Outlook   Temp.  Windy
?        Sunny     Low    No
PLAY!!!
models
how do we build them?
linear classifiers
• in order to classify things properly we need:
  • a way to mathematically represent examples
  • a way to separate classes (yes/no)
• “decision boundary”
• excel example
• graph example
MODELS
linear classifiers
• dot product of vectors
  • [ 3, 4 ] ● [ 1, 2 ] = (3 × 1) + (4 × 2) = 11
  • a ● b = | a | × | b | cos θ
  • When does this equal 0?
• why would this be useful?
  • decision boundary can be represented using a single vector
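The dot product arithmetic above can be checked in a few lines. It comes out as 0 when the two vectors are perpendicular (cos 90° = 0), which is why a single vector is enough to describe a boundary: points on one side give a positive dot product, points on the other side a negative one. The helper names `dot` and `norm` are illustrative.

```python
import math

def dot(a, b):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of a vector, |a|."""
    return math.sqrt(dot(a, a))

print(dot([3, 4], [1, 2]))    # 11, as on the slide
print(dot([3, 4], [-4, 3]))   # 0, the vectors are perpendicular

# cos θ recovered from a ● b = |a| × |b| cos θ
cos_theta = dot([3, 4], [1, 2]) / (norm([3, 4]) * norm([1, 2]))
print(round(cos_theta, 4))
```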
perceptron
…and other linear models
linear classifiers
• Frank Rosenblatt, Cornell 1957
• let’s make a line (by using a single vector)
• take the dot product between the line and the new point
  • > 0 belongs to class 1
  • < 0 belongs to class 2
  • == 0 flip a coin, we don’t know
• for each example, if we make a mistake, move the line
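The steps above can be sketched as a small training loop. This is a minimal illustration under a few assumptions: the toy points are made up, the trailing 1.0 in each example is a constant bias feature, labels are written as +1 / -1, and a dot product of exactly 0 is treated as -1 instead of a coin flip.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, x):
    """Return +1 or -1 depending on which side of the line x falls."""
    return 1 if dot(w, x) > 0 else -1

def train(examples, epochs=20):
    """examples: list of (x, y) with y in {+1, -1}; returns the learned vector."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if predict(w, x) != y:
                # Made a mistake: move the line toward the correct side.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break  # every training example is classified correctly
    return w

# Made-up, linearly separable toy data (last value is the bias feature).
examples = [
    ([2.0, 1.0, 1.0], 1),
    ([3.0, 2.0, 1.0], 1),
    ([-1.0, -2.0, 1.0], -1),
    ([-2.0, -1.0, 1.0], -1),
]
w = train(examples)
print(w, [predict(w, x) for x, _ in examples])
```

The loop only stops early when the data can be separated by a line; otherwise it gives up after `epochs` passes.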
perceptron
point demo
perceptron
numerical features × (dot) learned vector → +1 / -1
what if….
kernel methods
models
kernel methods
[figure: features….]
perceptron
•minimize mistakes by moving w
subject to:
REMINDER
perceptron
•eventually this becomes an optimization problem
subject to:
dot product
kernel (one weird trick….)
• store dot product in a table
• call it the “kernel matrix” and “kernel trick”
• project into any space and still learn a linear model
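A small sketch of the idea: the kernel matrix is a table with one entry per pair of points, and a kernel function can produce the same numbers as a dot product in a projected space without ever building that space. The projection `phi` and the squared-dot kernel are a standard textbook pair chosen for illustration; all names here are assumptions, not from the talk.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kernel_matrix(points, kernel):
    """Table of kernel(x_i, x_j) for every pair of points."""
    return [[kernel(a, b) for b in points] for a in points]

def phi(x):
    """Explicit projection of a 2-D point into a 3-D space."""
    return [x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2]

def squared_dot(a, b):
    """Kernel equal to dot(phi(a), phi(b)), computed without calling phi."""
    return dot(a, b) ** 2

points = [[1.0, 2.0], [3.0, 4.0], [-1.0, 0.5]]
K = kernel_matrix(points, squared_dot)
K_explicit = kernel_matrix(points, lambda a, b: dot(phi(a), phi(b)))

for row in K:
    print(row)
```

Both tables hold the same values up to rounding, so a linear method that only ever needs dot products can work in the projected space by reading from `K`.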
support vector machines
• this method is the basis for SVMs
• returns a set of vectors (<< n) to make decision
• essentially changed the space to make it separable
kernels
•polynomial kernel
•RBF kernel
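The formulas on this slide did not survive in the transcript, so the sketch below uses the usual textbook definitions of the two kernels. The parameter names and defaults (`degree`, `c`, `gamma`) are assumptions for illustration.

```python
import math

def polynomial_kernel(a, b, degree=2, c=1.0):
    """(a ● b + c) ** degree"""
    return (sum(x * y for x, y in zip(a, b)) + c) ** degree

def rbf_kernel(a, b, gamma=0.5):
    """exp(-gamma * squared distance between a and b)"""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

print(polynomial_kernel([3, 4], [1, 2]))   # (11 + 1) ** 2 = 144.0
print(rbf_kernel([3, 4], [3, 4]))          # 1.0 for identical points
print(rbf_kernel([0, 0], [1, 1]))          # shrinks toward 0 as points move apart
```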
what if….
neural networks
models
neural networks
[diagram: network with units h1, h2, h3 and B1; output: Play?]
LINEAR METHODS
• perceptron (what if we can’t make a line?)
• svm – change the space
• neural networks – change the function (linear?)
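XOR is the classic case where no single line separates the two classes. The sketch below shows how stacking a couple of perceptron-like units "changes the function" enough to handle it. The weights are set by hand for illustration, not learned, and the function names are hypothetical.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # hidden unit: fires if at least one input is 1 (OR)
    h2 = step(x1 + x2 - 1.5)     # hidden unit: fires only if both inputs are 1 (AND)
    return step(h1 - h2 - 0.5)   # output: OR but not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

Each unit on its own is still a line; it is the combination through the hidden layer that is no longer linear.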
decision trees
models
decision trees
Class    Outlook   Temp.  Windy
Play     Sunny     Low    Yes
No Play  Sunny     High   Yes
No Play  Sunny     High   No
Play     Overcast  Low    Yes
Play     Overcast  High   No
Play     Overcast  Low    No
No Play  Rainy     Low    Yes
Play     Rainy     Low    No
?        Sunny     Low    No
decision trees
• how should the computer split?
• information gain (with entropy)
  • entropy measures how disorganized your answer is.
  • information gain says:
    • if I separate the answer by the values in a particular column, does the answer become *more* organized?
decision trees
• calculating information gain:
  • gain(Y, a) = H(Y) - H(Y | a)
  • H(Y) – how messy is the answer
  • H(Y | a) – how messy is the answer if we know a?
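As a worked sketch, the standard entropy and information gain definitions can be applied to the eight labeled rows of the weather table. The function names are illustrative, and this only scores the first split; it is not a full tree builder.

```python
import math
from collections import Counter

def entropy(labels):
    """How messy the answer is: 0 when every label is the same."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, column):
    """Entropy of the labels minus the entropy left after splitting on column."""
    labels = [r["Class"] for r in rows]
    remainder = 0.0
    for value in set(r[column] for r in rows):
        subset = [r["Class"] for r in rows if r[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

COLUMNS = ["Class", "Outlook", "Temp.", "Windy"]
RAW = [
    ("Play", "Sunny", "Low", "Yes"),
    ("No Play", "Sunny", "High", "Yes"),
    ("No Play", "Sunny", "High", "No"),
    ("Play", "Overcast", "Low", "Yes"),
    ("Play", "Overcast", "High", "No"),
    ("Play", "Overcast", "Low", "No"),
    ("No Play", "Rainy", "Low", "Yes"),
    ("Play", "Rainy", "Low", "No"),
]
rows = [dict(zip(COLUMNS, r)) for r in RAW]

gains = {c: information_gain(rows, c) for c in COLUMNS[1:]}
best = max(gains, key=gains.get)
for column, gain in gains.items():
    print(column, round(gain, 3))
print("split first on:", best)
```

On this table Outlook gives the largest gain, so it is the first split; every Overcast row is already Play.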
decision trees
demo
POPULAR MODELS
• support vector machines
• neural networks
• decision trees
do they work?
testing
how well is it doing?
Train: use 80%
Test: use 20%
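A minimal sketch of the 80/20 split: shuffle the labeled examples, train on the first 80%, and hold back the rest to check how well the model generalizes. The function name, the fixed `seed`, and the stand-in data are assumptions for illustration.

```python
import random

def train_test_split(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the examples and cut it into train and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

examples = list(range(10))                  # stand-in for 10 labeled examples
train, test = train_test_split(examples)
print(len(train), "for training,", len(test), "held back for testing")
```

The test examples must stay out of training, otherwise the score says nothing about unseen data.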
AzureML
putting it all together
process reminder (same on Azure)
1) data
2) clean / transform / maths
3) model
4) predict
experiments
putting it all together
confusion matrix
[table: truth (true / false) against guess (positive / negative)]
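The four cells of a confusion matrix can be counted directly from the truth and the guesses on the test set. In this sketch the truth and guess lists are made-up booleans and the function name is illustrative.

```python
def confusion_matrix(truth, guess):
    """Count the four outcomes for paired boolean truth/guess lists."""
    counts = {"true positive": 0, "false positive": 0,
              "false negative": 0, "true negative": 0}
    for t, g in zip(truth, guess):
        if g and t:
            counts["true positive"] += 1     # guessed yes, and it was yes
        elif g and not t:
            counts["false positive"] += 1    # guessed yes, but it was no
        elif t:
            counts["false negative"] += 1    # guessed no, but it was yes
        else:
            counts["true negative"] += 1     # guessed no, and it was no
    return counts

truth = [True, True, False, False, True, False]   # made-up test labels
guess = [True, False, False, True, True, False]   # made-up model output

counts = confusion_matrix(truth, guess)
accuracy = (counts["true positive"] + counts["true negative"]) / len(truth)
print(counts)
print("accuracy:", round(accuracy, 3))
```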
AzureML WebServices
putting it all together
Get started with a free trial
Or, use your existing benefits…
http://aka.ms/AzureConf2014
http://aka.ms/AzureConf-MemberOffers
THANK YOU!!!
AND STAY TUNED FOR THE NEXT SESSIONS!!!!!