Adventures in Azure Machine Learning, from NE Bytes
Adventures in Azure Machine Learning
@deejaygraham
Derek Graham, principal developer @sage
special product responsibility for Azure bits
Sorry! Live coding!
Live portal-ing
Machine learning is not telling computers what to do, but letting them learn from examples or past experience
Which means you can…
• Analyse historic or current data
• Find patterns and trends
• Make predictions about future events
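To make "learn from past experience, then predict" concrete, here is a minimal Python sketch using the simplest possible model: an ordinary least-squares straight line fitted to historic data and extrapolated one step ahead. The data and the function names `fit_trend` and `predict` are hypothetical; real ML Studio models are far richer, but the shape is the same (fit on history, predict the future).

```python
# Minimal sketch: "learn" a straight-line trend from historic examples with
# ordinary least squares, then extrapolate it to predict a future value.
# The data below is hypothetical.

def fit_trend(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def predict(model: tuple[float, float], x: float) -> float:
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: hour of day -> requests per minute
hours = [8.0, 9.0, 10.0, 11.0]
requests = [120.0, 150.0, 180.0, 210.0]
model = fit_trend(hours, requests)
print(predict(model, 12.0))  # 240.0 for this perfectly linear history
```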
Project Adam
Azure Machine Learning
• A "new" cloud-based service from Microsoft
• Integrates with existing cloud technologies
• Use ready-made algorithms
• Program custom algorithms tuned to your problem
• You can evaluate it for free
http://studio.azureml.net
• Browser-based
• Drag n Drop
• Flowchart-y
• Example data sets
• Use R or Python
• Excellent intro wizard
ML Studio
Provides tools to:
• Import data
• Filter and aggregate data
• Create machine learning models
• Run experiments
• Publish finished model
The Learning Process
• Define a problem you want to solve
• Design a solution
• Experiment!
  • Identify your data
  • Train the model with the data
  • Evaluate against expected results (speed and accuracy)
  • Adapt data or algorithm (or both)
  • Repeat!
• Save the best model
• Publish
• Run with live data
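The experiment loop above (train, evaluate for both accuracy and speed, adapt, repeat, keep the best) can be sketched in a few lines of Python. This is an illustration under assumptions, not ML Studio's mechanism: the two "models" are deliberately trivial stand-ins (a last-value baseline and a moving average), the CPU data is made up, and `pick_best` and its one-second `time_budget` are hypothetical.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

# A "model" here is just a function from past samples to a prediction of the next one.
Predictor = Callable[[Sequence[float]], float]


def last_value(history: Sequence[float]) -> float:
    """Naive baseline: the next sample looks like the last one."""
    return history[-1]


def moving_average(k: int) -> Predictor:
    """Predict the mean of the last k samples."""
    def predictor(history: Sequence[float]) -> float:
        window = history[-k:]
        return sum(window) / len(window)
    return predictor


def evaluate(predictor: Predictor, series: Sequence[float], warmup: int) -> Tuple[float, float]:
    """Walk forward through the series, predicting each point from the points before it.
    Returns (mean absolute error, elapsed seconds): accuracy and speed."""
    start = time.perf_counter()
    errors = [abs(predictor(series[:i]) - series[i]) for i in range(warmup, len(series))]
    elapsed = time.perf_counter() - start
    return sum(errors) / len(errors), elapsed


def pick_best(candidates: Dict[str, Predictor], series: Sequence[float],
              warmup: int = 3, time_budget: float = 1.0) -> Tuple[str, float]:
    """Keep the most accurate candidate that also fits within the time budget."""
    best_name, best_error = "", float("inf")
    for name, predictor in candidates.items():
        error, elapsed = evaluate(predictor, series, warmup)
        if elapsed <= time_budget and error < best_error:
            best_name, best_error = name, error
    return best_name, best_error


# Hypothetical CPU % samples
cpu = [10, 12, 11, 30, 32, 31, 33, 50, 52, 51]
candidates = {"last value": last_value, "3-sample average": moving_average(3)}
print(pick_best(candidates, cpu))
```

Filtering on a time budget before comparing error mirrors the point that evaluation covers speed as well as accuracy: a slightly more accurate model that is too slow to use never wins.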
Proof!
Imagine…
• "Business" Software
• Azure hosted
• PaaS
• Multi-tenanted
Open-ended Workflow
• Monday Morning Login
• Friday Reports
• In between?
• Weekends?
• Holidays?
Balancing
• User demand
• User experience
• Compute resources
• Cost
Scaling
• Instances auto-scale based on the CPU% metric using Azure's standard scaling model.
• Azure standard scaling is slow
• Once auto scaler notices we need more capacity, the demand has often disappeared!
• Not a good user experience
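Why a reactive, CPU%-driven rule struggles with bursty load can be shown with a small Python simulation. The `ReactiveScaler` class, its 5-sample averaging window and its 70%/25% thresholds are hypothetical illustrations, not Azure's actual scaling rules or defaults, and the simulation ignores the extra delay of provisioning a new instance, which only makes matters worse.

```python
from collections import deque


class ReactiveScaler:
    """Hypothetical reactive rule: add an instance when the average CPU % over the
    last `window` samples exceeds `high`, remove one when it falls below `low`.
    Thresholds and limits are illustrative, not Azure's defaults."""

    def __init__(self, window: int = 5, high: float = 70.0, low: float = 25.0,
                 min_instances: int = 2, max_instances: int = 10) -> None:
        self.samples: deque = deque(maxlen=window)
        self.high, self.low = high, low
        self.min_instances, self.max_instances = min_instances, max_instances
        self.instances = min_instances

    def observe(self, cpu_percent: float) -> int:
        """Record one sample and return the instance count after applying the rule."""
        self.samples.append(cpu_percent)
        if len(self.samples) < self.samples.maxlen:
            return self.instances  # not enough history to act on yet
        average = sum(self.samples) / len(self.samples)
        if average > self.high and self.instances < self.max_instances:
            self.instances += 1
        elif average < self.low and self.instances > self.min_instances:
            self.instances -= 1
        return self.instances


# A three-minute burst at 95% CPU (one sample per minute)
burst = [20, 20, 20, 95, 95, 95, 20, 20, 20, 20]
scaler = ReactiveScaler()
history = [scaler.observe(sample) for sample in burst]
print(history)  # stays at 2: the 5-sample average never exceeds 70%
```

The smoothing that stops the rule from flapping on noise is the same smoothing that hides a short burst: by the time the average catches up, the demand has gone.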
Experiment
• Customer use is not regular...
• ...but is it predictable?
Hackathon!
• Can we build a better autoscaler?
• Spin-up before high demand
• Tear-down when idle
• Better Cost vs UX
Requirements
• What will "we" need on a given date or time?
• Do "we" need to take action now to compensate for what will happen in 20 minutes' time?
• Number of instances
• Predicted CPU
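Turning a predicted CPU figure into a number of instances is simple arithmetic once you fix a target utilisation. The Python sketch below assumes load spreads evenly across instances and that the forecast is expressed as average CPU % at the current instance count; the function name, the 60% target and the 2 to 10 instance limits are all hypothetical.

```python
import math


def instances_needed(current_instances: int, predicted_cpu_percent: float,
                     target_cpu_percent: float = 60.0,
                     min_instances: int = 2, max_instances: int = 10) -> int:
    """Size the pool now for the load forecast ~20 minutes ahead.

    Assumes load spreads evenly, so total demand is
    current_instances * predicted_cpu_percent, and each instance should run
    at roughly target_cpu_percent. The result is clamped to [min, max]."""
    demand = current_instances * predicted_cpu_percent
    needed = math.ceil(demand / target_cpu_percent)
    return max(min_instances, min(max_instances, needed))


# The forecast says our 2 instances would hit 90% CPU: spin up a third now.
print(instances_needed(2, 90.0))  # 3
# A quiet forecast never drops below the minimum pool size.
print(instances_needed(2, 10.0))  # 2
```

Acting on the forecast rather than the current reading is what gives new instances their 20-minute head start.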
Best Predictor of Demand?
• Sessions?
• Instance Memory Use?
• Instance CPU?
Table Storage Diagnostics
• Too slow
• Purging
• ML queries are all or nothing
• ML Data Reader stops after 4GB
• GB !!!!
• ML times out after ~3 hours
CPU
Event Hubs
• Application log sink
• Low overhead
• Highly scalable
• Time-based
• Disposable
Neural Net Experiments!
• Feed Forward NN
• Written using R libraries
• Good predictor for 10-20 minute window
• Too inaccurate after that
• Best compromise between precision and speed
• Recurrent NN better at forecasting
• RNN execution time too long
• Need to reduce data to optimal subset
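A feed-forward network has no memory of its own, so a time series is usually fed to it as fixed-length windows of recent samples. The sketch below shows that sliding-window (lagged-input) framing in Python; the talk's networks were written with R libraries and are not reproduced here. `make_windows` is a hypothetical helper, and the one-sample-per-minute interpretation of `horizon` is an assumption.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], n_lags: int,
                 horizon: int) -> List[Tuple[List[float], float]]:
    """Frame a time series as supervised (inputs, target) pairs.

    inputs: the n_lags most recent samples
    target: the sample `horizon` steps after the last input
    With one sample per minute, horizon=20 asks for a 20-minute-ahead forecast."""
    pairs = []
    for end in range(n_lags, len(series) - horizon + 1):
        inputs = list(series[end - n_lags:end])
        target = series[end + horizon - 1]
        pairs.append((inputs, target))
    return pairs


print(make_windows([0, 1, 2, 3, 4, 5], n_lags=3, horizon=2))
# [([0, 1, 2], 4), ([1, 2, 3], 5)]
```

`n_lags` and `horizon` are the levers behind the trade-offs listed above: a longer horizon is what made the feed-forward predictions too inaccurate, and fewer or coarser lags is one way to reduce the data to a smaller subset.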
Stream Analytics!
• Real-time data analysis
• Fast
• SQL-like syntax
• Range of inputs and outputs
• Interesting development
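The core of this kind of real-time analysis is windowed aggregation over a time-stamped event stream. In Stream Analytics that is written in its SQL-like language (a `GROUP BY` over a `TumblingWindow`). The Python sketch below only shows the idea of a tumbling window: fixed-size, non-overlapping buckets, each averaged. The event data and the `tumbling_average` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

EPOCH = datetime(1970, 1, 1)


def tumbling_average(events: Iterable[Tuple[datetime, float]],
                     window: timedelta) -> Dict[datetime, float]:
    """Average event values over fixed-size, non-overlapping time windows.

    Each event falls into exactly one window; results are keyed by window end
    time, in chronological order."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in events:
        index = (timestamp - EPOCH) // window  # timedelta // timedelta -> int
        buckets[index].append(value)
    return {EPOCH + (index + 1) * window: sum(values) / len(values)
            for index, values in sorted(buckets.items())}


# Hypothetical CPU % readings from an application log stream
events = [
    (datetime(2016, 3, 1, 12, 0, 10), 40.0),
    (datetime(2016, 3, 1, 12, 0, 50), 60.0),
    (datetime(2016, 3, 1, 12, 1, 5), 90.0),
]
for window_end, average in tumbling_average(events, timedelta(minutes=1)).items():
    print(window_end, average)
```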
Anomalies
• Dev process is painful
• Syntax Errors
• “Test” Import Behaviour
• Starting and Stopping and Starting and Stopping
Compromise
Closing the Loop
Publish…
• REST web service
• Client Worker Role
• Management Service API
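The published model is called over HTTP by the client worker role. The Python sketch below shows what such a client might look like using only the standard library. The endpoint URL, API key and column names are placeholders. The JSON layout (an `Inputs` object holding `ColumnNames` and `Values`, plus `GlobalParameters`) and the `Bearer` API-key header are assumed from the classic ML Studio request-response sample code; check them against the API help page generated for your own service.

```python
import json
import urllib.request
from typing import Any, List

# Placeholders: substitute the request URL and API key shown for your
# published web service. These values are not real.
SCORING_URL = "https://example.invalid/workspaces/WORKSPACE_ID/services/SERVICE_ID/execute?api-version=2.0"
API_KEY = "YOUR_API_KEY"


def build_scoring_request(column_names: List[str], rows: List[List[Any]]) -> urllib.request.Request:
    """Build (but do not send) a POST request in the assumed request-response layout."""
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "Authorization": "Bearer " + API_KEY}
    return urllib.request.Request(SCORING_URL, data=body, headers=headers, method="POST")


def score(column_names: List[str], rows: List[List[Any]], timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response (needs a real endpoint)."""
    request = build_scoring_request(column_names, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical input columns: minute of day and recent CPU %
request = build_scoring_request(["minute", "cpu"], [["540", "42.5"]])
print(request.get_method(), request.full_url)
```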
…& Be Damned
• Too much data crashes the model
• Fine in ML Studio
• HTTP 500 errors
• Out of memory?
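One defensive workaround when a service fails on large payloads is to send data in batches and shrink the batch whenever a call fails. The Python sketch below is an assumption-laden illustration, not the fix used in the talk: `score_in_batches`, the starting size of 500 rows, and the simulated 100-row limit in `fake_service` are all hypothetical, and `send` stands in for a wrapper around the web service call.

```python
from typing import Callable, List, Sequence, Tuple, Type, TypeVar

Row = TypeVar("Row")
Result = TypeVar("Result")


def score_in_batches(rows: Sequence[Row],
                     send: Callable[[List[Row]], List[Result]],
                     batch_size: int = 500,
                     retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> List[Result]:
    """Score rows in batches, halving the batch size whenever a call fails.

    `send` is any callable that scores one batch, e.g. a wrapper around the web
    service call. A batch of one row that still fails re-raises the error."""
    results: List[Result] = []
    start = 0
    while start < len(rows):
        batch = list(rows[start:start + batch_size])
        try:
            results.extend(send(batch))
        except retry_on:
            if len(batch) <= 1:
                raise  # even a single row fails: not a size problem
            batch_size = max(1, len(batch) // 2)
            continue  # retry the same starting row with a smaller batch
        start += len(batch)
    return results


# Stand-in for the web service: pretend it returns HTTP 500 above 100 rows.
def fake_service(batch: List[int]) -> List[int]:
    if len(batch) > 100:
        raise RuntimeError("simulated HTTP 500")
    return [row * 2 for row in batch]


print(len(score_in_batches(list(range(250)), fake_service)))  # 250
```

In a real client, `retry_on` would be narrowed to the HTTP error type the client raises, so that unrelated bugs are not silently retried.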
Finished!
Result!
What we learned
Bugs
• We were pushing the environment quite hard
• YMMV
• ML Studio has bugs
• Parallel tasks don't actually run in parallel
• The ML portal is missing functionality, which prevents it from being production-ready
#DevOps
• Sharing models is "public" (via the Gallery)
• No export support
• No support (yet) for model deployment
• Still Drag n Drop
• PowerShell for EventHubs and Stream Analytics
Machine Learning
• A parallel R processing library would help
• Finding an appropriate solution often requires a data science specialist
• Solution is only as good as your data
• You may need to compromise on accuracy for speed
• Cost
  • Hosting
  • Each call to the service
References
http://studio.azureml.net/
E-Book
• Microsoft Azure Essentials: Azure Machine Learning
• Download from: https://mva.microsoft.com/ebooks
Titanic
• Jennifer Marsman - https://blogs.msdn.microsoft.com/jennifer/2016/02/19/using-azure-machine-learning-to-predict-who-will-survive-the-titanic/
• Data Science! https://www.kaggle.com/
• Amy Nicholson @AmyKateNicho https://blogs.technet.microsoft.com/amykatenicho/
#DevOps
• https://azure.microsoft.com/en-gb/documentation/articles/event-hubs-programming-guide/
• https://azure.microsoft.com/en-gb/documentation/articles/service-bus-event-hubs-manage-with-ps/
• https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-dotnet-management-sdk/
• https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-monitor-and-manage-jobs-use-powershell/
Questions?