Data Science in the Cloud with Microsoft Azure
![Page 1: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/1.jpg)
Data Science in the cloud with Microsoft Azure
MARTIN THORNALLEY
DATA SOLUTION ARCHITECT, MICROSOFT
![Page 2: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/2.jpg)
Introduction
![Page 3: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/3.jpg)
Data Science Definition
“Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, machine learning, data mining, and predictive analytics”
https://en.wikipedia.org/wiki/Data_science
![Page 4: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/4.jpg)
Data Science Skillset
http://berkeleysciencereview.com/how-to-become-a-data-scientist-before-you-graduate/
![Page 5: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/5.jpg)
The Cloud
Why does the Cloud matter for Data Science?
- High-capacity and cost-effective data storage
- Flexible, elastic compute capacity
- Ready-to-use technologies
- Choice of Infrastructure or Platform
- Enables Agile & DevOps
- Operational reliability and security
- Pay as you go
![Page 6: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/6.jpg)
Microsoft Azure Cloud Platform
- Wide range of services covering Compute, Web & Mobile, Data & Storage, Analytics, Internet of Things & Intelligence, plus many more, see http://azureplatform.azurewebsites.net/en-us/
- Easy to get started: free to try for 30 days with a limited spend, and MSDN licences include free credits, see https://azure.microsoft.com/en-gb/free/
- Comprehensive documentation and examples
- Global presence with many recognisable brands fully committed
- Huge investment and growing rapidly
![Page 7: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/7.jpg)
Data Science Process
https://azure.microsoft.com/en-us/documentation/articles/data-science-process-overview/
![Page 8: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/8.jpg)
Worked Example
![Page 9: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/9.jpg)
NYC taxis
- 2013 NYC taxi trips and fares – open but non-trivial dataset
- 24 CSV files: 12 trip, 12 fare, 1 of each for each month
- ~20GB compressed, ~50GB uncompressed, 170+ million records
- medallion – vehicle identifier
- hack license – driver identifier
- passenger count
- pickup & dropoff – datetime, longitude, latitude
- trip – time and distance
- fare – payment type, fare amount, surcharge, mta tax, tip amount, tolls amount, total amount

http://www.andresmh.com/nyctaxitrips/
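
Because trips and fares arrive in separate files, they have to be matched up before a tip can be related to a journey. The sketch below shows one way this might be done in Python on tiny in-memory samples. The column names and the choice of join keys (vehicle, driver, pickup time) are assumptions based on the field list above, not taken from the real files, and the sample rows are made up.

```python
import csv
import io

# Hypothetical two-row samples standing in for one month's trip and fare files.
# Column names are assumed from the field list, not read from the real data.
TRIP_CSV = """medallion,hack_license,pickup_datetime,passenger_count,trip_distance
M1,H1,2013-01-01 00:05:00,1,2.3
M2,H2,2013-01-01 00:07:00,2,5.1
"""

FARE_CSV = """medallion,hack_license,pickup_datetime,payment_type,fare_amount,tip_amount
M1,H1,2013-01-01 00:05:00,CRD,10.5,2.0
M2,H2,2013-01-01 00:07:00,CSH,18.0,0.0
"""

# Assumed join keys: vehicle, driver and pickup time together identify a journey.
JOIN_KEYS = ("medallion", "hack_license", "pickup_datetime")


def join_trip_fare(trip_text, fare_text):
    """Inner-join trip rows with fare rows on the assumed journey keys."""
    fares = {}
    for row in csv.DictReader(io.StringIO(fare_text)):
        fares[tuple(row[k] for k in JOIN_KEYS)] = row

    joined = []
    for row in csv.DictReader(io.StringIO(trip_text)):
        fare = fares.get(tuple(row[k] for k in JOIN_KEYS))
        if fare is not None:
            # Merge the two rows; the shared key columns hold identical values.
            joined.append({**row, **fare})
    return joined


journeys = join_trip_fare(TRIP_CSV, FARE_CSV)
```

At the scale of the full dataset, holding one file in a dictionary would not be practical, which is one reason the talk moves the join and exploration into Hive.
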
![Page 10: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/10.jpg)
Predictions
Predict whether a specific journey will result in a tip – binary classification
Predict which class of tip a specific journey will produce – multiclass classification
Predict how much a tip will be for a specific journey – regression
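
The three prediction tasks differ only in the target derived from the tip amount. A minimal Python sketch of that derivation follows; the function name and the tip bands are hypothetical, since the slides do not state the ranges used for the classes.

```python
def tip_targets(tip_amount):
    """Derive the three prediction targets from one journey's tip amount."""
    # Binary classification target: was any tip given?
    tipped = 1 if tip_amount > 0 else 0

    # Multiclass target. These band edits are illustrative assumptions only.
    if tip_amount <= 0:
        tip_class = 0
    elif tip_amount <= 5:
        tip_class = 1
    elif tip_amount <= 10:
        tip_class = 2
    else:
        tip_class = 3

    # Regression target: the tip amount itself.
    return {"tipped": tipped, "tip_class": tip_class, "tip_amount": tip_amount}
```
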
![Page 11: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/11.jpg)
A Data Science Environment
![Page 12: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/12.jpg)
Data Science Virtual Machine
- Create Linux and Windows virtual machines in minutes
- Wide range of configurations: CPU cores, memory, disks, network speeds
- Scale to what you need
- Pay only for what you use
- Enhance security and compliance
- Preloaded with a full set of tools and utilities from Azure Marketplace, e.g. SQL Server 2016 Developer edition, Azure SDK, Python, R, Jupyter, etc.
![Page 13: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/13.jpg)
Storage Accounts
- Massively scalable cloud storage for your applications
- Security-enhanced, durable, and highly available across the globe
- Industry-leading performance with exabytes of capacity
- Pay only for what you use
- Open, multi-platform support
![Page 14: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/14.jpg)
HDInsight
A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service made easy
- Scale to petabytes on demand
- Crunch all data: structured, semi-structured, unstructured
- Skip buying and maintaining hardware
- Spin up Apache Hadoop, Spark, and R clusters in the cloud
- Use Excel or your favourite BI tool to visualize Hadoop data
- Connect on-premises Hadoop clusters with the cloud
![Page 15: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/15.jpg)
Azure Machine Learning
A fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions.
- Powerful cloud-based analytics, now part of Cortana Intelligence Suite
- Azure Machine Learning Studio includes hundreds of built-in packages and support for custom code
- Share your solution with the world in the Gallery or on the Azure Marketplace
![Page 16: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/16.jpg)
The Process
![Page 17: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/17.jpg)
Preparation & Exploration
- Copy data using AzCopy and decompress
- Inspect files and load into RStudio
- Create external Hive tables and load
- Query over full dataset for further exploration
- Remove erroneous data, e.g. implausible passenger numbers or lat/long values
- Engineer features using Hive
  - Distance from start to finish using Haversine calculation
  - Binary indicator for tips
  - Tip level based on ranges for multiclass classification
- Downsample dataset and save as internal table for Machine Learning
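
The talk performs the cleaning and the distance feature in Hive; the Python sketch below only illustrates the same two ideas on single records. The Earth radius constant, the function names, and the plausibility bounds (a rough box around New York and a passenger range of 1 to 6) are assumptions for illustration, not values from the talk.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def is_plausible(passenger_count, lat, lon):
    """Reject obviously erroneous records (hypothetical bounds)."""
    return (1 <= passenger_count <= 6
            and 40.0 <= lat <= 42.0
            and -75.0 <= lon <= -72.0)
```

A record with coordinates of (0, 0), a common symptom of a failed GPS reading, would be dropped by `is_plausible` before the distance is computed.
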
![Page 18: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/18.jpg)
Machine Learning & Deployment
- Import Data using Hive Query
- Build Training Experiments
- Evaluate model performance
- Create Predictive Experiments
- Publish Web Service
- Test Web Service
- Call from Excel
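
For the binary "tip or no tip" model, evaluating performance typically means comparing predicted labels with actual ones. Azure Machine Learning Studio reports such metrics itself; the sketch below simply shows, in plain Python with a hypothetical function name, how accuracy, precision and recall are computed from the counts.

```python
def binary_metrics(actual, predicted):
    """Compute accuracy, precision and recall for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for five journeys: 1 = tipped, 0 = no tip.
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```
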
![Page 19: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/19.jpg)
Next Steps
To build a fully fledged enterprise solution with regular data ingestion and model execution, consider the following:
- Data Catalog
- Data Factory
- Event Hubs & Stream Analytics
- Power BI
- Cognitive Services
![Page 20: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/20.jpg)
Conclusion
![Page 21: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/21.jpg)
Summary
- Microsoft Azure provides a wide range of technologies for Data Science activities
- Platform services reduce the management overhead
- No capacity limitations and flexible provisioning – pay as you go
- Choice of Open Source and Microsoft – use the best tool for the task
- The tools are well integrated
- Azure Machine Learning makes it trivial to deploy your models
- It’s quick and easy to get started
![Page 22: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/22.jpg)
Getting Started
Sign up for free
https://azure.microsoft.com/en-gb/free/
Create a Data Science VM
https://azure.microsoft.com/en-us/marketplace/partners/microsoft-ads/standard-data-science-vm/
Visit Cortana Intelligence Gallery
https://gallery.cortanaintelligence.com/
![Page 23: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/23.jpg)
Q&A
![Page 24: Data Science in the cloud with Microsoft Azure](https://reader035.vdocument.in/reader035/viewer/2022062822/5882ea151a28ab33258b7d4b/html5/thumbnails/24.jpg)
Thank You
Martin Thornalley
Data Solution Architect, Microsoft
@[email protected]
https://www.linkedin.com/in/martinthornalley