everyone can do data science — import.io webinar
DESCRIPTION
Everyone can do data science with the help of tools such as:
- import.io for visually scraping data from the web
- Pandas to wrangle data in Python
- BigML to apply machine learning to data

In this presentation, I introduce what machine learning is before moving on to a case study where I show how to build a real estate pricing model. Check out import.io's webinar for the whole thing: http://blog.import.io/post/become-a-data-scientist-in-an-hour

TRANSCRIPT
![Page 1: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/1.jpg)
Everyone can do data science
import.io webinar 23/9/14
Louis Dorard (@louisdorard)
![Page 2: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/2.jpg)
![Page 3: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/3.jpg)
![Page 4: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/4.jpg)
US real estate portals:
- Realtor
- Zillow
- Trulia
- …
![Page 5: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/5.jpg)
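| Bedrooms | Bathrooms | Surface (ft²) | Year built | Type | Price ($) |
|---|---|---|---|---|---|
| 3 | 1 | 860 | 1950 | house | 565,000 |
| 3 | 1 | 1012 | 1951 | house | |
| 2 | 1.5 | 968 | 1976 | townhouse | 447,000 |
| 4 | | 1315 | 1950 | house | 648,000 |
| 3 | 2 | 1599 | 1964 | house | |
| 3 | 2 | 987 | 1951 | townhouse | 790,000 |
| 1 | 1 | 530 | 2007 | condo | 122,000 |
| 4 | 2 | 1574 | 1964 | house | 835,000 |
| 4 | | | 2001 | house | 855,000 |
| 3 | 2.5 | 1472 | 2005 | house | |
| 4 | 3.5 | 1714 | 2005 | townhouse | |
| 2 | 2 | 1113 | 1999 | condo | |
| 1 | | 769 | 1999 | condo | 315,000 |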
![Page 6: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/6.jpg)
![Page 7: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/7.jpg)
Let’s create a real estate pricing model
![Page 8: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/8.jpg)
Fabien Durand (@thefabiendurand)
www.louisdorard.com/guest/everyone-can-do-data-science-importio
![Page 10: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/10.jpg)
Data Science:
- domain knowledge
- hacking abilities
- machine learning
![Page 11: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/11.jpg)
What the @#?~% is ML?
![Page 12: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/12.jpg)
![Page 13: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/13.jpg)
“Which type of email is this? — Spam/Ham” -> Classification
![Page 14: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/14.jpg)
![Page 15: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/15.jpg)
“How much is this house worth? — X $” -> Regression
![Page 16: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/16.jpg)
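| Bedrooms | Bathrooms | Surface (ft²) | Year built | Type | Price ($) |
|---|---|---|---|---|---|
| 3 | 1 | 860 | 1950 | house | 565,000 |
| 3 | 1 | 1012 | 1951 | house | |
| 2 | 1.5 | 968 | 1976 | townhouse | 447,000 |
| 4 | | 1315 | 1950 | house | 648,000 |
| 3 | 2 | 1599 | 1964 | house | |
| 3 | 2 | 987 | 1951 | townhouse | 790,000 |
| 1 | 1 | 530 | 2007 | condo | 122,000 |
| 4 | 2 | 1574 | 1964 | house | 835,000 |
| 4 | | | 2001 | house | 855,000 |
| 3 | 2.5 | 1472 | 2005 | house | |
| 4 | 3.5 | 1714 | 2005 | townhouse | |
| 2 | 2 | 1113 | 1999 | condo | |
| 1 | | 769 | 1999 | condo | 315,000 |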
![Page 17: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/17.jpg)
![Page 18: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/18.jpg)
ML is a set of AI techniques where “intelligence” is built by
referring to examples
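To make "built by referring to examples" concrete, here is a minimal sketch (not from the webinar) that prices a house by copying the price of the most similar priced example. The examples are the fully specified, priced rows of the real estate table; the distance function and its scaling are assumptions chosen only for illustration.

```python
# Priced examples from the table: (bedrooms, bathrooms, surface_sqft, price_usd)
EXAMPLES = [
    (3, 1.0, 860, 565_000),
    (2, 1.5, 968, 447_000),
    (3, 2.0, 987, 790_000),
    (1, 1.0, 530, 122_000),
    (4, 2.0, 1574, 835_000),
]

def distance(a, b):
    """Illustrative distance; surface is divided by 100 so it does not swamp room counts."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) + abs(a[2] - b[2]) / 100.0

def predict_price(house):
    """Return the price of the closest known example (1-nearest neighbour)."""
    nearest = min(EXAMPLES, key=lambda ex: distance(ex[:3], house))
    return nearest[3]

# The unpriced row with 3 bedrooms, 1 bathroom and 1012 sq ft:
print(predict_price((3, 1.0, 1012)))  # prints 790000
```

No pricing rule is written by hand here: the answer changes as soon as the examples change, which is the point of the definition.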
![Page 19: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/19.jpg)
![Page 20: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/20.jpg)
??
![Page 21: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/21.jpg)
(McKinsey & Co.)
“A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics
and machine learning.”
![Page 22: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/22.jpg)
(Bret Victor)
Making ML effortless
![Page 23: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/23.jpg)
![Page 24: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/24.jpg)
HTML / CSS / JavaScript
![Page 25: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/25.jpg)
HTML / CSS / JavaScript
![Page 27: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/27.jpg)
![Page 28: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/28.jpg)
![Page 29: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/29.jpg)
The two phases of machine learning:
• TRAIN a model
• PREDICT with a model
![Page 30: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/30.jpg)
The two methods of prediction APIs:
• TRAIN a model
• PREDICT with a model
![Page 31: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/31.jpg)
The two methods of prediction APIs:
• model = create_model(dataset)
• predicted_output = create_prediction(model, new_input)
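As a rough illustration of this two-method shape, the sketch below implements `create_model` and `create_prediction` as plain Python functions. It is a toy stand-in, not how any real prediction API works internally: the "model" is just the average price per property type, and the dataset is a handful of priced rows from the real estate table.

```python
from collections import defaultdict

def create_model(dataset):
    """TRAIN: learn the average price per property type from (type, price) examples."""
    totals = defaultdict(lambda: [0, 0])  # type -> [sum of prices, number of examples]
    for property_type, price in dataset:
        totals[property_type][0] += price
        totals[property_type][1] += 1
    return {t: s / n for t, (s, n) in totals.items()}

def create_prediction(model, new_input):
    """PREDICT: look up the learned average for the new input's property type."""
    return model[new_input]

dataset = [("house", 565_000), ("house", 835_000),
           ("townhouse", 447_000), ("townhouse", 790_000),
           ("condo", 122_000), ("condo", 315_000)]

model = create_model(dataset)
predicted_output = create_prediction(model, "house")
print(predicted_output)  # prints 700000.0
```

The caller only sees the two functions; swapping the averaging for a stronger learning algorithm would not change the calling code.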
![Page 32: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/32.jpg)
    from bigml.api import BigML

    # create a model (credentials are read from the environment)
    api = BigML()
    source = api.create_source('training_data.csv')
    dataset = api.create_dataset(source)
    model = api.create_model(dataset)

    # make a prediction; new_input is a dict of input field values
    prediction = api.create_prediction(model, new_input)
    print("Predicted output value:", prediction['object']['output'])
http://bit.ly/bigml_wakari
![Page 33: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/33.jpg)
![Page 34: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/34.jpg)
Recap
![Page 35: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/35.jpg)
• Classification and regression
• 2 phases in ML: train and predict
• Prediction APIs make it easy to build models
• Let’s use them on real estate data to predict price from house characteristics
![Page 36: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/36.jpg)
• Encoding domain knowledge
• Making our life easier: restricting data to only 1 city
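Restricting the data to one city could look something like the sketch below. Everything specific in it is an assumption for illustration: the `city` column, the city names, and the CSV layout are hypothetical, and the standard library is used instead of Pandas only to keep the example dependency-free.

```python
import csv
import io

# Hypothetical scraped listings; the "city" column and its values are assumptions.
RAW_CSV = """city,bedrooms,surface_sqft,price
Springfield,3,860,"565,000"
Shelbyville,2,968,"447,000"
Springfield,4,1574,"835,000"
"""

def load_city(raw_csv, city):
    """Keep only the rows for one city and convert prices like "565,000" to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if row["city"] != city:
            continue
        row["price"] = int(row["price"].replace(",", ""))
        rows.append(row)
    return rows

listings = load_city(RAW_CSV, "Springfield")
print(len(listings), [r["price"] for r in listings])
```

Keeping a single city means the model does not have to learn location effects between cities, at the cost of having fewer examples.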
![Page 37: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/37.jpg)
![Page 38: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/38.jpg)
BigML
• Look at data
• Split into training and test
• Build model from training
• Evaluate model on test
• Errors: mean absolute error (or percentage?)
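The steps above can be sketched outside BigML as follows. This is only an illustration under stated assumptions: the model is a hypothetical baseline (one price-per-square-foot rate), the split rule is arbitrary, and the rows are the priced listings from the real estate table that have a surface value. It reports both error measures mentioned on the slide.

```python
def split(rows, test_every=3):
    """Deterministic split: every `test_every`-th row goes to the test set."""
    train = [r for i, r in enumerate(rows) if (i + 1) % test_every != 0]
    test = [r for i, r in enumerate(rows) if (i + 1) % test_every == 0]
    return train, test

def train_baseline(train):
    """Baseline model: a single price-per-square-foot rate learned from training rows."""
    return sum(price for _, price in train) / sum(surface for surface, _ in train)

def evaluate(rate, test):
    """Return (mean absolute error in $, mean absolute percentage error in %)."""
    errors = [abs(rate * surface - price) for surface, price in test]
    mae = sum(errors) / len(errors)
    mape = 100 * sum(e / price for e, (_, price) in zip(errors, test)) / len(test)
    return mae, mape

# (surface_sqft, price_usd) for priced rows of the table
rows = [(860, 565_000), (968, 447_000), (1315, 648_000),
        (987, 790_000), (530, 122_000), (1574, 835_000)]

train, test = split(rows)
rate = train_baseline(train)
mae, mape = evaluate(rate, test)
print(f"MAE: ${mae:,.0f}  MAPE: {mape:.1f}%")
```

Mean absolute error is expressed in dollars, so it is dominated by expensive houses; the percentage version treats a 10% miss on a condo and on a large house alike, which is why the choice between them is worth asking.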
![Page 39: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/39.jpg)
![Page 40: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/40.jpg)
Other import.io + BigML use cases:
- Predict ebook rating from description
- Predict sales of Etsy stores
![Page 41: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/41.jpg)
Talk at #APIconUK
tomorrow in London
![Page 42: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/42.jpg)
ML Algorithm API
Automated Pred. API
Text Classification API
Vertical Pred. API
Fixed-model Pred. API
ABSTRACTION
![Page 43: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/43.jpg)
![Page 44: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/44.jpg)
![Page 45: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/45.jpg)
www.louisdorard.com/machine-learning-book
50% off for 24 hours with code “importio”
@louisdorard
![Page 46: Everyone can do data science — import.io webinar](https://reader035.vdocument.in/reader035/viewer/2022081413/547e4570b4af9fd3158b5682/html5/thumbnails/46.jpg)