![Page 1: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/1.jpg)
From Lab to FactoryLessons learned developing Data Products
Keynote PyCon Colombia
Feb 2017
All opinions my own
![Page 2: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/2.jpg)
Who am I?
Data Scientist at Elevate (Recruitment Startup) and Author@springcoil About 5 years in Machine Learning/Data ScienceOSS contributor (mostly PyMC3)Built Ad-hoc analysis, Data Products and Data Pipelines at
![Page 3: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/3.jpg)
![Page 4: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/4.jpg)
Why talk about this?
![Page 5: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/5.jpg)
Data Moats
![Page 6: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/6.jpg)
Roadmap
What is Data Science?
Type of Data Scientists
Deploying Data Science (tech/culture)
Success stories and recommendations
Bigger idea (Change)
![Page 7: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/7.jpg)
What IS a Data Scientist?
![Page 8: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/8.jpg)
There's a Data Science Spectrum
HT: Sean J. Taylor (Facebook Research Scientist)
![Page 9: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/9.jpg)
Data Science to Value
Data doesn't do anything!
People need to make decisions (strategic/ tactical)
Or the product needs to alter someones decision making (SpotifyDiscover Weekly, Amazon Recommendations, etc)
![Page 10: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/10.jpg)
(Taken from Josh Wills talk - Lab to Factory 2014)or https://hbr.org/2013/04/two-departments-for-data-succe
![Page 11: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/11.jpg)
![Page 12: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/12.jpg)
Expectations and reality
![Page 13: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/13.jpg)
Data is the new oil...
But. Data isn't a product. It needs to be turned into one!
![Page 14: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/14.jpg)
Data Science is hard
Data and technical problems
Skill set (emotional and tech)
Machine Learning Ops
![Page 15: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/15.jpg)
Data Science projects are risky!
Many stakeholders think that data science is just an engineering problem,but research is high risk and high reward
De-risking the project - how? Send me examples :)
http://www.martingoodson.com/ten-ways-your-data-project-is-going-to-fail/
https://github.com/ianozsvald/data_science_delivered
![Page 16: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/16.jpg)
Success with Data Science
1. People
2. Ideas
3. Things
Source: USAF
![Page 17: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/17.jpg)
![Page 18: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/18.jpg)
R and D needs cultural support
![Page 19: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/19.jpg)
Your culture needs to avoid the HiPPO (highest paidpersons opinion) to be data-informed or data-driven.
![Page 20: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/20.jpg)
Good cultures
Data democratization
Clear objectives and metrics against objectives
Financial metrics don't always win - sometimes brand wins
See by Martin Goodsonhttp://bit.ly/cultureanddata
![Page 21: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/21.jpg)
Some data scientistsdo experiments and
build prototypes
![Page 22: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/22.jpg)
Production
![Page 24: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/24.jpg)
Deploying Data Products is hard
Monitoring and Alerting
Deployment
Real world data is tricky (training/ testing)
Interpretability (explaining fraud detection/ad tech)
Feature engineering
Models decay over time
![Page 25: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/25.jpg)
What projects work?
Explain existing data (visualization!)
Automate repetitive/ slow processes
Augment data to make new data (Search engines, ML models)
Predict the future (do something more accurately than gut feel )
Simulate using statistics :) (Sports analytics models)
![Page 26: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/26.jpg)
"You need data first" - Peadar Coyle
Copying and pasting PDF/PNG data Getting data in some areas is hard!!Some tools for web data extractionMessy APIs without documentation :(
![Page 27: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/27.jpg)
Visualise ALL THE THINGS!!(Relay foods dataset - HT Greg Reda)Consumer behaviour at a Fast Food Restaurant per year in the USA
![Page 28: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/28.jpg)
![Page 29: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/29.jpg)
Augmenting data and using API's
![Page 30: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/30.jpg)
Machine Learning
![Page 31: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/31.jpg)
Production data/ Real world
HT: Andrew Ng
![Page 32: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/32.jpg)
Everyone ETLSOnly 1% of your time will be spent modellingData pipelines and your infrastructure matters - Tools such as Spark, Luigi, Airflow or Dask are importantMonitoring of pipelines. Scalability. Machine provision
Eoin Brazil Talk
![Page 33: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/33.jpg)
Success Story (1)1. Ad Tech project (Media company) -
original process took 10 hours per week of analyst time.
2. Broke down process and produced simple predictive model
3. Data type challenges (unicode in Python 2.7) and businessexpectations
4. Now takes less than 30 minutes each week.
![Page 34: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/34.jpg)
Success Story (2)Elevate is a Total Talent Management platform (tools for streamliningrecruitment)
Several models in production - microservices (recommenders, job typeprediction)
Data pipeline (Luigi) necessary for producing features
Lots of challenges about Docker, time, updated data.
Works: MyPy (Python 3.5), code review, testing -
Next: Moving to PySpark for ETL, more pair programming
http://hypothesis.works/
![Page 35: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/35.jpg)
Failure1. Lack of support from tech teams
2. Lack of clearly defined customer
3. No culture of DevOps
4. Management had no clear vision
![Page 36: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/36.jpg)
![Page 37: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/37.jpg)
Lessons learned from Lab to Factory
1. Monitoring - data products need evaluation in production
2. Lack of a shared language between software engineers and datascientists. Pair programming/ Code review are some solutions.
3. To help data scientists and analysts succeed your business needs to beprepared to invest in tooling. (Pipelines, DevOps, Reproducible)
4. Often you're working with other teams who use different languages -so micro services can be a good idea (example C#.net app and Pythonmicroservice)
![Page 38: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/38.jpg)
How to deploy a model?Palladium (Otto Group)AzureFlask Microservice (DIY) and Docker
![Page 39: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/39.jpg)
Dev Ops
![Page 40: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/40.jpg)
![Page 41: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/41.jpg)
We need each other!
![Page 42: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/42.jpg)
Use small data where possible!!
Small problems with clean data are more important - (Ian Ozsvald)Amazon machine with many Xeons and 244GB of RAM is less than 3euros per hour. - (Ian Ozsvald) Blaze, Xray, Dask, Ibis, etc etc - "The mean size of a cluster will remain 1" - Matt Rocklin
PyData Bikeshed
![Page 43: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/43.jpg)
Focus on the real problem
![Page 44: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/44.jpg)
Closing remarks
Dirty data stops projects
Change happens, we need to help businesses adapt or be agile
There are some good projects like Icy, Luigi, etc for transforming dataand improving data extraction
- Google paperon rule for building ML systemshttp://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
![Page 45: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/45.jpg)
Closing Remarks (2)
Stakeholder management is a challenge too
Send me your dirty data and data deployment stories :)
www.peadarcoyle.com
https://leanpub.com/interviewswithdatascientists
![Page 46: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/46.jpg)
A New World
![Page 47: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/47.jpg)
![Page 48: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/48.jpg)
Simulate: Rugby games with MCMC(PyMC3)
![Page 49: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/49.jpg)
What is the Data Science process?
ObtainScrubExploreModelInterpretCommunicate (or Deploy)
![Page 50: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/50.jpg)
Some NLP on the Interviews!
![Page 51: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/51.jpg)
What do Data Scientists talk about?
Based on my Interview series!Dataconomy
![Page 52: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/52.jpg)
A famous 'data product' - Recommendation engines
![Page 53: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/53.jpg)
![Page 54: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/54.jpg)
Creditshttp://cdn.yourarticlelibrary.com/wp-content/uploads/2013/12/080.jpg
https://www.flickr.com/photos/82066314@N06/7624218468/sizes/z/
![Page 55: From Lab to Factory: Creating value with data](https://reader033.vdocument.in/reader033/viewer/2022051707/58ac2a8c1a28abf03a8b66e9/html5/thumbnails/55.jpg)