deep learning in production with the best
![Page 1: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/1.jpg)
skymind.io | deeplearning.org | gitter.im/deeplearning4j
Deep Learning in Production
Building Production Class Deep Learning Workflows for the Enterprise
Adam Gibson / CTO Skymind
AI With the Best / The Internet
![Page 2: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/2.jpg)
Topics
• Deep Learning in Production vs Academia
• Data Scientists vs Engineers
• Defining Production
• A solution
![Page 3: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/3.jpg)
Deep Learning in Production vs Academia
![Page 4: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/4.jpg)
Academia/Research
• Focus on accuracy and the latest architectures
• Build proofs of concept quickly to validate an assumption
• Prototype as many ideas as quickly as possible to come up with a solution to a problem
• Publish incremental results often to increase publication count
![Page 5: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/5.jpg)
Current state of research
• Mostly funded by large consumer companies (Amazon, Google, Facebook, ...)
• Scant pockets of deep learning in academic institutions (CMU, Stanford, NYU, ...)
• Large focus on audio and vision, somewhat spreading into natural language processing
• Starting to focus more on reinforcement learning and better ways of tuning
![Page 6: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/6.jpg)
People in Deep Learning
• Talent still sparse
• Most are in research labs
• Some of them are enthusiasts or startup founders
• Reality: Deep Learning hasn’t hit most of the world yet. It affects a lot of people but most aren’t doing it.
![Page 7: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/7.jpg)
Industry (MOST Companies doing data science)
● Most use linear regression and random forest
● Prototyping happens in Python - these are data scientists
● Data Engineers hold the keys to the cluster (write code in Java)
● Most problems are simple - analytics, churn prediction, maybe recommendation engines or price forecasting
● Deep Learning is seen as overkill - no GPUs in your cluster
![Page 8: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/8.jpg)
Data Scientists vs Engineers
![Page 9: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/9.jpg)
Data Scientists
• Math or stats background - know R or Python
• Often a beginning coder - may have started in SQL and moved up to analytics
• Know basic machine learning - problems are focused on replacing Excel spreadsheets or solving business problems
![Page 10: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/10.jpg)
Data Engineers
• Computer Science background
• Builds data pipelines and knows how to set up production systems
• Doesn’t really know machine learning that well - usually willing to learn
• Usually closer to the product team - may port Python algorithms to Java depending on level of ability
![Page 11: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/11.jpg)
The hybrid
• Been in the game a while; knows CS and stats
• Knows SQL, machine learning, and how to operate a Spark cluster
• Can formulate problems and figure out what projects to tackle next
• Either understands business objectives or can implement machine learning algorithms themselves
![Page 12: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/12.jpg)
Most companies
• 2 separate teams
• Data scientists use Python/R and SQL, experiment with data and come up with new models (very little machine learning)
• Data engineers use Java (sometimes .NET) and work on terabytes of data - most time spent writing integrations and data pipelines
![Page 13: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/13.jpg)
Startups
● Tend to employ generalists
● Usually 3-5 people who can sort of do both. Startups aren’t usually ready to hire specialists
● Sometimes have a product where something like deep learning is needed
● Usually Ruby or Python stack, not many users or scale
● Usually just want something simple to set up
● Not much need for compiled languages or scale yet - this comes later
![Page 14: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/14.jpg)
Defining Production
![Page 15: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/15.jpg)
Defining “Production”
● Varying degrees of scale
● Not everyone has terabytes of data
● MySQL and outsourced cloud services are “machine learning” for most startups
● Many will start out with scikit-learn and Flask, maybe add Python based deep learning later. This is “good enough” - this is also what you see the most tutorials for
● Larger companies care more about other things - security, scale, and return on investment for projects. These companies use Java
● If you’re Google you use C++; if you’re Facebook you use your own version of PHP that you wrote and maintain
![Page 16: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/16.jpg)
Hardware
• GPUs have very little market penetration
• Deep Learning also has very little market penetration (despite the marketing)
• Most of the world is CPUs (this is changing very slowly)
• Startups are fine with cloud - on-prem data centers are usually Dell or HP servers with Red Hat or Ubuntu on them
![Page 17: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/17.jpg)
Typical stack
• Web based product (Go, Ruby, Python, Scala, Java, mix)
• Storage (1 or more SQL databases, Elasticsearch/Solr)
• Cloud infrastructure or on-prem (bare metal)
• Machine Learning - ???
![Page 18: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/18.jpg)
Machine Learning at startups
• Random one-off scripts for analysis
• Random one-off notebooks
• One-off ETL pipelines written in Java
• 1 or more models tied to a REST API that talks to your product stack
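To make the last bullet concrete, here is a minimal sketch of a model sitting behind an HTTP endpoint. It uses only the Python standard library (a plain WSGI app) so that it runs anywhere; it is a stand-in for the scikit-learn plus Flask setup a startup would more likely reach for, not a recommendation over it. The churn model, its feature names, and its weights are all hypothetical.

```python
import io
import json
import math

# Hypothetical churn model: these weights are made up for illustration and
# stand in for whatever a trained model would provide.
WEIGHTS = {"months_inactive": 0.8, "support_tickets": 0.3}
BIAS = -2.0


def predict_churn(features):
    """Logistic score in [0, 1] computed from a dict of numeric features."""
    z = BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def app(environ, start_response):
    """WSGI app: POST a JSON object of features, receive a JSON score."""
    headers = [("Content-Type", "application/json")]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", headers)
        return [b'{"error": "use POST"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        features = json.loads(environ["wsgi.input"].read(length) or b"{}")
        body = json.dumps({"churn_probability": predict_churn(features)})
        status = "200 OK"
    except (ValueError, TypeError, AttributeError, OverflowError):
        body = json.dumps({"error": "invalid request body"})
        status = "400 Bad Request"
    start_response(status, headers)
    return [body.encode("utf-8")]


def call(wsgi_app, payload):
    """Invoke the WSGI app in-process, the way a test client would."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    captured = {}

    def start_response(status, response_headers):
        captured["status"] = status

    chunks = wsgi_app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))
```

Calling `call(app, {"months_inactive": 3, "support_tickets": 1})` exercises the endpoint without opening a socket. To actually serve it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library is enough for a demo, though not for real traffic.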
![Page 19: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/19.jpg)
Machine Learning at big companies
• Random one-off scripts for analysis
• Random one-off notebooks
• Large numbers of separate databases and applications run by different teams
• Multiple disconnected APIs
• Some models connected to a Spark or Hadoop cluster
![Page 20: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/20.jpg)
Challenges in Production
• Serving user traffic (latency)
• Data access (connecting everything together)
• Large amounts of time spent on data pipeline code
• Unclear metrics of success for the data team
• Lack of innovation, or “too much” of it, e.g. “chase the shiny new thing”
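The latency point in the list above is easier to reason about with numbers. The sketch below times repeated calls to a prediction function and reports percentiles instead of an average, since tail latency is usually what users notice. The function name `measure_latency`, the `fake_model` stand-in, and the choice of p50/p95/p99 are assumptions made for illustration.

```python
import statistics
import time


def measure_latency(predict, example, calls=200):
    """Time `calls` invocations of predict(example); return percentiles in milliseconds."""
    samples_ms = []
    for _ in range(calls):
        start = time.perf_counter()
        predict(example)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # n=100 yields 99 cut points; index 49 is the median, 94 is p95, 98 is p99.
    # method="inclusive" keeps every cut point within the observed range.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98], "max": max(samples_ms)}


def fake_model(features):
    """Hypothetical stand-in for a real model call."""
    return sum(x * x for x in features)
```

For example, `measure_latency(fake_model, list(range(1000)))` returns a dict of millisecond values. In-process timings like these leave out network and serialization cost, so they are a lower bound on what a caller of the service would see.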
![Page 21: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/21.jpg)
Challenges of Deep Learning in Production
• Same problems as machine learning
• Hard to interpret models
• Requires specialized hardware
• Not a lot of best practices
• Lack of expertise (machine learning is hard enough)
![Page 22: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/22.jpg)
Closing the gap
![Page 23: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/23.jpg)
Establish some best practices
• Kaggle is a good start for this - start with “somewhat real” problems
• Use higher level tools - Keras, otherwise easy to get lost in weeds
• Consider having a real world goal - e.g. if you’re in real estate figure out how to use a simple CNN (not the latest algorithm) for image search
• Depending on need consider integration with Hadoop/Spark
• Lastly - don’t treat deep learning as special. It’s still a subfield of machine learning
![Page 24: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/24.jpg)
Going to production
• Sometimes Python is enough for simple stuff
• Data Engineering teams should consider Java/Scala based solutions (disclaimer: highly opinionated here)
• Follow the same workflow - prototype in Python, then port to production
• Overall - scope to a core problem where deep learning is worth it
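One way to make the "prototype, then port" workflow less error-prone is to hand the production team a language-neutral bundle: the model parameters plus a few reference predictions that the ported code must reproduce. The sketch below is an assumption about how such a handoff could look, not something prescribed in the talk; the linear model, the JSON layout, and the function names are all hypothetical.

```python
import json


def predict(params, features):
    """Linear score: bias plus the weighted sum of the named features."""
    return params["bias"] + sum(
        weight * features[name]
        for name, weight in zip(params["feature_names"], params["weights"])
    )


def export_model(params, reference_inputs):
    """Bundle parameters with reference predictions for the port to reproduce."""
    bundle = {
        "params": params,
        "reference": [
            {"features": features, "expected": predict(params, features)}
            for features in reference_inputs
        ],
    }
    return json.dumps(bundle, sort_keys=True)


def check_port(bundle_json, ported_predict, tolerance=1e-9):
    """Return the reference cases where the ported implementation disagrees."""
    bundle = json.loads(bundle_json)
    return [
        case
        for case in bundle["reference"]
        if abs(ported_predict(bundle["params"], case["features"]) - case["expected"]) > tolerance
    ]
```

A Java or Scala service would read the same JSON, rebuild the model, and run the equivalent of `check_port` in its own test suite. An empty list means the port matches the prototype on every reference case.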
![Page 25: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/25.jpg)
Newer hardware
• Prototype on cloud infrastructure on a toy problem
• Try out this “GPU thing” and see what might be involved
• Learn the trade-offs of CPUs and GPUs - don’t believe the marketing
• Buy new hardware as needed
![Page 26: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/26.jpg)
In closing
• Use something open source to start off with
• Use something *supported*; keep an eye on open source activity
• Don’t just believe the research. Papers are not your company. Do due diligence
![Page 27: Deep learning in production with the best](https://reader034.vdocument.in/reader034/viewer/2022051706/58f9a949760da3da068b6d17/html5/thumbnails/27.jpg)
Thank you!
Please visit skymind.io/learn for more information