deep learning in production with the best


Upload: adam-gibson

Post on 21-Apr-2017

TRANSCRIPT

Page 1: Deep learning in production   with the best

skymind.io | deeplearning.org | gitter.im/deeplearning4j

Deep Learning in Production

Building Production Class Deep Learning Workflows for the Enterprise

Adam Gibson / CTO Skymind

AI With the Best / The Internet

Topics

• Deep Learning in Production vs Academia

• Data Scientists vs Engineers

• Defining Production

• A solution

Deep Learning in Production vs Academia

Academia/Research

• Focus on accuracy and the latest architectures

• Build proofs of concept quickly to validate an assumption

• Prototype as many ideas as quickly as possible to come up with a solution to a problem

• Publish incremental results often to increase the publication count

Current state of research

• Mostly funded by large consumer companies (Amazon, Google, Facebook, ...)

• A few pockets of deep learning at academic institutions (CMU, Stanford, NYU, ...)

• Large focus on audio and vision, somewhat spreading into natural language processing

• Starting to focus more on reinforcement learning and better ways of tuning

People in Deep Learning

• Talent still sparse

• Most are in research labs

• Some of them are enthusiasts or startup founders

• Reality: Deep Learning hasn’t hit most of the world yet. It affects a lot of people, but most aren’t doing it.

Industry (MOST Companies doing data science)

● Most use linear regression and random forest

● Prototyping happens in Python - these are data scientists

● Data Engineers hold the keys to the cluster (write code in Java)

● Most problems are simple - analytics, churn prediction, maybe recommendation engines or price forecasting

● Deep Learning is seen as overkill - no GPUs in your cluster

Data Scientists vs Engineers

Data Scientists

• Math or stats background - knows R or Python

• Often a beginning coder - may have started in SQL and moved up to analytics

• Knows basic machine learning - problems are focused on replacing Excel spreadsheets or solving business problems

Data Engineers

• Computer Science background

• Builds data pipelines and knows how to set up production systems

• Doesn’t really know machine learning that well - usually willing to learn

• Usually closer to the product team - may port Python algorithms to Java depending on level of ability

The hybrid

• Has been in the game a while - knows CS and stats

• Knows SQL, machine learning, and how to operate a Spark cluster

• Can formulate problems and figure out what projects to tackle next

• Either understands business objectives or can implement machine learning algorithms themselves

Most companies

• 2 separate teams

• Data scientists use Python/R and SQL, experiment with data and come up with new models (very little machine learning)

• Data engineers use Java (sometimes .NET) and work on terabytes of data - most time is spent writing integrations and data pipelines

Startups

● Tend to employ generalists

● Usually 3-5 people who can sort of do both. Startups aren’t usually ready to hire specialists

● Sometimes have a product where something like deep learning is needed

● Usually a Ruby or Python stack, not many users or scale

● Usually just want something simple to set up

● Not much need for compiled languages or scale yet - this comes later

Defining Production

Defining “Production”

● Varying degrees of scale

● Not everyone has terabytes of data

● MySQL and outsourced cloud services are “machine learning” for most startups

● Many will start out with scikit-learn and Flask, maybe add Python-based deep learning later. This is “good enough” - this is also what you see the most tutorials for

● Larger companies care more about other things - security, scale, and return on investment for projects. These companies use Java

● If you’re Google you use C++; if you’re Facebook you use your own version of PHP that you wrote and maintain

Hardware

• GPUs have very little market penetration

• Deep Learning also has very little market penetration (despite the marketing)

• Most of the world is CPUs (this is changing very slowly)

• Startups are fine with cloud - on-prem data centers are usually Dell or HP servers running Red Hat or Ubuntu

Typical stack

• Web-based product (Go, Ruby, Python, Scala, Java, or a mix)

• Storage (one or more SQL databases, Elasticsearch/Solr)

• Cloud infrastructure or on-prem (bare metal)

• Machine Learning - ???

Machine Learning at startups

• Random one-off scripts for analysis

• Random one-off notebooks

• One-off ETL pipelines written in Java

• One or more models tied to a REST API that talks to your product stack
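
The last bullet, a model tied to a REST API, could look roughly like the sketch below. It uses only the Python standard library rather than a web framework. The `/predict` path, the `features` field, the port, and the hand-set weights are all hypothetical placeholders and are not taken from the talk.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical hand-set weights standing in for a trained model.
WEIGHTS = [0.8, -0.5, 0.3]
BIAS = 0.1


def predict(features):
    """Linear score for one feature vector (placeholder for a real model)."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_predict(body):
    """Turn a JSON request body into (HTTP status, JSON response bytes)."""
    try:
        features = json.loads(body)["features"]
        score = predict([float(x) for x in features])
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing field, or bad values become a 400.
        return 400, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps({"score": score}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Blocking call: serve predictions on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Keeping `handle_predict` separate from the HTTP handler means the request logic can be unit tested without starting a server. Calling `serve()` starts the endpoint for the product stack to talk to.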

Machine Learning at big companies

• Random one-off scripts for analysis

• Random one-off notebooks

• Large numbers of separate databases and applications run by different teams

• Multiple disconnected APIs

• Some models connected to a Spark or Hadoop cluster

Challenges in Production

• Serving user traffic (latency)

• Data access (connecting everything together)

• Large amounts of time spent on data pipeline code

• Unclear metrics of success for the data team

• Lack of innovation, or “too much” of it, e.g. “chase the shiny new thing”
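
For the first challenge in this list, serving latency, a reasonable starting point is to measure percentiles rather than an average, since slow outliers are what users notice. The following is a minimal sketch using only the standard library. `fake_model`, the payload, and the run count are made-up stand-ins for a real model call.

```python
import statistics
import time


def latency_percentiles(fn, payload, runs=200):
    """Call fn(payload) `runs` times and return p50/p95/p99 latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points: index 0 is p1, index 98 is p99.
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def fake_model(features):
    """Hypothetical stand-in for a real model call."""
    return sum(x * x for x in features)


stats = latency_percentiles(fake_model, list(range(1000)))
print(stats)
```

The printed numbers depend entirely on the machine, so they are only meaningful when compared against a latency budget agreed with the product team.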

Challenges of Deep Learning in Production

• Same problems as machine learning

• Hard to interpret models

• Requires specialized hardware

• Not a lot of best practices

• Lack of expertise (machine learning is hard enough)

Closing the gap

Establish some best practices

• Kaggle is a good start for this - start with “somewhat real” problems

• Use higher-level tools such as Keras; otherwise it is easy to get lost in the weeds

• Consider having a real world goal - e.g. if you’re in real estate, figure out how to use a simple CNN (not the latest algorithm) for image search

• Depending on need, consider integration with Hadoop/Spark

• Lastly - don’t treat deep learning as special. It’s still a subfield of machine learning

Going to production

• Sometimes Python is enough for simple stuff

• Data Engineering teams should consider Java/Scala based solutions (disclaimer: highly opinionated here)

• Follow the same workflow - prototype in Python, then port to production

• Overall - scope to a core problem where deep learning is worth it
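
The talk does not say how the "prototype in Python, port to production" handoff should work. One possible approach, sketched here under that assumption, is to export the prototype's parameters together with a few reference predictions in a language-neutral format (JSON), so that the ported Java or Scala implementation can be checked against the same cases. The logistic regression model, parameter values, and function names are hypothetical.

```python
import json
import math


def predict(params, features):
    """Logistic regression score; the ported version must reproduce this."""
    z = params["bias"] + sum(w * x for w, x in zip(params["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))


def export_bundle(params, reference_inputs):
    """Serialize parameters plus reference predictions as JSON text."""
    cases = [{"input": x, "expected": predict(params, x)} for x in reference_inputs]
    return json.dumps({"params": params, "cases": cases}, indent=2)


def check_parity(bundle_text, candidate_predict, tol=1e-9):
    """Return True if candidate_predict matches every reference case."""
    bundle = json.loads(bundle_text)
    return all(
        abs(candidate_predict(bundle["params"], c["input"]) - c["expected"]) <= tol
        for c in bundle["cases"]
    )


# Hypothetical parameters standing in for a trained prototype.
params = {"weights": [0.4, -1.2], "bias": 0.05}
bundle = export_bundle(params, [[1.0, 2.0], [0.0, 0.0], [-3.0, 0.5]])
print(check_parity(bundle, predict))
```

Here the check runs against the Python function itself. In practice the production team would load the same JSON and run the equivalent comparison in their own test suite.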

Newer hardware

• Prototype on cloud infrastructure on a toy problem

• Try out this “GPU thing” and see what might be involved

• Learn the trade-offs of CPUs and GPUs - don’t believe the marketing

• Buy new hardware as needed
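
One way to reason about the CPU versus GPU trade-off before buying anything is a back-of-envelope model: assume the GPU is faster per item but pays a fixed cost per call (data transfer, launch overhead), and find the batch size where it starts to win. The sketch below encodes only that simplified assumption. The timings in the example are invented for illustration and should be replaced with measurements from your own toy problem.

```python
import math


def break_even_batch(cpu_ms_per_item, gpu_ms_per_item, gpu_overhead_ms):
    """Smallest batch size at which the GPU path is faster, or None if never.

    Simplified model (an assumption, not a benchmark):
      cpu_time(b) = b * cpu_ms_per_item
      gpu_time(b) = gpu_overhead_ms + b * gpu_ms_per_item
    """
    saving = cpu_ms_per_item - gpu_ms_per_item
    if saving <= 0:
        return None  # GPU is not faster per item, so it never catches up.
    # GPU wins once b > overhead / saving; take the next whole batch size.
    return math.floor(gpu_overhead_ms / saving) + 1


# Invented example timings, in milliseconds.
print(break_even_batch(cpu_ms_per_item=2.0, gpu_ms_per_item=0.25, gpu_overhead_ms=7.0))
```

If real traffic arrives one request at a time and the break-even batch size is large, the CPU may remain the better choice for serving even when the GPU is clearly faster for training.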

In closing

• Use something open source to start off with

• Use something *supported* - keep an eye on open source activity

• Don’t just believe the research. Papers are not your company. Do due diligence

Thank you!

Please visit skymind.io/learn for more information