methods and resources distributed deep learningsergey/slides/n17_aiukrainedistributed.pdf · 10...

Distributed Deep Learning: Methods and Resources Sergey Nikolenko Chief Research Officer, Neuromation Researcher, Steklov Institute of Mathematics at St. Petersburg September 23, 2017, AI Ukraine, Kharkiv Maxim Prasolov CEO, Neuromation


Distributed Deep Learning: Methods and Resources

Sergey Nikolenko, Chief Research Officer, Neuromation

Researcher, Steklov Institute of Mathematics at St. Petersburg

September 23, 2017, AI Ukraine, Kharkiv

Maxim Prasolov, CEO, Neuromation


Outline

● Bird’s eye overview of deep learning

● SGD and how to parallelize it

● Data parallelism and model parallelism

● Neuromation: developing a worldwide marketplace for knowledge mining


● 10 years ago machine learning underwent a deep learning revolution

● Neural networks are one of the oldest techniques in ML

● But since 2007-2008, we can train large and deep neural networks (in part due to distributed computations)

● And now deep NNs yield state of the art results in many fields


What is a deep neural network

● A deep neural network is a huge composition of simple functions implemented by artificial neurons

● Usually linear combination followed by nonlinearity, but can be anything as long as you can take derivatives

● These functions are combined into a computational graph that computes the loss function for the model


Backpropagation

● To train the model (learn the weights), you take the gradient of the loss function w.r.t. weights

● Gradients can be efficiently computed with backpropagation

● And then you can do (stochastic) gradient descent and all of its wonderful modifications, from Nesterov momentum to Adam
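As a minimal sketch of these two steps, the snippet below backpropagates through a tiny network `pred = w2 * tanh(w1 * x)` with a squared loss and then runs plain SGD. The network shape, the toy data, and all names are illustrative assumptions rather than anything from the talk.

```python
import math

def forward(w1, w2, x):
    """Forward pass: linear -> tanh -> linear."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def backward(w1, w2, x, y):
    """Backpropagation: apply the chain rule from the loss back to each weight."""
    h, pred = forward(w1, w2, x)
    dloss_dpred = 2.0 * (pred - y)            # derivative of (pred - y)^2
    grad_w2 = dloss_dpred * h                 # pred = w2 * h
    dloss_dh = dloss_dpred * w2
    grad_w1 = dloss_dh * (1.0 - h * h) * x    # tanh'(z) = 1 - tanh(z)^2
    return grad_w1, grad_w2

def loss(w1, w2, data):
    """Mean squared error over the whole data set."""
    return sum((forward(w1, w2, x)[1] - y) ** 2 for x, y in data) / len(data)

def sgd(data, lr=0.05, epochs=200):
    """Stochastic gradient descent: one weight update per training example."""
    w1, w2 = 0.5, 0.5
    for _ in range(epochs):
        for x, y in data:
            g1, g2 = backward(w1, w2, x, y)
            w1 -= lr * g1
            w2 -= lr * g2
    return w1, w2

# Toy data generated by the same model with w1 = 1.0, w2 = 2.0 (hypothetical values).
data = [(x / 4.0, 2.0 * math.tanh(x / 4.0)) for x in range(-8, 9)]
```

Momentum, Nesterov, or Adam would change only the two update lines inside `sgd`; the gradients themselves still come from `backward`.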


FEEDFORWARD NETWORKS

CONVOLUTIONAL NETWORKS

RECURRENT NETWORKS

Gradient descent is used for all kinds of neural networks


Distributed Deep Learning: The Problem

● One component of the DL revolution was the use of GPUs

● GPUs are highly parallel (hundreds of cores) and optimized for matrix computations

● Which is perfect for backprop (and fprop too)

● But what if your model does not fit on a GPU?

● Or what if you have multiple GPUs?

● Can we parallelize further?


What Can Be Parallel

● Model parallelism vs. data parallelism

● We will discuss both

● Data parallelism is much more common

● And you can unite the two:

[pictures from (Black, Kokorin, 2016)]


Examples of data parallelism

● Make every worker do its thing and then average the results

● Parameter averaging: average w from all workers
  ○ but how often?
  ○ and what do we do with advanced SGD variants?

● Asynchronous SGD: average updates from workers
  ○ much more interesting without synchronization
  ○ but the stale gradient problem

● Hogwild (2011): very simple asynchronous SGD, just read and write to shared memory, lock-free; whatever happens, happens
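Parameter averaging can be sketched in a few lines. The example below simulates four workers in a single process, each running plain SGD on its own data shard for the 1-D model `y = w * x`; the `local_steps` argument is the "how often?" knob. The model, the data, and the function names are assumptions made for illustration.

```python
import random

def local_sgd(w, shard, lr, steps):
    """Run a few SGD steps for the model y = w * x on one worker's shard."""
    for _ in range(steps):
        x, y = random.choice(shard)
        grad = 2.0 * (w * x - y) * x
        w -= lr * grad
    return w

def parameter_averaging(shards, rounds=20, local_steps=10, lr=0.05):
    """Each round: broadcast w, let every worker train locally, average the results."""
    w = 0.0
    for _ in range(rounds):
        local_weights = [local_sgd(w, shard, lr, local_steps) for shard in shards]
        w = sum(local_weights) / len(local_weights)
    return w

random.seed(0)
# Hypothetical noise-free data: y = 3x, split across four simulated workers.
points = [(x / 10.0, 3.0 * x / 10.0) for x in range(1, 41)]
shards = [points[i::4] for i in range(4)]
w = parameter_averaging(shards)
```

Averaging the weights directly is only straightforward for plain SGD; optimizers that carry extra state (momentum buffers, Adam moments) raise the question from the slide of what to do with that state.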


Examples of data parallelism

● FireCaffe:
  ○ DP on a GPU cluster
  ○ communication through reduction trees
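The idea behind a reduction tree is that gradients are summed pairwise, so the number of communication rounds grows logarithmically with the number of workers instead of linearly. The following is a single-process sketch of that pattern; it is not FireCaffe's implementation, and the worker gradients are made-up values.

```python
def tree_reduce(grads):
    """Sum per-worker gradient vectors pairwise, halving the active workers each round.

    Returns the summed gradient and the number of communication rounds used.
    """
    active = [list(g) for g in grads]
    rounds = 0
    while len(active) > 1:
        merged = []
        for i in range(0, len(active) - 1, 2):
            # Worker i+1 sends its partial sum to worker i.
            merged.append([a + b for a, b in zip(active[i], active[i + 1])])
        if len(active) % 2 == 1:
            merged.append(active[-1])         # odd worker out waits for the next round
        active = merged
        rounds += 1
    return active[0], rounds

grads = [[float(w), 1.0] for w in range(8)]   # 8 simulated workers, 2 parameters each
total, rounds = tree_reduce(grads)
```

With 8 workers the sum is complete after 3 rounds, whereas sending every gradient to one node in turn would take 7 transfers on that node's link.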


Model parallelism

● In model parallelism, different weights are distributed

● Pictures from the DistBelief paper (Dean et al., 2012)

● Difference in communication:
  ○ DP: workers exchange weight updates
  ○ MP: workers exchange data updates

● DP in DistBelief: Downpour SGD vs. Sandblaster L-BFGS

● Now, DistBelief has been completely replaced by...
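To make the communication difference concrete, here is a minimal sketch of model parallelism with two simulated devices, each owning one layer of a tiny linear network (no nonlinearity, to keep it short). Only activations and their gradients cross the device boundary; the weights never move. The `Device` class, the layer sizes, and the initial weights are illustrative assumptions and do not reflect DistBelief's design.

```python
class Device:
    """A simulated device that owns the weights of one fully connected layer."""

    def __init__(self, weights):
        self.weights = weights                # rows = outputs, columns = inputs

    def forward(self, x):
        self.x = x                            # cache the input for the backward pass
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def backward(self, grad_out, lr):
        # Gradient w.r.t. the input: this is what gets sent to the previous device.
        grad_in = [sum(self.weights[j][i] * grad_out[j] for j in range(len(grad_out)))
                   for i in range(len(self.x))]
        # The weight update stays local to this device.
        for j, g in enumerate(grad_out):
            for i, xi in enumerate(self.x):
                self.weights[j][i] -= lr * g * xi
        return grad_in

dev0 = Device([[0.1, 0.2], [0.3, 0.4]])       # layer 1 lives here
dev1 = Device([[0.5, -0.5]])                  # layer 2 lives here

def train_step(x, target, lr=0.1):
    """One SGD step; returns the squared error before the update."""
    h = dev0.forward(x)                       # dev0 -> dev1: activations
    (pred,) = dev1.forward(h)
    grad_pred = [2.0 * (pred - target)]
    grad_h = dev1.backward(grad_pred, lr)     # dev1 -> dev0: activation gradients
    dev0.backward(grad_h, lr)
    return (pred - target) ** 2
```

Calling `train_step([1.0, 2.0], 1.0)` repeatedly drives the loss down while each device only ever touches its own weights.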


Distributed Learning in TensorFlow

● TensorFlow has both DP (right) and MP (bottom)

● Workers and parameter servers

● MP usually works as a pipeline between layers:


Example of Data Parallelism in TensorFlow

● First, specify the structure of the cluster.

● Then assign (parts of) the computational graph to workers and weights to parameter servers.
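As a framework-free sketch of these two steps (not the TensorFlow listing from the slide), the snippet below describes a cluster as a plain dictionary and then places weights on parameter servers and graph replicas on workers. The job names `ps` and `worker` and the `/job:.../task:N` device strings follow TensorFlow's naming convention; the host addresses, variable names, and the round-robin helper are hypothetical.

```python
# Step 1: describe the cluster as job name -> list of task addresses.
cluster = {
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],              # hypothetical hosts
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],  # hypothetical hosts
}

def place_variables(variable_names, cluster):
    """Step 2a: assign each weight tensor to a parameter-server task, round-robin."""
    n_ps = len(cluster["ps"])
    return {name: "/job:ps/task:%d" % (i % n_ps)
            for i, name in enumerate(variable_names)}

def place_replicas(cluster):
    """Step 2b: every worker task gets its own copy of the compute graph."""
    return ["/job:worker/task:%d" % i for i in range(len(cluster["worker"]))]

placement = place_variables(["conv1/w", "conv1/b", "fc/w", "fc/b"], cluster)
replicas = place_replicas(cluster)
```

Round-robin placement spreads parameter traffic evenly when tensors are of similar size; a size-aware strategy would be a natural refinement when they are not.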


Interesting variations

● (Zhang et al., 2016) – staleness-aware SGD: weight each update according to its staleness (how out of date its gradient is)

● Elephas: distributed Keras that runs on Spark

● (Xie et al., 2015) – sufficient factor broadcasting: represent a weight update as the outer product of two vectors u and v, and send only u and v

● (Zhang et al., 2017) – Poseidon: a new architecture with wait-free backprop and hybrid communication
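Sufficient factor broadcasting rests on a simple fact: for a fully connected layer `y = W x` and a single training example, the gradient with respect to `W` is the outer product of the backpropagated error and the layer input. A sender can therefore transmit m + n numbers instead of m * n. The sketch below illustrates only that fact, with made-up vectors; it is not the protocol from the paper.

```python
def outer(u, v):
    """Rebuild the full m x n gradient matrix from its two factors."""
    return [[ui * vj for vj in v] for ui in u]

def sufficient_factors(layer_input, output_error):
    """For y = W x and one example, dLoss/dW = outer(dLoss/dy, x)."""
    return list(output_error), list(layer_input)

def apply_update(weights, u, v, lr):
    """Receiver side: reconstruct the rank-1 gradient and take an SGD step."""
    grad = outer(u, v)
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grad)]

x = [1.0, 2.0, 3.0]                  # layer input, n = 3
err = [0.5, -1.0]                    # backpropagated error at the layer output, m = 2
u, v = sufficient_factors(x, err)
floats_sent = len(u) + len(v)        # m + n
floats_full = len(u) * len(v)        # m * n
new_w = apply_update([[0.0] * 3 for _ in range(2)], u, v, lr=0.1)
```

The saving is negligible at this toy size (5 versus 6 numbers) but large for realistic layers: with m = n = 4096, the factors hold 8,192 numbers against roughly 16.8 million for the full matrix.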


Distributed reinforcement learning

● Special mention: reinforcement learning; async RL is great!

● And standard (by now) DQN tricks are perfect for parallelization:

  ○ experience replay: store experience in replay memory and serve it for learning

  ○ the target Q-network is separate from the Q-network that is currently learning, and it is updated rarely
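Both DQN tricks can be sketched without a neural network. In the example below, a dictionary of Q-values stands in for the Q-network, so the class names, hyperparameters, and the tabular update are simplifying assumptions; the structure (a bounded replay memory plus a target copy that is synced only every few steps) is the part that matters.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

class QLearner:
    """Tabular stand-in for a Q-network with a separate, rarely updated target copy."""

    def __init__(self, sync_every):
        self.q = {}                           # online Q-values, updated every step
        self.target_q = {}                    # frozen copy used to build targets
        self.sync_every = sync_every
        self.steps = 0

    def learn(self, batch, lr=0.5, gamma=0.9, actions=(0, 1)):
        for s, a, r, s_next in batch:
            best_next = max(self.target_q.get((s_next, b), 0.0) for b in actions)
            old = self.q.get((s, a), 0.0)
            self.q[(s, a)] = old + lr * (r + gamma * best_next - old)
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target_q = dict(self.q)      # rare update of the target network
```

Because actors only write to the memory and learners only read from it, the two sides can run on different machines, which is what makes these tricks a good fit for parallelization.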


Gorila from DeepMind: everything is parallel and asynchronous


Recap

● Data parallelism lets you process lots of data in parallel, copying the model

● Model parallelism lets you break down a large model into parts

● Distributed architectures are usually based on parameter servers and workers

● Especially in reinforcement learning, where distributed architectures rule

● And this all works out of the box in TensorFlow and other modern frameworks

● But how is it relevant to us? Isn’t that for the likes of Google and/or DeepMind?

● Where do we get the computational power and why do we need so much data?

Distributed deep learning works


BITCOIN OR ETHER MINING

AMAZON DEEP LEARNING

$7-8 USD per DAY

$3-4 USD per HOUR

“BOTTLENECK” OF AUTOMATION OF EVERY INDUSTRY:

NOT ENOUGH LABELED DATA FOR NEURAL NETWORK TRAINING


IMAGE RECOGNITION IN RETAIL


IMAGE RECOGNITION IN RETAIL BY ECR RESEARCH:

170,000 OBJECTS MUST BE RECOGNIZED ON THE SHELVES

ABOUT 40 BLN IMAGES PER YEAR

TO AUTOMATE THE RETAIL INDUSTRY

OSA HP CONTRACTED NEUROMATION TO PRODUCE LABELED DATA AND TO RECOGNIZE

30% OF THE COST IS COMPUTATIONAL POWER.

MORE THAN 7 MLN EURO REVENUE


MORE THAN 1 BLN LABELED PHOTOS ARE REQUIRED TO TRAIN IMAGE RECOGNITION MODELS

WHERE CAN WE GET THIS HUGE AMOUNT OF LABELED DATA?


DATA LABELING HAS BEEN MANUAL WORK TILL NOW

1 MAN = 8 HOURS x 50 IMAGES, x $0.2 PER IMAGE

YEARS OF MECHANICAL WORK

1 BLN LABELED PHOTOS = $240 MLN
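The scale of manual labeling can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide and reads "8 HOURS x 50 IMAGES" as 50 images per hour over an 8-hour day, which is an assumption about the slide's shorthand.

```python
images_needed = 1_000_000_000        # "1 bln labeled photos"
images_per_hour = 50                 # assumed reading of "8 hours x 50 images"
hours_per_day = 8
cost_per_image_usd = 0.2

images_per_person_day = images_per_hour * hours_per_day
person_days = images_needed / images_per_person_day
person_years = person_days / 365     # calendar days, ignoring weekends and holidays
labeling_cost_usd = images_needed * cost_per_image_usd
```

Under this reading one person labels 400 images a day, so a billion images take 2.5 million person-days, or roughly 6,800 person-years. The per-image rate alone gives $200 mln; the slide's $240 mln figure is somewhat higher, and the slide does not itemize the difference.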


WE KNOW HOW TO GENERATE SYNTHETIC LABELED DATA FOR DEEP LEARNING


SYNTHETIC DATA: A BREAKTHROUGH IN DEEP LEARNING

● Labeled data with 100% accuracy

● Automated data generation with no limits

● Cheaper and faster than manual labor

BUT REQUIRES HUGE COMPUTATIONAL POWER TO RENDER DATA AND TRAIN NEURAL NETWORKS


BITCOIN OR ETHER MINING

AMAZON DEEP LEARNING

$7-8per DAY

$3-4per HOUR

The AI industry is ready to pay miners for their computational resources more than they can ever get from mining Ether.

KNOWLEDGE MINING IS MORE PROFITABLE. DEEP LEARNING NEEDS YOUR COMPUTATION POWER!

WE CAN BRIDGE THIS GAP

GPU

x6GPU


BLOCKCHAIN + DEEP LEARNING


NEUROMATION PLATFORM

TokenAI

will combine in one place all the components necessary to build deep learning solutions with synthetic data

THE UNIVERSAL MARKETPLACE OF NEURAL NETWORK DEVELOPMENT


Neuromation needs to quickly deploy a massive network of computation nodes (converted from crypto miners).

● the network must be geographically distributed and keep track of massive amounts of transactions

● the payment method for completed work should be highly liquid and politically independent

● network nodes have to understand the model of “mining” a resource for a bounty: transparency is required to build trust

● transactions need to be transparently auditable to prevent fraud and mitigate disputes

Blockchain is the only technology that can realistically accomplish this. Extending Ethereum instead of building our own blockchain is an obvious first step.

NEUROMATION PLATFORM will be extending Ethereum with TokenAI.


DEEP LEARNING RESEARCH GRANTS

● for R&D teams and start-ups in the DL/ML industry

● in cooperation with frontier institutions

● a pool of 1,000 GPUs (100,000 more GPUs are coming)

WE ARE OPEN FOR [email protected]


VAST APPLICATIONS OF SYNTHETIC DATA

NEUROMATION LABS:

RETAIL AUTOMATION LAB

● 170,000+ items for the Eastern European retail market alone

● about 50 euro per object

● contract for >7 mln euro in revenue

PHARMA AND BIOTECH LAB

synthetic data for:

● medical imaging (classify tumors and melanomas)

● health applications (smart cameras)

ENTERPRISE AUTOMATION LAB

synthetic data for:

● training flying drones, self-driving cars, and industrial robots in virtual environments

● manufacturing and supply-chain solutions

(live)


OUR TEAM:

Maxim Prasolov, CEO

Fedor Savchenko, CTO

Sergey Nikolenko, Chief Research Officer

Denis Popov, VP of Engineering

Constantine Goltsev, Investor / Chairman

Andrew Rabinovich, Adviser

Yuri Kundin, ICO Compliance Adviser

Aleksey Spizhevoi, Researcher

Esther Katz, VP Communication

Kiryl Truskovskyi, Lead Researcher


ICO ROADMAP

OCTOBER 15th, 2017: Presale of TOKENAI starts

NOVEMBER, 2017: Public sale starts

UNKNOWN DATE: Secret cap is reached, and token sale ends in 7 days

Jan 1st, 2018: Token sale ends (if secret cap is not reached)


ICO DETAILS

● Tokens Minted: 1,000,000,000

● Issued in ICO: Up to 700,000,000

● Reserve: from 300,000,000

● Price per Token: 0.001 ETH

Jurisdiction: Estonia, EU

* Neuromation is fully compliant with Estonian crowdfunding legislation


KNOWLEDGE MINING - A NEW ERA OF DISTRIBUTED COMPUTING

THANK YOU!

neuromation.io