Ersatz Meetup - Deeplearning4j Demo

Deep Learning - Machine Perception and Its Applications
Adam Gibson
deeplearning4j.org // blix.io // zipfian academy


DESCRIPTION

These slides accompanied a demo of Deeplearning4j at a meetup that covered distributed clustering and several deep-learning topics. http://www.meetup.com/SF-Neural-Network-Afficianados-Discussion-Group/events/182645252/

Deep learning is useful for detecting anomalies such as fraud, spam and money laundering; identifying similarities to augment search and text analytics; predicting customer lifetime value and churn; and recognizing faces and voices.

Deeplearning4j is a highly scalable deep-learning architecture suitable for Hadoop and other big-data structures. It includes both a distributed deep-learning framework and a conventional single-machine framework; that is, it also runs on a single thread. Training takes place on the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce, and they are equally compatible with Java, Scala and Clojure. The distributed framework is built for data input and neural-net training at scale, and its output should be highly accurate predictive models. The framework's neural nets include restricted Boltzmann machines, deep-belief networks, deep autoencoders, convolutional nets and recursive neural tensor networks.
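The phrase "trained in parallel via iterative reduce" boils down to a simple loop: each worker fits its own copy of the net on a shard of the data, then the parameter vectors are averaged and redistributed. Below is a minimal sketch of that averaging step in plain Java; the class and method names are illustrative, not the Deeplearning4j API.

```java
// Minimal sketch of parameter averaging, the idea behind iterative-reduce
// training: each worker trains on its data shard, then the master averages
// the resulting parameter vectors and broadcasts the average back.
// Names are illustrative, not the Deeplearning4j API.
public class ParameterAveraging {

    /** Average the parameter vectors produced by each worker. */
    static double[] average(double[][] workerParams) {
        int n = workerParams.length;
        double[] avg = new double[workerParams[0].length];
        for (double[] params : workerParams) {
            for (int i = 0; i < params.length; i++) {
                avg[i] += params[i] / n;
            }
        }
        return avg;
    }

    public static void main(String[] args) {
        // Parameters from three workers after one local training pass.
        double[][] workerParams = {
            {0.10, -0.20, 0.30},
            {0.12, -0.18, 0.28},
            {0.08, -0.22, 0.32}
        };
        double[] global = average(workerParams);   // broadcast to all workers
        System.out.println(java.util.Arrays.toString(global));
    }
}
```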

TRANSCRIPT

Page 1: Deep Learning - Machine Perception and Its Applications

Adam Gibson
deeplearning4j.org // blix.io // zipfian academy

Page 2: DL is a subset of AI

Deep Learning is a subset of Machine Learning

Machine Learning is a subset of Artificial Intelligence

AI is nothing more than a collection of algorithms that repeatedly optimize themselves.

Deep learning is pattern recognition, a way for machines to classify what they perceive.


Page 3: Deep learning’s algorithms

Deep learning algorithms are called neural nets. They are mathematical models.

They mirror the neurons of the human brain.

In the brain, sets of neurons learn to recognize certain patterns or phenomena, like faces, birdcalls or grammatical sequences.

These models have names like: Restricted Boltzmann Machine, Deep-Belief Net, Convolutional Net, Stacked Denoising Autoencoder and Recursive Neural Tensor Network.


Page 4: What DL can handle

Deep learning understands numbers, so anything that can be converted to numbers is fair game:

Digital media. Anything you can see or hear. DL can analyze sights, sounds and text.

Sensor output. DL can work with data about temperature, pressure, motion and chemical composition.

Time-series data. DL handles prices and their movement over time; e.g. the stock market, real estate, weather and economic indicators.
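The first step in all three cases is the same: turn the raw data into a flat vector of numbers. A minimal sketch of that conversion for the digital-media case, flattening an image into a net-ready vector; standard Java imaging APIs, with "face.png" as a placeholder path.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Sketch: turn an image into the flat vector of numbers a net consumes.
public class ImageToVector {
    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File("face.png"));  // placeholder path
        double[] pixels = new double[img.getWidth() * img.getHeight()];
        int i = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                // Average the channels to grayscale and scale into [0, 1].
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                pixels[i++] = (r + g + b) / (3.0 * 255.0);
            }
        }
        // "pixels" is now the kind of numeric input a deep net can handle.
    }
}
```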


Page 5: What can you do with it?

Recommendation engines: DL can identify patterns of human behavior and predict what you will want to buy.

Anomaly detection: DL can identify signals that indicate bad outcomes. It can point out fraud in e-commerce; tumors in X-rays; and loan applicants likely to default.

Signal processing: Deep learning can tell you what to expect, whether it’s customer lifetime value, how much inventory to stock, or whether the market is on the verge of a flash crash. It has predictive capacity.


Page 6: Facial recognition

Faces can be represented by a collection of images.

Those images have persistent patterns of pixels.

Those pixel patterns are known as features; i.e. highly granular facial features.

Deep-learning nets learn to identify features in data, and use them to classify faces as faces and to label them by name; e.g. John or Sarah.

Nets train themselves by reconstructing faces from features again and again, and measuring their work against a benchmark.


Page 7: Facial reconstructions…

Page 8: How did it do that?

Deep learning networks learn from the data you feed them.

Initial data is known as the training set, and you know what it’s made of.

The net learns the faces of the training set by trying to reconstruct them, again and again.

Reconstruction is a process of finding which facial features are indicative of larger forms.

When a net can rebuild the training set, it is ready to work on new, unlabeled data.
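The loop this slide describes fits in a few lines: reconstruct the training set over and over, and stop once the net can rebuild it well. A sketch assuming a hypothetical Net interface standing in for any reconstruction-based model; none of these names come from Deeplearning4j.

```java
// Hypothetical interface for any reconstruction-based model (RBM, autoencoder).
interface Net {
    double[] reconstruct(double[] input);
    void update(double[] input, double[] reconstruction); // adjust parameters
}

class TrainToReconstruction {
    static void train(Net net, double[][] trainingSet, double tolerance) {
        double error;
        do {
            error = 0;
            for (double[] example : trainingSet) {
                double[] recon = net.reconstruct(example);
                for (int i = 0; i < example.length; i++) {
                    double d = example[i] - recon[i];
                    error += d * d;           // squared reconstruction error
                }
                net.update(example, recon);   // nudge parameters toward the data
            }
            error /= trainingSet.length;
        } while (error > tolerance);          // "ready" once it can rebuild the set
    }
}
```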


Page 9: No really, how did it do that?

Nets measure the difference between what they produce and a benchmark you set.

They try to minimize that difference. They do that by altering their own parameters – the way they treat the data – and testing how that affects their own results.

This test is known as a “loss function.”
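To make that concrete, here is the smallest possible version of the idea: one linear unit trained by gradient descent on a squared-error loss. Plain, purely illustrative Java; Deeplearning4j's own updaters are more elaborate.

```java
// One concrete instance of "alter parameters, re-test the loss": gradient
// descent on squared error for a single linear unit y = w * x + b.
public class LossFunctionDemo {
    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4};
        double[] ys = {2, 4, 6, 8};       // the benchmark: targets we compare against
        double w = 0.0, b = 0.0, lr = 0.05;

        for (int step = 0; step < 200; step++) {
            double gradW = 0, gradB = 0;
            for (int i = 0; i < xs.length; i++) {
                double err = (w * xs[i] + b) - ys[i];  // produced minus benchmark
                gradW += 2 * err * xs[i] / xs.length;  // how the loss changes with w
                gradB += 2 * err / xs.length;          // how the loss changes with b
            }
            w -= lr * gradW;   // alter the parameters in the direction
            b -= lr * gradB;   // that shrinks the loss
        }
        System.out.printf("w = %.3f, b = %.3f%n", w, b);  // approaches w = 2, b = 0
    }
}
```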


Page 10: Learning looks like this.

Page 11: What are faces for?

Facebook uses facial recognition to make itself stickier, and to know more about us.

Government agencies use facial recognition to secure national borders.

Video game makers use facial recognition to construct more realistic worlds.

Stores use it to identify customers and track behavior.


Page 12: Sentiment Analysis & Text

Sentiment analysis is a form of Natural-Language Processing.

With it, software classifies the affective content of sentences, their emotional tone, bias and intensity.

Are they positive or negative about the subject in question?

This can be very useful in ranking movies, books, media and just about anything humans consume.

Including politicians.


Page 13: Who cares what they say?

By reading sentiment, you read many things.

Corporations can measure customer satisfaction.

Governments can monitor popular unrest.

Event organizers can track audience engagement.

Employers can measure job-applicant fit.

Celebrities can gauge fame and track scandal.


Page 14: A Neural Net Taxonomy

Recurrent neural net
Restricted Boltzmann machine (RBM)
Deep-belief network (DBN): a stack of RBMs
Deep autoencoder: two DBNs
Denoising autoencoder (yay, noise!)
Convolutional net (ConvNet)
Recursive neural tensor network (RNTN)


Page 15: Restricted Boltzmann machine (RBM)

Two layers of neuron-like nodes. The first layer is the visible, or input, layer. The second is the hidden layer, which identifies features in the input.

This simple network is symmetrically connected.

“Restricted” means there are no visible-visible or hidden-hidden connections; i.e. all connections happen *between* layers.
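A sketch of the visible-to-hidden pass that follows from this wiring: because connections only run between the two layers, the whole coupling fits in a single weight matrix. Plain illustrative Java, not the Deeplearning4j API.

```java
import java.util.Random;

// Sketch of an RBM's visible-to-hidden pass. Connections exist only *between*
// the layers, so the entire coupling lives in one weight matrix W.
public class RbmSketch {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    /** p(h_j = 1 | v): probability each hidden feature detector turns on. */
    static double[] hiddenProbs(double[] v, double[][] W, double[] hiddenBias) {
        double[] p = new double[hiddenBias.length];
        for (int j = 0; j < p.length; j++) {
            double activation = hiddenBias[j];
            for (int i = 0; i < v.length; i++) {
                activation += v[i] * W[i][j];  // visible i -> hidden j; no v-v or h-h terms
            }
            p[j] = sigmoid(activation);
        }
        return p;
    }

    /** Sample binary hidden states from those probabilities. */
    static double[] sampleHidden(double[] probs, Random rng) {
        double[] h = new double[probs.length];
        for (int j = 0; j < h.length; j++) h[j] = rng.nextDouble() < probs[j] ? 1 : 0;
        return h;
    }
}
```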


Page 16: Deep-belief net (DBN)

A deep-belief net is a stack of RBMs. Each RBM’s hidden layer becomes the next RBM’s visible/input layer. In this manner, a DBN learns more and more complex features.

A machine-vision example: 1) pixels are input; 2) H1 learns an edge or line; 3) H2 learns a corner or set of lines; 4) H3 learns two groups of lines forming an object, maybe a face.

The final layer of a DBN classifies feature groups. It groups them in buckets: e.g. sunset, elephant, flower.
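The stacking rule fits in a few lines: run the input through the first RBM, hand its hidden activations to the next, and repeat. This sketch reuses hiddenProbs() from the RBM sketch above; the parameter arrays are hypothetical placeholders, not DL4J objects.

```java
// Sketch of the DBN stacking rule: each RBM's hidden activations become the
// "visible" input of the RBM above it. weights[k] / biases[k] hold layer k's
// parameters (hypothetical, for illustration only).
public class DbnSketch {
    static double[] forward(double[] input, double[][][] weights, double[][] biases) {
        double[] layerInput = input;                       // pixels at the bottom
        for (int k = 0; k < weights.length; k++) {
            // H1 ~ edges, H2 ~ corners, H3 ~ object parts, per the slide.
            layerInput = RbmSketch.hiddenProbs(layerInput, weights[k], biases[k]);
        }
        return layerInput;   // top-level features, ready for a classifier layer
    }
}
```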


Page 17: Deep Autoencoder

A deep autoencoder consists of two DBNs. The first DBN *encodes* the data into a vector of 10-30 numbers. This is pre-training.

The second DBN decodes the data into its original state.

Backprop happens solely on the second DBN. This is the fine-tuning stage, and it’s carried out with reconstruction entropy.

Deep autoencoders will reduce any document or image to a highly compact vector.

Those vectors are useful in search, QA and information retrieval.
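A sketch of that shape, with the fine-tuning score written out: reconstruction cross-entropy ("reconstruction entropy") between the input and the decoder's rebuild. The encode()/decode() stubs stand in for the two DBN halves; everything here is illustrative, not the DL4J API.

```java
// Sketch of the deep-autoencoder shape: an encoder DBN squeezes the input to
// a compact code, a decoder DBN rebuilds it, and reconstruction cross-entropy
// scores the rebuild during fine-tuning.
public class AutoencoderSketch {

    /** Cross-entropy between input and reconstruction; values must lie in (0, 1). */
    static double reconstructionEntropy(double[] x, double[] recon) {
        double h = 0;
        for (int i = 0; i < x.length; i++) {
            h -= x[i] * Math.log(recon[i]) + (1 - x[i]) * Math.log(1 - recon[i]);
        }
        return h;   // lower is better; fine-tuning drives this down
    }

    // Hypothetical halves: e.g. a 784-pixel image down to a 30-number code and back.
    static double[] encode(double[] input) { throw new UnsupportedOperationException("first DBN"); }
    static double[] decode(double[] code)  { throw new UnsupportedOperationException("second DBN"); }
}
```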


Page 18: Denoising Autoencoder

Autoencoders are useful for dimensionality reduction. The risk they run is learning the identity function of the input.

Dropout is one way to address that risk. Noise is another.

Noise is the stochastic, or random, corruption of the input. The machine then learns features despite the noise. It “denoises” the input.

A stacked denoising autoencoder is exactly what you’d think: a stack of them. Good for unsupervised pre-training, which initializes the weights.
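The corruption step is simple enough to show directly: randomly zero a fraction of the inputs, then ask the net to rebuild the clean original. A sketch using masking noise; plain illustrative Java, not the DL4J API.

```java
import java.util.Random;

// Sketch of the corruption step in a denoising autoencoder: randomly zero a
// fraction of the inputs, then train the net to reconstruct the *clean* input
// from the corrupted copy.
public class DenoisingCorruption {
    static double[] corrupt(double[] input, double corruptionLevel, Random rng) {
        double[] noisy = input.clone();
        for (int i = 0; i < noisy.length; i++) {
            if (rng.nextDouble() < corruptionLevel) {
                noisy[i] = 0;   // masking noise: knock this input out
            }
        }
        return noisy;
    }
    // Training pairs become (corrupt(x), x) instead of (x, x), so the identity
    // function no longer solves the task.
}
```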


Page 19: Convolutional Net

ConvNets are a type of RBM. The difference is they’re asymmetric.

In an RBM, each node in the visible layer connects to each node in the hidden layer. In a ConvNet, each node connects to the node straight ahead of it, and to the two others immediately to the right and left of it.

This means that ConvNets learn data like images in patches. Each learned piece is then woven into the whole.
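That wiring, a node plus its left and right neighbors, is exactly a convolution with a width-3 filter. A one-dimensional sketch in plain illustrative Java:

```java
// The connectivity the slide describes (each hidden node sees only the input
// directly ahead of it plus its left and right neighbors) is a convolution
// with a width-3 filter.
public class Conv1dSketch {
    static double[] convolve(double[] input, double[] filter /* length 3 */) {
        double[] out = new double[input.length - 2];
        for (int i = 0; i < out.length; i++) {
            // One patch: left neighbor, center, right neighbor.
            out[i] = filter[0] * input[i]
                   + filter[1] * input[i + 1]
                   + filter[2] * input[i + 2];
        }
        return out;   // the same filter slides across every patch: shared weights
    }
}
```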


Page 20: Recursive Neural Tensor Net

Recursive nets are top-down, hierarchical nets, rather than feed-forward like DBNs.

RNTNs handle sequence-based classification: windows of several events, entire scenes rather than single images.

The features themselves are vectors. A tensor is a multi-dimensional matrix, or multiple matrices of the same size.
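As a sketch of how those same-size matrix slices get used: in the RNTN of Socher et al. (2013), each slice of the tensor combines two child vectors into one parent vector of the same size. Plain illustrative Java, not the DL4J API.

```java
// Sketch of the RNTN composition step (after Socher et al., 2013): stack the
// two child vectors, run them through each slice of the tensor V plus an
// ordinary weight matrix W, and squash. Each V[k] is one matrix slice;
// together they are the "multiple matrices of the same size" above.
public class RntnComposition {
    static double[] compose(double[] left, double[] right,
                            double[][][] V /* d slices, 2d x 2d */,
                            double[][] W   /* d x 2d */) {
        int d = left.length;
        double[] x = new double[2 * d];                 // x = [left; right]
        System.arraycopy(left, 0, x, 0, d);
        System.arraycopy(right, 0, x, d, d);

        double[] parent = new double[d];
        for (int k = 0; k < d; k++) {
            double bilinear = 0;
            for (int i = 0; i < 2 * d; i++)             // x^T V[k] x
                for (int j = 0; j < 2 * d; j++)
                    bilinear += x[i] * V[k][i][j] * x[j];
            double linear = 0;
            for (int j = 0; j < 2 * d; j++)             // (W x)_k
                linear += W[k][j] * x[j];
            parent[k] = Math.tanh(bilinear + linear);
        }
        return parent;   // same size as each child, so composition can recurse
    }
}
```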


Page 21: RNTNs & Scene Composition

Page 22: RNTNs & Sentence Parsing