
THREE CLASSES OF DEEP LEARNING NETWORKS

Introduction to Deep Learning

Deep learning refers to a rather wide class of machine learning techniques and architectures, with the hallmark of using many layers of non-linear information processing that are hierarchical in nature.

Deep learning, or hierarchical learning, has emerged as a new area of machine learning research.

The techniques developed from deep learning research have already been impacting a wide range of signal and information processing work.

Deep learning has various closely related definitions or high-level descriptions:

Basic Definition:

DEFINITION-1: A class of machine learning techniques that exploit many layers of non-linear information processing for supervised or unsupervised feature extraction and transformation, and for pattern analysis and classification.

NOTE: The deep learning that we discuss in this monograph is about learning with deep architectures for signal and information processing. It is not about deep understanding of the signal or information.
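To make DEFINITION-1 concrete, here is a minimal sketch of “many layers of non-linear information processing”, where each layer re-represents the output of the layer below. The layer sizes and the tanh non-linearity are illustrative assumptions, not choices from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """One layer of non-linear information processing: affine map + tanh."""
    return np.tanh(w @ x + b)

# A hierarchy of three layers: each layer re-represents the previous one.
sizes = [8, 16, 16, 4]          # input dim -> hidden dims -> output dim
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = rng.normal(size=sizes[0])   # a raw input signal
for w, b in zip(weights, biases):
    x = layer(x, w, b)          # deeper layers see increasingly abstract features
print(x.shape)                  # (4,)
```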

Categorization:

Depending on how the architectures and techniques are intended for use, e.g., synthesis/generation or recognition/classification, one can broadly categorize most of the work in this area into three major classes:

1) Deep networks for unsupervised or generative learning.

2) Deep networks for supervised learning.

3) Hybrid deep networks.

Deep networks for unsupervised or generative learning:

Unsupervised learning refers to not using task-specific supervision information (e.g., target class labels) in the learning process.

Deep networks in this category can be either generative or non-generative in nature.

Most of them are generative in nature (with examples being RBMs, DBNs, DBMs, and generalized de-noising auto-encoders).

Among the various subclasses of generative or unsupervised deep networks, the energy-based deep models are the most common.

Key terms to be noted:

Deep belief network (DBN): Probabilistic generative models composed of multiple layers of stochastic, hidden variables. The top two layers have undirected, symmetric connections between them. The lower layers receive top-down, directed connections from the layer above.

Boltzmann machine (BM): A network of symmetrically connected, neuron-like units that make stochastic decisions about whether to be on or off.

Restricted Boltzmann machine (RBM): A special type of BM consisting of a layer of visible units and a layer of hidden units with no visible-visible or hidden-hidden connections.

[Figures: deep belief network representation; Boltzmann machine; restricted Boltzmann machine]
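As a concrete illustration of the RBM just defined, here is a minimal sketch of one training update using one step of contrastive divergence (CD-1); the layer sizes, learning rate, and Bernoulli sampling details are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1):
    """One contrastive-divergence (CD-1) update for an RBM.

    W couples visible and hidden units only; there are no visible-visible
    or hidden-hidden connections, which is what makes the BM "restricted".
    """
    ph0 = sigmoid(v0 @ W + b)                      # P(h=1 | v0), positive phase
    h0 = (rng.random(b.size) < ph0).astype(float)  # stochastic on/off decision
    pv1 = sigmoid(h0 @ W.T + a)                    # reconstruction P(v=1 | h0)
    ph1 = sigmoid(pv1 @ W + b)                     # negative phase
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    a += lr * (v0 - pv1)
    b += lr * (ph0 - ph1)
    return W, a, b

n_visible, n_hidden = 6, 4
W = rng.normal(0, 0.01, (n_visible, n_hidden))
a, b = np.zeros(n_visible), np.zeros(n_hidden)
v = rng.integers(0, 2, n_visible).astype(float)    # a binary training vector
W, a, b = cd1_step(v, W, a, b)
```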

Deep neural network (DNN): A multilayer perceptron with many hidden layers, whose weights are fully connected and are often (although not always) initialized using either an unsupervised or a supervised pre-training technique. (In the literature prior to 2012, “DBN” was often used incorrectly to mean a DNN.)

Deep auto-encoder: A “discriminative” DNN whose output targets are the data input itself rather than class labels; hence, an unsupervised learning model. When trained with a de-noising criterion, a deep auto-encoder is also a generative model and can be sampled from.

[Figure: deep auto-encoder]
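A minimal sketch of the deep auto-encoder idea follows: the output target is the input itself, so no class labels are needed. The layer sizes and the tied (transposed) decoder weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Encoder 784 -> 128 -> 32; the decoder mirrors it back to 784.
dims = [784, 128, 32]
enc = [rng.normal(0, 0.05, (m, n)) for n, m in zip(dims, dims[1:])]
dec = [w.T.copy() for w in reversed(enc)]   # tied weights (a common choice)

x = rng.random(dims[0])                     # stand-in for an input image
h = x
for w in enc:
    h = sigmoid(w @ h)                      # hierarchical "bottleneck" code
x_hat = h
for w in dec:
    x_hat = sigmoid(w @ x_hat)

# Unsupervised criterion: the output target is the input itself.
reconstruction_error = np.mean((x - x_hat) ** 2)
print(reconstruction_error)
```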

Another prominent type of deep unsupervised model with generative capability is the deep Boltzmann machine (DBM). A DBM contains many layers of hidden variables and has no connections between the variables within the same layer.

When the number of hidden layers of a DBM is reduced to one, we have a restricted Boltzmann machine (RBM).

Another representative deep generative network that can be used for unsupervised (as well as supervised) learning is the sum–product network or SPN.

An SPN is a directed acyclic graph with the observed variables as leaves, and with sum and product operations as internal nodes in the deep network.
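The following toy sketch evaluates such a network: leaves are indicators over the observed variables, product nodes combine children with disjoint scopes, and sum nodes form weighted mixtures. The tiny two-variable structure is an invented example:

```python
from math import prod

def leaf(var, value):
    """Indicator leaf over one observed variable."""
    return lambda x: 1.0 if x[var] == value else 0.0

def product(*children):
    """Product node: combines children over disjoint sets of variables."""
    return lambda x: prod(c(x) for c in children)

def weighted_sum(weights, children):
    """Sum node: a weighted mixture of its children."""
    return lambda x: sum(w * c(x) for w, c in zip(weights, children))

# A tiny SPN over x = (X1, X2): root = 0.6*[X1=1][X2=1] + 0.4*[X1=0][X2=0]
root = weighted_sum(
    [0.6, 0.4],
    [product(leaf(0, 1), leaf(1, 1)),
     product(leaf(0, 0), leaf(1, 0))],
)

print(root((1, 1)))   # 0.6
print(root((0, 0)))   # 0.4
print(root((1, 0)))   # 0.0
```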

Recurrent neural networks (RNNs) can be considered as another class of deep networks for unsupervised (as well as supervised) learning, where the depth can be as large as the length of the input data sequence.

An RNN can be used to predict future entries of a data sequence from the previous data samples.
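A minimal sketch of this next-sample prediction with a vanilla RNN follows; the hidden size, tanh non-linearity, linear read-out, and toy sine-wave sequence are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 8

W_xh = rng.normal(0, 0.1, (n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden (recurrence)
W_hy = rng.normal(0, 0.1, (n_in, n_hidden))      # hidden -> predicted next sample

sequence = np.sin(np.linspace(0, 2 * np.pi, 20))  # toy data sequence
h = np.zeros(n_hidden)
predictions = []
for x_t in sequence[:-1]:
    # Unrolled in time, the network is as deep as the sequence is long.
    h = np.tanh(W_xh @ np.array([x_t]) + W_hh @ h)
    predictions.append((W_hy @ h)[0])             # predicted sequence[t + 1]

# Training would minimize e.g. mean squared error against sequence[1:].
mse = np.mean((np.array(predictions) - sequence[1:]) ** 2)
print(mse)
```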

Deep generative graphical models are a powerful tool in many applications due to their capability of embedding domain knowledge.

However, they are often used with inappropriate approximations in inference, learning, prediction, and topology design, all arising from inherent intractability in these tasks for most real-world applications.

Conclusion to unsupervised learning:

Finally, dynamic or temporally recursive generative models based on neural network architectures can be found in the literature for human motion modeling, and for natural language and natural scene parsing.

Deep networks for supervised learning

Many of the discriminative techniques for supervised learning in signal and information processing are shallow architectures such as HMMs and conditional random fields (CRFs).

A CRF is intrinsically a shallow discriminative architecture, characterized by the linear relationship between the input features and the transition features.

Deep-structured CRFs have been developed by stacking the output of each lower layer of the CRF, together with the original input data, onto its higher layer.

Other major existing discriminative models in speech recognition are based mainly on the traditional neural network or MLP architecture, using back-propagation learning with random initialization.

This line of work argues for the importance of both the increased width of each layer of the neural network and the increased depth.

A new deep learning architecture, sometimes called the deep stacking network (DSN), together with its tensor variant and its kernel version, has been developed; all of these focus on discrimination with scalable, parallelizable, block-wise learning relying on little or no generative component.

[Figure: deep stacking network]
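A minimal sketch of the DSN stacking idea follows: each block is a simple one-hidden-layer network, and the input to each block is the raw input concatenated with the previous block's output. The sizes and the random, untrained weights are illustrative assumptions; a real DSN learns each block's output weights in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, n_blocks = 10, 16, 3, 3

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = rng.random(n_in)          # raw input
y = np.zeros(n_out)           # previous block's output
                              # (block 1 simply sees zeros in its place)
for k in range(n_blocks):
    block_in = np.concatenate([x, y])   # raw input + lower block's output
    W = rng.normal(0, 0.1, (n_hidden, block_in.size))
    U = rng.normal(0, 0.1, (n_out, n_hidden))
    h = sigmoid(W @ block_in)
    y = U @ h                           # this block's prediction
    # In a real DSN, U has a closed-form least-squares solution per block,
    # which is what makes the learning block-wise and parallelizable.

print(y)
```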

Another type of discriminative deep architecture is the convolutional neural network (CNN), in which each module consists of a convolutional layer and a pooling layer. These modules are often stacked up one on top of another, or with a DNN on top, to form a deep model.

The weight sharing in the convolutional layer, together with appropriately chosen pooling schemes, endows the CNN with some “invariance” properties.

[Figure: convolutional neural network]
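A minimal sketch of one such module, and of stacking two of them, follows; the image size, kernel sizes, ReLU non-linearity, and 2x2 max pooling are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """Valid 2-D convolution (implemented as cross-correlation, as is usual
    in deep learning); the kernel's weights are shared across positions."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; gives a degree of translation invariance."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = rng.random((28, 28))   # stand-in for a grayscale image
# Two stacked modules, each a convolutional layer (+ReLU) and a pooling layer.
module1 = max_pool(np.maximum(0, conv2d(image, rng.normal(size=(3, 3)))))
module2 = max_pool(np.maximum(0, conv2d(module1, rng.normal(size=(3, 3)))))
print(module2.shape)
```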

Nevertheless, CNNs have been found highly effective and are commonly used in computer vision and image recognition. More recently, with appropriate changes from the CNN designed for image analysis to one that takes into account speech-specific properties, the CNN has also been found effective for speech recognition.

The time-delay neural network (TDNN), developed for early speech recognition, is a special case and predecessor of the CNN in which weight sharing is limited to one of the two dimensions (the time dimension) and there is no pooling layer.

The model of hierarchical temporal memory (HTM) is another variant and extension of the CNN. The extension includes the following aspects:

(1) The time or temporal dimension is introduced to serve as the “supervision” information for discrimination (even for static images);

(2) Both bottom-up and top-down information flows are used, instead of just bottom-up as in the CNN; and

(3) A Bayesian probabilistic formalism is used for fusing information and for decision making.

A typical HTM network is a tree-shaped hierarchy of levels that are composed of smaller elements called nodes or columns.

A single level in the hierarchy is also called a region.

Higher hierarchy levels often have fewer nodes and therefore less spatial resolvability.

Higher hierarchy levels can reuse patterns learned at the lower levels by combining them to memorize more complex patterns.

[Figure: an example of an HTM hierarchy used for image recognition]

The learning architecture developed for bottom-up, detection-based speech recognition, proposed in 2004 and developed further since, notably using the DBN-DNN technique, can also be placed in the discriminative or supervised-learning deep architecture category.

Hybrid Deep Networks

The term “hybrid” generally indicates the result of combining things.

For this category, it refers to deep architectures that either comprise or make use of both generative and discriminative model components.

In the existing hybrid architectures, the generative component is mostly exploited to help with discrimination, which is the goal of the hybrid architecture.

Why do generative models help?

The optimization viewpoint: generative models trained in an unsupervised fashion can provide excellent initialization points for highly nonlinear parameter estimation problems.

The regularization perspective: unsupervised-learning models can effectively provide a prior on the set of functions representable by the model.

The DBN, a generative deep network for unsupervised learning discussed earlier, can be converted to and used as the initial model of a DNN for supervised learning with the same network structure, which is then discriminatively trained or fine-tuned using the provided target labels.

We can consider this a DBN-DNN model, in which a model trained on unlabeled data helps to make the discriminative model effective for supervised learning.
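The following sketch illustrates the two phases: greedy, unsupervised CD-1 pre-training of a stack of RBMs (the DBN), then reuse of those weights as the initialization of a DNN with an added softmax output layer. All sizes, the three hypothetical classes, and the random stand-in data are illustrative assumptions; the back-propagation fine-tuning itself is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Greedy, unsupervised layer pre-training with CD-1 (as sketched earlier)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W + b)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + a)
            ph1 = sigmoid(pv1 @ W + b)
            W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
            a += lr * (v0 - pv1)
            b += lr * (ph0 - ph1)
    return W, b

X = (rng.random((100, 20)) > 0.5).astype(float)  # stand-in unlabeled data

# 1) Unsupervised phase: train a stack of RBMs layer by layer (the DBN).
W1, b1 = pretrain_rbm(X, 12)
H1 = sigmoid(X @ W1 + b1)
W2, b2 = pretrain_rbm(H1, 8)

# 2) Supervised phase: reuse the DBN weights as the DNN's initialization,
#    add a randomly initialized softmax output layer for the class labels,
#    and fine-tune the whole network with back-propagation (not shown).
W3 = rng.normal(0, 0.01, (8, 3))                 # 3 hypothetical classes
logits = sigmoid(H1 @ W2 + b2) @ W3
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(probs.shape)                               # (100, 3)
```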

In another example of a hybrid deep network, the DNN weights are also initialized from a generative DBN but are further fine-tuned with a sequence-level discriminative criterion, namely the conditional probability of the label sequence given the input feature sequence, instead of the frame-level cross-entropy criterion commonly used.

It can be shown that such a DNN-CRF is equivalent to a hybrid deep architecture of a DNN and an HMM, i.e., a DNN-HMM.

Another example of hybrid deep networks is the use of generative models of DBNs to pre-train deep convolutional neural networks.

As with the fully connected DNN discussed earlier, pre-training helps to improve the performance of deep CNNs over random initialization.

Pre-training DNNs or CNNs using a set of regularized deep auto-encoders, including de-noising auto-encoders, contractive auto-encoders, and sparse auto-encoders, is a similar example of a hybrid deep network.
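A minimal sketch of the de-noising criterion used in such pre-training follows: corrupt the input, encode the corrupted version, and train the network to reconstruct the clean input. The masking noise, sizes, and the single manual gradient step are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hidden, lr = 20, 8, 0.1
W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # encoder weights
W2 = rng.normal(0, 0.1, (n_in, n_hidden))   # decoder weights
b1, b2 = np.zeros(n_hidden), np.zeros(n_in)

x = rng.random(n_in)                         # clean input
x_tilde = x * (rng.random(n_in) > 0.3)       # corrupt: randomly zero ~30% of it

# Forward: encode the corrupted input, decode, compare with the CLEAN input.
h = sigmoid(W1 @ x_tilde + b1)
x_hat = W2 @ h + b2
loss = np.mean((x_hat - x) ** 2)

# One manual gradient-descent step on the de-noising reconstruction loss.
d_xhat = 2 * (x_hat - x) / n_in
d_W2 = np.outer(d_xhat, h)
d_h = W2.T @ d_xhat
d_pre = d_h * h * (1 - h)                    # sigmoid derivative
d_W1 = np.outer(d_pre, x_tilde)
W2 -= lr * d_W2; b2 -= lr * d_xhat
W1 -= lr * d_W1; b1 -= lr * d_pre
print(loss)
```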

THANK YOU
